Learning Smart Morphological Paradigms
In most languages, words take different forms in different contexts. For example, nouns have distinct singular and plural forms and can be inflected for grammatical case. Similarly, adjectives can agree in gender with the noun they modify. In some languages these changes can be quite complicated. This proposal is about using machine learning to learn how a word changes across all of its forms. The algorithm should take one or more of the forms as input and predict the rest. The training data for each language should be a large lexicon that lists all forms for about 50,000 to 80,000 words.
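The learning task described above can be made concrete with a small sketch. It assumes, purely for illustration, that each lexicon entry is a mapping from form names to strings and that every form differs from a chosen base form only in its suffix; languages with prefixes or stem-internal changes would need a richer rule format. The toy lexicon and all function names are hypothetical and are not part of GF or of any existing lexicon format.

```python
# Toy lexicon, invented for illustration; a real one would hold
# tens of thousands of entries with many more forms per word.
LEXICON = [
    {"sg": "dog", "pl": "dogs"},
    {"sg": "city", "pl": "cities"},
    {"sg": "box", "pl": "boxes"},
]


def suffix_rule(base, form):
    """Return (suffix to strip from base, suffix to add) that turns base into form."""
    n = 0
    while n < min(len(base), len(form)) and base[n] == form[n]:
        n += 1  # length of the longest common prefix
    return base[n:], form[n:]


def apply_rule(base, rule):
    """Apply a (strip, add) rule to a base form."""
    strip, add = rule
    if not base.endswith(strip):
        raise ValueError(f"{base!r} does not end in {strip!r}")
    return base[:len(base) - len(strip)] + add


def training_examples(lexicon, base_key="sg"):
    """Yield (base form, form name, rule) triples; the rule is the label to predict."""
    for entry in lexicon:
        base = entry[base_key]
        for name, form in entry.items():
            if name != base_key:
                yield base, name, suffix_rule(base, form)


examples = list(training_examples(LEXICON))
```

Under these assumptions, a classifier would be trained to predict the rule from features of the base form (for instance its final letters), and `apply_rule` would then generate the missing form.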
From the classifier it should be possible to synthesise a program in the functional language GF. GF already has manually written functions (paradigms) for the morphology of many languages. The goal is to see to what extent the creation of those functions can be automated. Structurally, all of these functions look like decision trees, so decision-tree-based learning is a promising direction.
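To illustrate the decision-tree shape of such a function, here is a deliberately simplified sketch of an English noun plural paradigm. It is written in Python rather than GF, it is not taken from the GF libraries, and it ignores irregular nouns; the function name and the exact set of tests are assumptions made for the example.

```python
VOWELS = "aeiou"


def plural(noun):
    """Guess the English plural of a noun from its ending (regular nouns only)."""
    # Each test on the word ending is one internal node of the tree;
    # each return statement is a leaf holding a string rewrite.
    if noun.endswith(("s", "x", "z", "sh", "ch")):
        return noun + "es"
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    return noun + "s"


for word in ["dog", "bus", "city", "boy", "church"]:
    print(word, plural(word))
```

A learned decision tree over suffix features would have the same structure, which is why translating it into a paradigm function looks feasible.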
For more about "smart paradigms", read the introduction of this paper. The rest of the paper is interesting but not relevant to the project.