Machine learning for morphological reinflection

Potential supervisors: 
Description: 

Background

Morphology is the study of words and their internal structure. The main unit of analysis in morphology is the morpheme, the smallest meaning-bearing component of a word. For example, the word “untied” consists of the morphemes “un-” (indicating reversal or negation), “tie” (the root) and “-d” (indicating past tense).

Languages with rich morphology can be a challenge for NLP systems that rely on word frequency, because each word can appear in many different forms and any single form may be rare. Tools for generating and analysing morphological forms can therefore be an important part of an NLP pipeline.

Morphological reinflection is the task of predicting one inflected form of a word, given a different form of the same word and the morphological features of the target form. For example, given the source form “release” and the target features Participle and Present, the task is to predict the target form “releasing”.
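
To make the input and output of the task concrete, the example above can be written as a small data record. This is only a sketch: the tab-separated layout (source form, target form, then features separated by semicolons), the feature labels and the names ReinflectionExample and parse_line are assumptions made for illustration, and should be checked against the actual shared task data files.

```python
from typing import FrozenSet, NamedTuple


class ReinflectionExample(NamedTuple):
    source: str                # form the system is given, e.g. "release"
    target: str                # form the system must predict, e.g. "releasing"
    features: FrozenSet[str]   # morphological features of the target form


def parse_line(line: str) -> ReinflectionExample:
    """Parse one line of the assumed layout: source<TAB>target<TAB>feat1;feat2;..."""
    source, target, tags = line.rstrip("\n").split("\t")
    return ReinflectionExample(source, target, frozenset(tags.split(";")))


# The feature labels below are illustrative, not taken from the project text.
example = parse_line("release\treleasing\tV;V.PTCP;PRS")
print(example.source, "->", example.target, sorted(example.features))
```

At prediction time a system would see only the source form and the features, and would have to produce the target form itself.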

Project description

SIGMORPHON, the special interest group on computational morphology and phonology, organised shared tasks on morphological reinflection in 2016-2018 (https://arxiv.org/abs/1810.07125). In the shared task, the problem is supervised learning of morphological inflection: the systems are given training data consisting of words tagged with their morphological features, and are evaluated on how well they reinflect unseen words. The aim of the project is to solve the morphological reinflection problem using the data provided in the shared tasks, and to evaluate the results and compare them with those of the systems that entered the competition. You do not need to know the languages ahead of time; the problem is about inferring the inflection rules from the training data. You are free to choose the methods according to your experience and interests.
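
Evaluating how well a system reinflects unseen words can be sketched as follows. The choice of metrics here (exact-match accuracy and average Levenshtein distance between predicted and gold forms) is an assumption and should be checked against the shared task description before comparing numbers with the competing systems. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def evaluate(predictions: Sequence[str], golds: Sequence[str]) -> Tuple[float, float]:
    """Return (exact-match accuracy, average edit distance) over a test set."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("predictions and golds must be non-empty and equally long")
    n = len(golds)
    accuracy = sum(p == g for p, g in zip(predictions, golds)) / n
    avg_distance = sum(levenshtein(p, g) for p, g in zip(predictions, golds)) / n
    return accuracy, avg_distance


accuracy, avg_distance = evaluate(["releasing", "walkd"], ["releasing", "walked"])
print(accuracy, avg_distance)
```

Reporting edit distance alongside accuracy gives partial credit to predictions that are nearly right, which helps when comparing weak baselines.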

Possible approaches include (but are not limited to):

  • Linear optimization and integer linear programming
  • Recurrent neural networks
  • Encoder-decoder models
  • Convolutional neural networks
  • Classification models
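
Whichever approach is chosen, a trivial baseline is useful as a point of comparison. The sketch below is not one of the shared task systems; it is a hypothetical suffix-rule learner written for illustration. It assumes inflection happens at the end of the word, stores one suffix-replacement rule per training pair, and at prediction time applies the longest matching rule seen with the same features (ties broken by frequency).

```python
from collections import Counter
from os.path import commonprefix
from typing import Dict, Iterable, Tuple

Rule = Tuple[str, str]  # (suffix to strip from the source, suffix to append)


def suffix_rule(source: str, target: str) -> Rule:
    """Describe the change from source to target as a suffix replacement."""
    stem = commonprefix([source, target])
    return source[len(stem):], target[len(stem):]


def train(examples: Iterable[Tuple[str, str, str]]) -> Dict[str, Counter]:
    """Count suffix rules per feature string from (source, features, target) triples."""
    rules: Dict[str, Counter] = {}
    for source, features, target in examples:
        rules.setdefault(features, Counter())[suffix_rule(source, target)] += 1
    return rules


def predict(rules: Dict[str, Counter], source: str, features: str) -> str:
    """Apply the longest matching rule for these features; ties go to the most frequent."""
    candidates = [(len(old), count, old, new)
                  for (old, new), count in rules.get(features, {}).items()
                  if source.endswith(old)]
    if not candidates:
        return source  # no applicable rule: fall back to copying the source form
    _, _, old, new = max(candidates)
    return source[:len(source) - len(old)] + new


# Toy training data; the feature string is illustrative only.
rules = train([
    ("release", "V;V.PTCP;PRS", "releasing"),
    ("make", "V;V.PTCP;PRS", "making"),
    ("walk", "V;V.PTCP;PRS", "walking"),
])
print(predict(rules, "bake", "V;V.PTCP;PRS"))
print(predict(rules, "talk", "V;V.PTCP;PRS"))
```

A baseline like this fails on prefixing, infixing and stem-changing languages, which is where the learned models listed above would be expected to do better.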

Potential supervisor: Ann Lillieström (annl@chalmers.se)

Date range: 
October 2021 to October 2023