Sharing Dictionaries between Wiktionary, Wikidata, and GF

Potential supervisors: 

Wiktionary presents word inflections as HTML tables. It is currently available in 182 languages, but coverage varies. Wikidata is a graph database intended to store all data used in Wikipedia and in Wiktionary. Grammatical Framework (GF) is a programming language for grammars. It has morphological functions for over 40 languages.

The task in this Master's thesis is to define conversions between these formats. The goal is to build a comprehensive and consistent database of word inflections in the world's languages, available in all of these formats and many others, and usable for NLP tasks such as text analysis and machine translation. The project is part of the Abstract Wikipedia initiative, whose goal is to automatically generate translations of Wikipedia articles from Wikidata.
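
As a rough illustration of what such a conversion could look like, the Python sketch below stores one inflection table in a neutral intermediate form and exports it in two directions. Everything in it is an assumption made for illustration: the `Entry` class, the function names, the plain-text feature labels, and the simplified form records (real Wikidata lexemes identify grammatical features by item IDs and use a richer JSON layout). The GF output assumes the two-argument `mkN` paradigm for English nouns from the GF Resource Grammar Library.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical neutral representation of one inflection table."""
    lemma: str
    language: str          # language code, e.g. "en"
    pos: str               # part of speech, e.g. "noun"
    # Maps a tuple of feature labels to the inflected form.
    forms: dict = field(default_factory=dict)


def to_lexeme_forms(entry):
    """Export to a simplified, Wikidata-like list of form records.

    Assumption: feature labels stand in for the item IDs that a real
    Wikidata lexeme would use.
    """
    return [
        {
            "representation": {"language": entry.language, "value": form},
            "grammatical_features": sorted(features),
        }
        for features, form in entry.forms.items()
    ]


def to_gf_lin(entry):
    """Export an English noun as a GF linearization rule.

    Assumption: the entry has exactly the forms ("singular",) and
    ("plural",), and the lemma is a valid GF identifier.
    """
    singular = entry.forms[("singular",)]
    plural = entry.forms[("plural",)]
    return f'lin {entry.lemma}_N = mkN "{singular}" "{plural}" ;'


cat = Entry(
    lemma="cat",
    language="en",
    pos="noun",
    forms={("singular",): "cat", ("plural",): "cats"},
)
print(to_gf_lin(cat))
print(to_lexeme_forms(cat))
```

Routing every conversion through one intermediate form means each format needs only an importer and an exporter, rather than a separate converter for every pair of formats.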

Date range: 
October, 2021 to October, 2026