Understanding an author by Topic Modelling

Potential supervisors: 

Topic Modelling (TM) is associated with Big Data and is regularly used to "mine" huge corpora for interesting semantic patterns. Less attention has been given to the possibilities of using TM on what can be called "medium-sized corpora" as a complement to traditional humanistic methods. The purpose of this project is to develop a "work environment" for the use of TM on such medium-sized corpora.

The project consists of the following steps, each presenting its own challenges:
1. To implement an efficient and flexible method for TM. This step involves:

  • a study of different possible algorithms
  • choice of programming language
  • implementation
  • system design that allows for flexible use of the algorithm
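
As a point of reference for this first step, the sketch below shows one candidate algorithm, latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, in plain Python. The project does not prescribe an algorithm or a language; the function name `fit_lda`, its parameters and the default values are assumptions made for illustration only.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (doc_topic, topic_word):
    smoothed per-document topic proportions and per-topic word counts.
    alpha and beta are symmetric Dirichlet priors (assumed defaults).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [Counter() for _ in range(num_topics)]
    topic_totals = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic_counts[d][z] += 1
            topic_word_counts[z][w] += 1
            topic_totals[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic_counts[d][z] -= 1
                topic_word_counts[z][w] -= 1
                topic_totals[z] -= 1

                # Conditional probability of each topic for this token,
                # up to a constant factor.
                weights = [
                    (doc_topic_counts[d][k] + alpha)
                    * (topic_word_counts[k][w] + beta)
                    / (topic_totals[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic_counts[d][z] += 1
                topic_word_counts[z][w] += 1
                topic_totals[z] += 1

    doc_topic = []
    for d, doc in enumerate(docs):
        denom = len(doc) + alpha * num_topics
        doc_topic.append([(c + alpha) / denom for c in doc_topic_counts[d]])
    return doc_topic, topic_word_counts
```

A pure-Python sampler like this is easy to inspect but slow; for corpora of the size discussed here, an established library or a compiled implementation may be a more practical choice.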

2. To create an interface to the implementation that is user-friendly enough to be used by a reasonably computer-proficient humanist. This step involves understanding the humanistic research process and exploring how Topic Modelling can function as a complement to traditional humanistic methods. To make TM useful, it is probably necessary to:

  • design tools for suitable pre-processing of text
  • make it easy to run TM with different parameter values, possibly in "batch mode"
  • design tools for exploration of results, in a way that makes the results useful for answering humanistic research questions
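
The first two points can be illustrated with a small sketch: a tokeniser with a stop list, and a generator that enumerates parameter combinations for a batch of runs. The names `preprocess`, `parameter_grid` and `STOPWORDS`, the tiny stop list and the parameter names are all hypothetical; a real tool would need language-specific resources and choices made together with the humanistic scholar.

```python
import itertools
import re

# Hypothetical, deliberately tiny stop list used only for this sketch.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "is", "that"})


def preprocess(text, stopwords=STOPWORDS, min_length=3):
    """Lower-case the text, keep alphabetic tokens, drop short and stop words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if len(t) >= min_length and t not in stopwords]


def parameter_grid(**options):
    """Yield one dict per combination of the given parameter values."""
    names = sorted(options)
    for values in itertools.product(*(options[name] for name in names)):
        yield dict(zip(names, values))


# Example batch: every combination of two assumed parameters.
for params in parameter_grid(num_topics=[10, 20], alpha=[0.1, 0.5]):
    print(params)
```

Each yielded dictionary could be passed to whichever TM implementation is chosen and stored alongside its results, so that runs remain comparable and reproducible.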

3. To test the work environment on a few medium-sized corpora consisting of the collected works of some well-known philosophers. These corpora will be provided by the supervisor. This last step will be conducted in collaboration with a humanistic scholar. Possible questions to be studied in this last step are:

  • What is this author writing about?
  • Has her focus changed at some moments? When? From what to what? Does the computer-assisted identification of a "shift of focus" correspond to how humanistic scholars think the author has gone through different phases?
  • Where and when is topic X discussed in the works of the author?
  • How does work A relate to works B, C, D?
  • When were the interesting topics X, Y, Z introduced for the first time?
  • Has the author changed her way of talking about topic X?
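
One possible way to approach the "shift of focus" question, assuming each work has already been given a topic distribution by some TM run, is to order the works by date and flag large Jensen-Shannon divergences between consecutive works. The function names, the input format and the threshold below are assumptions for illustration, not a method fixed by the project.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, range 0 to 1) between two distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def focus_shifts(dated_works, threshold=0.3):
    """Flag consecutive works whose topic distributions differ strongly.

    dated_works is a list of (year, title, topic_distribution) tuples in any
    order. The threshold is an arbitrary assumed value that would need tuning.
    Returns (year, previous_title, title, divergence) for each flagged pair.
    """
    ordered = sorted(dated_works, key=lambda work: work[0])
    shifts = []
    for (_, prev_title, prev), (year, title, cur) in zip(ordered, ordered[1:]):
        divergence = js_divergence(prev, cur)
        if divergence >= threshold:
            shifts.append((year, prev_title, title, divergence))
    return shifts
```

Whether the flagged years correspond to the phases that scholars recognise is exactly the kind of question this step is meant to examine together with the humanistic scholar.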

This project requires:

  • Capacity to understand the mathematical/statistical ideas of Topic Modelling
  • Skills in system design and programming
  • An interest in humanistic scholarship

The project will be supervised by Civ. Ing., PhD. Sverker Lundin (sverker.lundin@gmail.com) in collaboration with a researcher from the department of Computer Science and Engineering.


Date range: 
November, 2018 to November, 2023