Analysis of gender bias in Swedish texts
In this GENIE-financed project we will use state-of-the-art natural language processing (NLP) technologies to investigate gender bias in different text genres.
In particular, we will analyse texts written in Swedish at Chalmers and GU. We are currently collecting text data in many categories: course and programme descriptions, blog and news texts, interviews, job advertisements, meeting minutes, social media, etc.
There has been some research on unconscious gender bias in texts, mainly for English and in the context of job recruiting; see the project description above for more about previous work. For Swedish, however, very little research has been done, so there should be plenty of possibilities for you.
Here are some possible research questions that could serve as inspiration:
- Are different occupations (e.g., engineer, nurse) or subjects (e.g., arts, sociology, engineering) described as being more associated with one gender? If so, how is that manifested?
- Are different text styles (e.g., journalistic, academic) more associated with one gender? How is that manifested?
- Are there any good indicators of the genderedness of a text, on a lexical, syntactic, semantic, or other, level?
- Within Chalmers or GU, is there any gender bias in different types of texts (e.g., web texts, job ads, applications, student essays) or subjects (e.g., chemistry, architecture, astronomy)?
- Is there a correlation between the genderedness of programme descriptions and recruitment material, and the gender balance in the corresponding study programmes?
- Are there any systematic differences in genderedness between Swedish and English texts that are (1) translations of each other, or (2) from the same genre?
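As a starting point for the question about lexical indicators, here is a minimal sketch of a pronoun-based score. It is only an illustration, not a method used in the project: the word lists contain nothing but the Swedish third-person pronouns, and the names `MASCULINE`, `FEMININE`, `tokenize` and `gender_score` are hypothetical. A real study would need validated lexicons and far more careful text processing.

```python
import re
from collections import Counter

# Assumption: tiny illustrative word lists (Swedish third-person pronouns only).
MASCULINE = {"han", "honom", "hans"}
FEMININE = {"hon", "henne", "hennes"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (handles å, ä, ö)."""
    return re.findall(r"\w+", text.lower())


def gender_score(text):
    """Return a score in [-1, 1]: positive means more masculine pronouns,
    negative means more feminine pronouns, 0.0 means balanced or none found."""
    counts = Counter(tokenize(text))
    masc = sum(counts[w] for w in MASCULINE)
    fem = sum(counts[w] for w in FEMININE)
    total = masc + fem
    if total == 0:
        return 0.0
    return (masc - fem) / total


if __name__ == "__main__":
    print(gender_score("Hon och hennes kollega skriver en rapport."))
```

Pronoun counts measure who is mentioned rather than how they are described, so a score like this is at best a baseline to compare richer syntactic or semantic indicators against.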
Available training and evaluation data
We are collaborating with Språkbanken Text at the department of Swedish, which holds a large collection of Swedish texts in many genres, including journalistic texts, governmental texts, student essays, novels, social media, online chats, and many more. In total, Språkbanken has collected around 13 billion words, which makes it one of the largest non-English text collections in the world. In addition to the material from Språkbanken, we are currently collecting official and semi-official documents from Chalmers and GU.
Prerequisites
- One or more courses in machine learning, such as Applied Machine Learning (DAT340/DIT865) or Algorithms for Machine Learning and Inference (TDA231/DIT381)
- Courses or experience in natural language processing, text classification or computational linguistics are recommended
- Courses in mathematical or statistical modeling are recommended
Useful skills and knowledge
- Text processing: the ability to clean textual data, convert between different text formats, and prepare texts for analysis
- Data science: experience in relevant frameworks for text analysis and machine learning
- Some knowledge about gender issues and gender bias