Automatization and decision making within a machine learning framework for drug discovery

Two new paradigms have emerged at AstraZeneca and across the pharmaceutical industry to increase productivity in drug design: AI-augmented molecular design, and acceleration of the Design-Make-Test-Analyze (DMTA) cycle through automation. AI-augmented molecular design uses generative models to sample the chemical space [1], which is estimated to consist of up to  drug-like molecules [2]. Automation of the DMTA cycle takes advantage of automated laboratories and machine learning to make, test and analyze molecules without human intervention [3].

This progress can enable a new drug design paradigm: autonomous drug design. In such a paradigm, AI-augmented molecular design suggests molecules using only limited prior information. Because each experiment is costly and time-consuming, only a few molecules can be made. The autonomous drug design system therefore needs to decide, with minimal human supervision, which of the generated molecules to make. The automated laboratory then tries to make the selected molecules and, if successful, tests their properties. Using the newly acquired information, the system updates its knowledge to better steer the generation of molecules towards a desired area of the chemical space.
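The loop described above can be illustrated with a toy simulation. This is only a minimal sketch under strong assumptions, not the framework used in the project: a "molecule" is reduced to a single number, the laboratory test is a made-up function (`hidden_property`), and all names, parameters and both selection strategies are hypothetical.

```python
import random


def hidden_property(x):
    """Hypothetical stand-in for the laboratory test: the property peaks at x = 2.0."""
    return -(x - 2.0) ** 2


def select_random(candidates, observed, n_make, rng):
    """Baseline strategy: pick the molecules to make uniformly at random."""
    return rng.sample(candidates, n_make)


def select_greedy(candidates, observed, n_make, rng):
    """Rank candidates by the measured value of their nearest tested neighbour."""
    if not observed:
        return rng.sample(candidates, n_make)

    def predicted(x):
        return min(observed, key=lambda pair: abs(pair[0] - x))[1]

    return sorted(candidates, key=predicted, reverse=True)[:n_make]


def run_loop(select, n_cycles=20, n_generated=50, n_make=3, p_make=0.8, seed=0):
    """Simulate repeated design, make, test and analyze steps."""
    rng = random.Random(seed)
    center = 0.0   # current focus of the toy generator
    observed = []  # (descriptor, measured property) pairs
    for _ in range(n_cycles):
        # Design: sample candidates around the current focus.
        candidates = [rng.gauss(center, 1.0) for _ in range(n_generated)]
        # Decide: only a few candidates can be made.
        selected = select(candidates, observed, n_make, rng)
        # Make and test: each synthesis succeeds with probability p_make.
        for x in selected:
            if rng.random() < p_make:
                observed.append((x, hidden_property(x)))
        # Analyze: steer the generator towards the best tested molecule.
        if observed:
            center = max(observed, key=lambda pair: pair[1])[0]
    return center, observed
```

Calling `run_loop(select_random)` and `run_loop(select_greedy)` with the same seed gives a simple way to compare two strategies under an identical experimental budget.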

To explore decision-making strategies for different settings of the DMTA cycle, AstraZeneca is developing, in collaboration with Chalmers University, a framework that simulates the individual steps of the cycle. In this framework, the test step provides the ground truth, so it is essential that it behaves as expected in all scenarios. Two possible directions, each of which would extend the ongoing and previous work, can be considered for this project:

  • Stress testing the test step of the DMTA framework, e.g., by creating artificial contradictory data and/or adding different noise functions, and by using, e.g., Gaussian processes to generate models of different nonlinear relationships
  • Exploring different strategies to determine which molecules to make next
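As an illustration of the first direction, the sketch below draws a synthetic nonlinear ground truth from a Gaussian process prior and wraps it in a noisy test readout. It is a hedged example only: the squared-exponential kernel, the sign-flip model of contradicting data, and all function names and parameters are assumptions made here, not part of the existing framework. It uses only the Python standard library.

```python
import math
import random


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_gp_ground_truth(xs, length_scale=1.0, jitter=1e-6, seed=0):
    """Draw one function from a zero-mean GP prior, evaluated at the points xs."""
    rng = random.Random(seed)
    n = len(xs)
    # Covariance matrix with a small diagonal jitter for numerical stability.
    cov = [[rbf_kernel(xs[i], xs[j], length_scale) + (jitter if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Multiply the Cholesky factor by standard normal draws.
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


def noisy_test(true_value, rng, noise_sd=0.1, contradiction_rate=0.0):
    """Simulated readout: Gaussian noise plus occasional sign-flipped results."""
    value = true_value + rng.gauss(0.0, noise_sd)
    if rng.random() < contradiction_rate:
        value = -value  # assumed, simplistic model of a contradicting data point
    return value
```

Varying `length_scale` changes how smooth the synthetic relationship is, while `noise_sd` and `contradiction_rate` control how hard the test step is pushed.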



This thesis will be carried out in the Molecular AI group at AstraZeneca in Mölndal, and the students will be provided with the necessary equipment and a desk in the group. The Molecular AI group is a world-leading group that helps chemists make better decisions faster by harnessing the power of AI and machine learning.

Number of students: Preferably 2

Required background for student:

  • Knowledge of or interest in statistics and machine learning
  • Good Python programming skills
  • Motivated, creative and focused, with good problem-solving skills

Supervisors: Morteza Haghir Chehreghani (academic supervisor), Hampus Gummesson Svensson (industrial supervisor)


1.  Blaschke, T., et al., REINVENT 2.0: an AI tool for de novo drug design. Journal of Chemical Information and Modeling, 2020. 60(12): p. 5918-5922.

2.  Reymond, J.-L., The chemical space project. Accounts of Chemical Research, 2015. 48(3): p. 722-730.

3.  Christensen, M., et al., Data-science driven autonomous process optimization. Communications Chemistry, 2021. 4(1): p. 1-12.


Date range: September 2021 to September 2023