Robust layout rules for parsing indentation-sensitive programming languages

Potential supervisors: 
Description: 

Modern programming languages like Haskell and Python use indentation to structure code. Haskell's layout rules give quite a bit of freedom to the programmer to arrange her code, at the expense that programs are not stable under some simple operations such as renaming an identifier. Consider the Haskell snippet:

  f arg = do thing1
             thing2

Renaming arg to argument will produce the text

  f argument = do thing1
             thing2

which is no longer valid Haskell code.

Haskell has so-called layout keywords such as do which start a new block. The column of the token after the layout keyword is the reference for grouping declarations into the block: the block closes if the first token on a line is indented less than the reference token. This makes the first snippet parse as

  f arg = do { thing1
             ; thing2 }

whereas the second is parsed as

  f argument = do { thing1 }
             thing2

The following rules make layout robust with regard to renaming identifiers (such as arg to argument):

  1. The layout keyword must be the last token on the line, or

  2. the layout keyword that starts a new block spanning several lines can only be preceded by other keywords or punctuation characters on the same line.

The task of this Master's thesis is to design a framework for layout-sensitive grammars and a parser generator. Traditionally, layout is handled on the level of lexical analysis, but this thesis should investigate a handling on the level of parsing.

Prerequisites:

  • Strong interest in programming language design and implementation
  • Excellent knowledge of Haskell and excellent Haskell implementation skills (including monadic programming)
  • DAT151/DIT231 Programming Language Technology

References:

Suggested supervisor: Andreas Abel (http://www.cse.chalmers.se/~abela)

Date range: 
October, 2017 to October, 2022