Macros in a typed language with syntax: state of the art

Eugene Burmako

Today I've read two fundamental papers on Nemerle macros: "Syntax-Extending and Type-Reflecting Macros in an Object-Oriented Language" (referred to as "the thesis") and "Язык Немерле, часть 5" ("The Nemerle Language, Part 5", referred to as "the article"; sorry, folks, it's in Russian). I also had a fruitful conversation with VladD2, the principal developer of Nemerle (kudos, Vlad, for your desire to help and your very detailed explanations). Some preliminary findings:

1) Quasiquotation is very elegant. It only takes three intuitive language constructs (quotation, antiquotation, splicing; see section 8.3 of the thesis for the summary and the section "Квази-цитирование" ("Quasi-quotation") of the article for the details) to significantly simplify parsing and generation of abstract syntax trees. I was especially impressed by the use of antiquotes and splices together with pattern matching. For me personally, the beauty of quasiquotations is especially striking after the troubles with composability of LINQ expression trees.

2) However, a painfully familiar problem from LINQ still persists. How do we lift (i.e. transform to an AST) code from the outer world? Staging adepts take it easy - they lift all functions and values that have staged types (explicitly in tagful style or implicitly in tagless style, see the previous link to find out what this means) and ignore everything else. This comes at the price of the syntactic overhead of extra typing (pun intended) + potential performance degradation due to a possibly tagful AST representation (take a look here for more details), though the interop with the outer world is kinda ok. With macros we have to bite the bullet - in most cases one needs explicit antiquotations, and this imposes overhead at every call site that references external code that one wants to lift. There's also a non-obvious dilemma of whether we allow macros to see values from their definition scopes (see section 2.3.3 of the thesis).

3) Also, it's completely unclear how to implement orthogonal code generation via macros in a statically typed OO language. Firstly, macros most likely need to be moved to a separate unit of compilation. Secondly, when writing a macro we have a choice: to bear with untyped ASTs and keep the possibility of generating new types, or to enjoy typed ASTs at the cost of a locked typespace (don't even think about what happens if we add type inference to the mix). Thirdly, object-orientedness adds even more fuel to the fire since we get a completely heterogeneous concept of classes that do not compose with functions + there's inheritance. This all brings us to a classic nightmare familiar from stateful programming, when operations are randomly prohibited or allowed depending on the phase of the Moon. After all, it's not surprising that the F# designers did not implement code-generating macros and only went for read-only quasiquotations.

4) Regarding error messages and debugging for the code generated during macro expansion - it's not as bad as it seems. Along with generating executable code we can also generate surrogate source code for macro expansions (we do have an AST after all), which can be used to produce sane debug info. Of course, the original line numbers will be corrupted (yep, we can restore them with lenses, but stepping in the debugger will still be crazy) and there's a glitch with stack traces, but that's at least something, huh? Staging guys have a much harder time here (e.g. see slide #15 of the LMS presentation).

5) Moreover, it's desirable for macros in a language with syntax to have some control over the parser (e.g. see section 6 of the thesis and also the "Genuine, Full-power, Hygienic Macro System for a Language with Syntax" tech report). This enables the programmer to implement such useful thingies as custom modifiers for declarations (think, "async method" or "method ... precondition ...") or even completely new syntactic constructs. Frankly speaking, this is useful, but not mandatory: the former can be implemented with annotations that trigger macros for their annotees, while the latter is kinda already in Scala (e.g. see the canonical implementation of the "while" loop).
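
To make the last remark of 5) concrete, here is a minimal sketch of how a "while"-like construct can be written as a plain Scala library function, using by-name parameters and multiple parameter lists. The names WhileDemo, whileLoop and countTo are made up for this illustration and are not taken from the thesis or the article.

```scala
object WhileDemo {
  // Hypothetical helper: `cond` and `body` are by-name parameters (=> T),
  // so they are re-evaluated every time they are mentioned, not once at the call site.
  def whileLoop(cond: => Boolean)(body: => Unit): Unit =
    if (cond) {
      body
      whileLoop(cond)(body)
    }

  // Usage: the second parameter list accepts a block in braces,
  // so the call reads almost like built-in syntax.
  def countTo(n: Int): List[Int] = {
    val seen = scala.collection.mutable.ListBuffer.empty[Int]
    var i = 0
    whileLoop(i < n) {
      seen += i
      i += 1
    }
    seen.toList
  }
}
```

Note that no parser hook is involved: the call site only looks like new syntax because the last argument list can be written as a block. That is why this covers "completely new syntactic constructs" only kinda, since the shape of the construct is still limited to what a method call allows.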

All in all, at the moment the concept of macros in mainstream programming languages looks to me like a weird mix of butterflies, unicorns, spiders and worms. Template Haskell, here I come.