Advanced domain-specific languages

upd. Since macros had made their appearance in Scala 2.10, people have been building advanced DSLs leveraging compile-time metaprogramming powers. One great example would be the work of my EPFL colleagues on a framework for deeply-embedded domain-specific languages, called YinYang.

One of the most important use cases of compile-time metaprogramming involves increasing the area of applicability of embedded domain-specific languages (eDSLs).

As it was nicely summarized in Embedded Domain-Specific Languages using Libraries and Dynamic Metaprogramming by Gilles Dubochet, sooner or later DSLs hit the limitations of even very advanced programming languages. Here's an illustrative example from Squeryl, a Scala ORM for SQL databases:

var avg: Option[Float] =
  from(grades)(g =>
    where(g.subjectId === mathId)
    compute(avg(g.scoreInPercentage))
  )

This query looks almost like a series of high-order functions applied to a collection, and, in fact, it is: everything is strongly-typed and usual stuff (e.g. closing over the mathId variable) works as expected.

However, this comes at a price. First of all, the amount of syntactic boilerplate is truly gigantic. Moreover, even Scala's type system is still not enough to serve the DSL. In order to capture the subjectId field access, Squeryl builds an auxiliary instance of type Grade, wraps it into a proxy and sifts that proxy through the query during the runtime, recording accesses to fields and building the corresponding AST.

And still, all this effort is not enough to provide seamless embedding into the host language. See that ===? This is a special surrogate operator that stands for ==, because the equality operator itself cannot be overloaded to produce desired results.

If from were a macro def, then it would get the g => ... function argument in the form of a Tree object that represents that query at the compile-time. As a consequence, one would then be able to use any syntax in the query (including non-overloadable constructs, such as the equality operator). Also, the lifting (i.e. transformation of domain-specific codes into corresponding domain ASTs) would come for free, without the need to carefully arrange the web of boilerplate to capture the intent of the user.

This brings sweet memories of coding up Conflux, a framework for programming CUDA in pure C#. Since the computational architecture of massively parallel devices is incompatible with von Neumann architecture, I didn't even have a luxury of symbolic evaluation that is employed in Squeryl. Instead I went for writing a decompiler for .NET bytecode. If I were programming in a language with macros, that effort would be completely unnecessary. For example, take a look how a similar goal is achieved by the NUDA library written in Nemerle.