Research projects

Eugene Burmako
05.11.2012

It's this time of the year at EPFL. We sit down together and come up with the research projects we'd like to propose to our students and to the entire Scala community. This fall our list is rich on macro-related ideas, which are listed below. For the full list of the projects offered by our lab refer to http://lamp.epfl.ch/teaching/projects.

Optimizing collections with macros

High-order functions such as map, filter and foreach provide a very convenient way to work with collections, but this convenience often comes at a cost of performance. Typical use cases for collection processing involve anonymous functions, which usually compile down to JVM anonymous classes introducing noticeable runtime overhead.

With the advent of macros in Scala 2.10, the programmer can control how exactly the compiler is compiling invocations of given methods. By declaring collections methods as macros, it becomes possible to transform their usages into plain while loops, bringing the performance back to normal.

Within this project your work will be to: 1) devise a mini-benchmark to measure the state of the art performance of collections, 2) rewrite popular collection methods as macros, 3) assess the performance improvements.

A further extension to the project baseline is collection operation folding. It is quite common to see code like: list zipWithIndex filter { case (cls, idx) => … } map (_._1) which creates a new collection at each operation. It would be great if we could fuse the operations into a single one. This wouldn’t work if the partial result is stored in a variable, but many invocations are just one-liners which perform all the operations one after the other.

Offered by Eugene Burmako.

Using reified types for specialization

Generic code increases programmer productivity as it increases code reuse. For example, the LinkedList abstraction can be used in many contexts, from storing a list of numbers to storing representations of files on the disk. Unfortunately this comes at the expense of performance, as the generic code needs a common representation for all types. The common representation is usually a pointer to heap data. But for value types, such as integers, bytes and even value classes (see SIP-15) this leads to significant overheads, as they need to be allocated as objects on the heap and then pointed to, breaking data locality and adding an extra indirection.

To overcome this performance loss, many programming languages and virtual machines perform code specialization, which creates a copy of the generic class for each value type and adapts the data representation.

Scala does not require type parameters to be known at compile time, thus allowing truly-generic code to be generated. In this case, type information is not necessary and will not available in the generated bytecode. But this prohibits programmers from using specialized code from generic code, which might be desirable.

Scala can overcome the loss of type information in generic classes by attaching types in so-called ClassTags (formerly known as Manifests). This information can be used to dispatch to the correct specialized class implementation, and can be made either as a compiler plugin or as a macro. Either implementation is fine, as long as the syntactic overhead is reasonable. For this project one should research what is the best implementation and prepare a set of benchmarks that clearly show the performance hit of dispatching based on the ClassTag.

Offered by Vlad Ureche.

Macro-based class system

One of the important techniques enabled by compile-time metaprogramming is modular language extensibility. With macros, functionality that is typically provided as a hardcoded set of language features can be introduced piece-wise, as a part of the standard library. This is applicable even for such a fundamental functionality as class system [1].

For this project you will: 1) define a baseline subset of Scala that would be used to host the class system, 2) implement a macro-based desugaring which maps object-oriented concepts onto the baseline (an example of such a desugaring can be found in [2]).

The project is especially interesting in the context of the dependent object types research [3], which aims to provide a foundation of Scala. Macros can separate the core calculus of DOT from the rest of the language, keeping the heart of the language pristine.

References:
[1] Scheme with Classes, Mixins, and Traits http://www.ccs.neu.edu/racket/pubs/asplas06-fff.pdf
[2] Types and Programming Languages http://www.cis.upenn.edu/~bcpierce/tapl/
[3] Dependent Object Types, Towards a foundation for Scala’s type system http://lampwww.epfl.ch/~amin/dot/fool.pdf

Offered by Eugene Burmako.

Macro-based continuations

Starting with version 2.10, Scala features a lightweight alternative to compiler plugins - macros, which democratize compile-time metaprogramming. Several important problems such as code reification have already been solved with macros, and we’re continuously looking into ways to simplify the compiler by implementing its parts as macros and moving them into the standard library.

Continuations in Scala [1] are implemented by a series of type-directed transformations, which involve the typechecker and span two additional compilation phases. There is a conjecture that this logic can be packed into one or several macros, which gives rise to this project.

This project involves: 1) understanding how the CPS transform works currently, 2) devising a way to express non-local code rewritings with macros, possibly extending the current macro system, 3) implementing the CPS transform with macros.

References:
[1] Implementing First-Class Polymorphic Delimited Continuations by a Type-Directed Selective CPS-Transform http://lampwww.epfl.ch/~rompf/continuations-icfp09.pdf

Offered by Eugene Burmako.

Multi-stage programming with macros

Multi-stage programming (staging) [1] is a promissing aproach for developing high-performance programs with a high-level programming model. LMS [2] (type-driven staging) complements staging with type-safety and modularaity through the use of abstract data types. This approach is great for library/DSL developers but library/DSL users often face complex type errors that are hard to understand.

This project aims to use metaprogramming with Scala macros to provide simple interface for library/DSL users while giving the full abstract data type approach to library/DSL developers. Each LMS program will be pre-processed with a macro that replaces method calls with their DSL representatives. This will allow simple type checking (without abstract types) and thus improve the interface for numerous DSLs. Additionaly the macro will analyze static methods imported into the DSL and try to convert them into the DSL representation. For example, calling `Math.pow(x, 3)` will be automatically imported into the DSL if its interface allows for all its building blocks.

References:
[1] Multi-stage programming: http://www.cs.rice.edu/~taha/MSP/
[2] Lightweight Modular Staging: A Pragmatic Approach to. Runtime Code Generation and Compiled DSLs http://infoscience.epfl.ch/record/150347/files/gpce63-rompf.pdf

Offered by Vojin Jovanovic.

Interpreter for abstract syntax trees

Macros introduced in Scala 2.10 can be viewed as compiler plugins, but unlike full-fledged compiler plugins, they are much more convenient to write and maintain. This makes metaprogramming in Scala accessible for a wide range of users, increasing the number of programming problems that can be solved in a concise and maintainable way.

Yet there are still some hurdles that make metaprogramming in Scala more complicated than it should be. The most annoying of these obstacles is the separate compilation restriction, which requires macro implementations and macro usages to be compiled separately. This restriction is reasonable, because to call a macro implementation during compilation, that implementation must be in executable form. Nevertheless it’s possible to do better.

Scala compiler represents programs (including macro implementations) as abstract syntax trees, which go through the compilation pipeline and eventually end up as executable Java bytecodes. The idea of this project is to shortcut the path from trees to executables by writing an interpreter.

Your tasks in this project will be to: 1) determine whether the entire Scala language can be faithfully interpreted, 2) implement the interpreter and hook it into the macro engine, 3) evaluate the performance of interpretation in comparison with the performance of compiled code.

Offered by Eugene Burmako.