WebAssembly – native code IR for the web

Right now various developers are getting excited about WebAssembly, a new IR (intermediate representation) designed to support languages like C and C++ (and Loci) with native-like performance. Most notably, the intention is to remove the parsing overhead of asm.js (a subset of Javascript designed to be a target for native compilation), to open up new low-level functionality, and to stop using Javascript as the ‘assembly of the web’.

There’s been a long-held desire for something like this and it finally seems to be happening, given the broad support from browser vendors.

From my point of view the most interesting part of their documentation is the comparison to LLVM IR, in which they say:

The LLVM compiler infrastructure has a lot to recommend it: it has an existing intermediate representation (LLVM IR) and binary encoding format (bitcode). It has code generation backends targeting many architectures and is actively developed and maintained by a large community. […] However, the goals and requirements that LLVM was designed to meet are subtly mismatched with those of WebAssembly.

WebAssembly has several requirements and goals for its IR and binary encoding:

  • Portability: The IR must be the same for every machine architecture.
  • Stability: The IR and binary encoding must not change over time (or change only in ways that can be kept backward-compatible).

[…]

LLVM IR is meant to make compiler optimizations easy to implement, and to represent the constructs and semantics required by C, C++, and other languages on a large variety of operating systems and architectures. This means that by default the IR is not portable (the same program has different representations for different architectures) or stable (it changes over time as optimization and language requirements change). […] LLVM’s binary format (bitcode) was designed for temporary on-disk serialization of the IR for link-time optimization, and not for stability or compressibility […]

This is a very insightful comparison of WebAssembly versus LLVM IR. A lot of people have wanted LLVM IR to be a portable and stable representation, but that directly contradicts its goals of being excellent for performing optimisations and for splitting frontends/backends. There was a mailing list discussion in 2011 about what LLVM IR is (“LLVM IR is a compiler IR”) and the use cases it does and doesn’t support.

The new model seems to be using WebAssembly to communicate code across machines/platforms/architectures, but on the machines themselves a translation to/from LLVM IR seems likely (of course, platforms can use whatever internal representation they prefer). Logically then, work is beginning on a new WebAssembly back-end for LLVM.

Fortunately, now that we have WebAssembly, it looks like LLVM IR can be left to focus on its core objectives. So while WebAssembly is standardised as a stable, portable and backwards-compatible representation for communicating programs between machines, LLVM IR can be continuously modified and improved to enable state-of-the-art optimisation and to further develop the capabilities of front-ends and back-ends.

The Internet discovers Loci

So it seems that in the last couple of days, postings have appeared about Loci on Y Combinator and Reddit (first and second).

It’s really awesome to see that developers are interested in Loci. Various questions have been asked about the language, which will help me to improve the documentation to better explain how the language works and the reasoning behind the design decisions. The 1.2 release of Loci will be appearing in the next couple of weeks; it looks likely that there’ll be a desperate rush next week to get some great features into the compiler. This release also includes some optimisation of Semantic Analysis (whereas 1.1 involved an enormous performance improvement to Code Generation).

If you’re reading this and you have a question, a suggestion and/or constructive criticism of the language, I recommend raising an issue on the GitHub repository, and I’ll do my best to respond promptly. At some point I’ll probably set up a better system for discussion/questions (I’m thinking about a forum on the Loci website).

Locic 1.1 released!

So the second version of the Loci Compiler Tools is now available (see Loci Compiler), with the main new features being:

  • Switching from C++-like template expansion to Template Generators (to allow templated APIs across module boundaries)
  • Module imports and exports
  • scope(success), scope(failure) and scope(exit)
  • noexcept
  • Type-templated functions/methods
  • Type aliases
  • assert and unreachable statements
  • Implicit and explicit casts between types using templated methods
  • Standard library memory allocators and smart pointers
  • Standard library containers
  • Standard library strings
  • Vastly improved performance, particularly for Code Generation
  • A larger set of examples, with updates to demonstrate newly implemented features
  • Significantly improved documentation in reStructuredText using Sphinx, which can generate multiple output formats including HTML and PDF
  • A much larger set of integrated tests to check both accept and reject cases, as well as testing the standard library

The release was delayed slightly from the mid-August estimate in order to add support for LLVM 3.5 (so that LLVM 3.3, 3.4 and 3.5 are all supported as backends for Locic); LLVM 3.5 was initially scheduled for release on 25th August 2014.

LLVM 3.5’s release has since been re-scheduled for the start of September, so the Locic 1.1 release was modified and tested for compatibility with LLVM 3.5 RC3 (pulled from SVN), which is expected to be near-identical to the actual LLVM 3.5 release.