WebAssembly – native code IR for the web

Right now various developers are getting excited about WebAssembly, which is a new IR (intermediate representation) designed for supporting languages like C and C++ (and Loci) with native-like performance. Most notably the intention is to remove the parsing overhead of asm.js (a subset of Javascript designed to be a target for native compilation), open up new low level functionality and stop using Javascript as the ‘assembly of the web’.

There’s been a long-held desire for something like this and it finally seems to be happening, given the broad support from browser vendors.

From my point of view the most interesting part of their documentation is the comparison to LLVM IR, in which they say:

The LLVM compiler infrastructure has a lot to recommend it: it has an existing intermediate representation (LLVM IR) and binary encoding format (bitcode). It has code generation backends targeting many architectures is actively developed and maintained by a large community. […] However the goals and requirements that LLVM was designed to meet are subtly mismatched with those of WebAssembly.

WebAssembly has several requirements and goals for its IR and binary encoding:

  • Portability: The IR must be the same for every machine architecture.
  • Stability: The IR and binary encoding must not change over time (or change only in ways that can be kept backward-compatible).

[…]

LLVM IR is meant to make compiler optimizations easy to implement, and to represent the constructs and semantics required by C, C++, and other languages on a large variety of operating systems and architectures. This means that by default the IR is not portable (the same program has different representations for different architectures) or stable (it changes over time as optimization and language requirements change). […] LLVM’s binary format (bitcode) was designed for temporary on-disk serialization of the IR for link-time optimization, and not for stability or compressibility […]

This is a very insightful comparison of WebAssembly versus LLVM IR. A lot of people have wanted LLVM IR to be a portable and stable representation, but that directly contradicts its goals to be excellent for performing optimisations and for splitting frontends/backends. There was a mailing list discussion in 2011 about what LLVM IR is (“LLVM IR is a compiler IR”) and the use cases it does and doesn’t support.

The new model seems to be using WebAssembly to communicate code across machines/platforms/architectures but on the machines themselves a translation to/from LLVM IR seems likely (of course platforms can use whatever internal representation they prefer). Logically then, work is beginning on a new WebAssembly back-end for LLVM.

Fortunately, now that we have WebAssembly, it looks like LLVM IR can be left to focus on its core objectives. So while WebAssembly is standardised as a stable, portable and backwards compatible representation for communicating programs between machines, LLVM IR can be continuous modified/improved to enable state-of-the-art optimisation and further develop the capabilities of front-ends and back-ends.

Leave a Reply

Your email address will not be published.