Building LLVM and Clang

Often it’s useful to build the latest LLVM and Clang from source, rather than relying on the frequently out-of-date packages available in distribution repositories. The build can be done quickly and easily with CMake, but it’s worth taking care to pass the right options.

The following commands will build LLVM and Clang to be installed in /usr/bin (and /usr/lib, etc.). Note that the ‘-j’ option tells make how many recipes may execute at once; use this to your advantage to dramatically reduce the build time on multi-core machines!

The following directory structure is assumed: the LLVM source downloaded and moved to /llvm/llvm-src, and the Clang source to /clang/clang-src.
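Something along these lines should do the trick (a sketch: it assumes Clang is built as an LLVM tool by symlinking it into the LLVM tree, and a quad-core machine for ‘-j’):

```bash
# Let the LLVM build pick up Clang as a tool.
ln -s /clang/clang-src /llvm/llvm-src/tools/clang

# Configure an out-of-tree build that installs to /usr
# (i.e. /usr/bin, /usr/lib, etc.).
mkdir /llvm/llvm-build
cd /llvm/llvm-build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr /llvm/llvm-src

# Build and install; adjust -j to match your core count.
make -j4
sudo make install
```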

Use Unsigned Fixed Size Types

In the header stdint.h you can find a set of fixed-size types, signed and unsigned, from 8 bits up to 64 bits. Their names are int8_t, uint8_t, int16_t, uint16_t, etc., where the number indicates the width in bits. These types are designed to solve the problem that the basic primitive types (char, short, int, long) don’t have a fixed size and can therefore have different sizes on different platforms.

So whenever you’re reading from a file or from a socket (or in numerous other cases), use the fixed-size types rather than the basic primitive types, for portability. Also, take care over signed versus unsigned integer types; when dealing with raw bytes, unsigned is generally what you’re looking for.

This whole issue was brought up when a friend of mine was reading an unsigned 32-bit value (stored in big-endian byte order) from a file – it’s actually harder than it seems. Here’s something resembling the key aspect of the code (a reconstruction; the shape matters more than the details):
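```cpp
#include <cstdint>
#include <cstdio>

// Read a big-endian 32-bit value from a file (error handling omitted).
uint32_t readValue(FILE* file) {
    char data[4];
    fread(data, 1, 4, file);
    return (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3];
}
```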

Unfortunately, this code is incorrect. Even worse, it will work in some cases and not in others. If the values from 0 to 127 were stored, it would work. However, a stored value of 128 produces a result of 4294967168. This value is particularly large…in fact, it comes close to the maximum value of a 32-bit unsigned integer (4294967295). There are some other ranges of values for which it works (256 works fine, for example), and others beyond them for which it does not.

The problem revolves around the use of char – not only is it not guaranteed to be 8 bits in size, but on most platforms it is signed by default (to add insult to injury, the language standard allows char to be signed on some platforms and unsigned on others). As a result, each of the four char values in the array is interpreted as a signed value, and so the program goes completely mad when asked to convert these char values to uint32_t. As far as I understand it, and have seen myself, the conversion from a signed char to an unsigned int goes as follows:

  1. Convert the signed char to a signed int (i.e. extend the type to the correct size)
  2. Reinterpret the signed int as an unsigned int

Reinterpreting isn’t a problem; it doesn’t modify the bits stored. In fact, for non-negative values the first stage doesn’t modify anything either. However, converting a negative signed char to a signed int causes sign extension (loads of ‘1’s are added at the start). If your byte happens to represent a non-negative signed value (i.e. 0 to 127), you’re okay, but if you move into the negative realm you suffer from this sign extension. Hence, your value suddenly has a lot of ‘1’s and is therefore very large from an unsigned perspective.
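To see the sign extension in action, here’s a tiny demonstration (it assumes a platform where plain char is signed, which is the common case):

```cpp
#include <cstdint>
#include <iostream>

int main() {
    char c = (char) 0x80;    // -128 where char is signed
    uint32_t u = c;          // char -> int (sign extension!), then int -> uint32_t
    std::cout << u << "\n";  // prints 4294967168, i.e. 0xFFFFFF80
    return 0;
}
```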

The solution? Use unsigned fixed-size types. Specifically, read an array of uint8_t values and combine them with shifts – no sign extension can then occur. Or, even better, use ntohl. A sketch of both (error handling omitted):
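```cpp
#include <cstdint>
#include <cstdio>
#include <arpa/inet.h>  // ntohl (POSIX; on Windows it lives in winsock2.h)

// Fixed version: uint8_t bytes are unsigned, so no sign extension occurs.
uint32_t read32(FILE* file) {
    uint8_t data[4];
    fread(data, 1, 4, file);
    return ((uint32_t) data[0] << 24) | ((uint32_t) data[1] << 16)
         | ((uint32_t) data[2] << 8)  |  (uint32_t) data[3];
}

// Even better: read the raw 32 bits and let ntohl fix the byte order.
uint32_t read32ViaNtohl(FILE* file) {
    uint32_t value;
    fread(&value, sizeof(value), 1, file);
    return ntohl(value);
}
```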

The ntohl function converts a 32-bit unsigned integer from network byte order (always big-endian) to host byte order (which could be either – most desktop machines use little-endian, unfortunately).

Village Campaign for Superfast Internet

I’ve started a website for a campaign to bring superfast internet to my local village. It’s a new (and quite different) project for me…

Feel free to contact me if you’re interested (e.g. you’re looking to start your own local campaign).

Concurrent Programming Talk Slides

I recently gave a talk on ‘Concurrent Programming’, for which the slides can be found here.

The abstract for the talk was:

In this talk I’ll discuss the fundamental problems facing the development of concurrent applications and common approaches that aim to solve them. I’ll propose a solution that is based on the best of these approaches, and consider its wider implications. Finally, I’ll examine how it should be used in real applications.

Threads vs Events

As I’ve been programming OpenP2P, I’ve iterated over a few styles of programming – a process which consumed a lot of time, but taught me a lot.

Originally, I built OpenP2P on top of boost::asio to make use of its asynchronous features, which I had supposed would make my programming life easier, as well as providing greater efficiency. The style was effectively based upon the Actor model, as deployed by languages such as Erlang: each object would have a thread managing its state – although in this case the ‘threads’ were an abstraction, and the system as a whole ran on a small number of real threads on which methods were cooperatively queued and executed. In other words, it was an event-based system.

The problems became obvious as the number of boost::shared_ptrs required increased dramatically while I tried to manage all the state of the abstract threads I had created; I felt that I needed garbage collection to be able to safely free all the data. RAII on the stack, a very useful technique in C++, quickly became useless, since all the state now had to live on the heap.

Then I discovered this.

The paper argues that, while threading packages have historically provided poor performance, this isn’t inherent to the threading model and is only an artifact of those implementations. It then goes on to demonstrate that more modern threading implementations can achieve excellent performance. Such conclusions are not isolated, as demonstrated by this presentation, which makes a similar argument, focusing particularly on java.io vs java.nio.

The point is that threads have many advantages – they make the code easier to understand and allow state to be managed on the stack (utilising RAII) – and therefore that it isn’t wise to switch to an event-based architecture in pursuit of what is only a temporary performance advantage. In fact, Linux’s NPTL shows that the days of poorly-performing threading implementations are over.

So, after comparing the two approaches, I decided to change OpenP2P to a synchronous, thread-based architecture. Operations such as socket.receive() no longer take a callback (to be called possibly at some time in the future, from some thread); instead they simply take the necessary parameters and block, returning to the current thread on completion with a value indicating an error (or throwing an exception). If you want to do other things simultaneously, spawn a new thread. The only thing I ended up adding was a facility for easily cancelling any operation, for example when a signal is raised from another thread or when a timer completes.
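To illustrate the difference between the two styles, here’s a minimal sketch (the names are hypothetical, not OpenP2P’s actual API, and the I/O is faked):

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>

// A hypothetical socket, for illustration only.
struct Socket {
    // Event style: hand over a callback that fires later, from some thread;
    // any state the callback needs must outlive this call, so it ends up
    // on the heap (hence all those shared_ptrs).
    void asyncReceive(std::function<void(std::vector<uint8_t>)> callback) {
        callback({1, 2, 3});  // stand-in for real asynchronous I/O
    }

    // Thread style: block until the data arrives (or throw on error); the
    // caller’s state stays on its own stack, so RAII works as usual.
    std::vector<uint8_t> receive() {
        return {1, 2, 3};     // stand-in for real blocking I/O
    }
};

int main() {
    Socket socket;

    // Want to do something else at the same time? Spawn a thread.
    std::thread worker([&socket] {
        const std::vector<uint8_t> data = socket.receive();  // blocks here
        std::cout << "received " << data.size() << " bytes\n";
    });

    worker.join();
    return 0;
}
```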

An event-based architecture is obviously useful for some purposes, such as a GUI, although even there I can see potential advantages in using a separate thread for each event: it prevents the GUI from blocking if one of the event-handling routines decides to do something like open a file, which should generally be fast, but in this case happens to be on an external hard drive that takes 10 seconds or so to wake up. However, for that to work, creating threads has to be efficient (otherwise we might have to keep threads around for new events), it could generate a large number of threads, and it raises all sorts of synchronisation concerns, since events are likely to raise other events. Naturally, it all comes down to whether you can rely on your event handlers to cooperate.