Use Unsigned Fixed Size Types

In the header stdint.h (cstdint in C++) you can find a bunch of fixed size types, signed and unsigned, from 8 bits up to 64 bits. Their names are int8_t, uint8_t, int16_t, uint16_t and so on, where the number indicates the width in bits. These types exist to solve the problem that the normal primitive types (char, short, int, long) don’t have a fixed size and can therefore differ in size between platforms.
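For example, this little check (just an illustration; the exact width of long will vary from platform to platform) shows the difference:

#include <stdint.h>
#include <limits.h>
#include <stdio.h>

int main(void){
    // the fixed size types always have exactly the advertised width
    printf("uint32_t is %zu bits\n", sizeof(uint32_t) * CHAR_BIT);
    // the plain types only have guaranteed minimum widths
    printf("long is %zu bits on this platform\n", sizeof(long) * CHAR_BIT);
    return 0;
}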

So whenever reading from a file or from a socket (or in numerous other cases), avoid the standard types and use the fixed size types instead, for portability’s sake. Also, take care with signed versus unsigned integer types; when you are handling raw bytes, unsigned is generally what you’re looking for.

This whole issue was brought up when a friend of mine was reading an unsigned 32 bit value from a file (stored in big-endian byte order) – it’s actually harder than it seems. Here’s something resembling the key aspect of the code:

#include <stdint.h>
#include <stdio.h>

// assemble a 32 bit value from four bytes stored in big-endian order
uint32_t convert32(char * bytes){
    return (uint32_t(bytes[0]) << 24) |
        (uint32_t(bytes[1]) << 16) |
        (uint32_t(bytes[2]) << 8) |
        uint32_t(bytes[3]);
}

uint32_t readUint32(FILE * fp){
    char bytes[4];
    fread(bytes, sizeof(char), 4, fp);
    return convert32(bytes);
}

Unfortunately, this code is incorrect. Even worse, it will work in some cases and not in others. If the stored values were between 0 and 127, it would work. However, a stored value of 128 produces a result of 4294967168. This value is particularly large; in fact, it comes close to the maximum value of a 32 bit unsigned integer (4294967295). There are some other ranges of values at which it works (256 works fine, for example), and others beyond them at which it does not.
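You can see the failure with a small test harness (my own sketch, not part of the original code) that feeds the big-endian encodings of 127, 128 and 256 through convert32:

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

uint32_t convert32(char * bytes);  // the version shown above

int main(void){
    char v127[4] = {0, 0, 0, 127};         // decodes correctly to 127
    char v128[4] = {0, 0, 0, (char)0x80};  // decodes to 4294967168 wherever char is signed
    char v256[4] = {0, 0, 1, 0};           // decodes correctly to 256
    printf("%" PRIu32 " %" PRIu32 " %" PRIu32 "\n",
        convert32(v127), convert32(v128), convert32(v256));
    return 0;
}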

The problem revolves around the use of char – not only is it not guaranteed to be 8 bits in size, but on most platforms it defaults to signed (to add insult to injury, the language standard allows char to be signed on some platforms and unsigned on others). As a result, each of the four char values in the array is being interpreted as a signed value and so the program goes completely mad when asked to convert from these char values to uint32_t values. As far as I understand it, and have seen myself, the conversion from a signed char to an unsigned int goes as follows:

  1. Convert the signed char to a signed int (i.e. extend the type to the correct size)
  2. Reinterpret the signed int as an unsigned int

Reinterpreting isn’t a problem in itself; it leaves the bit pattern untouched. In fact, for non-negative values the first stage doesn’t modify the value either. However, converting a negative signed char to a signed int causes sign extension (loads of ’1’s are added at the start). If your byte happens to represent a value from 0 to 127, you’re okay, but if you move into the range a signed char treats as negative (128 to 255) you suffer from this sign extension. Hence, your value suddenly has a lot of ’1’s and therefore is very large from an unsigned perspective.
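The effect is easy to reproduce (a small illustrative sketch; the exact behaviour of the first line is implementation defined, but on a typical two’s complement machine it stores -128):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void){
    signed char c = (signed char)0x80;    // the byte 0x80, i.e. 128, lands in a signed char as -128
    int as_int = c;                       // step 1: sign extension, as_int is -128 (bit pattern 0xFFFFFF80)
    uint32_t as_u32 = (uint32_t)as_int;   // step 2: reinterpretation, as_u32 is 4294967168
    printf("%" PRIu32 "\n", as_u32);

    uint8_t u = 0x80;                     // an unsigned byte never sign extends
    printf("%" PRIu32 "\n", (uint32_t)u); // prints 128
    return 0;
}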

The solution? Use unsigned fixed size types. Specifically, read into an array of uint8_t values, so that no sign extension can ever take place.
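That fix might look something like this (a sketch rather than the original code; the only real change is swapping char for uint8_t):

#include <stdint.h>
#include <stdio.h>

uint32_t convert32(const uint8_t * bytes){
    // uint8_t never sign extends, so every byte ends up exactly where it should
    return (uint32_t(bytes[0]) << 24) |
        (uint32_t(bytes[1]) << 16) |
        (uint32_t(bytes[2]) << 8) |
        uint32_t(bytes[3]);
}

uint32_t readUint32(FILE * fp){
    uint8_t bytes[4] = {0, 0, 0, 0};
    fread(bytes, sizeof(uint8_t), 4, fp);
    return convert32(bytes);
}

Or, even better: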

uint32_t v = 0;
fread(&v, 1, 4, fp);  // read the four bytes straight into v
v = ntohl(v);         // ntohl comes from <arpa/inet.h> on POSIX systems (<winsock2.h> on Windows)

The ntohl function converts a 32 bit integer from network order (always big-endian) to host order (which could be either; most machines these days are little-endian, unfortunately).
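Putting the pieces together, a slightly more defensive readUint32 built on ntohl (my own sketch, adding the error handling the fragments above leave out) could be:

#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>  // ntohl; use <winsock2.h> on Windows

// Reads a big-endian uint32_t from fp; returns false if four bytes could not be read.
bool readUint32(FILE * fp, uint32_t * out){
    uint32_t v = 0;
    if (fread(&v, 1, 4, fp) != 4)
        return false;
    *out = ntohl(v);
    return true;
}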