BTC#: Endianness

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: Digital Signature Code

Next: SEC Serialisation »

This post is some background on how data is serialised byte-by-byte. If you already know your big end from your little end, you can skip it.

ComputingBasics.png

One of things we need to contend with when we write down (or “serialise”) the big long numbers that Bitcoin uses, is the problem of “endianness.”

Different microprocessors read and write their data in different ways, much like people do. Some human languages are written left-to-right, like English and Russian, others are written right-to-left, like Hebrew and Arabic. Likewise, some cultures write their dates day-month-year, others write year-month-day.

LanguagesAndDates.png

Britain and Europe tend to write their dates starting with the smallest unit, the day, and work their way up the largest unit, the year. In the language of computer architecture, we would call this a “little-endian” date, because it starts at the little end. China and Korea use “big-endian” dates. (In the U.S. and the Philippines, people start at the middle end. That’s cool. You do you.)

Religious Eggs-tremism

The names “big-endian” and “little-endian” come from Gulliver’s Travels and refer to the subject of a religious war between two groups of Lilliputians, who can’t agree on which end boiled eggs should be eaten from. The sacred text says that “all true believers shall break their eggs at the convenient end.” Opinions differ violently on which end is most convenient. The story is a satire on people who don’t believe in “you do you.”

EggConvenientEnd.png

Motorola vs Intel

When the first microprocessors appeared on the market, opinions differed on the best way to store multi-byte numbers. The big two microprocessors released in 1974 were the Motorola 6800 and the Intel 8080, ancestor of the x86 processor family that powered the PC revolution. Both were 8-bit processors, meaning that they handled data in chunks of 8 binary digits.

Storing numbers bigger than 255, the biggest value that can be stored in 8 bits, required multiple bytes. When multi-byte numbers were written to and read from memory, the 6800 stored the most significant bit, the “big end,” first. This is similar to how we write a multi-digit numbers: thousands first, then hundreds, then tens, then units.

Intel took the opposite approach, storing the least significant bit first.The Motorola way seems more natural to us but Intel engineers believed that their way had advantages. Operations like multiplication often process numbers from the smaller end first.

In 1974, Motorola and Intel systems didn’t need to talk to each other so it didn’t matter. Today, now that everything talks to everything, it’s crucial to know which format data is stored and transmitted in.

MotorolaIntelEndian.png

Big Integers in C#

The first place we care about this is when we need to create C# BigIntegers. If we have an array of bytes that we want to turn in a BigInteger object, we need to know whether that byte array is big-endian or little-endian. We also need to know that the BigInteger constructor that takes a byte array expects the bytes in little-endian format.

To make this simpler, I’ve created an extension method on byte[] to create a BigInteger based on a given format.

public static BigInteger ToBigInteger(this byte[] buffer, ByteArrayFormat format = ByteArrayFormat.LittleEndianSigned)
{
    if ((format & ByteArrayFormat.BigEndian) == ByteArrayFormat.BigEndian)
    {
        buffer = buffer.Reverse().ToArray();
    }
    if ((format & ByteArrayFormat.Unsigned) == ByteArrayFormat.Unsigned)
    {
        buffer = buffer.PostfixZeroes(1);
    }
    return new BigInteger(buffer);
}

This reverses the array if we’re told that it’s in big-endian form and then takes care of the sign of the number if we want to ensure that it’s positive.

Endianness in Bitcoin

In Bitcoin this problem is particularly pressing because different pieces of data are stored in different formats. For example, DER signatures are stored in big-endian form, whereas transaction hashes are stored in little-endian.

There’s no easy way to know which is which. We just need to read the documentation. Helper methods like the one above will get used a lot.

« Previous: Digital Signature Code

Next: SEC Serialisation »

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s