BTC#: DER Serialisation

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: SEC Serialisation

Next: Base 58 Encoded Addresses »

DER Format for Signatures

DER format does a similar job to SEC format, encoding a pair of 256-bit numbers. There’s no compressed format because there’s no way to derive a signature’s s-component from the r-value.

DER format is a little more complicated than SEC because it allows for the encoding of negative numbers, with the sign of the number encoded by the most significant bit. This means that, to ensure all numbers are positive, a leading zero is added to all numbers that have that bit set.

The optional leading zero makes the size of the r- and s- fields vary, which means that the format needs to include length information for each component and for the total length of the serialised data.

DerSerialisation.png

Prefixing and postfixing zeroes to byte array was something I’ve found myself doing a few times. In the spirit of keeping the code DRY (“Don’t Repeat Yourself”), I’ve added byte array extension method to do this.

public static byte[] PrefixZeroes(this byte[] buffer, int count)
{
    if (count == 0) return buffer;

    var result = new byte[buffer.Length + count];
    buffer.CopyTo(result, count);
    return result;
}

The method to serialise signatures to DER format is on the Signature class.

public byte[] ToDerFormat()
{
    var rBytes = R.ToByteArray(ByteArrayFormat.BigEndianUnsigned);
    if (rBytes[0] > 127)
    {
        rBytes = rBytes.PrefixZeroes(1);
    }
    var sBytes = S.ToByteArray(ByteArrayFormat.BigEndianUnsigned);
    if (sBytes[0] > 127)
    {
        sBytes = sBytes.PrefixZeroes(1);
    }

    var totalLength = rBytes.Length + sBytes.Length + 6;
    var derBuffer = new byte[totalLength];
    derBuffer[0] = Serialisation.DER_START_MARKER;
    derBuffer[1] = (byte)(totalLength - 2);
    derBuffer[2] = Serialisation.DER_INTEGER_MARKER;
    derBuffer[3] = (byte)rBytes.Length;
    rBytes.CopyTo(derBuffer, 4);
    derBuffer[rBytes.Length + 4] = Serialisation.DER_INTEGER_MARKER;
    derBuffer[rBytes.Length + 5] = (byte)sBytes.Length;
    sBytes.CopyTo(derBuffer, rBytes.Length + 6);

    return derBuffer;
}

The code is a little more involved than the Python code in the book but it’s not complex. In C# we need to explicitly cast integers to bytes and I’ve used a buffer of the right size to copy the values into rather than concatenating as we go.

« Previous: SEC Serialisation

Next: Base 58 Encoded Addresses »

 

BTC#: SEC Serialisation

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: Endianness

Next: DER Serialisation »

So far, we’ve just created objects like public keys and signatures and manipulated them in memory. Soon, we’ll want to serialise them to the network or to disk as we construct scripts and transactions.

BitcoinMechanics.png

SEC for Public Keys

The first type of serialisation Programming Bitcoin introduces is the SEC format used for public keys. The uncompressed format is very straightforward: a marker byte to indicate that it’s uncompressed followed by the x- and y-values of the public key in big-endian hexadecimal.

SecSerialisation.png

The compressed format takes advantage of the fact that the elliptic curve is symmetrical. If you know the x-value, you can narrow the y-value down to two possibilities, one even and one odd. The compressed serialisation consists of a marker byte indicating the parity (the even- or odd-ness) of the y-value followed by the x-value serialised as above.

I’ve added a SerialisationFormat enum to hold the different serialisation rules we’ll come across and a static Serialisation class to hold constants for things like the values of the marker bytes.

public enum SerialisationFormat
{
    Uncompressed = 0,
    Compressed = 1,
}
public static class Serialisation
{
    public const byte SEC_MARKER_COMPRESSED_EVEN = 2;
    public const byte SEC_MARKER_COMPRESSED_ODD = 3;
    public const byte SEC_MARKER_UNCOMPRESSED = 4;
}

The serialisation code is on the PublicKey class and makes use of the Endianness helpers from the previous post.

public byte[] ToSecFormat(SerialisationFormat format)
{
    return (format & SerialisationFormat.Compressed) == SerialisationFormat.Compressed ?
        ToCompressedSecFormat() :
        ToUncompressedSecFormat();
}

private byte[] ToUncompressedSecFormat()
{
    return (new byte[] { Serialisation.SEC_MARKER_UNCOMPRESSED })
        .Concat(Point.X.Value.ToByteArray(ByteArrayFormat.BigEndianUnsigned, 32))
        .Concat(Point.Y.Value.ToByteArray(ByteArrayFormat.BigEndianUnsigned, 32))
        .ToArray();
}

private byte[] ToCompressedSecFormat()
{
    var markerByte = Point.Y.Value % 2 == 0 ?
        Serialisation.SEC_MARKER_COMPRESSED_EVEN :
        Serialisation.SEC_MARKER_COMPRESSED_ODD;

    return (new byte[] { markerByte })
        .Concat(Point.X.Value.ToByteArray(ByteArrayFormat.BigEndianUnsigned, 32))
        .ToArray();
}

SEC serialisation is very simple. Deserialising the uncompressed form is also very simple because all the necessary data is already present. Deserialising the compressed form is a little bit harder because the y-value needs to be derived. One necessary extra is the ability to find the square root of a finite field element. The book shows the derivation but the resulting code is quite simple.

public FiniteFieldElement Sqrt()
{
    return this.Pow((Order + 1) / 4);
}

The Parse method is a static method that returns a PublicKey.

public static PublicKey Parse(byte[] secBuffer)
{
    var marker = secBuffer[0];

    if(marker < Serialisation.SEC_MARKER_MIN || marker > Serialisation.SEC_MARKER_MAX)
    {
        throw new ArgumentException($"Invalid marker byte. {marker:x2} supplied.");
    }

    var curve = Secp256k1.Curve;
    FiniteFieldElement x, y;

    if (marker == Serialisation.SEC_MARKER_UNCOMPRESSED)
    {
        if (secBuffer.Length != Serialisation.SEC_LENGTH_UNCOMPRESSED)
        {
            throw new ArgumentException($"Uncompressed SEC buffer (prefix {marker:x2}) must be {Serialisation.SEC_LENGTH_UNCOMPRESSED} bytes long. {secBuffer.Length} bytes supplied.");
        }
        x = new FiniteFieldElement(
            secBuffer.Segment(1, 32).ToBigInteger(ByteArrayFormat.BigEndianUnsigned),
            Secp256k1.P
        );
        y = new FiniteFieldElement(
            secBuffer.Segment(33, 32).ToBigInteger(ByteArrayFormat.BigEndianUnsigned),
            Secp256k1.P
        );
        return new PublicKey(new EllipticCurveFiniteFieldPoint(x, y, curve));
    }

    if (secBuffer.Length != Serialisation.SEC_LENGTH_COMPRESSED)
    {
        throw new ArgumentException($"Compressed SEC buffer (prefix {marker:x2}) must be {Serialisation.SEC_LENGTH_COMPRESSED} bytes long. {secBuffer.Length} bytes supplied.");
    }
    x = new FiniteFieldElement(
        secBuffer.Segment(1, 32).ToBigInteger(ByteArrayFormat.BigEndianUnsigned),
        Secp256k1.P
    );
    var alpha = x.Pow(3) + curve.B;
    var beta = alpha.Sqrt();
    y = (marker == 2 ^ beta.Value % 2 == 0) ?
        new FiniteFieldElement(Secp256k1.P - beta.Value, Secp256k1.P) :
        beta;
    return new PublicKey(new EllipticCurveFiniteFieldPoint(x, y, curve));
}

The code checks the marker byte and the length of the buffer supplied – 65 bytes when uncompressed, 33 when compressed. If that’s OK, it segments the byte array into the appropriate pieces and deserialises them into BigInteger objects. For compressed data it then works out what the y-value is and returns the new public key.

« Previous: Endianness

Next: DER Serialisation »

 

BTC#: Endianness

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: Digital Signature Code

Next: SEC Serialisation »

This post is some background on how data is serialised byte-by-byte. If you already know your big end from your little end, you can skip it.

ComputingBasics.png

One of things we need to contend with when we write down (or “serialise”) the big long numbers that Bitcoin uses, is the problem of “endianness.”

Different microprocessors read and write their data in different ways, much like people do. Some human languages are written left-to-right, like English and Russian, others are written right-to-left, like Hebrew and Arabic. Likewise, some cultures write their dates day-month-year, others write year-month-day.

LanguagesAndDates.png

Britain and Europe tend to write their dates starting with the smallest unit, the day, and work their way up the largest unit, the year. In the language of computer architecture, we would call this a “little-endian” date, because it starts at the little end. China and Korea use “big-endian” dates. (In the U.S. and the Philippines, people start at the middle end. That’s cool. You do you.)

Religious Eggs-tremism

The names “big-endian” and “little-endian” come from Gulliver’s Travels and refer to the subject of a religious war between two groups of Lilliputians, who can’t agree on which end boiled eggs should be eaten from. The sacred text says that “all true believers shall break their eggs at the convenient end.” Opinions differ violently on which end is most convenient. The story is a satire on people who don’t believe in “you do you.”

EggConvenientEnd.png

Motorola vs Intel

When the first microprocessors appeared on the market, opinions differed on the best way to store multi-byte numbers. The big two microprocessors released in 1974 were the Motorola 6800 and the Intel 8080, ancestor of the x86 processor family that powered the PC revolution. Both were 8-bit processors, meaning that they handled data in chunks of 8 binary digits.

Storing numbers bigger than 255, the biggest value that can be stored in 8 bits, required multiple bytes. When multi-byte numbers were written to and read from memory, the 6800 stored the most significant bit, the “big end,” first. This is similar to how we write a multi-digit numbers: thousands first, then hundreds, then tens, then units.

Intel took the opposite approach, storing the least significant bit first.The Motorola way seems more natural to us but Intel engineers believed that their way had advantages. Operations like multiplication often process numbers from the smaller end first.

In 1974, Motorola and Intel systems didn’t need to talk to each other so it didn’t matter. Today, now that everything talks to everything, it’s crucial to know which format data is stored and transmitted in.

MotorolaIntelEndian.png

Big Integers in C#

The first place we care about this is when we need to create C# BigIntegers. If we have an array of bytes that we want to turn in a BigInteger object, we need to know whether that byte array is big-endian or little-endian. We also need to know that the BigInteger constructor that takes a byte array expects the bytes in little-endian format.

To make this simpler, I’ve created an extension method on byte[] to create a BigInteger based on a given format.

public static BigInteger ToBigInteger(this byte[] buffer, ByteArrayFormat format = ByteArrayFormat.LittleEndianSigned)
{
    if ((format & ByteArrayFormat.BigEndian) == ByteArrayFormat.BigEndian)
    {
        buffer = buffer.Reverse().ToArray();
    }
    if ((format & ByteArrayFormat.Unsigned) == ByteArrayFormat.Unsigned)
    {
        buffer = buffer.PostfixZeroes(1);
    }
    return new BigInteger(buffer);
}

This reverses the array if we’re told that it’s in big-endian form and then takes care of the sign of the number if we want to ensure that it’s positive.

Endianness in Bitcoin

In Bitcoin this problem is particularly pressing because different pieces of data are stored in different formats. For example, DER signatures are stored in big-endian form, whereas transaction hashes are stored in little-endian.

There’s no easy way to know which is which. We just need to read the documentation. Helper methods like the one above will get used a lot.

« Previous: Digital Signature Code

Next: SEC Serialisation »

BTC#: Digital Signatures

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: The Bitcoin Curve

Next: Digital Signature Code »

CryptographyBasicsTitle

Your digital identity is a private key, a secret number than only you know. It’s a bit like a PIN on your credit card but much, much longer. One of the major security flaws with credit cards is that you need to reveal your secret to authorise your spending.

CreditCardAliceBobEve.png

Digital signatures prove that you know a secret without revealing the secret. Only a person in possession of the secret can produce a particular digital signature and the signature can’t be produced without knowing the secret. In other words, a signature authenticates a message and its origin can’t be denied.

A digital signature also incorporates details of the message that was signed, so the message can’t be tampered with without invalidating the signature.

How Signatures Work

A signature is built up from the signatory’s private key, a hash of the document being signed, and an ephemeral key that is used for a single message. Of these, only the document hash is publicly known. The other two are protected by an irreversible elliptic curve multiplication. The signature is made up of two components, r and s.

SignatureDataFlow.png

To verify that a signature is valid, the receiver needs the signature, the document hash, and the signatory’s public key. The signature’s r-component is derived from the ephemeral key and the s-component is composed in such a way that we can confirm that it was created using the document hash and the two secrets, without exposing either of the secrets.

The verification algorithm is an equation in which all the parts cancel out if the correct document hash and public key are provided. If the signature was created using a different private key than the one used to create the public key, the equation doesn’t balance, meaning the signature is not authentic. If the signature was created using a different document hash, the equation doesn’t balance, meaning that the document has been tampered with.

SignatureEquation.png

Programming Bitcoin provides a baffling analogical explanation of digital signature verification involving William Tell, inscriptions on arrow heads, and x-ray machines, which can be skipped, but do go through the equations and their derivations. If you want to go direct to the source, it’s NIST’s Digital Signature Standard.

« Previous: The Bitcoin Curve

Next: Digital Signature Code »

BTC#: The Bitcoin Curve

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: BTC#: Elliptic Curves over Finite Fields

Next: BTC#: Digital Signatures »

The trick to breaking elliptic curve cryptography is reversing scalar multiplication of points on the curve. When we’ve got curves defined over small finite fields it’s easy: just work out the complete set of multiples (the members of the finite cyclic group) and look up the answer.

That’s why the prime used to define the finite field, and therefore the number of elements in the group, is mindbogglingly big.

The parameters of the secp256k1 curve used by Bitcoin are on the Bitoin Wiki.

EllipticCurveReal

I’ve copied those into a static Secp256k1 class. (There is a post on big integers in C# to jog your memory.)

public static class Secp256k1
    {
        // Highest order bit parsed as sign, so lead with 00.
        public static BigInteger P { get; } = BigInteger.Parse("00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F", NumberStyles.AllowHexSpecifier);

        public static BigInteger A = BigInteger.Zero;

        public static BigInteger B = new BigInteger(7);

        public static BigInteger Gx { get; } = BigInteger.Parse("0079BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798", NumberStyles.AllowHexSpecifier);

        public static BigInteger Gy { get; } = BigInteger.Parse("00483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8", NumberStyles.AllowHexSpecifier);

        public static BigInteger N { get; } = BigInteger.Parse("00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141", NumberStyles.AllowHexSpecifier);
    }

I’ve also added a couple of extra constants to that class. These use the coefficients A and B and the prime P to create a curve object and the generator point coordinates Gx and Gy to create a generator point object.

public static EllipticCurveOverFiniteField Curve { get; } = new EllipticCurveOverFiniteField(A, B, P);

public static EllipticCurveFiniteFieldPoint G = Curve.GetPoint(Gx, Gy);

A couple of unit tests make sure that everything is defined properly.

[TestMethod]
public void Curve_GeneratorPointIsOnCurve()
{
    Assert.IsTrue(Secp256k1.Curve.PointIsOnCurve(Secp256k1.Gx, Secp256k1.Gy));
}

[TestMethod]
public void GeneratorPointHasOrderN()
{
    Assert.IsTrue((Secp256k1.N * Secp256k1.G).AtInfinity);
}

Note that the multiplication N x G, which checks the order of the generator point, would take vastly longer than the age of the universe to calculate by repeated addition but takes milliseconds when done by binary expansion.

This approach is slightly different to the one taken in the book, so compare them and use whichever you prefer.

« Previous: BTC#: Elliptic Curves over Finite Fields

Next: BTC#: Digital Signatures »

BTC#: Elliptic Curves over Finite Fields

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: Elliptic Curve Point Addition

Next: The Bitcoin Curve »

Coding for elliptic curves over finite fields is the target of our work so far. In Programming Bitcoin, Song just passes FieldElement objects into Point’s constructor and lets the Python interpreter’s type inference do the rest. The mathematical code in the Point class just calls the operator functions defined on FieldElement instead of the integer versions.

In the strictly typed world of C#, we don’t get that luxury. There’s no interface or other generic type constraint common to our FiniteFieldElement and Int32, so I’ve written a new class, EllipticCurveFiniteFieldPoint. (Get the repo.)

Continue reading “BTC#: Elliptic Curves over Finite Fields”