BTC#: Verifying Transactions

April 3, 2019 ~ Bernard Darnton ~ Leave a comment

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: Spending with Combined Scripts

There are several things that a node does to verify a transaction. Programming Bitcoin covers two of them in detail: making sure that coins aren’t being created out of nothing and checking that signature scripts correctly unlock the value they’re spending.

Fee Calculation

To ensure that coins aren’t being created out of nothing we check the value of the inputs compared to the value of the outputs on a transaction. The value of the inputs must be equal to or greater than the value of the outputs. Where they’re not equal, the difference is a fee collected by the miners as an incentive for their work.

Continue reading “BTC#: Verifying Transactions” →

BTC#: Spending with Combined Scripts

March 28, 2019April 3, 2019 ~ Bernard Darnton ~ Leave a comment

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: Stack Machine Operations

Next: Verifying Transactions »

The simplest Bitcoin transactions have outputs that pay to a public key. That is, they have a locking script (or “pubkey script”) on the output that consists of a public key supplied by the recipient of the funds and the OP_CHECKSIG command.

Let’s say that Alice pays funds to Bob’s public key. To spend the resulting output, or UTXO, Bob needs to provide the magic number that will unlock his funds.

For a payment to a public key, that magic number is his digital signature, derived from the same private key that he used to generate the public key he gave to Alice.

The OP_CHECKSIG command pops two elements off the stack. It assumes the first to be a public key and it assumes the next is the signature.

public static bool OP_CHECKSIG(Stack<byte[]> stack, BigInteger z) {
    var pkBytes = stack.Pop();
    var pk = PublicKey.Parse(pkBytes);
    var sigBytes = stack.Pop();
    var sig = Signature.Parse(sigBytes);
    stack.Push(EncodeNumber(pk.Verify(sig, z) ? 1 : 0));
    return true;
}

This operation also takes an argument z for the document hash, which needs to match the document hash that was used to generate the signature.

To validate a transaction, the signature script on the input is concatenated with the pubkey script on the output and then the combined script is executed. The unlocking script on the input contains the signature that the locking script from the previous output needs to validate.

The two scripts are combined by adding an addition operator to the script class. This takes the two sets of commands and concatenates them into a new script.

public static Script operator +(Script a, Script b)
{
    return new Script(a.Commands.Concat(b.Commands).ToList());
}

You may have noticed the EncodeNumber() call in the OP_CHECKSIG method. This deals with some of details of how numbers are serialised onto the stack. The details are in the book and the code is in the repo.

To execute scripts rather than individual commands we need to add an Execute() method to the StackMachine and supply it with the script.

The StackMachine has a dictionary of functions keyed by op-code, so we can pull an op-code off the stack, pull its function reference from the dictionary, and execute it.

private static readonly Dictionary<byte, Func<Stack<byte[]>, bool>> Operations = new Dictionary<byte, Func<Stack<byte[]>, bool>>
{
    { 0, OP_0 },
    { 118, OP_DUP },
    { 169, OP_HASH160 },
    { 172, OP_CHECKSIG }
};

The Execute method takes each command of the stack in turn, determines whether it’s an element or an operation and then either pushes the element to the stack or executes the operation. (This is a simplified version to illustrate the basics. It doesn’t yet handle complexities like conditionals.)

public bool Execute(Script script, BigInteger docHash)
{
    var stack = new Stack<byte[]>();

    foreach (var command in script.Commands)
    {
        if (command.Length > 1 || Operations[command[0]] == null)
        {
            stack.Push(command);
            continue;
        }
        var operation = Operations[command[0]];
        if (operation == OP_IF || operation == OP_NOTIF)
        {
            // handle command list for conditionals
        }
        else if (operation == OP_TOALTSTACK || operation == OP_FROMALTSTACK)
        {
            // handle alternate stack
        }
        else if (operation == OP_CHECKSIG || operation == OP_CHECKSIGVERIFY ||
            operation == OP_CHECKMULTISIG || operation == OP_CHECKMULTISIGVERIFY)
        {
            // handle document hash
        }
        else
        {
            var result = operation(stack);
        }
    }
    return (stack.Count > 0 && DecodeNumber(stack.Peek()) > 0);
}

The result of the execution is a Boolean value indicating whether the script was successful. In the case of OP_CHECKSIG, if the signature was valid the method returns true and transaction validation is successful. Bob can spend his money.

Programming Bitcoin goes on to describe locking and unlocking Pay to Pubkey Hash (p2pkh) scripts, which are more secure than Pay to Public Key scripts and have a shorter locking script. Their workings are similar.

The book also touches on Pay to Script Hash scripts variations introduced with Segwit, which we’ll cover later.

« Previous: Stack Machine Operations

Next: Verifying Transactions »

BTC#: Stack Machine Operations

March 28, 2019March 28, 2019 ~ Bernard Darnton ~ Leave a comment

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: The Stack

Next: Stack Machine Operations »

The Bitcoin stack machine operates on a stack of commands. Those commands can take two forms – elements and operations. Elements are numbers, or you can think of them as byte arrays. They could be numbers like zero and one or things like signatures and public keys. Operations are the instructions to the stack machine to do something. Typical operations remove (“pop”) an item from the stack, do something with it, and push a transformed value or a result back onto the stack.

BitcoinMechanics

I’ve created a StackMachine class that will have an Execute method, that we’ll come to later, and a collection of other static methods that perform the various stack machine operations.

Continue reading “BTC#: Stack Machine Operations” →

BTC#: The Stack

March 21, 2019March 28, 2019 ~ Bernard Darnton ~ Leave a comment

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: Script Parsing

Next: Stack Machine Operations »

Before getting into the content of the chapter on Bitcoin script, let’s have a quick refresher on the stack.

ComputingBasics

Bitcoin Script is a stack-oriented language. So just remind me again exactly what a stack is and how it works…

A stack is a last-in-first-out (LIFO) filing system. It’s like a stack of papers and books on a desk that have been piled up in whatever order they’ve been used. The oldest thing is at the bottom and the most recent thing is at the top.

Continue reading “BTC#: The Stack” →

BTC#: Script Parsing

March 21, 2019March 21, 2019 ~ Bernard Darnton ~ Leave a comment

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: Transaction Objects

Next: BTC#: The Stack »

Last time, in the Script.Parse method I just stored a raw byte array of the right length. Now I want to get into what the script parsing should look like.

Test First

First, here’s a test method to show what we’re aiming at. I start with the script serialised as hex. I want to parse it into a Script object and be able to read what the script actually says. This is the expected text in the code below: four op-codes and a public key hash element.

[TestMethod]
public void Parse_outputCommandText()
{
    var scriptHex = "1976a914ab0c0b2e98b1ab6dbf67d4750b0a56244948a87988ac";
    var reader = new BinaryReader(new MemoryStream(scriptHex.GetBytesFromHex()));

    var script = Script.Parse(reader);
    var expectedText = "OP_DUP OP_HASH160 ab0c0b2e98b1ab6dbf67d4750b0a56244948a879 OP_EQUALVERIFY OP_CHECKSIG";
    var actualText = script.ToString();

    Assert.AreEqual(expectedText, actualText);
}

The Parse Loop

The first byte is the length of the script. In this example it’s 0x19, or 25 in decimal. The Parse loop then runs until the correct number of bytes have been consumed from the stream.

public static Script Parse(BinaryReader reader)
{
    var scriptLength = (int)reader.ReadVarInt();
    var initialPosition = reader.BaseStream.Position;
    if (initialPosition + scriptLength > reader.BaseStream.Length)
    {
        throw new ParsingException($"Script parsing failed. Script length ({scriptLength} bytes) too long.");
    }

    var currentPosition = initialPosition;
    var commands = new List<byte[]>();
    while (currentPosition < initialPosition + scriptLength)
    {
        var currentValue = reader.ReadByte();
        if (currentValue >= 1 && currentValue <= 75)
        {
            commands.Add(reader.ReadBytes(currentValue));
        }
        else if (currentValue == (byte)OpCode.OP_PUSHDATA1)
        {
            var dataLength = reader.ReadByte();
            commands.Add(reader.ReadBytes(dataLength));
        }
        else if (currentValue == (byte)OpCode.OP_PUSHDATA2)
        {
            var dataLength = reader.ReadInt16();
            commands.Add(reader.ReadBytes(dataLength));
        }
        else
        {
            commands.Add(new byte[] { currentValue });
        }

        currentPosition = reader.BaseStream.Position;
    }
    var bytesConsumed = currentPosition - initialPosition;
    if (bytesConsumed != scriptLength)
    {
        throw new ParsingException($"Script parsing failed. {bytesConsumed} bytes consumed. Script length was {scriptLength}.");
    }

    return new Script(commands);
}

The Parse loop reads a byte, decides whether it’s dealing with an op-code or single-byte element, or a multi-byte element. If it’s a multi-byte element it reads the correct number of bytes. Then the op-code or element is added to the command list.

Making It Readable

To turn this into something human-readable, I’ve overridden the ToString() method. To render the op-codes I’ve created an OpCode enum that matches names to numbers. We can use the Enum.GetName method to get the op-code name when we render the script to text.

public enum OpCode
{
    OP_PUSHDATA1 = 76,
    OP_PUSHDATA2 = 77,
    OP_DUP = 118,
    OP_EQUALVERIFY = 136,
    OP_HASH160 = 169,
    OP_CHECKSIG = 172
}

There are many more values than this, which appear in the full code.

The ToString method works its way through the command list and renders op-code names if they appear in the enum or byte arrays encoded as hexadecimal for the other elements.

public override string ToString()
{
    var scriptBuilder = new StringBuilder();
    foreach (var command in Commands)
    {
        string commandText;
        if (command.Length == 1)
        {
            var opCodeName = Enum.GetName(typeof(OpCode), command[0]);
            commandText = string.IsNullOrEmpty(opCodeName) ? $"{command[0]:x2}" : opCodeName;
        }
        else
        {
            commandText = command.EncodeAsHex();
        }
        scriptBuilder.Append($"{commandText} ");
    }

    return scriptBuilder.ToString().TrimEnd();
}

Serialisation

The final piece of the puzzle is the Serialise method, which writes the entire script back out into a binary stream.

public void Serialise(BinaryWriter writer)
{
    var rawBytes = new List<byte>();
    foreach (var command in Commands)
    {
        if (command.Length > 1)
        {
            if (command.Length < 75)
            {
                rawBytes.Add((byte)command.Length);
            }
            else if (command.Length < 256)
            {
                rawBytes.Add((byte)OpCode.OP_PUSHDATA1);
                rawBytes.Add((byte)command.Length);
            }
            else if (command.Length <= 520)
            {
                rawBytes.Add((byte)OpCode.OP_PUSHDATA2);
                rawBytes.AddRange(BitConverter.GetBytes((short)command.Length));
            }
        }
        rawBytes.AddRange(command);
    }

    writer.WriteVarInt((ulong)rawBytes.Count);
    writer.Write(rawBytes.ToArray());
}

For multi-byte elements there’s some conditional code based on the length of the element, making sure that the PUSHDATA op-codes are put in correctly. Otherwise, the single-byte command is added directly to the buffer.

Once the script buffer is complete, its length is written to the stream as a variable-width integer, followed by the buffered byte array.

« Previous: Transaction Objects

Next: The Stack »

BTC#: Transaction Objects

March 20, 2019March 21, 2019 ~ Bernard Darnton ~ Leave a comment

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: Base 58 Encoded Addresses

Next: Script Parsing »

Now for the real meat. The work we’ve done so far, the maths and the serialisation, has provided the scaffolding for what we really want to do, which is transfer value.

We talked earlier about how there are no “coins” in Bitcoin. What we’re spending are Unspent Transaction Outputs (UTXOs). A transaction is a collection of inputs, which point back to previous UTXOs and a collection of outputs, which become the new UTXOs.

Continue reading “BTC#: Transaction Objects” →

BTC#: Base 58 Encoded Addresses

March 18, 2019March 20, 2019 ~ Bernard Darnton ~ Leave a comment

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: DER Serialisation

Next: Transaction Objects »

BitcoinMechanics

Base 58 Encoding

Base 58 encoding is a compromise between hexadecimal, i.e. base 16 encoding, which can store 4 bits per character, and base 64 encoding, which can store 6 bits per character but can be confusing for humans to read. Base 58 tries to remove the confusion by eliminating characters that get mixed up, like O and 0 and 1 and l.

Base 58 seems slightly unnatural because we’ve worked in base 10 since we were toddlers and in powers to two – base 2, base 16, and base 64 – since we learned about computers. Despite feeling odd, the principle is the same. We cycle through a 58-character alphabet and when that loops over we move to the next column. Instead of ones, tens, and hundreds, we have ones, 58s, and 58-squareds, etc.

The Base 58 alphabet is held as a constant on the Serialisation class.

public const string BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

That class also contains the method to encode.

public static string EncodeAsBase58(byte[] buffer)
{
    var base58 = new StringBuilder();
    var leadingZeroesCount = buffer.TakeWhile(b => b == 0).Count();
    base58.Append('1', leadingZeroesCount);

    var number = buffer.ToBigInteger(ByteArrayFormat.BigEndianUnsigned);
    var base58CharBuffer = new StringBuilder(); //Least significant character first
    while (number > 0)
    {
        base58CharBuffer.Append(BASE58_ALPHABET[(int)number.Mod(58)]);
        number = number / 58;
    }

    base58.Append(base58CharBuffer.ToString().Reverse().ToArray());
    return base58.ToString();
}

The algorithm involves converting the byte array to a BigInteger and then repeatedly dividing by 58 and using the remainders to look up the character to go into each column. In contrast to the code in the book, the base 58 string here is built up in the opposite order, so needs to be reversed once we have all the characters.

Base 58 with Checksum

Part of the motivation for base 58 encoding was to allow addresses to be copied by hand, from printed text, or read out loud. To help prevent mistakes, the Base58Check format is used. This hashes the data prior to encoding and adds the first four bytes of the hash at the end. If there’s an error in copying, the hash won’t match and an address with a type in it won’t be accepted as valid.

The method for Base58Check is also on the serialisation class. The Base58CheckType enum indicates which leading bytes should be added based on what’s being encoded.

public static string EncodeAsBase58Check(byte[] addressBytes, Base58CheckType type)
{
    var prefix = GetBase58CheckPrefix(type);
    var prefixedLength = prefix.Length + addressBytes.Length;
    var addressBuffer = new byte[prefixedLength + 4];

    prefix.CopyTo(addressBuffer, 0);
    addressBytes.CopyTo(addressBuffer, prefix.Length);

    var checkBytes = Hash.DoubleSha256(addressBuffer.Segment(0, prefixedLength)).Segment(0, 4);
    checkBytes.CopyTo(addressBuffer, prefixedLength);

    return EncodeAsBase58(addressBuffer);
}

The method generates the checksum hash, concatenates all the pieces and then base-58 encodes the result.

Addresses and Hash 160

Bitcoin payments could be made to public keys but generally are not. The more common payment destination is an address derived from the public key.

Addresses are shorter than SEC format public keys because they take the SEC formatted public key and hash it using SHA-256 followed by another hash algorithm called RIPEMD-160, which results in 20 bytes rather than 33. This combination is known as Hash160.

I’ve created a Hash160 implementation on the static Hash class using C#’s implementation of RIPEMD-160.

public static byte[] Hash160(byte[] buffer)
{
    var sha256 = SHA256.Create();
    var ripemd160 = RIPEMD160.Create();
    return ripemd160.ComputeHash(sha256.ComputeHash(buffer));
}

The code to create a Bitcoin address is on the PublicKey class. It’s very short as it simply composes pieces we’ve already looked at.

public string ToBase58Address(SerialisationFormat format)
{
    var type = (format & SerialisationFormat.TestNet) > 0 ?
        Base58CheckType.TESTNET_ADDRESS : Base58CheckType.MAINNET_ADDRESS;
    return Serialisation.EncodeAsBase58Check(Hash.Hash160(ToSecFormat(format)), type);
}

There’s a header byte to indicate which network the address belongs to and then a Base58Check encoded hash of the SEC formatted public key.

Wallet Import Formatted Private Keys

Wallet Import Format (WIF) is used to serialise private keys. Generally you don’t want to do this because the keys should remain secret and they shouldn’t be left lying around. There are times, though, when you may need to transfer them from one place to another.

WIF is very similar to address format we saw above with the header bytes and the Base58Check encoding.

Prior to now, the private key was stored as a raw number inside a KeyPair object but, since were adding behaviour, I’ve broken it out into its own class.

public PrivateKey(BigInteger secret)
{
    Secret = secret;
}

public string ToWifFormat(SerialisationFormat format)
{
    var type = (format & SerialisationFormat.TestNet) > 0 ?
        Base58CheckType.PRIVATE_KEY_WIF_TESTNET : Base58CheckType.PRIVATE_KEY_WIF;
    var wifBuffer = Secret.ToByteArray(ByteArrayFormat.BigEndianUnsigned, 32);
    if ((format & SerialisationFormat.Compressed) > 0)
    {
        wifBuffer = wifBuffer.Concat(new byte[] { 1 }).ToArray();
    }
    return Serialisation.EncodeAsBase58Check(wifBuffer, type);
}

That’s the end of the scaffolding work. We have all the maths and the serialisation code in place. The next chapters get into Bitcoin transactions and scripting.

« Previous: DER Serialisation

Next: Transaction Objects »

BTC#: DER Serialisation

March 18, 2019March 18, 2019 ~ Bernard Darnton ~ Leave a comment

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: SEC Serialisation

Next: Base 58 Encoded Addresses »

DER Format for Signatures

DER format does a similar job to SEC format, encoding a pair of 256-bit numbers. There’s no compressed format because there’s no way to derive a signature’s s-component from the r-value.

DER format is a little more complicated than SEC because it allows for the encoding of negative numbers, with the sign of the number encoded by the most significant bit. This means that, to ensure all numbers are positive, a leading zero is added to all numbers that have that bit set.

The optional leading zero makes the size of the r- and s- fields vary, which means that the format needs to include length information for each component and for the total length of the serialised data.

Prefixing and postfixing zeroes to byte array was something I’ve found myself doing a few times. In the spirit of keeping the code DRY (“Don’t Repeat Yourself”), I’ve added byte array extension method to do this.

public static byte[] PrefixZeroes(this byte[] buffer, int count)
{
    if (count == 0) return buffer;

    var result = new byte[buffer.Length + count];
    buffer.CopyTo(result, count);
    return result;
}

The method to serialise signatures to DER format is on the Signature class.

public byte[] ToDerFormat()
{
    var rBytes = R.ToByteArray(ByteArrayFormat.BigEndianUnsigned);
    if (rBytes[0] > 127)
    {
        rBytes = rBytes.PrefixZeroes(1);
    }
    var sBytes = S.ToByteArray(ByteArrayFormat.BigEndianUnsigned);
    if (sBytes[0] > 127)
    {
        sBytes = sBytes.PrefixZeroes(1);
    }

    var totalLength = rBytes.Length + sBytes.Length + 6;
    var derBuffer = new byte[totalLength];
    derBuffer[0] = Serialisation.DER_START_MARKER;
    derBuffer[1] = (byte)(totalLength - 2);
    derBuffer[2] = Serialisation.DER_INTEGER_MARKER;
    derBuffer[3] = (byte)rBytes.Length;
    rBytes.CopyTo(derBuffer, 4);
    derBuffer[rBytes.Length + 4] = Serialisation.DER_INTEGER_MARKER;
    derBuffer[rBytes.Length + 5] = (byte)sBytes.Length;
    sBytes.CopyTo(derBuffer, rBytes.Length + 6);

    return derBuffer;
}

The code is a little more involved than the Python code in the book but it’s not complex. In C# we need to explicitly cast integers to bytes and I’ve used a buffer of the right size to copy the values into rather than concatenating as we go.

« Previous: SEC Serialisation

Next: Base 58 Encoded Addresses »

BTC#: SEC Serialisation

March 18, 2019March 18, 2019 ~ Bernard Darnton ~ Leave a comment

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: Endianness

Next: DER Serialisation »

So far, we’ve just created objects like public keys and signatures and manipulated them in memory. Soon, we’ll want to serialise them to the network or to disk as we construct scripts and transactions.

SEC for Public Keys

The first type of serialisation Programming Bitcoin introduces is the SEC format used for public keys. The uncompressed format is very straightforward: a marker byte to indicate that it’s uncompressed followed by the x- and y-values of the public key in big-endian hexadecimal.

The compressed format takes advantage of the fact that the elliptic curve is symmetrical. If you know the x-value, you can narrow the y-value down to two possibilities, one even and one odd. The compressed serialisation consists of a marker byte indicating the parity (the even- or odd-ness) of the y-value followed by the x-value serialised as above.

I’ve added a SerialisationFormat enum to hold the different serialisation rules we’ll come across and a static Serialisation class to hold constants for things like the values of the marker bytes.

public enum SerialisationFormat
{
    Uncompressed = 0,
    Compressed = 1,
}

public static class Serialisation
{
    public const byte SEC_MARKER_COMPRESSED_EVEN = 2;
    public const byte SEC_MARKER_COMPRESSED_ODD = 3;
    public const byte SEC_MARKER_UNCOMPRESSED = 4;
}

The serialisation code is on the PublicKey class and makes use of the Endianness helpers from the previous post.

public byte[] ToSecFormat(SerialisationFormat format)
{
    return (format & SerialisationFormat.Compressed) == SerialisationFormat.Compressed ?
        ToCompressedSecFormat() :
        ToUncompressedSecFormat();
}

private byte[] ToUncompressedSecFormat()
{
    return (new byte[] { Serialisation.SEC_MARKER_UNCOMPRESSED })
        .Concat(Point.X.Value.ToByteArray(ByteArrayFormat.BigEndianUnsigned, 32))
        .Concat(Point.Y.Value.ToByteArray(ByteArrayFormat.BigEndianUnsigned, 32))
        .ToArray();
}

private byte[] ToCompressedSecFormat()
{
    var markerByte = Point.Y.Value % 2 == 0 ?
        Serialisation.SEC_MARKER_COMPRESSED_EVEN :
        Serialisation.SEC_MARKER_COMPRESSED_ODD;

    return (new byte[] { markerByte })
        .Concat(Point.X.Value.ToByteArray(ByteArrayFormat.BigEndianUnsigned, 32))
        .ToArray();
}

SEC serialisation is very simple. Deserialising the uncompressed form is also very simple because all the necessary data is already present. Deserialising the compressed form is a little bit harder because the y-value needs to be derived. One necessary extra is the ability to find the square root of a finite field element. The book shows the derivation but the resulting code is quite simple.

public FiniteFieldElement Sqrt()
{
    return this.Pow((Order + 1) / 4);
}

The Parse method is a static method that returns a PublicKey.

public static PublicKey Parse(byte[] secBuffer)
{
    var marker = secBuffer[0];

    if(marker < Serialisation.SEC_MARKER_MIN || marker > Serialisation.SEC_MARKER_MAX)
    {
        throw new ArgumentException($"Invalid marker byte. {marker:x2} supplied.");
    }

    var curve = Secp256k1.Curve;
    FiniteFieldElement x, y;

    if (marker == Serialisation.SEC_MARKER_UNCOMPRESSED)
    {
        if (secBuffer.Length != Serialisation.SEC_LENGTH_UNCOMPRESSED)
        {
            throw new ArgumentException($"Uncompressed SEC buffer (prefix {marker:x2}) must be {Serialisation.SEC_LENGTH_UNCOMPRESSED} bytes long. {secBuffer.Length} bytes supplied.");
        }
        x = new FiniteFieldElement(
            secBuffer.Segment(1, 32).ToBigInteger(ByteArrayFormat.BigEndianUnsigned),
            Secp256k1.P
        );
        y = new FiniteFieldElement(
            secBuffer.Segment(33, 32).ToBigInteger(ByteArrayFormat.BigEndianUnsigned),
            Secp256k1.P
        );
        return new PublicKey(new EllipticCurveFiniteFieldPoint(x, y, curve));
    }

    if (secBuffer.Length != Serialisation.SEC_LENGTH_COMPRESSED)
    {
        throw new ArgumentException($"Compressed SEC buffer (prefix {marker:x2}) must be {Serialisation.SEC_LENGTH_COMPRESSED} bytes long. {secBuffer.Length} bytes supplied.");
    }
    x = new FiniteFieldElement(
        secBuffer.Segment(1, 32).ToBigInteger(ByteArrayFormat.BigEndianUnsigned),
        Secp256k1.P
    );
    var alpha = x.Pow(3) + curve.B;
    var beta = alpha.Sqrt();
    y = (marker == 2 ^ beta.Value % 2 == 0) ?
        new FiniteFieldElement(Secp256k1.P - beta.Value, Secp256k1.P) :
        beta;
    return new PublicKey(new EllipticCurveFiniteFieldPoint(x, y, curve));
}

The code checks the marker byte and the length of the buffer supplied – 65 bytes when uncompressed, 33 when compressed. If that’s OK, it segments the byte array into the appropriate pieces and deserialises them into BigInteger objects. For compressed data it then works out what the y-value is and returns the new public key.

« Previous: Endianness

Next: DER Serialisation »

BTC#: Endianness

March 15, 2019March 18, 2019 ~ Bernard Darnton ~ Leave a comment

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: Digital Signature Code

Next: SEC Serialisation »

This post is some background on how data is serialised byte-by-byte. If you already know your big end from your little end, you can skip it.

One of things we need to contend with when we write down (or “serialise”) the big long numbers that Bitcoin uses, is the problem of “endianness.”

Different microprocessors read and write their data in different ways, much like people do. Some human languages are written left-to-right, like English and Russian, others are written right-to-left, like Hebrew and Arabic. Likewise, some cultures write their dates day-month-year, others write year-month-day.

Britain and Europe tend to write their dates starting with the smallest unit, the day, and work their way up the largest unit, the year. In the language of computer architecture, we would call this a “little-endian” date, because it starts at the little end. China and Korea use “big-endian” dates. (In the U.S. and the Philippines, people start at the middle end. That’s cool. You do you.)

Religious Eggs-tremism

The names “big-endian” and “little-endian” come from Gulliver’s Travels and refer to the subject of a religious war between two groups of Lilliputians, who can’t agree on which end boiled eggs should be eaten from. The sacred text says that “all true believers shall break their eggs at the convenient end.” Opinions differ violently on which end is most convenient. The story is a satire on people who don’t believe in “you do you.”

Motorola vs Intel

When the first microprocessors appeared on the market, opinions differed on the best way to store multi-byte numbers. The big two microprocessors released in 1974 were the Motorola 6800 and the Intel 8080, ancestor of the x86 processor family that powered the PC revolution. Both were 8-bit processors, meaning that they handled data in chunks of 8 binary digits.

Storing numbers bigger than 255, the biggest value that can be stored in 8 bits, required multiple bytes. When multi-byte numbers were written to and read from memory, the 6800 stored the most significant bit, the “big end,” first. This is similar to how we write a multi-digit numbers: thousands first, then hundreds, then tens, then units.

Intel took the opposite approach, storing the least significant bit first.The Motorola way seems more natural to us but Intel engineers believed that their way had advantages. Operations like multiplication often process numbers from the smaller end first.

In 1974, Motorola and Intel systems didn’t need to talk to each other so it didn’t matter. Today, now that everything talks to everything, it’s crucial to know which format data is stored and transmitted in.

Big Integers in C#

The first place we care about this is when we need to create C# BigIntegers. If we have an array of bytes that we want to turn in a BigInteger object, we need to know whether that byte array is big-endian or little-endian. We also need to know that the BigInteger constructor that takes a byte array expects the bytes in little-endian format.

To make this simpler, I’ve created an extension method on byte[] to create a BigInteger based on a given format.

public static BigInteger ToBigInteger(this byte[] buffer, ByteArrayFormat format = ByteArrayFormat.LittleEndianSigned)
{
    if ((format & ByteArrayFormat.BigEndian) == ByteArrayFormat.BigEndian)
    {
        buffer = buffer.Reverse().ToArray();
    }
    if ((format & ByteArrayFormat.Unsigned) == ByteArrayFormat.Unsigned)
    {
        buffer = buffer.PostfixZeroes(1);
    }
    return new BigInteger(buffer);
}

This reverses the array if we’re told that it’s in big-endian form and then takes care of the sign of the number if we want to ensure that it’s positive.

Endianness in Bitcoin

In Bitcoin this problem is particularly pressing because different pieces of data are stored in different formats. For example, DER signatures are stored in big-endian form, whereas transaction hashes are stored in little-endian.

There’s no easy way to know which is which. We just need to read the documentation. Helper methods like the one above will get used a lot.

« Previous: Digital Signature Code

Next: SEC Serialisation »