BTC#: Script Parsing

Series: BTC# – Learning to Program Bitcoin in C#

« Previous: Transaction Objects

Next: BTC#: The Stack »

Last time, in the Script.Parse method I just stored a raw byte array of the right length. Now I want to get into what the script parsing should look like.

ScriptPubKeyParsing.png

Test First

First, here’s a test method to show what we’re aiming at. I start with the script serialised as hex. I want to parse it into a Script object and be able to read what the script actually says. This is the expected text in the code below: four op-codes and a public key hash element.

[TestMethod]
public void Parse_outputCommandText()
{
    var scriptHex = "1976a914ab0c0b2e98b1ab6dbf67d4750b0a56244948a87988ac";
    var reader = new BinaryReader(new MemoryStream(scriptHex.GetBytesFromHex()));

    var script = Script.Parse(reader);
    var expectedText = "OP_DUP OP_HASH160 ab0c0b2e98b1ab6dbf67d4750b0a56244948a879 OP_EQUALVERIFY OP_CHECKSIG";
    var actualText = script.ToString();

    Assert.AreEqual(expectedText, actualText);
}

The Parse Loop

The first byte is the length of the script. In this example it’s 0x19, or 25 in decimal. The Parse loop then runs until the correct number of bytes have been consumed from the stream.

public static Script Parse(BinaryReader reader)
{
    var scriptLength = (int)reader.ReadVarInt();
    var initialPosition = reader.BaseStream.Position;
    if (initialPosition + scriptLength > reader.BaseStream.Length)
    {
        throw new ParsingException($"Script parsing failed. Script length ({scriptLength} bytes) too long.");
    }

    var currentPosition = initialPosition;
    var commands = new List<byte[]>();
    while (currentPosition < initialPosition + scriptLength)
    {
        var currentValue = reader.ReadByte();
        if (currentValue >= 1 && currentValue <= 75)
        {
            commands.Add(reader.ReadBytes(currentValue));
        }
        else if (currentValue == (byte)OpCode.OP_PUSHDATA1)
        {
            var dataLength = reader.ReadByte();
            commands.Add(reader.ReadBytes(dataLength));
        }
        else if (currentValue == (byte)OpCode.OP_PUSHDATA2)
        {
            var dataLength = reader.ReadInt16();
            commands.Add(reader.ReadBytes(dataLength));
        }
        else
        {
            commands.Add(new byte[] { currentValue });
        }

        currentPosition = reader.BaseStream.Position;
    }
    var bytesConsumed = currentPosition - initialPosition;
    if (bytesConsumed != scriptLength)
    {
        throw new ParsingException($"Script parsing failed. {bytesConsumed} bytes consumed. Script length was {scriptLength}.");
    }

    return new Script(commands);
}

The Parse loop reads a byte, decides whether it’s dealing with an op-code or single-byte element, or a multi-byte element. If it’s a multi-byte element it reads the correct number of bytes. Then the op-code or element is added to the command list.

Making It Readable

To turn this into something human-readable, I’ve overridden the ToString() method. To render the op-codes I’ve created an OpCode enum that matches names to numbers. We can use the Enum.GetName method to get the op-code name when we render the script to text.

public enum OpCode
{
    OP_PUSHDATA1 = 76,
    OP_PUSHDATA2 = 77,
    OP_DUP = 118,
    OP_EQUALVERIFY = 136,
    OP_HASH160 = 169,
    OP_CHECKSIG = 172
}

There are many more values than this, which appear in the full code.

The ToString method works its way through the command list and renders op-code names if they appear in the enum or byte arrays encoded as hexadecimal for the other elements.

public override string ToString()
{
    var scriptBuilder = new StringBuilder();
    foreach (var command in Commands)
    {
        string commandText;
        if (command.Length == 1)
        {
            var opCodeName = Enum.GetName(typeof(OpCode), command[0]);
            commandText = string.IsNullOrEmpty(opCodeName) ? $"{command[0]:x2}" : opCodeName;
        }
        else
        {
            commandText = command.EncodeAsHex();
        }
        scriptBuilder.Append($"{commandText} ");
    }

    return scriptBuilder.ToString().TrimEnd();
}

Serialisation

The final piece of the puzzle is the Serialise method, which writes the entire script back out into a binary stream.

public void Serialise(BinaryWriter writer)
{
    var rawBytes = new List<byte>();
    foreach (var command in Commands)
    {
        if (command.Length > 1)
        {
            if (command.Length < 75)
            {
                rawBytes.Add((byte)command.Length);
            }
            else if (command.Length < 256)
            {
                rawBytes.Add((byte)OpCode.OP_PUSHDATA1);
                rawBytes.Add((byte)command.Length);
            }
            else if (command.Length <= 520)
            {
                rawBytes.Add((byte)OpCode.OP_PUSHDATA2);
                rawBytes.AddRange(BitConverter.GetBytes((short)command.Length));
            }
        }
        rawBytes.AddRange(command);
    }

    writer.WriteVarInt((ulong)rawBytes.Count);
    writer.Write(rawBytes.ToArray());
}

For multi-byte elements there’s some conditional code based on the length of the element, making sure that the PUSHDATA op-codes are put in correctly. Otherwise, the single-byte command is added directly to the buffer.

Once the script buffer is complete, its length is written to the stream as a variable-width integer, followed by the buffered byte array.

« Previous: Transaction Objects

Next: The Stack »

Leave a comment