Over the past year and a half I’ve been actively contributing to .NET Core, mostly chipping away at low-hanging performance optimizations and other minor improvements. I thought it’d be interesting to highlight some of that work.

Today I’ll be going over an improvement I made to HttpContent.ReadAsStringAsync() to avoid unnecessary byte[] allocations when detecting the content’s encoding.

When reading the content as a string, HttpContent first attempts to use the encoding specified in the Content-Type header’s charset parameter (if present). Otherwise, it tries to detect the encoding by looking for a byte order mark (BOM) at the start of the data, falling back to UTF8 if no BOM is found.
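Roughly, the decision flow looks something like this (a minimal sketch, not the actual HttpContent source; it assumes the buffered content is already available as a byte[] and that the charset, if any, has been resolved to an Encoding):

private static string GetContentString(byte[] data, int dataLength, Encoding charsetEncoding)
{
    Encoding encoding;
    int preambleLength = 0;

    if (charsetEncoding != null)
    {
        // The charset parameter from the Content-Type header wins.
        encoding = charsetEncoding;
    }
    else if (!TryDetectEncoding(data, dataLength, out encoding, out preambleLength))
    {
        // No BOM found: fall back to UTF-8.
        encoding = Encoding.UTF8;
    }

    // Decode, skipping any BOM bytes that were detected.
    return encoding.GetString(data, preambleLength, dataLength - preambleLength);
}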

The way the encoding detection was previously implemented could result in up to 4 unnecessary byte[] allocations every time the response content is read as a string.

Here’s essentially how it used to be implemented:

private static Encoding[] s_encodingsWithBom =
{
    Encoding.UTF8, // EF BB BF
    // UTF32 Must be before Unicode because its BOM is similar but longer.
    Encoding.UTF32, // FF FE 00 00
    Encoding.Unicode, // FF FE
    Encoding.BigEndianUnicode, // FE FF
};

private static bool TryDetectEncoding(byte[] data, int dataLength, out Encoding encoding, out int preambleLength)
{
    byte[] preamble;
    foreach (Encoding testEncoding in s_encodingsWithBom)
    {
        preamble = testEncoding.GetPreamble();
        if (ByteArrayHasPrefix(data, dataLength, preamble))
        {
            encoding = testEncoding;
            preambleLength = preamble.Length;
            return true;
        }
    }
    
    encoding = null;
    preambleLength = 0;
    return false;
}

private static bool ByteArrayHasPrefix(byte[] byteArray, int dataLength, byte[] prefix)
{
    if (prefix == null || byteArray == null || prefix.Length > dataLength || prefix.Length == 0)
        return false;
    for (int i = 0; i < prefix.Length; i++)
    {
        if (prefix[i] != byteArray[i])
            return false;
    }
    return true;
}

TryDetectEncoding loops over a static array of Encoding instances, calling GetPreamble() on each to get the encoding’s BOM, and then checks whether the data starts with that BOM. The main problem is GetPreamble: it allocates and returns a new byte[] on every call, so we’re potentially allocating up to 4 byte[]s each time TryDetectEncoding is called.
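You can see the allocation behavior directly; every call to GetPreamble() hands back a distinct array:

byte[] first = Encoding.UTF8.GetPreamble();
byte[] second = Encoding.UTF8.GetPreamble();
Console.WriteLine(ReferenceEquals(first, second)); // False: each call allocates a new byte[]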

One simple way to avoid these allocations would be to pre-allocate and cache each of the known preamble byte[]s.

Something like:

private static readonly KeyValuePair<byte[], Encoding>[] s_preambleEncodingPairs =
{
    new KeyValuePair<byte[], Encoding>(Encoding.UTF8.GetPreamble(), Encoding.UTF8),
    // UTF32 Must be before Unicode because its BOM is similar but longer.
    new KeyValuePair<byte[], Encoding>(Encoding.UTF32.GetPreamble(), Encoding.UTF32),
    new KeyValuePair<byte[], Encoding>(Encoding.Unicode.GetPreamble(), Encoding.Unicode),
    new KeyValuePair<byte[], Encoding>(Encoding.BigEndianUnicode.GetPreamble(), Encoding.BigEndianUnicode),
};

private static bool TryDetectEncoding(byte[] data, int dataLength, out Encoding encoding, out int preambleLength)
{
    foreach (KeyValuePair<byte[], Encoding> pair in s_preambleEncodingPairs)
    {
        byte[] preamble = pair.Key;
        if (ByteArrayHasPrefix(data, dataLength, preamble))
        {
            encoding = pair.Value;
            preambleLength = preamble.Length;
            return true;
        }
    }

    encoding = null;
    preambleLength = 0;
    return false;
}

This is around 2.5x faster and avoids the repeated byte[] allocations. But it’s still looping through each preamble/encoding pair to detect the encoding. Can we do better?
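For reference, here’s a quick-and-dirty way to compare the versions (a rough sketch, not the exact harness behind the numbers in this post; the iteration count is illustrative):

// Time many calls against a small UTF-8-BOM-prefixed buffer.
byte[] data = { 0xEF, 0xBB, 0xBF, (byte)'h', (byte)'i' };
var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 10000000; i++)
{
    Encoding encoding;
    int preambleLength;
    TryDetectEncoding(data, data.Length, out encoding, out preambleLength);
}
sw.Stop();
Console.WriteLine(sw.Elapsed);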

Here’s what I ended up with:

private const int UTF8PreambleLength = 3;
private const int UTF8PreambleFirst2Bytes = 0xEFBB;
private const byte UTF8PreambleByte2 = 0xBF;

private const int UTF32PreambleLength = 4;
private const int UTF32OrUnicodePreambleFirst2Bytes = 0xFFFE;
private const byte UTF32PreambleByte2 = 0x00;
private const byte UTF32PreambleByte3 = 0x00;

private const int UnicodePreambleLength = 2;

private const int BigEndianUnicodePreambleLength = 2;
private const int BigEndianUnicodePreambleFirst2Bytes = 0xFEFF;

private static bool TryDetectEncoding(byte[] data, int dataLength, out Encoding encoding, out int preambleLength)
{
    if (dataLength >= 2)
    {
        int first2Bytes = data[0] << 8 | data[1];

        switch (first2Bytes)
        {
            case UTF8PreambleFirst2Bytes:
                if (dataLength >= UTF8PreambleLength && data[2] == UTF8PreambleByte2)
                {
                    encoding = Encoding.UTF8;
                    preambleLength = UTF8PreambleLength;
                    return true;
                }
                break;

            case UTF32OrUnicodePreambleFirst2Bytes:
                if (dataLength >= UTF32PreambleLength && data[2] == UTF32PreambleByte2 && data[3] == UTF32PreambleByte3)
                {
                    encoding = Encoding.UTF32;
                    preambleLength = UTF32PreambleLength;
                }
                else
                {
                    encoding = Encoding.Unicode;
                    preambleLength = UnicodePreambleLength;
                }
                return true;

            case BigEndianUnicodePreambleFirst2Bytes:
                encoding = Encoding.BigEndianUnicode;
                preambleLength = BigEndianUnicodePreambleLength;
                return true;
        }
    }

    encoding = null;
    preambleLength = 0;
    return false;
}

If the data is at least 2 bytes long, the first two bytes are combined into a single int and switched on. Each case then checks any remaining preamble bytes; if there’s a match, the encoding and preambleLength out parameters are set and the method returns true.
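A few illustrative inputs show how the cases play out (expected results noted in comments):

Encoding encoding;
int preambleLength;

// EF BB BF: UTF-8 BOM followed by "hi"
byte[] utf8Data = { 0xEF, 0xBB, 0xBF, 0x68, 0x69 };
TryDetectEncoding(utf8Data, utf8Data.Length, out encoding, out preambleLength);
// encoding == Encoding.UTF8, preambleLength == 3

// FF FE not followed by two zero bytes: little-endian UTF-16, not UTF-32
byte[] utf16Data = { 0xFF, 0xFE, 0x68, 0x00 };
TryDetectEncoding(utf16Data, utf16Data.Length, out encoding, out preambleLength);
// encoding == Encoding.Unicode, preambleLength == 2

// No BOM at all: returns false and the caller falls back to UTF-8
byte[] noBomData = { 0x68, 0x69 };
TryDetectEncoding(noBomData, noBomData.Length, out encoding, out preambleLength);
// returns false, encoding == null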

This approach is around 10.5x faster than the original implementation and avoids all unnecessary allocations.

The new implementation isn’t quite as straightforward as the original approach, but in this case the tradeoff is worth it for the reduced memory allocations and improved speed in the HTTP client stack.

Questions or comments? I'm @justinvp on Twitter.
