Extremely fast memory manipulation in .NET

Recently, I've been updating my managed memory manipulation library called BlueRain to version 2.0. This includes some drastic increases in performance (which honestly is the only reason I'm even doing it to begin with), and much of the performance gain is owed to System.Runtime.CompilerServices.Unsafe, which was released about a month ago.

What is "Unsafe" anyway?

The CompilerServices.Unsafe library is part of .NET Core. It provides a single class, and its documentation describes it as follows:

Provides the System.Runtime.CompilerServices.Unsafe class, which provides generic, low-level functionality for manipulating pointers.

That may not sound very interesting at first - until you realize that this single type opens up a ton of low-level operations that can't be expressed in C#, such as pointer operations on generic types.

Optimizing reads

Previously, a method such as Read<T>, which reads an instance of type T from a specified memory address, had to either let the Marshal deal with it - which isn't exactly fast - or perform some dereferencing magic to turn the pointer into an object, and subsequently cast the resulting object to T.

BlueRain used to contain an optimisation for this which looked something along the lines of:
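
The original snippet is omitted here, so the following is a minimal reconstruction of what such a fast path might look like - the special-cased types and the MemoryReader/Read names are illustrative, not BlueRain's actual code:

```csharp
using System;
using System.Runtime.InteropServices;

public static class MemoryReader
{
    // Illustrative fast path: dereference the buffer directly for a few
    // well-known primitives, and fall back to the Marshal for everything else.
    public static unsafe T Read<T>(byte[] buffer) where T : struct
    {
        var type = typeof(T);
        fixed (byte* b = buffer)
        {
            if (type == typeof(int))
                return (T)(object)*(int*)b;
            if (type == typeof(long))
                return (T)(object)*(long*)b;
            if (type == typeof(float))
                return (T)(object)*(float*)b;

            // No fast path available - let the Marshal deal with it.
            return Marshal.PtrToStructure<T>((IntPtr)b);
        }
    }
}
```

Note the (T)(object) dance: each primitive read has to be boxed and unboxed just to satisfy the compiler, since C# won't let us cast an int straight to T.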

If none of those code paths are taken, we just have the Marshal deal with it. This makes one of the hottest code paths, i.e. Read<T>, relatively performant without too much clutter.

Still, that's a lot of code just to get what is essentially dereferencing a pointer into a generic type to work, and perform reasonably well. The interesting thing is that pointer operations on generic types are actually supported in IL - we just can't express them in C#.

That's where the Unsafe type comes in. Using the methods in the Unsafe class, the above code becomes:

        fixed (byte* b = buffer)
            return Unsafe.Read<T>(b);

And that's it. Even for non-primitive types - as long as T doesn't require explicit Marshalling, this'll do the trick.
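To illustrate with a non-primitive type, here's a standalone sketch - the Vector3 struct and ReadDemo class are example names, not part of BlueRain's API:

```csharp
using System.Runtime.CompilerServices;

public struct Vector3
{
    public float X, Y, Z;
}

public static class ReadDemo
{
    // Reads a Vector3 straight out of a raw byte buffer - no Marshal involved.
    public static unsafe Vector3 ReadVector(byte[] buffer)
    {
        fixed (byte* b = buffer)
            return Unsafe.Read<Vector3>(b);
    }
}
```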

So what does Unsafe.Read<T> look like under the hood then? Well, it's just a couple of lines of IL:

.method public hidebysig static !!T Read<T>(void* source) cil managed aggressiveinlining
{
        .custom instance void System.Runtime.Versioning.NonVersionableAttribute::.ctor() = ( 01 00 00 00 )
        .maxstack 1
        ldarg.0
        ldobj !!T
        ret
}

There's nothing magical about it: the generic parameter is expressed as !!T, and ldobj dereferences the pointer into it. You can imagine that being quite a bit faster than the code we had before.

Similar methods are provided for various other memory-related operations - writing, casting, and so on:

    public static unsafe T Read<T>(void* p)
    public static unsafe void Write<T>(void* p, T value)
    public static unsafe int SizeOf<T>()
    public static T As<T>(object obj) where T : class
    public static unsafe void* AsPointer<T>(ref T value)

In BlueRain, the use of these methods has sped up the library as a whole by a huge margin, as displayed by the benchmarks further down this post.

SizeOf

BlueRain previously obtained the size of an object in memory by calling Marshal.SizeOf. With Unsafe, we can simply use the sizeof IL opcode instead. I'm not entirely sure what the differences between the two are in great detail, except that the Marshal very likely contains additional code to handle MarshalAs directives, whereas Unsafe.SizeOf simply returns the size of the value type in memory, without any bells or whistles.

Needless to say, Unsafe.SizeOf<T> is a lot faster than the Marshal's equivalent, and should be preferred except when marshalling actually has to occur.
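
One case where the two visibly disagree is bool, which occupies 1 byte in managed memory but marshals to a 4-byte Win32 BOOL by default. A quick sketch (the JustBool struct is just an example):

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public struct JustBool
{
    public bool Flag; // 1 byte managed; marshals to a 4-byte Win32 BOOL by default
}

public static class SizeDemo
{
    public static void Run()
    {
        // Managed, in-memory size - what Unsafe.SizeOf reports.
        Console.WriteLine(Unsafe.SizeOf<JustBool>());
        // Marshalled (native) size - what Marshal.SizeOf reports.
        Console.WriteLine(Marshal.SizeOf<JustBool>());
    }
}
```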

What about Marshalling?

As outlined above several times, sometimes there's a need to Marshal native data in a specific way, through the use of the MarshalAs attribute.

In BlueRain, I've decided that whenever a type requires explicit marshalling, we'll choose the old, slow way, and have the entire operation run through the marshal instead. This simply means that we'll need to check whether the type we're handling contains any of these directives, and if so, have the Marshal deal with it:
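
The check itself is omitted above, but it can be sketched with a bit of cached reflection - the MarshalCache name and caching strategy here are illustrative, not BlueRain's actual implementation:

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;
using System.Runtime.InteropServices;

// Example of a type that needs the Marshal: its string field carries a
// MarshalAs directive describing its native layout.
public struct NativeName
{
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)]
    public string Name;
}

public static class MarshalCache
{
    private static readonly ConcurrentDictionary<Type, bool> Cache =
        new ConcurrentDictionary<Type, bool>();

    // True if any field of T carries a [MarshalAs] directive, in which case
    // reads and writes should go through the Marshal rather than Unsafe.
    public static bool RequiresMarshalling<T>() =>
        Cache.GetOrAdd(typeof(T), t =>
        {
            foreach (var field in t.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
                if (field.IsDefined(typeof(MarshalAsAttribute), inherit: false))
                    return true;
            return false;
        });
}
```

The result is cached per type, so the reflection cost is only paid on the first read or write of each type.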

But this'll deal with all types of structs just fine.

So, how fast is it?

Not entirely sure yet. It looks to be really fast though. I've written some really, really basic benchmarks that simply benchmark the Marshal vs. Unsafe:
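
The benchmark source isn't included here, but the loops were along these lines - the iteration count and structure are a reconstruction, so treat the timings below as ballpark figures:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class ReadBenchmark
{
    private const int Iterations = 1_000_000;

    public static unsafe void Run()
    {
        var buffer = BitConverter.GetBytes(42);

        fixed (byte* b = buffer)
        {
            var ptr = (IntPtr)b;

            // Read the same int a million times through the Marshal...
            var sw = Stopwatch.StartNew();
            for (var i = 0; i < Iterations; i++)
                Marshal.PtrToStructure<int>(ptr);
            Console.WriteLine($"Marshal read benchmark completed - {sw.Elapsed} elapsed.");

            // ...and again through Unsafe.Read.
            sw.Restart();
            for (var i = 0; i < Iterations; i++)
                Unsafe.Read<int>(b);
            Console.WriteLine($"Unsafe read benchmark completed - {sw.Elapsed} elapsed.");
        }
    }
}
```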

Now of course, these benchmarks aren't very representative. You wouldn't have to run simple types such as int through the Marshal - you'd just *(int*) value it instead!

Still, BlueRain used to be really lazy and ran everything you threw at it through the Marshal, so if anything, these benchmarks are representative in terms of the performance you'll get when doing tons of reads with BlueRain v2 vs v1:

Marshal read benchmark completed - 00:00:01.2060855 elapsed.
Unsafe read benchmark completed - 00:00:00.0295344 elapsed.

Marshal write benchmark completed - 00:00:00.1731855 elapsed.
Unsafe write benchmark completed - 00:00:00.0120395 elapsed.

Yeah.

What's the catch?

Obviously, there has to be a catch. That catch is that you can't reliably use this on reference types - and since arrays are reference types, even an array of value types is out of luck for now. For BlueRain, this isn't a huge issue, as 99% of the stuff you'll be reading from and writing to other processes will be value types anyway.

On top of that, some mild breakage of the safety boundaries the CLR provides may occur if you are being reckless. For instance, consider what would happen when you write to the backing buffer of an immutable System.String. Suddenly strings aren't so immutable anymore.
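
A sketch of that scenario - the MutateFirstChar helper is purely illustrative, and this is exactly the kind of recklessness to avoid (mutating an interned string literal this way would corrupt every other use of it):

```csharp
using System.Runtime.CompilerServices;

public static class StringMutation
{
    // Overwrites the first character of a string in place, bypassing immutability.
    public static unsafe string MutateFirstChar(string s, char c)
    {
        fixed (char* p = s)
            Unsafe.Write(p, c);
        return s; // same object, new contents
    }
}
```

Calling this on a freshly allocated "hi" turns that very instance into "Hi" - no copy is made, which is precisely the safety guarantee being broken.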

Conclusion

Either something's very, very wrong with my tests, or this stuff is insanely fast, and opens up a ton of possibilities to those who are careful and know what they're doing.

BlueRain v2 is coming soon, and though it was supposed to contain a bunch of other cool stuff, I think performance will be its biggest feature.

Thank you for reading!