Avoiding Array Pitfalls in C# .NET
Efficient usage of basic types in C#
Simple but smart strategies you can use to optimize C# code by understanding how arrays behave on the .NET runtime and choosing the correct features for the job when dealing with arrays in application hot paths.
Avoid Empty Array Allocations
In .NET, arrays are objects whose size can’t be changed after creation. Since an empty array can never hold an element, every empty array of a given type is interchangeable with any other, so allocating multiple instances serves no real purpose. Those extra instances only take up space on the managed heap and, at some point, have to be collected by the GC.
If an application allocates and discards enough of them, that churn turns into real inefficiency in the form of extra garbage collection work:
// avoid allocating empty arrays:
// this creates a brand-new object on every call
var emptyIntArray = new int[0];
When an empty array is allocated, no elements can be stored in it. So why bother allocating the arrays in the first place?
The alternative: Array.Empty<T>()
Array.Empty<T>() returns an instance cached in a static read-only field, so only one empty array per element type T is ever created during the entire lifetime of the application.
This strategy avoids having multiple empty array instances of the same type, saving memory and extra GC work collecting unneeded objects.
Here’s the syntax for it:
// gets a reference to a
// static readonly instance of an empty array
var emptyIntArray = Array.Empty<int>();
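You can observe the shared instance directly by comparing references. The following is a minimal sketch, and the variable names are only illustrative. Array.Empty<int>() hands back the same object on every call, while each new int[0] creates a fresh one.

```csharp
using System;

// Array.Empty<T>() returns the same cached instance for a given T...
int[] first = Array.Empty<int>();
int[] second = Array.Empty<int>();

// ...whereas every `new int[0]` allocates a brand-new object on the heap.
int[] third = new int[0];
int[] fourth = new int[0];

bool emptySharesInstance = object.ReferenceEquals(first, second); // true
bool newSharesInstance = object.ReferenceEquals(third, fourth);   // false

// The cache is per element type: int[] and long[] get different instances.
bool differentTypesDiffer = !object.ReferenceEquals(Array.Empty<int>(), Array.Empty<long>()); // true

Console.WriteLine($"Array.Empty<int>() same instance: {emptySharesInstance}");
Console.WriteLine($"new int[0] same instance: {newSharesInstance}");
```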
Since we only ever deal with a single instance, benchmarking Array.Empty<T>() against allocating a new empty array (new []) shows zero allocations for Array.Empty<T>(), for any number of invocations:
| Method | Invocations | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------- |--------- |--------:|--------:|------:|----------:|
| Array.Empty<T> | 100 | - | - | - | - |
| new [] | 100 | 0.7648 | - | - | 2400 B |
| Array.Empty<T> | 1000 | - | - | - | - |
| new [] | 1000 | 7.6447 | - | - | 24000 B |
| Array.Empty<T> | 10000 | - | - | - | - |
| new [] | 10000 | 56.3965 | 18.7988 | - | 240000 B |
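The benchmark code behind the table isn't shown. The following is only a rough, hypothetical cross-check, not a reproduction of it. It assumes a recent .NET (Core) runtime, because it relies on GC.GetAllocatedBytesForCurrentThread(). Each array is stored in a preallocated holder so the allocations can't be optimized away. For real measurements, a harness such as BenchmarkDotNet is the better tool.

```csharp
using System;

const int Calls = 1000;

// Preallocated holder: storing each array here makes it escape,
// so the runtime cannot skip the allocation.
int[][] holder = new int[Calls][];

// Bytes allocated by the current thread while creating Calls empty arrays.
long MeasureNewEmpty()
{
    long before = GC.GetAllocatedBytesForCurrentThread();
    for (int i = 0; i < holder.Length; i++)
    {
        holder[i] = new int[0];
    }
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

// Same loop, but every slot receives the shared cached instance.
long MeasureArrayEmpty()
{
    long before = GC.GetAllocatedBytesForCurrentThread();
    for (int i = 0; i < holder.Length; i++)
    {
        holder[i] = Array.Empty<int>();
    }
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

// Warm up once so JIT compilation and static initialization are not measured.
MeasureNewEmpty();
MeasureArrayEmpty();

long newBytes = MeasureNewEmpty();
long emptyBytes = MeasureArrayEmpty();

Console.WriteLine($"new int[0]        : {newBytes} bytes for {Calls} calls");
Console.WriteLine($"Array.Empty<int>(): {emptyBytes} bytes for {Calls} calls");
```

On a 64-bit runtime, the new int[0] loop should land roughly at the per-call cost the table implies (2400 B / 100 calls = 24 B). The Array.Empty<int>() loop should allocate nothing.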
Bottom-Line: Avoid allocating new empty arrays. Use Array.Empty<T>() instead.
Use ArrayPools for large arrays
For applications that create thousands of new instances of large arrays on critical paths, it may be worth looking into ArrayPool<T>.
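Every new large array is another object for the GC to track and eventually collect, so an application that allocates many of them may experience high GC pressure and frequent pauses. Arrays of 85,000 bytes or more are also placed on the Large Object Heap, which is only collected during full (Gen 2) collections.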
ArrayPool<T> helps prevent these pauses by handing out reusable arrays of a given element type T. Its Rent method retrieves an array of at least the specified length from the pool, ready to be used by the caller. The array may be longer than requested, so keep track of the length you actually need.
The main job of ArrayPool is to avoid GC pressure by reducing the number of large array allocations and deallocations by an application.
Here’s the syntax for it:
// ArrayPool<T> lives in the System.Buffers namespace
using System.Buffers;

// gets a reference to the shared pool of arrays
var shared = ArrayPool<int>.Shared;
// requests an array of minimum size 1000
// from the pool of arrays (it may be longer than 1000)
var rentedArray = shared.Rent(1000);
// do whatever you want with the rented array
// ...
// ...
// return the array to the pool!
shared.Return(rentedArray);
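In real code, the rent/return pair is usually wrapped in try/finally so the array goes back to the pool even if something throws. Here is a minimal sketch of that pattern. SumOfSquares is a hypothetical helper. clearArray: true is an optional parameter of Return that wipes the array so the next renter doesn't see stale values.

```csharp
using System;
using System.Buffers;

// Hypothetical helper: sums the squares of 0..count-1 using a temporary
// buffer rented from the shared pool instead of allocating a new array.
long SumOfSquares(int count)
{
    ArrayPool<int> pool = ArrayPool<int>.Shared;

    // Rent may return an array LONGER than requested,
    // so we rely on `count`, not buffer.Length.
    int[] buffer = pool.Rent(count);
    try
    {
        for (int i = 0; i < count; i++)
        {
            buffer[i] = i * i;
        }

        long sum = 0;
        // Only read the first `count` slots; anything beyond may hold stale data.
        for (int i = 0; i < count; i++)
        {
            sum += buffer[i];
        }
        return sum;
    }
    finally
    {
        // Always hand the buffer back, even if an exception was thrown above,
        // and never touch `buffer` again after this call.
        pool.Return(buffer, clearArray: true);
    }
}

long result = SumOfSquares(1000);
Console.WriteLine(result); // prints 332833500
```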
- It is extremely important to return a rented array to its pool and not to use it afterwards. Arrays that are never returned cannot be reused, so the pool has to allocate new ones and the benefit is lost. Using an array after it has been returned is even worse: another caller may already have rented it, which can lead to corrupted data, runtime errors, and crashes.
Bottom-Line: Avoid repeatedly allocating and discarding large arrays. Rent and reuse them instead.
Array methods and their LINQ Counterparts
One of the key things that you can notice when reading the .NET Roslyn compiler contribution guidelines is the following statement:
DO avoid allocations in compiler hot paths:
- DO avoid LINQ
Yet, it’s not uncommon to see LINQ methods sprinkled beautifully all over some codebases, usually in the name of readability.*
*To be honest, I’m not even certain that the readability argument holds in some cases, but that’s a discussion for another article! 😉
Back to arrays… You may not need the System.Linq namespace at all if you have a concrete array instance at hand.
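You may want to benchmark your own case, but in general, using LINQ extension methods on hot paths tends to cause inefficiencies under high load. LINQ works through the IEnumerable<T> interface, so each call pays for interface dispatch and runtime type checks, and many operators also allocate enumerators and delegates.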
On top of that, the Array class already provides many features out of the box, such as copying, filtering, searching, sorting, and reading the length. If you are not familiar with its API, it's worth taking some time to read the official Microsoft documentation or its source code.
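As a rough illustration, here are a few common LINQ calls next to array-native alternatives. This is a sketch, not a drop-in rule. The predicate-based methods (Array.Exists, Array.Find, Array.FindAll) still invoke a delegate per element, so measure before assuming a win. Also, Array.Find returns default(T) when nothing matches, so it behaves like FirstOrDefault rather than First.

```csharp
using System;

int[] numbers = { 5, 3, 8, 1, 9, 2 };

// Counting: Length is stored on the array, no enumeration needed.
int count = numbers.Length;                           // instead of numbers.Count()
bool hasItems = numbers.Length > 0;                   // instead of numbers.Any()

// Searching
bool hasEight = Array.Exists(numbers, n => n == 8);   // instead of numbers.Any(n => n == 8)
int firstBig = Array.Find(numbers, n => n > 6);       // instead of numbers.FirstOrDefault(n => n > 6)
int indexOfNine = Array.IndexOf(numbers, 9);          // position of the first 9, or -1 if absent

// Filtering
int[] evens = Array.FindAll(numbers, n => n % 2 == 0); // instead of numbers.Where(...).ToArray()

// Copying and sorting: Array.Sort works in place, so sort a copy.
int[] sorted = new int[numbers.Length];
Array.Copy(numbers, sorted, numbers.Length);
Array.Sort(sorted);                                   // instead of numbers.OrderBy(n => n).ToArray()

Console.WriteLine(string.Join(", ", sorted));         // prints "1, 2, 3, 5, 8, 9"
```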
Here’s a measurement comparing LINQ’s Count() and Any() methods with the Array.Length property:
| Method | size | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------- |------ |-----------:|----------:|----------:|-----------:|------:|------:|------:|----------:|
| Linq.Any() | 100 | 10.1213 ns | 0.1264 ns | 0.1056 ns | 10.1113 ns | - | - | - | - |
| Array.Length | 100 | 0.0108 ns | 0.0093 ns | 0.0087 ns | 0.0097 ns | - | - | - | - |
| Linq.Count() | 100 | 10.3051 ns | 0.0966 ns | 0.0807 ns | 10.2964 ns | - | - | - | - |
| Linq.Any() | 1000 | 10.2876 ns | 0.2367 ns | 0.2631 ns | 10.1690 ns | - | - | - | - |
| Array.Length | 1000 | 0.0040 ns | 0.0059 ns | 0.0056 ns | 0.0000 ns | - | - | - | - |
| Linq.Count() | 1000 | 10.6900 ns | 0.1066 ns | 0.0945 ns | 10.6803 ns | - | - | - | - |
| Linq.Any() | 10000 | 10.6557 ns | 0.1156 ns | 0.0966 ns | 10.6588 ns | - | - | - | - |
| Array.Length | 10000 | 0.0064 ns | 0.0156 ns | 0.0131 ns | 0.0000 ns | - | - | - | - |
| Linq.Count() | 10000 | 10.3191 ns | 0.0702 ns | 0.0586 ns | 10.3230 ns | - | - | - | - |
Bottom-Line: Avoid LINQ if you don’t need it.
Single-dimensional, Multi-dimensional or Jagged
We have three distinct types of arrays in C#: single-dimensional (a.k.a. vectors), multi-dimensional, and jagged arrays.
Each has its own syntax and is translated differently by the compiler. In summary, the differences are:
- The CLR is specifically tuned for vectors: IL has dedicated instructions for them (newarr, ldelem, stelem, ldlen), while accessing a multi-dimensional array compiles to method calls on the array type. Overall, vectors are more efficient than multi-dimensional arrays.
- A multi-dimensional array is rectangular: every row always has the same number of columns.
- A jagged array is an array of arrays, i.e. a vector whose elements are themselves vectors, and each inner array can have its own length. Because every level is a vector, a jagged array can be used instead of a multi-dimensional one. When more than one dimension is needed, it is usually the preferred choice.
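Here is a minimal sketch of the shape difference described above; the sizes are arbitrary. The multi-dimensional array is a single rectangular object. The jagged array is a vector of independently allocated rows.

```csharp
using System;

// Multi-dimensional (rectangular): one object, every row has exactly 4 columns.
int[,] grid = new int[3, 4];
grid[1, 2] = 42;
int rows = grid.GetLength(0); // 3
int cols = grid.GetLength(1); // 4

// Jagged: a vector of vectors; each row is its own array with its own length.
int[][] jagged = new int[3][];
jagged[0] = new int[1];
jagged[1] = new int[4];
jagged[2] = new int[2];
jagged[1][2] = 42;

// For a multi-dimensional array, Length is the total element count (3 * 4 = 12).
Console.WriteLine($"grid: {rows}x{cols}, {grid.Length} elements, rank {grid.Rank}");
for (int r = 0; r < jagged.Length; r++)
{
    Console.WriteLine($"jagged row {r} has {jagged[r].Length} columns");
}
```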
Here’s a benchmark where I’ve compared the access speed between jagged and multi-dimensional arrays:
// Benchmarking Array Access
| Method | size | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------- |------ |---------------:|--------------:|---------------:|---------------:|------:|------:|------:|----------:|
| Jagged | 100 | 7.375 us | 0.1372 us | 0.3313 us | 7.254 us | - | - | - | - |
| Multi | 100 | 15.845 us | 0.4850 us | 1.3994 us | 15.289 us | - | - | - | - |
| Jagged | 1000 | 671.952 us | 12.7468 us | 14.6792 us | 672.925 us | - | - | - | - |
| Multi | 1000 | 1,657.502 us | 31.9610 us | 38.0473 us | 1,657.785 us | - | - | - | - |
| Jagged | 10000 | 74,526.753 us | 1,262.3529 us | 1,889.4304 us | 74,101.414 us | - | - | - | - |
| Multi | 10000 | 163,848.207 us | 5,095.9800 us | 15,025.6119 us | 157,635.950 us | - | - | - | - |
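The code behind this benchmark isn't shown. The following is a hypothetical sketch of the kind of traversal being compared, assuming size is both the row and the column count. A single Stopwatch run like this is noisy and only hints at the difference, so use BenchmarkDotNet for numbers you can trust.

```csharp
using System;
using System.Diagnostics;

const int Size = 1000; // assumption: rows == columns == Size

// Fill both layouts with the same values.
int[][] jagged = new int[Size][];
int[,] multi = new int[Size, Size];
for (int r = 0; r < Size; r++)
{
    jagged[r] = new int[Size];
    for (int c = 0; c < Size; c++)
    {
        jagged[r][c] = r + c;
        multi[r, c] = r + c;
    }
}

// Row-by-row traversal of a jagged array: each row is a plain vector.
long SumJagged(int[][] data)
{
    long sum = 0;
    foreach (int[] row in data)
    {
        for (int c = 0; c < row.Length; c++)
        {
            sum += row[c];
        }
    }
    return sum;
}

// Same traversal over a multi-dimensional array. In IL, each [r, c] read is
// a call to the array type's Get method; its final cost depends on the JIT.
long SumMulti(int[,] data)
{
    long sum = 0;
    int rowCount = data.GetLength(0);
    int colCount = data.GetLength(1);
    for (int r = 0; r < rowCount; r++)
    {
        for (int c = 0; c < colCount; c++)
        {
            sum += data[r, c];
        }
    }
    return sum;
}

var sw = Stopwatch.StartNew();
long jaggedSum = SumJagged(jagged);
long jaggedTicks = sw.ElapsedTicks;

sw.Restart();
long multiSum = SumMulti(multi);
long multiTicks = sw.ElapsedTicks;

Console.WriteLine($"jagged: sum {jaggedSum} in {jaggedTicks} ticks");
Console.WriteLine($"multi : sum {multiSum} in {multiTicks} ticks");
```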
Bottom-Line: Choose your types wisely.
Bonus: Enough fun syntax to make everyone happy
Arrays can be declared and initialized in many different ways, just so that we have enough reasons to have productive (…or not) programming language wars with our teammates and colleagues.
Here’s the standard declaration version:
// allocates an array of size 10
int[] X = new int[10];
Implicit type local var
We can use C#’s implicitly typed var keyword, as long as the array type is specified (or inferred, see below…):
// allocates an array of size 10
var X = new int[10];
Initialization during creation
Arrays can be declared and initialized with a single line:
// initializing values during creation
var X = new int[] { 1, 2, 3, 4 };
A shorter way
The compiler can automatically infer the type of an array’s elements during initialization. Therefore, we can skip declaring the element type and still use the var keyword.
// a shorter initialization
var X = new [] { 1, 2, 3 };
An even shorter way
It turns out we may not need the new keyword in the end, as long as the variable’s type is declared explicitly:
// the `new` keyword is not really needed
int[] X = { 1, 2, 3 };
This one, though, will cause your computer to explode 🔥 into tiny pieces
Unfortunately, due to the language’s design, the following code is invalid. A bare array initializer has no type of its own, so var has nothing to infer from, and the compiler raises an error:
// it doesn't compile
var X = { 1, 2, 3 };
Key Takeaways
- Never, ever again, allocate an empty array unless you have a reason to. ⛔
- Consider ArrayPool<T> when large arrays are needed on hot paths. 🚀
- Evaluate the cost-benefit of using LINQ methods over the array’s own members. ⚖️
- Not all arrays are the same. ◾️ ◼️ ⬛️
- Choose your favorite syntax and have a couple of language wars. You only live once. ✔️
- Always measure and make your own conclusions. ⏰