To write an application, you don’t have to know very much about how .NET manages memory. To write an application that performs well, it is crucial.
In this post I will focus on the memory management / Garbage Collection aspects of a .NET application. Too many times when I have been thrown into a performance troubleshooting case, I have seen code written with no clue of what I’m about to cover in this post. At some point, enough is enough! It’s time to spread the word. 🙂
.NET and any language you use with it to generate MSIL code, be it C#, VB.NET or even BrainFuck.NET, will result in code that requires a runtime to execute. The runtime, called the Common Language Runtime or CLR, will (simplified):
- load the assemblies and ensure that dependent assemblies are loaded, too
- do JIT compiling, which is the process of translating .NET MSIL (the platform-independent Microsoft Intermediate Language) into machine code that runs well on the current CPU. This step can be done ahead of time using ngen.exe
- maintain the application’s memory and do Garbage Collection
Microsoft has a pretty good write up of how the automatic memory management works. In this post, I try to focus on the most important aspects of it.
Obviously, the memory footprint for an application depends on how much memory it uses. But it also depends on how it uses the memory. It turns out that how you use the memory impacts not only the amount of memory allocated from the operating system, it also affects how much CPU time your application spends managing it.
Memory allocation / GC basics
In the typical case, when an object is new-ed up, a block of memory big enough to fit the object is allocated from the managed heap. This heap is divided into three regions for reasons related to the performance of the garbage collection process. These regions are called generations, usually referred to as gen 0, gen 1 and gen 2. All new objects are allocated in gen 0.
As you keep allocating objects, gen 0 fills up. When there’s no more room in gen 0, there are a couple of possible actions that can be taken:
- Perform a garbage collection of the full generation
- Compact the heap, or “defragment” it; this will maximize the size of the largest free block.
- Extend the size of the full generation so as to provide more room and reduce the frequency of garbage collections of that generation.
The process, simplified: allocate in gen 0; when it is full, perform a garbage collection of gen 0. Any objects still referenced from active threads or static variables (directly or indirectly) are compacted and moved to gen 1. If gen 1 cannot fit the surviving objects from the garbage collection of gen 0, the same set of actions is evaluated, but this time for gen 1. The last generation objects can be moved to is gen 2, and it will become the application’s largest heap region.
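You can watch this promotion path from code with GC.GetGeneration. A minimal sketch; the exact numbers can vary with runtime version and GC mode, so treat the comments as the typical outcome, not a guarantee:

```csharp
using System;

class GenerationDemo
{
    static void Main()
    {
        var survivor = new byte[1024];                  // small object, allocated in gen 0
        Console.WriteLine(GC.GetGeneration(survivor));  // typically 0

        GC.Collect();                                   // survivor is still referenced...
        Console.WriteLine(GC.GetGeneration(survivor));  // ...so it is promoted, typically 1

        GC.Collect();                                   // survives again
        Console.WriteLine(GC.GetGeneration(survivor));  // typically 2, the last stop
    }
}
```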
An exception to this rule, and also the central piece of this article, is the Large Object Heap.
The Large Object Heap
In any application, its developers assume certain use cases and optimize for them. This also applies to Microsoft’s developers.
The Large Object Heap stores the objects that are larger than a specific threshold. The magic size is 85000 bytes, exactly. The assumption seems to have been:
“If someone goes through the time and trouble of allocating such a large block of memory, they will never want to throw it away immediately. Since the process of moving memory blocks around the memory is expensive, let’s allocate it in one place and do not ever move it. Do not collect it either, unless we’re doing a garbage collect of everything.”
I didn’t quote anyone I know. The quote is made up. It just feels like something similar must have gone through a CLR GC developer’s head. But it means a whole lot for how an application needs to be built.
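You can see the threshold in action: LOH objects are logically part of gen 2, so GC.GetGeneration reports 2 for them immediately after allocation. A small sketch (the sizes keep a safe margin on each side of the threshold, since the 85000-byte limit counts the whole object including its header):

```csharp
using System;

class LohThresholdDemo
{
    static void Main()
    {
        var small = new byte[84_000];   // comfortably under the threshold: normal heap
        var large = new byte[85_000];   // over the threshold: Large Object Heap

        Console.WriteLine(GC.GetGeneration(small));  // 0, freshly allocated in gen 0
        Console.WriteLine(GC.GetGeneration(large));  // 2, LOH objects count as gen 2
    }
}
```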
Any time an allocation is made from the Large Object Heap (LOH), the next full GC moves closer. This is because when the LOH is full, it triggers a full garbage collection of the entire application. If you have a large application, the time it takes to complete may be substantial.
To demonstrate this, I created an application that measures the time it takes to allocate memory blocks of varying sizes. The application starts a timer and first allocates a bunch of objects just to make sure that the generations are filled up with something. This memory load is the same for each test. Then this loop is repeated allocCount times:
for (int i = 0; i < allocCount; i++)
{
    var o = new byte[allocSize];
}
I set the memory load parameter to about 128 MB and then tested different values for allocSize while keeping everything else the same. These are the results:
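For reference, the measurement harness looked roughly like this. This is a reconstruction, not the original benchmark, and the sizes in Main are just example parameters on either side of the LOH threshold:

```csharp
using System;
using System.Diagnostics;

public class AllocationTimer
{
    // Times allocCount allocations of allocSize-byte arrays, each discarded immediately.
    public static TimeSpan TimeAllocations(int allocSize, int allocCount)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < allocCount; i++)
        {
            var o = new byte[allocSize];  // becomes garbage right away
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    static void Main()
    {
        // Vary allocSize while keeping everything else the same.
        Console.WriteLine(TimeAllocations(80_000, 10_000));  // stays off the LOH
        Console.WriteLine(TimeAllocations(90_000, 10_000));  // every allocation hits the LOH
    }
}
```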
Each .NET process has only one Garbage Collector. If you have an application pool (w3wp.exe) with multiple different applications inside of it and one of them misbehaves with its memory allocations, all the other applications will pay for it when the misbehaving one causes the Garbage Collector to run too frequently.
When the Garbage Collector runs, you will see the w3wp.exe process spike to 100% on all available cores until each core is done collecting the heap segment it manages. This was more obvious prior to .NET 4.5, when the server GC froze all user threads while the collection was underway. Then it would look something like this:
Can you spot that there are three garbage collections, and almost half of a fourth, visible on all cores? This is the Task Manager of a ten-core web server spending most of its time doing garbage collections. The CPU usage is synchronized because the GC runs on all cores, freezing all user threads. The user threads, on the other hand, each process incoming web requests with their own discrete arrival times, so the spikes wouldn’t be this synchronized if the load originated from the execution of web pages.
In .NET 4.5, Microsoft shipped a new background server GC that makes this harder to spot. There are still many good counters available in Performance Monitor.
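If you’d rather check from inside the process than from Performance Monitor, the runtime exposes the GC mode and per-generation collection counts. A quick sketch:

```csharp
using System;
using System.Runtime;

class GcInfo
{
    static void Main()
    {
        // ASP.NET hosts like w3wp.exe use server GC by default; console apps get workstation GC.
        Console.WriteLine("Server GC: " + GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: " + GCSettings.LatencyMode);

        // A rapidly climbing gen 2 count is the in-process symptom of the
        // synchronized CPU spikes described above.
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
        {
            Console.WriteLine("Gen " + gen + " collections: " + GC.CollectionCount(gen));
        }
    }
}
```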
What does this mean?
You have to be careful when you design your application, so that it will not routinely allocate and discard LOH-allocated objects.
These items occupy more than 85000 contiguous bytes:
- A byte array longer than 85000 bytes (doh!)
- A string longer than approximately 42500 characters (because strings are UTF-16 encoded internally, each non-surrogate character takes two bytes; read Jon Skeet’s article if you’d like to understand Unicode better)
- An object array with more than approximately 10625 elements for 64-bit processes, or 21250 elements for 32-bit processes
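The list above can be sanity-checked at runtime, again using the fact that LOH objects report generation 2 immediately after allocation. The object-array line assumes a 64-bit process (8 bytes per reference):

```csharp
using System;

class LohItems
{
    static void Main()
    {
        Console.WriteLine(GC.GetGeneration(new byte[85_000]));        // 2: byte array at the threshold
        Console.WriteLine(GC.GetGeneration(new string('x', 42_500))); // 2: about 85000 bytes of UTF-16
        Console.WriteLine(GC.GetGeneration(new object[11_000]));      // 2 on 64-bit: 11000 x 8 bytes
    }
}
```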
The list is pretty short, but the implications are wide:
- A MemoryStream with expandable capacity uses a byte array under the hood. When the MemoryStream is written to and its capacity needs to be expanded, it starts out at 256 bytes and doubles in size each time more space is needed. A MemoryStream whose backing byte array doubles from 65536 bytes to 131072 bytes will have the new array allocated on the LOH.
- A Dictionary/HashTable with a large number of objects uses arrays in its backing storage, and these will also grow larger than the LOH threshold as you fill it with content.
- If you’re building a string make sure you use StringBuilder, if applicable
- Don’t fool yourself by “simplifying” the Stream-based APIs into something that takes strings/byte arrays instead. The Cryptography APIs don’t encrypt/decrypt whole files in memory for a reason: they provide an interface that allows doing it piece by piece.
- Be careful when interfacing with web services, if it means you have to build huge strings in memory. Once I was troubleshooting a site with excessive LOH usage. It turned out they had a SOLR connector that downloaded 7 MB byte arrays, converted them into strings (14 MB) and then parsed an XML document out of them. Needless to say, the application took a break every so often to free up room in gen 0, gen 1, gen 2 and the Large Object Heap.
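A cheap mitigation for the MemoryStream case in the first bullet: if you can estimate the final size up front, pre-size the stream so the backing array is allocated once instead of doubling its way past the LOH threshold. A sketch (the 100000-byte payload is just an example size):

```csharp
using System;
using System.IO;

class PresizedStream
{
    static void Main()
    {
        var payload = new byte[4096];

        // Default-constructed stream: the backing array doubles as we write,
        // and the final doubling (65536 to 131072) allocates a new LOH array
        // while leaving the old one behind as garbage.
        var growing = new MemoryStream();
        for (int written = 0; written < 100_000; written += payload.Length)
        {
            growing.Write(payload, 0, payload.Length);
        }
        Console.WriteLine(growing.Capacity);   // 131072 after the final doubling

        // Pre-sized stream: the buffer still lives on the LOH if it is large,
        // but it is allocated exactly once and never copied.
        var presized = new MemoryStream(capacity: 100_000);
        Console.WriteLine(presized.Capacity);  // 100000
    }
}
```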
For some reason, developers think it is easiest to do one thing at a time:
- read the entire file to memory,
- convert the huge byte array into a doubly-huge string (the bytes are frequently UTF-8 encoded, and turning them into a string typically doubles the size),
- get a huge array of lines by splitting them by NewLine…
It’s smarter to use a construction like this one:
private static IEnumerable<string> ReadLineFromFile(TextReader fileReader)
{
    string currentLine;
    while ((currentLine = fileReader.ReadLine()) != null)
        yield return currentLine;
}
And then, when you have a huge file:
foreach (var line in ReadLineFromFile(File.OpenText("LargeTextFile.txt")))
    Console.WriteLine(line); // only one line is held in memory at a time
Or as dad always says:
“When you eat an elephant, cut it into small pieces first.”