The Modern .NET Show

S08E18 - Measure Twice, Cut Once: Benchmarking, Hot Paths and the Chainsaw of Unsafe Code with Szymon Kulec

Sponsors

Support for this episode of The Modern .NET Show comes from the following sponsors. Please take a moment to learn more about their products and services:

Please also see the full sponsor message(s) in the episode transcription for more details of their products and services, and offers exclusive to listeners of The Modern .NET Show.

Thank you to the sponsors for supporting the show.


Supporting The Show

If this episode was interesting or useful to you, please consider supporting the show with one of the above options.

Episode Summary

What separates a “regular” .NET application from one that pushes the runtime to its limits? In this episode, Szymon Kulec, Lead Developer Advocate at RavenDB, joins Jamie for a deep dive into systems programming in .NET.

Together they walk the sliding scale from everyday business code to the high-performance world inhabited by databases and libraries: pre-allocating lists, using stackalloc and spans, working with refs and “borrowed” data, and the rare cases where you might create your own thread or drop into unsafe code.

Along the way they cover the practical tooling: starting with dotTrace and dotMemory for profiling, graduating to BenchmarkDotNet for micro-benchmarking hot paths, and using load-testing tools like JMeter to find your system’s saturation point: the “elbow” on the throughput curve.

Szymon also pulls back the curtain on RavenDB’s Blittable JSON format, automatic indexing, and why the .NET team itself relies on BenchmarkDotNet to catch performance regressions release after release. The recurring theme: measure twice, cut once — profile first, optimise where it matters, and keep a benchmark suite around to prove you’re moving in the right direction.


Episode Transcription

The first measurement could actually be someone from the so-called business stating that, "oh gosh, this is so slow." That would be the coarse-grained measurement that you can sometimes receive for free.

- Szymon Kulec

Hey everyone, and welcome back to The Modern .NET Show; the premier .NET podcast, focusing entirely on the knowledge, tools, and frameworks that all .NET developers should have in their toolbox. I’m your host Jamie Taylor, bringing you conversations with the brightest minds in the .NET ecosystem.

Today, we’re joined by Szymon Kulec to talk about systems programming in .NET. But that’s just the surface level detail of what we talk about. We do a deep-dive into some of the corners of .NET and C# that a lot of engineers hardly ever get the chance to cover. This is more than your standard, surface level conversation about C# and .NET.

Maybe sometimes you will actually create a thread. Something that you don’t do nowadays in .NET, because you know what you are doing and you want to own the specific thread for your own specific purpose.

- Szymon Kulec

Along the way, we talked about how developers who are using C# and .NET should think about learning the deeper levels of the language and how things work under the covers. Knowing how the JIT works with your code will help you to write more performant code, for sure.

So let’s sit back, open up a terminal, type in dotnet new podcast and we’ll dive into the core of Modern .NET.

Jamie : Simon, welcome to the show. We’ve been connected over email for a little while. We’re doing this in the present for us, which is the past for the listener, but when they hear it, it’ll be the future, which is complete, utter nonsense compared to what we’re actually going to be talking about today.

Szymon : Nice to meet you, Jamie.

Jamie : Nice to meet you too, Simon. Welcome to the show. Before we get to the topic at hand, I wonder, would you be able to introduce yourself to the listeners a little bit about you, like the work that you do and things like that?

Szymon : Yeah, definitely. My name is Szymon Kulec. I use Scooletz as my nickname, whether it’s GitHub, Twitter, or any social media. I’m a lead developer advocate at RavenDB, and I like my data to be served really, really fast.

So what do I mean by that? Either I’m thinking about removing some allocations here or there, or maybe applying some low-level stuff like vectorised computation. This is a particular area that interests me a lot. Fun fact: the very first version of .NET that I used was 1.0, so I’m that old.

Jamie : That’s amazing. I can’t confess to having used such an early version of .NET. I think the first version I used was one of the 2.x releases, .NET Framework 2 point something, which was part of the university course that I did.

Fun fact for me: we didn’t use Visual Studio. The lecturer was a huge fan of the command line, and what he would do is stand at the front of the lecture theatre, plug his laptop into the projector, open up a command prompt, and do notepad space myfile.cs, then hit return.

Szymon : Oh gosh.

Jamie : Yeah, then write all of his code in Notepad, go back to the terminal and type in csc space myfile.cs, and that was the C# compiler that you could invoke from the command line. This is awesome.

Szymon : I mean, much better than writing the intermediate language manually, right?

Jamie : A hundred percent, yeah. But I remember I asked him — he’s the famous Rob Miles, so if anyone is familiar with MVPs, he’s one of the MVPs, or was, or whatever. Pretty famous anyway. I asked him once, “Why are we not using Visual Studio in the lectures?” And he said, “I’m not teaching you Visual Studio. I’m teaching you C#.”

Szymon : That’s the way.

Jamie : Yeah, exactly. I feel like it set me up with the right skills for debugging and for being able to build CI/CD pipelines, right? Because they’re all command line there.

Yeah. So you’re interested in making data happen quickly. Can we talk about that for a moment, just real quick? Because, for instance, I know that you work with RavenDB and they do amazing things with data. I’m just wondering, is SQL Server just not fast enough for you? Or is it a case of, you know, “I wanted to do what I can to make things faster”? Where did the impetus for “I need data quickly” come from?

Szymon : With Raven, the story is to make it as simple as possible for the end user, meaning any developer. As this is a document database and it speaks JSON, you put JSON into it and then internally it does a few things to the JSON so that it is, A, stored in a very nice manner, B, can be retrieved really, really fast, and C, you can think of it as sprinkled with all the distributed systems stuff, meaning consensus algorithm, replication, and so many, many more things.

So if you take that into consideration: easy to use, but at the same time really, really fast.

Jamie : So you’re not just making something that is super fast. You’re making it more easy to use for the user. Like someone like me, perhaps. That must be a difficult thing to do, because I know that — I’m going to use a really silly example — that time when you meet a developer who writes the greatest, for instance, the greatest LINQ query ever. It is super fast, zero allocations, but no one can read it and no one can debug it.

How do you walk that line of something that is incredibly fast and capable, but also can be used by a human that doesn’t have superhuman IQ powers?

Szymon : Yeah, that’s a good question. I think there is a line, and maybe even a few lines, that you can consider. One of them is clearly the client side or the server side. So if you take that into consideration, as I mentioned before, with regards to RavenDB, what looks to the external user as a JSON document that you can parse with Json.NET, whatever you want, internally it is a bit more complex, but at the same time much, much more efficient to process, index, and store.

I think picking the right level of abstraction is probably the first step to entering the systems programming space.

Jamie : I have so many thoughts and so many questions about how it looks to the end user like JSON, but it actually is a lot more complex, but way easier to deal with. Like, what? I get the feeling I missed a point. There’s something missing there.

Szymon : Yeah, so let’s consider a simple document, because at its core RavenDB is a document database. You may put any JSON of any structure. Imagine that you have a JSON document that represents a person that has an embedded address.

Usually, in regular databases, you would either split this address into multiple columns, or, for example, if a person could have multiple addresses, you would create an additional table called addresses and then you would link to the person record using a foreign key. So far so good?

Jamie : Yep, yep. That’s your standard referential database, right?

Szymon : Yeah, the standard stuff. So let’s go back to this JSON document, because now we are in this document-oriented world. You want to store it, and then you want to issue a query that will query for all the documents that have this address city equal to London, let’s say.

Now, if you think about it, with the JSON, to search for such a document you would need to actively traverse the whole document because you don’t know where this address is, right? You need to traverse, search for a property that has the name of address, and then you can search for the city, and then you can extract the value of the city. So that would be the usual JSON SAX approach.
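The "traverse the whole document" cost Szymon describes can be sketched with System.Text.Json's forward-only Utf8JsonReader. This is purely illustrative, not RavenDB's parser, and it's simplified: it assumes the city property only appears inside the address object.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Forward-only scan for address.city: every token before the target
// must be visited, which is the linear cost of the SAX-style approach.
static string? FindCity(byte[] json)
{
    var reader = new Utf8JsonReader(json);
    bool inAddress = false;
    while (reader.Read())
    {
        if (reader.TokenType != JsonTokenType.PropertyName) continue;

        if (!inAddress && reader.ValueTextEquals("address"))
        {
            inAddress = true; // the following tokens are the address object
        }
        else if (inAddress && reader.ValueTextEquals("city"))
        {
            reader.Read(); // advance to the property's value
            return reader.GetString();
        }
    }
    return null;
}

var doc = Encoding.UTF8.GetBytes(
    "{\"name\":\"Ada\",\"address\":{\"city\":\"London\",\"zip\":\"E1\"}}");
Console.WriteLine(FindCity(doc)); // prints "London"
```

A format that records property offsets up front, which is the idea behind Blittable, skips this scan and jumps straight to the value.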

With Raven, we use a custom special format called Blittable, which allows you to directly jump to the specific property. So if you issue any query, and if you combine that with auto-indexing capabilities, this puts you in a really nice place because, A, there is automatic indexing, and B, there is the capability of accessing the property directly.

Jamie : Does that mean that your indexing becomes the bottleneck? Because if you’re not indexing all of the properties, then you can’t easily jump to the records with that. Then you have to fall back on, I guess, parsing through each document all the way to the end, right?

Szymon : That would be the case if RavenDB required you to create indices. But again, with regards to indexing, RavenDB does the job for you. If you issue a query, it automatically indexes all the data.

So again, you put one thing on top of the other. From the client point of view, you just get the JSON and you can query it across or against an arbitrary condition. Internally, we change how we process and store things so that it is aligned to the needs of the user issuing various requests to the database.

Jamie : Yeah, my mind has just exploded. Databases were never something that I properly studied, so I’m always playing catch-up with how they’re all put together and how they all work. So whenever someone talks databases to me, my brain just explodes anyway. But that’s really cool.

Szymon : Even on the surface, I hope that the premise sounds nice. You can put an arbitrary JSON in there, you can ask for all the documents that match a specific criterion, and then you get them returned.

Jamie : Yeah. The only way that you could actually do that is either to hold off performing the query, I guess, until it has been indexed, or, like you say, just ask, “Has this been indexed?” No. Okay, then we’ll index it now and then we’ll return the response, which then means that the next query would be faster. Presumably. Unless you’re doing some parallel SIMD stuff.

Szymon : Oh yeah. I’m not saying that we are not doing any fancy algorithm to perform such an indexing operation. But in general, in principle, yes, this is the thing. If you consider any system, usually it will be 100, 200, maybe 500 types of queries that you issue. So once a specific shape of a query is observed and remembered to be indexed and to be provided with a special structure that supports such a query, then everything should be really, really fast.

Jamie : Yeah. Almost like it’s learning as you’re querying. That’s really cool. That’s really cool. I mean, that’s not what we’re here to talk about, but that’s really cool.

Okay, well, we’ve talked about that, and maybe we’ll have to come back to that topic another day. We were going to talk about systems programming. I am more than happy to pivot and just sit and listen to you talk about databases and RavenDB, but you know, folks may have clicked on this and gone, “Ooh, systems programming. What’s that? That’s building operating systems, right?” So my question to you is: that’s building operating systems, right?

Szymon : Unfortunately, that’s not the case. Let me elaborate. My way of thinking about how you approach systems programming is that, for example, imagine you are building just a regular application. Then it becomes slower because of some allocations, some logs. Maybe you spend a bit of time profiling it and improving here and there.

You may perceive that as a sliding scale. Now you’ve stopped developing the regular business logic and you need to focus on introducing these performance improvements. As usual, there are degrees to that. You may go with — I know, for example, one of the tricks — you pre-allocate the list, the good old-fashioned generic list, to be of capacity equal to the number of items you will put in there. This pre-allocates the stuff underneath, and you are happy because it doesn’t resize as you add more things to it. So that would be one thing.
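The list pre-allocation trick Szymon mentions looks like this (a small sketch; the numbers are just illustrative):

```csharp
using System;
using System.Collections.Generic;

const int n = 10_000;

// Without a capacity the backing array starts small and doubles as
// items arrive, copying the old array on every resize.
var growing = new List<int>();
for (int i = 0; i < n; i++) growing.Add(i);

// Pre-sizing allocates the backing array once, so no resizes happen.
var preSized = new List<int>(capacity: n);
for (int i = 0; i < n; i++) preSized.Add(i);

Console.WriteLine(growing.Capacity);  // a power of two >= n: 16384
Console.WriteLine(preSized.Capacity); // exactly 10000
```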

You can think of many things such as that. For example, if you’re into the performance area, that would be stackalloc, where you can allocate chunks, spans on top of the stack so that you don’t pollute the heap. You can introduce more and more of such optimisation patterns, and finally you bring it to the point where it’s not that much about the business logic any more. Of course, it still needs to support the business logic, but it’s more about defining resource ownership, knowing how much you can do, how much memory you can spend here and there.
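A minimal sketch of the stackalloc pattern Szymon describes — the buffer lives on the stack, so nothing reaches the heap, but it dies with the method, so it must never be returned or stored:

```csharp
using System;

// Format an int into a stack buffer and sum its digits without
// allocating an intermediate string on the heap.
static int SumOfDigits(int value)
{
    Span<char> buffer = stackalloc char[11]; // enough for any int, sign included
    value.TryFormat(buffer, out int written);
    int sum = 0;
    foreach (char c in buffer[..written])
        if (char.IsDigit(c)) sum += c - '0';
    return sum;
}

Console.WriteLine(SumOfDigits(12345)); // prints 15
```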

So that would be systems programming in my perception: pushing this — how do you call it — a sliding scale towards more and more ownership rather than just thinking, “Oh, that works.”

Jamie : A standard, in bunny quotes, .NET development application would be: “I’m going to say to .NET’s system, allocate all the things for me, do the memory management for me, handle that all for me.” Whereas if I want to think systems programming, I’m probably going to be thinking, “How am I either fully taking ownership of that journey, or how can I use the language features and constructs to help me to better control that?”

For instance, maybe I’m doing something that requires me to load a lot of things into memory. Maybe I’m in a video game engine and I’m loading all the geometry for a stage, for a scene, for a level, for whatever. That’s a lot of data. I probably want to be in control of how that’s being loaded, because I don’t want the garbage collector to fall over quite a lot whilst it’s trying to process my level, my area, my scene.

So we’re saying that systems programming, in one thought, would be: how do I take over the control of that, to take ownership of my loading and unloading? Am I a million miles away, or is that what you’re saying?

Szymon : No, no, definitely 100%. This is one of the buckets. There are more buckets. The other one that we could add on top of what you just said could also be the computation.

So what’s the usual story with .NET? You don’t create threads. There is the thread pool. It’s heavily optimised, so for 99% of cases this is the tool that you will use, because it’s so well optimised. There are all these hill-climbing algorithms and everything; it should be just fine.

But again, as we are pushing this toggle and we are moving into the systems programming area, maybe sometimes you will actually create a thread, something that you don’t do nowadays in .NET, because you know what you are doing and you want to own the specific thread for your own specific purpose. So that would be another area that I would also recall to the space of systems programming.
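One shape that "owning a thread" can take is a long-lived, dedicated consumer — a sketch, not a recommendation over the thread pool for everyday work:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

var queue = new BlockingCollection<string>();
var results = new ConcurrentQueue<string>();

// A dedicated thread that does nothing but drain the queue. Unlike
// thread-pool work items, its lifetime and purpose are fully ours.
var worker = new Thread(() =>
{
    foreach (var item in queue.GetConsumingEnumerable())
        results.Enqueue($"processed {item}");
})
{
    IsBackground = true,
    Name = "dedicated-consumer" // naming the thread helps when profiling
};
worker.Start();

queue.Add("a");
queue.Add("b");
queue.CompleteAdding(); // ends GetConsumingEnumerable
worker.Join();          // wait for the drain to finish

Console.WriteLine(string.Join(", ", results)); // processed a, processed b
```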

Jamie : Okay. I’m going to show my lack of understanding, I guess, of C and C++ and non-managed languages. I’m thinking it’s a little bit more like C and C++, where I have to explicitly say, “Create this thing in this way, give me a pointer to the address” — check me out talking about pointers on a .NET podcast — “a pointer to the address so that I can then release that at some point in the future, but I want you to be able to handle it in this way.” Give me the chainsaw so that I can cut down the tree very quickly, but also so that if I slip, you’re going to call me Lefty from now on.

Szymon : That’s a valid picture to imagine.

Jamie : It’s a violent analogy, I’ll put it that way, but it’s an analogy, right?

Szymon : Definitely. You mentioned the third part which I haven’t mentioned yet, which is unsafe code in C#. So this clearly also comes into the picture when discussing systems programming. Sometimes you will either use refs — meaning now we have this by-ref passing thing and you can obtain a reference to almost everything in .NET. But sometimes you will also use just the good old-fashioned pointers. Again, this falls into, “Yes, I know how to handle that, and I know that I’m responsible for the ownership of that particular thing.”

Jamie : Yeah. You’re unlocking memories of earlier in my career. I did study at university — we did computer science, and the first year was C# and .NET. Then everything after that was, “You can do it in C# and .NET if you want, or we can talk you through pointers and pointer arithmetic and all that kind of stuff, and you could do this in C and you can destroy the machine that you’re using.”

So you’re unlocking memories of me getting things wrong and bringing down my system. I’d somehow got a handle to a pointer to an application I shouldn’t have done, because this was the early 2000s and whilst protected memory was a thing, I’d managed to somehow get a pointer that was not to the object that I was manipulating, but somehow to another application. That’s always fun.

Szymon : Yeah, yeah. Clearly it resembles good old-fashioned C/C++. Again, as we mentioned before, yes, you need to be careful when you play with that. But at the same time, if you want to push a system that is required to perform really, really well — right now I’m talking about a database; it needs to be fast, otherwise it won’t be a good database — there will probably be cases where you will need to take this chainsaw and be mindful of having two limbs at the end. Yes.

Jamie : Yeah. I suppose perhaps for the majority of listeners, this is maybe something they don’t need to always think about. I’m not saying they don’t need to think about it at all, but they don’t always need to think about it, because the majority of the work that I’ve ever done in my career has been: here’s a database, ask it for some data, show the user the data, and now the user will make a change to that data and write that data back. That’s the majority of the work that I’ve done.

I have worked in some regulated industries and some things where folks needed data streamed to them in that moment. If folks want to check it out, you can go through my LinkedIn and you’ll find out which ones.

I guess it’s always: how do I know — and I think we’re jumping around a bit because of me, I do apologise — but how do I know if my application needs the chainsaw? Maybe we shall stop calling it the chainsaw, right? But how do I know whether my application needs me to put the effort into creating a whole bunch of code that no one’s ever going to be able to read? I’m just going to put a whole big bunch of comments at the beginning of this block of code that says, “Don’t ever change this, because it runs really well in production, but when I leave, I’m taking the knowledge with me.”

Szymon : I see. Yeah. So clearly we should apply Pareto here, probably. You can get a lot of benefits just by applying the regular ways of working with .NET, meaning you won’t ever create a global lock that will lock every single request. This is the usual way, and this will bring your application to, let’s say, 90% of either potential or performance.

The question that you need to ask is: do you need to have more? Again, the answer depends on what you are building. For example, with RavenDB, one of the cases is that RavenDB can be provisioned as a cluster in a public cloud. In that case, you have a machine which has a spec, meaning it has, let’s say, 128 gigabytes of RAM, a nice disk and a good CPU. This is what the client pays for, but there is also a database that needs to support everything.

So now the question is: the cost of a database is X on top of the machine — is it beneficial for the user to pay on top of the machine? This is quite a different environment. I mean, the databases, right? Eventually, the client pays for the performance, and if they can go with 128 gigabytes of RAM instead of 256, the machine will be much cheaper.

Jamie : Yeah, no, you’re absolutely right. I think before anyone can put the effort into figuring out whether they need all of this extra effort, they need to know what their limits are, right? If you have a client who has a frankly massive machine, and you are hosting an app that, I don’t know, just writes reports to PDF, then you could probably rely on .NET to do most of the stuff for you, couldn’t you?

You need to think pragmatically, right? If you are consistently running out of RAM on that machine, or you are getting to a point where something isn’t working properly, then yeah, you probably do need to investigate all of the performance enhancements that you can put in place, right?

Szymon : Definitely, 100%. Let’s go back to the example that you just shared, meaning there is a PDF report that is created and there is not enough RAM. What would be the first step? Probably you would run a memory profiler to investigate why the given amount of RAM is not sufficient. That would be the first step.

My way of thinking of it is: okay, my app is now at 80 or 90%. If I investigate a bit, it will be 95%, and then you investigate a bit and push forward to 97%. The question is, are you in this 99% area? Usually you are not, and usually you’re just fine with following the good patterns: not being over-allocatey, not creating one lock for the whole system on that many cores that are provided. Usually this is not the case, that you need to bother yourself with these systems programming topics.

But once you are into the area of bringing 99.9% of cases as fast as possible, and processing data as fast as possible, this is the tooling that probably you will use and bring to solve the issues and to make it as fast as possible.

Jamie : Okay. So what is that tooling? Let’s say I’m writing an app and I’ve listened to this recording, and I’m thinking, “Oh, I know, I’ll do exactly what they said on the Modern .NET Show. I’ll just make my app faster, because that’s all I need to do, right?”

There’s that thing that comes from engineering around “measure twice and then cut once”. Otherwise you’re going to — feels like I’ve been really violent all the way through this recording — but if you are physically chopping a piece of, maybe, some wood or some metal, you need to make sure you’re chopping it to the correct length. Otherwise you’ve got to go buy more metal.

I suppose what I’m getting at is: are we benchmarking to make sure we’re in that 99.99% of cases? Or should we just go, “Do you know what? I want to learn how to do this, and I’m going to do it on production, when I’m being hired by the people who are paying me to build this. I’m just going to waste time and make things — try and make things faster whilst making them unmaintainable”, right?

Szymon : Yeah. You mentioned already the measurement part, and this is really important. The first measurement could actually be someone from the so-called business stating that, “Oh gosh, this is so slow.” That would be the coarse-grained measurement that you can sometimes receive for free. This is a nice input, in regards to the fact that, okay, I know that this is so sluggish that someone is perceiving it as terribly slow.

So that would be the point where you can enter and try to first understand what’s the particular case that they are talking about, and then profile. This is the usual first step, where you either profile memory or profile performance. There are so many great tools out there to profile. Also, .NET itself has performance counters and all that stuff. So this is the first thing.

If we are talking about a regular business application, that could be the thing. If you are in a public cloud environment, sometimes the bill will tell you that, “Oh, this requires some optimisations”, right? Because, for example, you pay X thousand dollars a month for that particular service, and maybe it’s just worth spending an hour or two to try to understand it. So that would be the first step of measurements.

But this is a generic step. If we want to push it a bit further, as I think we are doing during this whole podcast — we are pushing it a bit further — we would move into the benchmarking territory. The best tool out there, that is even used by the .NET team themselves, is BenchmarkDotNet. This is a micro-benchmarking tool. So this is the systems programming area, where you measure and verify that a small chunk of code can deliver sustainable results depending on the data.

Jamie : Okay. So we go ahead and we get benchmarking on it. We do that micro-benchmarking. So I’m looking at maybe a single method? Or am I thinking a unit? Am I thinking an integration?

What I’m thinking is, I don’t want to just fire up BenchmarkDotNet and tell it to start my app at composition root at Program.cs, then perform a bunch of actions and say, “Cool, give me all of the stats”, right? Is that going to be useful? Or am I making an assumption there that it wouldn’t be useful to benchmark the entire application suite, versus one very specific portion of it? Or am I thinking that I shouldn’t be benchmarking a small portion of it; I should be benchmarking the whole thing? I feel like there’s perhaps an either/or to this, right?

Szymon : Yeah, yeah. Let’s go with this PDF report creation example that you came up with. Let’s imagine that we just profiled it. We ran either dotTrace or some other tool from JetBrains, or some other available tool. Then you notice that there is a particularly small method that is responsible for 80% of this report creation. This is usually the case: you will find that there is a small chunk of code responsible for a ridiculously big piece of the pie of the execution time.

Let’s say this is maybe five, ten, twenty lines of code. That could be something that, if you are into amending the PDF creator tool, you could wrap up in a micro-benchmark, which would execute this particular small chunk of code.

If I may jump and spend one more minute on micro-benchmarking with BenchmarkDotNet: the thing with this tool is that it’s amazingly written. The reason for me labelling it that way is that it performs all the statistical work for you. It will warm up the code, meaning it will run it tens or even thousands of times before actually running the benchmark. Then it will repeatedly run the benchmark, and it will gather all the statistics, like the mean time, standard deviation, allocations, and that stuff.

So now, again, usually if you are not into the land of building a library or a database, a core part of the system, you would say, “Okay, I’m good with profiling.” Usually this is the case. You just profile, you notice some behaviour, you augment it a bit. But if you want to have a repeatable set of benchmarks that are run, for example, with every release, and you are delivering a library, a database, a core service, and you have some algorithmic work — or you basically need a small piece of code to run as fast as possible — probably you would enter the micro-benchmarking world, where the best tool that I use is BenchmarkDotNet.
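A minimal BenchmarkDotNet suite of the kind Szymon describes might look like this — the benchmark bodies here (pre-sized versus growing lists) are illustrative, not from the episode:

```csharp
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// BenchmarkDotNet handles warm-up, repeated runs and the statistics;
// you only describe the code under test.
[MemoryDiagnoser] // adds allocation columns to the results table
public class ListBenchmarks
{
    [Params(1_000, 100_000)] // each benchmark is measured per value
    public int N;

    [Benchmark(Baseline = true)]
    public List<int> Growing()
    {
        var list = new List<int>();
        for (int i = 0; i < N; i++) list.Add(i);
        return list;
    }

    [Benchmark]
    public List<int> PreSized()
    {
        var list = new List<int>(N);
        for (int i = 0; i < N; i++) list.Add(i);
        return list;
    }
}

public static class Program
{
    // Run in Release mode: dotnet run -c Release
    public static void Main() => BenchmarkRunner.Run<ListBenchmarks>();
}
```

Because the suite is just a class, it can be re-run on every release to catch regressions, which is the workflow Szymon describes the .NET team using.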

Jamie : So we’ve got my piece of code, maybe my PDF generation code, or PDF editing code, or whatever. I’ve done some dotTrace stuff with it. I’ve looked at BenchmarkDotNet. I’ve seen that the results show me that my application is first warmed up; we’ve run it a bunch of times just to make sure that the jitter has done what it needs to do, because in modern .NET land the just-in-time compiler will do some stuff for you: pre-compiling, recompiling, working with the hot paths, figuring out all that kind of stuff.

I’ve got my results, and now I’m looking at a couple of tables of results. I’m looking at maybe allocations. I’m maybe looking at the amount of time it takes to run a certain process. I’m looking at maybe my data sizes — maybe I can drop those.

What am I looking at, and how am I going about this — almost going to have to be violent again, I’m sorry folks — it’s almost like I want to be surgical with how I’m going to be making these changes, right? I don’t want to just go, “I will make everything 1000% faster by rewriting the entire application.” I probably want to focus on a very specific small area first, right? It’s not going to work if I say, “I’m going to tear down my house and rebuild it and everything will be better”, right? I want to look at: what’s the one smallest thing? Or am I looking at tearing down the house? Am I making a wild assumption here? Which is the right way?

Szymon : Yeah. So again, the usual way that works in 95% of cases would be: as you enter your profiling session, probably you will find the bottleneck and you will just fix it. But again, if we are entering this library/database land, you want to, A, be able to narrow it down, so that as you work with this small snippet of code — twenty lines, maybe a hundred lines — you amend it, you want to be able to repeatedly re-benchmark it in an isolated way. We will provide the same data, the same input. It can be like 10 or 20 cases, and you will rerun it using BenchmarkDotNet.

This is one thing. This allows you to perform — again, I try to rhyme to what you said — this surgery using proper tooling, and then you verify that the time, or the allocations, whatever, is getting lower. So effectively, you can verify and prove with the benchmarking that you’re moving in a good direction. This is one thing.

But there is also another thing. Imagine that your system or your library — that is, again, a PDF creator tool — is six months later and you introduce some changes. How do you make sure that there is no regression? What you can do, using the very same benchmarks that you created, which are as isolated as they can be — think of them as maybe of the unit test kind of shape and size — you can rerun them again.

Funny story is that actually the .NET team, they use BenchmarkDotNet in the very same way as I described. Every single release, there is a huge set of micro-benchmarks that are rerun, that verify there is no regression between the versions. You can use that to confirm that, okay, maybe we changed something here and there, but we are moving in the right direction. At least we didn’t regress.

Jamie : That makes perfect sense, right? To be able to say, “We’ve made this change; it is the regression test”, right? You don’t want to somehow make a change and then suddenly .NET is maybe four thousand times slower than it should be.

Szymon : Yeah, yeah. The nice thing is that with such a tool, you can have two things. One is that you can work with it when you try to optimise, when you try to squeeze it so that it’s so much faster — you can re-benchmark it. But then once you are done, in the long term, you can verify that there are no performance regressions there, just by rerunning the very same suite that you were using when working to make it faster.

Jamie : Obviously we’re from different development worlds here, but the regression tests that I write are: “What if someone changes the way this works and we get a slightly different answer out the other side?” Whereas, you know, both yourself and your colleagues at RavenDB, and the .NET team, have wildly different requirements, right? You all have to make sure that, if you make a change, it’s a positive change for the user and not a negative one. Does the API get more difficult to use? Does the API get slower? I’ve never had to think about that.

Szymon : Yes, but I think we have the same common denominator, which is: a regression is a regression. Maybe the criteria are a bit different, but I think we agree that making things worse is not what we want to do.

Jamie : Yeah, no, I agree. I agree. So let’s say we’re in this world where I have this service. I’ve run my benchmarks, and I’ve got my performance regression tests that I can run. I’m actually looking at the numbers, I’m not that happy, and I want to make some changes.

Do I just throw ref in front of all the parameters for all of my methods, so that I’m never doing any extra allocations? Or is it a case of — how do I go from “I have this report” to “I want to take ownership of the system, and the allocations, and all that kind of stuff”? I feel like that’s too big of a question to answer on a podcast, but people listening will be like, “How do I do it?”

Szymon : Sorry for being so pushy, but I want to push back. Usually the case is that, with the benchmark stuff, if you are not into the library or the database area, profiling is sufficient to remove all the bottlenecks. By profiling the specific case, you can address that, and usually you don’t think that much about micro-benchmarking.

Once you move into the more compute-heavy work — either a library or, again, a database — that is where the micro-benchmarking comes in. I’m not trying to discourage you, like, “Never run any BenchmarkDotNet, never create a project with that.” If that is the case, it could be just a few core functionalities that you cover with it to verify that — okay, this is, again, a PDF creator library, so we want to confirm that, for example, flushing to a file is done in a nice and fast manner. That is what you would cover.

Or, for example, you want to issue a PR on the .NET repo, meaning the runtime repo. This is where you would use micro-benchmarking. But for the majority of the cases, profiling is just sufficient.

Even within RavenDB, sometimes the very first step is, okay, you want to discover what is actually slow, or what makes it slow, or where do you want to spend time, or invest your time, in which particular area. So that later, as you dig deeper and you narrow it down to this particular algorithm or compute-related part, then you will unleash the micro-benchmarking with BenchmarkDotNet and try to sprinkle it with meaningful benchmarks.

Jamie : Right, okay. That makes sense, right? Thinking pragmatically, my forms-over-data app, my GUI app that’s just going to load me some stuff and send an email, doesn’t necessarily need the micro-benchmarking. But yeah, if I’m building a system application — I’m probably using the wrong words there — if I’m building a database like you all are building, I definitely want to be micro-benchmarking. I want to see where those allocations are happening. I want to see why those allocations are happening.

Why am I allocating and creating new objects or new data, rather than passing things by ref where I can? Or looking at — I think in our prep work you’d written “borrowed data” or “borrowed references”, which is a fantastic way of putting things like refs and spans. Rather than saying, “I will create my own instance of an object, then I will copy the data in, operate on that object, and then tell the calling method, ‘I’ve changed this object, so here’s a copy of the data’” — so we’re copying it twice, right? We don’t need to do that.
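What Jamie describes — borrowing a view over existing data instead of copy-in/copy-out — looks like this in C# (an editorial sketch with illustrative values; `stackalloc` is included since it came up in the episode):

```csharp
using System;

// "Borrowed" data in practice: a Span<T> is a view over memory someone
// else owns, so slicing it and passing it around copies nothing.
int[] data = { 1, 2, 3, 4, 5, 6, 7, 8 };

// Borrow the middle four elements - no allocation, no copy.
Span<int> middle = data.AsSpan(2, 4);

// Writing through the borrowed view mutates the original in place,
// instead of copy-in / mutate / copy-out (copying twice).
for (int i = 0; i < middle.Length; i++)
    middle[i] *= 10;

// stackalloc gives a small scratch buffer on the stack: no GC heap
// allocation, reclaimed automatically when the scope ends.
Span<int> scratch = stackalloc int[4];
middle.CopyTo(scratch);

Console.WriteLine(string.Join(",", data)); // 1,2,30,40,50,60,7,8
```

The caller sees the mutation directly through the borrowed span, so there is no second copy to hand back.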

Szymon : Definitely, definitely, yeah. As you were mentioning that, there’s one thing that I haven’t covered, I think, which makes the point that there are overlapping concerns between the performance land and the regular business app land: the saturation point.

Imagine that you design an online shop that needs to survive Black Friday. The question you can ask is: how much RAM, how many disks, how many CPUs are needed to survive? This is a separate story, where you are given, for example, a set of requirements with regards to CPU, RAM size, and, again, disks, and you want to investigate how much you can squeeze out of them.

In the modern era, this could be that, okay, I’m willing to pay, because the public cloud will provide me with all the resources that I need, and you can basically pay for the Black Friday. It will be a huge bill, but you can pay for it. But still, you need to make sure that your system is designed in a way that, even if you are willing to pay a lot, it can scale and support the traffic.

So this is the thing, like, searching for a saturation point, that I think is similar in both cases. Again, maybe a regular business application won’t be suffering from these spikes. Maybe this is the case. But sometimes it’s worth knowing, okay, I can load it up to that specific point and then it breaks, so that you know that on the Black Friday scenario, it won’t fail you.

This is totally separate from what we were discussing, because it’s not about profiling the application, it’s not about micro-benchmarking, but actually answering the question: how many requests per second can it support up to the breaking point?

The breaking point is sometimes described as the elbow or the knee. If you imagine a shape of a chart where it goes — initially it’s like a line, so it’s linear, but then it goes straight up — there is a point where there is the elbow thing or the knee. This means, basically, “Okay, I’m giving up. There is so much happening, I cannot push it further.” This is the line that sometimes it’s worth knowing, regardless of whether this is a database or a regular business application.
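The curve Szymon describes can be sketched with a bit of textbook queueing theory — this formula is not from the episode, just the standard M/M/1 average-latency relation, which produces exactly that flat-then-vertical “elbow” shape:

```csharp
using System;

// M/M/1 queueing: averageLatency = 1 / (serviceRate - arrivalRate).
// Latency is nearly flat at low load, then explodes as arrivals
// approach the service rate - the "elbow" or "knee" on the chart.
double serviceRate = 100.0; // requests/second the system can sustain

foreach (double load in new[] { 10.0, 50.0, 90.0, 99.0 })
{
    double latencyMs = 1000.0 / (serviceRate - load);
    Console.WriteLine($"{load,5} req/s -> {latencyMs,7:F1} ms");
}
// 10 req/s is ~11 ms; 99 req/s is 1000 ms - the knee is near 100.
```

Load-testing tools like JMeter find this point empirically rather than from a formula, but the shape is the same.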

Jamie : A hundred percent agree with you there. Knowing what the average usage will look like, and what your projected — like you said, your saturation point — might look like, is very, very important.

I do wonder whether some folks lean first towards “What is our saturation point?” and then dial back from that. There is a line in The Pragmatic Programmer, I think the 20th anniversary edition, where one of the authors says, “If you want to get Netflix scale, go get Netflix customers and deal with Netflix scale.” Which is fair enough.

But if you are Amazon, if you are a database provider like yourselves, a database vendor, then you need to know what that saturation point will look like ahead of time. Like, what’s our maximum throughput, right?

But I suppose for the standard — I don’t know what the right word would be, right, I’m using the bunny quotes, folks — the standard business application, the standard forms-over-data, standard, you know, “place an order and we’ll get it to you soon, not immediately” app, it is definitely worth knowing what that saturation point is. But perhaps not focusing on that from the beginning might be a good way to think about it.

But if you are Amazon, if you are RavenDB, if you are Netflix and it’s coming up to the holidays, or Netflix and you’re about to have a live sports event, or you know, the big premiere of some new movie, you probably want to know what that saturation point is, right?

Szymon : Yeah, I agree. I would add that, even if you’re creating an internal application, let’s say for a bank, and they have one thousand, two thousand employees — let’s imagine that this is, like, a paid time off application that folks can register their PTO on, basically, where and when they want to take it. I can imagine that January the first could be so spiky, because everyone is rushing in and they try to — it’s nine AM and everyone tries to plan their whole year.

I’m not saying this is always the case. What I’m trying to say is, even if you get a rough estimate from the business, right — that, for example, it will be one thousand requests per second, or one thousand requests per second for one hour a day — that is a rough and useful estimate that you can use for your testing purposes.

Of course, you may come up with patterns like using queues, right? So if you have a persistent queue — and now we are moving into a bit of architecture land — if you have a persistent queue such as RabbitMQ, in the cloud, for example, Azure Service Bus, then a queue can be used to amortise the influx of requests. As you mentioned, for example, “We don’t care” — I think you said something like that — “we don’t care whether the order will be fulfilled now or in five minutes."

But I would argue that, okay, if you know where to put a queue, you are already in a good place. If you measure, for example, your system and you see, “Oh, this is the bottleneck”, then you apply some architectural pattern. This is also a valid case. You probed for the saturation point, you found it, and now you apply something that can solve it.
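The queue-as-shock-absorber idea can be sketched in-process with `System.Threading.Channels` — an editorial stand-in here for the persistent queues Szymon names (RabbitMQ, Azure Service Bus), which is what you would actually use across services:

```csharp
using System.Threading.Channels;

// A bounded channel absorbs a burst: producers enqueue quickly, the
// consumer drains at its own pace, and the bound applies backpressure
// if the burst outruns the buffer.
var queue = Channel.CreateBounded<int>(capacity: 100);

// Producer: a Black Friday-style burst of 50 orders arrives at once.
for (int i = 0; i < 50; i++)
    await queue.Writer.WriteAsync(i);
queue.Writer.Complete();

// Consumer: fulfils orders "now or in five minutes" - it doesn't matter.
int fulfilled = 0;
await foreach (int order in queue.Reader.ReadAllAsync())
    fulfilled++;

Console.WriteLine($"fulfilled {fulfilled} orders");
```

The spike is amortised because acceptance (enqueue) is decoupled from fulfilment (dequeue).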

Jamie : Yeah. That makes sense, right? Like you said, even if it’s a line-of-business application, there is going to be a saturation point at some point, and knowing what that is will help you with your design and let you work around it.


You know that moment when a technical concept finally clicks? That's what we're all about here at The Modern .NET Show.

We can stay independent thanks to listeners like you. If you've learned something valuable from the show, please consider joining our Patreon or BuyMeACoffee. You'll find links in the show notes.

We're a listener supported and (at times) ad supported production. So every bit of support that you can give makes a difference.

Thank you.


Jamie : I realise we’re rapidly running out of time, and I feel like we’re only just scratching the surface of what folks can take from this. I know that we did some episode planning ahead of time, and you very kindly wrote out a whole bunch of things that we could cover, and I feel like we’ve covered maybe 10%. Again, eighty/twenty, Pareto. Exactly, right, exactly.

So what I’d like to do is talk about some of the tools that you’ve mentioned so far, so that folks can go and check those out. And maybe whether there are other patterns for performance measurement and benchmarking beyond the ones we’ve already covered — where, if you feel like you need to, you say, “Let’s go do micro-benchmarking, but let’s start with macro first and drill down.”

Could you go through and just mention a couple of tools that you recommend, like dotTrace and BenchmarkDotNet? I feel like I’ve just done the job for you, but you’re the expert here, right? Then we’ll talk about, maybe, if there’s time, some things to be aware of going into this. I feel like benchmarking is hard, right? It’s not easy to do, because you really need to understand what you’re looking at. Then, maybe, if you have any resources folks can think about looking at. How does that sound? I realise I asked about a million questions there.

Szymon : Yeah, definitely. So let’s go back and let’s try the stack-based approach. The very first thing — and I think it’s both applicable to a regular business application and to systems programming as well — is that whenever you notice something is wrong, meaning, “Okay, this is getting lengthy or too long”, or your business person states that this is taking too much time, profiling is the first tool.

Again, my preferred one is dotTrace from JetBrains, which allows you to take a look at what is actually slow. This allows you to narrow down your search to the area that is actually responsible for that. There are many tools, like flame graphs, etc., but basically profiling allows you to understand where the problem is. The other related one is dotMemory, which allows you to understand how things are allocated.

Profiling is the first step. If you have any issue, that should be the place to look to understand what is slow or what allocates the most in your application. Then the usual thing is to fix it. Believe me, I’ve had cases where it was just one profiling session, and then it was, “Oh, that’s it; how stupid I was.” But sometimes you will spend a bit more time.

This is usually the main thing that can help you with day-to-day work to make things faster, up to the speed that is acceptable. If a particular piece of code is really, really used a lot — this is a really hot path — you can put that into a micro-benchmarking lab using BenchmarkDotNet, where you will create a short snippet of code using this particular call site, or a class or whatever, or a method. Then you can micro-benchmark it, try to amend it, micro-benchmark it again, and you will follow this improvement loop. Then with every single release you can micro-benchmark it again.

But this usually is not needed for a regular business application. If you’re writing a library, creating a database, or working on infrastructure-related code, you might use micro-benchmarking.

We mentioned the saturation point, but we haven’t mentioned the tools. Things like JMeter and a few others — I can’t remember their names — will help you load the system up to a certain point so you can observe how fast it is. Those are the three kinds of things you can use to measure.

Once you measure, and once you understand that, okay, this is a particular scenario, this is a particular place that I need to amend, only then would I invest time in making things faster. Usually it’s not the case that you need to sprinkle it with unsafe and pointers and blow everything up and pray you don’t break anything. No, no, no.

Usually, there are small things, like keeping a lock short, or using a ConcurrentDictionary in a proper way — not accessing its Keys property, for example, which takes locks, and that kind of stuff. You will see that from the profiling. Then if you have this small piece that is really, really hot and you want to make it so much faster, you can push it forward with all the tools that we were discussing: taking ownership of the objects, going unsafe, using stackalloc, and then maybe creating a thread — but that is very unlikely.
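The `ConcurrentDictionary` point is worth making concrete. In the BCL, `GetOrAdd` hits and plain enumeration are lock-free reads, while the `Keys`, `Values`, and `Count` members snapshot the collection and acquire all of its internal locks (the values below are illustrative):

```csharp
using System;
using System.Collections.Concurrent;

var cache = new ConcurrentDictionary<string, int>();

// GetOrAdd: on a hit the value factory never runs and no lock is taken;
// only a miss pays for the add.
int first  = cache.GetOrAdd("answer", _ => 42);
int second = cache.GetOrAdd("answer", _ => -1); // hit: still 42

// Hot-path friendly: enumerating the dictionary itself is lock-free
// (a moment-in-time view, not a locked snapshot).
int sum = 0;
foreach (var pair in cache)
    sum += pair.Value;

// Hot-path hostile: Keys copies every key into a new collection while
// holding all internal locks. Fine occasionally, costly in a loop.
var snapshot = cache.Keys;

Console.WriteLine($"{first} {second} {sum} {snapshot.Count}"); // 42 42 42 1
```

This is exactly the kind of small fix a profiling session tends to surface before any unsafe code is warranted.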

So, in summary, I went from the tools you will use with the highest likelihood, and we ended where we started, actually: by defining systems programming, and the fact that, if you are working on a regular business application, it’s quite unlikely you will use these tools more than every so often.

Jamie : It’s interesting to me, just as a little side note. While we’ve been talking today, it has reminded me of a conversation that I had with one of the developers on the Ryujinx team. This is a now-defunct Nintendo Switch emulator.

They were saying that — I think it was in the .NET 5 timeline. No, .NET 3 something. Yeah, the .NET 3 timeline — they noticed a massive uptick in performance. Because what they were finding was that, up until and including .NET Core 2, when they were creating new classes and such, the compiler would automatically allocate everything in the class.

Which is great if you’re doing business line applications. It’s dreadful if you’re doing games or any kind of hardware emulation, because you probably don’t want to spend cycles pre-allocating everything, because you probably want to get into the method and just have everything set to null because you are about to allocate with the values that you want.

So it feels a little bit like some of your advice is saying go the other way. What the developer from Ryujinx was saying was that, in around the .NET Core 3.1, I think, timeframe, there was this ability to control that allocation at a compiler level, which helped them, because by not allocating, it meant that their application was faster.

But for most people, pre-allocating means that your application is faster. Like the example you gave right at the beginning: if you know how big the list that you’re about to put something into is going to be, pre-allocate. If you’re querying a database, for instance, or a data store, and you’re saying, “I want 10 items”, pre-allocate and say to the runtime, “Create me a list of 10 things and give me enough space for them now”, rather than saying, “Give me space for a list” and then adding ten items to it. Because each time the list runs out of capacity, it reallocates its backing storage, right? You’re wasting CPU time.
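Jamie’s pre-allocation example in code: a `List<T>` grows by doubling its backing array (4, 8, 16, …), so sizing it up front avoids the intermediate allocations and copies (the count of 10 is illustrative):

```csharp
using System;
using System.Collections.Generic;

const int expected = 10;

// Default-constructed list: the backing array grows 4 -> 8 -> 16 as
// items arrive, copying the contents each time it doubles.
var grown = new List<int>();

// Pre-sized list: one backing array up front, no regrowth, no copying.
var presized = new List<int>(expected);

for (int i = 0; i < expected; i++)
{
    grown.Add(i);
    presized.Add(i);
}

Console.WriteLine(presized.Capacity); // 10 - exactly what we asked for
Console.WriteLine(grown.Capacity);    // 16 - overshot 10 while doubling
```

Same ten items either way; the pre-sized list just never pays for the intermediate arrays.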

But it’s interesting to me that the advice, depending on what you’re doing, seems to be the opposite. Or maybe I’ve misunderstood.

Szymon : Oh, I mean, clearly, every single piece of advice is context-dependent. Oh absolutely, yeah, so you need to be mindful about that. But I think that, as we were discussing, like the probability of using any given tool — if you observe your application with a profiler, and then having data, real data, you will reason about, “Oh, what is actually going on?” The likelihood of going wrong is really, really small, because you observe the actual application as it works. Then you need to reason about the outcome of it.

Jamie : Mmm. That’s where “measure twice, cut once” comes in, right? If you have these regression benchmark tests, you can make that change, run your regression — because, folks, tests are not just for when you push to CI/CD, right? You can run, and you should run, the tests every time you make a change to the code base. Maybe not every single test ever, but the unit tests for the bits that you are changing.

Szymon : One hundred percent. I can’t tell you how many times I thought I was so smart — “Let me make this change” — and then, after rerunning a benchmark, it wasn’t the case. So again, measure twice, as you said.

Jamie : Excellent. That goes hand in hand with new SDKs and runtimes. I feel like we’re going to hit a point where we can’t carry on talking again, and we’ve completely run out of time. But there is a point where the new SDK and runtime can affect how your application runs, both in a positive and negative way.

So, again, that is where those regression tests need to be. “Oh, there’s a new version of .NET coming out”, you know, like right now as we record, .NET 11 Preview 1 was released only a few days ago. Now, you probably don’t want to run anything in production with Preview 1, but you could run those benchmark tests for your application when compiled with the .NET 11 compiler and running on the .NET 11 runtime, and see whether, at the moment, right now as I run it, would we be better? Like, would .NET 11 make our application better, whatever that better statistic is?

So you can actually start planning that migration. It could be that migrating to the next iteration of the runtime and the SDK is not great for performance, but is great for security. Then you make that judgement call, right? Or am I barking up the wrong tree again?

Szymon : Definitely. We had a few cases — and that was the topic of one of my presentations — that just switching the framework to the latest brings such a dramatic change in the behaviour. Usually it’s good. But again, if you have such a suite of benchmarks and you can rerun it with the new one, you can verify that the hot paths that you were optimising so heavily will be properly handled with the new framework as well.

Jamie : Absolutely. Absolutely. So what we’re saying, folks, is: go do some benchmarking, create your regression tests, and just leave it at that. No, don’t do that. Do all the things, or none of them. It depends. Profile first. That would be my rule. Yes. Yes. Get an idea of how your application runs before you start saying, “Oh, I need to make it faster.” Why do you need to make it faster? Because I need to make it faster.

I’ve had an absolute blast talking to you today about stuff that makes my brain hurt. I just wonder, is there any way you want to send people to learn more about you, or learn more about RavenDB, or learn more about profiling and benchmarking? What are some of the places you want listeners to go to, to learn more?

Szymon : Yeah, so if you search for Scooletz, which is S-C-O-O-L-E-T-Z, you will find my blog, Twitter, GitHub, all publicly accessible. I write a bit. I also provide presentations, slides, etc. So you can dive a bit deeper.

With regards to RavenDB, if you’d like to take a look at it and play with it on your dev machine, just hit ravendb.net, find the download button and give it a try. It’s a database written 100% in .NET, besides the small JavaScript engine that is embedded in it. But it’s a full .NET DB.

Jamie : Amazing. Well, thank you ever so much for chatting with me today. I feel like my head’s about to fall off, is what it is. My brain is about to explode, but that’s fine. That means I’ve learned something, right?

Szymon : Jamie, thank you for having this conversation. It was a pure pleasure.

Jamie : It was an absolute pleasure having you. Thank you ever so much. Take care.

Wrapping Up

Thank you for listening to this episode of The Modern .NET Show with me, Jamie Taylor. I’d like to thank this episode’s guest for graciously sharing their time, expertise, and knowledge.

Be sure to check out the show notes for a bunch of links to some of the stuff that we covered, and full transcription of the interview. The show notes, as always, can be found at the podcast's website, and there will be a link directly to them in your podcatcher.

And don’t forget to spread the word, leave a rating or review on your podcatcher of choice—head over to dotnetcore.show/review for ways to do that—reach out via our contact page, or join our discord server at dotnetcore.show/discord—all of which are linked in the show notes.

But above all, I hope you have a fantastic rest of your day, and I hope that I’ll see you again, next time for more .NET goodness.

I will see you again real soon. See you later folks.
