The Modern .NET Show

Episode 111 - RavenDB with Oren Eini


Supporting The Show

If this episode was interesting or useful to you, please consider supporting the show.

Episode Transcription

Hello everyone and welcome to THE .NET Core Podcast. An award-winning podcast where we reach into the core of the .NET technology stack and, with the help of the .NET community, present you with the information that you need in order to grok the many moving parts of one of the biggest cross-platform, multi-application frameworks on the planet.

I am your host, Jamie “GaProgMan” Taylor. In this episode, I talked with Oren Eini about RavenDB, why he took the time to create his own NoSQL database engine, the fact that he built it using .NET Core before it was released (back in the pre-1.0 days, when it was known as DNX), and some of the optimisation stories from his work on RavenDB.

Along the way, we cover what the GC (or garbage collector) is, performance issues to look out for when dealing with large JSON objects, and some tips that he has for those who want to optimise their applications.

So let’s sit back, open up a terminal, type in dotnet new podcast and let the show begin.

The following is a machine transcription, as such there may be subtle errors. If you would like to help to fix this transcription, please see this GitHub repository

Jamie

So thank you ever so much for joining us, Oren. I’m really excited about this, because I have no idea what RavenDB is. And if I’m honest, my entire experience with databases has been SQL and SQL-like, and I know from our discussions before we started that RavenDB is NoSQL, I believe. So this is all exciting to me. Right, let’s talk about this. But I guess, firstly, welcome to the show.

Oren

Thank you for having me. Really happy to be here.

Jamie

Brilliant. I’m glad to hear that. I’m glad to hear that. I wouldn’t ever want to have someone on the show who didn’t want to be on the show.

Oren

Mandatory podcast training.

Jamie

That’s exactly it. Mandatory podcasting. I like it. We need to make that a thing.

Oren

I do worry where you would get a draft from.

Jamie

Can you imagine? You’ve been arrested for something, and you’ve been sentenced to four years of mandatory podcasting.

Oren

The question is what topic it would be. I can talk for years on technical topics, but try to imagine being given four years of mandatory podcasting about eating.

Jamie

Yes.

Oren

So leaving aside dystopian futures…

Jamie

Yeah. So I guess before we start the conversation proper, Oren, I was wondering: can you give our listeners a bit of a quick, sort of, elevator pitch about you and the kind of work that you do? Is that alright?

Oren

Yep. So I’m Oren. I’m the CEO of a company called Hibernating Rhinos, and our primary product is RavenDB. I’ve been a Microsoft MVP since about 2007, or thereabouts, and I’m heavily involved in the open source space, in the Microsoft world and in general. And for the past twelve to fourteen years or so, I’ve been working on RavenDB.

And that started because I was working on the NHibernate project, which is an Object Relational Mapper. I worked with data all day long, every day, all the time. And you mentioned that when you think about databases, you think, “oh, there is SQL Server or Postgres,” those sorts of things. For a very long time, that was what it meant to have a database, and everything else was maybe some one-off solution or something like that. What happened to me was that, at some point, I got pulled into consulting gigs, “come help us make our system better,” and all sorts of things. I did that for one client after another, and I kept seeing the same set of issues over and over again. I was dealing with databases all the time, so I saw the same problems, and I got sick of it. And I started dreaming: what would happen if I could do something better? What if there was a better kind of database?

That was around 2007-2008, and that is the time when we saw the big move away from SQL databases, relational databases. If you’re familiar with some of the names, some of them are no longer around: there was Project Voldemort, there was Hypertable. And some of them are still here: CouchDB, Mongo. There is another one that I cannot recall right now. All of them were, okay, NoSQL databases, but they were really tools for a very specific purpose. And one of the primary issues that they had was that they gave up a lot in order to do what they wanted, and one of the things that they gave up was transactions.

Now, if you talk to a developer and you tell them, “how can you live without transactions?” they might say, “oh, I can do this, I can do that, all sorts of things,” and that will work. But the problem here is that once you start to actually live in a world without transactions, the amount of complexity you have to deal with just explodes. So people said, “oh, working with a relational database is complex, so let’s move to a non-relational database,” but now you have lost transactions, and that is not a trade-off I think is beneficial to anyone. So I wanted to have a transactional, non-relational database.

And by non-relational, I mean: think about the typical mode of operation, how you design a system. “Oh, I have a table. If this is a complex entity, I have multiple tables that compose this entity,” and stuff like that. And that works. But the problem here is that the more complex your system, the more tables you have, and the more spread out your data becomes. So you want a different way to model the data. The industry more or less settled on the notion of the JSON document as a good way to interchange data and represent object graphs, stuff like that. So you have JSON over the wire and in your API, but a relational database that is a wildly different shape in the back end, and that caused no end of problems. So I set out to build RavenDB as a transactional database that can natively work with JSON. At the time, I had almost no idea how to build databases. And when I say that, I mean that I had built only three or four at that point; that’s like saying that I had built FizzBuzz, or hello world, in terms of databases, because they used existing engines and capabilities to do it.

And it didn’t take a lot of time to figure out that this solved an actual, concrete need that people had: they would be able to simplify their systems significantly. After I had the core project, we started asking, “oh, what other problems do people have that we can help solve, and solve in a relatively unique manner?” For example, if you want the database to be fast, you have to use indexing; you have to be able to store the data in such a way that the database has an easy way to access it. That’s what indexes actually do. And the problem here is that, for almost all databases, you have to tell the database how you want to index the data. If your data access pattern changes, well, so do your indexes. And that can be an incredibly expensive thing to do in terms of manpower and the amount of attention you have to provide.

So one of the things that we figured out was: “hey, if I’m the person writing the database, the database has the knowledge, it knows what indexes you need. So let’s go ahead and create them automatically, on the fly.” But if you’re used to relational databases you go, “no, no, you’re never touching the indexes in production. That’s the sort of thing that, if you do, the system will fall over.” But again, you get to design the system so that it won’t fall over in this situation. And then, basically, the ball started rolling, and I’ve been working on that non-stop for quite some time. I think that next year will mark 15 years since the first commit for the project. And that’s basically me, in a very short amount of detail.
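For readers who want to see what that looks like in practice, here is a rough sketch of storing and querying a document with the RavenDB .NET client, written from memory rather than taken from the episode; the Order class, the server URL and the database name are illustrative. The query at the end is the kind of operation that causes RavenDB to create an index automatically if none covers it.

```csharp
using System.Linq;
using Raven.Client.Documents;

using var store = new DocumentStore
{
    Urls = new[] { "http://localhost:8080" },   // illustrative URL
    Database = "Demo"                           // illustrative database name
};
store.Initialize();

// Store a JSON document; there is no table schema to define up front.
using (var session = store.OpenSession())
{
    session.Store(new Order { Customer = "customers/1", Total = 42m });
    session.SaveChanges();
}

// Query on a property; if no index covers this query,
// RavenDB creates one automatically, on the fly.
using (var session = store.OpenSession())
{
    var bigOrders = session.Query<Order>()
                           .Where(o => o.Total > 10m)
                           .ToList();
}

public class Order
{
    public string Id { get; set; }
    public string Customer { get; set; }
    public decimal Total { get; set; }
}
```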

Jamie

I like it. Sometimes a person’s story is rather encapsulated within the story of the thing that they create, right? So, you know, if somebody says to me, “Hey, Jamie, what do you do, then?” I can’t really say, “Well, I’m a computer programmer,” and stop. I have to talk about all the other things that are related, right?

Oren

Yeah. And the process of writing this database has subsumed a significant portion of my life for the past decade or so. The funny thing is that I remember not wanting to write a database; writing a database is a huge task. I had no idea how big it actually was at the time, but I still knew it was big, so I didn’t want to do it. And it kept buzzing in my head, and it wouldn’t leave me alone. I remember waking up in the middle of the night staring at the ceiling and, have you seen Minority Report, where they drag and drop things in mid-air? I had that: imaginary shapes on the ceiling showing the flow of data between documents and indexes. And I just could not not do it; it was consuming every waking moment and, apparently, every sleeping one. So I started writing it. I’m a .NET developer by trade; I’ve been working on the .NET platform since the 1.0 version, the alpha version, actually, so it was natural for me to go and use that. And I credit the fact that I started to write it in .NET with letting me skip over a huge number of hurdles that I would otherwise have had to deal with.

I’m currently in the process of writing a blog post series showing how you can write high-performance .NET applications. I’ve chosen to write a Redis clone: it has two functions, get and set. And I wrote it in the most naive way possible: don’t worry about memory, don’t worry about reading, just write it; the back-end store is a ConcurrentDictionary. And I’m getting a million requests per second. My goal is to see if I can hit ten million requests per second, but the result so far is already valuable: a hundred lines of code give you something that is typically far beyond anything you would actually need in most systems. I think that is a major part of why we were able to build the engine and everything around it.

On that foundation we managed to do almost five years of continuous releases and optimisations and additional features, everything like that, until we realised that we had hit a block, this big wall that we could not get past because of the way that we had built the system. The primary reason was that we leaned on .NET to do its work: to manage memory, to manage threads, all of those sorts of things. And what was the problem there? Let’s say that I have a JSON document that represents you, and I want to ask, “what is your name?” How do we actually process that? Well, I’m going to parse the JSON, get the document, then .Name, and here’s your name. But what are the actual costs behind that? I have to parse the document, I have to materialise it into a set of .NET objects, typically a sort of dictionary, and then I have to look up the relevant value. And then I’m going to just discard all of that, and the GC is going to remove it at some future point in time. On one hand, that’s magical; that is so much that you don’t actually need to worry about. On the other hand, once you start thinking about performance, that means that you’re now stuck, because you don’t have control over it. And there is also another issue to consider: every time I parse a document, I create a lot of managed objects that then need to be collected, and the more complicated the JSON, the higher that cost rises. So in order to manage that, we had to effectively rethink our whole strategy.
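To make the cost Oren is describing concrete, here is a small sketch using today’s System.Text.Json (Oren was describing the JSON libraries of the time, which allocate far more aggressively); the document and the “Name” property are illustrative. The first approach materialises the whole document just to read one value; the second walks the raw UTF-8 and builds no object graph at all.

```csharp
using System;
using System.Text;
using System.Text.Json;

byte[] utf8Json = Encoding.UTF8.GetBytes("{\"Name\":\"Oren\",\"Company\":\"Hibernating Rhinos\"}");

Console.WriteLine(NameViaFullParse(utf8Json)); // materialises the whole document first
Console.WriteLine(NameViaReader(utf8Json));    // forward-only read, no object graph

// Parse the whole document, read one value, throw the rest away:
// every call hands the GC short-lived objects to clean up later.
static string? NameViaFullParse(byte[] json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    return doc.RootElement.GetProperty("Name").GetString();
}

// Walk the raw UTF-8 bytes and stop at the property we want; nothing is materialised.
static string? NameViaReader(ReadOnlySpan<byte> json)
{
    var reader = new Utf8JsonReader(json);
    while (reader.Read())
    {
        if (reader.TokenType == JsonTokenType.PropertyName && reader.ValueTextEquals("Name"))
        {
            reader.Read();              // advance to the property's value
            return reader.GetString();
        }
    }
    return null;
}
```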
I remember sitting down and writing: here are all of the things that we learned from being in production for five years.

Here are the things that are painful. I remember going into a production system on a support call, looking at the percentage of time spent in GC, and it’s 80-90%, which effectively means that I don’t have a way to even address the actual problem. I have high CPU, but it’s actually because of my allocation rate. And that led to a big issue, because the allocation rate that we had to deal with wasn’t caused by something we needed right now; it was caused by some other thing that happened, a shift in the pattern of access, stuff like that. We got a really good understanding of how the back end of .NET works, how the GC behaves, the thread pool, and everything like that. And then: okay, how can I effectively redesign the system? The goal was to see if we could design the system in such a way that we would play to .NET’s strengths. I know that it’s great that the GC cleans up after me, but can I avoid giving it garbage, so that it doesn’t have to?

So we started thinking about all sorts of things. Many of those things are now standard practice for writing high-performance code: make sure that you have object pooling, make sure that you reuse buffers. But because of the nature of the beast, the fact that we’re writing a database, we decided to do two or three very important things. One of them was that we wanted to own the stack. That means that I don’t want to have any significant component that I can’t look at and modify, and we have modified pretty much every single component that we have in the system, because we have to be able to do that. The second thing was that we said we wanted to go to manual memory management, which is a really, really strange thing to say in .NET: what does it even mean to have manual memory management? The idea was that we were actually going to allocate memory directly from the operating system, native memory, and use that. The pattern of access that we had in many cases was a single query, a single request. So we could allocate a buffer, use the memory from this buffer, and at the end of the request we could just discard the entire buffer as a single unit. Except that we don’t even need to discard the buffer; we can reuse that same buffer for the next request, for the next query that we have. Which is amazing, because now the cost of “GC” was reduced to: set the last allocated position back to zero, and everything else is freed. This is called a bump allocator, or arena allocation.

We started looking into that, along with some of the other things we also needed to do. We had to be able to run on Linux: we have server software, and running the server on Linux was pretty much a requirement. But .NET at that time had no answer for that. There was Mono, and anyone who has ever tried to run a server on Mono knows that you should not. It has been nine years since the last time we tried to run on Mono, and I am still disgusted by it. In that timeframe, DNX, what ended up becoming .NET Core, came out, and I made the decision that we wanted to run on that. That was an incredibly painful process: there was no IDE, we were writing code and then compiling by hand, and there were a lot of shenanigans around “you don’t have this API, you don’t have that API,” etc. But we were actually able to run on Linux, and we were able to adopt this process of manual memory management, and the notion that we were going to move as much as possible of the small, tiny allocations that you keep running into, into something that is scoped to the appropriate request or context that we run in, and then just discard it. We pushed on that model a lot. As a result, it took us almost three years of active development to get the next version out. But the bar that I was willing to accept for the new version was that it had to be at least ten times faster than the previous one, and we actually met and exceeded that goal. So you get to the point where a transactional database can hit up to 50,000 writes per second and a million reads per second, on large data sets. And that’s the project; that’s how we got to where we are today.
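As a rough sketch of the bump, or arena, allocation pattern Oren describes, and not RavenDB’s actual allocator, something like the following can be written in modern C# with NativeMemory (it needs a project with unsafe code enabled):

```csharp
using System;
using System.Runtime.InteropServices;

// One block of native memory is grabbed up front; "allocating" just bumps an offset,
// and resetting the arena frees everything at once. The GC never sees this memory.
public sealed unsafe class Arena : IDisposable
{
    private readonly byte* _block;
    private readonly int _size;
    private int _used;

    public Arena(int size)
    {
        _size = size;
        _block = (byte*)NativeMemory.Alloc((nuint)size);
    }

    public byte* Allocate(int bytes)
    {
        if (_used + bytes > _size)
            throw new OutOfMemoryException("Arena exhausted");

        byte* ptr = _block + _used;
        _used += bytes;              // bump the offset forward
        return ptr;
    }

    // At the end of a request the whole arena is recycled in O(1):
    // rewind the offset and reuse the same block for the next request.
    public void Reset() => _used = 0;

    public void Dispose() => NativeMemory.Free(_block);
}
```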

Jamie

Okay, so you’ve mentioned it a few times, and there are some people who listen to the show who are either new to .NET or, you know, kind of new-ish in their journey, and they want to understand what the garbage collector is and why it can be a big problem. So I just want to throw that out real quick.

Oren

So here’s the issue. When you’re working with .NET, or Python, or Ruby, you don’t usually think about memory. You say, “I need an array, I need a list,” then you get them and use them, and then you just stop using them and someone else goes and cleans them up. That memory is released back to the operating system, or reused; the whole point is that there is some other entity that manages it. Conversely, if you’re using something like C, or Zig, or C++, in many cases you have to worry about the details: okay, I just created a vector of numbers, and I used it, and now I’ve stopped using it. How do I make sure that memory is released? How do I make sure it is released even if there has been an error? Because if you don’t release all of that memory, you have a memory leak. And that’s awful; it’s a nasty way to spend a couple of hours, trying to figure out where you missed one code path, where “oh, I’m allocating here, but I never imagined that in this second phase I’m not actually releasing the memory,” and all sorts of other stuff. The garbage collector basically fixed that. This is the whole notion of automatic memory management.

Where is the problem with that? Automatic memory management does not mean free. One of the things that you have to realise is that the cost of garbage collection is proportional to the amount of memory that you have in use. Why is that? Because, effectively, every time it runs the garbage collector needs to make a decision: I need to collect all of the memory that was allocated and is no longer in use. The way that it does that is it runs a scan: it goes through all of the memory in your system and tries to see whether that memory is reachable. Let’s say I have a static variable, or I have something on the stack; those are called roots. I’m going to start from those roots and walk the entire object graph to find all of the reachable objects. So the more reachable objects you have, the more work there is to do.

And that leads to some interesting observations. Here is an interesting optimisation that we had to do in RavenDB: if you disabled the cache, memory utilisation and CPU utilisation would drop significantly. So if you used the cache, RavenDB would use more memory and more CPU than if you did not use the cache. On its face, that makes no sense. But then you realise that the cache is also keeping items reachable. So the more items you have in the cache, the more reachable items you have, and the more work the GC has to do to scan them; it would scan the objects, find that it cannot release them, and then next time do the same thing again. In order to handle that, the GC has this notion of generations. The typical access pattern is that either I allocate some memory and immediately forget about it and never use it again, which is the young generation, generation zero, or I keep it for a very long time, so it ends up in the old generation, generation two. What happens is that every time an object survives a garbage collection, the GC moves it to the next generation.
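The generational promotion Oren describes can be observed directly with GC.GetGeneration; a tiny, illustrative demo:

```csharp
using System;

// An object that survives collections is promoted from Gen 0 towards Gen 2,
// where it becomes more expensive for the GC to keep revisiting,
// which is exactly what long-lived cache entries end up doing.
var cacheEntry = new byte[1024];

Console.WriteLine(GC.GetGeneration(cacheEntry)); // 0: freshly allocated
GC.Collect();
Console.WriteLine(GC.GetGeneration(cacheEntry)); // 1: survived one collection
GC.Collect();
Console.WriteLine(GC.GetGeneration(cacheEntry)); // 2: now in the old generation
```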
The idea here is that we want to reduce the number of items that we’re going to scan. But the more GC cycles you have, the more items end up in the older generations, which means that those collections become more expensive. So just the fact that we had a cache that was in use and kept things in memory for longer meant that I would pay more in GC cycles, I would pay more in CPU, and I would also pay more in terms of the memory used. Which is ridiculous, because the whole point of a cache is to trade memory for CPU, to save computation. What happened was that the memory we used was filled to the brim with cache entries, at which point the cache would start evicting items; but those items were already in gen two, so it would take even longer for them to actually be removed, at which point I would add more items to the cache, and the cycle continued. I remember when we realised that this was the case: it was a consequence of a series of absolutely legitimate decisions, the way the GC works, generations, caching, all of that. But all of those decisions together meant that if you enabled the cache, performance dropped significantly. And that’s non-obvious in the extreme.

We ran into quite a lot of issues like that. The JSON parsing I already mentioned is another example: the amount of managed memory, and the number of items in the object graph, is basically controlled by the user, and it’s relatively simple to generate a one-kilobyte JSON file that, when you parse it, consumes a megabyte of managed memory, which is a bad place to be. So what ended up happening was that we took all of those things that we knew and figured out how we could avoid them. And the answer to both of these problems, and many others, actually ended up being the same solution. Instead of storing the data as JSON strings and parsing them into managed objects, we have a serialisation format, we call it blittable, that allows us to store JSON data in a form that is immediately accessible. I get the binary data, and if I want to say, “okay, give me the name,” it knows it needs to jump to this particular position, and I get the name, and that’s it. And then you realise: wait, if this is just raw binary data, it doesn’t need to be managed memory. It could be native memory, which is transparent to the GC. So I may have thousands of those items, but they don’t add to the cost of GC. And then we took it further and said, hey, one second, this is just a buffer of bytes; I can use a memory-mapped file. Instead of doing things like reading from disk and copying, I can just map the file into memory, get a pointer to the specific location in the file, and let everything work from there. So we actually managed to integrate four different solutions to problems that we had into one common core.

And then it started spreading further, and we ran into some really interesting issues. For example, when we started writing the networking code for RavenDB, we had our own memory management and we needed to write to a socket, but the only API that you had in .NET at the time to write to a socket was a Stream, and a Stream accepts a byte buffer. So we had to jump through multiple hoops: we would have the native memory, then we’d copy it into a byte buffer, and then we’d write that to the stream. And it was still faster. Now we have Span and Memory that handle that directly, so we don’t have to think about it.
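As a loose sketch of the memory-mapped-file idea Oren mentions above, and not RavenDB’s actual blittable format, here is how a value can be read at a known position from a mapped file without copying the whole document into managed objects first; the file name and the header layout are entirely made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

// Map the file and read a value in place at a known position.
// "documents.raven" and the header layout (an offset followed by a length)
// are hypothetical, purely for this sketch.
using var mmf = MemoryMappedFile.CreateFromFile("documents.raven", FileMode.Open);
using var accessor = mmf.CreateViewAccessor();

long nameOffset = accessor.ReadInt64(0);   // hypothetical header: where the Name field lives
int  nameLength = accessor.ReadInt32(8);   // hypothetical header: how long it is

var bytes = new byte[nameLength];
accessor.ReadArray(nameOffset, bytes, 0, nameLength);

Console.WriteLine(Encoding.UTF8.GetString(bytes));
```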

But then we saw DNX come out and, eventually, .NET Core, and it was open source, and they accept contributions. You could always look at the source of .NET to some extent: you had the Shared Source Initiative, or there was .NET Reflector, which was an amazing tool. So you could always go in, peek, and figure out what’s going on. But part of the difference was that now, when you had a problem, you could fix it. Previously, you would submit a bug, and if you were lucky it would get fixed in 12 to 18 months, which effectively meant that, for you, there was never going to be a fix you could count on. Now you have the ability to go into the code, fix it, and run your own version if you really want to; and if not, then in three weeks, six weeks, you have a version with the fix, and that’s something you can live with. It’s even more interesting when you consider that the bar for making a change in .NET previously was however many people were on the .NET team, whereas now it’s however many people can get a pull request merged. There was a huge amount of chores that needed doing, and suddenly you had the community. For example, I really didn’t like that there was an allocation of about six kilobytes of memory every time you created a particular writer, because it does buffering and that makes it much faster; previously there was no way to fix that, but then it was open source and it got fixed. People say, “oh, I have a problem here,” they submit a PR, it gets accepted, and it works. And if you look at the .NET release notes, you can see, oh, there is this fix and this fix. Sometimes they’re big: the Span work in .NET was a huge thing, for example, and ValueTask was a huge thing. But in many cases it’s “let’s optimise this LINQ operation, specialise it for this particular scenario.” Those sorts of things add up; there’s a cascading effect. We shave a few nanoseconds here, a few percentage points there, and then you look at the overall ecosystem and, okay, just by upgrading the version you get a major performance boost, 10% in some cases.

I think another really great thing about .NET Core versus the .NET Framework is that you can have your own version of the framework. Previously you would have whatever was installed on the machine, and every time you had to update, that was this big, risky thing that you would have to do. But now I get to say: I don’t care what version you’re running; my service has its own version of the framework that is private to it. I can deploy it separately; I can version it myself. Just to give some context, right now we have multiple versions of RavenDB out, from 4.2 to 5.3. Each one of them is tied to its own runtime version, and as a developer that simplifies development and deployment significantly.

Jamie

You know, there’s a huge amount of stuff that you’ve said there, and we’ve talked a lot about the optimisation path and so on. I love the idea of being able to ship my application with all of the bits that it needs, this sort of self-contained deploy, and I love that we can do that with .NET. That’s fantastic. Because, like you say, you can have the globally installed version of .NET, but you can also have a local version: let’s say you’ve got .NET Core 3 installed globally, but your application needs .NET 6; you ship it with .NET 6 as a self-contained release, and it will use that version. And I also like what you said about all of the innovations that we get for free. The community may come up with an idea; I believe, and I may be wrong about this, that Span of T, which you mentioned a few times, was a community idea. And it was adopted and, like you say, the team at Microsoft then went through the entire code base and “Span-ified” it, right? They were like, let’s use Span everywhere instead of, you know, buffers or streams or whatever, in places. And just that stuff that we get for free is amazing. When you were talking about all of the optimisations you were doing, in the back of my head I was going, “what about spans? What about spans?” and it turns out you’d already done it.

But I feel like it might be relevant to say to listeners that a lot of the optimisation things you’re talking about, that you’ve done for RavenDB, whilst they might be good to have in any application, a lot of them are very specific to RavenDB. What I don’t want is for people to come away from this episode and go, “I’ve made a to-do app; now I have to optimise it at the same level that Oren did.”

Oren

I teach a course at a university, and I was just talking to some students: “oh, we have to make sure that the system is scalable and can handle a lot of requests.” One thing that I asked them was, how much is “lots of requests”? And they had no firm answer. Now, those students don’t necessarily have that experience yet, but building a system that can do 10,000 requests a second is quite simple; you don’t need to go crazy. If you want to go to 100,000, to a million, you have to do some work. But again, I wrote 100 lines of code to simulate Redis, and I hit 945,000 requests per second in the most naive way I could think of. The typical way you want to approach this is: what am I trying to optimise for? Am I trying to build a system that’s going to be hammered by lots of requests and needs very short latency to process each request? Or am I trying to optimise the latency of delivery: to make sure that I can make changes and push them to production quickly, to optimise the pipeline from a user request to its delivery? In most cases I would say that, unless you have a strict requirement and an actual need for high performance, don’t worry about it. Don’t worry about it, because mostly, from what I’ve seen, the things that slow you down are not the intricacies of the interaction between caching and the GC, but more that you have an n-squared algorithm you never considered, or you need to load six items from the database and you go to the database six times instead of once, those sorts of things. And the old adage, “make it work, make it right, make it fast”, make it work first, is true.

That said, in many cases you can say up front what you want, what the requirements are, and then design your system accordingly. Here is a relatively simple example, going back to the Redis example: I’m reading strings from the network. That means I have to do a lot of allocation when I parse the string, and more allocations to store it. I could read the data as bytes instead, but then I have questions: how do I manage it? How do I store the keys and the values? Do I store them as managed arrays, or as native memory? How do I make sure that no one is referencing memory that is no longer in use, and all sorts of other stuff like that? It’s very easy to take upon yourself a lot of complexity that you don’t actually need. Manual memory management means that you have far better control of what’s going on, but if you look at something like ConcurrentDictionary in C#, it’s nearly impossible to write something similar in C or Zig or C++; you have to use something called epoch-based garbage collection, which is an insanely complicated way of having garbage collection in a language that does not support it. So you give up a lot of things that the GC makes possible. Another thing that I think is really important is to understand the implications: you have hotspots, the things that you spend most of your time doing, and optimising them gives you the most bang for the buck.

And one of the things that I love about C#, and .NET in general, is that I have the option of saying: here, I’m using stackalloc and Span, and I’m doing no allocation whatsoever, because this is the heart of what I’m doing. And over here, I’m just going to concatenate strings, and I don’t care, because this is going to run once; these are all things that happen during system startup, and they’re not an important consideration for performance.
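For a sense of what Oren’s hundred-line experiment looks like in spirit, here is a deliberately naive sketch of a two-operation key/value service. It uses ASP.NET Core minimal APIs (in a Microsoft.NET.Sdk.Web project) and a ConcurrentDictionary rather than the Redis wire protocol, so the numbers will not match his, but the shape of the idea is the same:

```csharp
using System.Collections.Concurrent;
using System.IO;

var store = new ConcurrentDictionary<string, string>();
var app = WebApplication.CreateBuilder(args).Build();

// GET /get/{key}: return the value if we have it.
app.MapGet("/get/{key}", (string key) =>
    store.TryGetValue(key, out var value) ? Results.Ok(value) : Results.NotFound());

// PUT /set/{key}: store whatever the request body contains.
app.MapPut("/set/{key}", async (string key, HttpRequest request) =>
{
    using var reader = new StreamReader(request.Body);
    store[key] = await reader.ReadToEndAsync();
    return Results.Ok();
});

app.Run();
```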


A Request To You All

If you’re enjoying this show, would you mind sharing it with a colleague? Check your podcatcher for a link to the show notes, which have an embedded player and a transcription and all that stuff, and share that link with them. I’d really appreciate it if you could indeed share the show.

I would also love it if you would leave a rating or review. The other ways of supporting the show are completely up to you, and are not required at all to continue enjoying it.

Anyway, let’s get back to it.


Jamie

Oh, yeah, absolutely. I think that’s similar to what you said when you were talking to the students who were like, “we need to make it survive a billion people hitting it at the same time.” When people come to me with that, I’m like, “well, I mean, do you have two users yet? You might be optimising a little bit too early,” you know.

Oren

At one point I got a request from a customer: “we want it to work at Google scale, ask for whatever numbers you need, just make it work like Google.” So I went onto the SEC website, pulled the quarterly report for Google, and I gave them a quote for $50 billion, or something like that. And apparently I somehow broke their system, because the fact that they had a quote for that amount apparently broke a lot of reports and made lots of people very angry. My point was: okay, you want me to make it like Google? Here’s the Google budget. I don’t even want the whole budget, just the quarterly budget, and I’ll make it work. And I got the point across that, you know, you get what you pay for. If you want to be able to scale to a billion users, you’re probably not going to do that without the budget to match, and sometimes that is the determination you have to make. This is an issue especially when you’re talking about people who are starting out in startups, where the expectation is, “oh, this must be scalable. We have zero users now, but we must be ready to accept 10,000 users on day two.” And the question is: what is the cost of that sort of behaviour?

By the way, if you find yourself in that situation, there is a relatively simple way to handle it, so that you can mostly ignore performance but still be able to scale, and still be able to move quickly. That solution is to put the behaviour that needs to scale behind a queue. Let’s say that I’m accepting, I don’t know, images from an app on your phone, and we have to do image recognition on them and say whether this is a stolen piece of art or not; Shazam for paintings, whatever. Well, I could try to create a highly optimised system, or I can say: I don’t care. Whenever you upload an image, I’m going to throw it into a queue and have something else process it. At that point people may be upset with me: “wait, how does that solve the problem?” But think about the relatively simple scenario: I have a piece of logic where I need to run some computation on the data, many users will send me that data, and I have to do all of the work. What if, instead of trying to process it inline, I throw it into a queue and process it in the background? I’m not going to try to write something special; I’m going to plug this into an Azure Function. So you have a queue, and every time you put something on the queue it goes into a function invocation, and Azure Functions itself will spawn however many instances it needs to process that. So a lot of the complexity that is involved in actually running and managing that, someone else has already solved for me.

Again, the whole idea is that I get the nice development experience, and if there is a need to scale, then Azure Functions is going to handle it. And at some point it’s, “oh, you know what, we’re spending a lot of money on all of those function invocations,” and then I can sit down and try to optimise. You know what I can do? I could do something nicer: let’s add another listener to the queue and write all of those items off to the side. Then I can take all of those items and run the same working set that I just had in production. Typically I’ll find the n-squared thing I have in there, or something like that, go “oh, how could I be so stupid as to do something like that,” and fix it. And the function bill, the amount that I’m going to pay, is going to drop by some 300 percentage points. That’s not a number I’m pulling out of thin air; these are real things that I’ve seen. And it all starts from the concept that every time I know I have something that I’m going to have trouble scaling, I’m not going to try to do it inline; I’m going to throw it into a queue and let the queue manage it.
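A rough sketch of the “put it behind a queue” pattern Oren describes, using an Azure Functions queue trigger in the in-process model; the queue name, message contents and the ProcessUpload class are illustrative rather than anything from the episode:

```csharp
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ProcessUpload
{
    // Runs once per queue message; the platform scales instances out as the
    // queue grows, so the web front end only has to enqueue the work.
    [FunctionName("ProcessUpload")]
    public static void Run(
        [QueueTrigger("uploaded-images")] string imageUrl,
        ILogger log)
    {
        log.LogInformation("Processing image at {Url}", imageUrl);
        // ... the expensive image-recognition step would go here ...
    }
}
```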

Jamie

And that all makes perfect sense to me. Since we’ve talked about optimisations, I feel like this is a conversation to have: modern system architecture might not necessarily be the best way to do stuff. Because, in my opinion, we developers like to leap directly towards the most complicated way that we could build a system, right? I see so many tutorials online like, “I built a static site blog using Kubernetes.” And, you know, JavaScript as well; I mean, JavaScript is not complicated, but you know, it’s Kubernetes and Node and Docker. And it has 100% uptime, except for when the server that’s hosting it fails because two of the three hundred…

Oren

…parts of it are down. And in order to deploy, to make a change, it takes three hours and a PhD in computer science to figure out what is going on. Yeah, and the problem is that a developer wants to work on interesting things; no one wants to work on the boring stuff. And I think that’s the difference between a developer and an architect: a developer wants to work on interesting things, and the job of an architect is to make it as boring as possible. But a lot of the things that we’re now facing in terms of complexity come from the fact that the types of applications we build have changed significantly in the past decade or so. We build a lot more APIs, to manage applications, than we did before. A major part of the difference is the sort of qualitative bar that you have to clear. Fifteen years ago, I had a startup idea: you would scan receipts on your phone, and we would summarise your expenses for you. That was an idea you could build a startup on. Today, I can give that out, literally, as a class assignment, and expect my students to complete it over a weekend. Because this is something that is now available and democratised, you now have much harsher pressure to deliver more.

At the same time, we see a lot of people spend a lot of time on infrastructure puzzles that make the system so much more complicated. I have an issue with Kubernetes and its complexity, because, okay, if you’re deploying onto physical hardware, then it gives you some advantages. But if you’re working in the cloud, there is almost no difference, from a conceptual perspective, between a Kubernetes pod and a virtual machine that you got from the cloud. Sure, the Kubernetes API is cross-cloud and the cloud API is specific to the cloud that you’re using, but beyond that? On the other hand, you have the notion of containers and the ability to package your environment in one shot, which is a major improvement, because otherwise you would have to figure out all of your dependencies on your own. That was not a fun game. I still remember when we used to do lift and shift: you would take a physical machine and just copy it, as-is, into a virtual machine by cloning the hard disk, because no one knew what sort of secret dependency you had that had to be just right for something. Now you declare all of that in a Dockerfile, and it’s working.

Jamie

Yep. And yeah, my go-to thing that I say to people is: your boss, client, user, whoever it is that’s paying for, or investing their time or money in, you making this thing, does not care how you made this thing, right? They’re not going to be excited about the fact that you used Kubernetes and a million pods and a sharded database, and there are functions, and you went cross-cloud, and you did all this stuff. All they care about is: when I click the button, does it do the thing? Right? So my goal is to make it so that when I click the button, it does the thing, first.

Oren

Here’s an interesting observation. We built the user interface for RavenDB as a single-page application. Now, I have to admit that the last time I personally built web applications, I believed the proper way to do formatting and alignment was to put everything inside a table. The UI library that we’re using is Knockout, which is now out of date, not fashionable, whatever. It’s still working; everything is just fine. But we have users saying, “no, why are you using this outdated technology?” It’s working, and we have a significant amount of code there. “Oh no, you have a legacy codebase.” It’s “legacy”, but it’s perfectly fine, working code. People have this itch, and the moment that you start chasing the new thing, “oh, I can do this and do that and everything,” you end up in a bad position in a short amount of time. Think about people who bet the farm on Docker; Docker is currently not a healthy company, and what is its state going to be in three years? Kubernetes is huge right now, but give it two years and probably the focus will be elsewhere. Do you still want to be editing all of those YAML files?

There’s also something called the complexity budget that you have in your project, and the serious question is: where do you want to spend that complexity budget? Are you going to spend it on your infrastructure, or are you going to spend it on solving the business problem? I don’t have a pat answer, because it depends on what the priorities are. Here’s an example: a person spent three weeks defining a set of Kubernetes scripts and templates that they could reuse, so in the end they click a button and they have an application up, deployable to multiple clouds almost effortlessly. It took them three weeks, and someone doing everything manually would have taken less than a day. The only way the thinking works there is, “I’m a product company, I do a lot of projects, so it’s worth my time to optimise for that particular scenario, because I do it all the time.” But in most cases, is that where you’re actually getting the benefit?

Another thing that is also super important, and that we learned the hard way, is: okay, you have this cool new toy; how does it operate in production? How do you monitor, debug and operate it? And that’s something I think people don’t pay a lot of attention to. Has anyone ever tried to debug the ingress behaviour of Kubernetes, or why you have this particular pattern of errors, or stuff like that? You suddenly have so many different components in the system that have to play together in order for something to work, and when they break, all of the abstraction that you rely on has to be peeled away in order for you to understand what’s going on. It’s an insane level of cost that you get to pay.

Jamie

I agree, I agree completely. And the worst thing you want is, like you were saying: you heavily invest in a technology during an early phase, and then in two years’ time, when you’ve gotten past your MVP status, you’ve put your application out there, it’s making some money, maybe you’ve sold it on, the people who were creating that technology just go out of business, or there’s no support for it, right? You don’t want that, because you need to continue to evolve your project. And I feel like that’s more of a long-term, like you say, maybe an architect decision or a business decision that a lot of developers don’t really naturally fall into. So, a personal story. I have an apprentice, and, yes, we’re recording this in 2022, and I had the apprentice build a WinForms app. Right? Because the goal was not “let’s build a WinForms app”; the goal was “let’s learn a whole bunch of principles.” And they already had loads of experience of building WinForms, because part of the course that they’d been doing has been WinForms; whether it should have been or not is a different discussion, right? But it also meant that, at least at the time of recording, there’s over 20 years of documentation, examples, blog posts and videos that they can fall back on if something doesn’t work. Yes, WinForms has kind of fallen out of vogue, but it’s still supported by .NET 6, and there’s 20-plus years of resources.

Oren

Here’s a great example. WinForms basically takes the Win32 API and exposes it outward, so if you were aware of the Win32 API, you had a very smooth transition to it. Now, let’s say that you actually built applications in WinForms. What does that mean? Well, it means that your application, for a very long time, could run on Linux. It means that you don’t actually care that much about things like having the newest, prettiest UI. Or you could say, yeah, there’s WPF, or there was UWP, and I don’t even remember the names of the other UI frameworks that were pushed by Microsoft just in the past decade. But if you were on WPF, then you were stuck on Windows, and maybe there is Avalonia that might help, and now there is MAUI that might help. But if you want WinForms, it’s there, it works; there’s no hassle in actually building your UI. Maybe you don’t get it to be as pretty as you would like with WinForms, for sure, but the entire system is stable.

Now, here is another important aspect. You mentioned that the company may go out of business, but whenever I have something that is specialised, I also have to consider the manpower requirements. How much is it going to cost me to get someone who is specialised in this technology, if the person I have working on it is not available? They quit, they took a leave of absence, whatever. Do I have anyone else who can debug this Kubernetes CI pipeline? Or is it “oh, he’s off, I don’t know how to do that, no one knows how to do that”? At one point, I remember, I was working for a company that considered some of its data to be ultra-sensitive, so they wanted to encrypt it. And the encryption was done using an algorithm written by one guy, of I have no idea what quality. The problem was that they considered the encryption itself to be secret, so he actually stored the code for the encryption algorithm on a USB thumb drive that he kept in his pocket, and every time he left the building he would put it in his pocket and leave. That was the “secure” way of doing it. Coincidentally, by the way, it was fireproof; I have no idea why. But that was a great example of a particular choice tying you to one person.

At the same time, let’s say that we’re talking about cryptography. Cryptography is a great example of an area where you need to have the expertise, you probably don’t have the expertise, and messing up can have catastrophic consequences. So again, you choose what the wisdom of the crowd says, because that has already been vetted. And in many cases, the boring old technology is the one that you want to use, because it’s predictable; its failure modes are understood. I can take pretty much any WinForms question, put it into Google, and have an answer, versus “I’m using the hot new thing right now”: good luck.

Jamie

Yeah. And so, with this apprentice, if there’s time during their apprenticeship we’re going to migrate to .NET MAUI, but only if there’s time, and if the documentation is up to scratch, right? Because I was talking to a friend a few days before the time of recording, and he said, you know, you’ve got to be really careful, because some of the documentation for .NET MAUI exists, but because it’s evolving so fast, it’s wrong.

Oren

Senior people are used to that: okay, you might need to dig a little bit. One of the things that I love to do is, okay, let’s go and look at how this is actually implemented, so that I understand better what’s going on. But for someone who is starting out, when the documentation is wrong, or there is a documentation bug, well, that sucks, and sometimes it sucks a lot, and they won’t see it. And I think a major difference in terms of experience is just being able to read the error message. I had someone who tried to automate a process that would SSH into a machine and then deploy some stuff, et cetera. And that was really interesting, because it failed. The reason it failed was that it was written on Windows and piped \r\n line endings into bash, and bash didn’t handle that. And there was something in the output that told me, “oh, the problem is here,” and I was able to spot it. But they couldn’t see it.

Jamie

Right, yep. It is a very important skill. I remember, I have a similar story: I was working with someone a number of years ago who was new to .NET. They were flying as they were picking it up, but every time an error happened, whether a compiler error or a runtime error, it was almost like their hands would get thrown up and they’d say, “that’s it, I can’t do anything about this.” And I would say, “what does the error message say?” “Oh, well, you know, it stopped on this line.” “No, no, what does the actual error message actually say?” And like you said, there’s a skill involved in parsing that information. And at least with C# and JavaScript and these sorts of modern, higher-level languages, those error messages are really good, right? I would love to take someone who’s struggling with a .NET error message and give them something that’s written in C and say, “right, ‘it’s broken’ is the error message. Good luck.”

Oren

You know what, here’s a great one. I remember, the year is 2001, I think, and I wrote my first C# program; at the time, I was writing in C++. I wrote something like: a public class Form1, static void Main, and I had a field variable of type Form1 called f. And I did f.Show(), and it died. It didn’t work, and I could not understand what was going on. There was some null reference exception, but, okay, this is alpha or beta code, I don’t care, it’s probably broken. And a few weeks later I realised: wait, I had a conceptual misunderstanding, because I was writing “Form f;” and then “f.Show();”. In C++, that would actually be a stack-allocated value; in C#, it’s actually a reference. So when I actually did “Form f = new Form();” and then “f.Show();”, it worked. And then I went and wrote the same thing in C++ and compared the errors. In .NET, it was a null reference exception on line 23. In C++, it was an access violation, tried to read 0x… good luck. You don’t forget an experience like that. That particular experience, and it’s been 20-something years, is the reason I started working in .NET, because of that sort of behaviour.

Jamie

Yeah, it’s something that I try to impress on juniors, apprentices, mentees: it may not initially be easy, because you’re learning all sorts of other stuff when you’re learning a new programming language, but the error messages are your friends, right? They will tell you almost precisely not just the line, but the character where that error happened. So it’ll say, for a null reference exception, the file name, then line 12, colon, three, meaning the third character, and it’s pointing you exactly to where you made the mistake, or where your misunderstanding is. Like you say, it’s not an easy skill when you’re starting, but it’s one of the most important things, I think, that juniors should spend time on. Not “learn”, but spend time on; I’m not saying you have to learn this, just spend some time with errors: make something break and see what it looks like.

Oren

Another important skill is, when you write your code, throw the right errors. Because if somebody just throws “Exception”, what am I supposed to do with that? What happened? Why did it happen? How can I fix it? Write it out. And in some cases you’ll spend more effort on crafting the error than on the code around it, and it’s worth doing.
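A small, hypothetical illustration of the point: say what failed, why, and what the reader can do about it, rather than throwing a bare exception. The EnsureConfigExists method and configPath are made up for the example.

```csharp
using System;
using System.IO;

static class Startup
{
    public static void EnsureConfigExists(string configPath)
    {
        if (!File.Exists(configPath))
        {
            // Bad:    throw new Exception("error");
            // Better: explain what failed, why, and how to fix it.
            throw new FileNotFoundException(
                $"Could not load configuration file '{configPath}'. " +
                "Check that the file exists and that the service account can read it.",
                configPath);
        }
    }
}
```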

Jamie

Absolutely. No, I agree with you completely. Personal opinion: I feel like the System.Exception class should be abstract, so that you can’t use just the basic one; you have to use a very specific one, right? But that’s my personal opinion, based on, you know, several years of building frameworks and applications, because if you just throw a new Exception, it’s pointless.

Oren

I don’t care that much about the exception type; give me the description. The example is: if you’ve got a C program and you get an error back, “permission denied” or “read only”… which file?

Jamie

Yep, yep.

Oren

You suddenly realise that, in order to get good errors like you have in C#, you have to work at it. OpenSSL is a great example: if you go and look at how they do error handling, they have something sort of like exceptions, but in C; they push items onto an error stack, and then you can pull them back off. Okay, great, that’s wonderful. But that is the baseline you need for proper development over time; otherwise it’s, “no, I have no idea why; go look at the log file, try to correlate the timeframes.”

Jamie

Yeah, I agree. What I will say, though, Oren, is, you know, I’m very respectful of people’s time, right? I know that I’m kind of dropping everyone out of this exciting part about exception design, and maybe we can come back and talk about that another time. But because I’m using up a lot of your time this afternoon, and obviously it’s a big ask for the listeners to listen through a long, very in-depth conversation, and I’ve learned a lot so far: what’s the best way for folks to find out about RavenDB, and maybe a little bit more about yourself? Where can they go to find that information?

Oren

For myself, go to ayende.com. That’s my blog; I’ve been writing there for, well, forever. For RavenDB, it’s ravendb.net; that is the source of everything there. And if you have any questions afterwards, feel free to also email me, and I will be very happy to answer any queries.

Jamie

Excellent. Excellent. Well, like I say, I feel like I’ve interrupted you mid-flow there. But, and this is peeking behind the curtains a little, one of the things about being a podcast host is that you’ve got to be really involved in the conversation and where it’s going, but you also have to be very respectful of your guest’s time, and we’re almost out of time for the slot that we created for this, so I’m just trying to be very respectful of that. So, yeah, what I want to say is: thank you ever so much for being on the show. I say this to everyone, and almost every episode ends with me saying I really appreciate the time and you sharing your knowledge, but I really, honestly do. I feel like there’s loads in this, so far, that folks are going to be able to listen to and go, “oh, wow, yeah, that’s a great idea; that’s a great point.” Or maybe, you know, your points about whether you’re optimising too early. Or, we hinted at it, I think, but we didn’t really spell out making sure that the things you need to optimise for are measurable, because that’s what really matters.

Oren

Just one last thought: if you’re building any sort of system and you’re talking about performance or scale, start by defining an SLA, a service level agreement. A good SLA looks like this: 99.9% of the requests of a particular type will complete in a given timeframe, let’s say 200 milliseconds, as long as the load does not exceed, say, 500 requests per second. Something like that. Because that is actionable; it gives you a way to measure, and you can monitor it. Otherwise you’re just putting a finger in the air to see where you’re at.
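As a minimal sketch of checking the kind of SLA Oren describes, here is a nearest-rank percentile check; the Sla class, the default numbers and the method name are illustrative, and in a real system you would feed it latencies from your metrics pipeline:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Sla
{
    // "99.9% of requests of this type complete within 200 ms", nearest-rank percentile.
    public static bool IsMet(IReadOnlyList<double> latenciesMs,
                             double percentile = 99.9,
                             double thresholdMs = 200)
    {
        if (latenciesMs.Count == 0) return true;

        var sorted = latenciesMs.OrderBy(x => x).ToArray();
        int index = (int)Math.Ceiling(percentile / 100.0 * sorted.Length) - 1;
        return sorted[Math.Clamp(index, 0, sorted.Length - 1)] <= thresholdMs;
    }
}
```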

Jamie

Yeah, I agree completely. And it goes not just for developing apps; let’s say you’re trying to do something else in your personal life, it needs to be measurable too. Because otherwise, if I said, “I want to learn Mandarin Chinese,” right, well, that’s great, but how much of it do you want to learn, in what timeframe, and how are you going to show that you’ve learned that much? That’s way off base for programming, but I feel like it’s the same thing. If you have an SLA, or some goal that you’re going to hit in a specific amount of time, you’ve said the request has to take less than this much time for a specific number of users, then you’re locking it down; you’re saying, “this is what we’re aiming for.”

Oren

There is a whole range of research around gamification of activities, and a lot of it comes down to this: the moment you have measurable goals that are concrete and actionable, you’re going to hit them, or at least you’re going to try, and most people will actually hit them. That makes a world of difference.

Jamie

It does, it does. And I feel like we should leave the listener with that, because that’s a great place to leave it: it gives them something they can action and actually improve their development practice, I think. So, yeah, I said that earlier, and then we had this wonderful conversation about gamification and specific, measurable goals, but I really have enjoyed this conversation, and I feel like there’s loads here for the listener to really get out of it. So I want to thank you, from me, and, I feel a bit presumptuous here, but I want to thank you on behalf of the listeners as well.

Oren

Thank you very much. It’s been a pleasure. Thank you very much.

The above is a machine transcription, as such there may be subtle errors. If you would like to help to fix this transcription, please see this GitHub repository

Wrapping Up

That was my interview with Oren Eini. Be sure to check out the show notes for a bunch of links to some of the stuff that we covered, and a full transcription of the interview. The show notes, as always, can be found at dotnetcore.show, and there will be a link directly to them in your podcatcher.

And don’t forget to spread the word, leave a rating or review on your podcatcher of choice - head over to dotnetcore.show/review for ways to do that - reach out via our contact page, and come back next time for more .NET goodness.

I will see you again real soon. See you later folks.

Follow the show

You can find the show on any of these places