The Modern .NET Show

Episode 111 - RavenDB with Oren Eini


Supporting The Show

If this episode was interesting or useful to you, please consider supporting the show.

Episode Transcription

Hello everyone and welcome to THE .NET Core Podcast. An award-winning podcast where we reach into the core of the .NET technology stack and, with the help of the .NET community, present you with the information that you need in order to grok the many moving parts of one of the biggest cross-platform, multi-application frameworks on the planet.

I am your host, Jamie “GaProgMan” Taylor. In this episode, I talked with Oren Eini about RavenDB, why he took the time to create his own NoSQL database engine, the fact that he built it using .NET Core before it was released (back in the pre-1.0 days, when it was known as DNX), and some of the optimisation stories from his work on RavenDB.

Along the way, we cover what the GC (or garbage collector) is, performance issues to look out for when dealing with large JSON objects, and some tips that he has for those who want to optimise their applications.

So let’s sit back, open up a terminal, type in dotnet new podcast and let the show begin.

The following is a machine transcription, as such there may be subtle errors. If you would like to help to fix this transcription, please see this GitHub repository

Jamie

So thank you ever so much for joining us, Oren. I’m really excited about this, because I have no idea what RavenDB is. And if I’m honest, my entire experience with databases has been SQL and SQL-like, and I know from our discussions before we started that RavenDB is NoSQL, I believe. So this is all exciting to me. Right, let’s talk about this. But I guess, firstly, welcome to the show.

Oren

Thank you for having me. Really happy to be here.

Jamie

Brilliant. I’m glad to hear that. I’m glad to hear that. I wouldn’t ever want to have someone on the show who didn’t want to be on the show.

Oren

Mandatory podcast training.

Jamie

That’s exactly it. Mandatory podcasting. I like it. We need to make that a thing.

Oren

I do worry where you would get a draft from.

Jamie

Can you imagine? You’ve been arrested for something, and you’ve been sentenced to four years of mandatory podcasting.

Oren

The question is what topic it would be. I can talk for years on technical topics, but try to imagine being given four years of mandatory podcasting about eating.

Jamie

Yes.

Oren

So leaving aside dystopian futures…

Jamie

Yeah. So I guess before we start the conversation proper, Oren, I was wondering: can you give our listeners a bit of a quick, sort of, elevator pitch about you and the kind of work that you do? Is that alright?

Oren

Yep. So I’m Oren. I’m the CEO of a company called Hibernating Rhinos, and our primary product is RavenDB. I’ve been a Microsoft MVP since about 2007, or thereabouts, and I’m heavily involved in the open source space, in the Microsoft world and in general. And for the past twelve to fourteen years or so, I’ve been working on RavenDB.

And that started because I was working on the NHibernate project, which is an Object Relational Mapper. I worked with data all day long, every day, all the time. And you mentioned that when you think about databases, you think, “oh, there is SQL Server or Postgres,” those sorts of things. For a very long time, that was what it meant to have a database, and everything else was maybe some one-off solution or something like that. What happened to me was that, at some point, I got pulled into consulting gigs, “come help us make our system better,” and all sorts of things. I did that for one client after another, and I kept seeing the same set of issues over and over again. I was dealing with databases all the time, so I saw the same problems, and I got sick of it. And I started dreaming: what would happen if I could do something better? What if there was a better kind of database?

That was around 2007-2008, and that is the time when we saw the big move away from SQL databases, relational databases. If you’re familiar with some of the names, some of them are no longer around: there was Project Voldemort, there was Hypertable. And some of them are still here: CouchDB, Mongo. There is another one that I cannot recall right now. All of them were, okay, NoSQL databases, but they were really tools for a very specific purpose. And one of the primary issues that they had was that they gave up a lot in order to do what they wanted, and one of the things that they gave up was transactions.

Now, if you talk to a developer and you tell them, “how can you live without transactions?” they might say, “oh, I can do this, I can do that, all sorts of things,” and that will work. But the problem here is that once you start to actually live in a world without transactions, the amount of complexity you have to deal with just explodes. So people said, “oh, working with a relational database is complex, so let’s move to a non-relational database,” but now you have lost transactions, and that is not a trade-off I think is beneficial to anyone. So I wanted to have a transactional, non-relational database.

And by non-relational, I mean: think about the typical mode of operation, how you design a system. “Oh, I have a table. If this is a complex entity, I have multiple tables that compose this entity,” and stuff like that. And that works. But the problem here is that the more complex your system, the more tables you have, and the more spread out your data becomes. So you want a different way to model the data. The industry more or less settled on the notion of the JSON document as a good way to interchange data and represent object graphs, stuff like that. So you have JSON over the wire and in your API, but a relational database that is a wildly different shape in the back end, and that caused no end of problems. So I set out to build RavenDB as a transactional database that can natively work with JSON. At the time, I had almost no idea how to build databases. And when I say that, I mean that I had built only three or four at that point; that’s like saying that I had built FizzBuzz, or hello world, in terms of databases, because they used existing engines and capabilities to do it.

And it didn’t take a lot of time to figure out that this solved an actual, concrete need that people had: they would be able to simplify their systems significantly. After I had the core project, we started asking, “oh, what other problems do people have that we can help solve, and solve in a relatively unique manner?” For example, if you want the database to be fast, you have to use indexing; you have to be able to store the data in such a way that the database has an easy way to access it. That’s what indexes actually do. And the problem here is that, for almost all databases, you have to tell the database how you want to index the data. If your data access pattern changes, well, so do your indexes. And that can be an incredibly expensive thing to do in terms of manpower and the amount of attention you have to provide.

So one of the things that we figured out was: “hey, if I’m the person writing the database, the database has the knowledge, it knows what indexes you need. So let’s go ahead and create them automatically, on the fly.” But if you’re used to relational databases you go, “no, no, you’re never touching the indexes in production. That’s the sort of thing that, if you do, the system will fall over.” But again, you get to design the system so that it won’t fall over in this situation. And then, basically, the ball started rolling, and I’ve been working on that non-stop for quite some time. I think that next year will mark 15 years since the first commit for the project. And that’s basically me, in a very short amount of detail.
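For readers who want to see what that looks like in practice, here is a rough sketch of storing and querying a document with the RavenDB .NET client, written from memory rather than taken from the episode; the Order class, the server URL and the database name are illustrative. The query at the end is the kind of operation that causes RavenDB to create an index automatically if none covers it.

```csharp
using System.Linq;
using Raven.Client.Documents;

using var store = new DocumentStore
{
    Urls = new[] { "http://localhost:8080" },   // illustrative URL
    Database = "Demo"                           // illustrative database name
};
store.Initialize();

// Store a JSON document; there is no table schema to define up front.
using (var session = store.OpenSession())
{
    session.Store(new Order { Customer = "customers/1", Total = 42m });
    session.SaveChanges();
}

// Query on a property; if no index covers this query,
// RavenDB creates one automatically, on the fly.
using (var session = store.OpenSession())
{
    var bigOrders = session.Query<Order>()
                           .Where(o => o.Total > 10m)
                           .ToList();
}

public class Order
{
    public string Id { get; set; }
    public string Customer { get; set; }
    public decimal Total { get; set; }
}
```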

Jamie

I like it. Sometimes a person’s story is rather encapsulated within the story of the thing that they create, right? So, you know, if somebody says to me, “Hey, Jamie, what do you do, then?” I can’t really say, “Well, I’m a computer programmer,” and stop. I have to talk about all the other things that are related, right?

Oren

Yeah. And the process of writing this database has subsumed a significant portion of my life for the past decade or so. The funny thing is that I remember not wanting to write a database; writing a database is a huge task. I had no idea how big it actually was at the time, but I still knew it was big, so I didn’t want to do it. And it kept buzzing in my head, and it wouldn’t leave me alone. I remember waking up in the middle of the night staring at the ceiling and, have you seen Minority Report, where they drag and drop things in mid-air? I had that: imaginary shapes on the ceiling showing the flow of data between documents and indexes. And I just could not not do it; it was consuming every waking moment and, apparently, every sleeping one. So I started writing it. I’m a .NET developer by trade; I’ve been working on the .NET platform since the 1.0 version, the alpha version, actually, so it was natural for me to go and use that. And I credit the fact that I started to write it in .NET with letting me skip over a huge number of hurdles that I would otherwise have had to deal with.

I’m currently in the process of writing a blog post series showing how you can write high-performance .NET applications. I’ve chosen to write a Redis clone: it has two functions, get and set. And I wrote it in the most naive way possible: don’t worry about memory, don’t worry about reading, just write it; the back-end store is a ConcurrentDictionary. And I’m getting a million requests per second. My goal is to see if I can hit ten million requests per second, but the result so far is already valuable: a hundred lines of code give you something that is typically far beyond anything you would actually need in most systems. I think that is a major part of why we were able to build the engine and everything around it.

On that foundation we managed to do almost five years of continuous releases and optimisations and additional features, everything like that, until we realised that we had hit a block, this big wall that we could not get past because of the way that we had built the system. The primary reason was that we leaned on .NET to do its work: to manage memory, to manage threads, all of those sorts of things. And what was the problem there? Let’s say that I have a JSON document that represents you, and I want to ask, “what is your name?” How do we actually process that? Well, I’m going to parse the JSON, get the document, then .Name, and here’s your name. But what are the actual costs behind that? I have to parse the document, I have to materialise it into a set of .NET objects, typically a sort of dictionary, and then I have to look up the relevant value. And then I’m going to just discard all of that, and the GC is going to remove it at some future point in time. On one hand, that’s magical; that is so much that you don’t actually need to worry about. On the other hand, once you start thinking about performance, that means that you’re now stuck, because you don’t have control over it. And there is also another issue to consider: every time I parse a document, I create a lot of managed objects that then need to be collected, and the more complicated the JSON, the higher that cost rises. So in order to manage that, we had to effectively rethink our whole strategy.
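To make the cost Oren is describing concrete, here is a small sketch using today’s System.Text.Json (Oren was describing the JSON libraries of the time, which allocate far more aggressively); the document and the “Name” property are illustrative. The first approach materialises the whole document just to read one value; the second walks the raw UTF-8 and builds no object graph at all.

```csharp
using System;
using System.Text;
using System.Text.Json;

byte[] utf8Json = Encoding.UTF8.GetBytes("{\"Name\":\"Oren\",\"Company\":\"Hibernating Rhinos\"}");

Console.WriteLine(NameViaFullParse(utf8Json)); // materialises the whole document first
Console.WriteLine(NameViaReader(utf8Json));    // forward-only read, no object graph

// Parse the whole document, read one value, throw the rest away:
// every call hands the GC short-lived objects to clean up later.
static string? NameViaFullParse(byte[] json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    return doc.RootElement.GetProperty("Name").GetString();
}

// Walk the raw UTF-8 bytes and stop at the property we want; nothing is materialised.
static string? NameViaReader(ReadOnlySpan<byte> json)
{
    var reader = new Utf8JsonReader(json);
    while (reader.Read())
    {
        if (reader.TokenType == JsonTokenType.PropertyName && reader.ValueTextEquals("Name"))
        {
            reader.Read();              // advance to the property's value
            return reader.GetString();
        }
    }
    return null;
}
```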
I remember sitting down and writing: here are all of the things that we learned from being in production for five years.

Here are the things that are painful. I remember going into a production system on a support call, looking at the percentage of time spent in GC, and it’s 80-90%, which effectively means that I don’t have a way to even address the actual problem. I have high CPU, but it’s actually because of my allocation rate. And that led to a big issue, because the allocation rate that we had to deal with wasn’t caused by something we needed right now; it was caused by some other thing that happened, a shift in the pattern of access, stuff like that. We got a really good understanding of how the back end of .NET works, how the GC behaves, the thread pool, and everything like that. And then: okay, how can I effectively redesign the system? The goal was to see if we could design the system in such a way that we would play to .NET’s strengths. I know that it’s great that the GC cleans up after me, but can I avoid giving it garbage, so that it doesn’t have to?

So we started thinking about all sorts of things. Many of those things are now standard practice for writing high-performance code: make sure that you have object pooling, make sure that you reuse buffers. But because of the nature of the beast, the fact that we’re writing a database, we decided to do two or three very important things. One of them was that we wanted to own the stack. That means that I don’t want to have any significant component that I can’t look at and modify, and we have modified pretty much every single component that we have in the system, because we have to be able to do that. The second thing was that we said we wanted to go to manual memory management, which is a really, really strange thing to say in .NET: what does it even mean to have manual memory management? The idea was that we were actually going to allocate memory directly from the operating system, native memory, and use that. The pattern of access that we had in many cases was a single query, a single request. So we could allocate a buffer, use the memory from this buffer, and at the end of the request we could just discard the entire buffer as a single unit. Except that we don’t even need to discard the buffer; we can reuse that same buffer for the next request, for the next query that we have. Which is amazing, because now the cost of “GC” was reduced to: set the last allocated position back to zero, and everything else is freed. This is called a bump allocator, or arena allocation.

We started looking into that, along with some of the other things we also needed to do. We had to be able to run on Linux: we have server software, and running the server on Linux was pretty much a requirement. But .NET at that time had no answer for that. There was Mono, and anyone who has ever tried to run a server on Mono knows that you should not. It has been nine years since the last time we tried to run on Mono, and I am still disgusted by it. In that timeframe, DNX, what ended up becoming .NET Core, came out, and I made the decision that we wanted to run on that. That was an incredibly painful process: there was no IDE, we were writing code and then compiling by hand, and there were a lot of shenanigans around “you don’t have this API, you don’t have that API,” etc. But we were actually able to run on Linux, and we were able to adopt this process of manual memory management, and the notion that we were going to move as much as possible of the small, tiny allocations that you keep running into, into something that is scoped to the appropriate request or context that we run in, and then just discard it. We pushed on that model a lot. As a result, it took us almost three years of active development to get the next version out. But the bar that I was willing to accept for the new version was that it had to be at least ten times faster than the previous one, and we actually met and exceeded that goal. So you get to the point where a transactional database can hit up to 50,000 writes per second and a million reads per second, on large data sets. And that’s the project; that’s how we got to where we are today.
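As a rough sketch of the bump, or arena, allocation pattern Oren describes, and not RavenDB’s actual allocator, something like the following can be written in modern C# with NativeMemory (it needs a project with unsafe code enabled):

```csharp
using System;
using System.Runtime.InteropServices;

// One block of native memory is grabbed up front; "allocating" just bumps an offset,
// and resetting the arena frees everything at once. The GC never sees this memory.
public sealed unsafe class Arena : IDisposable
{
    private readonly byte* _block;
    private readonly int _size;
    private int _used;

    public Arena(int size)
    {
        _size = size;
        _block = (byte*)NativeMemory.Alloc((nuint)size);
    }

    public byte* Allocate(int bytes)
    {
        if (_used + bytes > _size)
            throw new OutOfMemoryException("Arena exhausted");

        byte* ptr = _block + _used;
        _used += bytes;              // bump the offset forward
        return ptr;
    }

    // At the end of a request the whole arena is recycled in O(1):
    // rewind the offset and reuse the same block for the next request.
    public void Reset() => _used = 0;

    public void Dispose() => NativeMemory.Free(_block);
}
```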

Jamie

Okay, so you’ve mentioned it a few times, and there are some people who listen to the show who are either new to .NET or, you know, kind of new-ish in their journey, and they want to understand what the garbage collector is and why it can be a big problem. So I just want to throw that out real quick.

Oren

So here’s the issue. When you’re working with .NET, or Python, or Ruby, you don’t usually think about memory. You say, “I need an array, I need a list,” then you get them and use them, and then you just stop using them and someone else goes and cleans them up. That memory is released back to the operating system, or reused; the whole point is that there is some other entity that manages it. Conversely, if you’re using something like C, or Zig, or C++, in many cases you have to worry about the details: okay, I just created a vector of numbers, and I used it, and now I’ve stopped using it. How do I make sure that memory is released? How do I make sure it is released even if there has been an error? Because if you don’t release all of that memory, you have a memory leak. And that’s awful; it’s a nasty way to spend a couple of hours, trying to figure out where you missed one code path, where “oh, I’m allocating here, but I never imagined that in this second phase I’m not actually releasing the memory,” and all sorts of other stuff. The garbage collector basically fixed that. This is the whole notion of automatic memory management.

Where is the problem with that? Automatic memory management does not mean free. One of the things that you have to realise is that the cost of garbage collection is proportional to the amount of memory that you have in use. Why is that? Because, effectively, every time it runs the garbage collector needs to make a decision: I need to collect all of the memory that was allocated and is no longer in use. The way that it does that is it runs a scan: it goes through all of the memory in your system and tries to see whether that memory is reachable. Let’s say I have a static variable, or I have something on the stack; those are called roots. I’m going to start from those roots and walk the entire object graph to find all of the reachable objects. So the more reachable objects you have, the more work there is to do.

And that leads to some interesting observations. Here is an interesting optimisation that we had to do in RavenDB: if you disabled the cache, memory utilisation and CPU utilisation would drop significantly. So if you used the cache, RavenDB would use more memory and more CPU than if you did not use the cache. On its face, that makes no sense. But then you realise that the cache is also keeping items reachable. So the more items you have in the cache, the more reachable items you have, and the more work the GC has to do to scan them; it would scan the objects, find that it cannot release them, and then next time do the same thing again. In order to handle that, the GC has this notion of generations. The typical access pattern is that either I allocate some memory and immediately forget about it and never use it again, which is the young generation, generation zero, or I keep it for a very long time, so it ends up in the old generation, generation two. What happens is that every time an object survives a garbage collection, the GC moves it to the next generation.
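The generational promotion Oren describes can be observed directly with GC.GetGeneration; a tiny, illustrative demo:

```csharp
using System;

// An object that survives collections is promoted from Gen 0 towards Gen 2,
// where it becomes more expensive for the GC to keep revisiting,
// which is exactly what long-lived cache entries end up doing.
var cacheEntry = new byte[1024];

Console.WriteLine(GC.GetGeneration(cacheEntry)); // 0: freshly allocated
GC.Collect();
Console.WriteLine(GC.GetGeneration(cacheEntry)); // 1: survived one collection
GC.Collect();
Console.WriteLine(GC.GetGeneration(cacheEntry)); // 2: now in the old generation
```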
The idea here is that we want to reduce the number of items that we’re going to scan. But the more GC cycles you have, the more items end up in the older generations, which means that those collections become more expensive. So just the fact that we had a cache that was in use and kept things in memory for longer meant that I would pay more in GC cycles, I would pay more in CPU, and I would also pay more in terms of the memory used. Which is ridiculous, because the whole point of a cache is to trade memory for CPU, to save computation. What happened was that the memory we used was filled to the brim with cache entries, at which point the cache would start evicting items; but those items were already in gen two, so it would take even longer for them to actually be removed, at which point I would add more items to the cache, and the cycle continued. I remember when we realised that this was the case: it was a consequence of a series of absolutely legitimate decisions, the way the GC works, generations, caching, all of that. But all of those decisions together meant that if you enabled the cache, performance dropped significantly. And that’s non-obvious in the extreme.

We ran into quite a lot of issues like that. The JSON parsing I already mentioned is another example: the amount of managed memory, and the number of items in the object graph, is basically controlled by the user, and it’s relatively simple to generate a one-kilobyte JSON file that, when you parse it, consumes a megabyte of managed memory, which is a bad place to be. So what ended up happening was that we took all of those things that we knew and figured out how we could avoid them. And the answer to both of these problems, and many others, actually ended up being the same solution. Instead of storing the data as JSON strings and parsing them into managed objects, we have a serialisation format, we call it blittable, that allows us to store JSON data in a form that is immediately accessible. I get the binary data, and if I want to say, “okay, give me the name,” it knows it needs to jump to this particular position, and I get the name, and that’s it. And then you realise: wait, if this is just raw binary data, it doesn’t need to be managed memory. It could be native memory, which is transparent to the GC. So I may have thousands of those items, but they don’t add to the cost of GC. And then we took it further and said, hey, one second, this is just a buffer of bytes; I can use a memory-mapped file. Instead of doing things like reading from disk and copying, I can just map the file into memory, get a pointer to the specific location in the file, and let everything work from there. So we actually managed to integrate four different solutions to problems that we had into one common core.

And then it started spreading further, and we ran into some really interesting issues. For example, when we started writing the networking code for RavenDB, we had our own memory management and we needed to write to a socket, but the only API that you had in .NET at the time to write to a socket was a Stream, and a Stream accepts a byte buffer. So we had to jump through multiple hoops: we would have the native memory, then we’d copy it into a byte buffer, and then we’d write that to the stream. And it was still faster. Now we have Span and Memory that handle that directly, so we don’t have to think about it.
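As a loose sketch of the memory-mapped-file idea Oren mentions above, and not RavenDB’s actual blittable format, here is how a value can be read at a known position from a mapped file without copying the whole document into managed objects first; the file name and the header layout are entirely made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

// Map the file and read a value in place at a known position.
// "documents.raven" and the header layout (an offset followed by a length)
// are hypothetical, purely for this sketch.
using var mmf = MemoryMappedFile.CreateFromFile("documents.raven", FileMode.Open);
using var accessor = mmf.CreateViewAccessor();

long nameOffset = accessor.ReadInt64(0);   // hypothetical header: where the Name field lives
int  nameLength = accessor.ReadInt32(8);   // hypothetical header: how long it is

var bytes = new byte[nameLength];
accessor.ReadArray(nameOffset, bytes, 0, nameLength);

Console.WriteLine(Encoding.UTF8.GetString(bytes));
```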

But then we saw DNX come out and, eventually, .NET Core, and it was open source, and they accept contributions. You could always look at the source of .NET to some extent: you had the Shared Source Initiative, or there was .NET Reflector, which was an amazing tool. So you could always go in, peek, and figure out what’s going on. But part of the difference was that now, when you had a problem, you could fix it. Previously, you would submit a bug, and if you were lucky it would get fixed in 12 to 18 months, which effectively meant that, for you, there was never going to be a fix you could count on. Now you have the ability to go into the code, fix it, and run your own version if you really want to; and if not, then in three weeks, six weeks, you have a version with the fix, and that’s something you can live with. It’s even more interesting when you consider that the bar for making a change in .NET previously was however many people were on the .NET team, whereas now it’s however many people can get a pull request merged. There was a huge amount of chores that needed doing, and suddenly you had the community. For example, I really didn’t like that there was an allocation of about six kilobytes of memory every time you created a particular writer, because it does buffering and that makes it much faster; previously there was no way to fix that, but then it was open source and it got fixed. People say, “oh, I have a problem here,” they submit a PR, it gets accepted, and it works. And if you look at the .NET release notes, you can see, oh, there is this fix and this fix. Sometimes they’re big: the Span work in .NET was a huge thing, for example, and ValueTask was a huge thing. But in many cases it’s “let’s optimise this LINQ operation, specialise it for this particular scenario.” Those sorts of things add up; there’s a cascading effect. We shave a few nanoseconds here, a few percentage points there, and then you look at the overall ecosystem and, okay, just by upgrading the version you get a major performance boost, 10% in some cases.

I think another really great thing about .NET Core versus the .NET Framework is that you can have your own version of the framework. Previously you would have whatever was installed on the machine, and every time you had to update, that was this big, risky thing that you would have to do. But now I get to say: I don’t care what version you’re running; my service has its own version of the framework that is private to it. I can deploy it separately; I can version it myself. Just to give some context, right now we have multiple versions of RavenDB out, from 4.2 to 5.3. Each one of them is tied to its own runtime version, and as a developer that simplifies development and deployment significantly.

Jamie

You know, there’s a huge amount of stuff that you’ve said there, and we’ve talked a lot about the optimisation path and so on. I love the idea of being able to ship my application with all of the bits that it needs, this sort of self-contained deploy, and I love that we can do that with .NET. That’s fantastic. Because, like you say, you can have the globally installed version of .NET, but you can also have a local version: let’s say you’ve got .NET Core 3 installed globally, but your application needs .NET 6; you ship it with .NET 6 as a self-contained release, and it will use that version. And I also like what you said about all of the innovations that we get for free. The community may come up with an idea; I believe, and I may be wrong about this, that Span of T, which you mentioned a few times, was a community idea. And it was adopted and, like you say, the team at Microsoft then went through the entire code base and “Span-ified” it, right? They were like, let’s use Span everywhere instead of, you know, buffers or streams or whatever, in places. And just that stuff that we get for free is amazing. When you were talking about all of the optimisations you were doing, in the back of my head I was going, “what about spans? What about spans?” and it turns out you’d already done it.

But I feel like it might be relevant to say to listeners that a lot of the optimisation things you’re talking about, that you’ve done for RavenDB, whilst they might be good to have in any application, a lot of them are very specific to RavenDB. What I don’t want is for people to come away from this episode and go, “I’ve made a to-do app; now I have to optimise it at the same level that Oren did.”

Oren

I teach a course at a university, and I was just talking to some students: “oh, we have to make sure that the system is scalable and can handle a lot of requests.” One thing that I asked them was, how much is “lots of requests”? And they had no firm answer. Now, those students don’t necessarily have that experience yet, but building a system that can do 10,000 requests a second is quite simple; you don’t need to go crazy. If you want to go to 100,000, to a million, you have to do some work. But again, I wrote 100 lines of code to simulate Redis, and I hit 945,000 requests per second in the most naive way I could think of. The typical way you want to approach this is: what am I trying to optimise for? Am I trying to build a system that’s going to be hammered by lots of requests and needs very short latency to process each request? Or am I trying to optimise the latency of delivery: to make sure that I can make changes and push them to production quickly, to optimise the pipeline from a user request to its delivery? In most cases I would say that, unless you have a strict requirement and an actual need for high performance, don’t worry about it. Don’t worry about it, because mostly, from what I’ve seen, the things that slow you down are not the intricacies of the interaction between caching and the GC, but more that you have an n-squared algorithm you never considered, or you need to load six items from the database and you go to the database six times instead of once, those sorts of things. And the old adage, “make it work, make it right, make it fast”, make it work first, is true.

That said, in many cases you can say up front what you want, what the requirements are, and then design your system accordingly. Here is a relatively simple example, going back to the Redis example: I’m reading strings from the network. That means I have to do a lot of allocation when I parse the string, and more allocations to store it. I could read the data as bytes instead, but then I have questions: how do I manage it? How do I store the keys and the values? Do I store them as managed arrays, or as native memory? How do I make sure that no one is referencing memory that is no longer in use, and all sorts of other stuff like that? It’s very easy to take upon yourself a lot of complexity that you don’t actually need. Manual memory management means that you have far better control of what’s going on, but if you look at something like ConcurrentDictionary in C#, it’s nearly impossible to write something similar in C or Zig or C++; you have to use something called epoch-based garbage collection, which is an insanely complicated way of having garbage collection in a language that does not support it. So you give up a lot of things that the GC makes possible. Another thing that I think is really important is to understand the implications: you have hotspots, the things that you spend most of your time doing, and optimising them gives you the most bang for the buck.

And one of the things that I love about C#, and .NET in general, is that I have the option of saying: here, I’m using stackalloc and Span, and I’m doing no allocation whatsoever, because this is the heart of what I’m doing. And over here, I’m just going to concatenate strings, and I don’t care, because this is going to run once; these are all things that happen during system startup, and they’re not an important consideration for performance.
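For a sense of what Oren’s hundred-line experiment looks like in spirit, here is a deliberately naive sketch of a two-operation key/value service. It uses ASP.NET Core minimal APIs (in a Microsoft.NET.Sdk.Web project) and a ConcurrentDictionary rather than the Redis wire protocol, so the numbers will not match his, but the shape of the idea is the same:

```csharp
using System.Collections.Concurrent;
using System.IO;

var store = new ConcurrentDictionary<string, string>();
var app = WebApplication.CreateBuilder(args).Build();

// GET /get/{key}: return the value if we have it.
app.MapGet("/get/{key}", (string key) =>
    store.TryGetValue(key, out var value) ? Results.Ok(value) : Results.NotFound());

// PUT /set/{key}: store whatever the request body contains.
app.MapPut("/set/{key}", async (string key, HttpRequest request) =>
{
    using var reader = new StreamReader(request.Body);
    store[key] = await reader.ReadToEndAsync();
    return Results.Ok();
});

app.Run();
```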


A Request To You All

If you’re enjoying this show, would you mind sharing it with a colleague? Check your podcatcher for a link to the show notes, which have an embedded player and a transcription and all that stuff, and share that link with them. I’d really appreciate it if you could indeed share the show.

I would also love it if you would leave a rating or review. The other ways of supporting the show are completely up to you, and are not required at all to continue enjoying it.

Anyway, let’s get back to it.


Jamie

Oh, yeah, absolutely. I think that’s similar to what you said when you were talking to the students who were like, “we need to make it survive a billion people hitting it at the same time.” When people come to me with that, I’m like, “well, I mean, do you have two users yet? You might be optimising a little bit too early,” you know.

Oren

At one point I got a request from a customer: “we want it to work at Google scale, ask for whatever numbers you need, just make it work like Google.” So I went onto the SEC website, pulled the quarterly report for Google, and I gave them a quote for $50 billion, or something like that. And apparently I somehow broke their system, because the fact that they had a quote for that amount apparently broke a lot of reports and made lots of people very angry. My point was: okay, you want me to make it like Google? Here’s the Google budget. I don’t even want the whole budget, just the quarterly budget, and I’ll make it work. And I got the point across that, you know, you get what you pay for. If you want to be able to scale to a billion users, you’re probably not going to do that without the budget to match, and sometimes that is the determination you have to make. This is an issue especially when you’re talking about people who are starting out in startups, where the expectation is, “oh, this must be scalable. We have zero users now, but we must be ready to accept 10,000 users on day two.” And the question is: what is the cost of that sort of behaviour?

By the way, if you find yourself in that situation, there is a relatively simple way to handle it, so that you can mostly ignore performance but still be able to scale, and still be able to move quickly. That solution is to put the behaviour that needs to scale behind a queue. Let’s say that I’m accepting, I don’t know, images from an app on your phone, and we have to do image recognition on them and say whether this is a stolen piece of art or not; Shazam for paintings, whatever. Well, I could try to create a highly optimised system, or I can say: I don’t care. Whenever you upload an image, I’m going to throw it into a queue and have something else process it. At that point people may be upset with me: “wait, how does that solve the problem?” But think about the relatively simple scenario: I have a piece of logic where I need to run some computation on the data, many users will send me that data, and I have to do all of the work. What if, instead of trying to process it inline, I throw it into a queue and process it in the background? I’m not going to try to write something special; I’m going to plug this into an Azure Function. So you have a queue, and every time you put something on the queue it goes into a function invocation, and Azure Functions itself will spawn however many instances it needs to process that. So a lot of the complexity that is involved in actually running and managing that, someone else has already solved for me.

Again, the whole idea is that I get the nice development experience, and if there is a need to scale, then Azure Functions is going to handle it. And at some point it’s, “oh, you know what, we’re spending a lot of money on all of those function invocations,” and then I can sit down and try to optimise. You know what I can do? I could do something nicer: let’s add another listener to the queue and write all of those items off to the side. Then I can take all of those items and run the same working set that I just had in production. Typically I’ll find the n-squared thing I have in there, or something like that, go “oh, how could I be so stupid as to do something like that,” and fix it. And the function bill, the amount that I’m going to pay, is going to drop by some 300 percentage points. That’s not a number I’m pulling out of thin air; these are real things that I’ve seen. And it all starts from the concept that every time I know I have something that I’m going to have trouble scaling, I’m not going to try to do it inline; I’m going to throw it into a queue and let the queue manage it.
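A rough sketch of the “put it behind a queue” pattern Oren describes, using an Azure Functions queue trigger in the in-process model; the queue name, message contents and the ProcessUpload class are illustrative rather than anything from the episode:

```csharp
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ProcessUpload
{
    // Runs once per queue message; the platform scales instances out as the
    // queue grows, so the web front end only has to enqueue the work.
    [FunctionName("ProcessUpload")]
    public static void Run(
        [QueueTrigger("uploaded-images")] string imageUrl,
        ILogger log)
    {
        log.LogInformation("Processing image at {Url}", imageUrl);
        // ... the expensive image-recognition step would go here ...
    }
}
```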

Jamie

And that all makes perfect sense to me. Since we’ve talked about optimisations, I feel like this is a conversation to have: modern system architecture might not necessarily be the best way to do stuff. Because, in my opinion, we developers like to leap directly towards the most complicated way that we could build a system, right? I see so many tutorials online like, “I built a static site blog using Kubernetes.” And, you know, JavaScript as well; I mean, JavaScript is not complicated, but you know, it’s Kubernetes and Node and Docker. And it has 100% uptime, except for when the server that’s hosting it fails because two of the three hundred…

Oren

…parts of it are down. And in order to deploy, to make a change, it takes three hours and a PhD in computer science to figure out what is going on. Yeah, and the problem is that a developer wants to work on interesting things; no one wants to work on the boring stuff. And I think that’s the difference between a developer and an architect: a developer wants to work on interesting things, and the job of an architect is to make it as boring as possible. But a lot of the things that we’re now facing in terms of complexity come from the fact that the types of applications we build have changed significantly in the past decade or so. We build a lot more APIs, to manage applications, than we did before. A major part of the difference is the sort of qualitative bar that you have to clear. Fifteen years ago, I had a startup idea: you would scan receipts on your phone, and we would summarise your expenses for you. That was an idea you could build a startup on. Today, I can give that out, literally, as a class assignment, and expect my students to complete it over a weekend. Because this is something that is now available and democratised, you now have much harsher pressure to deliver more.

At the same time, we see a lot of people spend a lot of time on infrastructure puzzles that make the system so much more complicated. I have an issue with Kubernetes and its complexity, because, okay, if you’re deploying onto physical hardware, then it gives you some advantages. But if you’re working in the cloud, there is almost no difference, from a conceptual perspective, between a Kubernetes pod and a virtual machine that you got from the cloud. Sure, the Kubernetes API is cross-cloud and the cloud API is specific to the cloud that you’re using, but beyond that? On the other hand, you have the notion of containers and the ability to package your environment in one shot, which is a major improvement, because otherwise you would have to figure out all of your dependencies on your own. That was not a fun game. I still remember when we used to do lift and shift: you would take a physical machine and just copy it, as-is, into a virtual machine by cloning the hard disk, because no one knew what sort of secret dependency you had that had to be just right for something. Now you declare all of that in a Dockerfile, and it’s working.

Jamie

Yep. And yeah, my go-to thing that I say to people is: your boss, client, user, whoever it is that’s paying for, or investing their time or money in, you making this thing, does not care how you made this thing, right? They’re not going to be excited about the fact that you used Kubernetes and a million pods and a sharded database, and there are functions, and you went cross-cloud, and you did all this stuff. All they care about is: when I click the button, does it do the thing? Right? So my goal is to make it so that when I click the button, it does the thing, first.

Oren

Here’s an interesting observation. We built the user interface for RavenDB as a single-page application. Now, I have to admit that the last time I personally built web applications, I believed the proper way to do formatting and alignment was to put everything inside a table. The UI library that we’re using is Knockout, which is now out of date, not fashionable, whatever. It’s still working; everything is just fine. But we have users saying, “no, why are you using this outdated technology?” It’s working, and we have a significant amount of code there. “Oh no, you have a legacy codebase.” It’s “legacy”, but it’s perfectly fine, working code. People have this itch, and the moment that you start chasing the new thing, “oh, I can do this and do that and everything,” you end up in a bad position in a short amount of time. Think about people who bet the farm on Docker; Docker is currently not a healthy company, and what is its state going to be in three years? Kubernetes is huge right now, but give it two years and probably the focus will be elsewhere. Do you still want to be editing all of those YAML files?

There’s also something called the complexity budget that you have in your project, and the serious question is: where do you want to spend that complexity budget? Are you going to spend it on your infrastructure, or are you going to spend it on solving the business problem? I don’t have a pat answer, because it depends on what the priorities are. Here’s an example: a person spent three weeks defining a set of Kubernetes scripts and templates that they could reuse, so in the end they click a button and they have an application up, deployable to multiple clouds almost effortlessly. It took them three weeks, and someone doing everything manually would have taken less than a day. The only way the thinking works there is, “I’m a product company, I do a lot of projects, so it’s worth my time to optimise for that particular scenario, because I do it all the time.” But in most cases, is that where you’re actually getting the benefit?

Another thing that is also super important, and that we learned the hard way, is: okay, you have this cool new toy; how does it operate in production? How do you monitor, debug and operate it? And that’s something I think people don’t pay a lot of attention to. Has anyone ever tried to debug the ingress behaviour of Kubernetes, or why you have this particular pattern of errors, or stuff like that? You suddenly have so many different components in the system that have to play together in order for something to work, and when they break, all of the abstraction that you rely on has to be peeled away in order for you to understand what’s going on. It’s an insane level of cost that you get to pay.

Jamie

I agree, I agree completely. And the worst thing you want is, like you were saying: you heavily invest in a technology during an early phase, and then in two years’ time, when you’ve gotten past your MVP status, you’ve put your application out there, it’s making some money, maybe you’ve sold it on, the people who were creating that technology just go out of business, or there’s no support for it, right? You don’t want that, because you need to continue to evolve your project. And I feel like that’s more of a long-term, like you say, maybe an architect decision or a business decision that a lot of developers don’t really naturally fall into. So, a personal story. I have an apprentice, and, yes, we’re recording this in 2022, and I had the apprentice build a WinForms app. Right? Because the goal was not “let’s build a WinForms app”; the goal was “let’s learn a whole bunch of principles.” And they already had loads of experience of building WinForms, because part of the course that they’d been doing has been WinForms; whether it should have been or not is a different discussion, right? But it also meant that, at least at the time of recording, there’s over 20 years of documentation, examples, blog posts and videos that they can fall back on if something doesn’t work. Yes, WinForms has kind of fallen out of vogue, but it’s still supported by .NET 6, and there’s 20-plus years of resources.

Oren

Here’s a great example. WinForms basically takes the Win32 API and exposes it outward, so if you were aware of the Win32 API, you had a very smooth transition to it. Now, let’s say that you actually built applications in WinForms. What does that mean? Well, it means that your application, for a very long time, could run on Linux. It means that you don’t actually care that much about things like having the newest, prettiest UI. Or you could say, yeah, there’s WPF, or there was UWP, and I don’t even remember the names of the other UI frameworks that were pushed by Microsoft just in the past decade. But if you were on WPF, then you were stuck on Windows, and maybe there is Avalonia that might help, and now there is MAUI that might help. But if you want WinForms, it’s there, it works; there’s no hassle in actually building your UI. Maybe you don’t get it to be as pretty as you would like with WinForms, for sure, but the entire system is stable.

Now, here is another important aspect. You mentioned that the company may go out of business, but whenever I have something that is specialised, I also have to consider the manpower requirements. How much is it going to cost me to get someone who is specialised in this technology, if the person I have working on it is not available? They quit, they took a leave of absence, whatever. Do I have anyone else who can debug this Kubernetes CI pipeline? Or is it “oh, he’s off, I don’t know how to do that, no one knows how to do that”? At one point, I remember, I was working for a company that considered some of its data to be ultra-sensitive, so they wanted to encrypt it. And the encryption was done using an algorithm written by one guy, of I have no idea what quality. The problem was that they considered the encryption itself to be secret, so he actually stored the code for the encryption algorithm on a USB thumb drive that he kept in his pocket, and every time he left the building he would put it in his pocket and leave. That was the “secure” way of doing it. Coincidentally, by the way, it was fireproof; I have no idea why. But that was a great example of a particular choice tying you to one person.

At the same time, let’s say that we’re talking about cryptography. Cryptography is a great example of an area where you need to have the expertise, you probably don’t have the expertise, and messing up can have catastrophic consequences. So again, you choose what the wisdom of the crowd says, because that has already been vetted. And in many cases, the boring old technology is the one that you want to use, because it’s predictable; its failure modes are understood. I can take pretty much any WinForms question, put it into Google, and have an answer, versus “I’m using the hot new thing right now”: good luck.

Jamie

Yeah. And so, with this apprentice, if there’s time during their apprenticeship we’re going to migrate to .NET MAUI, but only if there’s time, and if the documentation is up to scratch, right? Because I was talking to a friend a few days before the time of recording, and he said, you know, you’ve got to be really careful, because some of the documentation for .NET MAUI exists, but because it’s evolving so fast, it’s wrong.

Oren

Senior people are used to that: okay, you might need to dig a little bit. One of the things that I love to do is, okay, let’s go and look at how this is actually implemented, so that I understand better what’s going on. But for someone who is starting out, when the documentation is wrong, or there is a documentation bug, well, that sucks, and sometimes it sucks a lot, and they won’t see it. And I think a major difference in terms of experience is just being able to read the error message. I had someone who tried to automate a process that would SSH into a machine and then deploy some stuff, et cetera. And that was really interesting, because it failed. The reason it failed was that it was written on Windows and piped \r\n line endings into bash, and bash didn’t handle that. And there was something in the output that told me, “oh, the problem is here,” and I was able to spot it. But they couldn’t see it.

Jamie

Right, yep. It is a very important skill. I remember, I have a similar story: I was working with someone a number of years ago who was new to .NET. They were flying as they were picking it up, but every time an error happened, whether a compiler error or a runtime error, it was almost like their hands would get thrown up and they’d say, “that’s it, I can’t do anything about this.” And I would say, “what does the error message say?” “Oh, well, you know, it stopped on this line.” “No, no, what does the actual error message actually say?” And like you said, there’s a skill involved in parsing that information. And at least with C# and JavaScript and these sorts of modern, higher-level languages, those error messages are really good, right? I would love to take someone who’s struggling with a .NET error message and give them something that’s written in C and say, “right, ‘it’s broken’ is the error message. Good luck.”

Oren

You know what, here’s a great one. I remember, the year is 2001, I think, and I wrote my first C# program; at the time, I was writing in C++. I wrote something like: a public class Form1, static void Main, and I had a field variable of type Form1 called f. And I did f.Show(), and it died. It didn’t work, and I could not understand what was going on. There was some null reference exception, but, okay, this is alpha or beta code, I don’t care, it’s probably broken. And a few weeks later I realised: wait, I had a conceptual misunderstanding, because I was writing “Form f;” and then “f.Show();”. In C++, that would actually be a stack-allocated value; in C#, it’s actually a reference. So when I actually did “Form f = new Form();” and then “f.Show();”, it worked. And then I went and wrote the same thing in C++ and compared the errors. In .NET, it was a null reference exception on line 23. In C++, it was an access violation, tried to read 0x… good luck. You don’t forget an experience like that. That particular experience, and it’s been 20-something years, is the reason I started working in .NET, because of that sort of behaviour.

Jamie

Yeah, it’s something that I try to impress on juniors, apprentices, mentees: it may not initially be easy, because you’re learning all sorts of other stuff when you’re learning a new programming language, but the error messages are your friends, right? They will tell you almost precisely not just the line, but the character where that error happened. So it’ll say, for a null reference exception, the file name, then line 12, colon, three, meaning the third character, and it’s pointing you exactly to where you made the mistake, or where your misunderstanding is. Like you say, it’s not an easy skill when you’re starting, but it’s one of the most important things, I think, that juniors should spend time on. Not “learn”, but spend time on; I’m not saying you have to learn this, just spend some time with errors: make something break and see what it looks like.

Oren

Another important skill is, when you write your code, throw the right errors. Because if somebody just throws “Exception”, what am I supposed to do with that? What happened? Why did it happen? How can I fix it? Write it out. And in some cases you’ll spend more effort on crafting the error than on the code around it, and it’s worth doing.
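A small, hypothetical illustration of the point: say what failed, why, and what the reader can do about it, rather than throwing a bare exception. The EnsureConfigExists method and configPath are made up for the example.

```csharp
using System;
using System.IO;

static class Startup
{
    public static void EnsureConfigExists(string configPath)
    {
        if (!File.Exists(configPath))
        {
            // Bad:    throw new Exception("error");
            // Better: explain what failed, why, and how to fix it.
            throw new FileNotFoundException(
                $"Could not load configuration file '{configPath}'. " +
                "Check that the file exists and that the service account can read it.",
                configPath);
        }
    }
}
```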

Jamie

Absolutely. No, I agree with you completely. Personal opinion: I feel like the System.Exception class should be abstract, so that you can’t use just the basic one; you have to use a very specific one, right? But that’s my personal opinion, based on, you know, several years of building frameworks and applications, because if you just throw a new Exception, it’s pointless.

Oren

I don’t care that much about the exception type; give me the description. The example is: if you’ve got a C program and you get an error back, “permission denied” or “read only”… which file?

Jamie

Yep, yep.

Oren

You suddenly realise that, in order to get good errors like you have in C#, you have to work at it. OpenSSL is a great example: if you go and look at how they do error handling, they have something sort of like exceptions, but in C; they push items onto an error stack, and then you can pull them back off. Okay, great, that’s wonderful. But that is the baseline you need for proper development over time; otherwise it’s, “no, I have no idea why; go look at the log file, try to correlate the timeframes.”

Jamie

Yeah, I agree. What I will say, though, Oren, is, you know, I’m very respectful of people’s time, right? I know that I’m kind of dropping everyone out of this exciting part about exception design, and maybe we can come back and talk about that another time. But because I’m using up a lot of your time this afternoon, and obviously it’s a big ask for the listeners to listen through a long, very in-depth conversation, and I’ve learned a lot so far: what’s the best way for folks to find out about RavenDB, and maybe a little bit more about yourself? Where can they go to find that information?

Oren

For myself, go to ayende.com. That’s my blog; I’ve been writing there for, well, forever. For RavenDB, it’s ravendb.net; that is the source of everything there. And if you have any questions afterwards, feel free to also email me, and I will be very happy to answer any queries.

Jamie

Excellent. Excellent. Well, like I say, I feel like I’ve interrupted you mid-flow there. But, and this is peeking behind the curtains a little, one of the things about being a podcast host is that you’ve got to be really involved in the conversation and where it’s going, but you also have to be very respectful of your guest’s time, and we’re almost out of time for the slot that we created for this, so I’m just trying to be very respectful of that. So, yeah, what I want to say is: thank you ever so much for being on the show. I say this to everyone, and almost every episode ends with me saying I really appreciate the time and you sharing your knowledge, but I really, honestly do. I feel like there’s loads in this, so far, that folks are going to be able to listen to and go, “oh, wow, yeah, that’s a great idea; that’s a great point.” Or maybe, you know, your points about whether you’re optimising too early. Or, we hinted at it, I think, but we didn’t really spell out making sure that the things you need to optimise for are measurable, because that’s what really matters.

Oren

Just one last thought: if you’re building any sort of system and you’re talking about performance or scale, start by defining an SLA, a service level agreement. A good SLA looks like this: 99.9% of the requests of a particular type will complete in a given timeframe, let’s say 200 milliseconds, as long as the load does not exceed, say, 500 requests per second. Something like that. Because that is actionable; it gives you a way to measure, and you can monitor it. Otherwise you’re just putting a finger in the air to see where you’re at.
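As a minimal sketch of checking the kind of SLA Oren describes, here is a nearest-rank percentile check; the Sla class, the default numbers and the method name are illustrative, and in a real system you would feed it latencies from your metrics pipeline:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Sla
{
    // "99.9% of requests of this type complete within 200 ms", nearest-rank percentile.
    public static bool IsMet(IReadOnlyList<double> latenciesMs,
                             double percentile = 99.9,
                             double thresholdMs = 200)
    {
        if (latenciesMs.Count == 0) return true;

        var sorted = latenciesMs.OrderBy(x => x).ToArray();
        int index = (int)Math.Ceiling(percentile / 100.0 * sorted.Length) - 1;
        return sorted[Math.Clamp(index, 0, sorted.Length - 1)] <= thresholdMs;
    }
}
```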

Jamie

Yeah, I agree completely. And it goes not just for developing apps; let’s say you’re trying to do something else in your personal life, it needs to be measurable too. Because otherwise, if I said, “I want to learn Mandarin Chinese,” right, well, that’s great, but how much of it do you want to learn, in what timeframe, and how are you going to show that you’ve learned that much? That’s way off base for programming, but I feel like it’s the same thing. If you have an SLA, or some goal that you’re going to hit in a specific amount of time, you’ve said the request has to take less than this much time for a specific number of users, then you’re locking it down; you’re saying, “this is what we’re aiming for.”

Oren

There is a whole range of research around gamification of activities, and a lot of it comes down to this: the moment you have measurable goals that are concrete and actionable, you’re going to hit them, or at least you’re going to try, and most people will actually hit them. That makes a world of difference.

Jamie

It does, it does. And I feel like we should leave the listener with that, because that’s a great place to leave it: it gives them something they can action and actually improve their development practice, I think. So, yeah, I said that earlier, and then we had this wonderful conversation about gamification and specific, measurable goals, but I really have enjoyed this conversation, and I feel like there’s loads here for the listener to really get out of it. So I want to thank you, from me, and, I feel a bit presumptuous here, but I want to thank you on behalf of the listeners as well.

Oren

Thank you very much. It’s been a pleasure. Thank you very much.

The above is a machine transcription, as such there may be subtle errors. If you would like to help to fix this transcription, please see this GitHub repository

Wrapping Up

That was my interview with Oren Eini. Be sure to check out the show notes for a bunch of links to some of the stuff that we covered, and a full transcription of the interview. The show notes, as always, can be found at dotnetcore.show, and there will be a link directly to them in your podcatcher.

And don’t forget to spread the word, leave a rating or review on your podcatcher of choice - head over to dotnetcore.show/review for ways to do that - reach out via our contact page, and come back next time for more .NET goodness.

I will see you again real soon. See you later folks.

Follow the show

You can find the show on any of these places