Episode 119 - Comparers with Stephen Cleary

The Modern .NET Show

Episode Transcription

Hello everyone and welcome to THE .NET Core Podcast. An award-winning podcast where we reach into the core of the .NET technology stack and, with the help of the .NET community, present you with the information that you need in order to grok the many moving parts of one of the biggest cross-platform, multi-application frameworks on the planet.

I am your host, Jamie “GaProgMan” Taylor. In this episode, I talked with Stephen Cleary about his Comparers library and how comparison and equality of objects in your code base mean different things to different people. For instance, one block of code may view equality as two different object instances with the same ID field, and a different block of code may view equality as a combination of other properties being equal. It’s all different for different people, for different consumers, right.

We also talk about the importance of unit testing in the comparers library and how writing these unit tests has sort of unearthed some interesting corner cases in the .NET BCL. Along the way, we discuss our opinions and guesswork regarding a potential corner case in the .NET BCL. But please do remember that neither Stephen nor myself actually work for Microsoft or indeed were involved in writing the original BCL. As such, our opinions and guesswork are just that: guesswork and opinions.

So let’s sit back, open up a terminal, type in dotnet new podcast and let the show begin.

Jamie

So, Stephen, welcome to the show. It’s wonderful to be connected with you.

Stephen

It’s great to be here.

Jamie

Thank you very much. Thank you very much. Now, we are going to talk about Comparers and the spaceship operator a little bit today, but I thought before we did that, would you mind giving us a bit of an elevator pitch to Stephen and some of the work that you’ve done or the open source libraries and stuff that you’ve been part of, just so the listeners get a flavour for: Who is Stephen and what does he do?

Stephen

Yeah, sure. So, my name is Stephen Cleary. I live in northern Michigan, in the US. So quite a distance from you, and I have been doing C# for quite a while now. Most of what I’m known for is work with async and await or other asynchronous things. And I also have a few other open source libraries that are not really related to async and await, one of which is the comparison library that we’re talking about today. I have a few others as well, but my AsyncEx library is probably the most well known of the ones that I’ve done.

Jamie

Cool, okay. So I don’t think I’ve come across the AsyncEx library myself. So I guess before we start talking about Comparers and stuff, is that different to async/await in C# and .NET? Or was it like before, back when we had the parallel compute libraries? How does that all work?

Stephen

So, AsyncEx, what most people use it for is actually asynchronous coordination primitives. These days you do have SemaphoreSlim, which has async/await support on it, and you can use that as an asynchronous semaphore, or semaphore, however you say that, or a mutex (mutual exclusion object). And there’s also some other ones in there. So there’s some asynchronous collections. There’s a whole bunch of other coordination primitives like monitors, and I even do some condition variables which I have found occasionally useful, especially when implementing collections. So this was all, again, before channels, but a lot of people still find those types useful even these days.

Jamie

Yeah, I like it. It’s always worth having more tools in the toolbox than you need, right. Because then you can pick not necessarily the right one, but the most correct one. Right?

Stephen

It’s a good way of putting it. Yes. And my library, if you look at something like System.Threading.Channels, it’s extremely efficient and it’s a fantastic library. That’s what I steer people towards these days. But the AsyncEx library is really geared more around correctness and maintainability than efficiency. So there’s not a whole lot of interlocked variables or anything really tricky like that with, like, lock-free solutions. I use a much more basic and easier to maintain kind of approach with my own library because I have to maintain it.

Jamie

Sure. I think there’s definitely something in that. I feel like there’s a lot of people who produce very terse, almost laconic code that no one knows how to read it or what it does, but whatever it does, it does it very fast and uses hardly any memory, but no one knows what it does. Right?

Stephen

Yes. I recently did a video series on YouTube on TCP/IP protocol design, which is something that I learned many years ago, and it’s something you hardly ever use. But I wanted to do a video series on it because .NET had come out with pipelines and things like that, and so I wanted to use pipelines and channels and see what it looked like to build a TCP/IP application using modern techniques. And it was fun, but some of that code was a little confusing. You get into the span constraints and stuff like that. And there were, what were they, like, read-only refs and things like that, that I’d never even used before. But it was fun.

Jamie

I mean, rather you than me. Oh, my goodness. I think that there’s a time and place for the incredibly terse, write-once, read-only code. I think Richard Campbell calls Perl read-only, right: when you’ve written it, you ain’t never going to change it again. And there’s a time and place for that. But most of the apps that most people work on, they need to be debuggable or at least readable, right?

Stephen

Yes, absolutely. The write once is a great way of describing that kind of code. You understand it when you’re writing it and then after that, hopefully you understand it still.

Jamie

Absolutely. I’m a very big proponent of making code readable. I’ve got it right here and the listeners can’t see it, but it’s in Code Complete 2 - a book that I read every couple of years - and there’s one bit that always leaps out at me. I’m not going to quote it directly, but essentially the one thing we need to do as developers is make our code readable for other developers, because the other developer could be you tomorrow, it could be someone else on your team. If you can’t read it, you can’t debug it.

Stephen

Yes. There’s some quote about that. I’m not going to remember it exactly, but essentially, debugging is harder than writing code, so if you write the smartest code you know how to write, then you’re not going to be smart enough to debug it. I love that book too. Code Complete is a fantastic classic.

Jamie

Absolutely. And with modern C# and the just-in-time compiler and all that kind of stuff, it’s constantly reevaluating and re-optimizing your code anyway, so you could spend loads of time ahead of time writing really complex, difficult to debug code. And then what’s going to happen is at runtime, the JITter is going to step in and go, “I’ll make that better for you,” because there are hundreds of people, maybe thousands of people who are, and I’m not going to say this about you, but way smarter than I definitely am. These are like Ph.D people, right? And these are compiler design people. They know their stuff. I’m going to let them do their stuff and I’ll just write hello world, it’s fine.

Stephen

Yes, the tiered optimization is something I just dove into more than I have previously. Just a couple of weeks ago, I knew it was there, but I never really played with it very deeply. But there’s a great blog post by Stephen Toub just recently on Performance Improvements in .NET 7, I believe is his latest one, and he talks about using this fantastic technique that he uses where as a generic argument, they pass a value type that implements a certain interface. And by constraining it to that interface, what they’re actually doing is forcing the compiler to create two different copies of the code, since it’s a value type. And then because it’s constrained to the interface, the JITter as it goes over that code, if it’s called multiple times, is really able to optimize it extremely tightly. So it’s just this fascinating little trick that they were able to do. And I was looking into that more over the last couple of weeks. It’s not the kind of thing I think I would have ever even thought of, but it’s quite interesting.
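As an illustration of the technique Stephen describes, here is a minimal sketch. It is not code from the blog post, and every name in it (`IIntPredicate`, `IsEven`, `IsNegative`, `Counter`) is hypothetical. The part that matters is the `struct` plus interface constraint on the generic argument.

```csharp
using System;

// Hypothetical interface, invented for this sketch.
public interface IIntPredicate
{
    bool Matches(int value);
}

// Value types implementing the interface. The runtime compiles a separate
// specialization of a generic method for each struct type argument,
// so the interface call can be resolved statically and inlined.
public readonly struct IsEven : IIntPredicate
{
    public bool Matches(int value) => (value & 1) == 0;
}

public readonly struct IsNegative : IIntPredicate
{
    public bool Matches(int value) => value < 0;
}

public static class Counter
{
    // The "struct, IIntPredicate" constraint is what enables the trick.
    public static int Count<TPredicate>(ReadOnlySpan<int> values)
        where TPredicate : struct, IIntPredicate
    {
        TPredicate predicate = default; // stateless struct, no allocation
        int count = 0;
        foreach (int value in values)
        {
            if (predicate.Matches(value)) count++;
        }
        return count;
    }
}
```

Calling `Counter.Count<IsEven>(numbers)` and `Counter.Count<IsNegative>(numbers)` gives the JIT two separate method bodies to optimize, one per struct.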

Jamie

Yeah. And the most fun part about that for me is we got all of that for free.

Stephen

Yes.

Jamie

And that just means that when you’re able to migrate to .NET 7, or indeed .NET 8 if you need the long term support. You get all of that out of the box for free. So technically your apps could possibly run faster in the long term or be more efficient, or use less resources or all of these things could happen and you get it free, you don’t have to pay for it. Which is mad.

Stephen

They’re always working on updating the compiler, updating the tiered optimization in the Just-in-Time compiler, and of course the base class library as well. I love the optimization post that they do because every time I think, “wow, that’s amazing, well, I’m sure it’s as good as it’s going to get now.” And then the next version of .NET comes out and it’s like, wow, they did even more.

Jamie

I often, not going to lie, I get about a quarter of the way through those optimization posts and go, “this is just numbers now, I don’t understand it.” There’s a bunch of stuff happening and my code goes in and it comes out better. That’s as far as I get.

Stephen

Just retarget the new framework and your code is faster.

Jamie

Absolutely. That’s it. Maybe they should put that as a tl;dr of every post, “change the TFM, recompile. That is better. Fantastic. Thank you very much. Good night.”

Gosh, okay, so we’re going to talk a little bit about the Comparers library and there’s lots of stuff to cover with this. And for the listeners, I’ve said this lots of times, there’s a bit of inside baseball talk: before we do these interviews, we usually have a bit of a discussion about what we’re going to talk about, and I’ve got loads of notes here and I hope we can get through them all because I feel like it’s quite a detailed topic, which is good. I like this, right?

Stephen

It’s more than it seems. Yes.

Jamie

Okay, so let’s talk about Comparers then. I do have a question much later on about carnage, because Comparers is for comparing things, right?

Stephen

Yes. So my comparers library is geared around being able to define comparison or equality, which is like a subset of comparison on your own object types, that’s what it’s designed to do. Or even other object types. I guess it could be third party.

Jamie

Okay, so say I want to use the comparison library, that’s for equality of two objects. Can I just use ==?

Stephen

So, great question and it really depends on what you want to do first.

If you have an object type, then == by default uses reference equality. So you have an object instance that is really only equal to itself, and any other object instance, even if it has all the same properties and whatever, is going to be not equal to that because they’re two different object instances. So you have a reference equality that’s kind of built into object types, and I actually prefer that. I use that as much as I can everywhere in my code. Now, coming from a C++ background, that was very odd for me to have this kind of reference equality because in C++, references don’t really exist. You’ve got shared pointers and things like that. But really references are not widely adopted. It’s not the natural way to program C++. So when I first started writing in C#, reference equality seemed very odd. But now I prefer it for everything; that’s my default go-to.

But that’s not the only kind of equality that’s useful. Value types, as the name implies with values, they use value equality. So you have like two different integers. Even if they’re boxed, they are going to compare equal only if their values are equal. So these are the two kinds of equality that are built into .NET - or like the record types. Actually that’s one I keep forgetting about. But the record types use like structural equality by default. So they’re equal if all of their properties or fields are equal.
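The three built-in behaviours mentioned here can be seen side by side in a small sketch. The type names (`PersonClass`, `PersonRecord`, `EqualityKinds`) are made up for illustration, and the record syntax assumes C# 9 or later.

```csharp
using System;

// An ordinary class: == and Equals default to reference equality.
public sealed class PersonClass
{
    public PersonClass(string name) => Name = name;
    public string Name { get; }
}

// A record: the compiler generates structural equality over its properties.
public sealed record PersonRecord(string Name);

public static class EqualityKinds
{
    // Two distinct instances with identical properties are not equal.
    public static bool ClassInstancesEqual() =>
        new PersonClass("Ada") == new PersonClass("Ada");   // false

    // Boxed integers compare by value through Equals.
    // (Using == on two object variables would compare references instead.)
    public static bool BoxedIntsEqual()
    {
        object a = 42;
        object b = 42;
        return a.Equals(b);                                  // true
    }

    // Records are equal when all of their properties are equal.
    public static bool RecordsEqual() =>
        new PersonRecord("Ada") == new PersonRecord("Ada"); // true
}
```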

But sometimes it’s really useful to extend this a little differently or define equality or comparison using different rules. So one that’s somewhat common is what I call ID equality or key equality. So you’ve got one property on this object and what you really want to do is, “say if that property, this ID property or key property is equal, then these two objects are equal.” So you’ve got this kind of key equality that’s also pretty common.

And a lot of people really like the structural equality. Again coming from C++, I did at first, but now I tend more towards reference equality. But I understand that desire for structural equality too. And that’s again where all the properties are equal, then that thing is equal. But then sometimes you want structural equality that isn’t quite all the properties. If you have metadata on your types, if you’ve got an e-tag - an entity tag - or a last modified date or something like that, well, you don’t usually want to include that in equality. So really what you want is structural equality except for these fields. And sometimes you want special values to be considered equal to other special values. So if you’ve got nullable integers, sometimes you want null being equal to negative one or like negative one and zero being the same if those are your invalid values.

So there’s all these different ways that you can compare things in different parts of your code. And I’m not saying that these are all a great idea, right, because if you have zero and negative one as invalid values, that’s not ideal. But sometimes you have the code that you’re given and you have to work within those boundaries and sometimes it’s nice to just define your own comparers. The problem is defining your own comparers is quite difficult. So that’s why I wrote the comparers library, is to really make that as easy as possible.

That was a really long way to go about saying first you have to decide what comparison even means to you. And I would even go so far as to say a lot of people, when they think about comparison or equality, they assume that there is an equality or comparison definition that makes sense everywhere. So a lot of people say, “well, I want reference equality everywhere,” or “I want structural comparison everywhere.” But really, I have always found it useful to have this part of the code wants to compare things this way and this other part of the code wants to compare these same objects a different way. And that’s actually really useful in some cases. The comparers library that I have allows defining different comparers. It’s not just an object defining its own comparison. You can define a comparer for other objects. And maybe this comparer object is used by this part of the code and this other comparer using this other logic is used by this other kind of code.
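To make the idea of different comparers for the same type concrete, here is a sketch written against the standard `IEqualityComparer<T>` interface only. It does not use the Comparers library’s own API. The `Customer` type and both comparer classes are hypothetical, and the code assumes a recent .NET (`init` setters and `System.HashCode`).

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical entity with a key, some data, and a metadata field.
public sealed class Customer
{
    public int Id { get; init; }
    public string Name { get; init; } = "";
    public DateTime LastModified { get; init; } // metadata, ignored by both comparers
}

// "Key equality": two customers are equal when their Id matches.
public sealed class CustomerIdComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer? x, Customer? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Customer obj) => obj.Id.GetHashCode();
}

// "Structural equality except metadata": every field except LastModified.
public sealed class CustomerContentComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer? x, Customer? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id && x.Name == y.Name;
    }

    // Hash exactly the fields that Equals compares.
    public int GetHashCode(Customer obj) => HashCode.Combine(obj.Id, obj.Name);
}
```

One part of the code could build a `HashSet<Customer>` with `new CustomerIdComparer()` while another uses `new CustomerContentComparer()`, and `Customer` itself keeps its default reference equality.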

So that’s the whole goal of the library is to make that as easy as possible.

Jamie

I see. So maybe in my head I’m thinking, and you can use Comparers in whatever way you want to build, but if I’m working with domain-driven design and I have a bounded context over here and my comparison over here. Like you said, in that bounded context, my comparison is, “these three fields need to be equal, and this one needs to not be equal.” But in a different bounded context with the exact same perhaps models, it’s a different set of fields, or maybe equality means something completely different in that other bounded context. Right?

Stephen

Yes, absolutely. And even things like object-relational mappers will have different assumptions around that. Entity Framework, which most people use, kind of sidesteps it all by saying, “well, if you have objects with the same key, they are actually the same object instance.” And there’s some name for that, I forget, reference identity or something like that, that they use to enforce that. So as long as you have one DbContext and you’re getting those entities out of that DbContext, if it has the same key, then that’s actually literally the same object instance. But not all ORMs follow that. Sometimes I’ve actually seen some homegrown ORMs that use equality to mean, “has anything changed?” So it’s literally comparing all the fields except for the ID field to see if anything’s changed. And there are other ones that use ID equality to say, well, “if these reference the same underlying row, then if the keys are the same, then those are really the same row, even if the values are different.” So there’s many different ways to look at equality. Even if you look at other ORMs - object-relational mappers - sometimes equality is used for things that can be a little surprising. And just with Entity Framework, for example, you are not supposed to override equality on your entities because Entity Framework uses reference equality and it depends on that. So if you want to compare them, you need a different comparer, which is exactly what my library provides.

Jamie

It’s not in any way the same, but I remember when I first started doing C# development, it was really my first proper development. Before then, I’d done like, homegrown stuff. I’d be like, “oh, I’ve got a C++ stock and I will learn the C++.” Or before that, it was a little bit of assembler. So I was like, I will do the assembler stuff. Because when I was at college, so college over here in the UK, 16 to 18 years old, I did electronics and we had a Motorola 68k. This isn’t really that long ago, but the board was brilliant. It’s called a Flight 68k. It’s brilliant. Motorola 68k with about two, I want to say two megabytes of RAM and all that kind of stuff. You had to boot it. You had to drop out of Windows and boot into a DOS IDE to be able to program for it. It was brilliant. And it used a parallel port and everything. Amazing, it was.

But coming from that background, when I hit C# I was like, yeah, but ==, it means is it equal? And this object is not equal to that. And I used to fall over on that all the time when I was first studying until, like you said, until somebody said, no, those are references. Think of it similar to, but not the same as references in C because they’re referencing, like a memory address. I get it now, but no, it wasn’t written anywhere in the C# documentation or at least the training stuff that I was going through, that a string is immutable and it is a value type and it lives up here, or either one of these or all of those, and reference type and stuff like that. I was like, “oh, well, I don’t understand it. Why is this object not the same as that one? It should be the same because it’s not.” So, yeah, I think comparison just means different things to different people regardless, right. Regardless of the domain you’re working in and regardless of the language and the tools you’re using.

Stephen

Absolutely. Yes. It means different things to different people.

Jamie

Absolutely. So is that where the idea came from then? You were just sitting there working with some objects one day and were like, “I wish I could just compare these things in this way without having to override == and write a whole bunch of C# that does that equals?”

Stephen

I don’t actually remember what I was working on when I started this project, but I know that I was implementing, I think it was, I don’t remember if it was equality or comparison, less than, greater than. I don’t remember which one it was, but I was doing it for a whole bunch of objects over and over again. And it was key based. You know, I remember that. So it was, you know, “these objects are considered equal if, you know, these two or three keys are considered equal or whatever.” And I was just doing it over and over again and I realized, “okay, I’m writing way too much code to do this because what I want to do is say I compare these two objects using this key.” That’s all I want to define. That’s what I want to do. Nothing terribly complicated. But say an object is doing it itself, right? If you’re defining a natural comparison on that object, then you have to override Object.Equals. You have to override Object.GetHashCode(), which a lot of people forget. You have to implement IEquatable<T> and implement the Equals off of that. And all three of these are essentially using the same logic. And there’s patterns you can do so that you can forward the Object.Equals to IEquatable<T>.Equals. And that’s fine.

But you always end up with duplicate code in your GetHashCode for your Equals. And then if you’re doing operator equals and operator not equals, that’s even more boilerplate code that just forwards to the same code. So it just ended up being a lot of code with a fair number of pitfalls, especially between Equals and GetHashCode. And if you’re doing comparison, you’ve got the CompareTo off of, what is it, IComparable<T>. And then there’s also IComparable without the T, which is also necessary in some cases, especially if these are objects that are going to be used by any user interface because a lot of those use the non-generic interfaces. And at the end of the day, there’s just so much code to get two objects to compare based on one field. It was just kind of ridiculous.
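For a sense of how much boilerplate this is, here is a sketch of a class that defines its own key-based equality by hand. `Order` is a hypothetical type; the pattern of forwarding everything to the strongly typed `Equals` is one common way to do it, not the only one.

```csharp
#nullable enable
using System;

// Hypothetical type whose "natural" equality is its Id.
public sealed class Order : IEquatable<Order>
{
    public Order(int id, string description)
    {
        Id = id;
        Description = description;
    }

    public int Id { get; }
    public string Description { get; }

    // 1. IEquatable<T>.Equals holds the actual logic.
    public bool Equals(Order? other) => other is not null && Id == other.Id;

    // 2. Object.Equals forwards to the typed overload.
    public override bool Equals(object? obj) => Equals(obj as Order);

    // 3. GetHashCode must be built from the same fields as Equals.
    public override int GetHashCode() => Id.GetHashCode();

    // 4. The operators are more forwarding code.
    //    "is null" is a pattern check, so it does not call operator == recursively.
    public static bool operator ==(Order? left, Order? right) =>
        left is null ? right is null : left.Equals(right);

    public static bool operator !=(Order? left, Order? right) => !(left == right);
}
```

That is five members for a rule that amounts to "compare by `Id`", and nothing in the compiler forces members 1 and 3 to agree with each other.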

So that’s where the comparers library was born: I had to do that too many times, so I made it into a library. And now the comparers library, just to give an example, you can say, “I want a comparer builder,” it uses like a builder API, a fluent API, “I want a comparer builder that will compare these types and then order by this key.” And that’s it. And it gives you a comparer that implements IEqualityComparer<T> and an IComparer<T> and a non-generic IEqualityComparer and an IComparer. So it does all of that for you and you don’t have to mess around with a bunch of code.
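The shape of such a comparer object can be sketched with the standard interfaces alone. The `KeyComparer<T, TKey>` class below is a hypothetical, hand-written stand-in; it is not the Comparers library’s builder API or its implementation, and its null ordering (null sorts first) is an assumption of this sketch.

```csharp
#nullable enable
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper: one object that orders and equates T by a selected key.
public sealed class KeyComparer<T, TKey> :
    IComparer<T>, IEqualityComparer<T>, IComparer, IEqualityComparer
    where T : class
{
    private readonly Func<T, TKey> _selectKey;
    private readonly IComparer<TKey> _keyOrder = Comparer<TKey>.Default;
    private readonly IEqualityComparer<TKey> _keyEquality = EqualityComparer<TKey>.Default;

    public KeyComparer(Func<T, TKey> selectKey) => _selectKey = selectKey;

    public int Compare(T? x, T? y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x is null) return -1; // assumption: null sorts before everything
        if (y is null) return 1;
        return _keyOrder.Compare(_selectKey(x), _selectKey(y));
    }

    public bool Equals(T? x, T? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return _keyEquality.Equals(_selectKey(x), _selectKey(y));
    }

    public int GetHashCode(T obj)
    {
        TKey key = _selectKey(obj);
        return key is null ? 0 : _keyEquality.GetHashCode(key);
    }

    // Non-generic interfaces forward to the generic members.
    // The casts throw InvalidCastException for objects that are not a T.
    int IComparer.Compare(object? x, object? y) => Compare((T?)x, (T?)y);
    bool IEqualityComparer.Equals(object? x, object? y) => Equals((T?)x, (T?)y);
    int IEqualityComparer.GetHashCode(object obj) => GetHashCode((T)obj);
}
```

With this in place, `new KeyComparer<string, int>(s => s.Length)` can be passed to `List<string>.Sort`, to a `HashSet<string>` constructor, or to older APIs that only accept the non-generic interfaces.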

Jamie

See, that’s what I really like when someone takes, “hey, this is a pain point for me, I’ll wrap it in a library, right?” Because although I’m not always against Don’t repeat yourself (DRY), but if you’re repeating yourself multiple times and there is no need for the code to be repeated, then totally wrap it in a library and use it elsewhere. But I’m also very wary of that advice because I’m very much a case of, “if the code needs to be duplicated because maybe there’s some nuance and you don’t want to have 1000 if statements to cover that nuance.” Right?

Stephen

Yes, there’s a few pitfalls. You can go in that way too. Right. If you’re trying to make a method too generic and you end up with a long parameter list or a parameter list with booleans, for example, that’s kind of like a flag. It’s not necessarily bad code, but it’s an indication that maybe this method is trying to do two different things if it takes a boolean parameter.

Jamie

Absolutely. Absolutely. On your point about overriding the hash code, I always remember at my first programming job, whenever I needed to do that, you know, I would always be told, “just put, just put, you know, multiply by 13. I’ll do modulo, you know, multiply by 13 or modulo 13, that’ll do, that’s equality.” I’m like, “why do you want me to do that?” “Just do it. Because that’s a short version. That’ll do.” And I feel like you could fall into pitfalls overriding that hash code because you really need to think about what the properties are and maybe the order of precedence. Right. Is the ID field more important or whatever? So you really need to actually design that GetHashCode rather than just, yeah, that’ll do. Number times number equals number, return.

Stephen

Yes, you really do need to design it. I don’t know if that’s the most common pitfall with implementing your own equality, but it’s definitely a very common one. Because, okay, so first thing, it’s not obvious. “I am going to implement equality. I’m going to override Equals. All right, I’m done. Maybe I’ll implement IEquatable, and that’s it.” It’s not obvious that there’s another method called GetHashCode, something completely different that doesn’t sound like it has anything to do with equality, that you actually have to keep in sync with that same equality logic or else different algorithms will break.

So these days, it’s a lot better than it used to be. The IDE will warn you, and I believe both Visual Studio and ReSharper will say, “hey, you did not implement GetHashCode. You gotta do this.” So the IDE is definitely better than it used to be with those kind of warnings. But it doesn’t really give you a lot to go on either because I’ve seen GetHashCode implemented incorrectly as well. And combining the hashes is not entirely straightforward. Although there’s a new type, I don’t know if it’s in .NET 6 or 7. I think it’s in .NET 6, maybe even earlier. There’s a HashCode type that’s specifically for combining hash codes, which yeah, so it’s specifically like, “use this to implement GetHashCode.” That’s the implicit assumption, I think, behind that type.
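For reference, the type being described is `System.HashCode`, which has shipped since .NET Core 2.1. A minimal sketch of using it, with a hypothetical `Address` type:

```csharp
#nullable enable
using System;

// Hypothetical type with structural equality over three fields.
public sealed class Address : IEquatable<Address>
{
    public Address(string street, string city, string postcode)
    {
        Street = street;
        City = city;
        Postcode = postcode;
    }

    public string Street { get; }
    public string City { get; }
    public string Postcode { get; }

    public bool Equals(Address? other) =>
        other is not null
        && Street == other.Street
        && City == other.City
        && Postcode == other.Postcode;

    public override bool Equals(object? obj) => Equals(obj as Address);

    // HashCode.Combine mixes the hashes of exactly the fields Equals compares,
    // replacing hand-rolled "multiply by a prime" arithmetic.
    public override int GetHashCode() => HashCode.Combine(Street, City, Postcode);
}
```

The resulting values are only meant to be used within a single run of the process, so they should not be persisted.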

And my comparers library actually does not use that yet. One reason is my comparers library targets all the way back to .NET Standard 1.0.

Jamie

Right? Yeah. Which doesn’t have all of that lovely, wonderful stuff.

Stephen

No, it doesn’t. That’s actually quite old code, I think .NET Standard 1.0 might even support like, Windows Phone. Like it’s that old?

Jamie

Yeah, I think it does. I think it supports that. I think I’ll look it up in a second, but before I look it up, I want to say it supports as early as .NET 3.5 - .NET Framework, sorry, 3.5 - because 3.5 has, compared to the rest of it, in my head, a very strange lifecycle. Like its support goes into the 2030s and all of the other versions around it have been dropped. But I wonder if that’s because of like embedded stuff, it doesn’t really matter. Right. Microsoft, they can support things however they want. Right.

Stephen

I’m going to guess .NET 4.5.

Jamie

Let’s have a look.

Stephen

3.5 added LINQ and stuff. 4.5 added async and await, which is really a fairly major language change. So I think .NET Standard…

Jamie

You’re absolutely correct. .NET Standard version 1 supports as early as .NET Framework 4.5. So, yeah, you’re right. As of today, as of me reading this.

Stephen

Right. Not that .NET Standard even really matters a whole lot anymore. We’re moving into the one .NET world at this point.

Jamie

Absolutely. We’re finally reaching that point.

Stephen

Yes.

Jamie

Excellent. Okay, so I have in my notes that other languages have a thing called the spaceship operator. And I remember that that was because when we first got chatting, we were actually discussing the spaceship operator and I was like, “I’ve never heard of this. This is amazing. I’m going to go look this up.” And for the folks listening, a spaceship operator, if you imagine it typed into your IDE, it’s a less than symbol, an equal symbol, and then a greater than symbol (i.e. <=>) because it looks like a spaceship. I love it, but C# doesn’t have that at the moment.

Stephen

No, it does not. I actually looked it up a few days ago when we were preparing for this talk and apparently one of the language design meetings had a quote. If you go and search in the C# language repo, I just thought it was a great quote. So I was searching in there for spaceship operator, you know, see if anybody was really even thinking about it. But there’s this quote from earlier, early 2022, where in one of their meetings, one of the members said, it’s not attributed, so I don’t know who said it, but one of the members said

My new goal is to be able to say C# 11 has a ‘spaceship operator’. I don’t care what it does

- (not attributed), C# repo; C# Language Design Meeting for January 24th, 2022

And then in one of the discussions, somebody recommended that it be used as like a forwarding operator. So if you’ve got like an interface or something, then you can say, “I want to implement this interface using, you know, forward all calls for this interface to this contained object,” which I would really love, actually. I would love some kind of forwarding syntax in C#.

Jamie

I can see why they would have chosen the spaceship operator because it’s like, “beam them up, transfer them over here and drop them down,” right. Or am I thinking too literal about this?

Stephen

Yeah, I love the spaceship. So for my comparers library, the logo is actually like literally a spaceship because it’s kind of, sort of, if you squint, a way of implementing a spaceship operator. So it’s implementing, not using; it’s on the implementation side. But yeah, looking at the spaceship operator in other languages, like not as a forwarding operator, I don’t care what it does, but in like C++. And I think Perl even has it. I’m not experienced in Perl at all, but I believe Perl has it and C++ and I’m sure several other languages.

So if you use the spaceship operator, what you’re really doing is saying, “let me know if this is less than, equal to, or greater than.” Hence the, you know, the keyboard, the spaceship that it looks like, “let me know how this object compares to another object,” with one line of code. And we kind of have that today in C# .NET, if you just call .CompareTo(), so A.CompareTo(B) gives you less than zero, zero, or greater than zero, which is usually what the spaceship operator does in these other languages. Now, implementing the spaceship operator is where things get more interesting.

So in some languages - like I was looking at this in C++ a bit ago. I am way out of date on C++, just to be clear. That was my first programming language. But these days, oh, the language has gone so far beyond me. But I try to read some things and their spaceship operator, if you implement it, then by default, it also implements all the other relational operators. So you can say, “you know, I’m going to define equality or comparison for my type based off of,” I don’t want to say an ID field because that’s a bad example to use for comparison. You compare identifiers by equality, not by less than. That’s not useful. But if you were comparing people by name or something like that to sort people by their name, then that’s the kind of thing where you could say, “I’m defining comparison on my person to be based on their name.” And that’s like where the spaceship operator implementation can come in.
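In today’s C#, the nearest equivalent is to write `CompareTo` once and forward each relational operator to it by hand. The sketch below uses a hypothetical `Person` type compared by name; choosing an ordinal string comparison is an assumption of the example, not something from the conversation.

```csharp
#nullable enable
using System;

// Hypothetical type ordered by Name.
public sealed class Person : IComparable<Person>, IComparable
{
    public Person(string name) => Name = name;
    public string Name { get; }

    // The single place where ordering is defined:
    // negative, zero, or positive, like a spaceship operator result.
    public int CompareTo(Person? other) =>
        other is null ? 1 : string.CompareOrdinal(Name, other.Name);

    // Non-generic version, needed by some older APIs.
    int IComparable.CompareTo(object? obj)
    {
        if (obj is null) return 1;
        if (obj is Person other) return CompareTo(other);
        throw new ArgumentException("Object is not a Person.", nameof(obj));
    }

    // C# does not derive these from CompareTo, so each one is written out.
    public static bool operator <(Person left, Person right) => left.CompareTo(right) < 0;
    public static bool operator >(Person left, Person right) => left.CompareTo(right) > 0;
    public static bool operator <=(Person left, Person right) => left.CompareTo(right) <= 0;
    public static bool operator >=(Person left, Person right) => left.CompareTo(right) >= 0;
}
```

A full implementation would normally also override `Equals` and `GetHashCode` so that equality agrees with the ordering; that is left out here to keep the focus on comparison.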

Jamie

So I guess then in an ideal world, we don’t really need a spaceship operator in C# because we got .CompareTo() and we got the comparers library, right? Or do we?

Stephen

Well, so I have a bit of an interesting history with several of my libraries: I’ve written them, then the C# language or the BCL itself eventually gets those same things, and then my library is no longer useful, which I am 100% on board with because then I don’t have to maintain it anymore. So there have been several libraries that that’s happened with, and even AsyncEx, if you look, there’s a .NET futures project, which is kind of, if I understand it right, I’m not sure on this, but if I understand it right, I think it’s actual, like, BCL team members essentially, like, playing around with what future parts of the BCL may look like. And one of those things that they have in there is asynchronous coordination primitives, which is almost all what my AsyncEx library is used for. So who knows, in another .NET version or two, maybe AsyncEx is going to be no longer necessary. And maybe in a future .NET version, if we have a good spaceship operator implementation, most of my comparers library wouldn’t be necessary. And I don’t mind that. I’m 100% on board with these ideas entering .NET proper, and then I don’t have to maintain them anymore.

Jamie

Yeah, it’s also a sign of a really good idea as well. Right? Because if the team adopted it, then clearly you were doing something right.

Stephen

That’s a nice way to think of it. Yes. I was just going to say, usually they adopt it without actually having seen mine.

Jamie

That’s why none of my code will ever end up in the .NET base class libraries, because it’s not good.

Stephen

So that’s not true.

Jamie

It’s very kind of you to say.

Okay, so, yeah, we’ve talked about how the comparers library solves a certain problem and that comparison doesn’t mean the same thing to everyone. And we’ve discussed a little bit about the spaceship operator and how if they bring the spaceship operator in, maybe comparers can perhaps be retired. Or maybe it can just live on as an alternate way to do things. Because like we said earlier on, it’s always good to have multiple tools in the toolbox because two different screwdrivers will drive screws - you are just applying rotational force to it. But a Phillips head screwdriver cannot work with a Torx head screw. Right. So you need the different tools anyway, because there’s always going to be some small difference or some semantic difference that you can do easier with one library than you can with another. Right. So long live comparers, is what I say.

Stephen

I’m sure it will continue to live if only because I’ve taken one interesting difference in opinion from the BCL authors or the .NET team, and that’s actually around how null is handled. So yes, we can talk more about that if we want.

Jamie

Absolutely, let’s do it.

Stephen

Okay.

Jamie

All right. I feel like null is known as the billion dollar mistake, but everything in memory has to be set to something, right? And I’m happy with null being a thing. I’m happy with it. A lot of people aren’t, but I’m happy with it.

Stephen

It’s useful. Yes.

If you look at the design of the .NET BCL, and especially around how equality and comparison works, it’s not obvious at the beginning, I don’t think, but it becomes pretty clear that null is treated literally as no value. And that’s not something that I realized when I started writing my comparers library. So, for example, one of the constraints around Equals and GetHashCode is if you pass two equal objects into Equals, then you pass both of those objects into GetHashCode, they have to have the same hash. That’s a constraint, it’s a requirement on how you implement GetHashCode. So the question is, when you pass two different things to Equals, what is actually a value? And it comes down to this kind of semantic thing, and I read the documentation as saying one thing and the .NET team read the documentation as saying something else. So there’s some interesting history here. So if you call .Equals and you pass null and null, that is actually allowed. It’s right there in the documentation of Equals that null has to be equal to null. And if you pass null into GetHashCode - okay, so we’re going to follow that rule of if we pass two things into Equals, then if we pass them into GetHashCode, they have to have the same hash code. Well, if you pass null into GetHashCode, that’s actually invalid. So it actually has undefined behaviour.

And the reason for this is I was seeing this as, well, if you pass two things into Equals, then you should be able to pass them into GetHashCode. The .NET Core team was seeing it as if you pass two values into Equals, then you pass those values into GetHashCode, then they should be the same. But null is not a value. And I was saying, well, “null kind of is a value because you can assign it to a variable, you can pass it as an argument. So it is a value even though it kind of means missing value,” right. From the .NET team’s perspective, calling Equals and passing two nulls to it is allowed and they’re always compared as equal. But if you pass null to GetHashCode, that’s invalid. Now, some of the comparers in the .NET BCL will return a value. If you pass null to GetHashCode, it will return like some value that it always returns for null. It’s like hard coded. Other comparers in the .NET base class library will throw an exception. So that’s kind of like this interesting edge case behaviour of something that I didn’t really pick up on until I was fairly well along. I’d already written my comparers library, assuming that null was a perfectly valid value. And then when the .NET BCL was updated with all the nullable annotations, “you know, this can be null, this can’t be null.”

When all of that happened a while ago, then my comparison library started having a lot of trouble upgrading to nullable because I was allowing null into GetHashCode everywhere and really the underlying implementation was very dependent on doing that. So yeah, that’s one of the interesting edge cases and one of the digressions. I thought about it and I decided, “I’m going to stay allowing null.” So in the .NET BCL, if you compare nulls, null is always less than everything else. With my comparers library, you can say null is equal to something else if you want, or you can compare null as larger than everything else. So if you have like a sorted list, all your nulls end up at the end instead of the beginning, that kind of thing. Sometimes it’s just convenient to toss the nulls at the end or to compare them as equal to zero if that’s your invalid value or something like that. And the comparers library allows that. I doubt if a future spaceship operator in C# would allow that because of the historical treatment by the .NET team of null as not a value; it can’t really be compared to anything.
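To make the "nulls first versus nulls last" difference concrete, here is a small sketch that uses only BCL types. It does not show the comparers library's API; the `NullOrdering` class and its `NullsLast` and `Sorted` helpers are hypothetical names introduced for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class NullOrdering
{
    // Hypothetical helper: wraps an existing comparer so that nulls sort last
    // instead of first (the BCL convention).
    public static IComparer<string?> NullsLast(IComparer<string?> inner) =>
        Comparer<string?>.Create((x, y) =>
        {
            if (x is null) return y is null ? 0 : 1; // null is greater than any value
            if (y is null) return -1;                // any value is less than null
            return inner.Compare(x, y);              // both non-null: defer to inner
        });

    // Returns a sorted copy so the caller's array is left untouched.
    public static string?[] Sorted(string?[] items, IComparer<string?> comparer)
    {
        var copy = (string?[])items.Clone();
        Array.Sort(copy, comparer);
        return copy;
    }
}
```

Sorting `{ "b", null, "a" }` with `StringComparer.Ordinal` puts the null first, while wrapping the same comparer in `NullsLast` moves it to the end.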

Jamie

Right, see that’s really interesting. Like part of me is thinking that, I mean, we wouldn’t know, right? We’re just waxing ideas here, right? I think that’s probably the wrong way to put that. But I wonder if that’s a combination of everybody forgot at some point whilst implementing different comparers or parts of the BCL, or maybe that’s one part has evolved whilst another part has stayed - I don’t want to say “stagnant”, but this would have been early on in the .NET BCL’s lifetime, right? So this would have been pre 1.0, but I wonder with everyone working on it so rapidly, I wonder whether just a decision has gone one way and a different decision has gone a different way and we just have to deal with this undefined behaviour. Like you say, for GetHashCode.

Stephen

You know, I have no idea on the history, I’ll just flat out say that. But I suspect that it comes from the old like pre-generic world. So .NET 1.0 - before I ever used .NET, I don’t think I even started using it until 3.5, somewhere in there. But long before I used .NET. Before generics were there. I think that the original idea around GetHashCode not accepting null was because they made a decision that your dictionary keys cannot be null. And I think that that’s really the root of all this because GetHashCode is mainly used by dictionary and other hashset kind of containers. I think that that’s where that’s originally coming from, is they wrote a dictionary, they didn’t want to deal with null keys. And so to them, if your key doesn’t have a value, does it make sense that it’s even in the dictionary? I can understand that.

Jamie

Right.

Stephen

So they didn’t support null keys and then this whole idea of GetHashCode not supporting null kind of just grew out of that. That’s my assumption, but I really don’t know for sure.

Jamie

Right. And the usual standard disclaimer that we throw out for these episodes is that’s just us talking and we don’t know, right?

Stephen

Right.

Jamie

We weren’t there, we weren’t part of that decision. A decision was made and now we’ve all got to deal with it.

Stephen

Yes, and who knows at this point if anybody would even remember or if it was even written down, they were moving so fast back in those days.

Jamie

Totally.

Stephen

Who even…

Jamie

The reasoning behind your guess makes perfect sense. Because if you think of a dictionary as a physical object, i.e. a dictionary for a language, you’re not going to have an entry where the word exists, but the definition - sorry, where the word doesn’t exist, but the definition does. You don’t have a definition for a word that doesn’t exist. Right, so I totally get that. Whereas you can have a word without a definition, that makes sense. You can use a word without having a defined meaning, but you can’t have a defined meaning for a word that doesn’t exist.

Stephen

Yes.

Jamie

I guess you can because people use the term schadenfreude, but you know…

Stephen

Great way of putting it. Yes, the physical dictionary.

Jamie

Cool. Okay, so that was one of the things I was going to ask you about with the comparers library. How does it deal with nullables? Is it just kind of the same as like as your non-nullable type so you can say, “hey, when I pass you a nullable foo, it is equal to a nullable bar if both of them are null,” right? Is that kind of just how it works? Or does it get really complex around that?

Stephen

So you can make the logic as complex as you want with the comparison library, you can treat nulls specially. However, by default it will do the .NET style comparison of nulls: “they’re less than everything else, they’re only equal to each other.” Because that’s a very useful default behaviour.

But the comparers library does allow you to override that. You have to pass a special parameter to whatever method you’re using to say, “well, actually I want special null handling here.” Because with the comparers library you would define a comparison, say, a key comparison as we’re comparing people. You would say, I want a comparer builder for people, .OrderBy, and then you would pass a lambda into that. Just x => x.Name. Well, most usage of that, people would be surprised if x was allowed to be null in that, even if they were comparing nulls. So by default it does the null comparison logic before it even calls into that lambda. So the x is known not to be null. But you can also pass a, “hey, I want special null handling here.” And then you could say x => x != null, for example, or x => x == null. If you want to compare, say, like nullable integers to go back to your example, if you want nullable integers, but you want all the nulls at the end of the list, you want to sort this list and have all the nulls at the end, you can create a key comparer that goes to x => x == null. And what that does is it really just compares true and false, and true is always considered greater than false. So if it’s null, then they go to the end and you have to pass the special null comparison flag to say, “hey, I actually want nulls in this lambda.”
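The "sort by x => x == null" trick can be sketched with plain LINQ, which is used here as a stand-in and is not the comparers library's own builder API. The `NullableSorting` class and `NullsAtEnd` method are hypothetical names for this example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullableSorting
{
    // Sorts nullable integers so that all nulls end up at the end of the list.
    public static List<int?> NullsAtEnd(IEnumerable<int?> source) =>
        source
            .OrderBy(x => x == null) // key is a bool: false (has a value) sorts before true (null)
            .ThenBy(x => x)          // then the usual ordering among the real values
            .ToList();
}
```

For the input `3, null, 1, null, 2` this yields `1, 2, 3, null, null`. The first key only ever compares true and false, which is the same idea as the key comparer described above.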

Jamie

Right, okay. So it’s kind of built in and supports them. Ok.

Stephen

Yes, it works just fine, but also I strive for like a reasonable default behaviour. But if you want to do something fancy with nulls, you can opt into that.

Jamie

So I can imagine writing tests for this whole thing if you’re allowing anything to be compared in any way possible. I can imagine there are about a million lines of tests for each thing that you could possibly set up, right?

Stephen

There are not a million. I didn’t actually count the lines of tests, but there are more than 1,500 tests currently.

Jamie

Wow.

Stephen

Yes. Now a lot of those are tests that essentially test different kinds of comparers in the same scenarios. So they’re kind of like templated tests. It’s been a while since I looked at that test code, but I use this really funky, like, xunit style of here path and all these different things, but I’m actually passing in like lambdas and things like that as my test data in order to - it’s pretty complicated, the test code. But the end result is there’s over 1,500 tests and they test, as far as I know, every possible edge condition, because there are many. This is by far, by the way, the most tests out of any of my open source libraries. Comparers has more tests than anything else. And it didn’t start out that way.

It started out as, “oh, I’m just going to implement this comparison library and I’m going to do this and it’s going to work.” And I had some basic tests and that was it. And then people started using the library in unusual ways and then I realized, “oh, I’m not covering all my edge conditions.” And so a few major versions ago, I think, around Comparers version 4 or 5. It’s currently on version 6, major version 6. And I had to go through all these endless edge conditions just finding all kinds of stuff and yeah, yeah. So that’s when I got tons and tons of tests into this library to handle all this stuff and fixed all of the little tiny, weird edge condition bugs.

So to give an example, if you have an equality comparer for integers, let’s say, and Comparers supports non-generic interfaces as well, which again is used by a lot of user interface or like very generic kind of code. So it also implements IEqualityComparer without the T, without the generic argument. So what if somebody casts it to that interface and then creates a new object instance, that has nothing to do with integers at all, creates a new object instance and passes it to the Equals method on this equality comparer that’s designed to compare integers, right? A value type. What’s even the most reasonable thing to do in that situation? It’s crazy, but there are like occasional situations where things like that will happen, like usually deep in some library code that’s using this as an IEqualityComparer.

And so I had to decide how we are going to handle this and all these different kinds of edge cases. And I found some interesting framework behaviour where they handle the edge cases differently in different comparers. Like I mentioned with GetHashCode and null: if you do a StringComparer.Ordinal.GetHashCode(null), it will throw. But if you take EqualityComparer<string>.Default and call GetHashCode on that, which is really the same comparer, right - the default comparer for strings is an ordinal comparison - but if you call GetHashCode on the equality comparer default, it does not throw, right?
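The inconsistency described here can be checked with a few lines against the BCL. This is a minimal sketch; the `NullHashDemo` class and its method names are hypothetical, and the behaviour noted in the comments is what the documented `ArgumentNullException` on `StringComparer.GetHashCode` and the interview itself describe.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class NullHashDemo
{
    // StringComparer.Ordinal rejects null in GetHashCode.
    public static bool OrdinalComparerThrowsOnNull()
    {
        string? missing = null;
        try
        {
            StringComparer.Ordinal.GetHashCode(missing!);
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }

    // EqualityComparer<string>.Default accepts null and returns a fixed hash for it.
    public static int DefaultComparerHashOfNull()
    {
        string? missing = null;
        return EqualityComparer<string>.Default.GetHashCode(missing!);
    }
}
```

Both comparers agree that two nulls are equal, so the difference only shows up when the hash code is requested.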

Yeah, there’s some fun edge cases around nulls. And what if you have a comparer for a value type and somebody passes objects to it? What if they’re reference equatable? Should it return true or should it throw an exception? So as of version 4 or 5 or whatever - we’re currently on 6 - the comparers library has consistent rules for how it handles all these weird edge cases so that the behaviour is very predictable. And I love that. Although it did require - again, this library has more major revisions, I think, than any other library I’ve written. So it’s on version 6. That means it had 6, well, 5 backwards incompatible breaking changes in the API, some of which were necessary to fix some of these corner edge cases. But yeah, it just ended up being a lot more complex than I thought it would be going into it. You wouldn’t think it would really be that complicated.
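To show why the non-generic interface forces a decision, here is a sketch of an integer equality comparer that also implements `System.Collections.IEqualityComparer`. The `IntEqualityComparer` type is hypothetical, and throwing for foreign objects is just one possible policy chosen for the example; it is not claimed to be the rule the comparers library settled on.

```csharp
#nullable enable
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical comparer for illustration only.
public sealed class IntEqualityComparer : IEqualityComparer<int>, IEqualityComparer
{
    public bool Equals(int x, int y) => x == y;

    public int GetHashCode(int obj) => obj;

    // Non-generic entry point: callers can pass anything here, including null.
    bool IEqualityComparer.Equals(object? x, object? y)
    {
        if (x is null || y is null) return x is null && y is null; // null only equals null
        if (x is int a && y is int b) return Equals(a, b);
        // One possible policy for objects that are not integers: refuse rather than guess.
        throw new ArgumentException("This comparer only compares Int32 values.");
    }

    int IEqualityComparer.GetHashCode(object? obj) => obj switch
    {
        null => 0, // assumption for this sketch: treat null as hashable with a fixed value
        int value => GetHashCode(value),
        _ => throw new ArgumentException("This comparer only hashes Int32 values."),
    };
}
```

Returning false for a foreign object instead of throwing would be an equally defensible choice. What matters is that the same rule is applied by every comparer, so the behaviour stays predictable.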

Jamie

Yeah, like we started this conversation and well, okay, we started the behind the scenes conversation and I was like, “okay, quite literally, why would I use this versus ==?” And you’re absolutely right, it all depends on what the comparison means and about those edge cases, because those edge cases define what your comparison is and it then perhaps allows the developers, the designers, the implementers of the code or whatever, the chance to actually sit and think, “what does this actually mean? Maybe I have to go back to the domain expert. What does it mean if this set of circumstances leads to this value?” Rather than it being something that is a bug that’s caught in live. You can then, because you’re using comparers, you could wrap it in a unit test and say, “right, okay, everyone, look, here’s my unit test. What am I asserting here? What should be correct behaviour?” And then you get the sign off and then that becomes a regression test. And then somebody passes in a null or somebody passes in the word Jeff, and it’s supposed to pass, but it fails. Well, guess what? You’ve now got your regression test you can fall back on. I like it. It’s a lot better than just using ==.

Excellent. Okay, so talking about comparers and perhaps about yourself as well, are there any links that you would suggest people check out to learn more about comparers and the other libraries you’ve done? And maybe different people do different things. You mentioned earlier on you were doing a YouTube video. So this is essentially a chance for you to, “let’s do the promo stuff and talk about me,” and totally please do that. And what I’ll do for the people listening is whatever you mention, it could be about you. It can be about, “hey, I want to learn more about comparison in .NET,” whatever it is. I’ll get a bunch of links and I’ll throw them into the show notes. So whatever you’re about to say now, this part of the transcription will be full of those blue underlined words. Go ahead and click it.

Stephen

Okay, well, obviously the comparison library itself. So I’m StephenCleary on GitHub - with a ph. And let’s see. So I’ve got a blog, stephencleary.com. I actually have not written anything on comparison yet. I’ve been planning to write this forever. I’ve got all the notes and everything all written up, but it’s such a huge topic. I can go on, and on, and on about comparison and all the different things, all the different ways it can be done and the different ways you can implement it. And trying to get that into a blog post that’s also useful is surprisingly difficult.

Jamie

Yes.

Stephen

I don’t actually have at the point of this recording, I don’t have any blog posts up on that yet. Maybe I will by the time this gets published. We’ll see.

Jamie

I wonder if - and this may make more work for you. I’m not holding you to this at all. Perhaps a book, if a blog post isn’t going to work.

Stephen

A whole book just on comparison, it seems like real overkill there.

Jamie

Like I said, I’m not trying to make extra work for you, but maybe I don’t know.

Stephen

I’m sure you could. To be honest, if you get into much of the comparison, there’s some good articles online, and I don’t know any of the specific ones off the top of my head, but there’s a difference between mathematical equality and equivalence, and some of that plays into some of our comparison when we start looking at what we want equality to mean. So there’s equality versus equivalence in mathematics, and then in .NET, we have this additional concept of identity - so like reference equality. So you’ve got equality and equivalence and identity, and really most of the time what we’re doing is equivalence, mathematically speaking. But then there’s endless rabbit holes you can go down like, “what exactly is a partial sort as opposed to a total sort? And which things in .NET actually assume a total sort and which ones don’t?” I think C++ is better defined there, that they have the notion of whatever it is, the “strict weak ordering”, which you have to wrap your head around, sure, but it’s very specific. .NET is a little bit more hand wavy, say, well, we’re going to compare things, whatever that means. Sometimes it doesn’t make sense to do like a less than or comparison. Sometimes you can only define equality, and the comparison library supports that. But anyway, there’s just so many rabbit holes you can go down with that. And it’s not really something I understand completely. Every time I have to look up, “okay, what is a partial ordering again? What’s a strict weak ordering?” I never remember. I don’t know if I’m really the right person to write that. If it was a book, I guess.
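One rough way to picture identity, equality, and equivalence in .NET terms is to ask three different questions about the same pair of strings. This is an illustrative sketch, not a formal mapping of the mathematical terms; the `EqualityKinds` class is hypothetical, and using a case-insensitive comparison as the example of a looser equivalence is an assumption made for the demo.

```csharp
#nullable enable
using System;

public static class EqualityKinds
{
    // Asks three different "are these the same?" questions about one pair of strings.
    public static (bool Identity, bool Equality, bool Equivalence) Compare(string a, string b) =>
        (
            object.ReferenceEquals(a, b),                          // identity: the very same instance
            string.Equals(a, b, StringComparison.Ordinal),         // equality: same characters
            string.Equals(a, b, StringComparison.OrdinalIgnoreCase) // a looser equivalence: same ignoring case
        );
}
```

For `"jeff"` and `"JEFF"` only the last answer is true. For `"jeff"` and a freshly built copy such as `new string(new[] { 'j', 'e', 'f', 'f' })`, the last two are true but identity is still false.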

Jamie

Okay, well, like I said, I wasn’t trying to make any extra work for you, but I was thinking it’s bigger than a blog post, maybe a series of videos, I don’t know. Like I said, I’m not trying to make work for you.

There you go. Excellent. Okay. Yeah. Have you any other links? Maybe people can reach out to you on Twitter if you’re on there or something?

Stephen

Yes, I’m on Twitter sometimes. I tend to treat it as a publish only platform because it can be a little toxic. But you could reach out to me on Twitter. I will see it sooner or later. I’m @aSteveCleary. So that’s my Twitter handle. I have a YouTube channel, which I try to do once a week or so, although it’s been quite a while since I’ve done it, where I just code whatever, some open source stuff and I… yeah, we’ll see if that continues. I don’t know. It’s not something I’ve been very consistent on so far, but I’ve got a YouTube channel: Cleary Coding. I don’t remember; it’s something like that. I’ll put a link to it on my blog because I don’t think there’s even a link there now.

Jamie

Yeah, and I’ll get a link to it in the show notes. So if you’re interested in finding out more about Stephen, click all of those links and I would say give the comparers library a shot because it is. My goodness, it’s so good.

Stephen

I hope it makes your life better.

Jamie

It certainly did for me. That was something I did a few weeks ago as I brought it into a piece of code for a client and I was like, “okay, right, you know all these hundreds of equality classes you’re writing? Yeah. You’re sorted, let’s get on with this.” Because at the end of the day, we’re not getting paid by the line of code, we’re getting paid for implementing features. So let’s implement the feature faster. Right?

Stephen

That is the goal.

Jamie

Well, thank you very much, Stephen, for being on the show. I really appreciate it. And there’s loads of stuff that I’ve definitely picked up on, not just the corner cases in .NET, but like the fact that equality is different to different people, or rather not equality, I think comparison is a better phrase to use. The comparison of two things in memory is different to different people. It will mean different things to different people.

Stephen

Yes.

Jamie

And yeah, having all of those different.

Stephen

I was just going to say even different parts of the code. The same person writing two different parts of the code. And sometimes it’s really nice to have two different meanings of comparison.

Jamie

Absolutely. And I really liked this idea of having multiple different tools in the toolbox, that maybe a library is already partially supported by the framework or whatever that you’re using, but maybe the library does things that the framework doesn’t, keep them both around, because there’ll be something in both that you will use. You’ll be like, it’s the same argument with Newtonsoft.Json and System.Text.Json, right?

Stephen

Yes.

Jamie

One is heavy duty. It’s the comparers of JSON. Right? It does all the things really well and the other one, it does them quite well. But there are some edge cases that don’t work - that’s your == in this instance. In this metaphor, that’s your ==, that’s System.Text.Json. Newtonsoft.Json is the comparers of JSON libraries, is how we’ll put it.

Stephen

It’s a good comparison.

Jamie

And that may mean different things to different people. Anyway, I’ll call it quits on this one just because I’m going to end up running out of things to say. But like I said, Stephen, thank you ever so much. I’ve really appreciated it and I’m sure the listeners will do.

Stephen

Thank you for having me, Jamie.

Jamie

You’re very welcome.

Wrapping Up

That was my interview with Stephen Cleary. Be sure to check out the show notes for a bunch of links to some of the stuff that we covered, and full transcription of the interview. The show notes, as always, can be found at dotnetcore.show, and there will be a link directly to them in your podcatcher.

And don’t forget to spread the word, leave a rating or review on your podcatcher of choice - head over to dotnetcore.show/review for ways to do that - reach out via our contact page, and come back next time for more .NET goodness.

I will see you again real soon. See you later folks.
