The Modern .NET Show

S06E23 - Generative AI for .NET Developers with Amit Bahree

Sponsors

Support for this episode of The Modern .NET Show comes from the following sponsors. Please take a moment to learn more about their products and services:

Please also see the full sponsor message(s) in the episode transcription for more details of their products and services, and offers exclusive to listeners of The Modern .NET Show.

Thank you to the sponsors for supporting the show.


Episode Summary

In this episode, Jamie and Amit discuss the impact of Generative AI on .NET developers and the importance of understanding how it works. Amit shares insights into the various ways AI is used by search engines like Bing and Google. He emphasizes that with AI-powered tools such as GitHub Copilot, developers have more control over the output compared to using search engines.

Amit highlights the benefits of having access to information about what the AI is doing when using copilots, allowing developers to make informed decisions on what is right or wrong for their projects. He also mentions that while Microsoft Azure provides an excellent platform for Generative AI, there are other companies offering similar services.

Amit encourages listeners to check out his book, “Generative AI in Action,” which will be available soon from Manning Publications. He can be reached through social media or his blog (blog.desigeek.com). The episode wraps up with Jamie’s recommendation of the book for .NET developers interested in understanding and utilizing Generative AI.

Episode Transcription

Maybe start with Generative AI. As you, I think, touched on, it's different from what we call "traditional AI." And I also want to acknowledge the term "traditional AI" is very odd to say it's not traditional. It's very much prevalent and relevant and active.

- Amit Bahree

Welcome to The Modern .NET Show! Formerly known as The .NET Core Podcast, we are the go-to podcast for all .NET developers worldwide and I am your host Jamie “GaProgMan” Taylor.

In this episode, Amit Bahree joined us to talk about what generative AI is, what it isn't, and how it's different from, so-called, "traditional AI". He also talks through his new book, "Generative AI in Action," a book that I had the good fortune to read ahead of publication and can definitely recommend.

I also had to ask him the “million dollar” question that has been on a lot of our minds since generative AI seemingly burst onto the scene out of nowhere in 2023:

I’m not asking is it going to replace an engineer, but like, can an engineer for now just ignore it a little bit?

- Jamie Taylor

Yeah, no. So, no, it’s not replacing any engineers, I can tell you that. No.

- Amit Bahree

It's important to note that Amit and I recorded this conversation at the start of May, 2024. Whilst there's very little that we talked about which was time-bound, there have been a few major announcements between the recording and this episode initially going live. Some of these include the release of both GPT-4o, that's ChatGPT's omni-modal model (you'll learn what that means in this episode), and Microsoft's Small Language Model family, Phi-3.

Anyway, let's sit back, open up a terminal, type in dotnet new podcast and we'll dive into the core of Modern .NET.

Jamie : So, Amit, welcome to the show. It has been a fantastic journey getting you on the show and a very, very warm welcome to you. For those who don’t know, Amit and I actually met in person in Seattle to discuss this. So that’s pretty cool.

Amit : Hi, Jamie. Thank you for having me. I’ve been super excited and looking forward to this since we met a few weeks ago now.

Jamie : Oh, my goodness. It's more than a few weeks. It's... yeah, we met sort of mid-March, I think, during MVP Summit.

Amit : Oh, wow. Okay. It has been a while.

Jamie : Yeah, yeah. So for the listeners, we’re recording this on May 3, 2024.

So the other thing as well is I’m actually going to lean on my editor a little bit, see if we can get this out pretty quick, only because the topic that we’re talking about, I feel, is moving on really quickly. Right? And I was hoping. I know you’ve got a book coming out. We can talk about that in a minute. I may ask you to do like an elevator pitch. Just let the folks know a little bit about you. But we’re going to be talking about generative AI and what it kind of means for developers and maybe perhaps a little bit like a super basic entry level what it is and how it works sort of thing, because I know you’ve got the book that’s just come out or will be coming out hopefully in the near future when this episode drops.

So I wonder, before we can get onto any of that, would you mind sort of introducing yourself to the listeners? Maybe a bit of an elevator pitch, that kind of thing?

Amit : Yeah, absolutely. So again, thank you for having me. My name is Amit Bahree. I'm part of the Microsoft engineering team in what we call the AI platform. It's the team that builds all the AI products that Microsoft itself uses, other product teams use, and of course, most importantly, our customers use. I'm responsible for a bunch of engineering on the platform, but at the same time, you know, it takes a village and I'm one of the monkeys in the village. So, you know, that's me at a high level.

Jamie : Amazing. Cool.

Okay, so we will be talking about generative AI. I know that a lot of the AI stuff is done traditionally in Python and other languages, and I know that we’re talking on the .NET stuff, but what I’m hoping is that we can talk about "what kind of things .NET devs might need to know in the coming months with the massive AI wave that’s already happened and is continuing to happen?"

But I guess before we do that, I know you have a new book coming out called "Generative AI in Action", and I was wondering if you could give the listeners a bit of an overview. Let's have a talk about that. So, background for listeners: I've read most of it. I've read an early access version which was very kindly provided to me by Manning. I am going to be buying the book, though, and I recommend you all do too, because it's really good.

But with all that said, who is the target for the book and what does it cover?

Amit : Right? Yeah.

So maybe I can, perhaps, before I answer that, start with why I did this, unless you're going to ask me that later. But, so maybe start with generative AI. As you, I think, touched on, it's different from what we call "traditional AI." And I also want to acknowledge the term "traditional AI" is very odd to say it's not traditional. It's very much prevalent and relevant and active. The industry calls it traditional AI, and that's what we sort of have to snap into, sort of that, you know, vernacular.

But having said that, you know, it is the new form of generation, where AI is generating things rather than interpreting things and giving you a "yes" or a "no" or a "thumbs up" or a "thumbs down," as in the past. Of course, ChatGPT took over the world. As I sort of half joke: my mom's a ChatGPT expert, along with, I'm sure, many other folks' moms. But it certainly took the world by storm. And that's really where a lot of the genesis of the book came from, because one of the privileges I have, sitting in my role and in the team, is sort of having a front seat with all the work we're doing with OpenAI and Azure OpenAI, and our other AI products around this set of things.

And primarily in my role, I've probably talked to most of the Fortune 500 companies in the last 18 months, from CXOs down to architects and leads and devs and whatnot; and everyone's taken by "what can we do?" I think part of it is the hype, because it went from geeky science papers that a bunch of people like me would read, to sort of business news, to sort of front-page news, and, you know, folks are trying to figure out what to do or not. And when I was talking to all the people, I saw this pattern of questions and things coming back, you know: "what should it be?", "how do we do it?", "how do I start?" Which is kind of what your question was, which I haven't answered; I'll come to that, I promise. And then I got, like, "surely somebody has written a book about this who can help answer all of these questions," again and again and again. And clearly there wasn't one at that point in time. So I figured I might as well do one. So that's sort of where it started.

But in the end, it's for developers. You know, one of the beauties of generative AI is a lot of it is around language. Right now, of course, there's images and vision sort of things coming up, there's video, there's music. But primarily it is language, which is where, if you think of a large language model (LLM), which is one of the underpinnings of the likes of ChatGPT, you don't need to be a language expert to use them. You don't need to be a machine learning, data science person to use them, which in the past you had to be. You had to understand a whole bunch of terminology, special SDKs, Python packages and this and that. The beauty of these things is it's wrapped up as an API. There's an SDK in a number of languages, and any developer can just use the API and inject this AI in whatever app they're building to make it better.

That, fundamentally, in many ways, is democratizing from a developer's perspective, where traditionally it was more machine learning, data science people. And now the trend we're starting to see is most developers, if they're not already, need to start thinking about how to become an AI developer, in a sense: "how do I incorporate a bunch of this new AI, which is just an API call away or an SDK install?"

That was a long, rambling answer.

Jamie : No, no, it was good. I really appreciate it because like, sometimes you need the long answer and you’re absolutely right.

I think I've been telling a bunch of developers that I know, you know, "hey, go get either an Azure OpenAI or an OpenAI account, throw $10 at it, get yourself an API key and start, you know, just seeing what you can do with it." Because, you know, my experience of using, say, the ChatGPT/OpenAI API has literally been, "I create a JSON object, I send it over. The JSON object includes my prompts and it comes back with a number of responses." It is literally that simple. That's like the "hello world," right? But it is literally that simple to get started.
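
For anyone who wants to try that "hello world" themselves, here's a minimal sketch in C# of the raw-JSON approach Jamie describes, written against OpenAI's public chat completions endpoint. The model name and environment variable are illustrative choices, not requirements.

```csharp
// A minimal sketch of the "hello world" call described above: build a JSON
// object containing the prompt, POST it, and read the generated text back.
// Assumes an OpenAI-style chat completions endpoint and an API key in an
// environment variable; the model name is illustrative.
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
using System.Text.Json;

using var http = new HttpClient();
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
    "Bearer", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

// The request really is just a JSON object holding the model and the prompts.
var request = new
{
    model = "gpt-3.5-turbo",
    messages = new[]
    {
        new { role = "user", content = "Hello, world! What is generative AI?" }
    }
};

var response = await http.PostAsJsonAsync(
    "https://api.openai.com/v1/chat/completions", request);
response.EnsureSuccessStatusCode();

// The reply is JSON too; the generated text sits at choices[0].message.content.
using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
Console.WriteLine(doc.RootElement
    .GetProperty("choices")[0]
    .GetProperty("message")
    .GetProperty("content")
    .GetString());
```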

Amit : Yes. And the SDKs make it even simpler. You don’t even have to craft the JSON sort of, you know, package, if you will. It’s even simpler.
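
To illustrate Amit's point, here's the same call via the Azure.AI.OpenAI SDK, which was still in beta at the time of recording. Type names have shifted between beta releases, so treat this as a sketch and check the current package docs; the endpoint and deployment name are placeholders.

```csharp
// The same "hello world" through the Azure.AI.OpenAI SDK, so no hand-crafted
// JSON is needed. This matches one of the 2024 beta releases and may need
// adjusting; the endpoint and deployment name are placeholders.
using Azure;
using Azure.AI.OpenAI;

var client = new OpenAIClient(
    new Uri("https://YOUR-RESOURCE.openai.azure.com/"),
    new AzureKeyCredential(
        Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!));

var options = new ChatCompletionsOptions
{
    DeploymentName = "gpt-35-turbo", // the name you gave your model deployment
    Messages = { new ChatRequestUserMessage("Hello, world!") }
};

ChatCompletions completions = (await client.GetChatCompletionsAsync(options)).Value;
Console.WriteLine(completions.Choices[0].Message.Content);
```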

Jamie : Okay. Yeah. Yeah.

So as a developer, regardless of the technology I’m using, what am I going to get when I read the book? I’m going to get like an introduction to—like am I getting, I feel like I’m cheating because I’ve read the book, but am I getting like the background information as to how generative AI works or am I getting, "hey, call this API to do this thing?" Like, what’s the thing there?

Amit : Right. That’s a great question.

So the title sort of, in some ways, gives it away. And, you know, it's an "in action" book. So I don't go much into the theory and the genesis of things. The target audience (which was one of your questions earlier) is primarily any developer, or data scientist for that matter. However, it's mostly those who are in an enterprise setting: large companies, enterprises, organizations, who have a number of constraints and restrictions and so on. And the "in action" part is really that many of them don't particularly care about the theory, so to speak. Many of them, I presume, already understand it, at least to some degree. And it's more of, like, "okay, how do I use it? Because at my job, in my day, I have some work to be getting done. How do I incorporate this to get going?" This is sort of an enabler.

I'm actually going to use, and this is not me drinking the Kool-Aid necessarily, but Microsoft calls a bunch of this "copilots." And really the analogy there is it is your copilot, so it's helping you. So the "in action" sort of goes back into that. Like, "how can I get going and use it to solve whatever I'm trying to solve?" rather than getting stuck with the tech itself, so to speak.

So the book does start with a little bit of introduction on what is generative AI: what can it do? What can it not do? What are language models? There are new constructs, because of these gen AI things, that most of us developers don't necessarily know: this notion of embeddings, this notion of tokens, this notion of vectors, and so on. These are basic building blocks that one needs to understand to be able to use these things, and that's what we sort of introduce. And then we go into building the basics: "if you're generating text, what do you do? If you're generating images, what do you do?" And so on. That's sort of the first part; that's your foundation.

Then we switch to more interesting, sort of advanced, use cases. There's a new construct called "prompt engineering"; there's a new construct called "RAG," retrieval-augmented generation. "What is that? Why do I care? How should I use it?" Then we get into more advanced techniques like model adaptation, which is what many people call fine-tuning. "How do I tweak the model? What's my new application architecture looking like, which incorporates these? What are the things I should think about? How do I scale up my app for production deployment? How do I do evaluations?" And then, finally, a lot around ethics and responsible AI, showing you things like what new threat vectors one has to think about, how you sort of mitigate them, how you approach it, and so on.

So again, if I step back: if I'm in an enterprise, it's all of these things I have to think about. But the fundamental thing I want to go back and remind your listeners of is: in the end, it is adding to and complementing what we as developers know. So things like distributed architecture, things like if I need to scale things out, things like best practices for development, for security, for deployment: none of those change. All of those are still relevant; and then within those constructs we're going to add a few new things, like tokens and so on, which I touched on, which you need to think about.

So the mental analogy, maybe for slightly older listeners like me, would be: it's almost like that shift when we went from sort of two-tier, three-tier, you know, pre-web architecture to the web architecture; it doesn't mean everything we knew went out the door, but we had to think about new things in addition. That's sort of the shift. Or, for that matter, mobile phone applications, if you think about form factors and flows and screens and so on. It didn't mean the previous things weren't valid or relevant; you had to add new things to be cognizant of. It's a similar shift. There's new things you have to be cognizant of, but you're building on the existing experience, knowledge, and best practices that one already has.

Jamie : Sure.

You know, there’s a whole bunch of stuff that you’d said there, and I want to take some time just to unpack a whole number of them, especially when it comes to things like threat-modelling, right? So we’ve seen it already. I think it was, I want to say it was late last year, one of the Canadian airlines, they hooked up….

Amit : Yes.

Jamie : A version of ChatGPT to the—or it was reported as ChatGPT, I don’t know whether it actually was—but it was a chat bot on their website and someone was able to prompt engineer, I guess, their way into getting on a flight for free or getting a refund or something. And that’s, you know, a user who came along and tried to chance something, right? And so that’s something that I think that we’re going to see a lot more of.

And I know that whilst I was at MVP Summit this year in March, I’m not revealing anything that’s covered under NDA for MVP Summit. By the way, there was a very public, what is it? Almost like a CVE? But it wasn’t really a CVE. It was like, "we’ve discovered this new threat," whilst I was there and it became really public. I want to say it was. Is it [Mark Russinovich]? I always mispronounce it.

Amit : Mark Ross.

Jamie : Yeah, yeah, Mark Ross. He was the kind of the face of this piece of research that him and his team.

Amit : Yes.

Jamie : Perhaps someone from your team, I’m not sure.

Amit : Yes.

Jamie : Had done and said, "hey, this is a new threat." And I’ve seen ones where people will take ASCII art of the prompt they want to put into a GPT-like system and it’s able to understand the shape of the words in the ASCII art. And so I feel like I don’t want to scare people who are listening, but like this opens us up to a whole world of crazy attacks.

And I just also, the idea of the copilot, I absolutely love that analogy, because it is perfect for what it is, right? I'm going to use GitHub Copilot as an example. So GitHub Copilot sits inside of my IDE and it's giving me information in the same way that an airline copilot would give me information if I'm the pilot. I still need to make the decision, I still need to work on them, but the copilot is feeding me that information. Right?

Amit : That’s right. That’s right.

I mean, just to riff on that: all of it needs to be anchored in use cases. But in that example that you're using of GitHub Copilot, and [I] work closely with the GitHub team, it is doing a lot of the heavy lifting and the grunt work, and then easing you off that burden, allowing you the bandwidth and the time to go think about more interesting things around that, rather than things like scaffolding code and API calls and so on, which are needed. And maybe when I was younger that would be super interesting to me; maybe less so now.

And then on your threat point: yeah, I covered that, I think, in chapter 13. And a lot of it is cat and mouse as well now. Because these are new areas of tech, there are people who are pushing and seeing what is possible to break, what is not possible to break. Where can it bend? Where can it not bend? And then, you know, equally, that's where the cat and mouse comes in; you catch some yourself. But, you know, I go back to: it's about understanding the limitations of these, because, despite all the hype, it's not magic, solving everything on the planet, unfortunately, or fortunately. So understanding the limitations also then tells you, like, "how do you implement it?" Fundamentally, it is not much different than any other thing you do. It's just more of understanding the attacks, understanding those threat vectors, and then figuring out what we do.

Your example of the Canadian airline, if I recall off the top of my head: I think it was more about getting a fare or a refund, where the model was doing what we would call "hallucinating." So it basically gave an answer back that it made up on its own as a policy, not really what the policy of the airline was. But since it was a formal, how do I put this? I'm not a lawyer, but since it was a chatbot, it was their formal communication back to the customer, so they had to go honour it. So the exact problem there is what we call "groundedness." It wasn't grounded in proprietary, like, in-house information; that is, their corporate policies, or the airline's policies on refunds or whatever it was. So that's one of the things we touch on across various chapters in the book, this concept of grounding, and then that's one of the evaluations and tests one should do around that to avoid something like this.
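
To make the grounding idea concrete: the usual mitigation is to retrieve the organisation's real policy text and instruct the model to answer only from it, which is the heart of the RAG pattern Amit mentioned earlier. Here's a minimal sketch; the retrieval function and the policy wording are hypothetical placeholders for whatever lookup an app would actually do.

```csharp
// A minimal sketch of grounding a chatbot in proprietary, in-house policy.
// RetrievePolicyText and the policy string are hypothetical placeholders; in
// a real app this would be a keyword or vector search over your documents.
string question = "Am I entitled to a refund for my cancelled flight?";

string RetrievePolicyText(string q) =>
    "Refunds are issued only for flights cancelled by the airline ...";

var messages = new object[]
{
    new
    {
        role = "system",
        content = "You are a customer-service assistant. Answer ONLY from the " +
                  "policy text below. If the answer is not in the policy, say " +
                  "you do not know and refer the customer to a human agent.\n\n" +
                  "POLICY:\n" + RetrievePolicyText(question)
    },
    new { role = "user", content = question }
};
// ...then send `messages` to the chat completions API as in the earlier sketch.
```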

Jamie : Sure.

Because, I guess, and I feel like I'm leaping across our questions list, but I feel like that's something that, you know, if somebody was to say to me, "build me some software and write some tests for it," I know that the, say, .NET code, or the Python code, or whatever that I'm writing for this, maybe, CRUD app is going to be deterministic. Every time that I push the button, it's going to do the exact same thing. But then we throw in generative AI, and I know from reading your book, and from some background reading I've done previous to that, it's all probabilistic. Right? So, like, how do I go about testing that? Again, it's that extra thing you've got to think about, right?

Amit : That's exactly, exactly right. I mean, these are non-deterministic. Now, you can nudge them closer to getting a similar answer. Sometimes you might get the same answer; other times it will not be the same answer, but a similar answer. And therein is also an interesting aspect: there's a chapter also on evaluations, because, as you're touching on, it's not like a binary yes or no, "did I get the same thing?" Because you will most likely not. So part of this is understanding how to do evaluations differently, and also benchmarks, which go hand in hand, and what the different techniques and ways are to go and do that. So that also becomes a very fascinating area and subject by itself.
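
As a concrete illustration of that "nudging": OpenAI-style chat APIs expose a temperature parameter, and newer models also accept a best-effort seed. Neither makes the model deterministic, so the point about evaluating for similarity still stands; the values below are just examples.

```csharp
// Nudging a model toward more repeatable answers. temperature = 0 makes
// decoding close to greedy (always pick the most probable next token), and
// seed requests best-effort reproducibility on models that support it.
// Neither guarantees identical output, so tests still have to compare for
// *similar* text, not equal text.
var request = new
{
    model = "gpt-3.5-turbo",
    temperature = 0, // much less run-to-run variation
    seed = 42,       // best-effort reproducibility (newer models only)
    messages = new[]
    {
        new { role = "user", content = "Summarise our refund policy in two sentences." }
    }
};
// ...send as in the earlier "hello world" sketch.
```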

Jamie : 100%, I agree.

Amit : And sorry, just one afterthought as I finished talking: it's also, at least in the context of large companies and enterprises, that they are more used to determinism, right? Because they have regulatory reasons, policy reasons, legal reasons, and various others, to have that absolute: the answer is always the same and correct, as we all expect; our lives wouldn't work otherwise, in many dimensions. I think some of it is also a mental shift, that in some of these dimensions, the non-determinism is okay. And that's easy for me to say; in reality, that's a big mental shift for folks, and it's a journey. It doesn't happen overnight.

Jamie : Absolutely.

I guess there's going to be people listening in who are like, "whoa, you're talking, like, really high-level topics, like enterprise-y things, right? And I don't know the first thing about Gen AI." So for those folks who are listening in, I wonder, could we, if you don't mind, could we sort of look just a little bit under the covers?

I know that Scott Hanselman always says, "let’s look at the layer underneath," right? So let’s pretend I’ve opened up ChatGPT, or maybe I’ve built a system that uses Azure OpenAI or OpenAI or whatever, and I’m using it in a chat-like situation. I’m going to give it a prompt, maybe I’m going to ask it a question. What’s actually happening with that prompt? I know you hinted at things like tokenization and vectorization, things like that earlier on. I just wonder if we could take some of those ideas and just give the listeners an idea of what is actually happening there.

Amit : Sure, Jamie, that's a really interesting question. The roots of these new gen AI architectures are in this paper from Google, "Attention Is All You Need."

So what happens when you type in, let's say, "hello world" in ChatGPT? We'll just make it simple for our discussion. When you type in that text and sort of hit enter, the first thing that needs to happen is converting this text into a number representation. The way these models work is they understand tokens. So even though you're entering a prompt, which is a string or a sentence or a question or what have you, the thing they understand is a token. So the way you want to think is you talk to them via tokens. And a token is basically a numerical representation, because at the end of the day, almost all the AI models are just matrix multiplications and long, you know, floating point numbers.

So the first thing we do need to do is convert this text into what's called a token, which basically chunks it up into pieces. Depending on which model you use, its tokens will be different. So in the context of OpenAI or Azure OpenAI, roughly three-fourths of a word is one token. So if I put in "hello world" with a full stop or a period at the end (i.e. "hello world."), that will probably be three tokens, because the period itself is a token as well. These tokens then get a number representation, which is called an "embedding." That's how the model understands them. When you call the API, you can go and look at what an embedding looks like. It's a long array of floating point numbers, and the similarity of things is captured in how the embedding is represented. And I can talk about that if you want.
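
Taking Amit up on that: here's a sketch that asks the embeddings endpoint for the vector behind a piece of text. The model named here, text-embedding-ada-002, is one choice whose vectors happen to have the 1,536 dimensions Amit mentions shortly; other models produce different sizes.

```csharp
// Peeking at an embedding: ask the embeddings endpoint for the vector behind
// a piece of text. With OpenAI's text-embedding-ada-002 the result is an
// array of 1,536 floating-point numbers.
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
using System.Text.Json;

using var http = new HttpClient();
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
    "Bearer", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

var response = await http.PostAsJsonAsync("https://api.openai.com/v1/embeddings",
    new { model = "text-embedding-ada-002", input = "hello world." });
response.EnsureSuccessStatusCode();

using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
JsonElement vector = doc.RootElement.GetProperty("data")[0].GetProperty("embedding");
Console.WriteLine($"Dimensions: {vector.GetArrayLength()}"); // 1536 for ada-002
Console.WriteLine($"First values: {vector[0]}, {vector[1]}, {vector[2]}");
```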

So the text gets converted into tokens, which get converted into an embedding, a number representation, and that then is what is passed on to the large language model—an LLM—to help sort of digest and understand what you're trying to do. And then once it digests that, it spits out an answer back. And then you sort of do the reverse: you get an answer back, an array of floating point numbers, and it gets converted back into a string representation, which you then see as the generated, completed text on the output. So, you know, if we look at it at a conceptual level, those are the steps: text coming in converts to tokens, converts to embeddings. You go through what's called an encoder and a decoder; these are the two stages of the model. And then out pops sort of the reverse, if you will, and then eventually text.

What is important to understand, I think to some degree a large language model is a fancy autocomplete. Its single purpose in life is to find the next word that it thinks is the right word. And if you don’t tell it to stop in some manner, it’ll just keep going, finding the next word and the next word and the next word, understanding the sentence, right? Understanding what you’re trying to do.

I think that's where, if you break it down quite simply, its only, single purpose is: making you happy by giving you the next word that is meaningful in the context of the sentence. And you basically rinse and repeat, rinse and repeat, and then out pops a paragraph or a sentence and what have you. But that's, at a conceptual level, all that happens when you sort of plug "hello world" into, let's say, ChatGPT.
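
To see "fancy autocomplete, rinse and repeat" as a loop, here's a deliberately tiny toy: it counts which word follows which in a one-line corpus, then greedily emits the most likely next word. A real LLM does conceptually the same thing over tokens, with a transformer in place of the lookup table.

```csharp
// A toy "fancy autocomplete": learn next-word counts from a tiny corpus, then
// repeatedly emit the most likely next word -- predict, append, repeat until
// a stop condition. This is the conceptual shape of an LLM's decoding loop.
using System.Collections.Generic;
using System.Linq;

string corpus = "the cat sat on the mat the dog sat on the mat the dog ran";
string[] words = corpus.Split(' ');

// Count how often each word follows each other word.
var next = new Dictionary<string, Dictionary<string, int>>();
for (int i = 0; i < words.Length - 1; i++)
{
    if (!next.TryGetValue(words[i], out var bucket))
        next[words[i]] = bucket = new Dictionary<string, int>();
    bucket[words[i + 1]] = bucket.GetValueOrDefault(words[i + 1]) + 1;
}

// "Rinse and repeat": keep appending the most probable next word.
string current = "the";
Console.Write(current);
for (int i = 0; i < 5 && next.ContainsKey(current); i++)
{
    current = next[current].MaxBy(kv => kv.Value).Key; // greedy pick
    Console.Write(" " + current);
}
Console.WriteLine();
// Likely output: "the mat the mat the mat" -- pure greedy decoding gets stuck
// in loops, which is one reason real models sample (temperature) instead.
```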

Jamie : Right, okay.

And so I happen to know then, because I’ve been playing around with the API, that obviously with OpenAI I can get up to five different responses, right? So when folks are telling me, "oh well, if I’m only sending four tokens for ‘hello world,’ why am I looking at perhaps 200 tokens for the whole request?" And I said to them, "well, you know, you send the four tokens, it generates the response, it uses tokens to generate that response. But there may also be a couple of other responses that come with it," right?

Amit : That’s right.

So you have options and parameters, which is what you're touching on, to tweak and say, "generate up to a certain limit of responses." You can either say, "give me all of them back," or you can also say, "of these five, in your example, give me the best one back." So you can say, like, best one of five, kind of a thing. And then the other aspect of these LLMs, and I think this is where most developers who are new to this get somewhat confused in the early days, is that it's an API call. You know, we expect it to behave just like any other library we would call, right? I call a function, give it parameters, give it an input, out pops an output. It's doing some compute, and then I get the answer back for whatever I'm trying to do.

In this case, there's two buckets of work that happen, and it is important to sort of disconnect the two, at least mentally. The first bucket of work is understanding the prompt: what are you asking? This is where this whole new technique of prompt engineering comes in. So I'll use an example, which I think I use somewhere in the book, I can't recall off the top of my head. If I say, "write a three-verse poem on pandas," versus if I say, "write a three-page essay on pandas." Now, understanding my request, that is, my input prompt, is the first bucket of work that has to happen, and that uses its own tokens to understand, you know, what I'm asking for.

In my two examples, they're almost the same. For all practical purposes, they're the same request going in, in the sense of tokens: the same compute is needed to understand what I'm asking. But my second bucket of work is the generation, and that, in my two examples, is completely different, both in terms of tokens and hence the amount of compute, and ultimately cost. You know, a three-verse poem on pandas, the time it takes and the tokens it's going to use to generate, is going to be very different than a three-page essay on pandas. Even though my input was the same or very similar, my output is drastically different.

So when you think about the notion of computational cost and actual cost, because tokens are the new currency in these new generative AI models: my input is about the same and the cost of processing the input is about the same, but my output is very, very different. So one has to be cognizant of both of these dimensions.

But it also adds to other things, like latency. So again, depending on my use case, when I'm building I need to be aware: "how long would my generation be? What am I asking it to do?" The more I'm asking it to do, the longer it'll take as well. So the earlier one can mentally break it up as processing the input and processing the output (those are not formal terms), the easier the adoption is, at least for many developers who are new to this.
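
Those two buckets show up directly in the usage numbers OpenAI-style APIs return (prompt_tokens and completion_tokens) and on the bill. Here's the arithmetic as a back-of-envelope sketch; the prices and token counts are made up purely for illustration.

```csharp
// Back-of-envelope cost of the two buckets of work. The per-1,000-token
// prices are MADE-UP placeholders (real pricing varies by model and changes
// often); token counts are illustrative too. OpenAI-style APIs report the
// real numbers as usage.prompt_tokens and usage.completion_tokens.
const decimal PricePer1KInputTokens  = 0.0005m; // hypothetical
const decimal PricePer1KOutputTokens = 0.0015m; // hypothetical

decimal Cost(int promptTokens, int completionTokens) =>
    (promptTokens / 1000m) * PricePer1KInputTokens +
    (completionTokens / 1000m) * PricePer1KOutputTokens;

// Nearly identical prompts, wildly different generation costs:
Console.WriteLine(Cost(promptTokens: 12, completionTokens: 60));   // three-verse poem
Console.WriteLine(Cost(promptTokens: 12, completionTokens: 2000)); // three-page essay
```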

Jamie : Sure, sure.

Yeah. I think the shift in thinking here is definitely one that takes a moment of reflection when first dealing with generative AIs, especially if you're dealing with the APIs. Because, yeah, you're right, it's the cost of generating that response. And I do know, and this may be outside of the scope of our conversation, but I do know that when I put in a request, I can get multiple responses back and they could be filtered out or whatever. And like you said, it's about picking the one that fits the request best. Right?

So let’s stick with ChatGPT as our example. There is some code in the ChatGPT front end that is going to, when it sends a request over to the presumably OpenAI API, when it gets the responses back, it’s going to go, "ah right, okay. So this response has," I don’t know, making up a name here, "but like a weight of 0.2, this one has a weight of 0.6. I’ll choose the 0.6 one because perhaps that’s what fits my request better." And I think, I think that’s worth knowing as well for developers.

Amit : That’s right.

Jamie : I’m using the wrong words there.

Amit : No, no, yeah, it’s a probability.

Jamie : There’s some processing that goes on.

Amit : Yeah, it’s a probability distribution.

So again, you know, if you go back to my thing: the single purpose of an LLM is to predict the next word, and that's its happy place. How does it do that? Right, so that's where this concept of vectors comes in. The vectors are in what we would call a multidimensional space.

Now, we as humans are used to 3D; we think we know 4D, maybe some do, but beyond that our brains can't comprehend it. These state spaces are, I think for the OpenAI models, 1,536 dimensions, if I recall off the top of my head.

Jamie : Wow.

Amit : So basically, what the embeddings (the numbers, which are, at the end of the day, floating point numbers) do is: if you think 3D, but then think, like, 1,500 times more, these numbers get plotted there. And the numbers are representing, like I said, loosely speaking, a word. It's not exactly a word, because a token is about three-fourths of a word, like I said earlier, but just to keep it simple, let's say it's representing a word. The relationship of the words and the meaning of the words are plotted in this multi-dimensional space, and how close they are, the angle between them, the distance between them, is really how they get the meaning.

So, and I have this example in the book: in the sentence, "the dog sat on the," the next word most people would go to is "mat," perhaps, right? But you can actually look at the probability distribution. The model will also consider, like, "sat on the couch," "sat on the floor," "sat on a chair." So all of these are valid, but in this multi-dimensional space, most of the time it's like "sat on the mat," or "the cat sat on the mat." "The mat" has the highest probability in the distribution. So it's like, most likely that's what you want. So that's what you get.

But using the APIs, and actually even in ChatGPT itself, just in the web app, you can go and see, like, "what were the other options, and what was the probability distribution of the other options as well?" In the end, it is just a probability distribution, but it is not doing it with the word by itself. So, like, "the dog sat on the mat": it's not just looking at "mat," the word mat itself; it's looking at "mat" in the context of the whole sentence, and understanding the meaning of the sentence; which is where the concept of attention, from the paper "Attention Is All You Need", comes in.

If I make it real with another example, which is one of my pet examples I use with many of our customers: the word "bank." Now, if I use a traditional search engine, where it's just doing a regular search, like a keyword search, if you search on "bank," it's going to give you a list of documents or links, saying, "here are all the places I found bank." Like, you know, the literal matches. But the notion of these vectors, and the similarities thing I touched on, is that it takes into account the context: "Jamie is in finance, maybe Jamie works for a bank or for a financial institution." And when you say the word "bank," it will understand you're talking about the bank as in The Bank of England, or, you know, one of the other banks, as opposed to, let's say, the bank of a river, or, if you're an F1 fan, the bank of a racetrack.

Now, those three banks I mentioned have fundamentally very different meanings, even though, literally, they are the same word. So when you're thinking in these multi-dimensional spaces of these embeddings, and I was saying, you know, their angles and the distance give you the intent of the meaning, this is where it's picking it up. So, "Jamie works for the bank," is very different than, "Jamie likes to have lunch on the bank of a river." Maybe you do, maybe you don't.

But the intent of that bank, and the meaning of that bank, is what basically gets represented, as these probability distributions, in the context of the sentence.
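
The "angle between them" Amit describes is usually measured as cosine similarity. The three-dimensional vectors below are made-up stand-ins (real embeddings run to roughly 1,536 dimensions), but the arithmetic is exactly the same.

```csharp
// Cosine similarity: how "the angle between them" is measured in practice.
// These 3-D vectors are MADE-UP stand-ins for real ~1,536-dimension
// embeddings; only the arithmetic is the point here.
static double CosineSimilarity(double[] a, double[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB)); // 1.0 = same direction
}

double[] bankFinance = { 0.9, 0.1, 0.0 }; // hypothetical "bank (money)"
double[] bankRiver   = { 0.1, 0.8, 0.2 }; // hypothetical "bank (river)"
double[] money       = { 0.8, 0.2, 0.1 }; // hypothetical "money"

Console.WriteLine(CosineSimilarity(bankFinance, money)); // ~0.98: close in meaning
Console.WriteLine(CosineSimilarity(bankRiver,   money)); // ~0.38: far in meaning
```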

Jamie : Right.

It's, uh... when you said how many dimensions the OpenAI models use, you're right: I think in three dimensions. I don't think I can think in four dimensions. And then you're saying... I think I wrote it down as 1,536, which might not be the correct number that you said, but that's what I wrote down. And I was like, that's a lot of dimensions. I'm glad I don't have to do that math.

Amit : The beauty of this is most of us don't really have to worry about this. We can just use the models. We can say, "look, give me the top five probabilities for the next word." If I choose to do that, I can inspect it as the human and say, "yeah, that one's better. No, this one's not better." And then there's other, sort of a little more advanced, parameters in the API I can go and nudge. So, if I'm going to pick on you again, Jamie, sorry: say Jamie doesn't like the word bank, for whatever reasons; he's had not a great experience at the bank of the river he was having a picnic at. Jamie can go and say, "never ever generate the word bank for me." So you can actually, in a way, force it: "if you see that in your generation, please delete that," loosely speaking, "and replace it with something else." So there's sort of more advanced parameters you could use to nudge and control the generation, should one need to.
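
That "never generate the word bank" knob is, in OpenAI-style APIs, the logit_bias parameter: a map from token IDs to a bias between -100 (effectively ban the token) and +100 (effectively force it). The token ID below is a placeholder; you'd look up the real ID(s) for "bank" with the tokenizer that matches your model, and note that a word may map to several token IDs.

```csharp
// "Never ever generate the word bank," expressed as a request parameter.
// logit_bias maps token IDs (as strings) to a bias from -100 to +100.
// "17587" is a PLACEHOLDER token ID -- find the real one(s) with the
// tokenizer for your model; " bank", "bank" and "Bank" may all differ.
using System.Collections.Generic;

var request = new
{
    model = "gpt-3.5-turbo",
    messages = new[]
    {
        new { role = "user", content = "Where does Jamie like to have lunch?" }
    },
    logit_bias = new Dictionary<string, int>
    {
        ["17587"] = -100 // hypothetical token ID: never emit this token
    }
};
// ...send as in the earlier sketches.
```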


A Request To You All

If you're enjoying this show, would you mind sharing it with a colleague? Check your podcatcher for a link to show notes, which has an embedded player within it and a transcription and all that stuff, and share that link with them. I'd really appreciate it if you could indeed share the show.

But if you'd like other ways to support it, you could:

  • Leave a rating or review on your podcatcher of choice
  • Consider buying the show a coffee
    • The BuyMeACoffee link is available on each episode's show notes page
    • This is a one-off financial support option
  • Become a patron
    • This is a monthly subscription-based financial support option
    • And a link to that is included on each episode's show notes page as well

I would love it if you would share the show with a friend or colleague or leave a rating or review. The other options are completely up to you, and are not required at all to continue enjoying the show.

Anyway, let's get back to it.


Jamie : Sure, that makes sense.

And I guess I’m going to ask. So we’ve been talking about large language models so far, and I know that a few weeks before we recorded, there was this announcement of Phi-3, which is the latest in a family of small language models (SLMs). So I wonder, could you speak really quickly to the difference between a large language model and a small language model other than, like, just the name? Like, is it quite literally just a smaller version?

Amit : So the underpinnings are the same; you know, what we've been touching on, and the stuff we've been talking about, is fairly high level right now. A small language model has the same things: this concept of tokens, embeddings. It may have an encoder or a decoder part within the model, and it's doing generations back. So fundamentally, the architecture, and when I say architecture, I mean model architecture, is basically the same construct.

The main difference really is literally in the name: it's "small in relation to the large." So this is one of the other conversations I've been having with many of our customers. Let me back up for a second. One way in the industry we measure sort of the power, and also the complexity, of these models is how many parameters a model has. Which, in the end, as we touched on, all of these AI models are, at the end of the day, non-deterministic, and the way a model figures things out is different layers and different knobs and what have you. So the number of parameters is the number of knobs it can tweak to go perform what it's trying to do. So the large language models are, like, a trillion-plus parameters. Now, like I said, the parameter count is a measure of the power of the models, i.e. how sophisticated they are, and also the complexity of the models. Of course, the more parameters, the larger the model is. And this is where the laws of physics and computer science don't change: I need more compute, I need more GPUs, I need more memory, I need more bandwidth. So bigger doesn't necessarily mean better per se.

So therein is where the small language model comes in. Now, they're "small in relation to the large," but by themselves they're still pretty large. So, for example, Phi-3: in fact, we announced that just a couple of weeks ago, which is what you were touching on. Phi-3 comes in a family of sizes. The smallest is Phi-3-mini, which is the one we announced, and Build is coming around the corner. And I'll just say only that.

But Phi-3-mini is 3.8 billion parameters. Now, 3.8 billion parameters, by any stretch of the imagination in computer science, is not small; it is pretty complex. But when you look at a large language model, which is, like, hundreds of billions of parameters, in relation to that it is small. So I think that's the first thing, because people often have this incorrect notion that, "oh, small means I can deploy it in sensors and I can run it on my phone." We showed how you can run Phi-3 on the phone, yes; but whether you can run it in production, at a certain scale, where it'll work meeting a user's needs, is a different conversation.

So in relation to the large, it's small, but by themselves they're fairly complex. So that's one thing, on the size. But the thing they offer is, because they're smaller, they're quicker; that is, latency is lower. I don't need as much compute: I can actually run them on a CPU if I need to, rather than a GPU, or I can run them on a smaller set of GPUs or a smaller amount of memory. So what folks probably need to think about is, and I use this car analogy, right? If you think of a frontier model, as we call it, like a GPT-4, if you think of that as a Ferrari, then think of a Phi-3—as a small language model—like a Fiat. You know, there are times where you need a Ferrari, but if you're stuck in traffic or what have you, there's times where actually the Fiat would make more sense.

So it’s not that one is better than the other, it’s more like, "what is the use case? What are you trying to think and what are you doing? And that for that point in time, use the right model." So that’s how you want to think about [it]: the capabilities are lesser because they’re smaller, but they’re also then quicker and faster and cheaper to run.

Jamie : Right, okay.

And then I suppose, okay, so in a situation where perhaps I don't need to use a large language model, because maybe I don't have the massive amount of compute required, the hardware required, I could perhaps say, "hey, you know, for this particular task, I'm going to shift over and use my small language model because maybe I can run it locally." Like you said, I can run it on a CPU rather than a bank of GPUs. Or perhaps I don't need to get specialist hardware, or maybe we're in a constrained environment. Oh, here we go. Oh my goodness. Scalable microservices-based AI. Okay, so we've got the central LLM on a server somewhere. When I'm sitting in the office and I've got a stable Internet connection, I can be connecting to that and having that help me do my work. But then maybe when I go out into the field, I don't have a stable Internet connection. The app that I'm running that is connecting to the LLM could perhaps detect that and go, "hey, I'm going to switch over to the SLM now because it's running locally and I can do some of this processing. And then when we get back to the office and we've got a stable Internet connection, I can actually shift over to the LLM and maybe do some eval work. Or maybe get you a better answer. But the best answer I can give you right now is based on using the small language model, which is running locally. Here's what it says," right?

Amit : That’s 100% great.

I'll actually even push you further. In a way, it doesn't even have to be whether it's a stable connection, or if I'm out in the field in sort of a disconnected or semi-disconnected mode. The beauty of these LLMs and SLMs and GenAI is it's an API call. So when you're building your application, it doesn't matter if it's a mobile app, a web app, an enterprise app, or, you know, whatever you're building: it's an API call. It's not like, for example, the CLR, where "I have this dependency on a certain version and I'm stuck with that dependency, for good or for bad, for my whole application," because these are stateless APIs.

So the way we suggest, and I touch on this in the book as well, is: when you're building your application, because it's an API call and they're stateless, you use the right model at the right step in time. So in the same workflow, you should be able to, and you should absolutely, think about using multiple models for the power that they each have. So, for example, say my use case is summarization, which actually a lot of our financial companies, a lot of the banks, are using this for; there's a lot of long-winded finance documents which need to be summarized for various reasons, and I can talk about those, but I won't necessarily get into them. For example, say my workflow is, "summarize these documents at the close of a trading business day, and I've got to summarize them and have a point of view within, let's say, my financial institution, before markets open the next morning." Until now, a human would go sit and read through and summarize it: "what's happening in which industry, or which company, and what have you." Now you can have an app as a copilot, which is doing that grunt work, summarizing it and giving you sort of the tl;dr, the summaries, right? "Here's the key points."

Now, if I need to understand what type of a document it is, let's say, or which industry it's about. Because I have experts: maybe Jamie's an expert in the computing industry, but Amit's an expert in agriculture; by the way, I have no idea about agriculture. You could use a small language model, because it doesn't need to go comprehend the whole document. It needs to know just enough that, like, "hey, is this finance related? If it is, then I'm going to go punt it to Jamie," versus, "this is agriculture related, I'm going to go punt it to Amit." So you don't need a large language model, and all the horsepower and the power it has, to go do that. You can use a small language model.

Now, once you get it to the right place in the right workflow, then you can bring in the large language model. And in the end, you're saving cost, because, you know, a large language model is more complex, more costly to run, using more tokens. So it's not only your disconnected, semi-connected scenario, which is 100% accurate, but it's also: use the right model for the right capability at the right step in the same application.

Jamie : Right, okay.

Yeah. So I could have a small language model that I say, "hey, tell me what this document is," like, "based on pages one and two. What kind of document is it?" And then it will come back with, you know, "it’s a 90% probability that it’s a," like you said, "a financial document." Okay, cool. So then my app would say, "right, okay, you’ve come back 90% positive it’s a financial document." Like you said, "I’ll go push it to the large language model that is trained on finance stuff and have it do that," rather than have the large language model that is trained on finance stuff waste time in case it’s a document about mechanics or engineering. Right?

Amit : Yeah, I mean, and it’s also efficiency and optimization. So, like, you know, most applications have an orchestrator, right? And then in your orchestrator, in your workflow, it would be using a bunch of these models to go figure out, "which way do I punt it?" Right. It’s a fancy traffic cop at the end of the day.
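
Here's what that traffic cop might look like in code: a hypothetical orchestrator that sends the cheap classification step to a small model and reserves the large model for the heavy summarisation. The model names and CallModelAsync are stand-ins for whatever SDK or endpoint an app actually uses.

```csharp
// A sketch of the "fancy traffic cop": route cheap work to a small model and
// reserve the large model for the heavy lifting. Model names and
// CallModelAsync are hypothetical stand-ins.
async Task<string> SummariseForExpertAsync(string document)
{
    // Step 1: a quick routing decision -- a small language model is plenty.
    string industry = await CallModelAsync(
        model: "phi-3-mini",
        prompt: $"In one word, which industry is this document about?\n\n{document}");

    // Step 2: the expensive comprehension work goes to the large model.
    string summary = await CallModelAsync(
        model: "gpt-4",
        prompt: $"Summarise the key points of this {industry} document:\n\n{document}");

    return $"[{industry}] {summary}";
}

// Placeholder: wire this up to a real chat completions call as sketched earlier.
Task<string> CallModelAsync(string model, string prompt) =>
    Task.FromResult($"(response from {model})");

Console.WriteLine(await SummariseForExpertAsync("A document about crop yields..."));
```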

Jamie : Okay, that’s really cool. I hadn’t even thought of that. Like, most of my ideas around GenAI have been, "hey, summarize this," or, "can you create me a thing that does this?" But, like, having something that can help make those decisions as to where to process, how to process.

Amit : I mean, like GitHub Copilot, for example: you know, it's a bunch of this stuff running as well. Like, you know, "do I have different models which understand a certain programming language, or a certain package, or a certain library, to help me write code better?"

You know, in some cases, for example, latency and all doesn't matter; in other cases, it matters. I think what many people need to do is step back and think about, like, "what's the problem I'm solving? What's the use case? What's the task at hand?" And then go, "okay, now that I understand what we're trying to solve for, what do I go pull in?" I think many of us, including me, have been guilty of, like, "I have GPT-3 or 4, or the next thing, and I'm going to go use it everywhere." And I'm like, "no, maybe." So switch it on its head: start with the use case, and for the use case, what all can I use? As long as it's language related, in our examples, right? I mean, yes, there's image models as well, but, and we touched on GitHub Copilot, for example, code is language. It's a different language, but it's a language. So, you know, you are also within the constraints of a language model.

I think that's also something I have to sometimes go remind people of. They're like, "can it go predict the future?" I'm like, "no, it's a language model."

Jamie : That’s fantastic.

Okay, so as we're running a little low on time, I have a number of hopefully shorter questions from the community. Some wonderful folks over on a Slack group that I'm part of, for the Coding Blocks podcast, have sent in some questions. And I wonder if we could run through a few of those, if that's okay.

Amit : Yeah, absolutely.

Jamie : Excellent. Okay, so user Ronald over there has asked, "in your opinion, is it now urgent for developers to start learning about generative AI and LLMs, or AI in general? Or is it something that they can just sort of ignore for a little while, because things are moving so quickly, and focus on non-AI tech tools to help them learn? And will they still be okay, do you think, in their careers in six months' time? I'm not asking is it going to replace an engineer, but like, can an engineer for now just ignore it a little bit?"

Amit : Yeah, no. So, no, it’s not replacing any engineers, I can tell you that. No.

I don't think it'll be helpful, from a career perspective, to ignore it, I would say. Because, as much as there are new models coming out, and the hype and the speed of it all is tremendous, the underlying constructs and the principles are so far the same; you know, the things we touched on: embeddings and how they work, and tokens, and so on and so forth. If I was them, I wouldn't ignore it. I would understand how they work at a high level, understand "how can I use them in my use cases," and not have to get into the weeds of the tech itself. I mean, that's the beauty: it's an API call. So understanding the power of them, and being aware of what's happening in the industry, I think would be beneficial from their career perspective. Because this is not one of those places where you can leapfrog the tech, right? Like, in other places you can ignore a bunch and just leapfrog to the latest thing; this will not be one of those, as we are seeing right now.

Jamie : Right. Okay. It makes sense, right? So I’ve been telling people it’s like the pocket calculator, but my friend Jim said something that’s even better. Right?

So I said, "it's a bit like a pocket calculator. When the pocket calculator came out, all of the engineers and scientists who weren't using it were operating at a less productive state than the engineers and scientists who were using it." And he said, "let me take it one step further. It's a bit like the spreadsheet, like a spreadsheet application. Before things like Excel and other spreadsheet apps, it would be expensive to hire someone to pull out a physical ledger and write out all of the things that needed to be accounted for. Whereas when Excel and things like that came along, you had, to within certain constraints, a near-infinite amount of space to actually write out whatever it is that you're writing out, and to have it help you do those calculations; which meant that more people could afford to have accountants, and more people could afford to put things into spreadsheets to help them visualize that data."

Amit : That’s right.

Jamie : And I think you’re right. I think it’s definitely not going away. It’s something we definitely need.

Okay. And speaking of Jim, he’s actually asked. So Jim is one of these… Jim is brilliant. He’s a retiree, but he’s very much still interested in technology and still does a lot of stuff with development.

Amit : That’s awesome.

Jamie : And he’s asking, "how dangerous is it? Or is it even dangerous? How dangerous is it to use GenAI to generate both code and tests? Both could be hallucinating in a way that their issues cancel each other out. Shouldn’t code or test always be created by hand, or at least be reviewed by human eyes? Do we trust but verify?" I would say the latter.

Amit : So, you know, trust but verify, and the human is always in sort of command, if you will. That's why we've touched on the "copilot" [idea] two [or] three times as we've been chatting.

I mean, so, a couple of things. For example, there is a lot of value in GenAI building code, specifically when it is a lot of scaffolding. I touched on that. So there's a lot of bootstrapping code which is fundamentally needed: different packages, different libraries, hooking things up, and so on and so forth. It's not solving the business problem, so to speak, but it is a necessary sort of tax. We can't get away from it. So a bunch of that stuff? Absolutely, the machine, so to speak, should create it. You check it, validate it, and that leaves you time to go work on the more interesting business problem the code is there to solve.

So I don't think, in principle, there's an issue that it's harmful, but the premise there, of course, is you should always trust but verify. The one other thing (it's not a plug, but just because, you know, some folks may not understand): at least in the context of GitHub Copilot, when we generate the code, and actually even on Azure OpenAI, when you get a response back, whether it's code or not, it's not the raw output that you're getting back. It actually goes through a bunch of other checks and balances to make sure it is okay. Of course, it's not perfect, but there's an ensemble of other machine learning models it's going through to make sure it's okay.

And we're not, you know, writing, for example, security defects, at least the known ones. Now, we can't capture everything, because that in general is also a cat-and-mouse game. But, you know, there's the known security practices, for example, in code; or, in the context of generation of text, let's say from OpenAI or Azure OpenAI, it goes through this content filtering and harms detection and so on and so forth.

So there’s other AI helping this as it’s coming back. So it’s not completely just raw, raw, if you will.

Jamie : Sure.

I think it also bears repeating, something that we said when we sort of first met for that coffee in downtown Seattle was, "a lot of people may not realize, but they’ve been using AI for a long time."

Amit : Indeed.

Jamie : A lot of the apps and services kind of use AI anyway, like Google search, perhaps. Perhaps Bing is using it, too. You know, I'm not asking for the secret sauce here. But, like, those search results that we're used to seeing: there's probably some decision-making going on in there that's using context about who it is that's searching, or maybe the location they're searching from. And that's like an intelligent context injected into the search results, right?

Amit : That’s right.

And so, 100%: Bing uses AI, Google uses AI, a bunch of other stuff uses AI. The thing with those, and this is where the copilot analogy comes even more to the forefront, is we would call those AI "on autopilot," in a sense. So if I'm searching or you're searching, we have no say in what and how it should search, right? There's no knobs we can tweak. We're not in control. It spits out something, and that's what we have to sort of use. We can choose not to use it, but, you know, that's what we get.

Whereas with the copilot, you are in charge, so there's knobs and tweaks you can do. It's not forcing, like, "hey, this is the way." You can see and understand what's happening. So it is your copilot; it is helping you as the individual, but ultimately you are the one who is deciding, not the algorithm or the AI, as in the other case. So, all the more, it empowers you to know what is right or wrong, because only we as humans understand that. Right? These things don't.

Jamie : Sure, sure.

And you're absolutely right there. Like, the more information that I can be provided by a system when I'm using it, the better. GitHub Copilot, and I believe there are other copilots as well, other offerings and stuff. And I suppose there are other suppliers of generative AI stuff, right? I know I'm talking to you and you work for Microsoft, but, you know, I just wanted to point out to folks that there are other companies as well; but obviously, you know, definitely check out OpenAI and Azure [Open]AI, right?

Amit : Yeah, that’s right. I mean, we’re not the only game in town for sure.

Jamie : Yeah. Cool. No, that’s amazing. So I guess as we come to wrap up then, would you mind reminding folks about the book and then maybe if there’s a way to get in touch with you, if they have any questions. Is it possible to get in touch with you? I mean, maybe you don’t want people asking you questions?

Amit : Yeah. So the book, again, published by Manning, is called "Generative AI in Action." It should be out soon. I'm finished with it; they're going through the final checks and whatnot. And I think all of us are keen to get it out as soon as possible, because otherwise I'll have to keep editing and updating the thing, as quickly as the world's moving in this context. So please do go check it out.

Getting in touch with me is quite simple. I'm on Twitter as my last name, @bahree. I have a blog, which is more for me; it's my brain dump. Maybe, Jamie, you can put that in the show notes if you want, but it's blog.desigeek.com; I can email you that later as well. So there's a bunch of ways folks can get in touch with me if they want to. Equally, if they want to keep away and say, "I have nothing to do with this person," I won't blame them either.

Jamie : Oh my goodness. Amazing.

Well, Amit, it's been wonderful chatting with you today, and, you know, thank you ever so much for being on the show and helping folks to sort of get an idea of what's going on with AI; and then hopefully, you know, they'll go buy your book and learn a whole bunch more. I know personally, I've read the book, and I know that it's going to be useful for almost every developer I know out there who's interested in this stuff. It reads very much as a, "let's gently introduce you to some stuff." It's not like, "AI is going to rule the world!" or anything like that, which is what I've read from some of the more civilian authors, shall we say, on AI.

Amit : I appreciate it. Thank you for having me. It’s been great fun and happy to answer questions as folks may come.

Jamie : Amazing. Thank you.

Wrapping Up

Thank you for listening to this episode of The Modern .NET Show with me, Jamie Taylor. I’d like to thank this episode’s guest, Amit Bahree, for graciously sharing his time, expertise, and knowledge.

Be sure to check out the show notes for a bunch of links to some of the stuff that we covered, and full transcription of the interview. The show notes, as always, can be found at the podcast's website, and there will be a link directly to them in your podcatcher.

And don’t forget to spread the word, leave a rating or review on your podcatcher of choice - head over to dotnetcore.show/review for ways to do that - reach out via our contact page, or join our discord server at dotnetcore.show/discord - all of which are linked in the show notes.

But above all, I hope you have a fantastic rest of your day, and I hope that I’ll see you again, next time for more .NET goodness.

I will see you again real soon. See you later folks.
