The Modern .NET Show

Episode 109 - OCR and Azure Cognitive Services with Nick Proud



Episode Introduction

Robotic Process Automation (RPA) is a technology that has been gaining traction in recent years. Platforms such as UiPath and Automation Anywhere offer low-code tooling that lets users drag and drop commands together to build a bot that automates a repeatable business process. It promises to change how businesses operate, but it raises questions worth considering, such as where humans still need to stay in the loop and whether automation replaces jobs or removes drudgery from them.

Nick Proud is the head of software development for an RPA company, and he is a full-stack developer who is a big fan of the C# language. For him, being a full-stack developer means being able to pick up a new framework quickly and run with the documentation. He explains that RPA involves mapping out a process that would normally be carried out by a human and writing software that does the same thing, and he distinguishes this from the related concept of Intelligent Automation.

RPA is excellent for reducing toil: the busywork that needs to be done to get to the main objective of a job. By automating those processes, it allows people to focus on more complex tasks. However, RPA projects can require human intervention for business exceptions or events that don’t go according to plan. To address this, Nick and his team have created a web application that lets humans intervene when needed.

Azure suited Nick’s needs because its APIs are fire-and-forget, its pay-as-you-go pricing fits the billing model his company wanted, and it offers pre-built machine learning models for specific document types; AWS and Google offer similar services. Nick is currently using Azure Cognitive Services, and Form Recognizer in particular, to extract data from documents. He emphasizes the importance of using the best tool for the job rather than trying to shoehorn a library into a solution.

Finally, listeners can learn more about Nick’s work by searching for NexBotix on the web, or by contacting him on Twitter or LinkedIn. Links to the topics discussed in the interview will be included in the show notes, and Nick recommends the Microsoft Form Recognizer docs for more information.

Episode Transcription

Hello everyone and welcome to The .NET Core Podcast, an award-winning podcast where we reach into the core of the .NET technology stack and, with the help of the .NET community, present you with the information that you need in order to grok the many moving parts of one of the biggest cross-platform, multi-application frameworks on the planet.

I am your host, Jamie “GaProgMan” Taylor. In this episode, I talked with Nick Proud about the work he has been doing with Robotic Process Automation and document processing with Azure Cognitive Services. Although there are tonnes of services, libraries, and solutions for reading through and programmatically reasoning about a corpus of documents, the Azure Cognitive Services Form Recognizer seemed to fit both the problem and the solution that Nick was working on.

Along the way, we talked about how RPA reduces toil, or busywork, for people, which allows them to focus on the task at hand; we talked about our own personal definitions of the term “full stack developer”; and we talked about how important it is to look at a number of possible supporting libraries and services when approaching a new problem, rather than attempting to shoehorn a library or service into your solution just because you are familiar with it. Sometimes we developers have to step outside of our comfort zones and attack a problem in a unique way, and that’s one of the key takeaways from this episode.

So let’s sit back, open up a terminal, type in dotnet new podcast and let the show begin.

The following is a machine transcription; as such, there may be subtle errors. If you would like to help to fix this transcription, please see this GitHub repository.

Jamie

So thank you ever so much, Nick, for spending some time in your afternoon joining us on the show. I really, genuinely appreciate it. I say that in every single episode, but I really do genuinely appreciate people taking the time out to talk to me, and to talk to the listeners as well, so thank you very, very kindly.

Nick

Thank you very much.

Jamie

Hey, no worries. I mean, let’s share the knowledge. That’s what I’m all about. If someone walks away from this having learned one thing, I’m good. That’s how I do it.

Excellent. Okay, so I wonder if you wouldn’t mind sharing a little bit of information about yourself, you know, like a little elevator pitch about Nick: maybe some of the work that you do, a little bit of background, that kind of thing? Is that alright?

Nick

Yeah, that’s cool. So yeah, I’m Nick. I’ve been working as a software developer for probably about 10 years now; well, in tech for about 10 years. I’m currently the head of software development for an RPA company, so most of my work has been very specialised towards RPA, robotic process automation, and we can talk a bit more about what that means, if you like. But recently, my work has required me to work a lot more with Azure, and so I’ve been delving into the various cognitive services that Azure provides. It’s really blown me away, some of the things that we can now do. One of the things we’re working with a lot is document processing, so it’s been a lot of finding ways to quickly get information from documents. I’m a big C# person, a big fan of the language. It’s my main language, but I would consider myself a full stack developer. That’s a kind of loose term these days, though; it’s very difficult to define. But yeah, I guess that’s me in quite a large nutshell.

Jamie

I get that. I’m still very much of the opinion that (and people can complain to me if they want, but it’s an opinion, right, everybody’s got an opinion) full stack developers can’t exist, because, alright, the front end technology is probably JavaScript or Blazor. But just saying JavaScript, you’ve got React, Vue, Svelte, Angular, you know what I mean. And to say that you’re a full stack developer, to me, says, “I know enough of all of these technologies to be dangerous with them,” or “to be able to build something with them.” I’m not saying it can’t be done, but that requires a lot of patience, time and investment in those technologies, and then the ability to semantically separate them in your own knowledge and experience. So if people can do that, fantastic; I can’t. I’m server-side. Specialist, that’s me. I’m happy with just going, “Nope, it’s React. I haven’t got a clue.”

Nick

Yeah, I think for me, especially when I’m hiring, when we’re using this term, full stack developer, I kind of hate using it, but I kind of have to because the market expects it. So I always try and make it really clear that when I refer to someone who’s full stack, it’s more that they have experience in picking up frameworks quickly. So, you know, for me, being able to jump into Angular and be like, “Okay, I get what this is, I can run with the documentation,” would qualify. You know, it’s a very gatekeeping term, but it’s also expecting way too much of people, I think. Like you say, if you were to expect that a full stack person knows all the JavaScript frameworks, that’s just not a thing. I don’t even see that as being possible, because a new one comes out every three minutes.

Jamie

I think I’ve made this joke on another episode, but in the time we’ve been talking, 19 different JavaScript frameworks have been created and then abandoned, right?

Nick

Of course, yeah, GitHub is just bloating.

Jamie

And they all use left-pad.

Oh, my goodness. But yeah, yeah, I totally appreciate that. So that’s the thing, right? I feel like maybe, as developers, maybe we need ubiquitous language. Right?

Nick

Yeah.

Jamie

Rather than only having a ubiquitous language to help us deal with customers and clients and users, maybe we need one for ourselves, right? Because, like I said, my description of what full stack means is different to your description of what full stack means.

Nick

Yeah, definitely. Definitely. And it comes down to the same sort of debate about, “What is a junior developer versus a senior developer?” To me, there is no specific definition. A senior developer at my company might be a junior at another company; it totally depends on what you’re doing. So yeah, I really dislike the labels. But unfortunately, in my position, I kind of have to use them because, like you say, we don’t have that ubiquitous language for those specific areas.

Jamie

Yeah, I think perhaps (this is me just intellectualising, and I shouldn’t do that, because I’m not an intellectual) it’s because our industry has come from engineering, right? Where, if you’re a structural engineer, you go from being someone who is more junior into a senior role when you have enough varied experience. But the set of things you can get very experienced with is, from my perspective as an outsider, rather limited. Whereas, as we said just before we started, every company is a software company, right? And so your varied experience may be more varied than my varied experience, or vice versa. Does that mean that my varied experience is somehow inferior to yours? Does that mean that your varied experience is inferior to mine? I think that’s the problem: we’re all solving slightly different versions of the same problem, but with billions of different types of technologies and frameworks, and where it’s hosted, and how it’s been built up, and millions of different variables. Whereas (and I’m going to reduce structural engineering quite a lot here) when you build a bridge, there will be a specific prescription of steps you have to go through to build the bridge. It won’t be that it changes slightly because you’re building a bridge for one company, and then you go to a different company, build a bridge there, and they have a completely different process. Right? So maybe we need some guidance from other engineering disciplines. Or maybe we shouldn’t think of ourselves as engineers. I don’t know. Like I say, I’m intellectualising; it’s for people way smarter than me to figure out.

Nick

We’re getting philosophical now, Jamie. This is big.

Jamie

It’s starting to sound like an episode of Tabs and Spaces.

Nick

Wait, that’s not what this is? No, I’m joking.

Jamie

No, it’s not.

So you talked about robotic process automation. So let’s, would you mind quickly, just running through what that is?

Nick

Yeah.

Jamie

Just real quick, just to set a baseline, just in case someone’s listening in and going, “Those are words, definitely words, and they’re arranged in a very pleasing order, but I don’t know what the actual phrase means.” So would you mind just sort of explaining?

Nick

Yeah, sure. So it’s got the word robotic in it, but it doesn’t refer to robotics as you would know it in a more mainstream way. Essentially, robotic process automation is, I wouldn’t say an emerging field, but it’s gaining a lot of traction. It’s been around for quite a while, but I think it’s becoming more mainstream. It’s the automation of repeatable business processes. A lot of people argue, “Well, isn’t that what software is anyway?” But it’s more specific to things that human beings would normally do manually, specifically on a computer. One good use case, I think, is one that we work with a lot, which is invoice processing, or some other finance back office task. What does the human do? They look at an invoice, they take the data from it, they interpret it using their knowledge of the domain, and then they import that data into a system like Sage. So robotic process automation encompasses translating those human processes into ones governed by software. The robotic element comes in because, when we’ve created this programme which performs those tasks, we refer to it as a bot. And it’s robotic because it’s repeatable. So you have to be careful not to stray into AI territory, because this is much more like following a script. There is a concept of Intelligent Automation, which, again, we can talk about. But at its base level, we map out a process that would normally be carried out by a human, and we write some software that does the same thing. It’s usually geared towards going as far as emulating mouse clicks and sending keystrokes and things like that. So literally, if your job was to double click an icon, open an application and type some data in, then we’d be writing software which does exactly the same thing. Now, obviously, there’s some controversy around that, because people often react with, “Well, are you just trying to write me out of a job?” That’s a big subject; too big a subject.
But one of the things we’re trying to do at our company is work on the philosophy that it’s not intended to replace people; it’s actually intended to remove the more menial aspects of someone’s job. Someone who’s in finance could have a fantastic analytical mind, or could be really good at solving problems, but they’re spending most of their day doing data entry. Well, they don’t need to be doing that; they could be doing the more important things. And there’s always going to be an element of “this will save us money on people.” But I think that’s always been there with technology to a certain extent, and it’s just meant that the markets have had to change. So yeah, robotic process automation at the moment has some big players, like UiPath and Automation Anywhere. These are low code solutions that allow you to drag and drop commands together to build what we would call a bot that fulfils a process. And I guess, at its core, that’s what RPA is.
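Nick’s description of a bot as a repeatable script (take the data, interpret it, import it) can be sketched in a few lines. This is a minimal Python illustration, not NexBotix’s bot technology: the “Key: value” document format, the rounding rule, and the in-memory `ledger` standing in for a system like Sage are all assumptions made for the example. A real RPA bot would typically drive the target application’s UI with mouse clicks and keystrokes rather than call Python functions.

```python
from dataclasses import dataclass, field


@dataclass
class Invoice:
    raw_text: str                                # what a clerk would read off the page
    fields: dict = field(default_factory=dict)   # data the bot has taken from it


def take_data(invoice: Invoice) -> None:
    """Copy each 'Key: value' line into the fields dict (assumed document format)."""
    for line in invoice.raw_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            invoice.fields[key.strip().lower()] = value.strip()


def interpret(invoice: Invoice) -> None:
    """Apply a fixed, hard-coded domain rule: no learning, just the script."""
    invoice.fields["total"] = round(float(invoice.fields["total"]), 2)


def import_into_system(invoice: Invoice, ledger: list) -> None:
    """Stand-in (hypothetical) for keying the data into an accounting package."""
    ledger.append(dict(invoice.fields))


def run_bot(invoices: list, ledger: list) -> int:
    """Run the same steps, in the same order, for every document."""
    processed = 0
    for invoice in invoices:
        take_data(invoice)
        interpret(invoice)
        import_into_system(invoice, ledger)
        processed += 1
    return processed


if __name__ == "__main__":
    ledger: list = []
    docs = [Invoice("Supplier: Acme Ltd\nTotal: 120.50\nDue: 2023-01-31")]
    print(run_bot(docs, ledger))   # 1
    print(ledger[0])
```

Because every step is fixed, anything the script does not anticipate (a missing total, an unknown supplier) has to be handed to a person, which is where the human-in-the-loop discussion below comes in.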

Jamie

Okay, cool. There are a couple of really interesting things that you’ve said, or that I’ve inferred from what you’ve said. The first is that, a couple of episodes ago at the date of recording, we had an AMA, an Ask Me Anything. One of the questions was, “What do the next five years of development look like?” And I said in that episode that I think low code, no code is the next step. You know, this idea of, like you say, dragging some commands around: do this, then do this, then do this. It may not be that you drag the low code, no code blocks around and suddenly a fully working, protected, 100% wonderful app comes out the other end (this isn’t to say that low code, no code doesn’t do that). But I feel like perhaps you need a human in the loop, like an engineer or a business analyst, someone who actually goes, “Right, okay, let’s just tweak these things to make it a little bit more safe or secure.” Let’s say you have a system where it’s like, “Move to this very specific point on the screen and double click to open the icon,” right? If you move the icon, it stops working. I think that was one of the problems with macros. There were many problems with macros, but one of them, in Microsoft Word or whatever, was that if you moved the toolbar, your macro no longer worked, and things like that.

Nick

Yeah. The human in the loop phrase that you used is one that we use all the time. One of the big parts of what I’m doing at the moment is trying to solve the problem of making it easy to introduce humans to an automation process. Historically, in an RPA project, if there was some kind of business exception, or something that didn’t go according to plan (which is going to happen, because the robot is only as good as what you tell it to do and the information that you give it), it’s been very difficult to bring someone in to tell it what to do next, and to catch that. And if you could, then, like you say, it was usually a developer or a business analyst, and those people don’t necessarily know the domain. If we’re still using the finance example, you bring in Bob from development, and he’s got to go and speak to Simon in finance to find out specifically what’s going on. So what we’ve done is create a web application where the bots can post analytics, so people can see nice things like “this is how much time you’ve saved” and “this is how many processes you’ve got running at the moment,” but also a kind of inbox that says, “These events have happened that require human intervention.” You can then address them and send them back. I think that’s quite a novel area of RPA that hasn’t been explored as much, because it’s been very developer-centric, whereas this makes it more end-user-centric. If anything, it acknowledges that humans are always going to have to play a part in these processes.
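The inbox Nick describes, where bots post events that need a human and humans send them back once resolved, can be modelled as a simple queue. The Python sketch below is hypothetical: the `BusinessException` and `HumanInbox` names, their fields, and the in-memory storage are assumptions for illustration, not the design of Nick’s web application, which would presumably persist these items and expose them through a UI.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BusinessException:
    process: str                      # which bot raised it
    document_id: str
    reason: str                       # what the bot could not decide on its own
    resolution: Optional[str] = None  # filled in by a person who knows the domain


class HumanInbox:
    """Bots post exceptions; a domain expert resolves them; bots pick them back up."""

    def __init__(self) -> None:
        self._pending: deque = deque()
        self._resolved: List[BusinessException] = []

    def post(self, item: BusinessException) -> None:
        self._pending.append(item)

    def next_for_human(self) -> Optional[BusinessException]:
        return self._pending[0] if self._pending else None

    def resolve(self, resolution: str) -> BusinessException:
        """Resolve the oldest pending item (raises IndexError if none are pending)."""
        item = self._pending.popleft()
        item.resolution = resolution
        self._resolved.append(item)
        return item

    def send_back(self) -> List[BusinessException]:
        """Hand resolved items back to the bots and clear them from the inbox."""
        items, self._resolved = self._resolved, []
        return items


if __name__ == "__main__":
    inbox = HumanInbox()
    inbox.post(BusinessException("invoice-processing", "INV-001",
                                 "supplier not found in the accounts system"))
    print(inbox.next_for_human().reason)
    inbox.resolve("use supplier code ACME01")
    for item in inbox.send_back():
        print(item.document_id, "->", item.resolution)
```

The point of the design is that the person answering is Simon in finance, not Bob from development: the exception carries enough context (`reason`, `document_id`) for a domain expert to act on it directly.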

Jamie

I like that idea. Obviously, I can’t see it directly from the end user’s point of view, but if I am a person in the business and I’m worried that this system is going to replace me, including me in it may make me feel a bit more relaxed: A, it’s not perfect, and B, there are other parts to what I’m doing anyway that still need to be here. Because you can’t automate every single part of a business, right? One of the things that I tell people all the time is that there are fuzzy parts of your organisation that don’t quite fit any kind of logic, that you don’t realise are there. It might not be your business; it might just be your operation. It might be a department, like billing. There’ll be something that isn’t automatable, and that’s not because it can’t be automated; it’s just down to the way the business works. And so I feel like keeping people involved in that decision, keeping people involved in the process, and keeping people actually working with it is a great idea. There was also something that you said that, again, I’m inferring; you didn’t say this exactly. You talked about how someone in finance might be spending their whole day doing data entry, rather than the actual finance part of the work. In development, you may say, “Oh, well, I’m in meetings all the time instead of doing development.” But, you know, we deal with people. The thing that I took from that is that perhaps this is a way to reduce toil: the busywork that you have to do in order to get to the work that is your main objective for being there today.

Nick

Sort of orchestrating your job? In a way, yeah.

Jamie

I like it. Okay, so we’ve talked about what RPA is. So how does this fit into Azure, or “as your,” however you want to say it, and the cognitive services stuff?

Nick

Yeah. And I never know what to say, “as your” or “Azure.” I feel like our American friends say “Azure,” don’t they? I started off saying “as your,” but I can never decide. I’m gonna go with “Azure,” because,

Jamie

Okay, fair enough. But, um, the transcription for this episode is gonna be pretty... it’s gonna be Azure, Azure, Azure, Azure,

Nick

Azure. But yeah, so primarily, one of the big reasons that we would look to Azure is that we’re a very new business. When I joined, the business had already been going for about a year, and they were pivoting away from reselling automation products, and they wanted to make their own. But obviously, time is really important, and so having something to get you quickly to market is integral. So we looked at a lot of open source solutions that we could use as a basis for building our own bot technology, really, and that was all in .NET, which is great, because I’m primarily a .NET developer. So we’re still in that world of .NET. When we engaged with our first customers, we found that most of the clients we were seeing were looking at document processing. Most of their processes involved some kind of digitised paper: whether that’s an invoice or someone’s personal details for a pension, it could be anything, really. The constant was that it was some kind of document, and so we needed to be able to quickly parse that document with OCR. When I joined, ABBYY was in use as an OCR solution, which is very proven; it’s very mature. But I think it was kind of overkill for our use case. And so we needed to find something that fit in with the pay-as-you-go sort of billing that we wanted to bring in, because that’s also something which is quite new for RPA: the idea that you only pay for what you consume. We’re trying to pass on those cost reductions to our customers. So Azure was a really good choice in that sense, because it offers these APIs that are just sort of fire and forget. You just chuck a document at it, and what comes back is just JSON. But in reality, this really amazing artificial intelligence system has just run just for you, and it costs you, like, pennies or cents. So that was kind of a no-brainer for us.
And it’s now an integral part of our stack, the main one being the Azure Form Recognizer component. Because OCR is hard; it’s really difficult. And not only can we get a response (I wouldn’t say instant, but we’re talking seconds) with a really well-structured JSON object, which gives us all the key information about the document, there are also these pre-built machine learning models for specific document types. So we can go as granular as saying, “Okay, the document we’re dealing with is an invoice, so just use Microsoft’s invoice machine learning model,” because they’ve done all the hard work for us. That is huge for us in terms of accelerating. We also have the option to do custom training, so if you’ve got a very complicated document which is relatively consistent, then we can train a model for that document ourselves if we want to. Just having that flexibility, and turning what was, 10 years ago, a pretty big data science activity into just an API call, means that we can do a lot in a very short space of time. So yeah, I think that’s probably why we gravitated towards it. Obviously, AWS and Google have similar provisions as well, but being a .NET shop, it just sort of made sense.
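Choosing a pre-built model per document type, and falling back to a custom-trained model where one exists, amounts to a lookup. This Python sketch is illustrative only: the prebuilt model IDs follow the naming used by Form Recognizer v3.0 as we understand it (check the Form Recognizer documentation for the API version you call), while the `pension_application` type and the `custom-pension-v1` model ID are hypothetical. The returned ID would then be passed to whichever Form Recognizer client or REST call you use.

```python
# Prebuilt model IDs as named in Form Recognizer v3.0 (verify for your API version).
PREBUILT_MODELS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id_document": "prebuilt-idDocument",   # identity documents such as passports
}

# Hypothetical custom models, trained for complicated but consistent layouts.
CUSTOM_MODELS = {
    "pension_application": "custom-pension-v1",
}

# General document model: extracts generic key-value pairs from any layout.
GENERAL_MODEL = "prebuilt-document"


def choose_model(document_type: str) -> str:
    """Prefer Microsoft's prebuilt model, then our own custom model, then the general one."""
    if document_type in PREBUILT_MODELS:
        return PREBUILT_MODELS[document_type]
    if document_type in CUSTOM_MODELS:
        return CUSTOM_MODELS[document_type]
    return GENERAL_MODEL


if __name__ == "__main__":
    for doc_type in ("invoice", "pension_application", "delivery_note"):
        print(doc_type, "->", choose_model(doc_type))
```

Keeping the routing in data rather than code means that picking up a newer model version, as Nick’s team had to do, is a one-line change.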

Jamie

That makes sense. And I feel like if somebody had said to me four or five years ago that all of the big providers would be providing machine learning as a service, which is essentially what this is, or at least what the Azure Cognitive Services stuff is... I’ve looked into it for transcriptions for the podcast, because obviously I want as many people to be able to consume it as possible. So if you don’t know, there’s a transcription that comes along with each episode; it’s on the website. I am looking at ways of embedding it into the audio, but not many podcast audio players support that at the moment, and there are a couple of competing standards. So what I’m doing is kind of the wrong way round, which is waiting for a standard to win, and that means that the listeners are left in the dark with, “Well, how do I actually consume this?” So there’s a machine-built transcription for each episode, and I’m paying for a service that does it for me. But there’s an Azure speech-to-text, there’s a Google speech-to-text, there’s an Amazon speech-to-text, and there are several others; there’s even an open source app you can run on your computer. But you need to bring along the model, right? My understanding, and this may be incorrect, is that before you can use a machine learning system, you have to build up a corpus of model data. The example that I was once taught was that in order to teach the computer what a cat looks like, you have to show it 1,000, 10,000, a million pictures of a cat: in different positions, in different places, different breeds of cat, different colour patterns of cat. Because it’s only after those 10,000 pictures, or a million pictures, that the computer is able to figure out the features that make up a cat, like the pointy ears or the whiskers or whatever.
And even then, depending on your data set, you’ll only get maybe 98 or 99% accuracy, because there could have been some outlier image, like a person dressed as a cat, that kind of messes up your data set. Do you know whether that’s what they’ve done? Obviously not with cats, but with invoices or something?

Nick

Yeah, I can’t speak to whether they’ve got a dedicated... they’ve definitely not got a dedicated endpoint for cats, I can give you that. But yeah, this is only me surmising it, but I would say that they’ve taken a huge sample set of invoices (and they also do identity documents as well, so passports and things like that) and they’ve done the machine learning modelling ahead of time. So their job is to upkeep those models, and we see that in the release of new versions periodically. The team that actually consumes this tech, that builds the RPA projects for customers, recently put a DevOps item in for me to update the version that we were using, because it’s been proven that the models are just better in that version. So it means we have to keep asking, “Are we using the latest models?” But that is such a huge time saver. We still have the issue of, well, what if their models don’t cover a specific document that we want to process? And we have seen that; we’re not only processing invoices, and documents don’t always have consistent layouts. That’s a huge issue when it comes to things like OCR. But we’ve even found a way around that. The obvious way to do it is to train custom models, but then you’ve got to do a training set for every different kind of layout that you encounter, which we did actually try for a while, and which was, frankly, crazy. And obviously, all it takes is for one of those layouts to slightly change, and you’re back to square one, or at least your data becomes less reliable. So just as I was beginning to despair, Microsoft updated their document analysis endpoint as well, which was, again, just an API that focused on generic document metadata.
So straight away, if I had a field on an invoice, for example, that wasn’t being picked up because it’s not an “invoice” kind of field, in inverted commas, then I could do a second call to a different endpoint and just say, “Give me all the fields,” and then I’ll just look for that field in that. So then I don’t have to mess around with a different training set. I just do two API calls instead of one, and the cost implication is negligible. And you were talking about confidence levels as well, in terms of how reliable the data was: how confident am I that it’s a cat, as opposed to a grown man dressed as a cat? That’s another fantastic thing about the way Microsoft outputs the data from this API: they give you the confidence score for every single item they’ve extracted. So we can make that decision programmatically, to say we’re happy with a threshold of 80% confidence, and if it’s below that, then we reject that document as being unsafe. So even though they’ve done a lot of the work ahead of time with the machine learning, they still give us a lot of control by giving us some really verbose data. So yeah, I’m a huge fan of it.
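The two-call fallback and the 80% confidence threshold Nick describes can be combined in a small post-processing step. This is a minimal Python sketch, assuming both responses have already been flattened into a `{field_name: {"value": ..., "confidence": ...}}` dict; the real Form Recognizer JSON is more deeply nested, and the general document model returns key-value pairs that would need mapping to field names first. The function names and the field names in the usage example are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Field = Dict[str, object]          # {"value": ..., "confidence": float in [0, 1]}

CONFIDENCE_THRESHOLD = 0.80        # Nick's example threshold


def merge_fields(invoice_fields: Dict[str, Field],
                 general_fields: Dict[str, Field],
                 required: List[str]) -> Dict[str, Field]:
    """Prefer the invoice model's fields; fill gaps from the general-document call."""
    merged = dict(invoice_fields)
    for name in required:
        if name not in merged and name in general_fields:
            merged[name] = general_fields[name]
    return merged


def accept_document(fields: Dict[str, Field], required: List[str],
                    threshold: float = CONFIDENCE_THRESHOLD) -> Tuple[bool, List[str]]:
    """Reject the document if any required field is missing or below the threshold."""
    problems = [name for name in required
                if name not in fields or fields[name]["confidence"] < threshold]
    return (not problems, problems)


if __name__ == "__main__":
    invoice_call = {
        "InvoiceTotal": {"value": 120.5, "confidence": 0.97},
        "DueDate": {"value": "2023-01-31", "confidence": 0.62},
    }
    general_call = {"PO Number": {"value": "PO-778", "confidence": 0.91}}
    required = ["InvoiceTotal", "DueDate", "PO Number"]

    merged = merge_fields(invoice_call, general_call, required)
    print(accept_document(merged, required))   # (False, ['DueDate'])
```

Returning the list of offending fields, rather than a bare yes/no, gives whoever reviews the rejected document a starting point.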


A Request To You All

If you’re enjoying this show, would you mind sharing it with a colleague? Check your podcatcher for a link to show notes, which has an embedded player within it and a transcription and all that stuff, and share that link with them. I’d really appreciate it if you could indeed share the show.

But if you’d like other ways to support it, you could:

I would love it if you would share the show with a friend or colleague or leave a rating or review. The other options are completely up to you, and are not required at all to continue enjoying the show.

Anyway, let’s get back to it.


Jamie

So here’s what I like about that. Something you said earlier on: it saves all this time, right? You said something along the lines of, 10 years ago, this would have been an NP-hard problem to solve, or rather a non-trivial thing to solve, because you, or a team of data scientists, would have had to come in and create a system that could learn from your source document set and build that machine learning model. Whereas, like you said, all that hard work is done for you, and you can actually focus on solving the actual problem. Here’s this purple block that lives somewhere in the cloud; we throw a document at it, and then we get the information out that we need: “Oh, it’s an invoice for this much, going to this person, due on this date,” and perhaps whether it was paid or not. You get all of that information straight back. We talked a little earlier on about the worry with RPA, the worry of it replacing jobs. I feel like it enhances them. We talked about reducing toil, right? If you’ve got 1,001 invoices to action today, and it takes you two minutes to do each one, that’s 2,002 minutes that it’s gonna take you to get through that pile of stuff. If you can throw it at a system which can deal with, say, half of those (I’m not saying your system can’t deal with more), and it takes a minute to do all of that half, you’ve immediately halved the workload. I like that. I really do.
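Jamie’s back-of-the-envelope numbers can be checked with a few lines of Python. This sketch simply restates his example (1,001 invoices at two minutes each, with a bot taking half of the pile); the function names are our own, and it counts only the time a person spends, not the bot’s minute or so.

```python
def manual_minutes(documents: int, minutes_per_document: float) -> float:
    """Time for a person to work through the whole pile by hand."""
    return documents * minutes_per_document


def human_minutes_after_automation(documents: int, minutes_per_document: float,
                                   automated_share: float) -> float:
    """Time a person still spends once a bot handles `automated_share` of the pile."""
    if not 0.0 <= automated_share <= 1.0:
        raise ValueError("automated_share must be between 0 and 1")
    return manual_minutes(documents, minutes_per_document) * (1.0 - automated_share)


if __name__ == "__main__":
    before = manual_minutes(1001, 2)                        # 2002
    after = human_minutes_after_automation(1001, 2, 0.5)    # 1001.0
    print(before, after, f"{1 - after / before:.0%} less manual work")
```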

Nick

Yeah, the nice thing about it as well is that, because we’re doing it programmatically, we can track how long it’s taken to do. We typically try to get an estimate from the companies that we’re engaging with: on average, how long does this take you per document? Then we use that as a benchmark and compare it to how long it took the bot to do it, and from that we can generate a handling time, so we can see how much your handling time has been reduced. But what we’ve also seen recently is that, because we’re no longer focusing on just doing the task, and we’re analysing something else that’s doing the task, we’ve got the time to look at the data and say, “Maybe we’re not doing that as efficiently as we could. Maybe our data isn’t very clean.” So straight away, you’ve not only automated a process, but you’ve also found the time and the ability to analyse the way you do things. Again, that could be done by the person who used to do the data entry; they can focus on innovating and making processes better, rather than just doing the process.
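The handling-time comparison Nick describes, with a customer’s own per-document estimate used as a benchmark against the bot’s measured times, can be sketched as below. This is a minimal illustration, not NexBotix’s reporting code; the function name, the report fields, and the sample timings are assumptions.

```python
from statistics import mean
from typing import Dict, List


def handling_time_report(benchmark_minutes: float,
                         bot_minutes: List[float]) -> Dict[str, float]:
    """Compare the customer's per-document estimate with the bot's measured times."""
    if benchmark_minutes <= 0 or not bot_minutes:
        raise ValueError("need a positive benchmark and at least one bot timing")
    bot_average = mean(bot_minutes)
    saved_per_document = benchmark_minutes - bot_average
    return {
        "bot_average_minutes": bot_average,
        "saved_per_document_minutes": saved_per_document,
        "handling_time_reduction_pct": 100 * saved_per_document / benchmark_minutes,
        "total_saved_minutes": saved_per_document * len(bot_minutes),
    }


if __name__ == "__main__":
    # Customer estimate: 2 minutes per document; bot timings for three documents.
    report = handling_time_report(2.0, [0.5, 0.25, 0.75])
    print(report["handling_time_reduction_pct"])   # 75.0
```

Because the bot records its own timings, the same data can also feed the analytics dashboard mentioned earlier.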

Jamie

Yeah, and I feel like it’s something that we do as developers all the time anyway, in our actual practice. Like, who writes assembler, right? There are people who get paid a lot of money to write assembly, and I can’t speak for you, but I certainly don’t write x86 assembler anymore. I mean, I used to. For those who don’t know, this is one step above the raw machine code on your actual computer, right? It’s the code...

Nick

that you write if you hate yourself and your time.

Jamie

Pretty much, yeah. But it’s also like, if you want to eke the most performance out of your computer, you go right the way down to the machine code level and you write at that level. But every line of code that you or I have ever written, and I can pretty much put my hand on heart and say any line of code that any of the listeners have ever written, has at some point before execution been converted down to that low level. Now, there’s a big market for being able to write assembler, and that’s great. But what’s great about C#, Python, Go, all of these higher-level languages, these more modern and, dare I say, 21st-century languages, is that they abstract all of that away. So you can actually get on with the problem at hand, which is displaying the words “Hello world” on screen, or dealing with “give me this document and give me the data, accurately”. And by abstracting that away, you’re helping that worker, the person whose usual day job is to deal with this big pile of documents. You’re abstracting away the actual toil, like I said earlier on. I’m sure you’ve said “process” several times: you’re abstracting that away, you’re saying, “Yeah, I will deal with that by pushing this button.” Right? But, like you said, that gives me a chance to go away and actually think: maybe I’ve always had this pipe dream, I can see that there is an inefficiency, and if I could get away from this big pile of work I have to do, I could solve the inefficiency, which would remove the big pile of work I always have to do, which allows me to do some other work that I desperately need to do or want to do, or whatever. Yeah, it’s almost like a force multiplier, I think. I mean,

Nick

To be honest, it’s no different to any other kind of automation in history. The further back you go in history, the more we did manually, and I don’t think this is really any different. It still feels like magic to some people, but that’s because they’re so used to their everyday domain. And another thing that’s made it really hard for us is that it’s so difficult to demo. It’s really hard to show this stuff if you’re not making the bot click buttons and stuff like that. If we hear, “Oh, you’ve got an API,” amazing, that’s like a dream for us. But it’s also really hard to demo, because it’s all back end. And the same goes for this, really. With Form Recognizer in particular, even though we’ve talked about how easy it makes things from a process perspective, so from the perspective of the person who used to do that process, it’s still not a trivial thing for the developers. I mean, yes, we’ve got nice handy tools like APIs, and we’ve even got a really nice SDK for this as well if you’re a C# developer, but you still have to handle when requests fail, you still have to interpret the results, you have to parse the JSON. So all those developer-y things still exist, and there are still other challenges that come with it. For example, when you call these endpoints, they don’t give you a response in milliseconds like a lot of APIs do, because why would they? They have to go off and run these processes. So instead, and it’s quite cool actually, they give you back this sort of check-back-later URL. They say, “Accepted your request, check back on this URL later.” And then you can either just ping that URL or, if you’re using the SDK, you can use specific methods to ask whether that request has finished. And if it has, then you can parse it.
So you still have to build that mechanism to wait and to check whether it’s finished, and all that stuff, which is cool. It’s a nice challenge, but it’s still there. So if anything, it’s shifted a lot of the complexity to developers, which I think is right, but again, all that stuff is really difficult to get across to the people who are consuming it.
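
The “check back later” flow is a long-running-operation polling loop: the service accepts the request (for Form Recognizer’s REST API, a 202 Accepted response whose Operation-Location header holds the status URL), and the caller keeps asking that URL until the work is done. Below is a minimal Python sketch of that loop. The status values notStarted, running, succeeded and failed mirror what the Form Recognizer REST API reports, but treat them as an assumption and check the docs for the API version you target. The fetch_status callable is a hypothetical stand-in for an HTTP GET, so the loop runs without a network connection; the SDKs wrap an equivalent loop behind their own methods.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict],
    interval_s: float = 1.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a 'check back later' operation until it succeeds, fails or times out.

    fetch_status is a hypothetical stand-in for an HTTP GET against the URL the
    service returned when it accepted the request; it returns the parsed JSON body.
    """
    waited = 0.0
    while True:
        body = fetch_status()
        status = body.get("status")
        if status == "succeeded":
            return body  # the analysis result is now in the body
        if status == "failed":
            raise RuntimeError(f"analysis failed: {body.get('error')}")
        if waited >= timeout_s:
            raise TimeoutError(f"operation still {status!r} after {waited:.0f} s")
        sleep(interval_s)  # wait before asking again
        waited += interval_s


# Simulated service: not started, then running, then finished.
fake_responses = iter([
    {"status": "notStarted"},
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {"pages": []}},
])
result = poll_until_done(lambda: next(fake_responses), sleep=lambda _: None)
print(result["status"])  # succeeded
```

Injecting sleep keeps the sketch testable; in real use the default time.sleep applies, and a production version would also handle HTTP errors and honour any retry hints the service sends.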

Jamie

Sure, yeah. I can imagine it is. I guess it’s not quite the same, because it is kind of magic, it’s an automated system, right? But compare typesetting from, you know, the 1800s or the 1600s. Remember when the Gutenberg press was invented, right? You had to take a bunch of lead blocks, which were the characters written or carved into them backwards, lay those out across a row, run some ink over it, and then press it against the page, and then eventually your words would come out. Then, like 200 years later, we’ve got typewriters, which would have been seen as wizardry, because instead of having to typeset the whole thing, run the ink over and press, you’re greatly reducing the problem. But then, of course, instead of having to do that, I can just literally type it in there, right? Or, indeed, Henry Ford, right? He didn’t invent the automobile; he just sort of made it really economical. And he said, “If I’d listened to the customer, they would have just asked me for faster horses.” And to people who were used to travelling about either on horseback or by horse-drawn carriage, getting into a machine that seems to drive itself might have been just as magical, perhaps.

Nick

Yeah, definitely. As grandiose as it sounds, I really do think it’s true. My wife says it all the time: she lives in spreadsheets, and whenever people ask what her husband does, she’s like, “Well, he tries to write me out of a job.” And I’m like, no, not necessarily, I’m trying to make it so that you can do cooler stuff. But there will always be, not necessarily aggression towards automation, but fear, maybe. I think it’s there. And sometimes people make the Skynet comparison as well. They’re like, either you’re trying to take my job, or you’re trying to take over the world with artificial intelligence, and it’s just so out there. It’s funny, but yeah.

Jamie

Excellent. Okay. So you talked about these APIs and the SDKs that Azure gives you, and presumably Google and AWS do something similar: you send something off to be processed, and because it takes some time, you have to wait, and then there’s a process for getting that data back. What does that data look like? Is it just “here’s some JSON without a schema”? Is it structured? Is it unstructured? Do you have to write custom parsers to get that data back? Because I could imagine, if I’m saying to a system, “here is my file, give me the data back,” I would want the data to be in some kind of well-known structure. But then it doesn’t know what I’m giving it, so how can it provide me with structured data? Is that the case?

Nick

Yeah, so it is very structured. But I think you get a different experience depending on whether you’re using straight REST calls with JSON or the SDK, which I think is quite normal for most APIs. I’d say it depends on your use case. But yes, there is a known structure. For example, if you call the invoices model and you’re using the prebuilt model, you’ll always have a property called Items, and that’s going to be your line items. That’s an array, and it’s always going to be like that. You’ll always have a field called VendorName, and it will always be named VendorName. So you’re able to rely on that schema. And it’s probably easier if you’re using the SDK, because you can just consume a more complex object in .NET, so you’d be able to just say invoice dot vendor name. That’s the bit that, for me, makes it so compelling: you can just treat it like any other object. I guess it’s been doubly complicated for me, because I make this call as part of a piece of software consumed by developers who are not interested in the raw JSON that comes back, which is, as you can imagine, a huge object, especially if you call the document analysis endpoint, which is just “give me everything that’s on the document”. That could be tables, key-value pairs; it even does entities, so it will recognise that a person was mentioned or an organisation was mentioned, which is really cool. But the developers who are consuming the objects I get back from Azure don’t care about a lot of that stuff. They may just want key-value pairs, or they may just want line items. So there’s a lot of JSON parsing I need to do just to cut down that response and give it to them in something they would consume as part of an RPA process. They would just say, I get the response back,
and then I take the line items into a data table, and I just write them into Sage. That’s their automation. So yeah, I’d say it has to be very structured; it has to be a known structure, otherwise the whole thing just doesn’t work. That’s what separates it from other OCR solutions, like Tesseract, for example, because that will just return any plain text that it sees, and sometimes it will just look like hieroglyphics if it’s a really poor-quality document. So yeah, that’s the difference for sure.
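
Here is a minimal Python sketch of that “cut the response down” step: map a known schema onto typed objects, then keep only the line items an RPA developer would push into a data table. The JSON shape is a deliberately simplified, hypothetical stand-in. The real prebuilt-invoice response wraps each field in extra metadata (confidence, bounding regions and so on), so the key paths would differ. Only the field names VendorName and Items come from the conversation; the sample data is made up.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LineItem:
    description: str
    amount: float


@dataclass
class Invoice:
    vendor_name: str
    items: List[LineItem]


def parse_invoice(raw_json: str) -> Invoice:
    """Map a simplified, hypothetical invoice payload onto typed objects."""
    fields = json.loads(raw_json)["fields"]
    items = [LineItem(i["Description"], float(i["Amount"])) for i in fields["Items"]]
    return Invoice(vendor_name=fields["VendorName"], items=items)


def to_rows(invoice: Invoice) -> List[Tuple[str, str, float]]:
    """Keep only what the RPA step needs: one row per line item, ready for a data table."""
    return [(invoice.vendor_name, item.description, item.amount) for item in invoice.items]


sample = """
{
  "fields": {
    "VendorName": "Contoso Ltd",
    "Items": [
      {"Description": "Consulting", "Amount": 120.50},
      {"Description": "Travel", "Amount": 45}
    ]
  },
  "entities": [{"category": "Organization", "text": "Contoso Ltd"}]
}
"""

invoice = parse_invoice(sample)  # the "entities" section is simply ignored
for row in to_rows(invoice):
    print(row)
```

Because the schema is known, a missing VendorName or Items key fails loudly with a KeyError instead of silently producing empty rows, which is the property that plain-text OCR output cannot offer.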

Jamie

Okay. And then presumably, with this, it’s almost like a POCO, right? A C# object. With this C# object, that gives you slightly more control over how to reason about the data, presumably. Because you’ve said, here’s a bunch of, maybe they’re scans, maybe they’re PDFs, maybe they’re photographs, whatever you’ve got, and you’ve gone, “here’s a bunch of data, go sort it out”, and then you get all of these POCOs back. That tells me that you can reason about that data, right? You can say, hey, I’ve uploaded 1000 documents, they’re all JPEGs, but guess what, I’ve got all of the data now. Right? Is that how that works?

Nick

Yeah. And there’s also, in most cases, a need to refer back to the original document. You have a choice with Form Recognizer: you can either send the local file serialised down to bytes, or you can upload it to some internet-based storage and just give it a URL. I tend to use Blob Storage just because it’s simpler. So we do that as part of our process: we upload the invoice to secure Blob Storage, which gives us a URL, and then we say to Azure Form Recognizer, use this URL to give me JSON for all the extracted data. And then the data is just ours afterwards. Once it’s been parsed, obviously, I talked earlier about how they need to raise these exceptions in the platform. So what we do is we take the extracted data and encode it as Base64. That will be less of a problem with raw string literals, which will solve it soon: JSON within JSON and all that sort of stuff. Then we can just send that up to our MVC application, which is just an App Service, and decode it on the other side. So we can say, in a web app, this is what the bot extracted, and then we can show an iframe to say this is the original document. We wouldn’t have been able to do that if we didn’t have the structure that comes in the response from Form Recognizer.
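
Here is a minimal Python sketch of that Base64 hand-off. The extracted JSON is encoded before being nested inside the payload sent to the web app, and decoded on the other side so the UI can show it next to the original document. The payload keys (documentUrl, extractedData), the sample data and the blob URL are hypothetical. In .NET, the equivalent calls would be Convert.ToBase64String and Convert.FromBase64String.

```python
import base64
import json
from typing import Tuple


def wrap_payload(extracted: dict, document_url: str) -> str:
    """Base64-encode the extracted JSON so it travels as one opaque string field."""
    inner_json = json.dumps(extracted)
    encoded = base64.b64encode(inner_json.encode("utf-8")).decode("ascii")
    return json.dumps({"documentUrl": document_url, "extractedData": encoded})


def unwrap_payload(payload: str) -> Tuple[str, dict]:
    """Reverse of wrap_payload: decode the Base64 string back into the original JSON."""
    outer = json.loads(payload)
    inner_json = base64.b64decode(outer["extractedData"]).decode("utf-8")
    return outer["documentUrl"], json.loads(inner_json)


extracted = {"VendorName": "Contoso Ltd", "InvoiceTotal": 165.5, "Notes": 'Quoted "text" survives'}
payload = wrap_payload(extracted, "https://example.blob.core.windows.net/invoices/inv-001.pdf")
url, restored = unwrap_payload(payload)
```

Strictly speaking, a JSON serializer can escape nested JSON on its own. Encoding it as Base64 keeps the inner document opaque, so nothing between the bot and the web app has to care about escaping.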

Jamie

I like it, I like it. You’ve mentioned a couple of OCR systems so far, and I want to give you the chance to say this, because I’m worried that people are going to be like, “but ABBYY is brilliant” or “Tesseract is brilliant”, right? I feel like, as with everything in development, and please correct me if I’m wrong, my assumption is just that those particular services and libraries didn’t fit the solution you were trying to build, right? Whereas it feels, from what you’re saying, that Azure Cognitive Services and Form Recognizer fit your solution a little better? Yeah, definitely.

Nick

Yeah. I mean, there are a couple of really big points on why it was just better for us. ABBYY is obviously very proven, and it’s a very mature technology. But, like I say, it was overkill, because it was trying to do a lot of the things that our bot technology already did. It was trying to evaluate the content that it extracted, check for specific values, use rule-based extraction and all this sort of stuff. But we were already going to do that anyway; we just needed to get the actual content in a structured manner. That was the only thing we needed. I guess you can do that with ABBYY, but the way it was configured, if you did that, you would be under-utilising it. So that brings a financial aspect into it as well: we can just say, well, we pay a couple of pence per page, and therefore we can anticipate how much it’s going to cost us, rather than having a licence contract for a year or something like that. For some companies, that’s perfect, because they may have regulatory requirements that say you need to use specific technology or it needs to be on premises. That’s another thing about ABBYY: you can run it very well on premises. So it’s not that ABBYY’s bad at all; I don’t think I have the authority to say that, because I’ve not used it enough. But I know for sure that it just didn’t fit our use case. And it’s largely the same for things like Amazon Textract, which is the equivalent on the AWS side; it’s just that little bit more effort when we’re already in Azure. I guess that’s one of the advantages of being a cloud provider: once you start using one resource, it becomes sticky. If you’re using an Azure SQL database, then you start looking at Blob Storage,
and then you start saying, “Oh, well, what’s this AI stuff?” And before you know it, you’re heavily invested in this ecosystem. Not to say that you have to use just one cloud provider, but it’s similar to our strategy, in the sense that once we’ve demonstrated how easy it is to automate one process, you’re going to want to do another one. So yeah, it’s very much the case, I agree. It’s not because ABBYY is an inferior technology at all, far from it. It’s very complex, and it’s quite impressive. It’s just overkill for us.

Jamie

Sure, and I feel like there’s an important lesson there. I try to shy away from titles when I’m talking about developers, but for those who are perhaps earlier in their career: just because a library exists, don’t try to shoehorn the library into your solution, right? Look for something that fits well with what you’re attempting to build. That’s what I would say. It sounds very much like you, or someone on the team, or someone somewhere, has done their due diligence and said, look, there are these libraries we could use, these systems we could use, but there’s also this other one. And, I’m probably putting words into your mouth, perhaps there’s been some kind of discovery work, and you’ve gone, “Actually, we think it would work better with this system, because this system has these benefits and these drawbacks, but these other systems, which already exist, have these benefits and these drawbacks.” And then you present to the decision maker, and they say, “Well, you know what? If what you’ve said is right, and I trust it is, because you’re the expert, let’s go with this.”

Nick

No, that’s it. It really is as simple as that, to be honest: what are the best tools for the job? And it’s never a reflection on the technology, because there are myriad use cases for document extraction. You can’t just say that Form Recognizer is the last word and that’s it, because it’s far from it. Most of the SDK is still in preview, and there is a risk there, going into production with a preview framework. But that’s also a testament to our opinion of the technology. And it was really a need for speed, which is not a phrase I’d ever anticipated using. So yeah, I joined and it was like, well, we’ve got all these problems: the OCR is not quite there, it’s very opinionated, it keeps trying to evaluate the text when we just want to look at the text. And I was like, okay, I’m going to go with an API-based approach, because that’s light and it seems to fit that need. And it could very well be that there are other people in a similar situation who are using Azure, and actually, they need a lot more in terms of evaluating the data, which they don’t have in their current stack. ABBYY might be really good for that, because it can do things like say, okay, I’ve extracted this, and I see that it doesn’t contain the thing you’re expecting, so I’m going to send it over here. If you’re living in ABBYY and you want it to be the major part of your stack, it’s probably the way to go. But if OCR or text extraction is just a small part of a wider process, then I’d definitely say you’re more likely looking at the cloud-based providers.

Jamie

And, you know, that makes sense. I like that you, or someone on the team, or a bunch of someones on the team, have done that due diligence, because it’s so easy sometimes to just fall into “yeah, we’ll use this technology because we’ve used it before”. It doesn’t fit the problem we’re trying to solve, but we’ve used it before. I think that fits well with what you were saying towards the beginning, about your definition of a full-stack developer being someone who can sort of deeply grok a new library very quickly. You have enough experience of reading through code, which I think is one of the most important skills as a developer, reading other people’s code, to be able to go, right, I know where the important bits are. Because if you’ve got 200 lines of code and you’re only looking at one method, you don’t want to start at the top of the file and work your way down one line at a time; you jump straight to where you need to be, and then you gather the context as you go. So I like that you’ve done that. What I would really like to do is have a chat about bringing things into production when they’re in preview, but I’m also very aware that you have a limited amount of time for us today, so maybe we’ll have to table that for another episode.

Nick

Sounds good. Yeah, it’s a shame, but there’s so much we could talk about in terms of the journey we’ve been on recently. A lot of it has been bleeding edge: we need a piece of technology that does X, quickly. And that is pretty scary. But Azure has been there for us. It’s been an amazing tool. And again, it’s another testament to C# as a language. It’s this general-purpose language that kind of does everything. It’s obviously not the be-all and end-all; there are other languages which are perfect for specific use cases, in the same way we’ve said for ABBYY and OCR. But as a C# developer, I feel a lot of freedom. I feel like I can solve most problems, which is awesome.

Jamie

Yeah, I like it, I like it. So, Nick, what about if people want to learn a little bit more? Is there some documentation that you’ve read up on recently? Or can folks learn a little bit about the system that you’ve built, or get in touch with you, or learn about what you’re doing? What are the best ways for folks to do that?

Nick

Yep. You can get hold of me directly on Twitter; I’m just Nick Proud. On LinkedIn, it’s the same handle. I post a lot about RPA and C# on YouTube as well, so if you just search for Nick Proud on YouTube, you’ll find my videos there. I’m always looking for new ideas for content, because maybe I’m not very imaginative, or maybe I just need help. So if there are certain topics that you want to learn about, or if Form Recognizer is one of those, then just hit me up and let me know. And in terms of the technology that I’ve been working on, just search for Nexbotix, that’s N-E-X-B-O-T-I-X, and you’ll see how we’ve been using .NET and Azure to build bot technology that brings humans into the loop.

Jamie

Awesome. What I’ll do is collect a bunch of links and make sure to put those in the show notes. So if anyone’s listening along, they don’t have to worry about how to spell Nexbotix again. Don’t worry about it: it’s in the show notes, it’s in the transcription, and there’ll be a link right at the bottom of there as well. So go ahead and check those out, click through. But I guess all that really remains to say, Nick, is thank you for being on the show. As I said at the beginning, and I’ll say it again, I really do say this every time, but I honestly do enjoy talking to everybody in the community. And I’m walking away from this conversation thinking, I’m going to go learn about Form Recognizer, and I’m going to do some looking into this automation stuff for different providers, right? Because there’ll be different APIs everywhere, and everybody will do it slightly differently. It gives me a greater breadth of knowledge of what’s out there. So thank you ever so much, Nick.

Nick

My pleasure. And obviously, I forgot to say as well: the Form Recognizer docs from Microsoft are your best bet. Just search “docs Form Recognizer” and you’ll get everything you need.

Jamie

Awesome, awesome. Well, thank you ever so much, Nick.

Nick

Cheers. Absolute pleasure. Thanks for having me on.

The above is a machine transcription, as such there may be subtle errors. If you would like to help to fix this transcription, please see this GitHub repository

Wrapping Up

That was my interview with Nick Proud. Be sure to check out the show notes for a bunch of links to some of the stuff that we covered, and full transcription of the interview. The show notes, as always, can be found at dotnetcore.show, and there will be a link directly to them in your podcatcher.

And don’t forget to spread the word, leave a rating or review on your podcatcher of choice - head over to dotnetcore.show/review for ways to do that - reach out via our contact page, and to come back next time for more .NET goodness.

I will see you again real soon. See you later folks.
