
Building Great Experiences Using LLMs, with David Hariri

Ep 18

Oct 25, 2023 • 58 min
ABOUT THIS EPISODE
Ada co-founder David Hariri joins Allen to talk about an audacious product discovery process that kicked off a 250-employee startup, limitations of hand-scripted workflows, strengths and weaknesses of LLM techniques like code execution, RAG, and function calling, and how support automation may create whole new business models.
TRANSCRIPT

Allen: Welcome to It Shipped That Way, where we talk to product leaders about the lessons they’ve learned, helping build great products and teams. I’m Allen Pike. Today’s interview is with David Hariri. David is a designer and developer who is now co-founder of the AI-powered customer experience startup Ada Support. Welcome, David.

David: Thank you very much. Thank you for having me. It’s a pleasure to be here with you, Allen.

Allen: Yeah, I’m glad you made the time and joined us. I think we had a conversation maybe a couple months back about some of the work you were doing and some of the work I was doing, and something really stuck in my mind. I was like, “This is a person I want to talk to more and learn some things from and be able to talk about this stuff.” So I feel like this is a fun venue for that.

David: The feeling is mutual. I’m excited to just get into it, continue the conversation from where we left off.

Allen: Perfect. Before we get into some of the things that we will probably be talking about, LLMs and how we can build products on them and some of the challenges of building and scaling out the kind of business that Ada is, using AI to automate things that previously were difficult or thought to be impossible to automate, which is all super fun, interesting stuff, how do you like to summarize your story so far? The David story and the Ada story, in a soundbite kind of way for the audience.

David: My story is that I love building products and I love having a lot of control over that. And so what I mean by that is that I never felt any desire to separate the act of designing something from the act of building it. I sort of think that they’re one kind of creative process. I think there’s a lot of people, by the way, who believe that and feel that way too. I’m in good company. There’s a lot of people I look up to in our industry that have the same philosophy, but it’s not something that’s preached as how to scale a company or a product. And so it’s been tricky in my career that the common theme has been as things get bigger and as things grow, how do I maintain that and also empower other people to have the same effect on these products? So my story is, yeah, I’m a geek at heart. I’m a hacker at heart. I’m a designer and developer. I’ve been privileged to work on teams that work with clients like Facebook and Pinterest and things like that in my agency days. And then I left that and co-founded a company called Volley with my co-founder Mike Murchison, and we sort of worked on that for a couple of years and then we pivoted to Ada, which has been my preoccupation for at least seven years, probably longer now at this point. I’m not sure. I stopped counting. My educational background is not in computer science. I’m self-taught, but I’ve always loved coding. I was coding from a very young age, sort of choose your own adventure games.

Allen: Oh, yeah.

David: Did you do that too?

Allen: Exactly. Yeah.

David: Yeah, that’s where it started. And then I got really into HTML pretty much as soon as browsers had an inspection tool. I was all over that. I remember I used to copy the full structure of websites and then modify them locally. And so I got super into PHP. Yeah, so it’s always just been this thing. It’s been very close to my heart, but for whatever reason I didn’t pursue it educationally.

Allen: Yeah. I mean, a lot of the stories of the folks that, not exclusively for sure, but folks that have that combined designer-developer instinct that I share. My path sort of pushed me towards being a developer, but I never lost the designer heart. A lot of people who have those instincts don’t end up feeling as much need to say, “Oh, I have to do a computing science degree to build this thing,” because they also like thinking about how it works and how it feels and what it looks like too. And so it makes it slightly less clear of, “Okay, I need to do a math degree at Waterloo or something.”

David: Yeah, exactly. And I’ve even spoken with people who have started down that path because they’re like, maybe they had advice when they were kids, “Well, hey, you’re great at programming. It’s like a hobby for you. You should make that your career.” And I think that’s great advice. Then they would go into a traditional computer science degree and in the first year they would start teaching them the basics of C++, which they probably never wanted to learn on their own. Or maybe like OOP and Java and just completely turns them off. It’s like data structures. Maybe if you’re lucky, drawing the turtle with lines and stuff. So I think that it’s maybe less of a problem now, but certainly for people our age, I hear that story more and more. And so they ended up doing something else, but then they’re in their thirties or their forties and they’re hacking on the weekends and you kind of think, “Man, if they hadn’t been turned off in that moment, maybe it would’ve been their career if there was a different door.”

Allen: For sure. So you went through this path and you described how you got to Ada and that you’ve been doing it for seven years now. Also in a soundbite way, how has that path gone? Obviously the original thing wasn’t, “Let’s use large language models,” which didn’t exist yet, to provide great customer experiences. At the high level, what was the origin point? And then, how do you summarize where the business is now in terms of the scale of the business? Obviously it’s a private company, so to the degree that you’re able to say, you could probably say how many employees you have or something like that, to give us a sense of what kind of business we’re talking about before we get into the things you’ve been learning building it.

David: So we started the company, well, so the company today is a Series C funded company. We are around 250 employees. We have roughly 400 customers in many different industries. And the use case is purely customer support, and we’re really focused on driving up as much automated resolution as we can in messaging and in voice, and soon email as well. Obviously we didn’t start with 250 people and all that stuff, so we started with two people. And the story begins with the failure of the first company, which I mentioned, Volley. We really couldn’t get that product into a place where it was growing enough without our continued effort such that it warranted continued funding at a Series A level. So Volley was a social product, and so it was always going to be free and perhaps advertising supported, which was also a business model I admittedly didn’t understand going into the development of the product. I just wanted to make something that people loved and got some use out of. I think to that effect, we were successful. And I’m still friends with some of the people that I met through the community. But as a business, it started looking like it probably wasn’t going to be successful. And so my co-founder and I had a tough conversation about that, and we both decided that it would probably be best if we approached our investors and said that, “This probably isn’t going to work out, we’ll give you the remaining cash back.” So we did that and our investors said, “No, no, no. Maybe you can find something else. Why don’t you pivot?” We thought, “Oh, wow, okay, second chance. Cool.” And so we furiously reflected on that two year development process. And I think the two things that came out of that for both of us were, we were more motivated to work on something that people valued more directly. So you’ve probably heard the adage that, “If you’re not paying for the product, you are the product.” I really think that’s true. And so we didn’t want to work on a product that followed that indirect business model. We wanted people to pay for what we were building and know the feedback was driving towards making a more valuable product for them. That was the first thing. SaaS looked like a good model for us. The second thing was we didn’t want to impose too much of a solution, or at least we didn’t want to start with attachment to a solution, and we did with Volley, frankly. It was a very vision-driven product and company, and I think that’s probably more successful generally for social products because there is a bit of an intuitive process that probably isn’t as valuable for SaaS and B2B, I think.

Allen: Well, social has to be a leap of faith because you can’t just iterate using science through 10 or 15 or a hundred different concepts and keep your user base. They’re going to try the thing and then they’ll either bounce or they’re going to click with it. It requires, like you say, more force of will, I think, to bring about.

David: Totally right. And the product that we were working on was also a bit of a two-sided kind of thing where people would ask questions and other people would answer, and so you had to answer that cold start problem too. So it was the worst parts of both and that made it extra hard to scale and grow. So we wanted to start with a customer and a problem, and we wanted to really get married to a problem. And we got really lucky because the people that we talked to right after that moment where our investors said, “Hey, find something else,” we were introduced through them and just through the community that we were a part of to different executives and founders. We just asked simple questions. If you’ve ever read the book The Mom Test, it’s questions like that. What are the biggest challenges you’re having scaling your business? And a lot of people said, customer support. And we thought that’s really interesting. And it turns out that customer support is one of these things where the demand for it is super spiky. There might be some event happening in the lifecycle of your company where you need twice as many people to answer the questions. Like COVID was a big moment for everyone that way.

Allen: Or maybe it’s Black Friday and it’s 50 times as much today, but not yesterday or tomorrow.

David: Totally. We have a different on-call policy for Cyber Monday and Black Friday and that whole four-day period. So yes, a hundred percent. And that’s a big reason why our customers buy our product. Another is attrition rates generally in call centers and contact centers are really high. It’s between 40 and 50%. And if you have a contact center that’s in North America or rather co-located with your headquarters, wherever your business is in Europe or something rather than offshore, then those costs per resolution can be quite high. They can be up to $10 if it’s on the phone.

Allen: Whoa.

David: Yeah, it’s quite surprising. So again, this is all surprising to me and to my co-founder, and so it was really exciting too because we were like, “Wow, this is a perennial need. Customer support is not going anywhere.” And it kind of sucks the way that it is right now. And the last thing is that when we talked with agents, people that work in these contact centers, they all described a similar feeling that they desired to work on the challenging relationship driven problems with the customers. And that between 70% of their time, up to 70% or 80% of their time was being spent on literally copying and pasting a template to how do I change my password. So that was a real clue for me, I think for both of us, but particularly for me as a coder. Because I was like, “Oh, maybe that can be automated.” Maybe if we can do it 30% of the time or 40% of the time, that might be valuable enough if we have a good fallback experience. That was the genesis of the company. And then from there, it took a lot of experimentation. We built two different solutions that we had to kill, and the way that we validated them was we were customer support agents in a set of lighthouse customers. So this was the magic of my co-founder, Mike. He was brilliant this way. He’s so good at this. He basically went out to the same people we interviewed. He said, “Okay, we’ve narrowed it down. We want to really focus on this challenge of customer support for you, and to do that, we ought to be agents in your contact center so we really feel the burden of this problem and really understand the life cycle and the way that this work is done,” and they let us. It was crazy. They just gave us access. It was amazing. I got admin access tokens to be able to scrape previous ticket histories and stuff. They were all in and all on board. And so I think he was incredible in those early days of building those relationships with the customers, making sure that they felt a high amount of trust with two guys in a dingy apartment essentially working on this thing. And that worked really well. We became extremely familiar with the problem. And that also allowed us to build things, test it, and know right away if it was going to work or not in a matter of days rather than months.

Allen: Yeah, I think that’s one of the core tools in the toolbox when you’re trying to get from zero to one. You have a startup, especially if you have done the, as you’ve just described, pretty valuable thing of deciding I’m going to be selling to businesses and not try to make a new advertising-supported social network, which is exactly what you described. Go in and say, “We’re going to solve this problem,” rather than, “Here’s a product, we’re going to start trying to sell it.” “We’re going to solve this problem. Let’s just get ridiculously familiar with the problem and then go in.” And in a way that’s totally unsustainable in the long term, solve it yourself. And they say, do things that don’t scale. Do the thing that doesn’t scale. Okay, you’re literally answering support emails now, and then you feel the pain and then you build the understanding and then you can iterate. So I love that. I love that version of that story.

David: Exactly. Yeah. Those were principles we just took really seriously and we didn’t before as product builders and we learned our lesson.

Allen: Yeah, I love it. So you dug in, that allowed you to build a couple potential solutions that you had to throw away, but you were able to tell that you needed to throw them away because you were right in there side by side with these customers and their customers and seeing what was working and what wasn’t. And now you’ve gotten to the point, like you said, you have a couple hundred employees, I believe you said, and now you’re obviously substantially further down the line in terms of technology.

David: We’re not doing that anymore.

Allen: You’re not doing that anymore. Now you have the magic of “AI” or machine learning or LLMs or probably a combination of those things. And also, as with all AI startups, I’m sure, some code that is really just a logic tree that we can reason about in a way that doesn’t require the term AI. So what’s the state of the art now of the problems that you’re facing in 2023 as a company that’s built this up over seven years?

David: Again, we’re really married to the problem and we see technology and AI as a means to getting better and better at solving that problem. And the way that we measure getting better and better at solving that problem is our rate of automated resolution. So that’s the rate at which without talking to a person, we can not only answer someone’s issue, we can resolve it. And that resolution is measured by essentially assessing the transcript and saying, “Was this resolved or not?” Either by a human annotator or by AI itself.
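
(For readers who want to ground that idea in something concrete, here is a minimal sketch of how a transcript could be labeled as resolved or not by a model rather than a human annotator. It assumes the 2023-era OpenAI Python SDK, and the prompt wording and label set are invented for the example, not Ada’s actual rubric.)

```python
import openai  # assumes the 2023-era openai SDK (0.x) with an API key in the environment

JUDGE_PROMPT = """You are reviewing a customer support conversation.
Reply with exactly one word: RESOLVED if the customer's issue was fully resolved
without needing a human agent, otherwise UNRESOLVED.

Transcript:
{transcript}
"""

def judge_resolution(transcript: str) -> bool:
    """Label one transcript. Illustrative rubric only; a real system would calibrate this against human annotators."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(transcript=transcript)}],
    )
    label = response["choices"][0]["message"]["content"].strip().upper()
    return label.startswith("RESOLVED")

def automated_resolution_rate(transcripts: list[str]) -> float:
    """The metric David describes: resolved conversations over all conversations."""
    return sum(judge_resolution(t) for t in transcripts) / len(transcripts)
```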

Allen: Ah, it’s interesting, because I see on your website something about, oh, this percentage of automated resolutions, and as a person who’s been on the other side of maybe not fully sophisticated chatbot support, I was like, “This seems like a dangerous metric,” because “resolved” is like, “Okay, your problem was resolved,” and then I was like, “No, it’s not. Clearly, it’s not.”

David: Yes.

Allen: And so the fact that you have human annotators being like, “Was this really resolved, or did the chatbot just hang up on you, or did you get into a corner where it decided that you weren’t worth helping or whatever?”

David: So I find that a lot of industries have dirty truths, and the dirty truth of our industry is that a lot of chatbots don’t resolve. They contain, that’s what the industry term for it is, which means that you may not escalate on another channel. Although even that’s suspect because it’s very hard to attribute someone moving from channel to channel. And so what we learned was that it’s more important for us to focus on resolving in that moment in as few steps as possible that person’s issue, or escalating quickly to a human if we can’t. We can’t maybe because our technology is not good enough or more often it’s because the business doesn’t want to or wants to depending on the situation. And so there’s this game you’re playing between immediate cost savings and the long-term value of that customer. You asked more about the progression of our ML, and so I started by saying that for us it’s seen as a means to an end of this automated resolution. And so we are not married to any particular technology and we’re always experimenting. We have a large ML team that’s always experimenting with new ways of doing this. But I think broadly, we basically went from a place where every request to our system was classified into a set of intents. So customers in our product would be creating an intent and then creating a scripted workflow that’s attached to that intent. So they would say, when people ask, “How do I change my password?” Or questions like this, “This is what I want the bot to do, I want it to send back a message saying, “Sounds like you’re having trouble with your password.” I want you to maybe check if the CRM has their information or something, like if they’re in this product SKU or something. And then I want you to reply with the instructions here or the knowledge help article link here.” And that actually scaled to millions of conversations a month through our system and was basically the state-of-the-art way to do this. But there are a lot of challenges with it, but it was basically the state-of-the-art until last year, late last year effectively. And then I think what’s happened is large language models have proven that they can be used both to improve the efficiency of that product, and I think are emerging as capable of completely replacing the need for that. And that’s where most of our energy is now, is on figuring out how do you enable a large language model to safely and accurately reason about a conversation that it’s having with a customer and the available actions it has. So I think the best way to think about it is that we’re trying to build a system now that allows a large language model to build that workflow on the fly for every interaction it’s having with someone rather than relying on a system that prescripts or tries to predict what that workflow should be, what that optimal workflow should be. And for your listeners, essentially when you build a fully capable workflow builder for automated customer support that’s personalized, that can perform actions and business systems, you basically end up with a programming language. And so with some defined bounds for the task, but I mean it really does quickly become-
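
(To make the pre-LLM architecture David is describing concrete, here is a toy sketch of the intent-plus-scripted-workflow pattern. The intent names, steps, and fields are invented for illustration; a real workflow builder is far richer than a dictionary of steps.)

```python
# A toy "intent -> scripted workflow" table. Every branch has to be predicted
# and authored by a person ahead of time, which is what breaks down at scale.
WORKFLOWS = {
    "change_password": [
        {"action": "send_message", "text": "Sounds like you're having trouble with your password."},
        {"action": "lookup_crm", "field": "product_sku"},        # personalize on CRM data
        {"action": "send_article", "article_id": "kb-1042"},     # e.g. "How to reset your password"
    ],
    "refund_request": [
        {"action": "send_message", "text": "Let me check on that order for you."},
        {"action": "lookup_crm", "field": "last_order_id"},
        {"action": "escalate_to_agent", "queue": "billing"},
    ],
}

def handle_message(utterance: str, classify_intent, run_step):
    """classify_intent() stands in for the intent classifier; run_step() executes one scripted action."""
    intent = classify_intent(utterance)
    steps = WORKFLOWS.get(intent, [{"action": "escalate_to_agent", "queue": "general"}])
    for step in steps:
        run_step(step)
```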

Allen: You have forks and if statements and loops and all this sort of giant, you’ve created a giant graph with go-tos.

David: Yes.

Allen: It probably becomes just as unmaintainable as any other, because of course, ideally you’re not the one building that. It’s your customers, these big businesses that have these support needs, that are building out these flows. Then, are they good enough programmers that it’s easy for them to maintain these things? Maybe not.

David: Maybe not. And then they start asking for things like, “Hey, it would be great if I had logging and it’d be great if I had maybe an ability to drop down into code.” And you start going, “What kind of an abstraction am I building here?” So yeah, it’s interesting how the product, it’s almost like I think for the whole industry, this is my own opinion obviously, but for the whole industry, we sort of got to that point in the maturity of that technology that the abstraction did get pulled so close to code. And then it’s almost like right at the moment when we needed it, this new technology was invented and distributed actually really quickly through companies like OpenAI and Cohere and Anthropic such that we don’t need to do things that way anymore, or at least maybe not. I mean, it’s still-

Allen: At least not for all cases. It seems like we’re on the verge of increasingly less needing to do that.

David: Exactly. And then it’s going to be a thing of, well, for which industries and which use cases and which is it possible to build a product where there are no workflows? Or is there always going to be some need for one or two then to stick around? And then should those workflows be code? Because now we have a way of augmenting writing code with large language models, which makes it a lot easier for someone who’s a business user to describe their problem and maybe a contract and then have the large language model write that code for them. So it’s really basically been a fundamental change of how we do things. I suspect it’s probably been as fundamental of a change for other industries other than customer support. But yeah, it’s a really exciting time to be a product builder.

Allen: One thing that makes me excited about that in the space that you’re in is that I think there’s, and I’m sure you have a different view on this being so close to it and you can see what can be done with it, but as someone who’s experienced these support bots and things from the outside, I have developed a slight skepticism of how great of an experience is this workflow really going to provide. I’m like, “Oh, I see what it’s doing. It thinks I want X, but actually I know X isn’t going to work because the reason I even contacted it in the first place is because I have a special type of account or whatever.” But my experience with LLMs is that there are some circumstances where they can just be one or two orders of magnitude more effective at cutting to the root of something or getting you where you need to be or being steerable in a useful way than any if-statement chain would ever be able to be. So I’d be curious, I guess there’s a two-part question of the places where LLMs so far have proved to be a huge level up. And then maybe after that we could touch on the places where they either still struggle or there are interesting tricky problems. But where have they been a big level up so far?

David: Well, I mean they’re sort of one and the same, actually. I think that you’re absolutely right that every time there’s a special case, the person building a workflow has to predict what that special case is and they have to define rules for that special case and what actions should follow, which is effectively coding. And what we’ve learned is that a large language model is exceptionally good at that. It’s exceptionally good at understanding that, especially with some guidance or some coaching from a user of a product like ours. And so your question was like, well, where is that effective? And I think one or two orders of magnitude is not exaggerating the improvement in effectiveness out of the box. I would say that with our existing product, you can get good results that are within the same order of magnitude as the results we’re talking about with a large language model product. But I think that the base case is massive, the floor is massively improved. So if a customer just brings their knowledge base and nothing else, they’re probably going to create a lot better of an experience than if they had invested 10 or a hundred times more hours in building workflows for each one of those cases. Now the knowledge base is a good example because, and I think people are starting to understand more about retrieval-augmented generation and that architecture. So I think it’s a good place to start. I think hallucination is still a thing, it’s real. It’s not possible for anyone to guarantee a customer the same way they could on a scripted system that this is always going to behave exactly as you would expect or that you even told it to. And so now it’s become a really interesting, what I’m finding is that customers are actually okay with that, which is interesting. We’re speaking with more and more customers. Again, these are early adopters. So on the curve, they’re early adopters, they have excitement about the technology and applying it. But I think what we’re learning is that actually human support agents make mistakes too. And if that range is in the same ballpark, let’s say between one and 5%, customers may actually accept that trade-off of I’m okay as long… There might be industries where that’s not possible, like in a healthcare situation, that probably isn’t possible. And so that’s where that curve might be defined by those use cases and industries. But there’s a lot of customers right now who are saying to themselves, “I think my human support agents probably make mistakes about 1% of the time in terms of accuracy of information.”

Allen: You mentioned a few pieces of product engineering challenges of taking a large language model and going in and productizing it and iterating that product. And so let’s touch on a couple of them and try to do it in a way that folks who are not yet experienced with trying to build products on large language models can follow along, but then also maybe might give a little bit of insight for folks that have been exploring this space. So you mentioned retrieval-augmented generation. So this is the idea where, okay, yeah, the LLM knows a whole bunch of stuff and yeah, it’s been trained on the world, but instead of trying to get it to not hallucinate by hoping that it was trained on the facts that are relevant for the case, based on the context of the conversation it’s going in and pulling from a database, often a vector database, of relevant information. So that instead of, if you ask ChatGPT, “Hey, what is the support policy for Chuck E. Cheese when you cancel this thing?” in this complicated case, it might just make up a plausible policy, but if you then pull in that data right there and say, “This is the real one. Discard whatever else you know,” that can work. So it sounds like what you’re saying is that, applied well with the current state of technology, often you can really tamp down these hallucinations for the things that you have prepped in the database. Have you gotten to the point yet where you have loops where the system tries to detect like, “Oh, we might have some missing stuff here,” where there wasn’t really much to retrieve or the stuff we retrieved wasn’t relevant, when someone went and started talking about, I don’t know, my son was creeped out by the Chuck E. Cheese song that the Animatronics were playing and it didn’t really know what to say, so it ended up answering anyway, and then detect that?
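
(As a rough sketch of the retrieval step being described, here is a minimal retrieval-augmented setup with a toy in-memory “vector database.” It assumes OpenAI’s 2023-era embeddings endpoint; the articles, model name, and top-k value are placeholders.)

```python
import numpy as np
import openai  # assumes the 2023-era openai SDK (0.x)

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts into vectors."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([item["embedding"] for item in resp["data"]])

# Toy knowledge base, embedded once up front. A real system would use a vector database.
KB_ARTICLES = [
    "Refund policy: cancellations within 90 days of purchase receive a full refund.",
    "Referral program: customers receive a $20 credit for each successful referral.",
    "To reset your password, open Settings, choose Account, then Reset Password.",
]
KB_VECTORS = embed(KB_ARTICLES)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k articles most similar to the question by cosine similarity."""
    q = embed([question])[0]
    scores = KB_VECTORS @ q / (np.linalg.norm(KB_VECTORS, axis=1) * np.linalg.norm(q))
    best = np.argsort(scores)[::-1][:k]
    return [KB_ARTICLES[i] for i in best]
```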

David: Yeah, it’s a great question. I mean, I think fundamentally large language models are eager, they’re eager to complete. That’s what they’re engineered to do. And so if they don’t have the information that they want or need from their prompt to answer something accurately, they often will in that situation hallucinate rather than reason about their own capabilities and return something more like, “I’m sorry, I don’t know that.” Because that’s just fundamental to the technology, I think. Again, I’m not an expert in this, but that’s my experience using them. Now you can prompt them in such a way where you say, “This is the person’s question, this is Allen’s question. You can use these documents. Reason about it to the best of your ability. If you don’t have the information you need, then just say, ‘I’m sorry, I can’t help you with that,’ or escalate to a human agent.” That works well, a lot of the time. Occasionally it doesn’t, and I’ve found that the biggest issue for our team so far has been with numbers, which is interesting. So if there are certain policies encoded in the knowledge base, for example, the referral bonus is $20, the large language model might say 25, and you’re like, “It’s right there, it’s in the prompt. Why?” But for some reason there’s certain things like that. And then of course there are a lot of other variables. Another variable is just the ability to find the most relevant documents to put in that prompt. With Claude, you can definitely put a lot of information in the prompt. I mean, I think it has a hundred-thousand-token context window. So you can put complete knowledge bases in there. However, as the context window increases, it seems like the large language model tends to put attention in the top and the bottom. And so if the information is in the middle, then you might get poorer results. So we try to really get good at condensing the facts that are most important to move to a resolution into that prompt as much as possible and compressing that information as much as possible, which is a retrieval problem. And then of course, the other problem is just the capability of the large language model. If you’re using an open source model versus something like GPT-4, you are going to get radically different results for this task. And that comes from a place of asking the model to reason about its own capability. Well, you need a model that’s not just built for auto complete and essentially is a sufficiently large model such that it has derived reasoning skills from its understanding of language itself, which is just in itself a fascinating topic that we can-
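
(Building on the retrieval sketch above, here is roughly what the grounded prompt David describes can look like, with the explicit escape hatch for missing information. The wording is illustrative; real prompts are iterated on heavily, and the key facts are kept compact and near the edges of the context because long prompts tend to lose attention in the middle.)

```python
def build_grounded_prompt(question: str, documents: list[str]) -> str:
    """Assemble a prompt that asks the model to answer only from the retrieved documents."""
    docs = "\n\n".join(f"[doc {i}] {d}" for i, d in enumerate(documents))
    return (
        "You are a customer support assistant.\n"
        "Answer the customer's question using ONLY the documents below.\n"
        "If the documents do not contain the information needed, reply exactly:\n"
        "\"I'm sorry, I can't help with that.\" so the conversation can be escalated.\n\n"
        f"Documents:\n{docs}\n\n"
        f"Customer question: {question}\n"
        "Answer (cite the doc number you used):"
    )

# Usage with the retrieve() helper from the previous sketch:
# question = "How big is the referral bonus?"
# prompt = build_grounded_prompt(question, retrieve(question))
```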

Allen: Yeah, and there’s two related ones that contribute to that. One is the training that they do after what they call pre-training, which is: make a thing that knows how to complete The Great Gatsby or whatever. And then all the work that they do, and especially OpenAI, and this is a huge contributor to why GPT-4 is so much better than most of the other open source models. There’s a huge gap because OpenAI has put a ridiculous amount of time and money into that post-training step, where they’re actually reinforcing, “Hey, I asked you this thing, and then did you follow my instructions or not follow my instructions? Was this a useful answer, or wasn’t it a useful answer?” And one of my instincts, I’m curious if this seems true to you, is that companies like OpenAI and probably some of these other ones like Cohere and Anthropic will get better at steering and coaching in that post-training step for answers like, “You know, I’m not really sure. It seems like maybe X, but I’m not sure if I have enough information on our policy about Animatronics,” to give a half-helpful answer. Which, if you go on to online forums and stuff, of course there’s a problem that people will upvote things that seem confident. But what we actually want from a chatbot is, if it’s not confident, to give a semi-confident answer instead of just saying, “I refuse to help you,” or “It’s definitely 25.”

David: Yeah, it’s interesting to think about this from how much of the application developer’s tasks right now will actually just get encoded into the fine-tuning of these models or the refinement of these models regardless of the techniques used. So in other words, we have another step right now in our product where we look at what the large language model is intending to send back to the customer, the end user of the experience, and then we compare it to the context that we gave it. So we gave you these documents and this is the prompt, et cetera. How semantically similar is this answer to the document? And we actually asked the large language model as well to say, which document ID are you referencing? Which is a common thing, you’ve seen that in Bing Chat I’m sure as well.
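
(A minimal sketch of the check David is describing: compare the drafted answer to the document it claims to cite before sending it, and hold it back if the two don’t look semantically close. It reuses the embed() helper from the retrieval sketch above, and the threshold is a made-up number you would tune offline against labeled examples.)

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def looks_grounded(draft_answer: str, cited_document: str, threshold: float = 0.80) -> bool:
    """Rough guardrail: is the draft answer semantically close to the document it cites?"""
    answer_vec, doc_vec = embed([draft_answer, cited_document])  # embed() from the retrieval sketch
    return cosine(answer_vec, doc_vec) >= threshold

# If looks_grounded() is False, the system can regenerate, soften the answer, or escalate to a human.
```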

Allen: Yeah.

David: What your question elicited to me was how much of that will just start to get boiled into these models as companies like OpenAI decide to specialize them to say, okay, you’re not choosing between GPT-3.5 and four, you’re choosing between a model that we delivered for RAG purposes such that you don’t even need to do that. It’s within 99% of the time, it just won’t give an answer if it isn’t supplied sufficient context in its prompt. Maybe that’s where we’re going. And then they have another one that’s more expensive that’s purely for reasoning. You would never use it for generating text that would be seen by a person, but you would use it to make decisions based on a fixed limit of possibilities. That’s a fascinating future as well.

Allen: We could spend half an hour speculating about it because it’s fun, but I’ll try to constrain myself. One of the paths that it might take is just that the way that OpenAI has, and now increasingly new models get trained this way, they get trained to basically understand that you can say, “Hey, please format your output to this JSON blob and then follow these rules.” And then it just does. And it turns out that’s not complicated enough of a thing to require a totally different model. And it might be that as long as the input dataset that it gets trained on and the reinforcement stuff has enough clarity as to what it means if I prompt you, “Hey, only answer if you’re really sure, and rely primarily and almost exclusively on this retrieval-augmented generation for the facts; everything else, don’t speculate.” And it may be that the distance between where GPT-4 is today and getting it to actually behave that way is really only a 5% tweak in having enough examples of that. And to some degree, we’re discovering new requirements for the input training data on these things that two years ago, if you said, “Oh, what should you train a large language model on?” We probably wouldn’t have said, “Well, it needs to understand function calling and retrieval-augmented generation.” We didn’t even have those terms. So no wonder the thing that was trained a year ago doesn’t know about them and do them super well. I don’t know, it may be as simple as that.
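
(As a small illustration of the “format your output as this JSON blob” behavior Allen mentions, here is one way an application can ask for structured output and still defend itself when the model doesn’t comply. The schema and fallback are invented for the example.)

```python
import json
import openai  # assumes the 2023-era openai SDK (0.x)

def classify_ticket(message: str) -> dict:
    """Ask for strict JSON and validate it in code rather than trusting the model."""
    prompt = (
        "Classify this support message. Respond with ONLY a JSON object shaped like:\n"
        '{"intent": "<short label>", "urgency": "low|medium|high", "needs_human": true|false}\n\n'
        f"Message: {message}"
    )
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    try:
        return json.loads(resp["choices"][0]["message"]["content"])
    except json.JSONDecodeError:
        # The model didn't follow the format; fail safe instead of crashing.
        return {"intent": "unknown", "urgency": "medium", "needs_human": True}
```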

David: I think you’re absolutely right, Allen. It’s like we’re flying by the seat of our… We’re learning at the same speed as building. The technology is improving at the same speed as application development, which is really cool. It’s really exciting, but also a new thing comes out every week and you think like, “Wow, okay, maybe I could do something with that, or maybe that solves a certain problem that I have with customers.” And so I think to bring it full circle, that’s why having a clear understanding of your problem… My advice to people looking to apply large language models is having a really clear problem where it fits is the most important thing, because you can use that problem to ground any particular chain of prompts or a single prompt or an architecture mixing different ML techniques. You can say like, “Well, I’m just trying to find the best recipe here.” You’re not trying to also optimize the problem at the same time as a solution, which I think can get into sort of chaotic feedback loops that can end up being a trap for founders.

Allen: In a lot of ways, we are in a bit of a moment in terms of startups in our industry that has some echoes of 2007, 2008, where there were all these startups being founded like, oh, mobile phones are going to change everything, but we’re not sure exactly how. And in 1998, 1999, oh, the web is going to change everything. We’re not entirely sure how. And so obviously some of those companies are very plainly just like, I don’t know, recipes with AI or whatever. Some huge percentage of those are going to either have to pivot or they’re not going to exist. And my sense, and of course I’m biased on this, because this is the way we go about building businesses, is that thinking problem first instead of solution first tends to make a more durable product.

David: Yeah, I agree. And look, I mean there’s all kinds of strange incentives at play here too. I’m not unaware of the fact that the Fed continues to increase interest rates and cash is more and more scarce and founders need to hire teams still to build things. And so I think there are incentives that are driving people towards putting the AI stamp on their project or their thing, and I think we will see how many of those things actually needed it and how many things didn’t. But I also think that the technology itself is changing so rapidly that what we think of as AI, we still have these terms narrow AI and stuff. I could see a world where even things like GPT-3.5 are starting to become more of a narrow AI or a more flexible narrow AI as it’s fine-tuned for different tasks. And so I guess my point there is just this is all changing so quickly and so it’s very hard to judge, but at the same time, I think the best way to build things is to be committed to the problem and not the solution.

Allen: Yeah. You hit on something I think is likely, which is that in our industry we’re many decades into AI really meaning stuff that we used to think of as people stuff that computers recently became able to do, rather than any other meaning.

David: Yeah, exactly. Yeah.

Allen: When I went to university, it was like AI was the things that now we wouldn’t even remotely consider AI. I remember in a class being like, “Well, if you’re trying to determine the ideal route between two points on a map, like what basically Google Maps does today, that’s NP-hard and you would really need AI to do it, and it may never be done.” And then three years later it’s like, “Here’s Google Maps. You can drag the pin around and, just in real time, it makes a new route.” Because whatever, we figured that problem out, and we’re not solving it in the NP-hard way, we’re doing it in a more clever way. And so we don’t think of Google Maps as AI anymore. That’ll probably, to your point, be thought of as like, “Oh, well, this isn’t AI, it’s just an LLM that’s just able to answer questions about some certain set of things, but it’s not really AI. AI is whatever thing just came up in the last couple of years.” So I think that’s a good observation on your part.

David: It’s an elusive… Well, thank you. But I think it was both of us coming to that. Defining AI is quite elusive.

Allen: Yes.

David: And also because, in my experience, as you use models as powerful as GPT-4, you start to wonder, “Well, how can I do this in a way that doesn’t take 12 seconds per request and doesn’t cost me a dollar every time I do it, or whatever the costs are?” Then you start going down, “Okay, well, actually I can use 3.5 for this, or I can fine-tune 3.5 and perhaps that will work better. Or I can use Anthropic or Cohere as well.” You have so many more tools that you didn’t have before, but I wonder how many projects are coming up now because it’s so much easier to write a task in a prompt and send it to 3.5 than it was to train, using PyTorch or something, a more “narrow AI” for that task, even though that might be way more performant and actually maybe even perform better on a certain task like predicting the weather tomorrow or something like that.

Allen: Oh, for sure. This is an entire category, and this is an infamous thing about LLMs, is that of the various things that they make sort of a fundamental change in capabilities, there’s certain types of problems where they are just so much easier to prototype things with.

David: Yes, exactly.

Allen: Just in a wild way where it’s like, this thing has been possible in ML for 15 years.

David: Exactly.

Allen: But no one got around to prototyping the specific UX or the workflow or the thing or giving it this particular data before. Or at least you’re not familiar with that in your industry, you’re in your product space. And you can just whip up a demo in an hour and then for, as you say, a great cost per server implication, it does something. Then you can use that to put in front of customers and figure out, was this useful? The answer might be no. Or it might be like, “Yeah, it would be useful if we could do it for 10 cents a call instead of $1.” But so many product revolutions, and this is why I often think of, I find really interesting how LLMs in this space affects product thinking in some ways more than the like, “Yeah, of course. It’s very important. We have data scientists and all this sort of stuff.” When we think about the product level, it enables this prototyping exploration in a way that, “Yeah, okay, if it ends up being useful, then you have a whole bunch of ways to maybe make it cheaper or maybe it’ll get cheaper when the GPUs get better in three years or whatever.” But that’s part of what makes it exciting to me.

David: That’s a great summary of what I was scratching at and circling around and trying my darnedest to get to. That’s exactly what I was trying to say: the field of AI has become more accessible. Some amount of what AI was already capable of doing is now more accessible through a large language model interface than it was before to the average hacker or the average product developer. And that’s really great. That’s an amazing thing, and I think we’ll see some things kind of boil down to now, okay, well how do I make this more efficient? And then people will just progressively discover how to use narrower AI techniques, and other things will stay stuck on, well, I shouldn’t say stuck, but will demand the capability of GPT-4.

Allen: And some things will demand GPT-5 because there’s still stuff, like you say, even just math-y stuff where it’s trying to reason about numbers. Although actually there’s a thread I want to pull on as we start to run a little low on time, still more curiosities that I want to satisfy on some of those approaches, which I think will also be interesting to folks who are learning how to build products with this stuff. You mentioned the problem where you can do retrieval-augmented generation, you can put into the context for this LLM, “Oh, here’s a bunch of our policies.” And they have maybe numbers in them about, “Oh, well, if it’s been 90 days, they can do this within 90 days and then they get $20 and whatever.” But then actually working out at the end of the day, “Oh, well, how much of a discount does this person get? Are they within 90 days or was it within 90 days of this date?” Or that kind of stuff. Even GPT-4 will struggle with that. And one of the tools that we have in the toolbox, and I’m curious if you’ve experimented with this much yet or if it’s still on the horizon, is to equip the language model with basically a calculator or a tool or the ability to run code. Because that’s one of the demos they have with GPT-4 that you can play with in ChatGPT. I believe this is in general access now, that if you are in ChatGPT Plus, you can say, “Well, enable code interpreter.” I think they call it data analyzer now. But you can then say, “Hey, what is the sum of this and that?” And then it’ll be like, “Well, let me write some Python code and run that.” And then the Python code is a hundred percent reliable at doing math. And that maps onto it: if a customer service agent was told, “Oh, okay, please add this and remove this 90 day cancellation, add and add and add,” the customer service agent isn’t going to use their head to calculate, “Oh, okay, well, your refund is $37.23.” And so it’s sort of analogous to that. So I’m curious, is that something that you’ve played with yet or is that just a known potential future path?

David: Oh, definitely. So first of all, personally played with it, yes. I’ve used the code interpreter stuff or the data analyzer, if that’s what it’s called now. And I was pretty impressed. It’s awesome. There’s a lot of interesting applications for personal use that that satisfies. But from a system building perspective and for customer support automation, I would encourage system builders to think, at least I’ve found that this is successful for me. Maybe it will help other people. But I would encourage someone trying to build these systems to think, “What do I need the language model to do? What can I not build code to do and what can I actually write code to do?” And so that’s a useful question I think because there are some things that are actually better codified as rules in code. So for example, you mentioned function calling. That’s a relatively new feature of OpenAI’s chat completion endpoints and function calling is really radical, I think. It’s pretty cool. You give it a bunch of available functions and you’re asking the language model to choose among them. And it’s on your system to then execute the function and spit back the result to the language model. And I think a lot of applications will be built using that architecture because it’s a great division of responsibility. You’re basically deferring the reasoning to the large language model about which action to take given the context of the conversation, which step to take next. And then you’re allowing your system to do the processing of that data and decide, “Is this an acceptable input to my system or not?” And I think that feedback back to the large language model is also very interesting. You could say, “Well, for that customer’s SKU, I can’t actually process those inputs. That’s too big of a refund for someone who we just met.” It could even be fraud or something like that. And telling the large language model that allows it to then talk to the customer and say, “I’m sorry, I can’t do that right now, but I can escalate to a supervisor,” versus giving it no information or letting it say the wrong thing effectively. So I don’t know if that answers your question.
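
(Here is a minimal sketch of the division of responsibility David describes, using the function-calling interface OpenAI shipped in 2023: the model chooses the function and arguments, our code executes it, enforces the business rules, and feeds the result back. The function name, schema, and refund limit are all invented for the example.)

```python
import json
import openai  # assumes the 2023-era openai SDK (0.x); names and rules below are invented

FUNCTIONS = [{
    "name": "issue_refund",
    "description": "Issue a refund to the customer for a given order.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "amount": {"type": "number", "description": "Refund amount in dollars"},
        },
        "required": ["order_id", "amount"],
    },
}]

def issue_refund(order_id: str, amount: float) -> dict:
    """Our code, not the model, enforces the business rule."""
    if amount > 100:
        return {"ok": False, "reason": "Refunds over $100 require a supervisor."}
    return {"ok": True, "order_id": order_id, "refunded": amount}

def respond(messages: list) -> str:
    """One turn of the loop: let the model pick a function, execute it, feed the result back."""
    resp = openai.ChatCompletion.create(
        model="gpt-4", messages=messages,
        functions=FUNCTIONS, function_call="auto",
    )
    msg = resp["choices"][0]["message"]
    if msg.get("function_call"):
        args = json.loads(msg["function_call"]["arguments"])
        result = issue_refund(**args)                          # executed by our system, not the model
        messages.append(msg)
        messages.append({"role": "function", "name": "issue_refund",
                         "content": json.dumps(result)})       # tell the model what actually happened
        resp = openai.ChatCompletion.create(model="gpt-4", messages=messages)
        msg = resp["choices"][0]["message"]
    return msg["content"]
```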

Allen: No, it does. It is a great example of that tool, which is: yes, large language models suck at some things, so can we give them tools for the things that computers are actually already good at, like adding?

David: I think that’s where things are going for workloads, for tasks that require that, that require access and interfacing to backend systems. Now there is a whole other emerging category where people are building that interface and not using API-defined interfaces like the OpenAI function calling one. They’re building that interface through to the GUI, which is interesting too. So they’re probably doing some screen reading and some reasoning about the screen. And of course OpenAI is now showing that the large language models are capable to some extent of understanding the context of a picture. I don’t know how effective that is yet. I haven’t tried that yet. But that could be a rapidly emerging architecture as well where instead of having to build a JSON interface between a backend system or a server-side execution environment or a REPL and this large language model, perhaps it’s more of a UI. Maybe it’s running in a headless browser or maybe it’s literally running on an on-prem Windows desktop machine from 2003 that only runs a certain version of a certain software that’s used to control a power plant or something.

Allen: Nobody else would want to touch it, but the LLM is fine. They’re not going to complain. Although I’m not sure if I want an LLM controlling a power plant yet, but something maybe more boring than that.

David: Yeah, I’m not sure I do either, but I think that is where things are going. Maybe. I saw a cool startup, I can’t recall the name of it now, but I saw a cool startup yesterday that’s doing that, building it for GUIs, which is essentially like UiPath for large language models, and which I think UiPath itself will probably be on a direct collision course with, I would imagine.

Allen: I think there’s a whole space, setting aside the power plants, almost in the opposite direction of that. But I think also really powerful is there’s a whole set of things where you have an initial reaction when you start playing with stuff and say, “The LLM can’t help with this. It makes a mistake sometimes or it doesn’t fully understand or whatever.” But then there’s almost like a mind-bendingly large number of things where it can do the 80/20, where the LLM can 80% make sense of this thing and go in the right direction and suggest three options. And one of the three is great, and the other two are kind of not great nonsense. And then a human being can easily look at that and be like, “Oh yeah, that’s good and better than I would’ve come up with myself. And I can tell that that’s the correct one.” And even if we never got a single improvement over GPT-4 from today, I think it’ll be many years still of us getting better at building products, and just our own personal habits, at being able to use these models to improve the speed that we can get to a good thing, even if we’re using our human judgment on what the right choice is. No language models need to pull the power plant levers.

David: That’s right. And that 80/20 that you’re talking about is good for us to touch on for a moment, I think. Because in my experience it’s really important, again, so first asking yourself what do I need the large language model to do to solve my problem? And what do I not need it to do? What can I write in code? You want to think about where the boundaries of your system are in terms of what do I need the large language model to do and what can I write in code effectively? And then you want to have a clear understanding of, well, what’s my failure mode of this system? So when I’m asking the large language model to do something, what happens when it doesn’t do it right? And how do I know? How do I know if it’s not performing correctly? That can be offline, but that ideally can also be online. Maybe you have some safety checks or something. We do check for toxic and unsafe generations and we also, like I said, we check to see based on the context, does the generation make sense? So you could do the same thing for function calling. Given the context, does it make sense that it’s choosing the function? Is this function even available? So I think there’s a lot to think about in terms of building reliable systems that way. And then of course, if you’re building a system where there’s not a human in the loop, that becomes even more important. And of course what we’re building is one of those systems, like to do proper automation, you effectively don’t want a human in the loop. But a lot of products might actually not have an ROI that relies on that model.
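
(A toy sketch of the kind of online checks David mentions, applied to a function call before it executes: is the function even available, do the arguments pass hard business limits, and does the call make sense given the conversation? Every rule here is an illustrative placeholder, not Ada’s actual safety stack.)

```python
ALLOWED_FUNCTIONS = {"issue_refund", "check_order_status", "escalate_to_agent"}

def safe_to_execute(function_name: str, arguments: dict, conversation: str) -> bool:
    """Cheap pre-execution guardrails; anything that fails falls back to a human."""
    if function_name not in ALLOWED_FUNCTIONS:
        return False                                  # is this function even available?
    if function_name == "issue_refund":
        if arguments.get("amount", 0) > 100:
            return False                              # hard business limit lives in code, not the prompt
        if "refund" not in conversation.lower():
            return False                              # the call should make sense given the context
    return True
```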

Allen: Yeah. Maybe this is a good last thing because it’s future-looking. One of the things that I’ve become quite curious about, both when I’m looking at startups and also thinking about the impact that businesses like yours are having, is that as we look to a future where the LLMs and technology that we’re all building are making great customer support cheaper, or making it so that you can provide a better customer support experience for a given cost, that seems like it’s going to have an impact over time on certain business models, especially business models where customer support is a big part of, or maybe the majority of, the cost of providing that thing. And so I’m curious if you have any sort of guesses or insights on how the changing cost structure slash availability of customer service plays out. Especially if you think of something where there’s some businesses where it might be like, well, yeah, you could do this, but then your customer support costs would be astronomical and you would never be able to provide good support because it would be so laborious or whatever. And then maybe now you can. Do you have any thoughts or guesses on how this change might change any business models or certain verticals over the next five or 10 years? Not to be the, “What is the future of AI?” person, but I find that an interesting thought experiment.

David: Yeah, it is a very interesting thought experiment. Let’s experiment with it together. I’ll give you some of my predictions.

Allen: Let’s do it.

David: So firstly, I think there’s a whole group of people, call them single-employee businesses or two-employee businesses, indie hackers, that kind of thing, who actually were not able to give customer support before. They relied on things like discussion forums or something to augment their products, or just ignored it effectively or didn’t do a good job of it and got by without it, but who could now. Imagine a business that’s a small app on the app store that has a phone number that you can call and you can speak to a great AI agent that actually understands the full breadth of capability of the product, understands the common problems that people have implementing the product or using it, and very rarely has to escalate a little email ticket or something to that founder. So I think that’s a pretty cool future in my opinion, provided that experience is great. And that’s where that measure of resolution is really important. And I think then for really big businesses who have big customer support costs today and who have large teams of agents, and by the way, the biggest businesses we speak to, that attrition problem becomes essentially a training problem because the breadth of information that their agents need to understand can sometimes take longer to understand than they stay on the job. And that’s a place where your agents then become somewhat less capable than you even think they are at answering questions. In other words, an AI might actually be doing a better job of resolving than a human in those cases because it can understand and retrieve more of the information than a human being could ever do in that role. Now, those are only the biggest enterprise companies. I’m not saying that’s a super common thing. But that said, a lot of our interaction that most people find kind of negative with companies is with larger organizations. It is with companies that are massive enterprises that have that problem. So I think essentially my vision of the future is one where the experience of customer support is faster, meaning you get to a resolution sooner in whatever channel you like. If you like calling, great. If you like messaging, great. If you like email, that’s fine too. So it’s faster and it’s radically better. Right now, if you call an airline, you’re going to be on hold for 40 minutes, at least that’s my experience. Maybe if you have great status or something, let me know how you do that, but that’s my experience. And so I’d like to get rid of that. I don’t think that should exist. That’s how I look at it.

Allen: I like that. I mean, certainly I love a future where I’m not having to wait on hold as much and where I can resolve things faster. Even in some of the chat conversations you have now, you spend a lot of your time on, “Let me look into that for you.” And then seven minutes goes by because they’re answering seven calls simultaneously. One of the other things that I’ll add to that, just riffing on it, and some of this is just the way my mind works, maybe it’s a little wishful thinking, but I’ve loved the idea of being able to be a relatively self-serve SaaS company where people just sign up. Something like a Notion or a Slack or something like that, that can provide really great support to users in real time about how to use the product and how to get the most out of it, and how to think about some of the issues they’re seeing. “Is this a bug or am I not able to find the feature?” To be able to provide that level of support where currently you’d have to be a call-for-pricing enterprise-plan person, and it’s like, “Oh, if you’re a call-for-pricing enterprise, and who knows how expensive that is, then you have real-time support that will actually help you with the product.” But I can imagine in five years, or maybe in two years, that a $29 a month per seat or $5 per seat SaaS product that you just come in and sign up for has this, and now the people on your team have this thing. And then when you’re like, “How do we set this up?” Especially anything that has configuration, which is the devil of any SaaS business. If you try to build a business on something that supports other businesses and they come in and they’re like, “Well, we need to hook this up to Crystal Dynamics reporting or something,” or some bizarre random thing. And you’re like, “I don’t know about this.” The cost of trying to support people to do that thing means so often you end up with these businesses that are like, “Yep, we strictly only do this. And if you have these other problems, you have to go figure it out yourself.” You could, in theory, have a much more aggressive target for what great customer support looks like when you have this assistance of LLMs that can basically understand your whole product, theoretically down to the source code, of like, “Well, how does this thing…” It’s like, “Oh, actually, good point. I found the bug that you’ve just pointed out,” or whatever. I’m making up silly situations. But I like the idea of not just making support cheaper and faster, of course, but also the ability to provide much better support at certain price points than products can provide today. I think that would be a cool future.

David: I completely agree. I love the way that you frame that and what it did for me. And I think the theme of what we’re talking about is that it’s like that phrase, the future is here, but it’s unevenly distributed.

Allen: Yes.

David: Great customer support exists, it’s just unevenly distributed. And the great customer support experiences with an airline happen for the most elite members. And I think what we’re scratching at, again, if you use Notion or something, I mean I shouldn’t use that as an example because I don’t know specifically, but something like Notion. And you only pay $29 a month for it, what’s your level of experience? Your customer support experience is probably impersonal, and it’s probably via email and it probably takes days to resolve your issue. And by that point, you’ve probably found a solution on some other product or you found something that you just hack something together or something. So for that product, it’s more valuable if they can help, if they can offer a great resolution that’s personal. They just can’t. It’s too expensive right now. And so I think AI and large language models and products like Ada, the opportunity is to more evenly distribute that great customer support experience for the people that you want to serve.

Allen: All right. VIP, real time customer service coming to every product. Let’s make it happen. Let’s do it. I love it.

David: Yeah. I mean, I’m spending a lot of my waking hours trying to, that’s for sure. I really am. And I’m really motivated by my enemy, which is that long hold time. It really is.

Allen: I’m looking forward to not having to face those in the coming years. Thank you so much, David. This has been a great conversation. I had a lot of fun. Where can people go to learn more about you and your work?

David: Well, first of all, thank you very much, Allen. I had a great time as always talking with you and getting to share some of my thoughts and vision on this amazing space. You can learn more about Ada at ada.cx. That’s our website. We’ve got a lot of different information about our product, but we also have developer documentation there as well and product documentation. And I blog. I’m old school. I have an RSS blog on dhariri.com. I’m also on Twitter, but I prefer the blog so you can email me too. My email’s on my blog.

Allen: Excellent. Thanks for being on the show. It Shipped That Way is brought to you by Steamclock Software. If you are a growing business and your customers need a really nice mobile app, get in touch with Steamclock. That’s it for today. If you can give us feedback, follow us on social media, rate the show, that’s all super helpful, helps other people find out about the show and also gives us a sense of what you’re enjoying, what you like about the show, what you want to see. You can go to itshipped.fm/contact. And until next time, keep shipping.
