Hosts Eamon McErlean and Joe Devon interview Edward Aguilar, CEO & Co-Founder of Echo Labs. They discuss Edward's personal connection that led to the launch of Echo Labs, integrating AI into accessibility, customizing audio description, and more.
OUTLINE:
00:00 Opening Teaser
00:44 Introduction
01:12 Launching Echo Labs
04:40 Personal Connection To Accessibility
07:05 BuellerBot
10:57 About Echo Labs
14:37 Keeping Up With AI
16:43 Accessibility and Generative AI
20:26 Autonomous Agents
24:44 Artificial General Intelligence
29:48 Will Accessibility Jobs Be Affected By AI?
33:12 Working With People With Disabilities
37:59 Accessibility Customization
42:47 Inspiration and Passion For Accessibility
49:01 Luna Audio Description
54:29 Caption Timing (Forced Alignment)
56:56 Wrap Up
--
EPISODE LINKS:
Echo Labs
https://el.ai
BuellerBot
https://github.com/EdwardIPAguilar/BuellerBot
arXiv
https://arxiv.org
Geeks, MOPs, and Sociopaths in Subculture Evolution by David Chapman
https://meaningness.com/geeks-mops-sociopaths
alphaXiv
https://www.alphaxiv.org/explore
--
PODCAST INFO:
Podcast Website:
https://accessibility-and-gen-ai.simplecast.com
Apple Podcasts:
https://apple.co/46eflnv
Spotify:
https://open.spotify.com/show/4eEwo3jUSo3aS7wGhlcxs2
RSS:
https://feeds.simplecast.com/nCrQiw1t
LinkTree:
https://linktr.ee/a11ygenai
- Luna is the world's first AI audio description engine. Let's say you're a school, which is our focus market. You must have audio description on all media you create within the next three years if you're a large school, two years if you're a smaller school. I had a meeting today with, I think, the second-largest public institution in the country. They were about to go drop $10 million on audio description to become compliant. And they're gonna pay a fraction of that. And the timeline is so crazy that we are validating it for them right now by literally just doing half over the next couple days, and we're just gonna blow their socks off. And I think that's what technology should feel like. It should be exciting.
- Welcome to episode four of Accessibility and Gen AI, a podcast that interviews the newsmakers and thought leaders in the world of accessibility and artificial intelligence. I'm Joe Devon, joined by my co-host Eamon McErlean, and today we are interviewing Edward Aguilar, the young talent behind Echo Labs, whose mission is to democratize media accessibility through AI. Edward, welcome to the pod.
- Thank you for having me.
- It's my pleasure. Now, I wanna start by taking you back to that pivotal moment when you decided, I'm gonna start a company, Echo Labs. What went through your mind and your heart as you took the plunge?
- It's a little bit crazy, 'cause I somewhat agree with Jensen Huang's approach to this, where you have to be a little bit insane to start a company. So when we were originally putting together the idea, and by we, I mean myself and my co-founder Sahan Reddy, who today is my CTO, this was about, I wanna say, 13, 14 months back. I'm in my dorm room back in college, and I'm working on some personal research, and I'm like, well, I wanna play around with this captioning model. And so I do a very small local demo and I'm like, hold on. These benchmarks look a little bit weird. And I sent it to him that same night, 'cause we've been friends for years and he's fantastic with this type of stuff. And I'm like, hey, Sahan, can you take a look at this? There's something wrong with these benchmarks. 'Cause the numbers were just completely off from what I expected them to be, especially for such a small training run. And he looks at it and he's like, no, it seems like you're doing the eval correctly. What is this? What model are you using? You know, expecting that I'm gonna say OpenAI or Google or something. I'm like, no, this was built locally. This is something I just finished training. And the more we started looking at the model, the more we started playing around with it and refining it, we realized it was so good that we almost needed to get it out there. It's the type of tech that, you know, if you apply it in the right places, can help quite a few people. And I think we both kind of realized in that moment, this is too good to not pursue. And so the next day, after it hit me that night, I didn't sleep at all, I called him and I'm like, we need to try this. I need you to fly to Illinois. He's in Georgia, and the next day he books a flight to Illinois. We meet in person, and we hatched the plan that night to figure out how we wanted to build out the company. But that's originally how it started. I had no intention of starting the company. It was that our initial research ended up performing so well that it was like, wow. We would have to be a little crazy to not try and do this.
- Wow. And which benchmark was this?
- This was just word error rate. I think it was LibriSpeech or one of the other public academic datasets. Basically, we were working on captioning for just an internal tool that I was trying to build at my school. And as I was putting together this research, you know, there are a few evals you wanna go against. And so I start playing with models, and I'm like, well, maybe we can actually build something interesting. And it ends up performing a lot better, where you basically have this very short demo of what we built in-house in such a small time, with very, very limited resources, compared to what's literally coming out of these major AI labs. And the comparison of those two things is like, how is it even possible that you can build something that's gonna outperform? You just need to change a lot of fundamental assumptions. Honestly, I thought the direction we were changing most of those assumptions was going to break it, but I guess we came out quite lucky that it proved out some pretty interesting tech that I think will actually apply to, you know, the way that people build a lot of other models in the future.
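[Editor's note: for readers curious what the word error rate eval Edward describes looks like in practice, here is a minimal sketch using the open-source whisper and jiwer Python libraries. The audio paths and reference texts are illustrative stand-ins, and this is a generic eval, not Echo Labs' internal benchmark.]

import string
import jiwer
import whisper

# Load the ASR model under test; "base" is a small open-source Whisper model.
model = whisper.load_model("base")

# Ground-truth transcripts, e.g. from the LibriSpeech test set (illustrative).
references = [
    "he hoped there would be stew for dinner",
    "turnips and carrots and bruised potatoes",
]
audio_paths = ["sample1.flac", "sample2.flac"]  # hypothetical local files

# Transcribe, then normalize case and punctuation so formatting differences
# don't count as word errors.
def normalize(text):
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()

hypotheses = [normalize(model.transcribe(p)["text"]) for p in audio_paths]

# WER = (substitutions + deletions + insertions) / reference word count.
print(f"WER: {jiwer.wer(references, hypotheses):.2%}")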
- It's amazing how things can start like that. It really is. Again, Edward, thanks so much for joining us. We know you're an extremely busy man, and we are gonna absolutely dive into Echo Labs and all the phenomenal work and progress that you guys have made. But before we do, I just wanted to take a second to ask you to share as much as you're willing and open to share about your personal journey, and maybe specifically around any experiences you've had directly with individuals with disabilities and what motivation or influence that had on creating Echo Labs.
- Yeah, it's an interesting question, at least for me personally. I grew up with a lot of people in my direct family being hard-of-hearing. And so you see the things that they need to deal with given the limitations of technology today, and the way that those limitations have existed, you know, forever at this point. I mean, the Americans with Disabilities Act passed 34 years ago, and it feels like the technology available to actual people is basically the same. You either pay quite a bit of money to have a human follow you around and, you know, get very accurate transcripts of the world around you, or you have to deal with very, very bad technology, which is affordable, but in most cases fairly useless. And it's a horrible decision that we force people to make. And in a lot of ways, it's crazy to me that, it being 2024, this is still the case, not just for the deaf and hard-of-hearing but for so many other communities where accessibility is essentially treated as an afterthought. And I think one of the exciting opportunities we had with this technology is, you know, all of our investors, all the initial people who knew the company, when you come to Silicon Valley and you're like, hey, we beat these benchmarks, they're all like, fantastic, it's time to go and sell this to Zoom, to Tesla, to all these other companies. And I'm like, I like those companies, I respect what they do, but you know, my background is in computer science and nonprofits. I wanted whatever we were going to build to have a legitimate social impact. And I don't think you need to make that choice between, you know, the revenue of a company and its social impact. And I think especially as a startup, when you need to choose a niche to begin in, for us, beginning in a space where I knew we would have tremendous social impact was very personally rewarding, for the people in my life that I grew up around. But also, I think for just everybody on the team, you need a good reason to wake up in the morning and to keep trying hard. And so it was the type of thing we found a lot of personal motivation in.
- Yeah, it makes such a difference to have that personal connection. It really does. And you obviously always want to monetize, but monetizing for the right reasons at the right time is a good combo.
- And we'll definitely double-click on that social impact and business side as well. But first I wanna ask you a question. There is another piece of technology that you built that I think is pretty much as cool as that model you were talking about, and it's called the BuellerBot. Can you explain what the BuellerBot is?
- So BuellerBot is, well, I'll tell you how it came into existence first, so I can give you some context. I'm in college, I'm a freshman at the time, and I cannot stand the lectures that I'm in. For context, I love my professors to death. I'm still in contact with quite a few of them, but I cannot stand sitting in this class for an hour where, you know, if you've already done the reading in advance, it's like, why are we even here? And it's quite a waste of time, but it's, you know, it's pedagogy, and so everybody's expected to sit there and do it. So I wanted a way out. I had a lot of things I was trying to do at the time, and it was kind of clear at the moment that there were so many interesting open-source projects that existed that if you cobbled them together, you could theoretically build a replacement of yourself for online Zoom meetings, which, you know, post-2020 COVID, have taken over education. So, you know, the freshman version of me found this to be an incredibly exciting idea, one that was worth putting off all of my studying for finals and everything else for, and I sat down, I didn't sleep for like a day and a half. It was rough. And it was basically this weekend project where I took a lot of these open-source technologies, where you have Whisper for some captioning, you have ElevenLabs for some text-to-speech, and then you can use some local queries to control the movement of your computer, and basically BuellerBot was born. It was this kind of entity that would live on your computer, that you could download, that you would train on clips of your own voice, like 30 seconds of audio. It would sound like you, and you could tell it what Zoom calls you had. And it would show up to your Zoom and then it would pretend to be you, camera off, obviously. Phase two would be, you have to find a way to make a video avatar. But it would listen for your name to be called. If it hears your name called during a lecture, let's say a question's asked, attendance, whatever, it would move the cursor on your computer to unmute you. It would say, hey, I'm here, and answer the question as a student. You had little things where, you know, it would try and buy you time if it didn't have an answer yet, while it was generating the answer with GPT in the background, like saying, one second, I dropped my drink. And it would say these things, and then it would answer in your voice, and then it would mute itself. And it could do follow-up questions intelligently. It was actually surprising. I think we were at the moment where a lot of the students knew about AI and what it was capable of, because the incentives are, you can find a way to get outta class or do your homework or whatever, but the general public really didn't. So it was kind of perfect, because I had about three weeks where I just didn't go to school and I just had this running in class. I never heard anything, ever. Nobody ever said anything. I'm actually quite amazed. I don't know if that says more about the technology or the educational system, that you could just send a bot on your behalf. Anyway, I open-sourced it, and I was told right after I dropped out that apparently six or seven other students in other classes ended up getting in trouble because they had been using it in theirs.
And so it's still on the internet, people use it from time to time, but it's cobbled together using completely open-source tech. And, you know, I'm actually surprised that nobody's built a company around this. You could make a lot of money doing this. It's a bit of a silly product idea, but I thought it would be funny. So we open-sourced it.
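[Editor's note: the real project is open source at the BuellerBot link above; what follows is just a compressed sketch of the loop Edward describes, with the audio capture and voice clone stubbed out. The model names, file names, and helper functions are illustrative.]

import pyautogui
import whisper
from openai import OpenAI

STUDENT_NAME = "edward"             # the trigger word to listen for
model = whisper.load_model("base")  # open-source captioning, as in the story
client = OpenAI()

def speak(text):
    """Stub: play `text` through a cloned voice (the real bot used ElevenLabs)."""

def answer_as_student(question):
    """Draft a plausible spoken reply with an LLM running in the background."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user",
                   "content": f"Answer briefly and casually, as a student: {question}"}],
    )
    return resp.choices[0].message.content

while True:
    # Assume something else writes rolling ten-second chunks of meeting audio.
    heard = model.transcribe("latest_chunk.wav")["text"].lower()
    if STUDENT_NAME in heard:
        pyautogui.hotkey("alt", "a")              # Zoom's mute toggle on Windows
        speak("One second, I dropped my drink.")  # buy time while generating
        speak(answer_as_student(heard))
        pyautogui.hotkey("alt", "a")              # mute again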
- And it's Ferris Bueller, I assume.
- Oh yeah, totally. I'm obsessed with that movie.
- [Joe] Save Ferris.
- [Edward] BuellerBot, yes, exactly.
- Within our podcast, Edward, we always wanna share as much as we possibly can with our audience. So if you would, give us an overview of what your products are at Echo Labs, and maybe share a brief demo of the features and functionality.
- Sure, of course. So the way most people think about us is we build AI-native accessibility suites. If you think about, you know, the world of accessibility today and how institutions, governments, enterprises need to handle it, you basically have a lot of vendors who will do some things and very few who can do all of it. And we felt that even for the people who could do all of it, or the ones doing individual aspects, the cost is way too high, it takes a really long time to get these things done, and the intuition needed to actually build these out is really not there. A lot of these products look like they were designed by the DMV in the 1990s, which, you know, when a lot of institutions are already thinking about accessibility as this very, very expensive thing that they're gonna have to dedicate, sometimes, an entire department to, it really should be as easy as possible. And so the entire focus of the company is, how can we do the best we can to make accessibility universal? And for us, the answer to that is to leverage the cutting-edge technology that exists today, build our own if it doesn't exist, and then package it in a dead simple platform where we integrate directly with, you know, your media platforms, with YouTube, all these other places. And you can do captioning, audio description, everything you need to become fully ADA-compliant, immediately. And so the end result of this platform is that not only is it a heck of a lot easier for people to become compliant and accessible, it's honestly quite a bit more affordable. Right now, we're focusing on universities and higher education, and we save them anywhere from 75 to 90% of what they're currently paying today. And the end result is, if you are today an institution that's only five to 10% compliant with what you need to be compliant with, without changing your budget, you can cover everything overnight. And we've actually done this for dozens of institutions now, and we're working with about 200 total since our commercial launch in April. And for them, the entire focus has been, how can we transition to this world of universal accessibility where all media is presented to all students of all backgrounds at the same time? So yeah, this is what we're working on.
- Love it. Love it. And has that model had to change with regards to complexity, with different APIs, and making sure that you're compatible based on what universities utilize themselves?
- Definitely, yeah. There's a surprising amount of complexity behind the scenes, which is my assumption as to why there are actually very few people trying to innovate in this space.
- Yeah.
- But the whole idea on our end is that we want all that complexity to exist so we can basically be implemented into any existing workflow, but we don't want them to feel it. So for example, with the YouTube integration that we have, you just sign in with Google, it will show you all your videos, you select which ones you want, and that's it. It'll automatically import them, caption them, add audio description, and send them back to YouTube. You don't need to do anything. We've had people do, you know, 10,000 videos in two days, with like 10 minutes of their own time. But obviously, the engineering needed to do that is really complicated. I mean, the number one thing that people say is that it's just so stupidly easy, it feels like magic. So we try and abstract as much of that as possible behind the scenes.
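[Editor's note: a hedged sketch of the round trip Edward describes, using Google's public YouTube Data API via the google-api-python-client library. Echo Labs' actual integration is internal; here the captioning step is stubbed, and `creds` is assumed to come from a standard "sign in with Google" OAuth flow.]

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

youtube = build("youtube", "v3", credentials=creds)  # creds from the OAuth sign-in

# 1. List the channel's own videos (a real integration would paginate past 50).
videos = youtube.search().list(
    part="id", forMine=True, type="video", maxResults=50
).execute()

for item in videos["items"]:
    video_id = item["id"]["videoId"]

    # 2. Download the audio, run the captioning model, write an SRT file.
    srt_path = caption(video_id)  # hypothetical helper for the ASR step

    # 3. Push the finished caption track back to YouTube.
    youtube.captions().insert(
        part="snippet",
        body={"snippet": {"videoId": video_id, "language": "en", "name": "English"}},
        media_body=MediaFileUpload(srt_path),
    ).execute()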
- It's incredible, you know, following all of these new activities, all of these new innovations in AI. I personally find the pace breathtaking. I was trying to put together a daily view of the new papers in AI and accessibility on arXiv.org, and then I saw so many that I counted, and it's about 10 AI accessibility research papers per hour. It is just impossible to really follow. So I'm wondering, being as steeped in it as you are, how do you keep pace with what's going on?
- I think with such a big focus on AI recently, you kind of have this idea, there's this fantastic essay, I think it's called Geeks, MOPs, and Sociopaths. And it's this idea that, you know, you have these very, very small industries that are obsessed over by a handful of researchers, people who are the geeks. They live and breathe that space, and they have for years, before it was cool. And then the moment that it becomes interesting, the people who realize they can profit on it try and masquerade as the geeks, because they realize that those are the people who have all the clout in that space. And then you have a tremendous inflow into that industry, where there's just very, very high noise. The signal-to-noise ratio is basically off the walls, where it's very, very difficult to understand what is meaningful in this space and what isn't. So I would say I'm actually surprised that the pace of innovation with accessibility in AI is not faster, but I am surprised that the pace of publishing as if it is happening is just tremendously high. Yeah, I think accessibility specifically needs a much, much higher degree of investment than what exists today. Most papers I've seen are some combination of existing items, you know, here's everything that's happened, or they're demos that don't have any clear path to commercialization, which is cool for the people who will read the paper, but that's about it. One cool thing that's helpful, if you haven't taken a look at it yet, is called alphaXiv. It's run by Stanford, it's at www.alphaxiv.org. And it's basically arXiv, but it allows people to comment on top of it. So they've turned arXiv into a social network, and right now, it's only filled with the geeks, which is why I love it. It's just all researchers, and so you really see what is exciting, and they've organized it into trending and things like that. So it helps filter out a lot of the noise. It's a great platform, and I'm not paid to say that. I just use it every day.
- Well, we'll keep it just between us, just so that, you know, it doesn't get that influx.
- Gen AI and accessibility can be a polarizing discussion. It really can be. And you know, some people are extremely concerned, and understandably so, and some believe it could be, you know, the silver bullet, so to speak. What are your thoughts on that as it relates to accessibility, as it relates to both the concerns and the opportunities that we have moving forward?
- Yeah, it's a great question. I think the longer I spend in this industry, the more I realize that I don't think there will be a silver bullet, whether that's AI or anything else. I think this is only going to be fixed by a lot of lead bullets. The only solution here is going to be a combination of awareness, of investment, of research. AI will absolutely play a role in that. I think, you know, our company and the growth that we've seen is kind of a testament to it: people need to be rethinking this. Our ability to create media has entered the 21st century, but our ability to make that same content available to everybody is stuck in the 20th. Almost everything that we produce today is still done by, you know, a human manually going in and reviewing the entire piece of media; every website update, for legal reasons, has external consultants jump in. Not only is it incredibly cost prohibitive, it takes so long that even for the people who can afford it, there are still these walls of days between the people who get it now and the people who will get it once they make those updates. And I don't think that's a tenable solution in the long term. Whether we decide that we care about, you know, everybody in this country, or it's for legal reasons or whatever the backing is, different solutions will be needed. And I think AI helps solve quite a bit of this in the general broad sense, in this amorphous concept of artificial intelligence. But I also think just taking good technology skills, and speed, and willingness to iterate, and applying them to this industry can do phenomenal things. We've discovered that outside of the machine learning research that we do, just purely building a good technical product is not something that's been done in this industry. You have a handful of companies that have basically been given this walled garden over the last 34 years, after the Department of Justice ruled that, you know, with the Americans with Disabilities Act, everybody must do this and it's gonna have to be at human-level quality, which basically barred all of these startups from competing with them, until we came around. And we're the first ones who have, you know, exceeded human-level quality and met it on other benchmarks, which means that we're kind of the first ones to come in and poke around and realize, how are you operating like this? There's so much that could be done. And it's not even AI, it's very basic, you know, things that you would expect would be going on in an industry this large. The major players in this space today do not operate like Google. They do not operate like Palantir or Anduril or any of those very fast-moving companies. So one aspect is the research. I think the other aspect is, you know, if this is a space filled with Boeings, we wanna operate like a SpaceX. And that just means immense amounts of iteration and speed, because ultimately, that's the only way you're going to build a product that can genuinely live up to this idea of universal accessibility. It is such a tremendously large mandate. There is no way around it except to move at incredibly high speeds.
- You give me so many thoughts. I could go in 10 different tangents, but first going faster than humans. I call that crossing the Rubicon of AI and accessibility where some field of AI is doing things faster or better than a human being. And every time that happens it unlocks a lot of good things for people with disabilities. It also causes some fear among people in the industry about their jobs the next day. But I think that's the AI story in general. But speaking about all that iteration, it feels like I blinked for a second and didn't look at the docs of the major AI companies. And all of a sudden they all have functions in their APIs where you can call a function by speaking to the chatbot and then saying, send an email and it's all structured into these functions, calendar invites, that kind of thing. And all of this feels like they're preparing the terrain for autonomous agents. Would you agree with that? And do you see Echo having a role when it comes to agents? Are you gonna build agents? Are you more focused on foundational models? What are your thoughts on that?
- Yeah, agents are the type of thing where, you know, you add them to your business plan and then it's gonna 10X your valuation. And so you see a ton of companies have entered this space, and they're calling APIs and traditional workflows agents, 'cause they realize that it's this very exciting thing. From what I've been able to see, there's only really a handful of companies that are building genuinely new agent technology, which is necessary, because at this point I've seen every public and a handful of the leading private agent demos, and I can say flat out, nothing works. Nothing today works for the agents that they show you in the demos. Whether that's, I'm gonna talk to my phone and it's gonna order me food without thinking about it, or it's gonna book me a trip to Caracas or whatever. Yeah, it'll send you an email, but nobody's gonna want it to send an email until you clear it. So it'll do a draft, and that's the useful thing for it. But you don't really need an agent for that. This is one of the areas of AI where I think it's actually quite difficult to filter out the noise. But this is not to say that I don't think this will eventually work. Right now, the best benchmark I've seen is about a 56% success rate, and I believe this is just on basic tasks. I forgot the data set, but I think it includes ordering food, emails, booking travel, responding to messages locally, across text message, email, and a few other things. 56% is really not good. If you hired an assistant today that only got 56% of things right, that would not be a good assistant. I very much think that it will be a very different story by, let's say, the end of 2028. I think agents will be in a lot of different places, and I think there will probably be one to two companies that make a tremendous amount of money basically building the infrastructure layer of the internet for agents to easily interface with all of these things. And perhaps that's as simple as better labeling so an agent can use it. If agents are the self-driving cars of the internet, somebody should go out there and label all of the lanes, because it's gonna make it much easier to perform actions. In terms of whether or not Echo Labs will have anything to do with agents, the end goal of the company is we want to make accessibility universal, and a big part of that is automation. A big part of that is, I want to be able to help the one accessibility person who is in charge of a hundred-thousand-student campus, who doesn't have a massive department to help. And I've met at least three people in this situation where they are seriously understaffed. I wanna build technology that can help with most of that, because I speak to so many experts in this space every day, and something like 90% of their day is spent moving files around or ordering files. These are things that we've been able to do proactively with good technology for the last decade. It doesn't necessarily require agents; it's, you know, scraping, reactive checking. There's a lot you can do here to take a lot of that weight off of these accessibility trailblazers and let them focus where they should be, which is on students.
And I think this is generally what will happen to the accessibility industry over the next few years: you're gonna see this fundamental shift from humans spending their days behind screens to actually being with these students one-on-one, figuring out the human aspects of providing accessibility, and the overall quality will go much, much higher while all the menial tasks won't take up their day anymore. And you might be able to call that agents depending on how we deploy it, but the end outcome will be, we should be able to automate this for most people.
- And tying directly into that, do you see any short-term path, or what is the roadmap and timeline for AGI?
- I've spoken to the research directors at most of the frontier AI labs, and they would know much better than I would, and they don't have good timelines. The average is probably 2028, is what people expect. But I don't think anybody has internalized this, because if you actually internalize what that means and what they're building toward, if it's a true idea of AGI, this idea of a machine that will be, you know, better than most humans at doing most economically valuable work, something along those lines is the general definition of AGI at these labs, the impact this will have is so unclear and so widespread that a lot of them operate with this suspension of disbelief where they're not actually internalizing the work that they're doing. So I have no idea when it's going to come. It's unclear that we have any current technology that will reach that. I think we're gonna build fantastic specialized pieces of technology. Absolutely. I think that's a given at this point. We will get much, much better across all these specific benchmarks. And at a minimum, the worst outcome is that we have fantastic assistants for almost any specialized task. And then I think the next movement will probably be embodying those in robots in the real world. Great technology has been built there. But it's unclear that we have a straight-shot path toward building a generalized intelligence. Look at the most frontier lab, which is probably Safe Superintelligence with Ilya Sutskever, formerly of OpenAI. He's hands down perhaps one of the best in this industry from a research perspective, and he just founded his own company in this space. And most of the budget of the billion-dollar seed round they just raised is gonna go toward research, which tells the public that they don't actually have a path forward yet. They have maybe ideas of how it'll work, but nobody has a straight shot to genuine AGI in the next few years. What people are betting on when they say 2028, 2029, is that we will figure out a way to do it. And in some respects, I believe them. There's never been anything that humanity has wanted enough that it didn't get. So with the degree of investment being put into this, it's highly likely, though it's unclear what it will actually mean.
- That's a very smart answer. It really is. I think it's a realistic answer as well.
- Well, yeah, I don't like saying when it's going to be. The easiest thing to say is, I run an AI lab, AI is coming tomorrow.
- Next year.
- Yeah, next year. It's always next year. Every quarter, it's next quarter. Yeah, there's no clear path forward right now. That's not to say it won't happen, and people are doing some really, really crazy stuff behind the scenes, I think. So I mean, the one thing we're guaranteed is that it's gonna be an interesting next few years.
- That's for sure. And it's kind of funny that we're talking about this today because today as we're shooting this, Geoffrey Hinton, or is it Sir Geoffrey Hinton now? I think I saw that.
- [Edward] Yeah.
- He just won the Nobel Prize in physics for his AI work, which was a bit of a head-scratcher. He's very worried about AGI, but it's funny, because I think that ChatGPT 3.5 was the Turing test moment, where these chatbots kind of reached that Turing test point. And I had an interesting exchange recently on LinkedIn, because I've been seeing more and more comments where you look at them and you're like, was that a human? And I bet you it does fool most people who don't understand how this technology works. So there was one person that was just a tiny bit off, so I did a little research into his post and I was like, yeah, nah, I think it's AI. So I wrote a blog post about this. I did sort of a reverse Turing test on him. I don't know what to call this yet, but essentially, it's when an AI is too good to be human. I was like, how am I gonna trip this AI bot up? And I'm like, I know, they know everything. So I started to talk about hockey, and then from there, I switched to a deep question in physics and asked it about that, and then I went into a deep question on, I think it was biology or archeology. And because there is so much deep expertise by the AI bot, it was too perfect to be human. So it's sort of like the reverse Turing test. And I think that that is the direction we're going. We're gonna see more and more of this, and we alluded to it earlier, that we're getting these crossing-the-Rubicon moments where AI does captions better than humans, alt text better. And what does that mean for the accessibility industry? What I'm trying to say is, are we gonna see a lot of jobs being lost in the accessibility industry specifically, or is it just gonna change what the job is, or is it gonna create more jobs? Because if you're providing captions for so many more people, you're opening a window for a lot more people to get involved in anything they dream of. What are your thoughts?
- The way that I look at it right now is, we have very minimal impact on jobs, because there's so much pent-up latent demand in this industry. We just commissioned the world's largest accessibility study to date, and we're gonna be publicly releasing the results here in about two weeks. But one of the headline figures that we're already seeing, and by the way, this is self-reported data with a sample size of about 25,000, so it's tremendous, is that institutions on average across the country are saying that about 92% of their content today is not legally compliant with the laws they know about. So the ADA, but there's also state-level and local-level stuff.
- And that's, sorry, Edward, that's domestic or?
- This is in the US.
- [Eamon] You're just domestic?
- So what we're saying is, the average institution has about 92% of their content not compliant. That's either no captions, no audio description, often it's a combination of both. And you know, this is the average. So what it means is that you have a handful of institutions that are maybe 30 to 40% compliant, and most are nowhere near that. And it's taken us 34 years since the ADA passed to get to that 8% compliance that we have today. And the Department of Justice earlier this year said, you now have two years to cover the remaining 92%. In most of the private conversations I'm having with CIOs today, or any, you know, leader at these institutions, they're calling it the accessibility crisis, where they have no clear path to fixing these items. And then they hear about us and they're like, okay, this is incredible. We can keep our budget the same, cover all of it, or maybe create a new budget if they didn't even have one before, which is often the case. And then they have a clear-cut solution for how to do this, and the time, obviously, with us automating most of it, makes it possible. So the end result is that you're gonna be able to cover all of these students. But you know, getting to your question, that 92% of demand is something that would've never been served, because of how expensive this is. It's something that just wasn't available. And so when you offer a product that can take care of all of that, the result is that you have so much more market. And ultimately, our end focus, our end dedication, is to the students. The way that we measure our success at the end of the day is, how many students do we give access to their media, to their education, frankly, versus how many we had yesterday? That's our primary growth metric. A lot of people will index on ARR or number of customers, and that's nice for investors and financial reports. But ultimately, I think we will be successful if we focus on how many students we get access to their education tomorrow versus today. That's our entire focus as a company. And I think the only way to maximize that number as an industry, whether accessibility or education, if we wanna say that we believe in universal design for learning, is to do it this way, to automate a big chunk of it. Otherwise, you're gonna have what you had for the last 34 years, where so many students will just pay tuition for an education they do not have equal access to, which is absurd, especially with how expensive tuition's getting.
- What does your interaction look like with individuals with disabilities, both from a student perspective and, do you have employees with different disabilities? Just, what does that engagement look like, both internally and externally?
- Yeah, absolutely. I don't think you can build a good accessibility company, or at least ethically call yourself a good accessibility company, if you don't invest heavily in this and you don't surround yourself with those people. Internally, we are, I mean, incredibly well-represented. I think it's one-to-one, and among the remainder, we have people here who were court stenographers, we have people who were special education teachers, we have people whose direct family, their daughters, their family members, are hard-of-hearing or deaf. And I think for them, it's a big reason why they come and work here. They realize that we can have this phenomenal impact on people that just isn't possible anywhere else. It almost feels like nonprofit work, I think, to a lot of people, which is, you know, why I love it personally, 'cause that's where most of my background comes from. But I also think that outside of having that perspective internally, you also have to have it externally, just because of internal biases. So anytime we release a product, we always go out and partner with whatever the major nonprofits or institutions are in the space, and we're like, we want you to test this to kingdom come. Before this ever sees a student or the light of day, I want you to validate it and be willing to put your name publicly on this product before it ever goes out. So before we did captioning, we partnered with some of the deaf and hard-of-hearing nonprofits, the institutions in that space. And the end result of that is that we had a vetted platform that we knew had been used en masse by the people it was built for. Which I am astounded is not default practice in this industry.
- Should be.
- Where it's like, you know, if you build cars, you should have people who drive cars get inside of them and try them out. And this is not standard. So we try and put it through as much rigorous testing as possible. And, I mean, it would be incredibly hypocritical if we didn't do this. We update our VPAT literally every month at this point because of how many new features we ship. And technically, with a VPAT, I think you have this very weird timeline, like six to 12 months or something, where you don't need to update it even though your product can radically change. You know, I think we're generally way too lax on some of the requirements we have, because people have been so far behind that nobody wanted to say the actually moral thing, which is that everybody deserved this 34 years ago, if not way before, before all the digitization happened. And this is the standard that we should be meeting. So I think it's much easier politically to tell people, just, you know, get the accommodations done, just do this, and then once they get there, they move to the next and the next step. And I think, you know, as a company, we have an interesting opportunity to try and show how we should operate. So we're also constantly asking for ideas on how we can do this better. For example, we recently started doing very, very advanced screen reader support on our platform, which I don't even think there's any sort of certification for. And we're working with some outside institutions to see if we can create a product certification specifically for advanced screen reader usage, where there's some coverage under VPAT for, like, amateur-level screen reader usage, but it leaves so many people out. And so I think you have a unique opportunity where if you have the funds of a company to invest into this type of thing, you probably should.
- Again, this brings up so many thoughts in my mind, particularly VPATs. It's actually something I've been working on, modernizing the VPAT, because it's not agile. It was built in, you know, a whole different time, when everybody was doing the waterfall methodology, and we still have it today. It's kind of ridiculous that, you know, we have two-week sprints and then six-month, one-year, sometimes two-year VPAT updates. So I'm glad to hear that you're innovating from this perspective as well. You know, I try to put myself in other people's shoes, especially when I'm doing something like an interview for a podcast, and I was imagining what it's like from your perspective: you're getting a video, you're doing these captions, and you know, you have to deal with these hallucinations. And that brought to my mind, wouldn't it be cool to personalize all of this? So that when you are getting captions, and we'll get to audio description as well, some people prefer it to be more descriptive. Some people like to know more about the music that's playing, some people less. And then it hit me, wouldn't it be cool if you did all these captions for, let's say, a series, and then the person said, you know what? I really like this series. Can you keep going and, you know, create this story for me for next season and keep it going? So I'm curious if you see that once you've reached the point where AI is doing all of this, transcription, closed captions, the whole deal, that you go to personalization, and maybe on-demand captions, where you start watching and think, I don't want to see all that music stuff. Can you personalize it, and stream this to me, and customize it? Do you think we're gonna see that day with Echo Labs?
- I haven't considered this before. It's a really interesting idea, personalization of all accessibility outputs. I think with captioning specifically, our guidelines are, we're actually rated from a legal perspective on our accuracy to the text, like how true we are to what people are saying. So we can never change the text itself, but we can make updates to things like the atmospherics you referred to, you know, the sound, the background, the music. This is interesting. It actually wouldn't be that difficult to personalize it. I know that with Luna, our audio description engine, we're actually launching the ability for people to personalize the degree of detail, because you have very, very heavily detailed items if you have, like, a biology lecture, but you don't necessarily need that for a movie clip. And it intelligently handles that today. But in the case that an individual has a personal preference, we let them change the voice of the audio description, and we let them change the degree of description, whether it's very, very detailed or very short. And these are actually very easy updates to make. I mean, you're basically passing your updates into whatever generative LLM you want, and then you pay them a fraction of a cent, and now you have the personalized output. It wouldn't be that difficult to roll this out. I think, personally, most of the features we wanna focus on are going to be about how we can make this material more useful to people in the long term. Let's say you focus on education specifically. You have these captions, you have this video description. How can you create study guides which aren't just based on the transcript? Let's say you've got a professor on a whiteboard, you know, drawing up these economics diagrams. How can you grab that off the whiteboard, put it onto that document, and then create a study guide, or these teaching materials that would take a TA, like, a day to do on their own? I think that's a very clear-cut path toward driving some very serious value for people in terms of automated material generation. And there are other points of value as well, whether that's generating the next lecture based on the existing lecture. I think the professors might get a little mad at us if we're like, don't worry, you don't need to come in tomorrow, we're gonna create, you know, your next 40 lectures. Or maybe they would appreciate it, and we give them the transcript and they go read it out. But no, it's an interesting future. I think, you know, the cost of intelligence is so low, and it's just gonna get lower, that there are so many things that we just wouldn't have even thought about doing before that today, or in the near future, will be everywhere. I didn't take many classes when I went to college, but one of the ones that stood out to me in econ was this idea of what you can do with oil based on the price of it, right? So when the cost of oil is, like, a hundred dollars per barrel or something, people are only going to use it for, you know, machinery that will have very high-value outputs, right? You're gonna use it for your cars, for whatever economically valuable ends that oil can be put toward. But when the cost of oil decreases enough, that's when people start making rubber ducks, you know, from the plastic. And that's so absurd.
It's so over the top that if you lived in a world of only, you know, $500-a-barrel oil, you would never even conceptualize doing it. Well, now, the cost of intelligence has dropped dramatically. And I think it's just gonna be this super interesting renaissance to think about, what does that allow from a UX perspective? What does that allow, you know, entertainment to be, created in real time, on the fly, infinitely many times? I don't think anybody has a coherent vision of how it will impact the future, because it's such a big update. With that technology, you could theoretically generate a million versions of a Netflix episode and then run through everybody's watch history, or an individual's watch history, to try and pick out from those million generations which one you think will be the most effective version of it. So you can generate engaging versions of entertainment from scratch. And then I don't know where that leads us ultimately, it's kind of spooky, but it's so interesting what you could do with it. Yeah, I don't know. Again, it's just gonna be a very, very weird next few years.
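[Editor's note: a minimal sketch of the personalization pass Edward describes, where the spoken text stays verbatim for legal accuracy and only bracketed atmospherics like "[melancholic piano music]" get rewritten to taste. The model name and prompt are illustrative, not Echo Labs' pipeline.]

from openai import OpenAI

client = OpenAI()

def personalize_atmospherics(caption_line, preference):
    # Spoken words must stay verbatim, so pass anything unbracketed through.
    if not (caption_line.startswith("[") and caption_line.endswith("]")):
        return caption_line
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any generative LLM works here
        messages=[{
            "role": "user",
            "content": f"Rewrite this caption atmospheric in a '{preference}' style, "
                       f"keeping the brackets and staying under six words: {caption_line}",
        }],
    )
    return resp.choices[0].message.content.strip()

# e.g. personalize_atmospherics("[melancholic piano music plays]", "minimal")
# might return "[soft piano]"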
- Edward, I could listen to you all day. I really could, but we wanna be really cognizant of your time. Kind of an A/B question I wanted to make sure to get in: A, I'm sure there have been multiple inspirational moments through your career up until this point, so what has that been like, what has really made you proud of the work that you're doing? And B, with that exponential, incredible growth that Echo Labs has seen over the past two, three years, how have you been able to maintain that startup mentality, and that hunger, and that culture as well? I know that's two separate, pretty heavy questions, but I wanna make sure to try and get them in.
- I think, at least on the second question, it's true that we've been able to achieve quite a bit over the last year. From an absolute perspective, I think we've grown, actually, an order of magnitude faster than what our most optimistic investor predictions were for our first year in business. But our goal is not a relative improvement or growth. The end goal is a very simple mandate: we wanna make accessibility universal. That is the end goal of the company. That is what we will live and die by. So if we make a hundred million dollars next year but we don't do that, then nobody in this company will feel any relief or stop. You need to have that type of goal. I think all great companies have a very clear end goal, this is what you are fighting toward, and sometimes it's just an abstract concept. But the point being that you will never be able to reach it. You must always keep pushing toward it. I think perhaps one of the best examples of this is SpaceX's, just because it's so concrete: we wanna make humans a multi-planetary species, we want to go to Mars. And they will, they will somehow go to Mars and do all these crazy things. And, you know, it's not an official company principle, but I had somebody come into my office yesterday who had grown his startup from nothing to, I think it was 150 million ARR, in like two and a half years, this crazy growth. And he told me that he and his co-founder had never celebrated, not once, the entire time. And everybody around them thought they were crazy. And the reason is 'cause, between him and his co-founder, they had this thing of, what's next? What are we actually doing here? It's not time to celebrate. Okay, we hit this milestone, what's the next thing we need to do? And I think you see that as a common thread in people who do these very, very big things. They seem very big, but if you ask them what they're actually working toward, it's often so outside of reality in its ability to ever be achieved that they will always push themselves to just keep running toward it. You have a never-ending hamster wheel, in a good way, like-
- In a good way, because I think money is not your key goal. Money's not your core goal.
- [Edward] No.
- And having a mindset to achieve something else that can be much longer-lasting and much more fruitful in the long run.
- No, I think most founders treat their companies like a game of Skee-Ball. The goal is to just keep throwing the ball, and the idea is, you want to hit just that perfect spot, right in the middle, get all that money, and then you take your tickets and you go. I like playing it like pinball. The goal of pinball is to keep playing pinball. You just want to keep playing again and again and again, keep putting quarters in. The reward for being good at pinball is you get to keep playing. And I think that's the best way to do it, because if you're thinking about the money, it means that you're thinking about what you want to do with that money. And it probably means you're living some sort of deferred life track, where what you're doing, you know, eight hours of your day at minimum, is preparing to go do this other thing. And it's really unfortunate, and it's one of the biggest reasons why I dropped out to start this. I realized I was telling myself, oh, I need to do this for four years, go to university, then go get a graduate degree for another four years, if I'm lucky we'll do it in six, and then I can go do what I want to do. There's no reason to do that. In most cases, there's no reason to do that. I mean, I'm not a doctor, so I'm not working in a legally prohibited area. So the metric is, can you do it? Can your results actually function? And if that's the case, as it is in most parts of life, then you just wanna figure out where you wanna keep playing pinball, I guess. And so that's the way that I look at the company.
- It's pretty cool. It really is, man.
- It's very, what is that founder mode? What was it called, founder mode?
- Oh, really? Yeah, everywhere. I have a friend who put on a hat and I don't know, people are-
- And made a million dollars, right? Selling the hat.
- Yeah. But come on, it must make you feel good at the same time. Like, you know, meeting the students, seeing the daily impact that what you have created, the positive impact that can have on a daily basis with students, millions of students. Like that's pretty freaking cool, it is, you know.
- Yeah, I mean, I spend 15% of my week at this point just talking with the end students or with the customers, and it's like, you need to have a reason to wake up in the morning and put in a ton of effort all the time. If you don't have that, you're not gonna get very far, especially if what you're doing is very, you know, stressful or whatever it might be. So yeah, having that is a very, very clear motivation, I think, to everybody on the team, where it's like, I will do all of this, and not only will I make a good amount of money doing it, but I know that I will solve this for students. And whether or not the company is successful, it will not go back to what was before. I mean, we're playing against, you know, these titans of industry that I respect in a lot of ways, but that have, you know, one to two orders of magnitude more employees than we do, and the way that they do business will be forever changed regardless of whether or not we exist two years from now. I don't think this industry will ever slide back after the work that we've put out here. I don't think that's gonna be possible. And so I think that's also a very motivating factor as well.
- Could you go into a bit more detail on Luna, your audio description project? Because I think this is one of the fields where there's not nearly enough audio description, and it's almost impossible to scale unless there's an automated way to do it. So I'm pretty excited about it. And also, please go into the differentiator. What is it about your approach that makes you a leader in beating these benchmarks? I'd love to understand that.
- Yeah. Okay. So Luna, Luna's pretty cool. I don't know if you've seen a demo yet, but to explain it at a high level, Luna is the world's first AI audio description engine. Nobody has ever built an audio description engine before, and it is fantastic. It beat our version of the Turing test, where we put it in front of a thousand students, and half of what they heard was Luna, half was, you know, $18-a-minute human expert audio description. It's crazy that it costs that much. And 87% of students preferred Luna; they thought ours was the human version. So we passed our own version of the Turing test there, which was incredibly exciting. I mean, our goal was to get 50%. We wanted to be on par, which would show that people couldn't tell the difference. But if you get 87%, they know the difference and they prefer it. Which was, I mean, that was so elating, I think, to the entire research team. It saved us months of R&D; it was ready to go to market. And the exciting thing about Luna is, let's say you're a school, which is our focus market. You must have audio description on all media you create within the next three years if you're a large school, two years if you're a smaller school; the Title II DOJ regulations put that into effect. The only way to do that today is to spend between $15 to $18 per minute and to wait five days per video, best case scenario. If you're uploading a lot of videos, you're gonna wait forever and you'll eventually get it back. You will be dramatically poorer than you were before. And the quality of those videos is not very good at all. They will have humans go in and write the items, and then they'll have a very, very robotic voice do the description itself. And the end result is something that nobody wants to listen to, regardless of whether or not they need it. We've had a lot of people describe it to us as nails on a chalkboard, like they would rather have not had the audio description despite it being an accommodation request. That's how low the bar was. So the exciting thing about Luna is that it sounds incredibly human. It's beautiful in the way that it slides into the video itself. Let's say you've got a video with background sound, say a symphony, so music in the background, and the scene changes, so it's time to describe it. The video will pause, but we have intelligently found a way to extend that soundtrack behind the scenes while we do the description. And so it feels like it's actually a part of the video, like it was done in post-production when the original piece of media was created, which is how it should all feel. It should never feel like an overlay, whether it's for a website or for a piece of media. It should feel coherent. And the great thing about all this is, not only is the end quality preferred, but the cost starts at $4.49 a minute. So it is literally a fourth or a fifth of what people are paying. And some people get it even cheaper than that through Internet2; we have a partnership with them, and it comes bundled with captions for that price, so it's even cheaper. And it generates in less than a minute. So you can upload a 10-hour-long video, and in 60-something seconds you'll have this end piece that's ready to go, on par with a human subject matter expert. And the result is that these schools that we work with are finally able to, first off, breathe a sigh of relief.
Like, I had a meeting today with, I think, the second-largest public institution in the country. They were about to go drop $10 million on audio description to become compliant. And they're gonna pay a fraction of that. And the timeline is so crazy that we are validating it for them right now by literally just doing half over the next couple days, and we're just gonna blow their socks off. And I think that's what technology should feel like. It should be exciting. So I think Luna's gonna be incredible. For education, I think it's gonna change the way that we think about accessibility dramatically. And I think once it's eventually applied to other industries, it will mean that you have this absurd degree of accessibility for people of all backgrounds, where, you know, every movie should come with this, every show should come with it. If you can generate it that quickly, you can have a live stream of it happening in real time, which is unheard of for AD; there is no human that can do AD in real time. I think there are gonna be a lot of fantastic applications of it. So, right now, we're starting in higher education, giving it to the people who need it the most. And over the next two months, I think we're rolling it out to about 50, 60 institutions that have signed up for beta testing, and by the end of the year, we're gonna have about 150. So I mean, it's gonna be great. It's gonna be everywhere. This might get into the weeds a little bit, but we put so much really fantastic, human-labeled training data into this thing that it performs on all sorts of content, even the most technical. We literally have people running quantum physics lectures through our captioning engine.
- Oh my God.
- Like, yeah, no, it's very crazy. And it's just fantastic. Sometimes it catches stuff that even the human doesn't think is accurate, and then we have to go through and be like, no, if you really turn it up, that's what they said. It's kind of spooky sometimes, but yeah, no, I think it'll be great. Let's definitely, let's do it.
- Synchronizing the captions to the sound. One big pet peeve I have is, I'm watching a comedian, you know, give a talk, I turn on the captions, and the punchline always shows up in the captions before it is spoken. Have you done any work when it comes to synchronization? How hard is that, and is that something you've been thinking about?
- This was actually one of the easiest things for us to solve. It took us about a week to build our own. It's called forced alignment, I think, is the research term for it. And there exist open-source forced aligners. They were academic projects, and they got like 80% of the way there, and the ones they got wrong were off by maybe 30 seconds, tops. So most of the industry just decided, hey, we'll use this, and then, you know, we'll be fine and we'll save some time there. They decided it was good enough. So there's very little actual research on how to improve it. And when our team got on it, it took us, I think, not even seven days. And that's not to say we have a fantastic research team, though we do. It's because there's been so little investment. This is a big part of my frustration with the accessibility technology that's available today: you can tell that the reason that's even an issue is because people were not willing to invest in it and to build new technology for it. Because if they did, they would've found very quickly that it's actually a very tractable engineering problem. And so it's quite unfortunate, but the good news is that the end result is, you know, you have a version now that does work, that will not spoil comedy for you before it happens. And the hope is that in the next couple years, this will be the default in any environment, whether, you know, you're watching this guy on YouTube, or on Netflix, or in a lecture because for some reason you have a very, very fun class and you get to watch comedians. That's the end goal for that. But I think there's a handful of items like that where it's crazy to me that it's not already standard. I think the world is fundamentally rate-limited by the number of people who care enough to go do a thing. It's not money, it's not intelligence, it's not ability. It's that not enough people care enough to just make it their daily in-and-out grind. And I'm glad that we get to build a company where this is the only thing we do: AI accessibility.
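[Editor's note: forced alignment takes audio plus a known transcript and returns start and end times for every word, which is exactly what keeps a caption from landing before the punchline. Here is a minimal sketch using torchaudio's open-source CTC aligner (torchaudio 2.1+), not Echo Labs' in-house aligner; the file name and transcript are illustrative, and the clip is assumed to be mono.]

import torch
import torchaudio

bundle = torchaudio.pipelines.MMS_FA  # pretrained CTC forced-alignment model
model = bundle.get_model()
tokenizer = bundle.get_tokenizer()
aligner = bundle.get_aligner()

waveform, sr = torchaudio.load("standup_clip.wav")  # hypothetical mono clip
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

transcript = "why did the chicken cross the road".split()

with torch.inference_mode():
    emission, _ = model(waveform)  # per-frame character probabilities
    word_spans = aligner(emission[0], tokenizer(transcript))

# Convert frame indices to seconds and emit a timestamp per word.
seconds_per_frame = waveform.size(1) / emission.size(1) / bundle.sample_rate
for word, spans in zip(transcript, word_spans):
    start = spans[0].start * seconds_per_frame
    end = spans[-1].end * seconds_per_frame
    print(f"{word}: {start:.2f}s - {end:.2f}s")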
- Please put an API on this because I would love to build something that takes your API, takes the comedy that I listen to and then I'll just superimpose those freaking captions onto the comedy 'cause it'll be so much more fun. So thank you so much for joining. This was really a blast. I'm really impressed with the work that you've done and I'm sure we're gonna have you back again for updates. We wanna hear what else you're doing.
- Thank you for having me, I had a lot of fun.