Accessibility and Gen AI Podcast

Josh Miller - Co-CEO & Co-founder, 3Play Media

Episode Summary

Hosts Eamon McErlean and Joe Devon interview Josh Miller, Co-CEO & Co-founder of 3Play Media, about digital media accessibility and the use of AI to enhance captioning, subtitling, and audio description. They discuss how AI improves efficiency and accuracy in transcription and translation, while emphasizing the continued necessity of human involvement for quality and nuanced content.

Episode Notes

OUTLINE:
00:00 Opening Teaser
00:44 Introduction
01:24 What's More Challenging... Coaching Your Employees or Your Son's Little League Team?
03:05 How Was 3Play Media Founded?
09:35 How Foot Pedals Are Used For Captioning/Transcription
10:37 Tracking Metrics of Time, Quality, and Accuracy
13:46 What's The Difference Between Captions vs Subtitles?
18:48 Impact of Accessibility and AI
25:03 Importance of Human Involvement with AI
27:13 Current State of Automatic Speech Recognition (ASR)
32:27 Use of AI Tools Internally At 3Play Media
34:09 How 3Play Media Maintains Its Culture As The Company Expands
36:30 The Different Roles of Each Co-CEO (Josh Miller, Chris Antunes)
39:53 European Accessibility Act and How 3Play Media is Addressing It
44:12 How Gen AI Will Affect Globalization of Accessibility
47:59 How 3Play Media Has Been Recognized for Their Work in Accessibility (Audio Description)
51:44 The Benefits of Audio Description
52:54 Wrap Up

--

EPISODE LINKS:

3Play Media
https://www.3playmedia.com

Automatic Speech Recognition (ASR) Report
https://www.3playmedia.com/blog/annual-state-of-asr-study/

European Accessibility Act
https://commission.europa.eu/strategy-and-policy/policies/justice-and-fundamental-rights/disability/union-equality-strategy-rights-persons-disabilities-2021-2030/european-accessibility-act_en

3Play Media's ACCESS 2025 Event
https://go.3playmedia.com/en/access/2025-save-the-date

3Play Media's Linktree
https://linktr.ee/3playmedia

Josh Miller on LinkedIn
https://www.linkedin.com/in/jgmiller3/

Episode Transcription

 - I did an interview with Heather Dowdy from Netflix, and she shared that they're starting to see audio description get used a little bit more than usual with certain series, "Bridgerton" in particular, because it adds a little spice to the experience for people who are watching it. And I thought that was really great that, you know, here we're finally finding reasons for people to use audio description, kind of the way we can say that lots of people use captions 'cause it enhances the experience. That was not exactly the way I thought it was gonna come out, but it was a fun one to hear about.

 

- Welcome to episode 10 of "Accessibility and Gen AI," the podcast where we talk to the people shaping the world of accessibility and artificial intelligence. I'm Joe Devon, joined by my co-host Eamon McErlean, and today we're speaking with Josh Miller, the co-CEO of 3Play Media, a company that's at the heart of making digital media accessible with the use of AI. Josh, welcome to the pod.

 

- Great. I appreciate you having me on. This will be fun.

 

- Yeah, my pleasure. We've done lots of panels together, and you were definitely one of the people I was waiting to bring on the pod. And with that, let's start with baseball. You played Division I baseball, and now you coach your son's Little League. What's harder: getting really good at baseball and coaching kids at Little League, or running a company and doing the coaching that you have to do with your employees?

 

- Oh, this is a fun one and I have to be careful here. So I think that the benefit of the company is that, to some degree, we are making very conscious decisions about hiring and who we're bringing on to be part of the team. And they're all in, right? That is what they're devoted to full-time, and we're all kind of marching towards the same goal. I think the challenge with Little League, as much as I enjoy it, is that you've got a whole lot of different priorities there. You've got some kids who are really, really serious about baseball and you've got some kids who are there as an activity, and you're trying to create this balance of fun and instruction for all of 'em. And that's a hard thing to do when you only have a little bit of time with them. So it's fun, it's rewarding. I hope I can help make it fun for the kids, but it's very different and probably the more challenging one.

 

- How old is your son?

 

- So he's 12 and he's on the more competitive side, taking it seriously. And we're now into summer baseball. And so for him, it's a lot of fun, and he likes to compete, but that's not gonna be the case for every kid on the team necessarily.

 

- Yeah. But it's enjoyable. It is enjoyable to see them grow.

 

- Yes.

 

- Well, as Joe mentioned, listen, thanks for your time. We know how busy you are. Do you mind if we dive in a little bit into how 3Play Media was founded, even tying back into the MIT project? Was it OpenCourseWare?

 

- Yeah. Yeah. So I like to say that we kind of had a customer before a product, which is an unusual place to be and a very fortunate place to be. So there were four of us at MIT Sloan together who were working on a couple of projects and ended up stumbling on this one, where one of our co-founders had a good relationship with some people at MIT OpenCourseWare, and MIT OpenCourseWare was really the first group putting lectures up online for free. And these were MIT lectures. So this is really impressive content, very rich, and a pretty amazing opportunity for people who wanted to consume university-level lectures for free. So what happened was that one of their funding sources, one of the foundations, basically said to them one day, we're gonna continue supporting you as long as you make the content accessible. And their reaction was, what does that mean? And so they went on a bit of a learning expedition, where they figured out, okay, we need to start captioning this library of content. So they went out to different providers in the market, and at the time it was 2007, so YouTube was still very new. All of the captioning in the world was done by companies focused on the media industry, 'cause that's really the only content that needed captions, or at least, you know, certainly the only content that was being regulated for captions. And that was very quickly gonna be impossible, because it was gonna be half their annual budget. So there was just no way they could make that work. The prices then, I wish we could charge the prices that they were finding back then, but that is not the case anymore. But honestly, it was absurd. The prices were very, very high and it didn't make any sense, and it felt like an opportunity to disrupt. So we started to investigate that and to understand what went into captioning. And we started working with Jim Glass, who ran the spoken language group at MIT's CSAIL, the Computer Science and Artificial Intelligence Laboratory, looking at all the different iterations of how you could use speech recognition in different parts of the process. So whether it be starting with a human-generated transcript and aligning it, or starting with a speech-recognition-generated transcript and correcting it. We even met with Larry Goldberg, who was at the time running the access group at WGBH, to understand how captions were actually being created. And we got to witness things like people using foot pedals and all the kind of traditional technology that was used for creating captions. We were pretty set on this idea that if we were gonna try to solve this, it had to be done with software, it had to leverage speech technology, and it had to really be something we could scale. And so we ended up designing some tools to edit speech recognition output very efficiently, then QC it and basically turn it into captions. And we went from what we thought was about 10 times real time down to what was close to one or two times. So what I mean by that, and this is a measure we think about really carefully, is how much human time does it take to caption, like, a one-hour video, right? So if it takes 10 human hours to caption a one-hour video, that's 10 times real time. And so that's what it was at the time. It was 10 times real time, and that's where the cost is. So how could we make a human more efficient? And so first it was, okay, now we can caption one video way faster.
How do we do that 10 times, 100 times, 1,000 times, over and over and over again, and also measure the quality and make sure that we weren't sacrificing anything?
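
To make the arithmetic concrete, here is a minimal sketch of the times-real-time metric described above. The function name and figures are illustrative, taken from the numbers in the conversation rather than from 3Play's actual tooling.

```python
# Minimal sketch of the "times real time" metric: human working hours
# divided by the media's runtime. The figures mirror the examples in the
# conversation; they are illustrative, not real 3Play numbers.

def times_real_time(human_hours: float, runtime_hours: float) -> float:
    """How many hours of human work each hour of media requires."""
    return human_hours / runtime_hours

# Typing captions from scratch: 10 human hours per one-hour video -> 10x.
print(times_real_time(human_hours=10.0, runtime_hours=1.0))  # 10.0

# Correcting ASR output with purpose-built tooling -> closer to 1-2x.
print(times_real_time(human_hours=1.5, runtime_hours=1.0))   # 1.5
```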

 

- And how long did that take, like that process?

 

- So that process took a number of months, during which we were really developing that first software interface to enable a human to do the correction. It was a combination of testing with the OpenCourseWare content, but then also finding other customers, and we basically said, okay, we think there's a market here, you know, this education space in particular, where accommodations need to be taken seriously as schools start putting more and more video up online. We need to be there for them and we need to actually help solve this problem, because this is a problem that needs to be solved. And so we actually got really excited about the idea that not only was video gonna explode on the internet, but there wasn't really anyone addressing it in an exciting way. So we viewed that as an opportunity to differentiate, and we didn't touch the media market because that's where there was saturation. But in this kind of more streaming, or what was soon to become streaming, world, no one was doing anything very compelling. So that was kind of our first foray into this idea of thinking through not just can we win, but do we have the right to win? And that's kind of how we approach everything we do, and I'm sure we'll talk about other services too, but we think very carefully about, are we doing this differently? And at that time, we were very convinced we were doing it differently. And so that's how we really got it going. And then we were able to start testing on different types of content and building the tool and tailoring the tool to not just one customer but to many. And I think that's the thing with any startup in any space: those first few customers, you're kind of a consultant, you're kind of at their beck and call. It's just like, what do you need? We'll do it. And we needed to get out of that as quickly as possible and make sure that we were actually building something that was more generalizable.

 

- I always find it fascinating when companies have a metric that they're trying to improve on, and that's their north star, at least for a while. And I'm gonna ask you another question about that, but first I want you to explain what you meant by foot pedals, 'cause I didn't quite get that and I'm sure the audience would like to know.

 

- Yeah.

 

- And then the question will be, is there a new metric that you're looking at now with AI changing everything?

 

- Yeah, it's a great, great question. So foot pedals in the captioning world, and this is the same for court reporting and such: with any recorded content, there basically needs to be a way to very quickly pause a video and move back to rehear what's being said so you can type it, because there's just no way you're gonna hear everything perfectly the first time around. So that's what the foot pedal allows you to do. It reduces the amount of motion, or kind of steps taken, so you don't have to take a mouse and click on a button in the video player. It's tied into that video player and allows you to go back 10 seconds, or whatever you set it to, so you can just rehear it quickly. So that helps speed up that process of making sure you hear everything correctly as you're typing. In our world, we built in all kinds of macros and basically different tooling to do the same thing, but fully hands-on, with no other equipment, just your computer.

 

- Got it.

 

- And yes, the metric. So times real time is the metric. I mean, when we think about all the different services, we think about that. That is the holy grail: how do we reduce times real time in everything we do without sacrificing quality? So there is this kind of other metric, you know, a quality metric, and that's honestly a very hard one to measure in its own right, 'cause you can define quality in so many different ways. What's quality to an educational customer is very different from what's quality to Netflix, you know, they think about it very, very differently, and we have to deal with that 'cause we play across all industries now. But that is definitely what we try to understand: what are the metrics that our customers care about, and then how do we tailor that operationally?

 

- So you've got that time metric, human time, or let's say speed of transcription versus real time. Then you've got word error rates, and I saw in your State of ASR report something called FER, format error... I don't know what that stands for.

 

- Yeah.

 

- But so essentially those are three metrics. So those last two are like quality metrics.

 

- Exactly. So word error rate is the very traditional, or the most commonly used, metric that we hear about when we talk about speech engine accuracy rates. So whether we're talking about Whisper or Speechmatics or Google, I mean, we're always talking about word error rate, and that's what is pretty much universally understood, and it makes sense. The inverse of word error rate is accuracy. So if you have an 8% word error rate, that means you have a 92% accuracy rate. So that's an easy one to play with. And it's exactly what it sounds like: if you have 100 words, how many did it get right, how many did it get wrong, and however many it got wrong is your word error rate. Formatted error rate, FER, is where it gets really interesting from a readability standpoint. And we think that for captioning and an accommodation use case, where you really have to be able to follow along with what's being said just by reading, formatting matters. So if you put the comma in the wrong place, and we like to use the funny example, let's eat, grandma, if you put the comma in the wrong place or don't have the comma, you're saying a very different thing. I don't think anyone wants to suggest actually eating grandma. So, you know, little things like that matter. And that's where formatted error rate comes into play. So capitalization, punctuation, even speaker identification, things like that really, really matter from a readability and actual consumption standpoint, and I think that's missed by a lot of people, quite frankly. And so when we think about whether speech recognition can do the job, the answer is, well, it depends on the content, right, and how critical it is that people can turn the sound completely off if needed, or not have the sound on for whatever reason, and still be able to really follow what's going on.
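
As a rough illustration of the two metrics, here is a minimal sketch assuming the standard edit-distance formulation of WER. Real scoring pipelines normalize and align text more carefully, and a formatted error rate additionally penalizes punctuation, capitalization, and speaker labels, which plain WER, usually computed on normalized words, ignores.

```python
# Minimal sketch of word error rate (WER) as word-level edit distance:
# (substitutions + deletions + insertions) / reference word count.
# Accuracy is the inverse described above: an 8% WER means 92% accuracy.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                                  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                                  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # delete ref word
                           dp[i][j - 1] + 1,          # insert hyp word
                           dp[i - 1][j - 1] + sub)    # substitute/match
    return dp[-1][-1] / len(ref)

wer = word_error_rate("well let's eat grandma", "well let's eat grandpa")
print(f"WER {wer:.0%}, accuracy {1 - wer:.0%}")       # WER 25%, accuracy 75%
```

Note that the lowercasing step throws away exactly the kind of information FER scores, which is why a caption can post a low WER and still read poorly.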

 

- I'd love to learn more about how you've grown and what 3Play Media does today in regards to the services you offer. But just before we go into that, can you clarify for our audience, 'cause it actually comes up all the time, the difference between captions and subtitles, and even the nuance between the terminologies in the US versus Europe?

 

- No, I was just gonna say, are we talking as if we're here in the US or if we're in Europe?

 

- Exactly.

 

- Yeah, so you hit on it, there are a number of issues there. So traditionally, we think of captions as a same-language kind of interpretation. So from an English source video, you have English text, or from a German source video, you have German text. So same-language text for that language's audio. That's how we would start when it comes to captions. Subtitles we usually think of as a translation into another language. So going from English into Spanish, or English into German, that would be subtitles in the way we think about things. Now, to your point, where it gets really confusing is that if you're in Europe, subtitles covers everything. So subtitles is actually the term used for what we would call captions, but it's also used for the translation use case. So it does get a little more confusing, which is why we like to keep captions and subtitles as two different things. I would argue both are accessibility use cases. You know, from the beginning, and I should have mentioned this when we started, there were two things we got really excited about. One was caption data: if you break it down into data, it's time-synchronized text, which is really rich information about what's in the video and where. And so we were playing around with search tools that were really cool, that could actually take you to different parts of the video. But same with subtitles, it's very rich time-synchronized text in another language. And when it comes down to it, we are talking about enabling content consumption and viewing, and enabling people to enjoy content however it's best for them to enjoy it. It is access to that content. And I do think that sometimes accessibility gets pigeonholed into this compliance conversation, when really we should be talking about usability and consumption, and enabling people to, you know, interact with content in whatever method is best for them, regardless of ability and all those things. And I do think that the language piece gets left out, but in reality it's a really powerful tool to enable more people to consume content.

 

- And that ties into your suite of solutions, right, between your predictive captions, your-

 

- Yeah, exactly.

 

- AI audio and then your live summarizations.

 

- Yeah, so we started with recorded captioning and transcription off the bat, and as we built out our customer base, we certainly started getting asked for other solutions. Subtitling was certainly an obvious one. And it's interesting, if you talk to localization providers, they'll tell you the captions are just a means to an end, 'cause ultimately they want that subtitling business. So we launched subtitles, and then we started getting into audio description. That was one where, for a number of years in the beginning, we even said that is not in our wheelhouse. We don't know it well enough and we don't have a right to win and be different. The more we got asked about it, the more we started to look into it and really understand what is so difficult about audio description, how can we make it better, how can we do it differently? So we launched an audio description offering almost 10 years ago now and thought really carefully about how we could do it more efficiently and make it scale better. And then we got into live captioning. That was something we launched around 2018, maybe 2019, and then expanded quite a bit through an acquisition. So we've been building that out more and more, and now we're getting into even more language capabilities. You know, for many years, we've been known as an accessibility provider, really focused on captioning, transcription, live captioning, and audio description. The subtitling piece was a very small part of our business, to be honest, until more recently. And so we've been building out much more advanced language capabilities, including kind of a more advanced version of our subtitling capabilities, but even getting into dubbing and multilingual accessibility, so captions and audio description in other languages. So over the last year or so, we've been heads down really expanding our language capabilities, and really thinking of ourselves as a localization and global accessibility provider, not just an accessibility provider.

 

- Being involved in the accessibility and disability community, everything that you're saying makes a lot of sense. But now let's zoom out to 30,000 feet, to somebody that has no knowledge of the disability community, or of accessibility or closed captions, none of that. Can you translate for the general audience how what you currently do changes the life of somebody who needs these services? And what really excites you about the new features that are gonna be possible with AI and digital media?

 

- So I think first and foremost, there's a lot of content out there, and in some cases it's critical for people to be able to consume that content, whether it be for a class or an emergency alert or whatever it may be, or certainly just leisure and enjoyment of media content. If it is not captioned, start there. I mean, how is someone supposed to consume it? Even think of a podcast, right? A podcast, up until maybe a couple years ago, was an audio-only concept, or radio, right? So how is someone who's deaf supposed to consume that without text? I haven't figured that out. So, you know, there's an enormous amount of that content out there. And so just at a basic level, you know, we should be able to make it possible for people to consume it. And I do think podcasts and radio, because of how clearly they're often recorded and how the people on them often speak relatively clearly, actually lend themselves pretty well to speech recognition, better than maybe a lot of other cases. So that's a great thing when we think about AI and how it can contribute to improving people's lives, that is a very real thing right there. And I think it's the same for language. I mean, we are seeing now, thanks to the streaming revolution, we can watch movies from France, and people in Korea can watch shows from the US. Or like "Squid Game" here in the US, you know, blew up. That's not made here. But people loved it, and it's because of how it was treated in that kind of translation and localization process to make it an enjoyable piece of content. So I think it goes in a lot of different directions, but certainly there's a huge population of people who literally cannot consume content if we don't look at how to apply what we're doing in a thoughtful way. And, you know, just in the last few minutes we've talked a lot about the operational side, how we think about applying what we're doing to the content, but part of it's also how do you make it easy for the publishers and the content owners in what they're doing? 'Cause if it's not easy for them, they're gonna find all the excuses in the world for why not to do it, unless they are, you know, very convinced they'll make more money off of it. So we have to really think carefully about that side too: how do we make it easy, so that there are no excuses and this is just done all the time?

 

- You believe AI is gonna play a pivotal role in that?

 

- I think AI can play a role in a lot of the different aspects of it. So I think that AI can play a very real role in the production of these assets, no question. In terms of access, you know, whether it be captions or subtitles, there is definitely a role for AI there. We've been using AI in pretty much everything we do in some form, and we view AI as a very powerful tool. And our whole thing is about how we apply a human with the AI, along with the software tools we've built, to really make it scale so that we can, you know, basically cover more content in a cost-effective way. You know, not all content is equal. So there's gonna be content that can be processed with almost just AI, right? Or maybe just AI, period. There's other content that needs human touch, no question about it. And there's some content that needs a lot of human touch. It all depends on what's the purpose of the content, who's consuming the content, who's the audience, and what are the consequences of getting something wrong, too, right? So, you know, that could be brand recognition and brand issues. It could be that the people who are consuming the content really need to consume that content and get it accurately. So between a single student in a classroom receiving an accommodation and, you know, people watching a news broadcast, there are very different experiences and consequences of getting a word wrong. So we have to take all that into account as we think about what content could be handled with AI. And certainly, we weren't having these conversations 10 years ago; there's just no way. The advancements in AI and what we're seeing in what speech recognition can do today is why we're even talking about this, for sure. And even in the language space, it's gotten a lot better, but that's harder. I'll say that is much harder.

 

- Yeah. And when you were talking about getting a word wrong, since you mentioned Larry Goldberg, he created the weighted word error rate-

 

- Yes.

 

- To handle those kinds of things where you're changing the meaning, where you get a "not" when it's supposed to be a "yes," like "don't social distance" versus "social distance," right?

 

- That's spot on. And I think that's something that people miss, and it doesn't get captured in the traditional word error rate measure at all. It seems like such a small error, but it completely changes the meaning of something.

 

- And speaking of AI impacting all of this, my dream feature to see one day, and I'll know we've arrived, is when they can change the video so that the characters are signing to each other, as opposed to even having a little sign language interpreter at the bottom. Do you think that's gonna arrive sometime?

 

- So we have seen some crazy things now in the dubbing world, where there are engines that can manipulate mouth movements to fit the dub that's been created, which is wild. And there are certain legal issues you have to work out there, depending on the type of content. But if you can do that, at some point, why can't you move an arm and a hand and a finger, right? I mean, at some point, it's gonna happen.

 

- You touched upon it depending on the scenario, right? It all depends on what content you're translating or communicating, what the scenario is, who the audience is. You know, I believe there can be a general consensus out there sometimes that AI's gonna solve it all and you don't need that human interaction, you know, AI's got it. Personally, I think that's a dangerous precedent to set, a dangerous thought, be it a platform, feature functionality, content translation, whatever it is. I do believe there's still a critical piece to ensuring that human interaction is still there. I'm assuming, based on your initial response, that you agree.

 

- We are right there with you. Yeah, no, I think that for professionally produced content, or content that has a very real purpose in terms of why it's being created, we need a human involved for most of it, and maybe for all of it. I think you hit on something that's really very real in our business right now, that we are hearing from people, which is kind of a desire for AI to solve everything. And that could come from the excitement and the hype of AI. It also can come from the financial decision-making process: when you see that there is an option for a much cheaper way to do things, you get kind of fixated on it and convince yourself that it's close enough. And I do think that's one of the hard parts of the world we're in, there is this question of what's good enough. And I think it's very easy for some people to convince themselves that something is good enough, when good enough for them isn't necessarily good enough for the real audience. 'Cause what we're doing is ultimately being consumed by someone who is not the buyer, right? So people are making decisions on the services we create, but it's someone else consuming it, and there's that disconnect that we have to try to wrestle with.

 

- Unfortunately, that right there is the reason why accessibility took a while to catch up and to be where it needs to be. Yeah.

 

- Yeah.

 

- One of my favorites... I always love looking at state-of-industry reports, where an organization decides they're going to tackle a particular topic, spend some money to analyze what's going on in the industry, and share it. And one of my favorite ones is the State of ASR, or automatic speech recognition, report that 3Play comes out with every year. Can you share what you found in the past year?

 

- Yeah, so I'll even go back a couple years, 'cause what we saw post-ChatGPT launch was a very real step-function improvement in what these foundational models could do with speech recognition. Error rates went from the mid-to-low teens to, all of a sudden, single digits. And that was one of the most significant jumps we've seen in the last 15 years. So that was kind of step one. And we've been tracking it very carefully. What was interesting about this year was a much more marginal gain. So we track this very carefully, and I'll explain why in a second. But what we saw from different engines was nowhere near that step-function change from three years ago. It was really, really, really small. I say that on the whole; there were certainly engines that gained more traction and made a little bit more progress than others. But again, not in a step-function-change type of way, more just kind of on the margins. And what we see that I think is really interesting is that different engines are performing differently on different types of content. And it all comes down to what training data is being used and fed into it, certainly, and even the methodology underlying the engine. And what they're building towards, right? So what is the purpose of their engine? Who are their main customers? All of that factors into how the models improve. But we've seen the same probably two or three engine providers kind of right at the top the last few years. So that part has not changed. There are a couple who maybe are establishing their place at the top a little better than others, Assembly being a good example of that. They've been consistently in the top one or two the last few years. And even the Whisper model has gotten better and better and reduced some of the hallucinations, which is important. So that's good to see. And the reason we started doing this is we were doing it internally. We were doing a benchmarking analysis internally, 'cause we take the approach of taking an engine off the shelf and then building around it. So we build our own models, we have our own data science team to build on top of what's out there. And we did this very consciously years ago when we first got started. It was probably about 2010 when we first made a very conscious decision to do this, 'cause we looked at what was happening around us and thought, well, we're not gonna compete with Microsoft, we're not gonna compete with Google, we're not gonna compete with Amazon. We can't hire hundreds of PhDs to do this at our size and our stage. But we can build on what they're doing, and we can take the best of what's out there and essentially build a platform that's engine-agnostic. And that goes for all of our services. What we're doing with dubbing is a great example: we have multiple voice engines that we're using for text-to-speech to accomplish dubbing, for a number of reasons. That's how we built our captioning, and audio description as well uses multiple voice engines. And the idea is that we can fine-tune them for our use case and continue to make them better and better with the content we actually work with. So large language models and the big models are great, but they're really powerful when you start to fine-tune them.
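
As a sketch of the engine-agnostic idea Josh describes, here is what such a layer might look like. Everything here, the interface, the provider classes, and the routing table, is hypothetical illustration, not 3Play's actual code.

```python
# Hypothetical sketch of an engine-agnostic ASR layer: take engines off
# the shelf, put a common interface in front of them, and route content
# to whichever engine benchmarks best for that content type. The stubs
# raise instead of calling real APIs; wiring them up is left out.

from abc import ABC, abstractmethod

class SpeechEngine(ABC):
    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return a raw transcript for the given audio file."""

class WhisperEngine(SpeechEngine):
    def transcribe(self, audio_path: str) -> str:
        raise NotImplementedError("call a Whisper deployment here")

class AssemblyEngine(SpeechEngine):
    def transcribe(self, audio_path: str) -> str:
        raise NotImplementedError("call AssemblyAI's API here")

# Routing table driven by internal benchmarks (say, an annual WER/FER
# study per content type); swapping an engine never touches callers.
ENGINE_BY_CONTENT_TYPE = {
    "lecture": WhisperEngine(),
    "broadcast_news": AssemblyEngine(),
}

def transcribe(audio_path: str, content_type: str) -> str:
    engine = ENGINE_BY_CONTENT_TYPE.get(content_type, WhisperEngine())
    return engine.transcribe(audio_path)
```

The payoff of this shape is the one Josh names: the platform can ride whichever engine currently benchmarks best, and fine-tuned or in-house models slot in behind the same interface.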

 

- I noticed that Gemini 2.0 is what you were on when you were doing the testing, and everything moves so quickly that now we're at 2.5 Pro and it went to general availability. Have you had a chance to play with 2.5? 'Cause I heard that it got a lot better with this release.

 

- So we've been playing with 2.5. I don't know that we've played with it for the speech recognition aspect. We have been playing with it for some of the video recognition and what it can do there. And that was one of the biggest additions to 2.5 that we were excited about, what it can do with video analysis. So we're doing some cool stuff there with audio description for sure, and we're really excited about where that's going. But we have not retested the ASR with 2.5.

 

- Yeah, you mentioned, excuse me, you mentioned Whisper there. Two months ago, we released our voice input for Now Assist, because our ultimate goal is to have a full conversational end-to-end user journey. Which, again, as you mentioned, a few years ago you would have said, nope, that would not happen.

 

- Yeah.

 

- But over the past six months, just that growth has been phenomenal. Do you use many AI tools yourselves through that development lifecycle?

 

- In just the services we produce, we definitely use AI tools. But we also look at how we use AI internally in a bunch of different ways, both in the engineering process and even in our go-to-market processes. And it's the same methodology, I would say, as our service offering: it's a tool, and it's all about how we apply it and how we get value out of it. I don't think anything we're doing with AI is like a standalone solution in any way, but it's a very powerful tool. So yes, I mean, I was actually talking with our CTO earlier today about this question, and he thinks that at least 50%, if not 60%, of the code being written on our end has its first draft done by AI. And then a human is certainly reviewing and editing and all that kind of stuff. But there's just so much power in what's available from an engineering perspective in writing code, which has been great. I mean, we've been moving a lot faster. It's really, really powerful. And then similarly on go-to-market, there are so many different ways to use AI, whether it be drafting emails or identifying kind of lookalike targets. I mean, there's so much you can do. And again, it's a tool. We have asked our team not to generate an email and then not read it. You know, you must read the email before you actually send it. But it's very useful.

 

- I'm gonna ask this, I'm gonna jump off the AI track just for a second, 'cause I'm always intrigued by a company like yours that's grown since 2008 and continues to grow, sometimes at an exponential rate. Culture. How do you keep that culture in a business like yours that's grown that fast over a period of time?

 

- It is such a great question, and it's a real challenge. And we introduced an even bigger challenge by acquiring a couple companies a couple years ago, 'cause that is a very real culture shift. I think it's communication and access. And so it comes down to really making a conscious decision as to who we wanna be, not just externally, but really internally especially. For Chris, my co-founder, and me, it's who are we working with day to day, and are we still making an effort to work with employees of all roles, all ranks, and such? And I think that is something we've worked really hard to do, to make sure that people do have access to us, at least in some way, across the board. And I think the other part is hiring, you know, and being really, really thoughtful about who we're looking to bring onto the team, and making sure that our entire leadership team understands that and that they also see it the same way, and it trickles down. So, you know, it's never perfect and we can always get better. I mean, I think that's something that we definitely communicate to everyone in a lot of different ways. And we have a core value called Teach and Learn, where it's always this idea that we're constantly helping each other, we're all constantly learning from each other, we all should be striving to learn, and none of us have figured it all out. And I think by demonstrating that ourselves, we're able to get real buy-in that that is important. And then in terms of who we bring on, you know, that's gotta be a key piece of it. But it's hard. It's really hard. But, you know, communication I think is the number one thing, and really being thoughtful about how we communicate our goals and how people can contribute to those goals is a big part of it.

 

- So speaking of communication, you've got co-CEOs, how do you communicate the difference and what is the difference between what you and Chris do? How do you divide the responsibility?

 

- There were four of us who started; two are still good friends, but not involved day to day. We divvied up different roles and responsibilities early on among the four of us. And then as things evolved, those things changed, and certainly as they left the business, it was very clear that Chris and I were still very excited about where we were going. But at that point, it was already pretty well defined which parts of the business we each owned. I very much own all the go-to-market side of things. He owns operations, engineering, product. Yet at the same time we are, you know, constantly talking, constantly communicating and figuring out how we help each other, how we weigh in on, you know, each other's efforts in a very productive way. And it comes down to the fact that we just trust each other, we know we're working towards the same goal, and we put our egos aside, and it's like, how do we make this successful? That's what matters most. It's how do we build a great company, and we enjoy working together. So that's how it ultimately works. I mean, there's no question, to do it well, I have to be willing to let him step in on certain, like, sales conversations, and he has to let me weigh in on product decisions. And we do that. That's really, really important. But it comes back to the fact that we trust each other and we're not in it for the glory, for our personal glory, I should say. We're in it to build a great company, you know, having real impact on the industry. And so-

 

- The real like hard part I think when you have two different leaders is sometimes if an employee goes to one and doesn't get what they want, they'll go to the other. Or sometimes they already kind of intuit, well, I'm more likely to get what I want from this one versus that one. Have you ever dealt with that, and how do you make sure that that doesn't happen too much?

 

- So I certainly deal with that with my kids. I think the good news is, in the way we've divided up the roles and responsibilities, we don't necessarily see that issue so much. I think the real issue we do deal with is repeating ourselves, in a way. So we might have a conversation with someone on our leadership team and realize that we both need to be part of that conversation, or someone else on the leadership team needs to be involved. And so we find that sometimes we're spending a little more time than ideal in terms of just getting things going. But I think we've gotten a pretty good handle on that too. But that is a very real piece of it: sometimes people will look to one or the other and maybe not be sure which one of us to go to. So I think your framing of it is actually probably accurate in some ways, but not so much about getting the right answer that someone might want. It's more just being a little unsure where it's gonna come from sometimes. Yeah. So not in a negative way.

 

- Let's turn the conversation a little bit to something to do with legislation, and a significant piece of legislation that's coming up, the EAA, the European Accessibility Act. Would you mind sharing with our listeners, maybe first of all, an overview of what the EAA is, and then how you're addressing it or looking at it?

 

- Yeah, so the European Accessibility Act really covers all digital assets in a pretty comprehensive way: everything needs to be accessible. And the guidelines being used for the most part are the WCAG 2.1 AA standards, which in our world means captioning and audio description must be on every video. Now, where it gets challenging for people who have content is there's a whole bunch of language that explains what content does need to be accessible and what content is exempt. And it varies a little bit from EU member state to EU member state. But what's interesting about this is, it is one of the broadest pieces of legislation when it comes to accessibility, and digital accessibility specifically, across all of Europe. So what we have interpreted, and I think it's important to say that it's an interpretation, we are not attorneys and we can't say for sure, but our interpretation is that certainly all media content for entertainment purposes, where the video is the content and is what is being consumed, must be accessible. And that one is probably the most straightforward. Where it gets a little more hazy is, say, a large enterprise that has content being distributed, maybe for learning purposes or marketing purposes: should that be made accessible? And it depends on who you ask. So certainly, if it's again the case where the video is the main mechanism to communicate, then yes. Whereas if the video is just a kind of supplemental demo on a webpage that's talking about some other product, maybe not.

 

- Yeah.

 

- So it gets a little fuzzy. You know, we would always say you should make everything accessible, period. And the hope is that we make it easy enough to do so. But, you know, it's a daunting task for people who've never done this before. And I think that is something we have to be realistic about: you can't go from 0 to 60 when it comes to these things. You have to do it thoughtfully. And I don't think we want people to go from nothing to everything without thinking it through carefully. So, you know, we're trying to be a resource. We've now spoken with a number of organizations who basically have heard of it but have no strategy. So if you're listening and you're not sure what to do, you're not alone. We're trying to just be a helpful voice in the effort. And we have started working with a number of large organizations, media companies, that are preparing for it. So we're kind of seeing a little of everything right now.

 

- Yep. The same from our side. I think, as always, the public sector is taking it to heart a lot quicker and getting it done.

 

- Yep.

 

- Completely agree. It's definitely nuanced in certain areas, but overall, holistically, from a macro level, I think it's a good thing. I think it's a step up as it relates to really taking it seriously, trying to make sure-

 

- I agree.

 

- 2.1 is followed through. So I'm looking forward to the implementation and to taking accessibility in the right direction overall. It's a good step.

 

- Yeah.

 

- Yep.

 

- Definitely agree. And to be fair, I mean, a lot of the organizations we're speaking with right now are US-based companies distributing content in Europe. And so it's a little easier for us to have those conversations, and they're the ones who are probably the most confused by it, to be fair. But what's also interesting is the differing penalties, you know, each country-

 

- By country, yeah.

 

- Yeah. And so we see everything from jail time to fines, luckily no beheadings, but there are some pretty drastic penalties in some cases if it's not followed. So it'll be very interesting to see how it is actually enforced, because I think that will actually-

 

- Set the tone.

 

- Dictate what really happens.

 

- Good discussions today, it's been awesome. Josh, again, we appreciate your time. But before we start to wrap up, I wanted to get your feedback on where you think Gen AI is gonna push localization, from a perspective of content and translations, and what you see in the future in that arena.

 

- Yeah, it's a great question. It's one we're really excited about right now and spending a lot of time on. In some ways, I feel fortunate that we're diving into it now as opposed to five years ago, because we can really come at it with a fresh perspective and think about it AI-first, right, in a way. So again, we think a human's gonna be in the loop to some degree to do some parts of it. But there's so much you can do now with AI. And let's be honest, every large language company was doing something with machine translation in some form, and they might not have been advertising it a whole lot, but it was certainly a big part of how people were already doing subtitles: it would go through machine translation and a human would touch it up, if not more. So I don't think that part's new. But what is new is what the large language models can do when it comes to prompt engineering and asking for different flavors of translation in seconds. And that I think is really, really interesting when you think about different types of content, 'cause there's a huge difference between a product tutorial and a comedy video, right? You have to be able to translate what could be the same phrase in very different ways. And you can do that pretty much instantly now, and choose what's best or adapt from there. So that I think is fascinating. That is really, really cool, and we're having some fun with that, playing around with the best ways to handle it. And then the other is text-to-speech, and what's happened with voice technology. You know, we are constantly now in demos saying, here's a human voice, here's an AI voice, tell us which one's the human. And people don't get it right very often, or it's really a coin flip. So it's pretty wild. And so a human is still needed to do the translation and time it right and make sure that it comes out really well, 'cause there are very real timing issues if you just do a straight translation and turn it into speech, at least in a dubbing use case where time is finite. But it is very powerful, and I do think it's gonna change a lot of video. You know, a big reason why more video content's not dubbed is that it's just so expensive to put a voice actor in a studio and have them do the voiceover in another language. That's going away, at least as a requirement, for a lot of video. I don't think it's gonna change the way the next "Star Wars" movie is gonna be dubbed, 'cause, I mean, I'm sure there'll be like six more of 'em. But the video that's not dubbed today because it's just not affordable will all of a sudden be able to be dubbed. And I think that is something that is really fascinating. And I think we're gonna have to, you know, kind of question the paradigm of user preferences. I think we've all been conditioned to either really like subtitles or really like dubs, for one reason or another, and a lot of that is also because there was no choice. So I think we're gonna see more choice in the way people can consume content, and I think that's gonna be really fun.
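
To illustrate the "different flavors of translation" point, here is a minimal sketch. `call_llm` is a hypothetical stand-in for whatever chat-completion client you use, and the flavor prompts are illustrative, not anyone's production prompts.

```python
# Hypothetical sketch: prompting one model for different "flavors" of a
# subtitle translation, as described above. call_llm stands in for any
# chat-completion client; the flavors and prompts are illustrative.

FLAVORS = {
    "tutorial": "Translate precisely and literally; keep product terms in English.",
    "comedy": "Translate for comedic effect; adapt idioms and wordplay naturally.",
    "dubbing": "Translate so the line can be spoken in roughly the same duration.",
}

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; wire up your own client here."""
    raise NotImplementedError

def translate_line(line: str, target_language: str, flavor: str) -> str:
    prompt = (
        f"Translate this subtitle line into {target_language}. "
        f"{FLAVORS[flavor]}\nLine: {line}"
    )
    return call_llm(prompt)

# The same source line can come back very differently per flavor:
# for flavor in FLAVORS:
#     print(flavor, "->", translate_line("Let's eat, grandma!", "German", flavor))
```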

 

- Now, moving on to a slightly different topic. Those of us that have worked in media get jaded after a little while, because the business side of it can be so cutthroat, and you just take for granted that you're doing something that might reach millions of viewers and be very influential, even if it's just one cog in that machine. But over the years, having worked in media, is there anything in particular that you're proud of, that you've been a part of, that you look back on and say, that was really cool? Or maybe a cool behind-the-scenes Hollywood story that you're allowed to actually share publicly?

 

- Yeah, it's interesting. We got nominated for a number of audio description awards, and I think that's really cool because, one, audio description is not talked about as much, but also it's a lot harder. So being recognized for that is great. This past year, I think we had three different pieces of content that were nominated. We didn't win, unfortunately. But the fact that we're getting noticed for that is very, very cool.

 

- What's the criteria for that, Josh?

 

- There are a couple of organizations that look at different aspects of content distribution. So it could be soundtracks, it could be dub quality, it could be audio description. There's a whole bunch of different aspects of video that are included in some of these award shows. And this one, I'm trying to remember exactly which group it was, but there are a few different groups that will actually put these things out. One is more of an industry group, the Digital Entertainment Group, and then certainly the American Foundation for the Blind and a number of other accessibility-driven organizations are doing things like this. But getting into the actual criteria, it is all about user experience and how well the description was crafted. And it often is related to content that's hard to describe, you know, really intense content that makes for a really compelling description. I will say, I did an interview with Heather Dowdy from Netflix, and she shared, and I don't think she'll be too upset about me sharing this, that they're starting to see audio description get used a little bit more than usual with certain series, "Bridgerton" in particular, because it adds a little spice to the experience for people who are watching it. And I thought that was really great that, you know, here we're finally finding reasons for people to use audio description, kind of the way we can say that lots of people use captions 'cause it enhances the experience. That was not exactly the way I thought it was gonna come out, but it was a fun one to hear about.

 

- That's very cool, very cool. Anything we haven't asked you that you wanna touch upon? Be it, you know, from your own career or any current ventures that 3Play Media is working on?

 

- I mean, I think we've hit on the fact that we're expanding aggressively into languages, and I think that's a big thing. It plays into our initial vision that one day all content will be accessible to everyone, maybe not cat videos, but certainly all content that really has meaning behind it. And like I said before, language is part of accessibility too. It feels like we're making much more exciting progress towards really enabling people to consume all content in all ways than we have in the past. So that's really exciting for us.

 

- As we wrap up, I'm just gonna share one little anecdote, 'cause you were talking about audio description and it just kind of dawned on me. I have aphantasia, or at least something pretty close to it, you know, where you're not able to visualize in your mind's eye. And the first time I got access to audio description, it was a Disney movie, and I was at a theater that had special, you know, headphones. And as I was watching it, I noticed all kinds of things that I didn't notice before in the background, 'cause I just don't think visually the same way. And as soon as the audio description was gone, I realized that there was so much little detail that was missing. When audio description is bad, you don't have any of that. But when audio description is good, it can help people, you know, in ways that you would never really imagine, across other kinds of differences in abilities. I think that that's pretty cool.

 

- It's a great point, and it often gets left out. Audio description is so often pigeonholed to blind and low vision users, but there are so many different reasons why that added color is really useful.

 

- Yeah.

 

- It's a great point.

 

- And with that, I've really enjoyed the hour with you, always do. But can you share with us where our audience can see more about your work and 3Play Media's work?

 

- Yeah, absolutely. So we are active on social platforms, but our main website, 3playmedia.com, has quite a bit of content. We have a blog, we have webinars, and we have our upcoming ACCESS event, which is a virtual conference all about video accessibility, coming up in a couple of months. So we will continue to put content out there for free to help educate everyone about what's going on in the world of accessibility, and hopefully teach you a thing or two.

 

- Awesome. Thank you so much, Josh. Really, really enjoyed this.

 

- Thank you, both. I'm glad to be here and really enjoyed it.

 

- Likewise, thanks, Josh. Take care.