Accessibility and Gen AI Podcast

Eamon McErlean & Joe Devon - AIMAC: The AI Model Accessibility Checker

Episode Summary

Hosts Eamon McErlean (VP & Global Head of Digital Accessibility and Globalization at ServiceNow) and Joe Devon (Co-Founder of GAAD) present an overview of AIMAC, the AI Model Accessibility Checker. Launched by ServiceNow and the GAAD Foundation in May 2025, AIMAC is an open‑source framework designed to evaluate how accurately different AI models generate accessible HTML code. AIMAC prompts 37 models to generate pages across 28 categories with no accessibility guidance. Then it runs Axe-core, the accessibility testing engine used by Microsoft, Google, and most major tech companies, to count violations against WCAG 2.2 Level AA (the standard required by US and EU accessibility laws). Jennison Asuncion (Head of Accessibility at LinkedIn and Co-Founder of GAAD) joins Eamon and Joe to demonstrate how he uses a screen reader and reveals how common AI coding errors, such as missing alt text or unlabeled form fields, create barriers for blind users.

Episode Notes

OUTLINE:

00:00 Opening Teaser
00:50 Introduction
02:23 The Importance and Goals of AIMAC
06:03 ServiceNow's Priority for Accessibility
12:09 Challenges of Developing AIMAC
17:25 What is AIMAC and how does it function?
21:41 Part of What Inspired AIMAC
24:50 Using OpenRouter - The Unified Interface for LLMs
29:08 AIMAC's Leaderboard of AI Models and How They Ranked
37:03 Examples of System & User Prompts and the Generated HTML Pages
39:56 Using AIMAC For Your Own LLM
40:54 What Surprised Joe The Most (The Results)
43:32 Webpage Previews Generated By All Models For Multiple Categories
44:56 How To Provide Feedback or Contact Joe re: AIMAC
48:56 Pareto Frontier Chart Measures Quality & Cost
51:35 Demonstration of How AIMAC Benefits Screen Readers For People Who Are Visually Impaired
01:06:07 Results of Using MoonshotAI: Kimi K2.5
01:09:57 Top 6 Accessibility Issues Caused By AI Models
01:11:58 How Em Dashes Generated by AI Models Affect Accessibility
01:14:13 Wrap Up

--

EPISODE LINKS:

AIMAC: The AI Model Accessibility Checker
https://aimac.ai

ServiceNow Accessibility Statement
https://www.servicenow.com/accessibility-statement.html

GAAD Foundation
https://gaad.foundation

Accessibility at ServiceNow
https://www.youtube.com/playlist?list=PLCOmiTb5WX3q5qCG_QqYip2IZrvao7Bwp

Webpage Sample:
https://aimac.ai/output/reports/current/assets/html/gemini-3-pro-preview/sports-1408

Axe-core - Accessibility engine for automated Web UI testing
https://github.com/dequelabs/axe-core

A11y LLM Eval
https://microsoft.github.io/a11y-llm-eval-report/

CodeGen Model Eval and Refine Tools
https://github.com/aarongustafson/CodeGen-Model-Eval-and-Refine-Tools

Evinced
https://www.evinced.com

WebAIM Million - The 2025 report on the accessibility of the top 1,000,000 home pages
https://webaim.org/projects/million/

ArcTouch - The State of Mobile App Accessibility Report
https://arctouch.com/state-of-mobile-app-accessibility

OpenRouter
https://openrouter.ai

Moonshot AI
https://www.moonshot.ai

Github - AI Model Accessibility Checker
https://github.com/GAAD-Foundation/AIMAC

MiniMax-M1 - The world's first open-weight, large-scale hybrid-attention reasoning model.
https://github.com/MiniMax-AI/MiniMax-M1

The Pareto Frontier For AI Agents
https://cobusgreyling.substack.com/p/the-pareto-frontier-for-ai-agents

Em Dash, En Dash and Hyphen: The Guide I Needed
https://hrot.substack.com/p/em-dash-en-dash-and-hyphen-the-guide

Eamon McErlean on LinkedIn
https://www.linkedin.com/in/emcerlean/

Joe Devon on LinkedIn
https://www.linkedin.com/in/joedevon/

Jennison Asuncion on LinkedIn
https://www.linkedin.com/in/jennison/

Episode Transcription

- I think what a lot of people would love to understand is, when you first come to a page like this, how do you attack it?

 

- So the way I explain it to people, it's like, so I'm completely blind. It's almost like when I go into a room for the first time, you know, I need to know what the room looks like, if you will. So I need to know where are the tables, where are the chairs, is there carpeting? Things like that. So the equivalent for a webpage would be like, are there headings? Are there links? Where are the edit fields? So I literally, what I will do is, I will typically use my arrow keys to basically traverse the entire page to get a feel for what's there.

 

- Hello and welcome to a special episode of the Accessibility and Gen AI podcast. My name is Joe Devon and I'm joined by my co-host Eamon McErlean. And today we're gonna be speaking about AIMAC, the AI Model Accessibility Checker, which is a benchmark to test how different AI models generate their HTML, whether it's accessible or not. So it feels a little weird to say this, but welcome, Eamon.

 

- Thanks, Joe. Yeah, it's a different format than we're used to, but looking forward to having the chat, and just trying to communicate with the audience and listeners, I think, as simply as possible, where AIMAC originated, and really what it's all about.

 

- Yeah, and just keep in mind folks, some of this is a little bit technical, there's no way really around that, but we're gonna try not to stick too deep into the technical stuff. We'll share that for the people that are interested in it. But don't worry, we're not gonna stay too deep, too long. So bear with those little pieces as they come up. So, Eamon, we kicked around this idea of a benchmark. We've been working together for a couple of years now, and we clearly saw that AI is gonna really change the game, and it was clear that there was a need to create some kind of benchmark, but support was clearly needed, and you came in and you really pushed this project forward. So, I want to hear from you what your thoughts are on why you felt this was important, why you supported it, and what goals you had from it.

 

- Yeah, I think you and I were on the same page right from the start. I think, through the accessibility and AI podcast and all the phenomenal leaders we've had as guests, the common theme throughout those discussions has been kind of two-fold. One, around the speed that AI is moving at. The exponential speed and the impact it is having and will have. And two, around the concerns around, is that speed a good thing or a bad thing? And the impact it can have on accessibility. So I think that's when you and I started discussing, okay, how do we raise that awareness? How do we ensure that accessibility is no longer an afterthought? I've said this before numerous times, but our biggest concern was going back to the speed. We know accessibility's always been an afterthought, and it was in the late nineties, and early 2000s in the dot com boom, and many companies are still playing catch up. And that's the truth, they are. With the speed that AI is currently moving at, we just can't afford to make that same mistake again. We really can't. So, when you and I put our heads together, it was like, okay, how can we make that scalable exponential impact? And when we thought about it, we realized, from a development perspective and a coding perspective, everything's gonna be based on the LLMs. So then the next step, the logical step, is, how do we ensure those LLMs are as conformant as possible. So I think it was just a process we went through to understand where the impact was going to be and how we could address it.

 

- Yeah, and I gotta say, you're really good at naming things because, whenever I have to name something, I probably spend weeks changing my mind. And then when I finally get the domain and put up the website, I just say, I hate this. And we went back and forth with a bunch of names, but can you share what AIMAC, like the acronym, what AIMAC stands for and how you came up with that?

 

- You see, that's why we work well together, Joe. 'cause I'm a very simple guy and you're the technical guy, so it is a good, it's a good synergy. I wanted to keep it simple. I think simplicity is a good thing across the board. So, the idea that I had was, we know it's a model accessibility checker, and we're focusing on AI. And it was literally putting those two things together. I was a little bit concerned using that MAC term for obvious reasons, but I think, literally calling it what it really is, as simple as that sounds, goes a long way. And that's where we came up with AI-MAC, AI Model Accessibility Checker.

 

- Yeah, and for anybody that wants to follow along on the website, it's AIMAC.ai. So AIMAC.ai. But you might wanna wait until the end because I just sent you to the URL, so that you won't pay attention to what we're saying next. So, that was dumb. What else? I would love to, just before we get started, I would love to kind of hear from ServiceNow's perspective. I'm seeing ServiceNow in any place where folks are talking about AI. I am seeing ServiceNow really doing a lot of advertising and clearly it's a very important aspect of the work at ServiceNow, but can you touch on how that relates to AI and accessibility from your role?

 

- Yep. I'm fortunate to be working at a company, ServiceNow, that has made and continues to make sure that accessibility is a priority. We've done a lot of work and the team's grown over the past four years with a primary focus around conformance, but we also wanted to have a comprehensive approach. So, we just didn't focus on conformance, we focused on usability, we focused on that direct customer engagement and engagement with our employees with disabilities, which is invaluable. We took a look at how do we measure success around our analytics and reporting. We wanted to ensure that we leveled up all of our employees' knowledge and skillset. So we actually made accessibility training mandatory across the organization. So, suffice to say that leadership support from ServiceNow has been invaluable, and we are extremely appreciative and thankful of the support. Going down the AI route, the truth is, and not being overly ambitious, but we really do want to be recognized as a leader in accessibility. And to do that, we have to champion AI and accessibility. I firmly believe that accessibility can be a competitive advantage if you do it properly for the right reasons, you can reap the rewards, and the same when you put AI and accessibility together. So, apart from leveling up conformance and everything I just mentioned, we're doubling down on that AI highway to make sure that we add our own platform AI features as well. We're going deeper into voice and the conversational AI. We're looking at a new AI screen reader for an output to simplify the screen from an end user perspective, we're looking at AI dynamic guidance as well from an end user, we're even looking at the ability to ASL and sign the text. So-

 

- Would you mind defining those, some of those terms for readers who may not know what a screen reader or ASL means?

 

- Sure, yep. ASL, American Sign Language, for those deaf individuals. So they communicate primarily through some specific sign language, either American Sign Language or British Sign Language, multiple different sign languages. The idea is to enable those end-users to interact with a UI via sign language, with an output of text. Again, we're just, we're starting down that route right now. We wanna create a POC around it. But it's just one of the areas where it's like, we believe AI can significantly help us. AI dynamic guidance, meaning via a conversational interface with our platform, you could pull up verbally a specific record, or a specific case, or a specific knowledge base article. If you wanna find more around a specific topic, you can get guidance to where you could find that article and where you can retrieve that article. Just that conversational AI, it's one thing, a two-way interface via voice. It's a whole different thing as it relates to voice-to-voice interaction, understanding your tone, understanding your real need as an end-user, and making that as intuitive and as fluid as possible. So yeah, we want to scale our impact, and that's why we've been so happy to partner with yourself as it relates to AIMAC, but we also want to level up our platform capabilities overall in parallel.

 

- That's great, and the voice is so important. I think for everybody, it's an interesting new interface that I'm seeing a lot of AI developers that don't, aren't even in the disability space whatsoever, they're using voice to code, using voice to control their home devices while they're on vacation. And needless to say, it's a really great accessibility service as well.

 

- Yep. Yeah, it's- And the key thing there, one of the key things there, even around, you know, voice is, engaging with those individuals with different disabilities. You think about an individual who has a speech impediment, so, you'll maybe have to have different nuance and different settings for those individuals for them to utilize and reap the benefits of an interface like that. So, I can't overemphasize, as much as we're going down the AI world and the automation world, interaction with individuals with different disabilities, including neurodiverse, is truly invaluable, and you can't do it early enough. That interaction to understand the real needs and nuances of different individuals is, it's invaluable in many ways. And as you well know, ultimately everyone normally ends up reaping the rewards, because if it's truly accessible, it normally means that it's very simple, and intuitive, and fluid, and highly usable for all.

 

- And speaking of screen readers, later on in this session, we're going to have Global Accessibility Awareness Day co-founder Jennison Asuncion, who we did interview early on in this podcast. He's going to be a special guest, demonstrating what it's like to use a screen reader to actually read the web pages that the different AI models create.

 

- Jennison is a rockstar, by the way. I absolutely love Jennison. Every time I meet him, I just love hanging out with him. He's a dynamic individual. I continue to learn from him every time I have a chat with him. So, love partnering with him in any way we can.

 

- Yeah, he's absolutely a rockstar.

 

- So, onto AIMAC, before we dive into the actual UI and how it works, as you were building this out, did you have like any initial challenges or how did you think about it from a perspective of the technology and how you wanted to build it out and scale it?

 

- Oh gosh, there were so many challenges. I would say early on, the biggest challenge was, well, first I had not been coding for about 15 years. I'd been running my own company, so I was more on the executive side. And so, being super rusty, what I noticed was that, coding with AI allows you to not worry about syntax so much. You could have a conversation with it, and learn, learn a lot. However, the hallucinations are very tricky, and so it would write code for you, but it would, I'm more of a database guy, you know, in my past, and so, I would let it write a lot of the code, and then I saw these bugs come up. So then I looked at the database and I thought, oh my God, this is terrible. The foundations were really off, and I had to start over again probably four or five times, and I decided to make it very database heavy just because that was my, you know, my know-how. So I wrote the database pieces of it much more in the later versions. And then once I had that foundation, I was able to keep it from making as many mistakes. Another aspect is that the models were pretty bad, when starting this. And so, every time there was a bug, and I would ask AI to fix it, it created two bugs that were subtle for every bug that it fixed. So, the further I went along, the worse the code got, until I just sort of got back into coding, put a lot of, what do you call it, like a leash on the AI, and made sure to really, every single time it made a change to make sure that all of that was captured and that I could see the differences in the code, and steer it properly. And I'd say last December, the models got way better and they keep improving. So, now it's much, much better, much easier to build.

 

- And was that jump an improvement across, was that like across multiple models? Did you see that improvement across the board?

 

- It's actually very interesting. So, OpenAI's model was a great conversationalist and a good coder. And then, I would say the Claude models are very good communicators, very good at the writing. And then, there was a leap at Opus 4.5, where that model got far better, but it still hallucinates a lot. But because it's very good conversationally, you see most people claiming that, that's the best model. But it turns out that, OpenAI came out with an amazing coding model. Their 5.2 series are fantastic at coding, but they're terrible at conversation. And so, there's this indie developer, Eric Provencher, who created something called Repo Prompt. And it's amazing what a solo developer can do, because now, I use Claude Code to ask Repo Prompt to put together the context, like which files should be changed. And then it talks to OpenAI's models, which do the actual coding, and then passes the information back to Claude Code. So now you have the great conversationalist, and you have it all in a really neat, you know, a really neat way of working with this. And he's just a solo guy, but I've never had such good customer service. He will answer no matter where you send him a DM or whatever, he'll fix bugs, you know, immediately. And if any of you try it out, and see accessibility problems, he told me that he will be happy to fix any accessibility problems. That's not his specialty, but-

 

- That is so cool. And what period of time are we talking about? Because you put in the hours, you and I have been fortunate enough to work with you for a while now. You do, you put in the hours. What period are we talking, weeks, months?

 

- Oh, I mean, I would say the first version which I showed you, it took me three days to do.

 

- Okay.

 

- And then it progressively took longer. We had that horrible period where all those bugs were accumulating. And I guess one of my problems, you have a lot of coders that are very practical and they come out with code that does the trick, but it's kind of ugly inside. And for a while I really tried to overdo the perfectionism, but I've gotten a little bit better at that, in the sense of being practical, and not necessarily having the best architecture in the world. That's just a personal feeling.

 

- Find that optimal balance, right?

 

- Yeah.

 

- Before we dive into the demo, can you share with our listeners, what is AIMAC? Like, what does it really do?

 

- We'll try to show you through the demo that might make it a little bit more real for you. But at a high level, what we're trying to do here is, ask the different models we're testing, we're testing 37 models, ask them to generate an HTML page. And then we take that HTML page and we run it against Axe-core, which is an automated accessibility testing tool. We look at the errors that come back, and then we provide a grade for each of the models. And then on that basis, we release this benchmark. And to understand why a benchmark is important is that the AI model companies compete on how well they do on benchmarks. So you might've heard, as they come out with different models, they will tell you how well they did on, for example, the medical exams, to become a doctor, or to become a lawyer, those are all benchmarks. But there hadn't really been any prominent accessibility benchmarks. And that's why we really needed to create this, so that the AI model companies will compete.
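The loop Joe describes can be sketched in a few lines of Python. Everything here is illustrative: the function names, the stubbed model call, and the stubbed Axe-core run are assumptions for the sketch, not AIMAC's actual code (which is open source on GitHub).

```python
# Illustrative sketch of the AIMAC-style loop: ask each model for an HTML
# page, run an accessibility checker over the result, and tally violations.
# The model call and the Axe-core run are stubbed; a real pipeline would
# hit an LLM API and a headless browser respectively.

def generate_page(model: str, category: str) -> str:
    """Stub: pretend the model returned an HTML page for this category."""
    return f"<html><body><h1>{category} page from {model}</h1></body></html>"

def run_accessibility_check(html: str) -> list[dict]:
    """Stub: pretend the checker found one serious color-contrast issue."""
    return [{"id": "color-contrast", "impact": "serious"}]

def evaluate(models: list[str], categories: list[str]) -> dict[str, int]:
    """Count total violations per model across all categories."""
    totals: dict[str, int] = {}
    for model in models:
        count = 0
        for category in categories:
            html = generate_page(model, category)
            count += len(run_accessibility_check(html))
        totals[model] = count
    return totals

results = evaluate(["model-a", "model-b"], ["sports", "news"])
```

In the real benchmark the violation counts then feed the leaderboard grade; here every model scores identically because the checker is stubbed.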

 

- And is there anything else like that, like what we have created out there at all?

 

- You do have a couple of employees from Microsoft who released their own benchmarks. Michael Fairchild is one of them, and Aaron Gustafson. And those are really cool. But I think it's just a little bit different when it's, you know, we partnered, we being the GAAD Foundation partnered with ServiceNow, and because it's sort of under the aegis of a foundation, I think that it's a little bit different than when you have a large company that's coming out with it. So, I think it just sort of has a different tone. I don't really know what their agenda was for coming out with it. I think they probably wanted to have that internally.

 

- And, you know, you mentioned support before. I think it's one of the beautiful things about accessibility work in this area. People are so collaborative, they do wanna help for the greater good. You know, even working with some of our partners and even competitors to a certain degree, like we want to do the right thing in this arena, we really do. And that's where we do share best practices, ensure that we all move in the right direction for the greater good. As idealistic as that sounds, it is important. It really is.

 

- Yeah, and I'll also add that, Aaron, I'm on a weekly call with Aaron. We talk about the benchmarks all the time, and Michael Fairchild as well. And they both offer to collaborate and help in any way they can. So, we're all in it for the exact same reason there. There's no competition here at all.

 

- Have we seen the utilization of AIMAC increase over the past several weeks?

 

- I haven't actually looked at the statistics. That is something that is worth looking at, but I'm not so worried about the raw numbers, as I am the people that are looking at it. And there have been some pretty prominent folks who have reached out and said that they really enjoyed the research. And I know for sure that it's being looked at by some important folks. There's at least one big model company that is looking at it very closely, and a couple of others, I'm pretty sure they are, let's put it that way.

 

- Yeah, we use Axe-core ourselves at ServiceNow. We're also, we'll be starting to use Evinced, but when you and I talked about what we wanna do with this as regards, should we productize it, could we monetize it? I think there was no doubt at all right from the start, we said, no, this is gonna be an open source tool, period.

 

- Yeah. But people can use it to help decide which models they should use. And one of the reasons I used Repo Prompt more, was the result of- As the results came in, it was like, well, maybe we should be using the GPT 5.2 models more as we'll see shortly.

 

- Okay, so speaking of which, do we wanna dive in and have a quick look?

 

- Yes, absolutely. First I think it would be good for us to look at the inspiration for part of this, which is the WebAIM Million report. Here they looked at the top 1 million homepages. So, they grabbed it from the Tranco ranking, and they said they have their own accessibility checker, automated checker themselves. And they said, why don't we take a look at the top million pages and grade it against accessibility. And what they found, you should check it out yourself, it's at webaim.org/projects/million. But the main thing that I want you to take a look at is there is a chart of WCAG conformance. And WCAG stands for Web Content Accessibility Guidelines. And what they have found is that, when they first started around 2019, they had something like 97.8% of the webpages failed on accessibility. And they've been running it six, seven years now, and now it's at 94.8%. So essentially you went from 98% to 95% failing, which is really pretty bad. And when they looked at how bad these pages were, they typically had, from my memory, something like 50-plus errors. And we have a little chart here up that shows you what the top errors are, which is low contrast text, at about 80% of the errors. Missing alt text is 55%, and then the next four are missing form input labels, empty links, empty buttons, and missing document language. And what's really interesting is that, we had very similar results on AIMAC in the sense of the low contrast text. But the last time I looked to see how bad the alt text was, I don't think even one model missed the alt text. They all had alt text, which is fantastic. And, I think on the document language it's almost perfect as well. But the other items are very similar. Then I'm gonna do a little shout out here to the State of Mobile App Accessibility Report. This is, ArcTouch came out with this, and there they found that 72% of the user journeys of the top mobile apps were inaccessible. 
So now you can see between WebAIM and the SOMAA, the State of Mobile App Accessibility Report, that it's pretty bad when it comes to the accessibility. And we know that AI is taking over in a really big way. I think 2026 is gonna be huge, huge change across industry. So we obviously had to make sure that accessibility is taken a lot more seriously. So, to that end, the way that we start is, there is a company called OpenRouter and they are an interface to different large language models. And this way you just have one password, and that gives you access to over 300 models. So the way it kind of starts is, if you click on the models tab, you can see they have 616 models that you can reach with just one, what's called API key, but it's basically a password. And the timing is actually right to show you all of this because a new model came out called Kimi K2.5. And I'll just show you the different models. This is, I believe it's a Chinese, independent, open source model company. And we have been looking at the Kimi K2 models. September 5 is the one that we had been looking at, and Kimi K2 Thinking. And so, now they have a new one called Kimi K2.5, and this seems to be their state of the art model, based off of the Kimi K2 model. And what's a little different here is that they're very agentic. So you might've heard that term where you have models that can run for a really long time, but it seems to be based off of Kimi K2. So, we're gonna remove the Kimi K2 model and we're gonna start looking at the Kimi K2.5. So I just wanna show you what the process is, of actually running a data collection and then I'll keep going with the demo and then we'll be able to see what the differences are. I'm not gonna go into production, we're just gonna show you a staging website so you can see, what it will look like when we replace one of the models. I think that will be helpful for you to understand the process.
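The "one API key, many models" idea is concrete because OpenRouter exposes an OpenAI-style chat-completions endpoint, so switching models is just a matter of changing the model string in the request. The endpoint URL and payload shape below follow OpenRouter's documented API; the model slugs are examples only and may not match OpenRouter's current catalog, and the request itself is left commented out.

```python
# Sketch of calling OpenRouter: one API key, any model on the platform.
# Only the payload is built here; the actual HTTP POST is commented out.

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(api_key: str, model: str, prompt: str) -> tuple[dict, dict]:
    """Build the headers and JSON body for a chat-completions call."""
    headers = {"Authorization": f"Bearer {api_key}"}
    body = {
        "model": model,  # swapping models = changing this one string
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, body

# Example model slugs (illustrative; check OpenRouter's models page):
for slug in ["moonshotai/kimi-k2.5", "openai/gpt-5.2-codex"]:
    headers, body = build_request("sk-or-...", slug, "Generate an HTML page.")
    # requests.post(OPENROUTER_URL, headers=headers, json=body)  # real call
```

A benchmark like AIMAC can loop over its whole model list this way without touching any per-vendor SDKs.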

 

- Perfect, yep.

 

- This part is a little technical, but don't worry about it, there's no test here. This is the actual command that we run in order to get the models, and this is the list of the current models. And it's a little hard to read because it's just commas between them. But you can see those two models I mentioned, kimi-k2-0905, and Kimi-k2-thinking are currently in the list. And we wanna replace one of them with Kimi K2.5. So, I'm gonna run a new command where I replaced the Kimi K2-0905, with Kimi K2.5. And then I'm going to run this inside of a terminal.

 

- And we can add as many models to this as we wish.

 

- Yes, but it's expensive.

 

- Yep.

 

- It costs, to do a full test, a full collection is about $150, but we're just removing one, these are cached. So in other words, we're not rerunning models where we have the data already, but we probably should do that once a month. So I'm gonna hit Enter. So now you can see, this is a lot of technical detail you probably don't need, but, essentially, you can see here that there are 37 models that we are trying to reach out to. We have removed the Kimi K2-0905, and we've added Kimi K2.5. This is 1,036 different HTML pages. We do not have a cache for- A cache means like we do not have previous data for this new model, so we're going to ask for new pages. And it seems like some of them came back a little bit quickly, but there are a bunch of errors. So, we're gonna be retrying them and we'll see if this works or not. I think Kimi K2, it's one of those open sourced models that very often fail, we might have to run this a couple of times. But this is generally what it looks like on the technical side. You don't have to remember any of this obviously. So now let's take a look at AIMAC.ai, which is where this is. And different benchmarks have what's called a leaderboard where you can see how well the different models rank on whatever is being tested. So, over here you can see that, we have from 1 to 37, we have different models and they have what's called an AIMAC Debt score and a total cost. So, it was very tricky to name the scoring, because most benchmarks, the higher the score, the better. And it's intuitive when you're looking at a score that a higher score is better. But in this case, the way the scoring works is, if there's an accessibility bug, then that's a ding. And so, every new accessibility bug raises the score. And so higher scores are actually worse. So, after a lot of conversation and thought, we finally came up with AIMAC Debt, I probably should have asked you, Eamon, to name it because, I don't know that this is the best name. 
But AIMAC Debt or AIMAC Deficit is what we have now. And so, the top four scores, the best, are from GPT. You have GPT 5.2 Pro, which has a 2.77 AIMAC Debt score, and I'm reading it all out for folks that might be listening to this on audio, or otherwise can't see the screen. So GPT 5.2 Codex has a score of 3.05. GPT 5.2 has 3.05, and then a small model that's less expensive, GPT 5.1 Codex Mini, has a score of 3.78. So this is OpenAI really dominating this with the top four models. And then out of nowhere, a model that is getting lots of good reviews, an open source model called MiniMax M1, is sort of tied with GPT 5.1 Codex Mini, which is at fourth place. It has the same score, but, you know, we have tiebreakers on minor details, so it's officially coming in fifth. And then number six is OpenAI's o3 model. So they have five models in the top 10, which is incredible. And then, one interesting thing to note is that we've got a cost column. And that cost refers to how much it costs to generate the HTML across all 28 different categories that we look at. And those 28 categories map to the WebAIM Million, they categorize their million pages in categories. So, we're matching to those 28 categories. And so what's really interesting here, is that GPT 5.2 Pro has a great score of 2.77, but it costs $95 to generate those pages, just for one run, which is obviously too expensive for most cases. It's the best model out there, but you only want to use it when you have really big problems, you'll use it here and there. For the most part, your day-to-day model, you want to pick something else. And so, if you look at number two, GPT 5.2 Codex, it costs $1.94 to generate and it gets an AIMAC Debt score of 3.05. So it's pretty close in quality, but the price is much lower. And then, you can get a smaller model and the smaller models tend to be much faster. 
So you could go with GPT 5.1 Codex Mini for 39 cents, or you could go with an open source or open weights model, the MiniMax M1 for 82 cents, which also has a pretty good score. So that just gives you a sense of it. Do you have any, was this explanation clear or-
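To make the "higher is worse" idea concrete, here is one way a debt-style score could be computed: weight each violation by severity and average across categories. The severity weights and the averaging are illustrative guesses for this sketch, not AIMAC's published formula.

```python
# Hypothetical "debt" score: each violation adds a severity-weighted ding,
# and the per-category debts are averaged, so more (and more severe)
# violations raise the score. Weights here are invented for illustration.

WEIGHTS = {"critical": 3.0, "serious": 1.0, "moderate": 0.5, "minor": 0.25}

def category_debt(violations: list[str]) -> float:
    """Sum severity weights for the violations on one generated page."""
    return sum(WEIGHTS[v] for v in violations)

def debt_score(per_category: dict[str, list[str]]) -> float:
    """Average per-category debt; 0.0 means no violations anywhere."""
    debts = [category_debt(v) for v in per_category.values()]
    return round(sum(debts) / len(debts), 2)

score = debt_score({
    "sports": ["critical"] + ["serious"] * 25,  # 1 critical, 25 serious
    "business": [],                             # a clean page
})
```

Under this toy formula the sports page alone contributes 28.0 of debt, and averaging with the clean business page gives 14.0, showing how a single bad category drags the overall score up.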

 

- I think from a listener perspective, can you share like what would be the tangible difference between say the GPT 5.2 at 2.77 score, versus something like, let's go down a little bit, the Claude Sonnet, number 19 there, at 4.7? So, difference between 2.7 and 4.7. Is it just numerous additional issues that are found within what that LLM generated? Is that ultimately it?

 

- Yeah, it's the number of issues that you have. But just to give you a sense, I clicked on the GPT 5.2 Pro, and I'm looking at the table. So, when we scan the table here, you can see that, you've got like 2.0, 3.53, business is 0, career is 0, education 0. All of these zeros mean that there's zero errors for those categories. And the worst, if you can sort this, so the worst category would be sports, that has a 10.0 score and we actually split out the actual violations. So it has one critical violation, and 25 serious violations. Right? And now if we drill down into the sports category, you can actually see the webpage that was generated and you can see which issues they had. So remember this is the worst category for GPT 5.2 Pro and there is one critical aria-required-children. And then color contrast, most of the models struggle with a color contrast to be honest. So that is GPT 5.2 Pro on sports. So if we want to make a comparison, a fair comparison, here, you can see the table for the Claude Sonnet 4.5 and the scores here are like 4.80, 3.05, 3.98, 4.37, 4.64. You can see that there's a lot of variability. They do have one that's zero at gaming, but then they have some bad ones, let's just do a little sorting here. Health and fitness is 16.94, with four critical violations and 13 serious. But I think we did, was it shopping? Which one did we look at?

 

- Sports.

 

- Sports.

 

- So let us look at the sports category, which didn't do as badly, in the sense that there are zero critical violations and nine serious ones. And so I opened up the page here for Claude Sonnet 4.5, sports. So you can see the actual page. And this one is actually not too bad, because it only has nine serious issues and it's color contrast, which is very common. But now let's take a look at one of the bad ones, like health and fitness. So we've got four critical violations, where a select name is missing, and it also has color contrast with 13 serious issues. And then we have the page over here. So, it can vary a fair bit, but this is really the foundation of where it comes from.

 

- Yeah, it's that breakdown, that more granular breakdown, first by category and then by specific defect or issue, that is phenomenal visibility. It really is, Joe.

 

- Yeah. And then I think it would help to actually go into some of this detail here, because some people don't quite know how this works. So there's something called a system prompt. And what we do when we're asking for these pages to be generated is we first create a system prompt, which is the same across all 1,000 pages. And in there, we are giving a role to the AI, and we say to the AI that it's a lead web developer at a digital agency and that its job is to create a one-page website. And I wanted something that would work as a full page, so that you could get a sense of what it looks like when they generate a page. And so I had it create placeholder images and use different fonts, essentially different ways to lay out the page that would look good, so that we could compare. And then there's something called a user prompt, which is a prompt for that specific category. So in this case, we asked it to create a page in the sports category, and we randomized some of these things, like what description to use and what links to use. These two prompts together get sent to all of these LLMs, and then they return the HTML code behind a webpage. And so we surface all of this, so that you can validate: did we make any mistakes in our approach? How does this all look? And we'd love to have any feedback; if you catch anything, we'll be happy to fix that. But after the page is generated, then we actually show you that page, and I'll come back to it in a second. But we send that to Axe-core, which we mentioned earlier, and it comes back and says, okay, here are the errors, the Web Content Accessibility Guidelines violations. And so you can read everything behind how this scoring was created. 
And then you can look at the actual webpage, and this is what it looks like. As you can see, you have sort of a logo thingy here, you have an image background, and all of these things are possible because we created that system prompt that gave the LLM a way to render placeholder images in the background. And I think you can see very clearly how nice the pages are, or aren't. And this is another unique aspect of the benchmark. Did you wanna say something?
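The system-prompt/user-prompt flow Joe describes maps onto OpenRouter's OpenAI-compatible chat completions API (mentioned earlier in the episode). A minimal sketch of building such a request follows; the prompt wording and model slug are illustrative, not AIMAC's actual prompts.

```python
# Sketch: how a benchmark can request the same page from many models
# through OpenRouter's OpenAI-compatible chat completions API.
# The prompt text below is paraphrased for illustration only.
import json

def build_request(model: str, category: str) -> dict:
    system_prompt = (  # identical for every page; assigns the AI a role
        "You are a lead web developer at a digital agency. "
        "Create a complete one-page website as a single HTML file, "
        "using placeholder images and varied fonts and layouts."
    )
    user_prompt = f"Create a page in the {category} category."  # per-category
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

payload = build_request("anthropic/claude-sonnet-4.5", "sports")
# POST this as JSON to https://openrouter.ai/api/v1/chat/completions
# with an "Authorization: Bearer <OPENROUTER_API_KEY>" header; the model's
# reply contains the generated HTML, which is then handed to Axe-core.
print(json.dumps(payload)[:60])
```

The same two-message structure is simply resent with a different `model` slug for each of the LLMs on the leaderboard.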

 

- No, that's awesome, I love that breakdown. Yeah, I do. One of the things that our listeners may be thinking about is, if they either want to use an LLM that's not on our list, or have their own LLM, what would their best solution be?

 

- So, I think they're gonna struggle to find an LLM that's not in here, because all the major ones are really covered. But this is open sourced, so if you click on the code link on that page, you can see where you can actually download this code and run it yourself locally. You could use AI to help you do this if you're not a developer, but this is typically, you know, something that developers will have an easier time with. But we do have all of that open source, so people can run the code themselves.

 

- But what was your biggest surprise as you went through this whole process, adding more and more LLMs? What has been something that, I wouldn't say shocked you, but surprised you over the past several months?

 

- Yeah, I would say the results. So if you look here, you've got OpenAI, who did really well. They did partner with Be My Eyes, who we interviewed; our first interview, I think, was with Mike Buckley from Be My Eyes, and they were launch partners with OpenAI. So they clearly are speaking to our community, to the disability community, and it shows. I was also surprised that, you know, I would say there are three, you can argue four, major model companies that have a lot of money and a lot of compute, and that's OpenAI, Anthropic, Google, and then the fourth would be X, which is Elon Musk's company. And if you scan the top here, you can see that OpenAI dominates. And then you have these open source or open weight models from China, mostly MiniMax, Qwen, Kimi K2, KAT Coder. They dominate the top 10. And then Anthropic, their best model is Haiku, which is not even a great coding model, and that comes in at 11. And then you look at Mistral, a French model, and they did not do as well. And at the very bottom of this list is the number one flagship model from Google, called Gemini 3 Pro. It came in at 37 of 37. That one kind of surprised me. And also Anthropic, I would say, surprised me in that they were kind of middle of the pack, because they do have quite expensive models. They openly talk about how they are the ethical AI company, and yet they released a designer extension slash plugin that was really rather inaccessible. And they don't seem to make any attempt to make things accessible. And you know, I'm not one of those that likes to focus on the negative and name and shame, but the results are the results, and that's what we're here for. So that's the unfortunate fact. They're good people. I have connected with many of their folks, and they're very nice. But I would love to see a little bit more emphasis on accessibility. 
Eamon, can I show one other thing that I think would be-

 

- Yes, absolutely.

 

- would be helpful? So, when you click on one of the models, if you go to the gallery page, you can see that there are screenshots of the different pages. So you can sort of see, okay, let's say I am in the government space, what does a generated page look like in the government space? And so you can click on it, and then you can view the live page, and see what it looks like for that particular model. And this one is Claude Sonnet 4.5. So like I said, it's not the best coding model, so the result will probably be so-so. But what I wanted to get at, more than that, was, if you click on government itself, you can see how every model looks in terms of generating a page in that government category. So across 28 categories, you can have this gallery overview and then see, okay, well, what if I wanna go with MiniMax, what does the page look like? And you can just click on it and view the kind of page that they generate. And even if you weren't in accessibility, I think it would be rather useful to be able to see what the different models look like visually.

 

- Preview, yeah, love that. I do, I love that. For those listeners that have any maybe feature suggestions or feature asks, how should they go about submitting those?

 

- Good question. So, I do have my email listed over there. They can reach out to Joe@gaad.foundation, and send a note, or hit us up on LinkedIn, I think we'll be good. I'm at Joe Devon on any social media that I'm involved with.

 

- And going back to where we started about the impact, and I think you mentioned this on the New York Stock Exchange interview. The volume of new code that's being generated, you know, now versus a year ago, it's just, it's like you say exponential. That's maybe an underestimate for it, you know?

 

- It absolutely is exponential. I don't even know how to begin to describe it to you, because it's growing so quickly that, you know, we could probably spend half an hour, an hour, discussing what happened this weekend. There is an independent developer who created a PDF tool that he sold to a large company, made a lot of money, and he just decided to create his own agent and open source it. Everybody's trying to create these agentic models where you could just give it tasks. He used Claude Code as well as Codex; Codex did the writing. He did something similar to the process I described with Repo Prompt. And he generated the best agentic harness in existence. And you can talk to it through a variety of channels; like, you could iMessage your own personal agent. You can control your computer while you're on vacation. It will suggest things to you; if you give it access to your email account, it will send you notices about what's going on in your email. You could have it send out social media posts for you. It is just simply incredible. And he did it for free, because, you know, he already had his exit, so he doesn't need to make money off of it. And he's been running, I think, five or 10 separate $200 accounts with both Claude Code and Codex. Their maximum is $200, and he runs through them in like two days. So, I've run out of Claude Code a couple of times, and I manage my time. He just maxes it out, spends a few thousand dollars, and 24/7 he has agents that are writing code all the time. And this one solo person, who doesn't need to work for a living and is just doing it for fun, has blown up so much that this weekend everybody bought a Mac mini to run it on.

 

- Oh, wow.

 

- Some people are joking that this was funded by Apple, 'cause they wanted to sell their Mac minis, but it's in fact just a solo developer. So that's just to give you a sense. I mean, there are about three, four, five people that are maxing out their subscriptions and running the agents 24/7. So needless to say, these folks are ahead of the curve in terms of enterprise. It's gonna take enterprise a little time, but every major programmer, every well-known programmer, has been coming out saying that they don't really code much anymore, they just guide the models.

 

- Yeah, they just prompt.

 

- And because the model, yeah. And because the models code so much faster, by the end of 2026, it's a whole new ball game.

 

- Yeah, again, going back to the rate, but going back to the productivity that one person can have, creating their own agents and agents for those agents, it's a whole different ballgame, it is. Anything else you wanted to cover before we wrap up?

 

- I wanted to share the Pareto frontier model. This is, again, a little bit technical, but I think it's important for people to understand. It's very common with benchmarks that you have this quality layer, which is the score that we're talking about, and then the cost. And there's something called a Pareto frontier chart that visually shows you, how should I say this, the best combination of value. So we have a chart in front of us that shows that GPT 5.2 Pro is the best model. It has the lowest AIMAC score, but it is all the way on the right, because the X-axis is cost, so it's way off on the right. And so you don't want to use that as your day-to-day model. And then, all the way on the left, close to zero, you have a whole bunch of models that are doing really well. And then visually you can see that, way up on top, Gemini 3 Pro is not very good at accessibility. So it's an outlier on top, and it isn't even the cheapest; it costs a little bit of money. What you basically want in this chart is the bottom left: you'll see GPT 5.2 Codex probably has the best combination of lowest cost and best AIMAC score. And so you'll see these Pareto frontier charts a lot, and this was really, really hard to make accessible. It took me weeks to get this thing to look okay, to show you enough data that you could actually read it. Oh, I want to add one other thing. Claude Opus 4.5, that is the flagship model for Anthropic. And you can see that it's both not that great and middle of the road when it comes to cost. So it's a little bit pricey, and it's pretty high up there in terms of the AIMAC score. So it's not real good at accessibility and it's not real cheap. So this gives you a visual view, and I think that as we keep adding models, this will be a really core chart for people to look at if they want a visual look at this data.
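The "bottom left is best" idea Joe describes can be computed directly: a model is on the Pareto frontier when no other model is at least as good on both cost and score, and strictly better on at least one. A small sketch, with invented model names and numbers, where lower is better on both axes:

```python
# Minimal Pareto frontier over (cost, score) points; lower is better on both.
# Model names and values below are illustrative, not AIMAC's real data.

def pareto_frontier(points):
    """Return names of points not dominated by any other point.

    A point is dominated if some other point is <= on both axes
    and strictly < on at least one.
    """
    frontier = []
    for name, cost, score in points:
        dominated = any(
            (c <= cost and s <= score) and (c < cost or s < score)
            for n, c, s in points if n != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

models = [
    ("cheap-and-good", 0.40, 2.9),   # bottom-left: best value
    ("flagship", 5.00, 2.7),         # best score, but highest cost
    ("pricey-mediocre", 3.00, 8.0),  # dominated: worse on both axes
]
print(pareto_frontier(models))  # ['cheap-and-good', 'flagship']
```

Both the cheap, strong model and the expensive top scorer survive, because each is best at something; the middling model is dominated and drops off the frontier, which is exactly what the chart visualizes.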

 

- Absolutely, it's a great consolidated way to show where you can get your biggest bang for the buck, ultimately, with the quality. Love it, love it. So, in order to understand how AIMAC works for screen reader users, I'm delighted to welcome our very own Jennison Asuncion. For those of you that do not know Jennison, Jennison is Joe's partner in crime and co-founder of GAAD, Global Accessibility Awareness Day. I've had the pleasure to meet Jennison a couple of times in person, and thoroughly enjoy his company. Welcome, Jennison.

 

- Thanks Eamon. I'm going to be demonstrating to you a couple of pages using the JAWS screen reading software, and what that essentially does is it reads back to me what's on the screen at any given moment as I'm traversing it, either using my arrow keys or the tab key.

 

- Just to be clear though, what we're trying to do here is to show what the HTML that the different AI models create, what it sounds like with a screen reader. And we're gonna show a couple of pages that have a few issues on it. So Jennison, would you like to start with a Gemini 3 Flash preview in the real estate category?

 

- Perfect. What I want to do for viewers is, for you to first hear what the screen reader sounds like, in using my, the speed that I typically listen to with the screen reader, so let's do that. I'm gonna refresh the page.

 

- So that's the speed that I listen to JAWS at. I'm gonna slow it down to what I lovingly call sighted people's speed.

 

- [Automated Voice] Slower, slow, slow, slow, slow, slow, slower, slow, slow, slow, slow, slower.

 

- And we'll try that again.

 

- [Automated Voice] Vanguard Prime Realty vertical Bar, elevating your home journey.

 

- So I'm gonna hit the down arrow,

 

- [Automated Voice] Vanguard logo graphic, Vanguard Prime, elevating your home journey. Navigation region. Same page link, market. Same page link, listings. Same page link, neighborhoods.

 

- So I'm gonna stop there, but and this is the speed we'll keep it at, as I demonstrate some of the accessibility issues on this page. So what I'm gonna do now, is I'm going to go to a form field on the page.

 

- [Automated Voice] Location slash zip edit, blank.

 

- Notice here it said location slash, zip edit. So, it's telling me that's the name of this field, it wants a location or zip code. And it said "edit" to tell me that, this field is ready to receive text from me. So I'm gonna hit the tab key, and keep traveling through this page,

 

- [Automated Voice] Combo box, residential, to change the selection, use the arrow keys.

 

- Now it said "Combo box, residential". So, with no context here, I'm not sure exactly what I'm selecting. Is this the type of neighborhood, is it the type of property? I'm not sure what this means. All I know is I'm at something called a combo box, which I know to be a place where I can use my arrow keys to select something, so I'll hit my down arrow key.

 

- Commercial.

 

- Commercial.

 

- Land.

 

- Land.

 

- Commercial.

 

- Commercial.

 

- Residential.

 

- Residential. So I know that those are my options, but I don't know exactly what I'm supposed to be selecting here. So what would be helpful is, if it would've said, type of property combo box. But as you heard-

 

- [Automated Voice] Load combo box, residential to change-

 

- All it said was, "Combo box, residential." So I'm gonna tab again.

 

- [Automated Voice] Find agent button. To activate, press Enter.

 

- Find agent button, so that sounds fine. I know what that is supposed to do.

 

- [Automated Voice] Browse all 142 listings, right towards arrow link, to activate, press Enter.

 

- Okay, so that's a link.

 

- [Automated Voice] Get professional appraisal help, link. To activate, press Enter. Edit spin, box 500000 to set the value used in the arrow keys or type the value.

 

- Okay, so notice here, it just read out a number with no context. I'm not even sure what this number represents. What if I change the number? What am I changing it to? Are these dollars? Is this the property value? I'm not sure. So this form field is missing some sort of label to tell me what exactly this value represents, and if I change it, what I'm changing exactly. So I'm gonna tab again.

 

- [Automated Voice] Edit spin box 6.5, to set the value, use the arrow keys or type the value.

 

- Okay, so again, it said, "Edit, 6.5." And I think it also said it was a spin box, but I don't even know what that 6.5 represents, because again, this is missing a label. So it should be saying something to me as to what this 6.5 represents. There's a concept in accessibility called name, role, state, and value. And in both of these situations, this one and the previous one, the name was missing. So I get the role; I know that it's a text edit, no, this one's a spin box. So I can hit the down arrow, for example.

 

- [Automated Voice] 6.4, 6.3.

 

- I can change the value of it, but I'm not sure what I'm changing the value of, because there's no name for this particular control, or form field. So I'm gonna tab again.

 

- [Automated Voice] Combo box, 30 years, to change the selection, use the arrow keys.

 

- So combo box 30 years. Again, I'm not sure like is that 30 years for what? For me to pay for it? I'm just not sure. Now I know it's a combo box, so I know I can arrow up and down, and so I'm assuming that when I arrow down it's gonna change the years.

 

- 15 years.

 

- Yeah, 15.

 

- 10 years.

 

- 10.

 

- 15 years, 30 years.

 

- 15, 30.

 

- But again, I'm not sure what this is representing, because there is no name for this element. The role is clear, and the values are clear, but there's no name.
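The repeated "no name" failures in this demo come down to form controls in the generated HTML that have no `<label>`, `aria-label`, or `aria-labelledby`. As a rough illustration of how such controls can be detected (this is not AIMAC's or Axe-core's actual implementation, and the sample markup is invented), a stdlib-only audit might look like:

```python
# Sketch: flag form controls with no accessible name, using only the
# standard library. Real checkers like axe-core cover many more naming
# mechanisms (title, wrapping labels, etc.); this is deliberately minimal.
from html.parser import HTMLParser

class UnnamedControlFinder(HTMLParser):
    CONTROLS = {"select", "input", "textarea"}

    def __init__(self):
        super().__init__()
        self.labeled_ids = set()   # ids referenced by <label for="...">
        self.controls = []         # attribute dicts of controls seen

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label" and "for" in a:
            self.labeled_ids.add(a["for"])
        elif tag in self.CONTROLS:
            self.controls.append(a)

    def unnamed(self):
        """Controls with no aria-label, aria-labelledby, or matching label."""
        return [a for a in self.controls
                if not (a.get("aria-label") or a.get("aria-labelledby")
                        or a.get("id") in self.labeled_ids)]

markup = """
<label for="loc">Location/Zip</label><input id="loc">
<select><option>Residential</option><option>Commercial</option></select>
"""
finder = UnnamedControlFinder()
finder.feed(markup)
print(len(finder.unnamed()))  # 1: the bare <select> has no accessible name
```

The labeled input announces as "Location/Zip, edit," exactly as Jennison heard; the bare select is the "Combo box, residential" case, where the role and value come through but the name is missing.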

 

- [Automated Voice] Recalculate payment button, to activate, press Enter.

 

- [Jennison] So yeah, that's a button, that's fine.

 

- [Automated Voice] View portfolio link, to activate, press Enter. Contact Marcus link, list with three items, how to handle home inspections, rightwards arrow link. To escrow milestones 101, rightwards arrow link. To activate, press enter. The closing cost checklist, rightwards arrow link. To activate, press enter. "What is quote 'Earnest Money' quote, and why is it required?" Button expanded, to activate, press Enter.

 

- That's interesting. So notice here it said that the button was expanded, which is helpful because that tells me that there must be some items to select. If I hit the space bar though, I would expect it to say that it's collapsed. So let's see if that actually works.

 

- [Automated Voice] Collapsed.

 

- [Jennison] Nice.

 

- [Automated Voice] Expanded.

 

- So that wasn't necessarily an error, but I just wanted to demonstrate that, you know, the screen reader, if things are coded correctly, the screen reader will provide that level of detail information, and that was provided through ARIA.

 

- [Automated Voice] "How does a mortgage contingency protect me?" Button collapsed. "What happens during the appraisal phase?" Button collapsed. To activate, press enter.

 

- I'm getting the feeling that this is a frequently asked questions section just based on what I'm hearing here.

 

- So that was really interesting. I think what a lot of people would love to understand is, when you first come to a page like this, how do you attack it?

 

- So the way I explain it to people, it's like, so I'm completely blind. It's almost like when I go into a room for the first time, I need to know, you know, I need to know what the room looks like, if you will. So I need to know, where are the tables, where are the chairs, is there carpeting? Things like that. So the equivalent for a webpage would be like, are there headings, are there links? Where are the edit fields? So I literally, what I will do is, I will typically use my arrow keys, my up and down arrow keys, to basically traverse the entire page, to get a feel for what's there.

 

- Okay, so you don't first like, pop open a popup with headings itself, and then-

 

- Because the headings on their own, would be taken out of context. Like I could certainly, I could do this.

 

- [Automated Voice] Heading list dialogue.

 

- Right. Sure, I could do this.

 

- But all this would tell me is, yep, there are headings. Like that's all that would mean to me. Unless I'm going-

 

- [Automated Voice] Link at heading level three.

 

- Within the actual, within the context of the page-

 

- [Automated Voice] Loan amount, 500000, edit spin box.

 

- Oh, that's what that is. When I hear the heading in context-

 

- [Automated Voice] Bold heading, level three finance estimator.

 

- This at least, like this is more meaningful to me navigating through the page than just looking simply at a list of headings, out of context.

 

- Okay anything else you want to share with us about-

 

- Not on this page. Maybe we can go to the next page to look at as another type of error.

 

- Yeah, so you want to see Gemini 3 Pro preview a sports page?

 

- Yep, so let me pull that up. Alright, so here we are at a table,

 

- [Automated Voice] Beginning of row. Team, column two of six, pos, column one of six.

 

- So the first thing is the position

 

- Team column two of six.

 

- Team.

 

- [Automated Voice] W, column three of six.

 

- [Jennison] W, whatever that stands for.

 

- L, column four of six

 

- L is another column heading.

 

- [Automated Voice] PCT column five of six.

 

- PCT.

 

- STRK, column six of six.

 

- So I'm gonna start, I'm gonna go to the first row of the table

 

- [Automated Voice] One, row two of six.

 

- [Jennison] So that's the position.

 

- [Automated Voice] Team Boston, column two of six.

 

- [Jennison] Team is Boston.

 

- So the screen reader did not say the letter B, which I know from looking at the code is actually an image. It just shows sort of a B for Boston, and there was no alternative text, right?

 

- All I heard was Boston. So Joe, are you saying that there is an icon?

 

- Yeah.

 

- On the screen visually?

 

- [Joe] There's an icon before Boston and all of the rows have an icon before the city name.

 

- Okay, so yes, because the alternative, the alt text, was it just missing or was it alt equals quote, quote?

 

- It was missing altogether.

 

- Okay. So I wouldn't have even known, unless you told me I would, I would just assume that, all that was in this particular column, or this location is just the word Boston.

 

- Okay, so the image is completely invisible to you.

 

- Correct.

 

- Okay.

 

- If I went down to the next one, I'm curious-

 

- Miami-

 

- So for Miami, what is the icon next to Miami, M?

 

- There's a red M.

 

- A red M.

 

- So that would be useful information, like I would wanna know, that the icon for Miami is a red M.

 

- Right.

 

- That's information I would like to know. But it's not being provided to me because the alternative text description was not included in the code.

 

- Right, okay.

 

- And that's all I have to demo.
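The distinction Jennison probed, alt missing entirely versus `alt=""`, matters: an empty alt marks the image as decorative, so screen readers skip it silently and checkers accept it, while a missing alt attribute is a violation that can leave screen readers announcing nothing or falling back to the filename. A stdlib-only sketch of that three-way split (the sample markup and filenames are invented):

```python
# Sketch: classify <img> tags by their alt attribute. Missing alt is a
# violation; alt="" signals an intentionally decorative image; non-empty
# alt provides a text alternative. Sample markup is illustrative only.
from html.parser import HTMLParser

class AltAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing, self.decorative, self.described = [], [], []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        if "alt" not in a:
            self.missing.append(a.get("src"))     # violation: no alt at all
        elif a["alt"] == "":
            self.decorative.append(a.get("src"))  # skipped by screen readers
        else:
            self.described.append(a.get("src"))   # has a text alternative

audit = AltAudit()
audit.feed('<img src="boston.svg">'
           '<img src="divider.png" alt="">'
           '<img src="miami.svg" alt="Red M logo">')
print(audit.missing)  # ['boston.svg']
```

With `alt="Red M logo"`, JAWS would have told Jennison about the red M icon he asked about; with no alt at all, the Boston icon was completely invisible to him.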

 

- Well great, thank you so much Jennison for showing us what the experience is like as a screen reader user. Really fascinating stuff.

 

- No, you're welcome. And I guess for me, the one thing this demonstrated was that, you know, while this was generated using one of the vibe coding tools, clearly it didn't hit the mark completely when it comes to accessibility, right? You saw on the other page it was missing the label names, and here, the alternative text descriptions were missing as well. So, there's still some work to be done.

 

- For sure, for sure. And hopefully we're gonna have some AI model companies listening to this, and saying, all right, we're gonna fix this, so-

 

- Absolutely.

 

- Let's see what happens.

 

- Great, thanks Joe. Thanks Eamon.

 

- All right, now let's check on that new collection that we had done. So you might remember that we had a whole bunch of little errors where the provider returned an error. So we retried those; for each category, there are 28 different requests that go out to Kimi K2.5. And then we had various retries, as you can see. But in the end, it all came through, and the stats are: we had a 42% retry rate, 12 requests had retries, but in the end it all came through, which is great. And then we had a collection happen. All of this is just technical stuff that's not really important or relevant, and at the end of it we had a success. So I think we can go from here, and let's take a look at the webpage. Here is the leaderboard for the staging site. And just as a reminder, the leaderboard previously showed Kimi K2 at number nine. Now Kimi K2 lost that number nine spot, Anthropic is back in the top 10, and we can see that it came in at 19. But before we push this to production, we have to make sure that it passed the check, because if the size of the HTML is too small, we find that it's not a fair comparison. So, by clicking through here, you can see on the table we list the HTML size, and the highest is 58K, the lowest is 37K, and our cutoff is 10K. If it's under 10K, we will not show it. And then if you look at the actual results, let's take a look at shopping, because that one is commonly difficult to do well. And if we click through, we can see that the biggest problem was just color contrast. There were 30 different instances of bad color contrast. And if you click on the actual page, this is what that model looks like. And just scrolling through here, if you can only hear me, what I would say is that it actually looks pretty attractive. It's a good looking page, and as long as you deal with the accessibility errors, it definitely looks like a really good contender. 
But on the footer you can see that there's a whole bunch of color contrast issues, and that's why this page got dinged a little bit. And then another element I'll add here is that we have a lot of analysis of this page; some of it is automated, some of it isn't. You can see the Pareto frontier chart. Let's see, it might not have been significant enough to make it in here. Yeah, I don't think that we've displayed it, because if we displayed all the models here, this chart would be too crowded. So we only list the major ones. So some of this we may have to adjust the writing on, but some of this is actually dynamic. So over here you can see Kimi K2.5 is one of the models tested; that's dynamic. But the update history, that has to be done manually. And that's it for the update. And a couple more things to add here. We initially showed you the WebAIM page, which inspired some of this work. And when you look at what trips the models up, we re-shared the top issues that the WebAIM Million had, where it was low contrast text, missing alt text, missing form labels, et cetera, down to missing document language. And although there are similarities with AIMAC, there are a few differences too. The big dominant one is low contrast text; it was an 83% share of violations. And note that there are some minute differences in terms of the comparison, because the WebAIM Million is looking at the share of homepages that have one or more errors in these categories, and on our side, on AIMAC, we're looking at the share of violations, what percentage of violations fall in each category. So the methodology is slightly different. But low contrast text is 83% for us versus 79% for the WebAIM Million. Missing alt text does not even make our list, but empty links on our end came in second most frequent, at 10%, compared to 45% for the WebAIM Million. Then we have missing select labels, which is more prominent for us. 
Missing document language didn't even make it on our end. And then the rest of them are pretty small: target size too small, missing form labels, ARIA structure labels, all of those are pretty much, you know, 1%, 1.2%. So the biggest things the AI models have to fix are absolutely low contrast text and empty links, and then they'll be in much better shape. So I thought that was kind of interesting. And once they solve these, then we can start to work on future benchmarks. And then as an aside, I've never seen anybody else do this benchmark, but a lot of people talk about the fact that AI models generate em dashes. What an em dash is: you have a regular dash, then you have what's called an en dash, E-N, and the reason they came up with this name is that the width of the letter N is the width of an en dash. And the em dash is even wider; it's the width of an M. And very often you'll see, when somebody does a quotation, they'll do an em dash and then the name of the person. And what people have noticed is that a lot of AI models insert em dashes very frequently in text. And that is sort of a tell that you're dealing with AI-written text. And if you want to say that this text has been written by yourself, that can be a little bit embarrassing. So, the reason we tested it was, we didn't know, going in, how screen readers handle it. So we decided to test these pages out, and we did run it against friends of ours who are blind, who are screen reader users. And it turns out it doesn't make much difference, but it's still interesting data. And what's kind of interesting is that the models that did not do as well on accessibility tended to do better on em dashes. So the models that did really well on accessibility, like GPT 5.2 Pro, which was number one on accessibility, had 764 em dashes. GPT 5.2 had 696, GPT 5 Mini 395, and then MiniMax, which also did very well. It's almost like the opposite of the accessibility list. 
So I won't read it in any more detail, but you can look at it at AIMAC.ai, and you can see for yourself which models are good. If you're trying to generate some writing, and you want to make sure that there's no em dashes in it, this might be of use to you. And it's, I found it kind of interesting.

 

- Joe, listen, I have to say, this journey, right from the start, from planting the seed of this idea to where we're at now, it's been truly, thoroughly enjoyable partnering with you and working with you. Our overall goal here is just to raise the level of conformance and ensure that when developers develop, they produce the most conformant code possible, so everyone can benefit from the features, functionality, and anything they're doing on the website. So, appreciate your partnership, appreciate your ongoing commitment, and we look forward to taking this to the next level.

 

- Likewise Eamon. It's really been an honor and a pleasure working with you. This would not be possible without you. I keep telling everybody that.

 

- Well, kudos to ServiceNow for supporting this, because they've been behind this from the start. So, a huge shout out to ServiceNow. Okay, awesome, thank you Joe.

 

- Thank you Eamon.