Benchmarking accessibility using Fable

When it comes to benchmarking and reporting on your accessibility efforts, you have options! In this article, I’ll share several approaches you can take to benchmark your accessibility efforts, all while leveraging Fable Crowdtesting. You can pick one approach or combine several of them for a more comprehensive view of accessibility at your organization. Let’s walk through what types of Fable requests to use and how to use what you learn in each approach. 


Kate Kalcevich, Head of Services

Benchmarking approaches

When it comes to reporting on accessibility progress over time, you may want to adopt one or more of the following approaches:

  1. Set a baseline and benchmark over time: Measure your digital product’s current accessibility, then re-evaluate every 3-6 months for smaller organizations or every 6-12 months for larger ones to track progress over time.
  2. Measure usability: Compare the usability of your digital products for people with disabilities against usability metrics for all users.
  3. Accessibility for various devices: The accessibility of your products across different devices – large screen devices like desktops or laptops versus smartphones.
  4. Accessibility for various platforms: The accessibility of your content on various platforms – website, iOS, Android, blogs/social, etc.
  5. Assistive technology: The accessibility of your digital products for different types of assistive technology – screen readers, screen magnifiers, alternative navigation, etc.

1. Benchmarking accessibility over time

For approach #1, Fable’s Compatibility Tests are a good way to measure the accessibility of a digital product. This is a simple benchmark to establish: determine the critical tasks for all your products, run Compatibility Tests for each one, and retest the exact same tasks on a regular basis to identify changes in accessibility, ease of use, and task completion.

This approach can show you how accessibility is improving over time – demonstrating if your internal accessibility processes are effective. If you don’t see improvements, it’s a sign that you need to either re-prioritize or pivot your approach to accessibility.

Line graph showing accessibility of three products over time. Products A and B improve and Product C decreases after a Q3 release.

2. Benchmarking usability for people with disabilities versus usability for all users

Approach #2 requires that you already have or can set up a general usability metric. If you use the System Usability Scale (SUS) to measure usability, data is usually collected via a survey that pops up at random on your website.

You can use Fable’s new Accessible Usability Scale (AUS) to evaluate key tasks and compare the results to your SUS scores. Of course, your SUS scores will likely include some users with assistive technology, but it’s a qualitative metric, so statistical accuracy is not the goal. To learn more about this methodology, read about how we built the AUS.

Another usability metric to look at, which is more quantitative, is task completion. This data is also typically collected via a survey that pops up after a key task is completed. Fable’s Compatibility Tests include task completion data for 5 users. You can count the number of successful task completions, average those numbers, and compare them to overall task completion scores for all users.
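
As a quick illustration, here’s how that comparison might look in a short script. All of the numbers are hypothetical, and the overall completion rate is assumed to come from your own survey data:

```python
# Hypothetical numbers only: task completion for the 5 assistive technology
# users in a Compatibility Test versus the overall rate from all users.
at_completions = [1, 1, 0, 1, 1]       # 1 = task completed, 0 = not completed
at_rate = sum(at_completions) / len(at_completions)
all_users_rate = 0.92                  # e.g. from your post-task survey

print(f"Assistive technology users: {at_rate:.0%}")    # 80%
print(f"All users: {all_users_rate:.0%}")              # 92%
print(f"Gap: {all_users_rate - at_rate:+.0%}")         # +12%
```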

This is a good approach for evaluating if you are truly being inclusive as an organization. If you have significantly worse usability for users of assistive technology compared to all users, that’s a sign that you need to make sure that accessibility is prioritized equally to general user experience within your organization.

Bar graph showing usability for all users remaining fairly consistent with a small dip in Q3 and usability for assistive technology users increasing from Q1 to Q4.

3. Benchmarking across devices

The third approach can also be done using Fable’s Compatibility Tests. Identify which tasks you want to benchmark, run Compatibility Tests for laptop / desktop users, and then run the same tasks for mobile users. Compare the differences in accessibility, ease of use, and task completion across devices.

This approach can help you identify if you have a significant difference in accessibility for different devices and can be compared to your site analytics for visitor devices. For example, if 80% of visitors to your website are using a mobile device and mobile testing shows poor accessibility, you’d want to prioritize improving the mobile experience over the laptop and desktop experience.

Bar graph showing 80% accessibility for laptop and desktop users and 60% accessibility for mobile users.
Bar graph showing 80% of site visitors use mobile and 20% use desktop or laptop.

4. Benchmarking across platforms

For the fourth approach, identify which platforms you want to benchmark (for example, website, iOS app, Android app, blogs/social, etc.) and choose tasks that are the same or very similar on each platform – for example, looking up contact information. Use the Compatibility Test request type for Laptop / Desktop, along with Mobile custom audiences for iOS and Android, to segment your benchmarking.

This data can help you to prioritize which platform to tackle first. For example, if your website is pretty close to accessible, it might require minimal effort to finish the job and make it fully accessible, which makes the best use of your internal resources. Conversely, if your iOS app is very inaccessible, you may want to start by tackling the biggest barriers there, because that will have the biggest benefit for your users.

Bar graph showing 96% accessibility for website and 60% accessibility for native app.

5. Benchmarking assistive technology

For the last approach (#5) you can again use Compatibility Tests – but this time, separate out the data for each tester and compare the screen readers (average the 3 users) against the alternative navigation (1 user) and magnification (1 user) scores. You can look at any dimension – task completion, accessibility, ease of use, or even recommendation. You can also create requests that consist of all screen reader users and all magnification users.
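
If you script this step, a small sketch like the one below can do the grouping and averaging. The tester records and field names here are made up for illustration – they aren’t Fable’s actual export format:

```python
# Hypothetical per-tester results; group by assistive technology and average
# the task-completion scores for each group.
from collections import defaultdict
from statistics import mean

results = [
    {"tester": "T1", "assistive_tech": "screen reader",          "task_completed": 1},
    {"tester": "T2", "assistive_tech": "screen reader",          "task_completed": 0},
    {"tester": "T3", "assistive_tech": "screen reader",          "task_completed": 1},
    {"tester": "T4", "assistive_tech": "alternative navigation", "task_completed": 0},
    {"tester": "T5", "assistive_tech": "screen magnification",   "task_completed": 1},
]

by_tech = defaultdict(list)
for row in results:
    by_tech[row["assistive_tech"]].append(row["task_completed"])

for tech, completions in by_tech.items():
    print(f"{tech}: {mean(completions):.0%} task completion ({len(completions)} tester(s))")
```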

This data can be helpful in prioritizing accessibility fixes by assistive technology. For example, you can fix the entire journey for a magnification user first, then tackle all screen reader issues, and finally alternative navigation. That allows you to create a fully accessible journey for one type of assistive technology, instead of tackling accessibility issues by priority, effort, or another approach. At the end of the day, a partially accessible experience isn’t really an accessible experience at all. It’s important to aim for complete experiences.

Bar graph showing task completion for screen magnifiers is high and task completion for screen readers and alternative navigation is low.

Data evaluation

To evaluate all the data that you get from the various reports, use the Export to Excel feature of the Fable platform. Combine the data from various tests into one master spreadsheet by importing each test as a new sheet. You can then use Excel functions to evaluate the data.
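
If you’d rather script that step than assemble the workbook by hand, a minimal pandas sketch could look like this. The folder name, file layout, and column handling are assumptions, not part of the Fable export:

```python
# Minimal sketch: build a master workbook with one sheet per exported test.
# "fable-exports" and the file names are placeholders for your own exports.
from pathlib import Path

import pandas as pd  # pandas + openpyxl handle reading and writing .xlsx files

with pd.ExcelWriter("master-benchmark.xlsx") as writer:
    for path in sorted(Path("fable-exports").glob("*.xlsx")):
        df = pd.read_excel(path)                      # one exported test per file
        # Excel limits sheet names to 31 characters, so truncate the file name.
        df.to_excel(writer, sheet_name=path.stem[:31], index=False)
```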

If you’re benchmarking with completion rates, it’s pretty simple to calculate the number of users who completed a task as a percentage. One user out of 5 is 20%, two is 40%, three 60%, four 80%, and all five is 100%. If no users complete the task, that’s 0%.

When working with the accessibility and ease of use scores, you may want to use a different approach. Fable calculates the percentage of users who scored accessibility as “great” or “satisfactory” and ease of use as “very easy” or “easy”. These are your positive user experiences. However, you can’t just average them if you want to be accurate.

Here’s an example. Five friends go to a movie; three loved it and two hated it. This graph shows what that would look like as a percentage.

Bar graph showing 60% of the friends loved the movie and 40% of the friends hated it.

We can’t just average that to 50% ((60 + 40) / 2). That doesn’t accurately reflect the friends’ real experience. None of them thought the movie was “okay.”

You might want to use something called the top-two box model instead. If you’re looking at user satisfaction, you take the top two scores and add them up. On a scale of 1 to 5, how many users rated something as 4 or 5? Or, in Fable’s terms, how many users rated the accessibility as great or satisfactory?

Back to the movie example, this would be 60% using the top-two box model. If everyone loved it, it would be 100%. It’s also 100% if 60% loved it and 40% liked it.

Bar graph showing 60% of the friends loved the movie and 40% liked it.
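
In code, the top-two box score is just a count of the top two ratings divided by the number of respondents. This sketch uses a generic 1-to-5 scale as a stand-in for Fable’s labels, with ratings mirroring the movie example:

```python
# Top-two box: count ratings of 4 or 5 on a 1-5 scale and divide by the
# number of respondents. Ratings mirror the movie example (three 5s, two 1s).
ratings = [5, 5, 5, 1, 1]
top_two_box = sum(1 for r in ratings if r >= 4) / len(ratings)
print(f"Top-two box score: {top_two_box:.0%}")   # -> 60%
```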

Summary

Here’s a high-level overview of the different benchmarking approaches, which Fable request types to use, and how to use the data.

| Benchmarking approach | Fable request type | What to test | Technology considerations | How to use the data |
| --- | --- | --- | --- | --- |
| Over time | Compatibility | Test all tasks for all products | Mixed users | Confirm if approach to accessibility is working |
| Across user groups | Compatibility or AUS | Test key tasks for key products | Mixed users | Confirm if accessibility has equal priority with UX |
| Across devices | Compatibility | Test the same tasks on the same platform | Separate tests for Laptop / Desktop and Mobile users (responsive website) | Prioritize which devices to improve accessibility on first |
| Across platforms | Compatibility | Test the same tasks on different platforms | Separate tests for Laptop / Desktop and Mobile users (native apps) | Prioritize which platforms to improve accessibility on first |
| Across assistive technology | Compatibility | Test key tasks for key products | Separate data for users within each test | Prioritize which assistive technology to support first |