The accessibility tree is the cheapest reliable automation surface nobody has told your SMB about.

Every article that ranks for this topic is about web a11y compliance: axe, WAVE, BrowserStack, alt text, color contrast. Useful work, different problem. The same tree (the named, role-tagged view of every interactive element in an app or page) is also the surface a deterministic automation script binds to when the target does not have an API. This is what I install for SMBs before any AI shows up: one extractor per surface (web, macOS, Windows), every action bound to a role plus accessible name, no pixels.

Matthew Diakonov
10 min read

Direct answer, verified 2026-05-04 against playwright.dev aria-snapshots, Microsoft Inspect.exe docs, and Apple AXUIElement

How do you actually automate via the accessibility tree?

Pick one extractor per surface, and bind every action in your script to a role plus accessible name pair instead of a pixel coordinate. On the web, Playwright's locator.ariaSnapshot() returns a YAML representation of the accessibility tree, and page.getByRole() binds clicks and types to the named node. On macOS, the system AX API (AXUIElement / NSAccessibility) is reached through Hammerspoon, Atomac, or the macos-use MCP wrapper; each writes one tree node per line as [Role] "name" x:N y:N w:W h:H visible. On Windows, Microsoft UI Automation (UIA) is mapped visually with Inspect.exe and driven from a client like Python uiautomation, FlaUI, or WinAppDriver, every action keyed by Name + ControlType. The same engagement scopes against the published c0nsl rates: a single workflow inside the $500 to $2,000 Small Integration tier, a multi-flow rollout inside the $2,000 to $10,000+ Custom System tier.

Two meanings of “accessibility tree workflow,” and why the interesting one is missing from the public guides

Type the phrase into a search bar and every result on the first page is about WCAG compliance. axe-linter for the IDE. WAVE for browser scans. BrowserStack for accessibility regression tests. The work matters, the target user is the screen-reader visitor, the deliverable is a compliance score. None of those guides are about what an SMB owner actually needs when she asks how to automate a workflow that runs through a desktop accounting tool, a property-management web dashboard, and a clinic intake form.

The tree those automation scripts read is the same tree the screen reader reads. Same data structure, exposed by the same OS-level APIs, with the same role and accessible-name fields. The work that nobody writes about for an SMB audience is using the tree as the substrate for a deterministic automation script. That is what this page is.

What the tree actually looks like, surface by surface

Three real dump samples, one per surface, in the format the platform extractor returns. Read them next to each other and the shape is obviously the same: a hierarchy of nodes, each with a role, an accessible name, a state. The differences are field names and how you ask for the dump. The script binds to the same two fields on every platform: role plus name.

1. Web (Playwright, locator.ariaSnapshot())

Modern browsers expose the accessibility tree to the test runner. Playwright reads it, returns YAML, and gives you locators that bind to the named node. The same surface that powers a screen reader powers the script.

tests/tenant-request.spec.ts
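A reconstruction of what that spec looks like, as a minimal sketch: the URL, field names, and queue markup are illustrative stand-ins, and toMatchAriaSnapshot requires Playwright 1.49 or later.

```ts
// tests/tenant-request.spec.ts (sketch; names and URL are illustrative)
import { test, expect } from '@playwright/test';

test('tenant maintenance request lands in the queue', async ({ page }) => {
  await page.goto('https://dashboard.example-pm.com/requests/new');

  // Every action binds to role + accessible name, never to CSS or pixels.
  await page.getByRole('textbox', { name: 'Unit number' }).fill('4B');
  await page.getByRole('textbox', { name: 'Description' }).fill('Leaking kitchen faucet');
  await page.getByRole('button', { name: 'Submit request' }).click();

  // Assert against the accessibility tree itself, not the DOM.
  await expect(page.getByRole('region', { name: 'Request queue' }))
    .toMatchAriaSnapshot(`
      - heading "Request queue" [level=2]
      - list:
        - listitem: /Unit 4B/
    `);
});
```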

2. macOS (AXUIElement, reached through macos-use or Hammerspoon)

Every Cocoa app exposes its UI through the AX API. The format below is what the macos-use MCP server (a thin wrapper over AXUIElement) writes when you ask it to traverse the active window. Every line is one node: role in brackets, accessible name in quotes, then the on-screen bounding box and a visibility flag. The script does not need the bounding box to act, only to verify; the click is centered on the resolved AX node, not a pixel.

ax-tree-quickbooks.sh
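A representative dump of an invoice window, reconstructed as an illustration; the element names and coordinates are made up, the format is the one described above.

```
[AXWindow] "QuickBooks - Create Invoices" x:0 y:25 w:1440 h:875 visible
  [AXToolbar] "" x:0 y:25 w:1440 h:48 visible
    [AXButton] "New Invoice" x:12 y:33 w:108 h:32 visible
    [AXButton] "Save & Close" x:1284 y:33 w:132 h:32 visible
  [AXTextField] "Customer" x:24 y:96 w:320 h:28 visible
  [AXTextField] "Amount" x:360 y:96 w:120 h:28 visible
  [AXTable] "Line items" x:24 y:140 w:1392 h:620 visible
```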

3. Windows (UI Automation, mapped with Inspect.exe and driven by uiautomation)

UIA is the modern Windows accessibility framework, the replacement for MSAA, and Inspect.exe is the visual inspector that ships with the Windows SDK. The two-step pattern is universal: hover the target element with Inspect once to read its Name and ControlType, then bind a UIA client (Python uiautomation, FlaUI, pywinauto's UIA backend) to those fields. The script ignores pixel positions and survives layout changes.

qb_invoice.py
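A minimal sketch against the Python uiautomation client; the window title, control names, and values are illustrative, so read the real Name and ControlType pairs from Inspect.exe before binding.

```python
# qb_invoice.py (sketch; window title and control names are illustrative)
import uiautomation as auto

# Resolve the window fresh on every run; bind to Name, never to pixels.
window = auto.WindowControl(searchDepth=1, SubName='QuickBooks')
window.SetActive()

# Each action is keyed by ControlType + Name, copied from Inspect.exe.
window.EditControl(Name='Customer').SendKeys('Acme Properties LLC')
window.EditControl(Name='Amount').SendKeys('1250.00')
window.ButtonControl(Name='Save & Close').Click()
```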

The ten-second decision: which extractor for which workflow

On a scoping call, I draw this on the back of the proposal in roughly fifteen seconds. The target surface decides the extractor; nothing else does. Mixed workflows (a desktop tool plus a SaaS dashboard) wire two extractors and a small queue between them.

extractor by surface

1. Browser app: Playwright ariaSnapshot, getByRole
2. Mac desktop app: AXUIElement via Hammerspoon / macos-use
3. Windows desktop app: UIA via Inspect.exe + uiautomation / FlaUI
4. Mobile native app: Appium UiAutomator2 (Android) / XCUITest (iOS)

"The accessibility tree of a page is what assistive technology like a screen reader uses to navigate the page. In automated tests, you can use the accessibility tree to assert that the page is structured correctly."

playwright.dev/docs/aria-snapshots, retrieved 2026-05-04

Why pixel matching is the trap, and the accessibility tree is the way out

Every demo of desktop automation looks great in the recording. Single monitor, default DPI, clean window, no Slack popping over the target. The same script on the operator's actual laptop, with two monitors at different scales, breaks on the first run. Worse, it fails silently: the script clicks where the button used to be and fires the wrong action. The tree-bound version of the same script resolves the named node fresh on every run; if the target app redesigns the toolbar and moves the button to a different corner, the script still finds AXButton "Save & Close" and keeps working.

The tradeoff is one extra step at authoring time. You run the inspector once (Chrome DevTools accessibility tab for the web, Inspect.exe on Windows, AX dump on macOS), copy the role and name, and write the script against those. That is roughly ten extra seconds per element on the first authoring pass, against months of saved maintenance. Pixel matching only earns its place as a fallback when the target app exposes neither an API nor an accessibility tree, which on modern apps is rare.
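The same click both ways, as a sketch; the coordinates and names are illustrative, with pyautogui standing in for any pixel-matching tool.

```python
import pyautogui             # pixel matching: the trap
import uiautomation as auto  # UIA tree: the way out

# Pixel-matched: breaks on DPI changes, a second monitor, and every layout
# refresh, and it fails silently by clicking whatever now sits at that spot.
pyautogui.click(1312, 840)

# Tree-bound: resolves the named node fresh on every run, so it survives a
# toolbar redesign as long as the accessible name stays "Save & Close".
auto.WindowControl(searchDepth=1, SubName='QuickBooks') \
    .ButtonControl(Name='Save & Close').Click()
```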

Numbers from real engagements

Approximate medians from accessibility-tree workflows I have built for one to fifteen person teams, across web (Playwright) and desktop (UIA, AX) targets. The single number that matters most is the last one: how often the script breaks per quarter, because that decides whether the engagement needs a retainer or just a build.

Small Integration floor (single workflow): $500
Repetitive workflow safe to automate: ~80%
Author overhead per element vs pixel matching: ~10 s
Breakage per quarter on a tree-bound script: <1

The breakage number is the eye-opener. A tree-bound script that I shipped for a property-management workflow last year broke once in twelve months, on a SaaS update that renamed the "Save" button to "Save and notify." The fix was a one-line edit. A pixel-matched script for the same workflow would have broken on the same release and on every other CSS refresh in between.

How an engagement runs, end to end

Four steps, same shape on every workflow. Steps one and two are usually inside the $75 consult, step three is the build, step four is the part most automation vendors will not scope.

  1. Surface audit. Open the target window with the right inspector (DevTools accessibility pane, AX dump, Inspect.exe). If every interactive element has a stable role and accessible name, the workflow is tree-bindable. If half the buttons are unnamed, the build starts with naming them or with a different surface.

  2. Bind the script. Author the script against role plus accessible name on every action. No pixel coordinates anywhere. No CSS selectors that depend on layout. The whole script is a list of (role, name, action) triples, plus the data flowing in and out; a sketch of that shape follows this list.

  3. Ship and instrument. Deploy on a schedule (cron, queue, or event trigger). Log every resolved node with its role, name, and the action result. Logs make breakage debuggable without re-running the workflow with a human watching.

  4. Maintain on a retainer. When the target app updates and an accessible name changes, the breakage shows up in the logs as a 'no node found for name X' line. The fix is one accessible-name update. That maintenance is what the $1,000 to $5,000 monthly retainer covers, and only after the build.
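What step two's (role, name, action) shape looks like, as a minimal sketch: resolve and the node interface are assumptions standing in for the per-surface client (Playwright, uiautomation, or an AX wrapper), and the values are illustrative.

```python
# The whole workflow as (role, name, action) triples; no coordinates anywhere.
WORKFLOW = [
    ('textbox', 'Customer',     ('type', 'Acme Properties LLC')),
    ('textbox', 'Amount',       ('type', '1250.00')),
    ('button',  'Save & Close', ('click', None)),
]

def run(resolve, log):
    """resolve(role, name) -> node with .click() and .type(text), or None.
    Both resolve and the node interface are hypothetical stand-ins for the
    per-surface client."""
    for role, name, (action, value) in WORKFLOW:
        node = resolve(role, name)  # fresh lookup on every run
        if node is None:
            # The line that shows up in the logs when a rename breaks the script.
            log(f"no node found for name {name!r}")
            raise RuntimeError(f"no node found for name {name!r}")
        node.click() if action == 'click' else node.type(value)
        log(f"[{role}] {name!r}: {action} ok")  # one log line per resolved node
```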

Where this fits next to AI, and where AI gets in the way

The accessibility-tree script is the deterministic spine. AI sits on top of it for the parts that need judgement: classify this inbound email so the script knows which branch to take, extract line items from this PDF before the script types them into the invoice, summarize this run for the operator. AI is good at those tasks and a deterministic UIA script is good at clicking the buttons. Wrapping the whole thing in a single computer-use AI agent that re-derives every click each time is currently the slowest, most expensive, and least reliable shape on the market. It also makes the “why did it click that?” question impossible to answer six months later, which matters when the workflow is handling tenant deposits or clinic intake.

That split (deterministic tree-bound script for the eighty percent of the work that is repetitive, AI for the twenty percent that is judgement, hard human escalation for anything emotional) is the same scoping rule I use across the rest of the c0nsl service catalog. It is the rule that survives a release cycle without a new build.

Bring one workflow, leave with the tree

Open the target app on the call, run Inspect.exe (or DevTools accessibility, or an AX dump) on the buttons your team types into every day, and decide on the spot whether the workflow is tree-bindable, what tier on the published rate sheet it sits at, and where the AI part fits.

Frequently asked questions

What does 'accessibility tree workflow' actually mean for an SMB, and why are the existing guides about something different?

Most articles that show up for this topic are about WCAG compliance scanning: axe-linter, WAVE, BrowserStack accessibility testing. That work helps a screen-reader user open your website. It is not the workflow this page is about. The same tree (every UI element on a page or in an app, exposed by role and accessible name) is also the substrate that automation tools read to drive applications that do not have an HTTP API. For an SMB with a desktop app from 2009, a SaaS dashboard with no public endpoint, or a payroll tool that the whole team types into by hand, the accessibility tree is the cheapest reliable layer to automate against. Same data structure, different use case, almost no public guides.

How is the accessibility tree different from the DOM, and why does the difference matter for automation?

The DOM is what the browser parsed from your HTML: every element, attribute, comment, and div, regardless of meaning. The accessibility tree is a derived view of the DOM that keeps only what a screen reader would care about: each node has a role (button, link, heading, textbox), an accessible name (the text a screen reader would announce), and a state (checked, disabled, expanded). For automation, that derived view is exactly what you want. Most pages have hundreds of unnamed div elements that drift across releases, but only a small, stable set of buttons and inputs with stable accessible names. Binding to the accessible name (`role=button[name='Submit']`) survives a CSS refactor that would break a CSS selector or an XPath every time.
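One click, three bindings; the first two selector strings are made-up examples of the fragile kind.

```ts
await page.click('#root > div.sc-bXHsjH > button');          // dies on the next CSS refactor
await page.click('xpath=//div[3]/form/button[2]');           // dies on the next layout change
await page.getByRole('button', { name: 'Submit' }).click();  // survives both
```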

Which extractor do I use for each surface, and how do I install it?

On the web, Playwright. The library has shipped `locator.ariaSnapshot()` since 1.49 (returns a YAML representation of the accessibility tree from the locator down) and `await expect(locator).toMatchAriaSnapshot(...)` for assertions. On macOS, the system-level AX API (NSAccessibility on the Cocoa side, AXUIElement at the C level). The cheapest way to use it from a script is Hammerspoon (Lua) or Atomac / pyatom (Python), or the macos-use MCP server if the script is being driven by an LLM. On Windows, UI Automation (UIA, the modern replacement for MSAA). The Microsoft Inspect.exe tool from the Windows SDK is the visual inspector; the runtime client is FlaUI, pywinauto's UIA backend, Python uiautomation, or WinAppDriver if the script is driven from Appium. Each pick is the platform default, not an exotic library.
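The installs, for reference:

```sh
npm init playwright@latest        # web: Playwright plus its test runner
brew install --cask hammerspoon   # macOS: Hammerspoon, a Lua shell over AXUIElement
pip install uiautomation          # Windows: the Python UIA client
```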

What does the dump format actually look like? Show me the line that an extractor returns.

On the web, Playwright's `ariaSnapshot()` returns YAML where each node looks like `- button "Send"` or `- heading "Pricing" [level=2]`, indented to express containment. On macOS, the macos-use MCP server (and most Lua / Python wrappers around AXUIElement) writes one element per line in the format `[Role] "name" x:N y:N w:W h:H visible`, where Role is the AX role (AXButton, AXTextField, AXMenuItem), the quoted string is the accessible name, and the four numbers are the on-screen bounding box. On Windows, Inspect.exe shows each node with a Name, ControlType, and BoundingRectangle; UIA clients return the same fields as a tree of objects. The shape is the same on all three platforms, the field names differ. An automation script binds to (role, name), not to (x, y).

Why does pixel matching keep showing up as the default in tutorials if the accessibility tree is right there?

Two reasons, both fixable. First, the tutorials are written for a hobbyist demo on a single clean machine, where pixel matching looks fine because the screen never changes. The minute the script runs on the operator's laptop with two monitors at different scales, with Slack notifications popping over the target window, the pixel match fails silently and clicks the wrong button. Second, pixel matching feels easier to author because you can capture a screenshot of the target and replay clicks against whatever matches it, while the accessibility tree requires one extra step (run Inspect.exe, find the role and name) before writing the script. The extra step pays for itself the first time the target app updates and the button moves.

What is the actual cost of a fixed-scope accessibility-tree automation engagement on the c0nsl rates?

A single workflow with one input source, one output system, and a small set of branches lands inside the $500 to $2,000 Small Integration tier. A multi-flow project that mixes web (Playwright) and desktop (UIA or AX) automations, with audit logging and a recovery path for when the target app updates its accessible names, sits inside the $2,000 to $10,000+ Custom System tier. Ongoing maintenance is the part most automation vendors will not put a number on; on c0nsl that is the $1,000 to $5,000 per month retainer, offered only after the initial build. The rates are published on the homepage, and they are the same numbers on the discovery call as on the kickoff call.

Where does AI fit in this stack, and where does it absolutely not fit?

AI fits in two places: as the classifier that decides which branch the deterministic accessibility-tree script should take (categorize this email, extract these fields, judge this listing), and as the operator-facing layer that explains the run in plain language. AI does not fit as the thing clicking the actual buttons. Computer-use agents that drive the screen by reading pixels and re-deciding every step are still an order of magnitude more expensive per run than a deterministic UIA or AX script and they hallucinate clicks on long sequences. The right pattern is the same eighty-twenty split that the rest of the c0nsl service catalog uses: AI for the parts that need judgement, accessibility-tree scripts for the parts that need reliability.

Does using the accessibility tree for automation count as breaking the terms of service of a SaaS app?

It depends on the app and its terms, but the accessibility tree is part of the operating system's public API on every supported platform: it exists so screen readers and assistive technology can drive any app a user can drive. Reading the tree is not different in kind from a person operating the same window. That said, automating actions on a third-party SaaS account at scale crosses into territory many vendors restrict in their terms (rate limits, no scripted access, no shared credentials). The right move on a scoping call is to read the actual terms of the target app first, sign the engagement only on the workflows where the SaaS vendor's policy permits operator-driven automation, and document the boundary in the proposal so nobody is surprised six months in.