How Well Do AI Agents Understand Top Software Products?

Summary

Most AEO / GEO (answer engine and generative engine optimization) analysis has focused on how people discover new products through AI platforms. But the rise of AI agents requires a focus further down the funnel: even after customers know about a product, they turn to agents for in-depth follow-up questions about feature sets and pricing.

Agents answer these questions with highly varying success. Many websites look great to human visitors but are inaccessible to agents, or don't provide adequate product info (or provide far too much). To understand how prepared today's top software is, we ran repeated simulations with a Claude agent across 100 top B2B products and took in-depth measurements of success rates, cost and efficiency.

In short, for each product we asked our agent to:

“Find the monthly pricing for all publicly listed plans offered by [Product Name] and list the top features of each plan.”

Here are a few of the top findings:

Large cost & efficiency variance between websites. The median agent run took 32 seconds, made 3 searches and fetches in total and cost $0.24 using Claude Sonnet 4.6. But the spread between products was huge: the most efficient 10% of runs were 2x faster and cost less than a quarter as much as the least efficient 10%.
Easy access and concise information was the winning combination. Sites that gave the agent what it needed on the first try made extraction fast and efficient. On poorer-performing sites, the agent had to sift through large amounts of scattered text and ran into technical access issues.
Access barriers lead agents to reference less reliable 3rd party info. Nearly 1 in 3 runs had at least one error searching or fetching the site, and 25% of those errors were access failures such as the agent being denied access entirely. Runs with access errors pulled 58% of content from 3rd party sources vs. just 12% for runs where the agent accessed the website smoothly.
Only 65% of plans surfaced pricing directly. The rest got pushed to a demo or managed sales process, leaving room for the agent to recommend a competitor that does publish its prices.

Background: How do AI agents research product info?

Over the past 6 months, products like ChatGPT and Claude have become increasingly “agentic.” In 2025, in response to a user prompt, they might have answered from the pre-trained LLM and perhaps run a search or two. Today, for queries seeking up-to-date product info, they almost always conduct multiple fan-out searches, page fetches and other internal tool calls to process what they find (last year's GPT-5 release was a step change in how likely a model was to search).

This shift means websites need to be accessible to, and digestible by, agents that visit multiple pages to conduct their research, much like human sessions of the past. But many websites are still built for human audiences: some deliberately keep agents out (via firewalls, access controls, etc.), while others were built without agent access in mind, e.g. they require JavaScript and client-side rendering, which most AI agents can't execute.

As we put our Claude agent through its paces, clear examples emerged of “successful” and “high friction” sessions as it sought product pricing and feature info.

Success was measured by a few key factors:

Accessibility. Could the agent identify and access the company website without issue?
Key information pulled cleanly. Was product and pricing info easily discoverable, agent-readable and structured correctly for extraction?
Token usage, cost and time spent. How expensive and time consuming was it to find the key info? With increasingly complex agents, cost per resolution is rising and businesses are becoming more price sensitive.

Success example: Linear

Linear.app (modern engineering project management) was a consistent example of easy agent access and clear information producing a fast, cost-efficient agent session. The agent found what it needed in under 20 seconds for about $0.11, without consulting any source other than Linear's own website:

Fast & Efficient Session: Linear

16.9s · 33k tokens · $0.109 · 2 tool calls

Model steps: I'll start by searching for Linear's official site; found pricing URL, fetching directly; extracting plans, prices & top features; output: 4 plans with pricing & features.

Tool calls: Web Search “Linear app pricing” (40 KB returned); Web Fetch linear.app/pricing (5.9 KB, first-party).

A quick, straightforward session: one search to locate the pricing page and one fetch to read it. The model parsed 4 plans with pricing and top features in a single pass, with no errors or third-party fallbacks.

High friction example: Zendesk

Zendesk (a leading customer support platform) was a good counterexample where the agent struggled. Admittedly, it's an older product with a more complex offering (we control for this later in the analysis), but its JavaScript-rendered pricing content meant the agent couldn't read the page and turned to costly and less trustworthy 3rd party sources.

The result was a session roughly 3x longer and nearly 5x more expensive than Linear's, one that may misrepresent Zendesk's offering because its answer didn't come from Zendesk's own website.

High Friction Session: Zendesk

53.1s · 159k tokens · $0.510 · 6 tool calls

Model steps: searching for Zendesk pricing; pricing page loaded but table is JS-rendered, no plan data; compare-plans page unavailable, pivoting to 3rd party; output from 3rd party sources.

Tool calls: Web Search “Zendesk customer support pricing 2025”; Web Search “Zendesk Suite plans site:zendesk.com”; Web Fetch zendesk.com/pricing (JS-rendered, pricing table missing); Web Fetch zendesk.com/pricing/compare-plans (unavailable); 3rd party source hiverhq.com/blog/zendesk-pricing; 3rd party source featurebase.app/blog/zendesk-pricing.

Zendesk's pricing page loaded but plan details are JS-rendered and not visible to the agent. After two failed attempts at first-party data, it fell back entirely to third-party blogs, which risk being outdated or inaccurate.

Speed & Cost: Varying results across websites

While the median session ran for ~32 seconds at a reasonable cost of $0.24, there was a 2.2x difference in time and a 4.2x difference in price between the most efficient 10% of runs (p10) and the least efficient 10% (p90). The bigger spread on price was driven by costly additional web search and fetch calls.

Note that our simulations used Anthropic's mid-tier Sonnet 4.6; with the latest Opus 4.8, the most expensive 10% of runs would cost close to $1 each! That's worth considering as users become more sensitive to token costs as prices rise.
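To make that comparison concrete, here is a minimal sketch of a per-run cost estimate from token and search counts. The prices are assumptions for illustration only (a flat $3 per million tokens for the mid tier, $5 for the top tier, and $10 per 1,000 web searches), not published rates for Sonnet 4.6 or Opus 4.8, and `run_cost` is a hypothetical helper.

```python
# Minimal sketch of a per-run cost estimate. All prices are ASSUMPTIONS for
# illustration, not published rates for Sonnet 4.6 or Opus 4.8.
SEARCH_FEE_USD = 10 / 1000  # assumed: $10 per 1,000 web searches


def run_cost(total_tokens: int, web_searches: int, usd_per_mtok: float) -> float:
    """Estimate the USD cost of one agent run.

    Simplifications (assumptions): every token is billed at one flat
    per-million-token price (no input/output split), each web search adds a
    flat fee, and web fetches cost nothing beyond the tokens they add.
    """
    token_cost = total_tokens / 1_000_000 * usd_per_mtok
    return token_cost + web_searches * SEARCH_FEE_USD


# p90 run from the percentile table: 188,977 tokens and 2 web searches.
sonnet_p90 = run_cost(188_977, 2, usd_per_mtok=3.0)  # assumed mid-tier price
opus_p90 = run_cost(188_977, 2, usd_per_mtok=5.0)    # assumed top-tier price
print(f"p90 run: ${sonnet_p90:.3f} on the mid tier, ${opus_p90:.3f} on the top tier")
```

Under these assumed prices, a 33k-token, one-search run like Linear's comes to $0.109, and the p90 run comes to about $0.59 at the mid-tier price (close to the table's $0.589) and about $0.96 at the top-tier price. Treating every token at one price is the main simplification; a real bill splits input and output tokens.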

Percentile	Time (s)	Total Tokens	Web Searches	Web Fetches	Total Cost (USD)
p10	22.9	42,212	1.0	1.0	$0.142
p25	26.4	46,357	1.0	1.0	$0.156
Median	31.7	73,230	1.0	2.0	$0.236
Mean	35.0	97,388	1.5	2.3	$0.311
p75	40.8	123,045	2.0	3.0	$0.391
p90	50.4	188,977	2.0	4.0	$0.589
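The 2.2x and 4.2x spreads above are ratios of the p90 to the p10 values. With per-run data you could compute them as below; `p90_p10_ratio` is a hypothetical helper using Python's default “exclusive” quantile method, which may differ from the method used to build the table. Applied to the rounded table values, the cost ratio comes out at about 4.1x, so small differences from ratios computed on unrounded per-run data are expected.

```python
import statistics


def p90_p10_ratio(values: list[float]) -> float:
    """Ratio of the 90th to the 10th percentile of per-run values."""
    deciles = statistics.quantiles(values, n=10)  # 9 cut points: p10, ..., p90
    return deciles[-1] / deciles[0]


# With only the rounded table rows available, take the ratios directly.
time_ratio = 50.4 / 22.9    # p90 / p10 time (s)
cost_ratio = 0.589 / 0.142  # p90 / p10 total cost (USD)
print(f"time spread: {time_ratio:.1f}x, cost spread: {cost_ratio:.1f}x")
```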

Naturally, there's a big difference between the product offerings and pricing of a company like Stripe and those of a newer entrant like Linear. But even when we control for the number of product offerings, the cost spread between the best- and worst-performing websites stays wide (variance drops by only 3%).

Chart: Cost distribution, all runs vs. simple products only
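One simple reading of the 3% figure is the relative drop in the variance of per-run cost when the sample is restricted to simple products. A minimal sketch under that assumption, using sample variance; `variance_drop` is a hypothetical helper and the numbers are toy values, not the study's data.

```python
import statistics


def variance_drop(all_costs: list[float], subset_costs: list[float]) -> float:
    """Fractional reduction in sample variance of per-run cost for a subset.

    Assumption: "variance" means ordinary sample variance; a negative result
    would mean the subset is MORE spread out than the full sample.
    """
    return 1 - statistics.variance(subset_costs) / statistics.variance(all_costs)


# Toy numbers, not the study's data: dropping the extremes shrinks the spread.
print(variance_drop([1, 2, 3, 4, 5], [2, 3, 4]))
```

A drop of only a few percent, as reported above, would mean the simple-products subset is nearly as spread out as the full sample.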

Access errors and reliance on 3rd party sources

While agents from Claude and ChatGPT are designed to be resilient and make multiple attempts to find information, the error rate in our runs was surprisingly high: 30% of runs had at least one error. 25% of those errors were 'URL not accessible' or 'URL not allowed' responses, meaning the agent hit bot blocking, broken pages or non-HTML pages.
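The two rates in this paragraph use different denominators: the first counts runs with at least one error, the second counts individual errors. A minimal sketch of both calculations, assuming each run logs a list of error strings; the `Run` record, the `error_stats` function and the lowercase error labels are hypothetical, modeled on the 'URL not accessible' / 'URL not allowed' messages named here.

```python
from dataclasses import dataclass, field

# Hypothetical labels modeled on the two access failures named in the text.
ACCESS_ERRORS = {"url_not_accessible", "url_not_allowed"}


@dataclass
class Run:
    product: str
    errors: list[str] = field(default_factory=list)  # one entry per failed tool call


def error_stats(runs: list[Run]) -> tuple[float, float]:
    """Return (share of runs with >= 1 error, share of errors that are access errors)."""
    if not runs:
        return 0.0, 0.0
    runs_with_errors = sum(1 for run in runs if run.errors)
    all_errors = [e for run in runs for e in run.errors]
    access = sum(1 for e in all_errors if e in ACCESS_ERRORS)
    access_share = access / len(all_errors) if all_errors else 0.0
    return runs_with_errors / len(runs), access_share


runs = [
    Run("A"),
    Run("B", ["url_not_allowed"]),
    Run("C", ["unavailable", "too_many_requests", "unavailable"]),
    Run("D"),
]
print(error_stats(runs))
```

Here 2 of 4 runs hit an error, but only 1 of the 4 errors is an access error, so the function returns (0.5, 0.25).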

In the majority of cases the agent recovered from errors, retried, and was able to access the info on the website. But in 5% of runs it abandoned the brand's website completely and fetched 100% of its information from 3rd party sources, which risks being inaccurate or stale.

Overall, runs with access errors pulled 58% of content from 3rd party sources vs. just 12% for runs where the agent had no issues accessing the company's website.

Chart: 3rd party source referencing, runs with no access errors vs. runs with access errors
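One way to compute the share of content pulled from 3rd party sources is to classify each fetched URL as first- or third-party by domain and weight it by how much content it returned. The sketch below assumes content is measured in bytes and that anything outside the brand's domain (and its subdomains) counts as third-party; the study's exact measure isn't stated, and `is_first_party`, `third_party_share` and the byte counts are hypothetical.

```python
from urllib.parse import urlparse


def is_first_party(url: str, brand_domain: str) -> bool:
    """True if url is on brand_domain or one of its subdomains."""
    host = urlparse(url if "//" in url else "https://" + url).hostname or ""
    return host == brand_domain or host.endswith("." + brand_domain)


def third_party_share(fetches: list[tuple[str, int]], brand_domain: str) -> float:
    """Share of fetched bytes that came from outside brand_domain."""
    total = sum(size for _, size in fetches)
    third = sum(size for url, size in fetches if not is_first_party(url, brand_domain))
    return third / total if total else 0.0


# URLs from the Zendesk session; byte sizes are hypothetical.
fetches = [
    ("zendesk.com/pricing", 2_000),
    ("zendesk.com/pricing/compare-plans", 0),  # unavailable, nothing returned
    ("hiverhq.com/blog/zendesk-pricing", 6_000),
    ("featurebase.app/blog/zendesk-pricing", 2_000),
]
print(third_party_share(fetches, "zendesk.com"))
```

Matching on a dot-prefixed suffix keeps www.zendesk.com first-party while rejecting look-alike hosts such as notzendesk.com.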

Another key factor in agents abandoning websites for 3rd party sources is the prevalence of client-side rendered sites that require JavaScript. With the exception of Google's, the crawlers and agents of top AI platforms such as Anthropic and OpenAI do not execute JavaScript, so any content they read must be present in the HTML the server returns.
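A rough way to approximate what a non-JavaScript agent sees is to read only the text in the raw server HTML, ignoring script and style bodies. The sketch below runs offline on two made-up pages (the `$12/month` price and the `renderPricing` call are hypothetical); in practice you would feed it the HTML returned by a plain HTTP GET. Real agent fetch tools convert pages in their own ways, so treat this as an approximation rather than a model of any specific tool.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes from raw HTML, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text outside scripts/styles, i.e. text present without JS.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


SERVER_RENDERED = "<table><tr><td>Pro</td><td>$12/month</td></tr></table>"
CLIENT_RENDERED = ('<div id="pricing"></div>'
                   '<script>renderPricing({plan: "Pro", price: "$12/month"})</script>')

for name, page in [("server-rendered", SERVER_RENDERED), ("client-rendered", CLIENT_RENDERED)]:
    print(name, "->", "$12/month" in visible_text(page))
```

The price only survives in the server-rendered page; in the client-rendered page it exists solely inside a script that a non-JavaScript agent never runs, which mirrors what happened on Zendesk's pricing page.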

While not counted as errors in our dataset, in 13% of runs the agent's internal reasoning mentioned JavaScript or rendering issues.
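Flagging those runs can be as simple as a case-insensitive pattern match over each run's reasoning text. A minimal sketch; the keyword pattern and `share_flagged` are hypothetical, since the criteria used in the study aren't described, and the sample log lines echo the session notes from the Linear and Zendesk examples.

```python
import re

# Hypothetical keyword pattern; the study's actual criteria are not described.
RENDERING_PATTERN = re.compile(r"javascript|client-side|render(?:ed|ing)", re.IGNORECASE)


def share_flagged(reasoning_logs: list[str]) -> float:
    """Fraction of runs whose reasoning text mentions JavaScript or rendering."""
    if not reasoning_logs:
        return 0.0
    flagged = sum(1 for text in reasoning_logs if RENDERING_PATTERN.search(text))
    return flagged / len(reasoning_logs)


logs = [
    "Found pricing URL, fetching directly",
    "Pricing page loaded but table is JS-rendered, no plan data",
    "The page seems to require JavaScript to display plans",
    "Extracting plans, prices & top features",
]
print(share_flagged(logs))
```

A keyword match will over- and under-count at the edges (e.g. an unrelated use of “rendered”), so a real analysis would spot-check flagged runs.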

The grand reveal: top and worst performers

Let's get to it. Our agent evaluated 20 products in each of 5 prominent B2B SaaS categories: Productivity/Collaboration, Developer Tools, Marketing/Sales, Customer Support and Analytics/Data.

For interpretability, I'll show the cost of an average run (Sonnet 4.6), the % of content pulled from 3rd party sources and the % of runs with access errors, though you could imagine a more robust “agent access score” like the one we've developed for Siteline. We use cost rather than just tokens because it covers both tokens and search / fetch tool calls, which tend to be quite expensive. Additional metrics are available in the full results dashboard.

Again, some products are naturally more complex, have bigger offerings and take more effort for an agent to process. But that doesn't let them off the hook: they're still more expensive for an agent to understand.

View the full results dashboard