Does ExaBot respect robots.txt?

Exa publishes Disallow and Sitemap directives in its own robots.txt, which indicates awareness of the standard, although that alone does not confirm how ExaBot treats rules on other sites. To block ExaBot, add User-agent: exabot with Disallow: / to your robots.txt. There is no public documentation on Crawl-delay support.

Does ExaBot use my content for AI training?

This is unclear. Exa's terms grant broad rights to use content for providing and improving services, but no public policy explicitly states that crawled pages train AI models. Contact hello@exa.ai if you need a definitive answer.

Can ExaBot drive traffic to my site?

Yes. Exa Search and the Exa Search API return results with citation links and source URLs. When your content is indexed and surfaces in a query, users and AI agents see a direct link back to your page.

How do I verify that a request is actually from ExaBot?

Exa does not publish IP ranges or a formal verification method. Check for the exabot substring in the user-agent header. For more reliable verification, contact hello@exa.ai to discuss allowlisting or IP confirmation.
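
The substring check described above can be sketched in a few lines of Python. This is a minimal illustration, not an official verification method: the function name claims_to_be_exabot and the sample user-agent strings are hypothetical, and because a user-agent header can be set by any client, a match only shows that the request claims to be ExaBot.

```python
from typing import Optional


def claims_to_be_exabot(user_agent: Optional[str]) -> bool:
    """Return True if the header contains the 'exabot' substring (case-insensitive).

    This is a claim check only; user-agent headers are trivially spoofable.
    """
    return "exabot" in (user_agent or "").lower()


# Hypothetical header values, for illustration only.
print(claims_to_be_exabot("ExampleFetcher/1.0 (compatible; ExaBot)"))
print(claims_to_be_exabot("Mozilla/5.0 (X11; Linux x86_64)"))
```

Treating a missing header as a non-match keeps the helper safe to call on requests that send no user-agent at all.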

How often does ExaBot crawl my site?

Exa describes its indexing system as continuously updated for near-real-time retrieval. Exact crawl frequency is not documented, but expect ongoing activity rather than infrequent batch crawls.

What is the full user-agent string for ExaBot?

Exa has not published a canonical full user-agent string. The known identifier substring is exabot. Filter on this substring when writing robots.txt rules or application-level blocks.


ExaBot

AI User Initiated

Exa agent fetching pages at user request for search and research.

What does ExaBot do?

ExaBot is the web crawling agent used by Exa to fetch pages, extract structured content and dense highlights, and build the indexes behind Exa Search and the Exa Search API. These indexes power search results, agent-facing products like exa-code (web context for coding agents), and various integrations that provide grounded context to AI agents. Exa's search results include citation links and source URLs, so allowing ExaBot can drive referral traffic back to your site.

Should I allow and optimize for ExaBot to drive organic growth?

ExaBot feeds Exa Search and the Exa Search API, both of which return results with citation links and source URLs. When AI agents or users query Exa, your content can appear as a grounded source with a direct link back to your site. Exa is increasingly used as a context layer for AI coding agents and research tools, giving your content exposure across a growing ecosystem of AI-powered products. Allowing ExaBot keeps your pages indexed and eligible to appear in these results.

Here's how to optimize for ExaBot:

Allow exabot in your robots.txt to ensure your pages are indexed by Exa
Use clean, semantic HTML so ExaBot can extract structured highlights effectively
Include descriptive meta titles and descriptions to improve how your content is summarized
Add structured data (JSON-LD) to help ExaBot understand page content and relationships
Keep important content in the initial HTML rather than loading it entirely via JavaScript
Ensure fast server response times to avoid timeouts during crawls
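
Several items in the checklist above can be spot-checked against the raw HTML your server returns, before any JavaScript runs. The following is a rough sketch using only the Python standard library; the class name PageAudit, the function audit_html, and the sample markup are all hypothetical, and the checks say nothing about how ExaBot itself parses pages.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Record whether the initial HTML has a title, a meta description, and JSON-LD."""

    def __init__(self):
        super().__init__()
        self.has_title = False
        self.has_meta_description = False
        self.has_json_ld = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            # Only count the description if it has non-empty content.
            if (attr.get("content") or "").strip():
                self.has_meta_description = True
        elif tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self.has_json_ld = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.has_title = True


def audit_html(html: str) -> dict:
    """Parse raw HTML (no JavaScript execution) and report which signals are present."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.has_title,
        "meta_description": parser.has_meta_description,
        "json_ld": parser.has_json_ld,
    }


# Hypothetical page markup, for illustration only.
SAMPLE = """<html><head>
<title>Example article</title>
<meta name="description" content="A short summary of the article.">
<script type="application/ld+json">{"@type": "Article"}</script>
</head><body><article><h1>Example article</h1></article></body></html>"""

print(audit_html(SAMPLE))
```

Running the same audit on a page that ships only an empty application shell would report every signal as missing, which is a hint that the content depends on client-side rendering.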

Data Usage & Training

It is unclear whether content crawled by ExaBot is used to train AI models. Exa's public terms grant broad rights to use ingested content to provide and improve services, which could include model development. However, no explicit public policy confirms or denies training use. If this concerns you, contact hello@exa.ai for clarification.

How ExaBot Accesses Content

Here's how ExaBot accesses your site and understands your content:

Fetches HTML pages via standard HTTP requests
Extracts dense, token-efficient highlights and structured content from pages
Identifies itself with the user-agent substring exabot
No published IP ranges or canonical full user-agent string
JavaScript rendering capability is unknown

Crawl frequency is not explicitly documented, but Exa's product materials describe a continuously updated indexing system designed for near-real-time retrieval. Expect ongoing crawl activity rather than scheduled batches.

How to Block or Control ExaBot

To block ExaBot via robots.txt, add a rule targeting its user-agent substring:

    User-agent: exabot
    Disallow: /

Exa does not publish a dedicated robots.txt token beyond exabot, and their own robots.txt uses User-agent: * rules. If you need finer control, filter requests by the exabot user-agent substring at the application or WAF layer. Exa does not publish IP ranges, so IP-based blocking is unreliable. For verification or allowlist requests, contact hello@exa.ai.
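
Before deploying a rule like this, it can help to confirm that a standards-following parser reads it the way you intend. The sketch below feeds the two-line rule to Python's built-in urllib.robotparser; the helper name is_allowed, the example.com URLs, and the otherbot agent are hypothetical. It shows how Python's parser interprets the file, which is not a guarantee of how ExaBot behaves.

```python
from urllib.robotparser import RobotFileParser

# The blocking rule from the text, one directive per line.
ROBOTS_TXT = """\
User-agent: exabot
Disallow: /
"""


def is_allowed(robots_txt: str, agent_token: str, url: str) -> bool:
    """Return True if the given robots.txt text permits agent_token to fetch url.

    Pass a bare product token such as "exabot": urllib.robotparser compares
    rules against the part of the agent string before the first "/".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


# Hypothetical URLs and a hypothetical second crawler, for illustration only.
print(is_allowed(ROBOTS_TXT, "exabot", "https://example.com/docs/page"))
print(is_allowed(ROBOTS_TXT, "otherbot", "https://example.com/docs/page"))
```

Because the file contains no other groups, agents that do not match exabot fall through to the parser's default and remain allowed.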

Common Issues & Troubleshooting

Watch out for these common problems when working with ExaBot:

No published IP ranges makes IP-based blocking or allowlisting unreliable
Exa's robots.txt uses User-agent: * rather than a dedicated token, so site owners must add explicit exabot rules
If the crawler rotates user agents or uses multiple fetchers, simple UA-based blocking may not catch all requests
JavaScript-rendered content may not be fully indexed since ExaBot's rendering capabilities are unknown
No documented Crawl-delay support, so rate limiting may require server-side configuration
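
Since Crawl-delay support is undocumented, the server-side rate limiting mentioned in the last item could be approximated with a sliding-window counter keyed on the user-agent substring. This is a minimal single-process sketch: the class name SlidingWindowLimiter, its parameters, and the chosen limits are hypothetical, and a production setup would more likely use the rate-limiting features of a reverse proxy or WAF.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per window_seconds for user agents containing token."""

    def __init__(self, max_requests: int, window_seconds: float, token: str = "exabot"):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.token = token.lower()
        self._hits = deque()  # timestamps of accepted requests, oldest first

    def allow(self, user_agent: Optional[str], now: Optional[float] = None) -> bool:
        """Return True if the request should be served, False if it should be throttled."""
        if self.token not in (user_agent or "").lower():
            return True  # not the crawler being throttled
        if now is None:
            now = time.monotonic()
        # Discard timestamps that have aged out of the window.
        while self._hits and now - self._hits[0] >= self.window_seconds:
            self._hits.popleft()
        if len(self._hits) >= self.max_requests:
            return False  # the caller would typically answer with HTTP 429
        self._hits.append(now)
        return True


# Hypothetical limit: 2 matching requests per 10 seconds.
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10.0)
for t in (0.0, 1.0, 2.0, 10.0):
    print(t, limiter.allow("ExaBot", now=t))
```

The explicit now argument exists only to make the behavior easy to test; in a real request handler it would be omitted so the limiter reads the monotonic clock itself.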