Amazon Kendra

Enterprise search indexing crawler for Amazon Kendra.

What does Amazon Kendra do?

Amazon Kendra Web Crawler fetches and indexes web pages so AWS customers can build searchable enterprise indexes and retrieval-augmented generation (RAG) pipelines. Indexed content feeds into Amazon Kendra, Amazon Q Business, and Amazon Bedrock knowledge-base integrations. Applications built on these services can surface clickable citations that link back to your source pages, which creates a path for referral traffic.

Should I allow and optimize for Amazon Kendra to drive organic growth?

Amazon Kendra powers enterprise search and AI-assisted answers across AWS products, including Amazon Q Business and Amazon Bedrock knowledge bases. Applications built on Kendra can return your pages as cited sources with clickable links (via the DocumentURI field in query results), which drives direct referral traffic. Blocking the crawler keeps your pages out of any Kendra index populated by the web crawler. If your audience includes organizations that use AWS-powered internal search or customer-facing AI tools, allowing Amazon Kendra gives your content a path into those results.

Here's how to optimize for Amazon Kendra:

  • Allow amazon-kendra in your robots.txt so customers who configure Kendra to crawl your site can index your pages
  • Use clean, semantic HTML so the crawler can extract structured content after JavaScript rendering
  • Include descriptive page titles and meta descriptions for better search relevance in Kendra indexes
  • Add structured data (JSON-LD) to help downstream applications understand your content type
  • Ensure your server can handle scheduled crawl bursts without triggering rate limits
  • Allowlist the amazon-kendra user-agent (and its variants) in any WAF or bot-protection rules
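To check the robots.txt advice above before deploying it, the sketch below uses Python's standard-library urllib.robotparser to test whether amazon-kendra may fetch a given URL. The robots.txt content and the example.com URLs are hypothetical: the file allows amazon-kendra everywhere except /private/ and blocks all other agents.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: amazon-kendra may crawl everything except /private/,
# and every other agent is blocked. Disallow comes first because Python's parser
# applies the first rule in a group that matches the path.
ROBOTS_TXT = """\
User-agent: amazon-kendra
Disallow: /private/
Allow: /

User-agent: *
Disallow: /
"""


def can_fetch(robots_txt: str, url: str, agent: str = "amazon-kendra") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/docs/intro", "https://example.com/private/report"):
        print(url, "allowed" if can_fetch(ROBOTS_TXT, url) else "blocked")
```

Python's parser uses first-match order within a group, while crawlers that follow RFC 9309 pick the longest matching rule. Listing the Disallow line before Allow: / keeps both interpretations in agreement. To test a live site, call set_url() and read() instead of parse().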

Data Usage & Training

Crawled content is indexed into the customer's Amazon Kendra index for search and retrieval, including RAG and generative AI use cases. Whether crawled content is also used to train broader AWS machine learning models is unclear from Kendra's documentation. You can block the crawler via robots.txt if you want to prevent indexing entirely.

How Amazon Kendra Accesses Content

Here's how Amazon Kendra accesses your site and understands your content:

  • Fetches HTML via standard HTTP requests
  • Renders JavaScript fully before indexing content
  • Respects robots.txt Disallow and Allow directives for the amazon-kendra token
  • Uses user-agent strings matching patterns like amazon-kendra and amazon-kendra-web-crawler-*
  • Customer-specific variants may appear as amazon-kendra-customer-id-[id]
  • Supports sitemap and seed URL configuration through crawler settings
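To see which of these token families is actually reaching your site, you can classify the User-Agent header of each logged request. This is a minimal sketch. The exact text that follows amazon-kendra-web-crawler- or amazon-kendra-customer-id- is an assumption, so the patterns accept any non-space suffix, and the sample log lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Checked most-specific first, so a customer-specific string is not
# reported as the generic token. Suffix formats are assumptions.
_KENDRA_PATTERNS = [
    ("customer-specific", re.compile(r"amazon-kendra-customer-id-\S+", re.IGNORECASE)),
    ("web-crawler", re.compile(r"amazon-kendra-web-crawler-\S+", re.IGNORECASE)),
    ("generic", re.compile(r"amazon-kendra", re.IGNORECASE)),
]


def classify_kendra_agent(user_agent: str) -> Optional[str]:
    """Return the Kendra token family a User-Agent header matches, or None."""
    for label, pattern in _KENDRA_PATTERNS:
        if pattern.search(user_agent):
            return label
    return None


def count_kendra_agents(user_agents: Iterable[str]) -> Counter:
    """Tally Kendra token families across a sequence of logged User-Agent headers."""
    return Counter(
        label for label in map(classify_kendra_agent, user_agents) if label is not None
    )


if __name__ == "__main__":
    # Hypothetical log sample.
    sample = [
        "amazon-kendra",
        "amazon-kendra-web-crawler-1a2b",
        "amazon-kendra-customer-id-123456789012",
        "Mozilla/5.0 (compatible; Googlebot/2.1)",
    ]
    print(count_kendra_agents(sample))
```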

Crawl frequency is customer-controlled. AWS customers configure sync schedules in their Kendra data source settings, starting with an initial full sync followed by scheduled full or incremental syncs. Rate and throttling options are also configurable per data source.
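Because the schedule lives in the customer's data source settings, you cannot look it up directly. Grouping Kendra requests from your own access logs into bursts can still hint at the cadence, such as an initial full sync followed by scheduled full or incremental syncs. The sketch below assumes you have already filtered log entries to Kendra user agents and parsed their timestamps. The 10-minute gap that separates one burst from the next is an arbitrary, hypothetical threshold.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def group_into_bursts(
    timestamps: List[datetime], max_gap: timedelta = timedelta(minutes=10)
) -> List[List[datetime]]:
    """Split request times into bursts; a new burst starts when the gap
    since the previous request exceeds max_gap."""
    bursts: List[List[datetime]] = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= max_gap:
            bursts[-1].append(ts)
        else:
            bursts.append([ts])
    return bursts


def summarize_bursts(bursts: List[List[datetime]]) -> List[Tuple[datetime, datetime, int]]:
    """Return (start, end, request_count) for each burst."""
    return [(b[0], b[-1], len(b)) for b in bursts]


if __name__ == "__main__":
    # Hypothetical timestamps for requests already filtered to Kendra user agents.
    times = [
        datetime(2024, 5, 1, 2, 0),
        datetime(2024, 5, 1, 2, 1),
        datetime(2024, 5, 1, 2, 3),
        datetime(2024, 5, 2, 2, 0),
        datetime(2024, 5, 2, 2, 4),
    ]
    for start, end, count in summarize_bursts(group_into_bursts(times)):
        print(f"{start:%Y-%m-%d %H:%M} to {end:%H:%M}: {count} requests")
```

Bursts that recur at a steady interval suggest a scheduled sync, but this is an inference from your logs, not something AWS reports to site owners.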

How to Block or Control Amazon Kendra

To block Amazon Kendra via robots.txt:

User-agent: amazon-kendra
Disallow: /

You can also use more granular Disallow rules to block specific paths. For IP-based blocking, AWS publishes its global IP ranges at https://ip-ranges.amazonaws.com/ip-ranges.json, but there is no Kendra-specific range, so blocking by IP may also block traffic from other AWS services. The Crawl-delay directive is not documented as supported. If you suspect abusive crawl behavior, contact AWS Support directly.
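The sketch below shows why IP-based blocking is coarse. It finds which entries of an ip-ranges.json-style document contain a given address, reading the file's prefixes/ip_prefix and ipv6_prefixes/ipv6_prefix fields with Python's ipaddress module. The embedded data is a hypothetical excerpt built from documentation-only address blocks, not real AWS ranges. The load_ip_ranges helper shows how you might fetch the live file instead. A match tells you only which AWS service group owns the prefix (for example AMAZON or EC2), not that a request came from Kendra.

```python
import ipaddress
import json
import urllib.request
from typing import Any, Dict, List

IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

# Hypothetical excerpt in the same shape as the published file. The prefixes are
# documentation-only blocks (RFC 5737 / RFC 3849), not real AWS ranges.
SAMPLE_RANGES: Dict[str, Any] = json.loads("""
{
  "prefixes": [
    {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "AMAZON", "network_border_group": "us-east-1"},
    {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "EC2", "network_border_group": "us-east-1"}
  ],
  "ipv6_prefixes": [
    {"ipv6_prefix": "2001:db8::/32", "region": "eu-west-1", "service": "AMAZON", "network_border_group": "eu-west-1"}
  ]
}
""")


def load_ip_ranges(url: str = IP_RANGES_URL) -> Dict[str, Any]:
    """Download and parse the live AWS IP ranges file (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def matching_entries(ip: str, ranges: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return every prefix entry (IPv4 or IPv6) whose network contains ip."""
    addr = ipaddress.ip_address(ip)
    matches = []
    for list_key, prefix_key in (("prefixes", "ip_prefix"), ("ipv6_prefixes", "ipv6_prefix")):
        for entry in ranges.get(list_key, []):
            # Containment is False across IP versions, so mixed lists are safe.
            if addr in ipaddress.ip_network(entry[prefix_key]):
                matches.append(entry)
    return matches


if __name__ == "__main__":
    for ip in ("198.51.100.25", "2001:db8::1", "192.0.2.1"):
        services = sorted({e["service"] for e in matching_entries(ip, SAMPLE_RANGES)})
        print(ip, services or "not in sample ranges")
```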

Common Issues & Troubleshooting

Watch out for these common problems when working with Amazon Kendra:

  • WAF rules or bot protection services may return 403 errors to the crawler by default
  • User-agent-based blocking can trigger if your site blocks unrecognized bot tokens
  • IP allowlisting is difficult because AWS does not publish a Kendra-specific IP range
  • Customer-specific user-agent variants (amazon-kendra-customer-id-[id]) may not match a single robots.txt rule targeting just amazon-kendra
  • Rate limiting on your server can interfere with scheduled sync bursts from Kendra customers
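The troubleshooting points above suggest two things for a server-side rate limiter. First, match Kendra traffic by substring rather than by exact token, so the customer-specific and web-crawler variants are covered. Second, give that traffic enough burst capacity for a scheduled sync. The sketch below is a two-tier token-bucket limiter. The bucket sizes and burst count are hypothetical, and a production limiter would normally also key buckets per client address. User-Agent headers are trivially spoofed, so treat a match as a routing hint, not proof that a request came from AWS.

```python
import time
from dataclasses import dataclass
from typing import Optional


def is_kendra_agent(user_agent: str) -> bool:
    """Case-insensitive substring match, so amazon-kendra-web-crawler-* and
    amazon-kendra-customer-id-* variants count as Kendra traffic too.
    An exact comparison against "amazon-kendra" would miss them."""
    return "amazon-kendra" in user_agent.lower()


@dataclass
class TokenBucket:
    capacity: float        # largest burst allowed, in requests
    refill_per_sec: float  # sustained rate, in requests per second

    def __post_init__(self) -> None:
        self.tokens = self.capacity           # start full
        self.updated: Optional[float] = None  # time of the previous call

    def allow(self, now: Optional[float] = None) -> bool:
        """Refill based on time elapsed, then consume one token if available."""
        now = time.monotonic() if now is None else now
        if self.updated is not None:
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class CrawlerAwareLimiter:
    """Sends Kendra requests to a roomier bucket so scheduled sync bursts are not throttled."""

    def __init__(self, default: TokenBucket, kendra: TokenBucket) -> None:
        self.default = default
        self.kendra = kendra

    def allow(self, user_agent: str, now: Optional[float] = None) -> bool:
        bucket = self.kendra if is_kendra_agent(user_agent) else self.default
        return bucket.allow(now)


if __name__ == "__main__":
    # Hypothetical limits: a burst of 20 for ordinary clients, 200 for Kendra.
    limiter = CrawlerAwareLimiter(TokenBucket(20, 2), TokenBucket(200, 10))
    burst = 50  # hypothetical number of near-simultaneous sync requests
    for ua in ("amazon-kendra-customer-id-123456789012", "SomeOtherBot/1.0"):
        accepted = sum(limiter.allow(ua, now=0.0) for _ in range(burst))
        print(f"{ua}: {accepted}/{burst} accepted")
```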

Quick Reference

Platform
Agent Category
Growth Value
User Agent String: amazon-kendra
robots.txt Entry:
User-agent: amazon-kendra
Disallow: /
