PerplexityBot

Perplexity crawler that indexes the public web.

What does PerplexityBot do?

PerplexityBot crawls and indexes web pages to power Perplexity's AI search and answer product. When users ask questions on Perplexity, the system draws on this index to generate answers with numbered, clickable citations linking back to source pages. Allowing PerplexityBot can drive direct referral traffic to your site through these citation links.

Should I allow and optimize for PerplexityBot to drive organic growth?

Perplexity is one of the fastest-growing AI search products, and its answers cite sources with numbered links back to the original pages. Allowing PerplexityBot makes your content eligible to appear in those cited answers, which can send referral traffic to your site. Blocking the bot keeps your page content out of Perplexity's index, although the domain, headline, and a brief factual summary may still appear. If visibility in AI-powered search matters to you, keep PerplexityBot allowed.

Here's how to optimize for PerplexityBot:

  • Allow PerplexityBot in your robots.txt to ensure full indexing
  • Add a Sitemap directive in robots.txt so PerplexityBot can discover all your pages efficiently
  • Use clear, descriptive page titles and meta descriptions since Perplexity may display these even for blocked pages
  • Include structured data (JSON-LD) to help the crawler understand your content's context and relationships
  • Ensure key content is in the initial HTML rather than loaded entirely via JavaScript, given partial JS rendering support
  • Keep server response times fast to avoid timeouts during crawls
  • Publish authoritative, well-sourced content that Perplexity's ranking system is likely to cite
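The first two items can be checked before deploying a robots.txt change. The sketch below uses Python's standard urllib.robotparser to confirm that a robots.txt body allows PerplexityBot for a given URL and declares a Sitemap. The robots.txt content, the example.com URLs, and the check_robots helper are illustrative assumptions, not a recommended configuration. The standard-library parser may also differ from Perplexity's own matching in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; example.com is a placeholder domain.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""


def check_robots(robots_txt, url, agent="PerplexityBot"):
    """Return (allowed, sitemaps) for the given agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap directive is present (Python 3.8+).
    sitemaps = parser.site_maps() or []
    return parser.can_fetch(agent, url), sitemaps


allowed, sitemaps = check_robots(ROBOTS_TXT, "https://example.com/blog/post")
print("PerplexityBot allowed:", allowed)
print("Sitemaps declared:", sitemaps)
```

If the first value is False or the sitemap list is empty, the robots.txt file needs another look before you rely on it.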

Data Usage & Training

According to Perplexity, content crawled by PerplexityBot is not used to pre-train foundation models and is indexed solely to support its search and answer product. Perplexity also says contractual terms prohibit third-party model vendors from training on Perplexity data. A separate agent, Perplexity-User, handles on-demand fetches triggered by individual user queries.

How PerplexityBot Accesses Content

Here's how PerplexityBot accesses your site and understands your content:

  • Fetches HTML via standard HTTP requests using the user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
  • Partial JavaScript rendering support
  • Respects robots.txt Disallow and Allow directives
  • Recognizes Sitemap directives in robots.txt
  • Published IP ranges available at https://perplexity.ai/perplexitybot.json for verification
  • Even when a page is blocked via robots.txt, Perplexity may still index the domain, headline, and a brief factual summary

PerplexityBot crawls continuously to build and refresh its search index. User-initiated fetches happen separately through the Perplexity-User agent when someone asks a question that triggers a live page retrieval.
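To see how much of your traffic comes from each agent, you can separate the two by user-agent token when reading access logs. This is a minimal sketch: classify_agent is a hypothetical helper, the Perplexity-User sample string is an assumed format used only for illustration, and user-agent strings can be spoofed, so treat the counts as a rough signal.

```python
from collections import Counter


def classify_agent(user_agent):
    """Label a user-agent string as PerplexityBot, Perplexity-User, or other."""
    ua = user_agent.lower()
    if "perplexity-user" in ua:
        return "Perplexity-User"  # on-demand fetch triggered by a user query
    if "perplexitybot" in ua:
        return "PerplexityBot"  # continuous index crawl
    return "other"


# Sample user-agent values; the second and third are hypothetical.
sample_user_agents = [
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)",
    "Mozilla/5.0 (compatible; Perplexity-User/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]

counts = Counter(classify_agent(ua) for ua in sample_user_agents)
print(dict(counts))
```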

How to Block or Control PerplexityBot

To block PerplexityBot via robots.txt:

User-agent: PerplexityBot
Disallow: /

For IP-based blocking, use the official IP ranges published at https://perplexity.ai/perplexitybot.json. Verify requests by matching both the user-agent string and source IP against these published prefixes.

Be aware that blocking via robots.txt may not fully suppress your site from Perplexity results; the domain, headline, and a brief factual summary can still appear. Crawl-delay is not supported. To block user-initiated fetches (Perplexity-User), you may need authentication or additional access controls since those requests are triggered by real users.
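The combined user-agent and IP check can be sketched with Python's standard ipaddress module. The JSON shape assumed here (a "prefixes" list whose items carry "ipv4Prefix" or "ipv6Prefix") is an assumption about the published file, not something confirmed above, so inspect the real file before relying on extract_prefixes. The prefixes in the example are reserved documentation ranges, not Perplexity's actual addresses, and both helper names are hypothetical.

```python
import ipaddress


def extract_prefixes(doc):
    """Collect CIDR strings from a parsed JSON document.

    Assumed shape (unverified): {"prefixes": [{"ipv4Prefix": "..."}, ...]}
    """
    prefixes = []
    for item in doc.get("prefixes", []):
        for key in ("ipv4Prefix", "ipv6Prefix"):
            if key in item:
                prefixes.append(item[key])
    return prefixes


def is_verified_perplexitybot(user_agent, remote_ip, prefixes):
    """True only if the UA names PerplexityBot AND the IP is in a listed prefix."""
    if "perplexitybot" not in user_agent.lower():
        return False
    ip = ipaddress.ip_address(remote_ip)
    # Membership is simply False when the IP and network versions differ.
    return any(ip in ipaddress.ip_network(p, strict=False) for p in prefixes)


# Placeholder data using documentation-only ranges (RFC 5737 / RFC 3849).
sample_doc = {
    "prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]
}
prefixes = extract_prefixes(sample_doc)
ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)")

print(is_verified_perplexitybot(ua, "192.0.2.10", prefixes))
print(is_verified_perplexitybot(ua, "198.51.100.7", prefixes))
```

A request that claims to be PerplexityBot but fails the IP check is a candidate for blocking or closer inspection. Refresh the prefix list on a schedule so the check does not go stale.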

Common Issues & Troubleshooting

Watch out for these common problems when working with PerplexityBot:

  • Requests occasionally come from IPs outside the published ranges, so stale IP-based rules can miss traffic. Update your IP lists regularly from the official JSON endpoint.
  • User-agent string variants have been observed, so UA-only blocking may miss some requests.
  • Blocked pages can still appear in Perplexity results with domain name, headline, and a brief summary.
  • Cloudflare and other WAF services may need custom rules combining both UA and IP matching for reliable blocking.
  • Perplexity-User (the on-demand fetcher) is a separate agent and requires its own robots.txt rules or access controls.
  • Crawl-delay directives are not supported, so you cannot throttle PerplexityBot through robots.txt alone.
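Because Crawl-delay is not honored, any throttling has to happen on your own infrastructure. One common approach, sketched here as an assumption rather than anything Perplexity documents, is a token bucket per crawler in the application or proxy layer. The TokenBucket class and its parameters are hypothetical, and how PerplexityBot reacts to throttled responses such as HTTP 429 is not covered above.

```python
import time


class TokenBucket:
    """Allow up to `capacity` burst requests, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, now=None):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Consume one token if available; return whether the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Explicit timestamps keep the example deterministic.
bucket = TokenBucket(rate_per_sec=1, capacity=2, now=0.0)
decisions = [bucket.allow(now=0.0), bucket.allow(now=0.0), bucket.allow(now=0.0)]
later = bucket.allow(now=1.0)
print(decisions, later)
```

In a real handler, you would keep one bucket per verified crawler and return a throttling response whenever allow() is False.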

Quick Reference

User Agent String: perplexitybot

robots.txt Entry:

User-agent: perplexitybot
Disallow: /
