Does PerplexityBot use my content to train AI models?

No. Perplexity states that PerplexityBot indexes content to power its search and answer product, not to pre-train foundation models. Perplexity also says its contractual terms prohibit third-party model vendors from training on Perplexity's data.

Does PerplexityBot respect robots.txt?

Yes. PerplexityBot respects Disallow and Allow directives and recognizes Sitemap entries. However, blocked pages may still appear in Perplexity results with limited information such as the domain name, headline, and a brief factual summary.

How do I verify that a request is actually from PerplexityBot?

Match the request's source IP against the official IP prefixes published at https://perplexity.ai/perplexitybot.json, and confirm the user-agent string contains PerplexityBot. Combining both checks is the most reliable approach.

What is the difference between PerplexityBot and Perplexity-User?

PerplexityBot continuously crawls the web to build Perplexity's search index. Perplexity-User fetches pages on demand when a user asks a question that requires live retrieval. They are separate agents with separate robots.txt tokens.

Can PerplexityBot drive traffic to my site?

Yes. Perplexity's answers include numbered citation links that users can click to visit source pages. Allowing PerplexityBot to index your content is the first step to appearing in these cited results.

Does PerplexityBot support Crawl-delay?

No. Crawl-delay is not supported according to available documentation. If you need to limit crawl rate, consider IP-based rate limiting at the server or WAF level.

Will blocking PerplexityBot completely remove my site from Perplexity?

Not entirely. Perplexity may still display your domain name, page headline, and a brief factual summary even for blocked pages. Full removal requires additional steps beyond robots.txt.

PerplexityBot

AI Search Index

Perplexity crawler that indexes the public web.

What does PerplexityBot do?

PerplexityBot crawls and indexes web pages to power Perplexity's AI search and answer product. When users ask questions on Perplexity, the system draws on this index to generate answers with numbered, clickable citations linking back to source pages. Allowing PerplexityBot can drive direct referral traffic to your site through these citation links.

Should I allow and optimize for PerplexityBot to drive organic growth?

Perplexity is a fast-growing AI search product, and its answers include numbered citation links back to source pages. Allowing PerplexityBot gives your content a chance to appear in these cited answers, which can drive direct referral traffic. Blocking the bot keeps the full content of your pages out of Perplexity's index, although the domain name, headline, and a brief factual summary may still appear in limited form. If you want visibility in AI-powered search, keeping PerplexityBot allowed is a high-value decision.

Here's how to optimize for PerplexityBot:

Allow PerplexityBot in your robots.txt to ensure full indexing
Add a Sitemap directive in robots.txt so PerplexityBot can discover all your pages efficiently
Use clear, descriptive page titles and meta descriptions since Perplexity may display these even for blocked pages
Include structured data (JSON-LD) to help the crawler understand your content's context and relationships
Ensure key content is in the initial HTML rather than loaded entirely via JavaScript, given partial JS rendering support
Keep server response times fast to avoid timeouts during crawls
Publish authoritative, well-sourced content that Perplexity's ranking system is likely to cite
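
One way to act on the JavaScript point in the list above is to confirm that key phrases appear in the raw HTML your server returns, before any script runs. The following is a minimal Python sketch using only the standard library. The function name missing_from_initial_html and the sample markup are hypothetical, and the sketch does not reproduce how PerplexityBot itself parses or renders pages.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def missing_from_initial_html(html, key_phrases):
    """Return the phrases that are absent from the server-rendered text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace and compare case-insensitively.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [phrase for phrase in key_phrases if phrase.lower() not in text]


if __name__ == "__main__":
    # Hypothetical page: the heading is in the HTML, the plan name is
    # only produced by a script at runtime.
    sample = (
        "<html><body><h1>Pricing</h1><div id='app'></div>"
        "<script>render('Enterprise plan')</script></body></html>"
    )
    print(missing_from_initial_html(sample, ["Pricing", "Enterprise plan"]))
    # ['Enterprise plan']
```

Any phrase the function returns exists only after client-side rendering, so a crawler with partial JavaScript support might not see it.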

Data Usage & Training

Content crawled by PerplexityBot is not used to pre-train foundation models. Perplexity states the data is indexed solely to support its search and answer product. Perplexity also says contractual terms prohibit third-party model vendors from training on Perplexity data. A separate agent, Perplexity-User, handles on-demand fetches triggered by individual user queries.

How PerplexityBot Accesses Content

Here's how PerplexityBot accesses your site and understands your content:

Fetches HTML via standard HTTP requests using the user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
Partial JavaScript rendering support
Respects robots.txt Disallow and Allow directives
Recognizes Sitemap directives in robots.txt
Published IP ranges available at https://perplexity.ai/perplexitybot.json for verification
Even when a page is blocked via robots.txt, Perplexity may still index the domain, headline, and a brief factual summary

PerplexityBot crawls continuously to build and refresh its search index. User-initiated fetches happen separately through the Perplexity-User agent when someone asks a question that triggers a live page retrieval.
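
The two verification signals described in this section, the user-agent string and the published IP ranges, can be combined in a short check. This is a sketch under stated assumptions rather than an official method: the function name is_perplexitybot is hypothetical, the prefixes are passed in as plain CIDR strings, and the example addresses are reserved documentation ranges, not real Perplexity prefixes. Loading the prefixes from the JSON endpoint is left out because its schema is not described here.

```python
import ipaddress


def is_perplexitybot(user_agent, source_ip, published_prefixes):
    """Return True only if the UA names PerplexityBot AND the IP is in a listed prefix.

    published_prefixes is an iterable of CIDR strings, assumed to have been
    extracted from the official JSON file beforehand.
    """
    # Case-insensitive match, since user-agent variants have been observed.
    if "perplexitybot" not in user_agent.lower():
        return False
    try:
        ip = ipaddress.ip_address(source_ip)
    except ValueError:
        return False  # Malformed address: treat as unverified.
    for prefix in published_prefixes:
        network = ipaddress.ip_network(prefix, strict=False)
        if ip.version == network.version and ip in network:
            return True
    return False


if __name__ == "__main__":
    ua = (
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
        "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"
    )
    # Placeholder prefixes from the reserved documentation ranges.
    prefixes = ["192.0.2.0/24", "2001:db8::/32"]
    print(is_perplexitybot(ua, "192.0.2.10", prefixes))    # True
    print(is_perplexitybot(ua, "198.51.100.7", prefixes))  # False
```

Requiring both signals means a spoofed user-agent from an unlisted address is rejected, while a listed address with an unrelated user-agent is not treated as the crawler.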

How to Block or Control PerplexityBot

To block PerplexityBot via robots.txt:

User-agent: PerplexityBot
Disallow: /

For IP-based blocking, use the official IP ranges published at https://perplexity.ai/perplexitybot.json. Verify requests by checking the user-agent string and matching the source IP against these published prefixes.

Be aware that blocking via robots.txt may not fully suppress your site from Perplexity results; the domain, headline, and a brief factual summary can still appear. Crawl-delay is not supported. To block user-initiated fetches (Perplexity-User), you may need authentication or additional access controls since those requests are triggered by real users.
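
The robots.txt rule described above can be sanity-checked locally before deployment. The sketch below uses Python's standard urllib.robotparser, which reflects Python's interpretation of the file and is only an approximation of how Perplexity's own parser behaves. The helper name can_fetch, the catch-all group, and the example.com URL are assumptions added for illustration.

```python
from urllib.robotparser import RobotFileParser

# Blocks PerplexityBot everywhere; the catch-all group is an added
# assumption so that other agents have an explicit rule to fall back on.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(robots_txt, agent_token, url):
    """Parse robots.txt text and report whether agent_token may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the robots.txt token, not the full browser-style UA string.
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    url = "https://example.com/page"
    print(can_fetch(ROBOTS_TXT, "PerplexityBot", url))    # False
    print(can_fetch(ROBOTS_TXT, "Perplexity-User", url))  # True
```

The second result illustrates that the two agents use separate tokens: a rule written for PerplexityBot does not, on its own, cover Perplexity-User.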

Common Issues & Troubleshooting

Watch out for these common problems when working with PerplexityBot:

Requests occasionally come from IPs outside the published ranges, and IP-based rules become ineffective as they go stale. Refresh your IP lists regularly from the official JSON endpoint.
User-agent string variants have been observed, so UA-only blocking may miss some requests.
Blocked pages can still appear in Perplexity results with domain name, headline, and a brief summary.
Cloudflare and other WAF services may need custom rules that combine UA and IP matching for reliable blocking.
Perplexity-User (the on-demand fetcher) is a separate agent and requires its own robots.txt rules or access controls.
Crawl-delay directives are not supported, so you cannot throttle PerplexityBot through robots.txt alone.
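
Because Crawl-delay is not available, throttling has to happen on your side, for example with IP-based rate limiting. The following is a minimal in-memory sketch of a sliding-window limiter keyed by client IP. The class name SlidingWindowLimiter and its parameters are hypothetical, and a production setup would more likely rely on the rate-limiting features of the web server or WAF.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # Injectable so the limiter can be tested.
        self._hits = defaultdict(deque)  # client key -> request timestamps

    def allow(self, client_key):
        """Record the request and return True, or return False if over the limit."""
        now = self._clock()
        hits = self._hits[client_key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    # Hypothetical policy: 2 requests per 10 seconds per IP.
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10)
    for _ in range(3):
        print(limiter.allow("192.0.2.10"))
    # True, True, False
```

A request handler could call allow() with the source IP for requests identified as PerplexityBot and return HTTP 429 when it yields False.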