Question 1

Does Similarweb respect robots.txt?

Accepted Answer

Yes. Similarweb honors User-agent, Allow, Disallow, and Sitemap directives. It does not support the non-standard Crawl-delay directive. Use the token similarweb (lowercase) in your robots.txt rules.

Question 2

Does Similarweb use my content for AI training?

Accepted Answer

Similarweb aggregates crawled data and uses it to train internal machine-learning models for its analytics products. It does not provide raw crawled pages to third parties for LLM pre-training. The company states that inputs are aggregated and PII-free.

Question 3

How do I verify that a request is actually from Similarweb?

Accepted Answer

Check the user-agent string for the substring similarweb and compare the source IP against the documented defaults (52.5.118.182 and 52.86.188.211). No reverse-DNS verification method is documented. Contact Similarweb Support if you need further confirmation.
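The two checks above can be sketched in Python. This is a minimal illustration, not an official verification method: the function name `looks_like_similarweb` is hypothetical, the case-insensitive substring match is an assumption, and the user-agent string in the usage example is made up. Only the two IP addresses come from the answer above.

```python
import ipaddress

# The two documented default source IPs quoted in the answer above.
DOCUMENTED_IPS = {
    ipaddress.ip_address("52.5.118.182"),
    ipaddress.ip_address("52.86.188.211"),
}


def looks_like_similarweb(user_agent: str, source_ip: str) -> bool:
    """Return True only if both the user-agent and the source IP match.

    Assumption: the substring check is case-insensitive.
    """
    if "similarweb" not in user_agent.lower():
        return False
    try:
        ip = ipaddress.ip_address(source_ip)
    except ValueError:
        # Malformed address in the log line: treat as unverified.
        return False
    return ip in DOCUMENTED_IPS


if __name__ == "__main__":
    # Hypothetical user-agent string, used only for illustration.
    print(looks_like_similarweb("ExampleAgent/1.0 (similarweb)", "52.5.118.182"))
```

A request that fails this check is not necessarily spoofed, since crawl IPs may differ from the two documented defaults. Treat a failed match as "unverified" rather than "fake".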

Question 4

Can I block Similarweb from crawling specific pages?

Accepted Answer

Yes. Use targeted Disallow rules under User-agent: similarweb in your robots.txt. For example, Disallow: /private/ will block crawling of that directory while allowing access to the rest of your site.
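One way to sanity-check a rule like this before deploying it is to parse it locally with Python's standard `urllib.robotparser`. The sketch below shows how Python's parser reads the rule, which may not match Similarweb's own parser in every detail. The helper name `is_allowed` and the `example.com` URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The rule from the answer above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: similarweb
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/report.html",  # placeholder URL
        "https://example.com/blog/",                # placeholder URL
    ):
        print(url, is_allowed(ROBOTS_TXT, "similarweb", url))
```

Under Python's parser, the `/private/` URL is reported as disallowed for `similarweb` and the `/blog/` URL as allowed.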

Question 5

Why am I still seeing Similarweb crawls after adding robots.txt rules?

Accepted Answer

Make sure you're using the exact lowercase token similarweb. Some crawl IPs may differ from the two documented defaults, so IP-based blocking alone may miss requests. If exclusions still fail, contact Similarweb Support directly to request removal.
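To see whether requests are arriving from addresses outside the two documented defaults, you can tally source IPs from your own logs. The sketch below assumes you have already extracted `(source_ip, user_agent)` pairs from your access log. The function name `undocumented_similarweb_ips` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# The two documented default source IPs quoted in the verification answer.
DOCUMENTED_IPS = {"52.5.118.182", "52.86.188.211"}


def undocumented_similarweb_ips(requests: Iterable[Tuple[str, str]]) -> Counter:
    """Count requests whose user-agent mentions similarweb but whose
    source IP is not one of the documented defaults."""
    counts: Counter = Counter()
    for source_ip, user_agent in requests:
        if "similarweb" in user_agent.lower() and source_ip not in DOCUMENTED_IPS:
            counts[source_ip] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; 203.0.113.0/24 is a reserved documentation range.
    sample = [
        ("52.5.118.182", "ExampleAgent/1.0 (similarweb)"),
        ("203.0.113.7", "ExampleAgent/1.0 (similarweb)"),
        ("203.0.113.7", "ExampleAgent/1.0 (similarweb)"),
        ("203.0.113.9", "OtherBot/2.0"),
    ]
    print(undocumented_similarweb_ips(sample))
```

Any addresses this reports are candidates for further review. A user-agent match alone does not prove the request came from Similarweb.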

Question 6

Does blocking Similarweb affect my site's visibility in their reports?

Accepted Answer

Blocking the crawler may result in incomplete or inaccurate data about your site in Similarweb's platform. Competitors and potential partners who use Similarweb to research your traffic and authority would see less reliable information. If accurate representation matters to you, allow the crawler.

Similarweb

What does Similarweb do?

Should I allow and optimize for Similarweb to drive organic growth?

Data Usage & Training

How Similarweb Accesses Content

How to Block or Control Similarweb

Common Issues & Troubleshooting

Quick Reference

Frequently Asked Questions

Similar Agents & Bots

AhrefsBot

AhrefsSiteAudit

Barkrowler

ClarityBot
