Similarweb
Similarweb crawler used for competitive web traffic and SEO analysis.
What does Similarweb do?
Similarweb crawls public web pages to collect content and technical signals for its analytics and intelligence products. The data feeds Similarweb's Site Audit, Web & App Intelligence, Data-as-a-Service APIs, and its AI Search Intelligence suite (AI Brand Visibility, AI Traffic Analysis, AI Traffic Trackers). Platform reports and tools surface clickable links to source domains, which can drive referral traffic back to your site.
Should I allow and optimize for Similarweb to drive organic growth?
Similarweb's platform reports, site profiles, and trackers surface clickable links to source domains. Users browsing competitor analysis or traffic intelligence reports can click through to your site. The AI Search Intelligence suite (AI Brand Visibility, AI Traffic Analysis) also references source domains, giving your site exposure to marketers and SEO practitioners researching your space. Allowing the crawler ensures your site's data is accurately represented in these reports, which can influence how competitors and potential partners perceive your traffic and authority.
Here's how to optimize for Similarweb:
- Allow the similarweb user-agent in your robots.txt to ensure accurate representation in Similarweb reports
- Add a Sitemap directive in robots.txt so the crawler can discover all important pages
- Make sure your server responds quickly and can absorb bursts of requests, since Similarweb does not honor Crawl-delay and cannot be slowed down through robots.txt
- Use clean, descriptive URLs and page titles for better identification in platform reports
- Include structured data to help the crawler extract meaningful technical signals
- Whitelist Similarweb's documented IPs (52.5.118.182, 52.86.188.211) in your WAF if you want to guarantee access
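Before deploying a robots.txt that follows the checklist above, it can help to test it locally. The following is a minimal sketch using Python's standard urllib.robotparser; the example.com URLs, the /admin/ path, and the helper names check_robots and sitemaps are placeholders chosen for illustration, not anything Similarweb documents. It only shows how a standards-following parser reads the file, which may differ from how Similarweb's own crawler reads it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com; paths and sitemap URL are placeholders.
ROBOTS_TXT = """\
User-agent: similarweb
Allow: /

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""


def check_robots(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def sitemaps(robots_txt: str) -> list:
    """Return the Sitemap URLs declared in the robots.txt text (Python 3.8+)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


if __name__ == "__main__":
    print(check_robots(ROBOTS_TXT, "similarweb", "https://example.com/pricing"))
    print(check_robots(ROBOTS_TXT, "otherbot", "https://example.com/admin/"))
    print(sitemaps(ROBOTS_TXT))
```

Running the same check against your production robots.txt text is a quick way to confirm that a broader Disallow rule for other agents does not accidentally cover the similarweb group.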
Data Usage & Training
Content crawled by Similarweb is aggregated and used to train and refine internal machine-learning models that power its analytics and product features. Similarweb emphasizes that inputs are aggregated and PII-free, and does not state that raw crawled pages are provided to third parties for LLM pre-training. If you don't want your content included in these pipelines, block the crawler via robots.txt or contact Similarweb Support.
How Similarweb Accesses Content
Here's how Similarweb accesses your site and understands your content:
- Fetches HTML via standard HTTP requests
- Partial JavaScript rendering capability
- Honors robots.txt Allow/Disallow directives and Sitemap references
- Does not support the Crawl-delay directive
- Uses the user-agent substring similarweb for identification
- Known default crawler IPs include 52.5.118.182 and 52.86.188.211
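The two identification signals listed above (the user-agent substring and the documented IPs) can be combined when reviewing access logs. This is a rough sketch, not an official detection method: classify_request is a hypothetical helper, the case-insensitive substring match is an assumption, and the sample user-agent string in the usage lines is made up for illustration.

```python
import ipaddress

UA_TOKEN = "similarweb"

# The two default crawler IPs named in this article; other IPs may be in use.
DOCUMENTED_IPS = {
    ipaddress.ip_address("52.5.118.182"),
    ipaddress.ip_address("52.86.188.211"),
}


def classify_request(user_agent: str, remote_ip: str) -> str:
    """Label a request by how strongly it looks like the Similarweb crawler."""
    # Assumption: match the token case-insensitively anywhere in the header.
    ua_match = UA_TOKEN in user_agent.lower()
    try:
        ip_match = ipaddress.ip_address(remote_ip) in DOCUMENTED_IPS
    except ValueError:
        # Malformed address in the log line: treat as no IP match.
        ip_match = False

    if ua_match and ip_match:
        return "similarweb (documented IP)"
    if ua_match:
        return "similarweb (undocumented IP)"
    if ip_match:
        return "documented IP, different user-agent"
    return "other"


if __name__ == "__main__":
    # Hypothetical user-agent string, for illustration only.
    print(classify_request("Mozilla/5.0 (compatible; similarweb)", "52.5.118.182"))
    print(classify_request("Mozilla/5.0 (compatible; similarweb)", "203.0.113.7"))
```

Keeping the "undocumented IP" case separate is deliberate: user-agent strings are easy to spoof, and the documented IPs are only defaults, so neither signal alone is conclusive.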
Site Audit crawls are on-demand or scheduled by users who configure scans. Separately, Similarweb runs continuous automated collection for its public data products and DaaS pipelines, so you may see regular visits independent of any Site Audit activity.
How to Block or Control Similarweb
To block Similarweb via robots.txt:
User-agent: similarweb
Disallow: /
For IP-based blocking, deny traffic from the documented default crawler IPs: 52.5.118.182 and 52.86.188.211. Be aware that regional or additional crawl IPs may differ from these defaults. If robots.txt or IP blocking doesn't fully stop crawls, contact Similarweb Support directly to request cessation. The crawler does not support Crawl-delay, so rate limiting through robots.txt is not an option.
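Because Crawl-delay is not an option, any throttling has to happen on your side, in the application, reverse proxy, or WAF. The sketch below shows one possible approach, a sliding-window limiter applied only to requests whose user-agent contains similarweb. The class name, the should_serve helper, and the limits are illustrative assumptions; a real deployment would more likely use the rate-limiting features of its web server or CDN.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = deque()  # timestamps of recently allowed requests

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._hits and now - self._hits[0] >= self.window_seconds:
            self._hits.popleft()
        if len(self._hits) >= self.max_requests:
            return False
        self._hits.append(now)
        return True


def should_serve(user_agent: str, limiter: SlidingWindowLimiter,
                 now: Optional[float] = None) -> bool:
    """Throttle only requests that identify as similarweb; serve everything else."""
    if "similarweb" not in user_agent.lower():
        return True
    return limiter.allow(now)


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 10.0):
        print(t, should_serve("similarweb", limiter, now=t))
```

A request rejected by should_serve would typically be answered with HTTP 429 (Too Many Requests); whether the crawler backs off in response is not documented here.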
Common Issues & Troubleshooting
Watch out for these common problems when working with Similarweb:
- Cloudflare and other WAF services may block Similarweb requests by default; whitelist the documented IPs or match the similarweb user-agent substring in your firewall rules
- Robots.txt exclusions can fail if the exact token similarweb isn't used (e.g., using Similarweb with a capital S)
- Regional or stealth crawl IPs may differ from the two documented defaults, making IP-based blocking incomplete
- No Crawl-delay support means you cannot throttle request frequency through robots.txt alone
- Heavy JavaScript-rendered content may only be partially captured due to limited JS rendering
Quick Reference
User-agent token: similarweb
User-agent: similarweb
Disallow: /



