Google-Extended
Robots.txt control token for Google AI model training and grounding (allowed by default; publishers opt out).
What does Google-Extended do?
Google-Extended is a robots.txt control token that lets publishers decide whether content Google crawls from their site can be used to train Gemini models and for grounding (providing search-index content to models at prompt time). It does not affect whether your site appears in Google Search or influence your search rankings. Downstream products like Gemini Apps and Vertex AI grounding may surface your content with citations, creating a potential path for referral traffic.
Should I allow and optimize for Google-Extended to drive organic growth?
Allowing Google-Extended means your content can be used for Gemini model training and grounding. Grounded responses in Gemini Apps and Vertex AI may cite or link to your content, creating indirect referral traffic. Blocking Google-Extended won't affect your Google Search rankings, but it removes your content from AI-powered features like Gemini's grounded answers. If visibility in Google's AI products matters to you, keep Google-Extended allowed.
Here's how to optimize for Google-Extended:
- Allow Google-Extended in your robots.txt to participate in Gemini training and grounding
- Ensure your site is already well-optimized for Googlebot, since Google-Extended shares the same crawl infrastructure
- Use structured data (JSON-LD) to help Google understand your content's context and entities
- Include clear, descriptive meta descriptions that summarize page content accurately
- Add a sitemap.xml and reference it in robots.txt to help Google discover all relevant pages
- Keep page load times fast to maximize crawl efficiency within Google's crawl budget
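The robots.txt and sitemap items above can be sketched in Python. This is a minimal, hypothetical helper: `build_robots_txt` and `parse_robots` are illustrative names, and https://www.example.com is a placeholder. It writes an explicit Allow group for Google-Extended plus a Sitemap line, then reads the result back with the standard library's `urllib.robotparser` to confirm both. The explicit group is optional, because an agent with no matching group (and no `*` group) is allowed by default, but it documents intent.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(sitemap_url: str) -> str:
    """Return a minimal robots.txt that explicitly allows Googlebot and
    Google-Extended and references a sitemap (illustrative only)."""
    lines = [
        "User-agent: Googlebot",
        "Allow: /",
        "",
        "User-agent: Google-Extended",
        # Explicit allow; with no "*" group in this file, omitting the
        # Google-Extended group would have the same effect.
        "Allow: /",
        "",
        f"Sitemap: {sitemap_url}",
    ]
    return "\n".join(lines) + "\n"


def parse_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    # Placeholder domain; substitute your own site.
    text = build_robots_txt("https://www.example.com/sitemap.xml")
    print(text)
    rp = parse_robots(text)
    print("Google-Extended allowed:",
          rp.can_fetch("Google-Extended", "https://www.example.com/blog/"))
    print("Sitemaps:", rp.site_maps())  # site_maps() needs Python 3.8+
```

Reading the file back through a parser, rather than only eyeballing it, catches typos in directive names before the file is deployed.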
Data Usage & Training
Content crawled from your site may be used to train future Gemini models and for grounding in Google's AI products unless you explicitly opt out using the Google-Extended robots.txt token. Blocking Google-Extended only prevents AI training and grounding use. It does not remove your site from Google Search results.
How Google-Extended Accesses Content
Here's how Google-Extended accesses your site and understands your content:
- Does not use a separate user-agent string; piggybacks on existing Googlebot user-agent strings (e.g., Googlebot/2.1)
- Fetches pages through Google's standard crawl infrastructure
- Respects robots.txt Disallow, Allow, and Sitemap directives
- Does not support Crawl-delay
- IPs can be verified via reverse DNS/forward-confirmation and Google's published IP ranges
Crawling is continuous and automated, governed by Google's crawl infrastructure and site-specific crawl-rate controls. Pages are queued for crawling and, when applicable, rendering.
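The reverse DNS/forward-confirmation check listed above can be sketched with Python's standard library. This is a minimal sketch: `is_google_crawler_ip` is a hypothetical name, the two lookups are injectable so the logic can be tested without network access, and the domain suffixes follow Google's published crawler-verification guidance. A passing check only shows the request came from Google's crawl infrastructure, not that it was "Google-Extended" specifically, since the token has no user-agent string of its own.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List

# Hostname suffixes Google documents for its crawlers' reverse DNS records.
# The leading dot prevents look-alikes such as "evilgooglebot.com".
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def _reverse_dns(ip: str) -> str:
    """PTR lookup; raises socket.herror if no record exists."""
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(host: str) -> List[str]:
    """A/AAAA lookup returning every address the hostname resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_google_crawler_ip(
    ip: str,
    reverse_lookup: Callable[[str], str] = _reverse_dns,
    forward_lookup: Callable[[str], Iterable[str]] = _forward_dns,
) -> bool:
    """Reverse-resolve ip, check the domain, then forward-confirm it."""
    # Step 1: reverse DNS. socket.herror and socket.gaierror subclass OSError.
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False
    # Step 2: the PTR hostname must sit under a Google-owned domain.
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    # Step 3: the hostname must resolve back to the original address.
    try:
        addresses = forward_lookup(host)
    except OSError:
        return False
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(addr) == target for addr in addresses)
```

In production you would call `is_google_crawler_ip(request_ip)` with the default resolvers and cache results, because DNS lookups on every request are slow.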
How to Block or Control Google-Extended
To block Google-Extended, add this to your robots.txt:
User-agent: Google-Extended
Disallow: /
This prevents your content from being used for Gemini training and grounding, but does not affect your Google Search presence. Because Google-Extended is a robots.txt token and not a distinct HTTP user-agent string, you cannot block it by inspecting user-agent strings in your server logs. For IP-based verification, match request IPs against Google's published ranges at https://www.gstatic.com/ipranges/goog.json and use reverse DNS/forward-confirmation. Be cautious blocking entire Google IP ranges, as this can affect many Google services including Search crawling.
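The IP-range check described above can be sketched with Python's `ipaddress` module. This sketch assumes goog.json holds a top-level `"prefixes"` list whose entries carry either an `"ipv4Prefix"` or an `"ipv6Prefix"` key; `fetch_ranges`, `load_prefixes`, and `ip_in_ranges` are hypothetical names. A match only means the address belongs to Google, which is exactly why blocking these ranges wholesale can also cut off Search crawling.

```python
import ipaddress
import json
import urllib.request
from typing import Any, Dict, List, Union

GOOG_JSON_URL = "https://www.gstatic.com/ipranges/goog.json"

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def fetch_ranges(url: str = GOOG_JSON_URL) -> Dict[str, Any]:
    """Download and decode the published range file (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def load_prefixes(doc: Dict[str, Any]) -> List[Network]:
    """Turn the file's "prefixes" entries into network objects."""
    networks: List[Network] = []
    for entry in doc.get("prefixes", []):
        # Assumed layout: each entry has an ipv4Prefix or an ipv6Prefix key.
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def ip_in_ranges(ip: str, networks: List[Network]) -> bool:
    """True if ip falls inside any prefix; v4/v6 mismatches never match."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

Typical use is `ip_in_ranges(request_ip, load_prefixes(fetch_ranges()))`, with the downloaded file cached and refreshed periodically rather than fetched per request.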
Common Issues & Troubleshooting
Watch out for these common problems when working with Google-Extended:
- Google-Extended does not appear as a separate user-agent in server logs, making it impossible to identify or block via user-agent string inspection
- Blocking Google IP ranges to stop Google-Extended can inadvertently block Googlebot and other Google services
- Crawl-delay directives are ignored by all Google crawlers, including Google-Extended
- Publishers sometimes confuse Google-Extended with Googlebot and accidentally block their site from Google Search
- There is no way to selectively allow grounding while blocking training use (or vice versa) through the Google-Extended token alone
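The Googlebot/Google-Extended mix-up above is easy to catch before deploying a robots.txt change. This minimal sketch (`audit_google_agents` is a hypothetical helper and the URL is a placeholder) parses the file with Python's `urllib.robotparser` and reports whether each token may fetch a URL. It reports only what the file's groups say for each token, not how Google combines them, and Python's parser uses first-match rules rather than Google's most-specific-match rule, so treat it as an approximation suited to simple files like these.

```python
from typing import Dict
from urllib.robotparser import RobotFileParser

AGENTS = ("Googlebot", "Google-Extended")


def audit_google_agents(robots_txt: str, url: str) -> Dict[str, bool]:
    """Map each Google token to whether robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


# Intended: opt out of Gemini training/grounding, stay in Google Search.
CORRECT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# Common mistake: blocking Googlebot instead of Google-Extended.
MISTAKE = """\
User-agent: Googlebot
Disallow: /
"""

if __name__ == "__main__":
    for name, text in (("correct", CORRECT), ("mistake", MISTAKE)):
        print(name, audit_google_agents(text, "https://www.example.com/"))
```

A simple deployment gate is to refuse any robots.txt change for which the Googlebot entry comes back `False` unless that is intended.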
Quick Reference
Robots.txt token: google-extended
User-agent: google-extended
Disallow: /