
Google-Extended

Opt-out robots.txt control token (not a separate crawler) for Google AI model training and grounding.

What does Google-Extended do?

Google-Extended is a robots.txt control token that lets publishers decide whether content Google crawls from their site can be used to train Gemini models and for grounding (providing search-index content to models at prompt time). It does not affect whether your site appears in Google Search or influence your search rankings. Downstream products like Gemini Apps and Vertex AI grounding may surface your content with citations, creating a potential path for referral traffic.

Should I allow and optimize for Google-Extended to drive organic growth?

Allowing Google-Extended means your content can be used for Gemini model training and grounding. Grounded responses in Gemini Apps and Vertex AI may cite or link to your content, creating indirect referral traffic. Blocking Google-Extended won't affect your Google Search rankings, but it keeps your content out of future Gemini training and out of grounded answers in Gemini Apps and Vertex AI. If visibility in these AI products matters to you, keep Google-Extended allowed.

Here's how to optimize for Google-Extended:

  • Allow Google-Extended in your robots.txt to participate in Gemini training and grounding
  • Ensure your site is already well-optimized for Googlebot, since Google-Extended shares the same crawl infrastructure
  • Use structured data (JSON-LD) to help Google understand your content's context and entities
  • Include clear, descriptive meta descriptions that summarize page content accurately
  • Add a sitemap.xml and reference it in robots.txt to help Google discover all relevant pages
  • Keep page load times fast to maximize crawl efficiency within Google's crawl budget
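Two of these items, allowing Google-Extended and referencing a sitemap, can be checked mechanically. The following is a minimal sketch using Python's standard-library `urllib.robotparser` (Python 3.8+ for `site_maps()`); the robots.txt content, the example.com URLs, and the `audit_robots` helper are hypothetical. Python's parser approximates rather than replicates Google's matching rules, so treat the result as a sanity check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: there is no Google-Extended group, so under standard
# group matching Google-Extended falls back to "*" and is allowed except /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""


def audit_robots(robots_txt: str, urls: list[str]) -> dict:
    """Report Google-Extended access per URL and the sitemaps robots.txt lists."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "google_extended_allowed": {u: parser.can_fetch("Google-Extended", u) for u in urls},
        # site_maps() returns None when the file has no Sitemap line.
        "sitemaps": parser.site_maps() or [],
    }


report = audit_robots(
    ROBOTS_TXT,
    ["https://example.com/blog/post", "https://example.com/admin/login"],
)
print(report)
```

An empty `sitemaps` list is a signal to add a `Sitemap:` line; a `False` for a page you want in Gemini grounding points to an over-broad Disallow rule.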

Data Usage & Training

Content crawled from your site may be used to train future Gemini models and for grounding in Google's AI products unless you explicitly opt out using the Google-Extended robots.txt token. Blocking Google-Extended only prevents AI training and grounding use. It does not remove your site from Google Search results.

How Google-Extended Accesses Content

Here's how Google-Extended accesses your site and understands your content:

  • Has no separate HTTP user-agent string; requests use existing Google user-agent strings (e.g., Googlebot/2.1), and the Google-Extended token only controls how crawled content may be used
  • Fetches pages through Google's standard crawl infrastructure
  • Respects robots.txt Disallow, Allow, and Sitemap directives
  • Does not support Crawl-delay
  • IPs can be verified via reverse DNS/forward-confirmation and Google's published IP ranges
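The reverse DNS/forward-confirmation check can be sketched with Python's standard library. This is an illustrative version, not Google's reference code: the resolver functions are injectable so the logic can be tested offline, the helper names are hypothetical, and the accepted hostname suffixes are an assumption based on Google's crawler-verification guidance that should be checked against Google's current documentation. A pass confirms the request came from Google infrastructure, not that it was made for Google-Extended, since that token has no distinct traffic.

```python
import ipaddress
import socket
from typing import Callable

# Assumption: hostname suffixes Google documents for its crawlers and fetchers.
# The leading dot stops look-alikes such as "evilgooglebot.com" from matching.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def _forward_lookup(host: str) -> list[str]:
    """Return every IPv4/IPv6 address that host resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_verified_google_ip(
    ip: str,
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
    forward: Callable[[str], list[str]] = _forward_lookup,
) -> bool:
    """Reverse-resolve ip, check the domain, then forward-confirm the hostname."""
    try:
        hostname, _aliases, _addrs = reverse(ip)  # PTR lookup
    except OSError:
        return False  # no PTR record or resolver failure: treat as unverified
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(hostname)  # A/AAAA lookup of the PTR name
    except OSError:
        return False
    # Forward confirmation: the PTR hostname must resolve back to the same IP.
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(a) == target for a in addresses)
```

Calling `is_verified_google_ip("66.249.66.1")` with the defaults performs live DNS lookups; the forward step matters because anyone can publish a PTR record claiming a googlebot.com name.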

Crawling is continuous and automated, governed by Google's crawl infrastructure and site-specific crawl-rate controls. Pages are queued for crawling and, when applicable, rendering.

How to Block or Control Google-Extended

To block Google-Extended, add this to your robots.txt:

User-agent: Google-Extended
Disallow: /

This prevents your content from being used for Gemini training and grounding, but does not affect your Google Search presence.

Because Google-Extended is a robots.txt token and not a distinct HTTP user-agent string, you cannot block it by inspecting user-agent strings in your server logs. For IP-based verification, match request IPs against Google's published ranges at https://www.gstatic.com/ipranges/goog.json and use reverse DNS/forward-confirmation. Be cautious about blocking entire Google IP ranges, as this can affect many Google services, including Search crawling.
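Matching an IP against the published ranges can be sketched with Python's `ipaddress` and `json` modules. The sample document only mimics the shape of goog.json (a `prefixes` list whose entries carry an `ipv4Prefix` or `ipv6Prefix` key; the real file also has metadata fields); its two prefixes are illustrative placeholders, and in practice you would load the downloaded file instead. The helper names are hypothetical. Since goog.json is a broad list of Google-owned ranges, a match says an IP belongs to Google, not which service sent the request, which is why blocking these ranges is risky.

```python
import ipaddress
import json

# Illustrative excerpt in the shape of goog.json; not copied from the live file.
SAMPLE_GOOG_JSON = """
{
  "prefixes": [
    {"ipv4Prefix": "66.249.64.0/19"},
    {"ipv6Prefix": "2001:4860::/32"}
  ]
}
"""


def load_networks(doc: str) -> list:
    """Parse a goog.json-style document into ipaddress network objects."""
    data = json.loads(doc)
    networks = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def in_google_ranges(ip: str, networks: list) -> bool:
    """True if ip falls inside any listed network (v4/v6 mismatches never match)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


networks = load_networks(SAMPLE_GOOG_JSON)
print(in_google_ranges("66.249.66.1", networks))
print(in_google_ranges("203.0.113.5", networks))
```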

Common Issues & Troubleshooting

Watch out for these common problems when working with Google-Extended:

  • Google-Extended does not appear as a separate user-agent in server logs, making it impossible to identify or block via user-agent string inspection
  • Blocking Google IP ranges to stop Google-Extended can inadvertently block Googlebot and other Google services
  • Crawl-delay directives are ignored by all Google crawlers, including Google-Extended
  • Publishers sometimes confuse Google-Extended with Googlebot and accidentally block their site from Google Search
  • No way to selectively allow grounding while blocking training use (or vice versa) through the Google-Extended token alone
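To guard against the Googlebot/Google-Extended mix-up, you can evaluate a robots.txt against both tokens before deploying it. This sketch uses Python's `urllib.robotparser`, which approximates Google's behavior (it applies rules in file order and matches user-agent names by substring, unlike Google's longest-match rules), so treat it as a sanity check rather than ground truth; the `check_tokens` helper and the example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The blocking rule from this page, plus a catch-all group that allows the rest.
BLOCK_GOOGLE_EXTENDED = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def check_tokens(robots_txt: str, url: str) -> dict[str, bool]:
    """Return whether each Google token may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in ("Googlebot", "Google-Extended")}


result = check_tokens(BLOCK_GOOGLE_EXTENDED, "https://example.com/page")
if not result["Googlebot"]:
    print("Warning: Googlebot is blocked, which affects Google Search")
print(result)  # {'Googlebot': True, 'Google-Extended': False}
```

If a change meant only to opt out of AI training flips `Googlebot` to `False`, the rule is targeting the wrong token.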

Quick Reference

Platform
Google
Agent Category
Growth Value
robots.txt Token
Google-Extended (no separate HTTP user-agent string; requests use existing Google user-agent strings)
robots.txt Entry
User-agent: Google-Extended
Disallow: /

See which agents visit your site

Monitor real-time AI agent and bot activity on your site for free with Siteline Agent Analytics


