Enterprise Search Infrastructure and Product Information

Enterprise Search is powered by multiple Google Search Appliances. Two primary production appliances support up to 15 million documents and are geographically distributed, residing in the St. Louis and Bellevue data centers. A load balancer sits in front of these appliances and directs traffic between them. Each appliance can handle up to 30 queries per second, and the two appliances crawl the intranet independently of each other.

There are also multiple development appliances and one pre-prod appliance. Development appliances support up to 500,000 documents. The pre-prod environment is a legacy production appliance that can index up to 30 million documents, albeit on less capable hardware than production.

Enterprise Search Architecture

Each search appliance has its own crawler, whose requests will appear in web server logs. Webmasters should remove crawler requests from their web logs, because crawler traffic can artificially inflate the number of visits to a website. The following information identifies all of our crawlers on the intranet:

Address Name    Environment    Crawl IP    User Agent Name
                DEV                        googlebot-dev
                DEV                        googlebot-dev1
                DEV                        googlebot-dev2
                DEV                        googlebot-devthree
                BETA                       googlebot-beta
                PROD                       googlebot-GW4
                PRE-PROD                   googlebot-GW6
                PRE-PROD                   googlebot-GWSeven
                PROD                       googlebot-GWEight
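
As an illustration of removing crawler requests from web logs, the following is a minimal sketch in Python. It assumes the web server writes the Combined Log Format, where the user agent is the last double-quoted field on each line, and that a case-insensitive substring match against the user agent names listed above is sufficient. The function names are hypothetical and not part of any appliance tooling.

```python
import re

# User agent names taken from the crawler list above.
CRAWLER_USER_AGENTS = (
    "googlebot-dev",
    "googlebot-dev1",
    "googlebot-dev2",
    "googlebot-devthree",
    "googlebot-beta",
    "googlebot-GW4",
    "googlebot-GW6",
    "googlebot-GWSeven",
    "googlebot-GWEight",
)

# Assumption: Combined Log Format, so the user agent is the
# last double-quoted field on the line.
_USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def is_crawler_request(log_line):
    """Return True if the line's user agent matches a known crawler."""
    match = _USER_AGENT_RE.search(log_line)
    if match is None:
        # No quoted trailing field: treat the line as non-crawler traffic.
        return False
    user_agent = match.group(1).lower()
    return any(name.lower() in user_agent for name in CRAWLER_USER_AGENTS)


def strip_crawler_requests(log_lines):
    """Return only the log lines that did not come from a crawler."""
    return [line for line in log_lines if not is_crawler_request(line)]
```

Applying strip_crawler_requests to the lines of an access log before counting visits keeps crawler traffic out of the totals. If a site logs in a different format, only the regular expression that locates the user agent should need to change.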