This site is for demonstration purposes and has been scrubbed for proprietary data.

Enterprise Search Infrastructure and Product Information

Enterprise Search is powered by multiple Google Search Appliances. Two primary production appliances, which support up to 15 million documents, are geographically distributed across the St. Louis and Bellevue data centers. A load balancer sits in front of these appliances and distributes traffic between them. Each appliance can handle up to 30 queries per second, and the two appliances crawl the intranet independently of each other.
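The text does not say how the load balancer chooses an appliance. As a minimal sketch, assuming simple round-robin with health checks, the class below shows how the stated limit of 30 queries per second per appliance adds up: a nominal ceiling of 60 QPS with both production appliances healthy, and 30 QPS if one site is lost. The class, its methods, and the appliance names are hypothetical.

```python
from itertools import cycle

# Per-appliance limit stated above; everything else in this sketch is hypothetical.
MAX_QPS_PER_APPLIANCE = 30


class LoadBalancer:
    """Hypothetical round-robin balancer that skips unhealthy appliances."""

    def __init__(self, appliances):
        self.appliances = list(appliances)
        self.healthy = set(self.appliances)
        self._ring = cycle(self.appliances)  # endless round-robin order

    def mark_down(self, name):
        self.healthy.discard(name)

    def mark_up(self, name):
        if name in self.appliances:
            self.healthy.add(name)

    def capacity_qps(self):
        # Nominal ceiling = number of healthy appliances x per-appliance limit.
        return len(self.healthy) * MAX_QPS_PER_APPLIANCE

    def route(self):
        # Walk the ring at most once looking for a healthy appliance.
        for _ in range(len(self.appliances)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy search appliance available")


lb = LoadBalancer(["stlouis-prod", "bellevue-prod"])
print(lb.capacity_qps())               # 60
print([lb.route() for _ in range(4)])  # ['stlouis-prod', 'bellevue-prod', 'stlouis-prod', 'bellevue-prod']
lb.mark_down("bellevue-prod")
print(lb.capacity_qps())               # 30
print([lb.route() for _ in range(2)])  # ['stlouis-prod', 'stlouis-prod']
```

Because both appliances crawl and index on their own, either one can answer any query, which is what makes this kind of failover possible; the cost is that losing a site halves the available query capacity.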

There are also multiple development appliances and one pre-prod appliance. Development appliances support up to 500,000 documents. The pre-prod environment is a legacy production appliance that can index up to 30 million documents, albeit on less capable hardware than production.
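When planning a test crawl, the index limits above can be checked mechanically before choosing an environment. The helper below is hypothetical; it only compares a document count against the figures in this section, and treating the 15 million production figure as a per-appliance limit is an assumption.

```python
# Document limits quoted in this section. Treating the production figure as a
# per-appliance limit is an assumption (each production appliance crawls the
# intranet on its own, so each presumably holds a full index).
INDEX_LIMITS = {
    "DEV": 500_000,
    "PRE-PROD": 30_000_000,
    "PROD": 15_000_000,
}


def environments_that_fit(doc_count):
    """Return the environments whose stated limit covers doc_count documents."""
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    return sorted(env for env, limit in INDEX_LIMITS.items() if doc_count <= limit)


print(environments_that_fit(250_000))     # ['DEV', 'PRE-PROD', 'PROD']
print(environments_that_fit(2_000_000))   # ['PRE-PROD', 'PROD']
print(environments_that_fit(20_000_000))  # ['PRE-PROD']
```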


Enterprise Search Architecture



Each search appliance has its own crawler, and its requests will appear in your web server logs. If you are a webmaster, it is good practice to remove crawler requests from your web logs, because crawler traffic can artificially inflate the number of visits reported for your website. The following table identifies each of our crawlers on the intranet:

Host Name                  Environment  Crawl IP        User-Agent Name
hostname.xxx.boeing.com    DEV          xxx.xxx.73.225  googlebot-dev
hostname1.xxx.boeing.com   DEV          xxx.xxx.73.126  googlebot-dev1
hostname2.xxx.boeing.com   DEV          xxx.xxx.73.184  googlebot-dev2
hostname3.xxx.boeing.com   DEV          xxx.xxx.72.221  googlebot-devthree
googlebeta.xxx.boeing.com  BETA         xxx.xxx.76.210  googlebot-beta
hostname4.xxx.boeing.com   PROD         xxx.xxx.17.18   googlebot-GW4
hostname6.xxx.boeing.com   PRE-PROD     xxx.xxx.200.15  googlebot-GW6
hostname7.xxx.boeing.com   PRE-PROD     xxx.xxx.192.69  googlebot-GWSeven
hostname8.xxx.boeing.com   PROD         xxx.xxx.16.35   googlebot-GWEight
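A minimal sketch of the log cleanup recommended above, assuming your server writes the Apache/NCSA "combined" log format and assuming each crawler's User Agent Name from the table appears somewhere in the User-Agent field of its requests. The function names are hypothetical, and because the crawl IPs are redacted on this page, IP matching is left as an empty set for you to fill in.

```python
import re

# User Agent Names from the crawler table above.
CRAWLER_AGENTS = (
    "googlebot-dev", "googlebot-dev1", "googlebot-dev2", "googlebot-devthree",
    "googlebot-beta", "googlebot-GW4", "googlebot-GW6", "googlebot-GWSeven",
    "googlebot-GWEight",
)
# Crawl IPs are redacted on this page; add the real addresses here if you
# also want to match on the client IP.
CRAWLER_IPS = set()

# Assumed combined log layout:
# ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \S+ \S+ "[^"]*" "([^"]*)"')

_AGENTS_LOWER = tuple(name.lower() for name in CRAWLER_AGENTS)


def is_crawler(line):
    """True if a log line came from one of the search appliance crawlers."""
    match = COMBINED.match(line)
    if not match:
        return False  # unparseable lines are kept, not silently dropped
    ip, agent = match.groups()
    agent = agent.lower()
    return ip in CRAWLER_IPS or any(name in agent for name in _AGENTS_LOWER)


def strip_crawlers(lines):
    """Yield only the log lines that did not come from a crawler."""
    return (line for line in lines if not is_crawler(line))


sample = [
    '10.0.0.5 - - [01/Jan/2020:10:00:00 -0600] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.9 - - [01/Jan/2020:10:00:01 -0600] "GET / HTTP/1.1" 200 512 "-" "googlebot-GW4"',
]
print(len(list(strip_crawlers(sample))))  # 1
```

Matching on substrings is deliberately loose: `googlebot-dev` alone already matches `googlebot-dev1`, `googlebot-dev2`, and `googlebot-devthree`, so the full list is kept mainly for readability. To clean a real file you could write something like `sys.stdout.writelines(strip_crawlers(open("access.log")))`, where `access.log` is a placeholder path.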