July 10, 2010 Derek Lackey
from Nancy Gohring, IDG News
Google has introduced a new Web indexing system to provide users with more up-to-date search results, the company said Tuesday. The new system, called Caffeine, delivers results that are closer to “live” than Google’s previous system, the company said.
Previously, Google would crawl a fraction of the Web each night, index it, and then push the updated index out to its search results. With Caffeine, Google indexes new information as soon as its crawlers find it. “We process it immediately so we can serve it seconds later,” said Matt Cutts, the head of Google’s webspam team. He unveiled the news at the Search Marketing Expo in Seattle.
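The difference between the two approaches can be illustrated with a toy example. The Python sketch below is not Google's code and makes no claim about how Caffeine is built; the class names BatchIndex and IncrementalIndex, the word-to-URL index, and the example URL are all hypothetical, chosen only to show why a page crawled under a batch scheme stays invisible until the next rebuild, while an incremental scheme makes it searchable right away.

```python
from collections import defaultdict


class BatchIndex:
    """Hypothetical batch indexer: crawled pages wait for a rebuild."""

    def __init__(self):
        self.pending = []               # crawled but not yet searchable
        self.index = defaultdict(set)   # word -> set of URLs

    def crawl(self, url, text):
        # Crawling only queues the page; nothing is indexed yet.
        self.pending.append((url, text))

    def rebuild(self):
        # The periodic (for example nightly) step that publishes queued pages.
        for url, text in self.pending:
            for word in text.lower().split():
                self.index[word].add(url)
        self.pending.clear()

    def search(self, word):
        return set(self.index.get(word.lower(), set()))


class IncrementalIndex:
    """Hypothetical incremental indexer: pages are indexed as they are crawled."""

    def __init__(self):
        self.index = defaultdict(set)   # word -> set of URLs

    def crawl(self, url, text):
        # Indexing happens in the same step as crawling.
        for word in text.lower().split():
            self.index[word].add(url)

    def search(self, word):
        return set(self.index.get(word.lower(), set()))


batch = BatchIndex()
batch.crawl("http://example.com/story", "Breaking news story")
before_rebuild = batch.search("breaking")   # empty: page is still pending
batch.rebuild()
after_rebuild = batch.search("breaking")    # now contains the URL

live = IncrementalIndex()
live.crawl("http://example.com/story", "Breaking news story")
right_away = live.search("breaking")        # contains the URL immediately
```

In this sketch the only difference between the two classes is when the indexing work happens, which is the distinction Cutts describes.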
When Google started, it would update its index only every four months, he said. Around 2000, it started indexing every month in a process that took a week to 10 days. “The funny thing is, we didn’t have enough capacity to update all our data centers at once,” he said. That meant that people might get different results when searching for the same term if they were hitting different Google data centers.
Google Caffeine went live “in the last few days” and is now being used in all Google data centers, he said.
In addition to serving “fresher” results, Caffeine “massively increases our ability to scale up,” Cutts said. The company will be able to index many more documents — “on the order of 100 petabytes,” he said.
Google Caffeine adds new information at a rate of hundreds of thousands of gigabytes per day, Google said in a blog post.
The progression in how Google does its indexing mirrors how people increasingly expect to find the very latest information online. Google first noticed that expectation after the Sept. 11, 2001, attacks on the U.S., when people were searching for the most up-to-the-minute information possible, Cutts said.
Derek Lackey, Raven5 Ltd, July 2010