
New Google indexing system


Google presents its new search index: Caffeine

Google has announced that its new Web indexing system, known as Caffeine, is ready.

Caffeine provides 50% fresher results than the previous index and is the largest collection of web content Google has offered to date. Whether it is a news story, a blog post, or a message posted on a forum, links to relevant content can now be found much sooner after publication than was previously possible.

Some background helps here: when we search on Google, we are not searching the Internet directly but an index that Google has built of it. Like the index found at the back of a book, it helps us locate exactly the information we need.
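The book-index analogy can be made concrete with a toy inverted index. The sketch below is only an illustration of the general idea, not a description of how Google's index is implemented; the function names `build_index` and `search` and the sample URLs are hypothetical.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first posting set so the intersection never mutates the index.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample pages, used only for illustration.
pages = {
    "example.com/a": "fresh news about search engines",
    "example.com/b": "a blog post about coffee",
}
index = build_index(pages)
matches = search(index, "about search")
```

A query never touches the page texts themselves: it only looks up words in the prebuilt index, which is why the freshness of the index determines the freshness of the results.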

So why has Google built a new indexing system for search? Web content is growing, and not only in size and number of pages: with the advent of video, images, news, and real-time updates, the average web page is richer and more complex. People's expectations of search are also higher than they used to be. Searchers want to find the latest relevant content, and publishers expect their content to be found as soon as they publish it.

That's why Google built Caffeine.

The old index had multiple layers, some of which were updated faster than others; the main layer, for example, was updated every two weeks. To update a layer of the old index, Google had to analyze the entire web, which meant there was a significant lag between the date a page became available on the Internet and the moment users could find it in search results.

With Caffeine, the web is analyzed in small portions and the search index is updated continuously, on a global scale. As Google finds new pages, or new information on existing pages, it can add that content directly to the index. That means users can find fresher information than ever before, regardless of when or where it was published.
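The difference between rebuilding a whole layer and updating continuously can be sketched in a few lines. This is a minimal illustration of incremental index updates in general, under the assumption of a simple word-to-URL index; the class `IncrementalIndex` and its methods are hypothetical and do not reflect Caffeine's actual design.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy index that is updated one page at a time, with no full rebuild."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> URLs containing it
        self.page_words = {}              # URL -> words currently indexed

    def update_page(self, url, text):
        """Index a new page, or re-index a changed one, touching only its words."""
        new_words = set(text.lower().split())
        old_words = self.page_words.get(url, set())
        # Remove postings for words that disappeared from the page.
        for word in old_words - new_words:
            self.postings[word].discard(url)
            if not self.postings[word]:
                del self.postings[word]
        # Add postings for words that are new on the page.
        for word in new_words - old_words:
            self.postings[word].add(url)
        self.page_words[url] = new_words

    def lookup(self, word):
        """Return a copy of the URLs currently indexed for a word."""
        return set(self.postings.get(word.lower(), set()))


idx = IncrementalIndex()
idx.update_page("example.com/news", "election results pending")
# The page changes; only this page is re-processed, not the whole collection.
idx.update_page("example.com/news", "election results final")
```

After the second call, a lookup for "final" finds the page immediately and "pending" no longer does, without any other page having been re-analyzed.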

Caffeine allows web pages to be indexed on a large scale, processing hundreds of thousands of pages in parallel every second. Caffeine uses nearly 100 million gigabytes of database storage and adds new information at a rate of hundreds of thousands of gigabytes per day.

Google has built Caffeine with the future in mind. The new index not only offers fresher information but also provides a solid foundation on which to build a faster and more comprehensive search engine, one that scales with the growth of information online and delivers even more relevant search results. So stay tuned for the improvements that will arrive in the coming months.

Source: Google Blog
