Distributed search engine

A distributed search engine is a search engine with no central server. Unlike traditional centralized search engines, tasks such as crawling, data mining, indexing, and query processing are distributed among several peers in a decentralized manner, with no single point of control.
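As a rough illustration of how indexing and query processing might be split among peers, the following sketch assigns each search term to one peer by hashing it, so that every peer holds only a shard of the inverted index. This is a generic technique shown under stated assumptions, not the scheme of any engine described in this article; the peer names, the function names, and the whitespace tokenizer are all hypothetical.

```python
import hashlib


def peer_for_term(term: str, peers: list[str]) -> str:
    """Pick the peer responsible for a term by hashing it (illustrative only)."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    return peers[int.from_bytes(digest[:8], "big") % len(peers)]


def build_distributed_index(
    docs: dict[str, str], peers: list[str]
) -> dict[str, dict[str, set[str]]]:
    """Build one inverted-index shard per peer: peer -> term -> document ids."""
    shards: dict[str, dict[str, set[str]]] = {p: {} for p in peers}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            shard = shards[peer_for_term(term, peers)]
            shard.setdefault(term, set()).add(doc_id)
    return shards


def search(
    query: str, shards: dict[str, dict[str, set[str]]], peers: list[str]
) -> set[str]:
    """Ask the responsible peer for each query term and intersect the results."""
    results = None
    for term in query.lower().split():
        postings = shards[peer_for_term(term, peers)].get(term, set())
        results = postings if results is None else results & postings
    return results or set()


if __name__ == "__main__":
    peers = ["peer-a", "peer-b", "peer-c"]  # hypothetical peer names
    docs = {
        "doc1": "peer to peer search",
        "doc2": "central search server",
    }
    shards = build_distributed_index(docs, peers)
    print(search("peer search", shards, peers))
```

In this sketch no peer sees the whole index, and a query touches only the peers responsible for its terms. A real network would also need to handle peers joining and leaving, which simple modulo hashing does not address.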

History

Rorur

The short-term goal of the Rorur project is to create a distributed search engine that runs on a network of ordinary users' computers. According to the project's whitepaper (https://rorur.com/whitepaper), competitive latency and delivery of the requested rank can be achieved if the number of participating nodes is large enough and the fraction of malicious nodes does not exceed a calculable threshold. The architecture builds on open-source algorithms that rely on public contribution for development and maintenance. To incentivize those who join and contribute, advertising revenue is distributed among node maintainers. The long-term goal is to have built-in personal search agents that construct and maintain personal knowledge graphs to assist human-web interaction.^[1]

Presearch

Presearch is a search engine powered by a distributed network of community-operated nodes, which aggregate results from a variety of sources. This network powers the searches at https://engine.presearch.org/search. It is planned as a precursor to a system in which each node collaborates on a global decentralised index.^[2]

YaCy

On December 15, 2003, Michael Christen announced the development of a P2P-based search engine, eventually named YaCy, on the heise online forums.^[3]^[4]

Dews

Dews is a theoretical design for a distributed search engine discussed in academic literature.^[5]

Seeks

Seeks was an open-source web search proxy and collaborative distributed tool for web search. It ceased to have a usable release in 2016.

InfraSearch

In April 2000, several programmers (including Gene Kan and Steve Waterhouse) built a prototype P2P web search engine based on Gnutella, called InfraSearch. The technology was later acquired by Sun Microsystems and incorporated into the JXTA project.^[6] It was meant to run inside the participating websites' databases, creating a P2P network that could be accessed through the InfraSearch website.^[7]^[8]^[9]

OpenCOLA

On May 31, 2000, Steelbridge Inc. announced the development of OpenCOLA, a collaborative, distributed open-source search engine.^[10] It runs on the user's computer, crawls the web pages and links that the user puts in their opencola folder, and shares the resulting index over its P2P network.^[11]

FAROO

In February 2001, Wolf Garbe published the idea of a peer-to-peer search engine,^[12] started work on the Faroo prototype in 2004,^[13] and released it in 2005.^[14]^[15]

Goals

The goals of building a distributed decentralized search engine include:

1. to create an independent search engine powered by the community;

2. to make the search operation open and transparent by relying on open-source software;

3. to distribute the advertising revenue to node maintainers, which may help create more robust web infrastructure;

4. to allow researchers to contribute to the development of open-source, publicly maintainable ranking algorithms and to oversee the training of the algorithm parameters.

Challenges

1. The amount of data to be processed is enormous. The size of the visible web is estimated at about 5 PB, spread over roughly 10 billion pages.

2. The latency of the distributed operation must be competitive with that of commercial search engines.

3. A mechanism must be developed that prevents malicious users from corrupting the distributed data structures or the rank.
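
The first and third challenges can be made more concrete with some back-of-envelope arithmetic. The sketch below is illustrative only: the node count, the replication factor, and the majority-vote model are assumptions introduced here, not figures or mechanisms taken from any of the projects above. It computes the average page size implied by the estimate in the first item, the storage each node would carry if the data were spread evenly with a given number of copies, and the probability that malicious nodes form a majority among a set of independently chosen replicas.

```python
from math import comb

PB = 10**15  # decimal petabyte, in bytes


def average_page_bytes(total_bytes: int, pages: int) -> float:
    """Average size of one page given the total corpus size."""
    return total_bytes / pages


def per_node_bytes(total_bytes: int, nodes: int, replication: int) -> float:
    """Storage per node if the corpus is spread evenly with `replication` copies."""
    return total_bytes * replication / nodes


def majority_compromised(malicious_fraction: float, replicas: int) -> float:
    """Probability that more than half of `replicas` nodes are malicious.

    Assumes each replica is malicious independently with probability
    `malicious_fraction` (a simplifying assumption, binomial model).
    """
    need = replicas // 2 + 1  # smallest strict majority
    return sum(
        comb(replicas, k)
        * malicious_fraction**k
        * (1 - malicious_fraction) ** (replicas - k)
        for k in range(need, replicas + 1)
    )


if __name__ == "__main__":
    total = 5 * PB             # estimate from the first challenge
    pages = 10 * 10**9         # estimate from the first challenge
    print(average_page_bytes(total, pages))
    # Hypothetical network: 100,000 nodes, 3 copies of every item.
    print(per_node_bytes(total, nodes=100_000, replication=3))
    # Hypothetical: 10% malicious nodes, answers cross-checked over 5 replicas.
    print(majority_compromised(0.1, 5))
```

Under these assumptions the estimate works out to about 500 KB per page, and the hypothetical network of 100,000 nodes with three copies of each item would need about 150 GB per node. The last function shows why a bound on the fraction of malicious nodes matters: in this model, the chance of a corrupted majority grows with that fraction and shrinks as more replicas are consulted, at the cost of extra latency.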

References

[1] "Distributed Search Engine".
[2] "Presearch is a Decentralized Search Engine".
[3] "YaCy: News". Archived from the original on 2005-11-24.
[4] Michael Christen. "Ich entwickle eine P2P-basierende Suchmaschine. Wer macht mit?" [I am developing a P2P-based search engine. Who wants to join?] (in German). heise online.
[5] "DEWS: A decentralized engine for Web search".
[6] Justin Hibbard. "Can peer-to-peer grow up?". Red Herring. [permanent dead link]
[7] Simon Foust. "Move Over Yahoo, Here Comes InfraSearch". Dmusic. Archived from the original on 2000-10-13.
[8] Sean M. Dugan. "Peer-to-peer networking is poised to revolutionize the Internet once again". InfoWorld. Archived from the original on 2000-10-18.
[9] John Borland. "Napster-like technology takes Web search to new level". Cnet.
[10] David Akin. "Software launched with a little pop". Financial Post. [dead link]
[11] Paul Heltzel. "OpenCola-Have Some Code and a Smile". Technology Review.
[12] Wolf Garbe. "BINGOOO - Die Transformation des World Wide Web zur virtuellen Datenbank" (in German). Wirtschaftinformatik. Archived from the original on 2014-02-02. Retrieved 2010-12-21. ... Wir setzen dem das Konzept einer verteilten Peer-to-Peer-Suchmaschine entgegen [We counter with the concept of a distributed peer-to-peer search engine] ...
[13] Bernard Lunn. "Technical Q&A With FAROO Founder". ReadWriteWeb. Archived from the original on 2011-02-14. ... When I started to work on the first prototype in 2004 ...
[14] "FAROO: History". Archived from the original on 2008-03-22.
[15] "Revisited: Deriving crawler start points from visited pages by monitoring HTTP traffic". Faroo.

