History of search engine development
In the early days of the Internet, the number of users was small and the volume of available information was modest. Access was mostly limited to staff at universities and laboratories, and the network as a whole was used for scientific purposes. At that time, finding information on the Internet was nowhere near as pressing a problem as it is today.
One of the first ways of organising access to the network's information resources was the site catalogue (directory), in which links to resources were grouped by subject. The first such project was Yahoo, which opened in April 1994. Once the number of sites in the Yahoo catalogue had grown considerably, the ability to search the catalogue was added. This was not, of course, a search engine in the full sense, because the search covered only the resources listed in the catalogue rather than all resources on the Internet.
Link catalogues were once widely used but have now largely lost their popularity. The reason is simple: even modern catalogues that list a large number of resources cover only a very small part of the Internet. The largest catalogue, DMOZ (the Open Directory Project), contains information on 5 million resources, while the Google search engine's database consists of more than 8 billion documents.
WebCrawler, which appeared in 1994, became the first full-fledged search engine.
In 1995 the search engines Lycos and AltaVista appeared. For many years AltaVista was the leader in Internet search.
In 1997 Sergey Brin and Larry Page created Google as part of a research project at Stanford University. Google is currently the most popular search engine in the world.
On September 23, 1997, Yandex was officially announced; it is the most popular search engine in the Russian-speaking part of the Internet.
There are currently three major international search engines, Google, Yahoo and MSN Search, each with its own database and search algorithms. Most other search engines (of which there are many) use the results of these three in one form or another. For example, AOL search (search.aol.com) and Mail.ru use the Google database, while AltaVista, Lycos and AllTheWeb use the Yahoo database.
In Russia the leading search engine is Yandex, followed by Rambler, Google.ru, Aport, Mail.ru and KM.ru.
General principles of how search engines work
A search engine consists of the following main components:
Spider: a browser-like program that downloads web pages.
Crawler (a "travelling" spider): a program that automatically follows all the links found on a page.
Indexer: a program that analyzes the web pages downloaded by spiders.
Database: a store of the downloaded and processed pages.
Search engine results engine (the results delivery system): retrieves search results from the database.
Web server: the server that handles interaction between the user and the other components of the search engine.
The detailed implementation of these mechanisms differs from one search engine to another (for example, the Spider+Crawler+Indexer bundle can be implemented as a single program that downloads known web pages, analyzes them and follows their links in search of new resources), but the common features described here are inherent in all search engines.
Spider. The spider is a program that downloads web pages in the same way as a user's browser. The difference is that the browser displays the information contained on the page (text, graphics, etc.), whereas the spider has no visual component and works directly with the page's HTML text (you can use your browser's "view HTML source" function to see the raw HTML text).
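To make the idea concrete, here is a minimal sketch of a spider in Python using only the standard library. It downloads a page and returns the raw HTML text without rendering anything. The function name fetch_html and the User-Agent string are assumptions made for this illustration; real spiders also handle robots.txt, retries, redirect limits and rate limiting, none of which is shown here.

```python
import urllib.request


def fetch_html(url, timeout=10.0):
    """Download a page and return its raw HTML text, without rendering it."""
    # The User-Agent value is a made-up example, not a real crawler name.
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-spider/0.1"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html with a page address returns the same text you would see in the browser's source view.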
Crawler. The crawler extracts all the links present on a page. Its task is to determine where the spider should go next, based on those links or on a predefined list of addresses. By following the links it finds, the crawler searches for new documents that are still unknown to the search engine.
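The crawler's job can be sketched roughly as follows: extract the links from each page, resolve them to absolute addresses, and keep a queue of addresses that have not been visited yet. This is only an illustration under simple assumptions: the names LinkExtractor, extract_links and crawl are invented here, the fetch argument stands in for the spider, and the breadth-first order is one possible strategy among many.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute URLs of all <a href="..."> links on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop the #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def crawl(seed_urls, fetch, max_pages=100):
    """Visit pages breadth-first, starting from a preset list of addresses.

    fetch is any callable that returns the HTML of a URL, or None on failure.
    """
    frontier = deque(seed_urls)  # addresses waiting to be visited
    seen = set(seed_urls)        # addresses already queued or visited
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

Passing the fetch function in as a parameter keeps the crawler separate from the spider, which matches the division of roles described in the text.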
Indexer. The indexer breaks a page down into its component parts and analyzes them. Various elements of the page, such as the text, headings, structural and style features, and special service HTML tags, are extracted and analyzed.
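As a rough illustration of breaking a page into parts, the sketch below separates the title, headings, meta tags and visible text of an HTML document. The class name PageAnalyzer and the choice of which elements to extract are assumptions for this example; a real indexer looks at many more features.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED_TAGS = {"script", "style"}  # their content is not visible text


class PageAnalyzer(HTMLParser):
    """Splits a page into title, headings, meta tags and visible text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.meta = {}
        self.text_parts = []
        self._context = "body"
        self._heading_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._context = "title"
        elif tag in HEADING_TAGS:
            self._context = "heading"
            self._heading_parts = []
        elif tag in SKIPPED_TAGS:
            self._context = "skip"
        elif tag == "meta":
            attributes = dict(attrs)
            if attributes.get("name") and attributes.get("content"):
                self.meta[attributes["name"].lower()] = attributes["content"]

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self.headings.append(" ".join(self._heading_parts))
        if tag == "title" or tag in HEADING_TAGS or tag in SKIPPED_TAGS:
            self._context = "body"

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text or self._context == "skip":
            return
        if self._context == "title":
            self.title += text
            return
        if self._context == "heading":
            self._heading_parts.append(text)
        # Heading text also counts as part of the page text.
        self.text_parts.append(text)


def analyze_page(html):
    analyzer = PageAnalyzer()
    analyzer.feed(html)
    analyzer.close()
    return {
        "title": analyzer.title,
        "headings": analyzer.headings,
        "meta": analyzer.meta,
        "text": " ".join(analyzer.text_parts),
    }
```

Keeping headings and meta tags separate from the body text lets later stages weigh them differently.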
Database. The database is the store of all the data that the search engine downloads and analyzes. The database is sometimes called the search engine's index.
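One common way to organise such an index is an inverted index, which maps each word to the pages that contain it. The text does not say how any particular search engine stores its data, so the following is only a small in-memory sketch under that assumption; the names tokenize and build_index are invented for the example.

```python
import re
from collections import defaultdict
from typing import Dict


def tokenize(text):
    """Split text into lowercase words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Map each word to {url: number of occurrences on that page}."""
    index = defaultdict(dict)
    for url, text in documents.items():
        for term in tokenize(text):
            index[term][url] = index[term].get(url, 0) + 1
    return dict(index)
```

With this structure, finding every page that mentions a word is a single dictionary lookup instead of a scan over all stored pages.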
Search Engine Results Engine. The results delivery system is responsible for ranking pages. It decides which pages satisfy the user's query and in what order they should be sorted. This is done according to the search engine's ranking algorithms. This information is the most valuable and interesting to us: it is this component of the search engine that an optimizer interacts with when trying to improve a site's position in the results, so later we will consider in detail all the factors that influence the ranking of results.
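Real ranking algorithms are not public and take many factors into account, so the sketch below should be read only as a toy illustration of the two decisions described here: which pages match the query, and in what order to list them. It scores pages with a simple TF-IDF formula, which is a textbook technique chosen for this example and not the method of any particular search engine. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, documents):
    """Return the URLs of pages matching the query, best match first."""
    doc_terms = {url: Counter(tokenize(text)) for url, text in documents.items()}
    n_docs = len(doc_terms)
    scores = {}
    for term in set(tokenize(query)):
        matching = [url for url, counts in doc_terms.items() if term in counts]
        if not matching:
            continue
        # Words that occur on few pages carry more weight.
        idf = math.log(1 + n_docs / len(matching))
        for url in matching:
            counts = doc_terms[url]
            tf = counts[term] / sum(counts.values())
            scores[url] = scores.get(url, 0.0) + tf * idf
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

Pages that contain none of the query words receive no score and are left out of the results entirely.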
Web server. As a rule, the server hosts an HTML page with an input field in which the user can enter the search term of interest. The web server is also responsible for delivering the results to the user in the form of an HTML page.
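The two roles of the web server, showing an input form and returning results as an HTML page, can be sketched as a small WSGI application. Everything here is an assumption for the sake of illustration: find_pages is a hypothetical stand-in for the results engine, and the example.test addresses are made up.

```python
from html import escape
from urllib.parse import parse_qs

FORM = (
    '<form method="get" action="/">'
    '<input name="q"><input type="submit" value="Search">'
    "</form>"
)


def find_pages(query):
    """Hypothetical stand-in for the results engine; returns matching URLs."""
    pages = {
        "http://example.test/spiders": "spider crawler",
        "http://example.test/ranking": "ranking results",
    }
    words = query.lower().split()
    return [
        url for url, text in pages.items()
        if any(word in text.split() for word in words)
    ]


def application(environ, start_response):
    """WSGI entry point: shows the search form and, if asked, the results."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0].strip()
    body = FORM
    if query:
        items = "".join(
            '<li><a href="{0}">{0}</a></li>'.format(escape(url))
            for url in find_pages(query)
        )
        body += "<p>Results for {}:</p><ul>{}</ul>".format(escape(query), items)
    payload = ("<html><body>" + body + "</body></html>").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]

# To try it locally, the standard library can serve the application:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, application).serve_forever()
```

The query text is escaped before it is placed in the page, so that a search term containing HTML is shown as text rather than interpreted by the browser.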