k2 apache: Understanding Web Crawlers

Friday, June 24, 2011

Understanding Web Crawlers

Web crawlers are programs that gather information for search engines by crawling the web. They are called crawlers because they move from one link to the next across websites, fetching information as they go. A crawler identifies itself to a web server through the User-Agent field of its HTTP request, so we can tell which crawlers have visited our site by examining the log of the web server on which the site is hosted.
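As a rough illustration of checking a server log for crawler visits, here is a minimal Python sketch. It assumes the log uses the Apache "combined" format, where the User-Agent is the last quoted field. The names `LOG_PATTERN`, `CRAWLER_TOKENS`, `identify_crawler` and `count_crawler_visits` are made up for this example, the token list is illustrative rather than complete, and the sample log lines (including their IP addresses) are invented.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format, with the User-Agent
# as the final quoted field on each line.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Lower-case substrings to look for in the User-Agent (illustrative only).
CRAWLER_TOKENS = {
    "googlebot": "Googlebot",
    "slurp": "Yahoo Slurp",
    "msnbot": "msnbot",
}

def identify_crawler(user_agent):
    """Return a crawler name if the User-Agent contains a known token, else None."""
    ua = user_agent.lower()
    for token, name in CRAWLER_TOKENS.items():
        if token in ua:
            return name
    return None

def count_crawler_visits(lines):
    """Count requests per known crawler; lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue
        name = identify_crawler(match.group("agent"))
        if name is not None:
            counts[name] += 1
    return counts

# Invented sample lines, using documentation-only IP addresses.
SAMPLE_LOG = [
    '192.0.2.1 - - [24/Jun/2011:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [24/Jun/2011:10:05:00 +0000] "GET /robots.txt HTTP/1.0" 200 120 '
    '"-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '203.0.113.9 - - [24/Jun/2011:10:06:00 +0000] "GET /index.html HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (Windows NT 6.1) Firefox/5.0"',
]

if __name__ == "__main__":
    for name, hits in count_crawler_visits(SAMPLE_LOG).items():
        print(name, hits)
```

Keep in mind that the User-Agent is supplied by the client, so any program can claim to be a well-known crawler. A match in the log is a hint, not proof.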

Many crawlers operate on the World Wide Web today, continuously crawling it to gather information about websites and helping their respective search engines keep their databases up to date. Some of the well-known web crawlers are as follows:

1- Google crawler- The Google crawler is based on C++ and Python. It is commonly known as Googlebot.

2- Yahoo crawler- The Yahoo search crawler is also known as 'Slurp'. It helps Yahoo maintain its search database.

3- MSN crawler- The MSN (Bing) crawler is known as msnbot. Bing uses this software to gather information from the web.

4- RBSE- This was the first published crawler for the World Wide Web.

5- WebRACE- This is a web crawler written in Java. A notable feature of WebRACE is that it continuously receives new starting URLs to crawl from.

6- Grub- It is an open-source (free) crawler used by Wikia Search.

7- Nutch- It is another web crawler written in Java, released under the Apache License.
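To make the link-to-link crawling idea concrete, here is a small Python sketch of a crawl loop whose queue can keep receiving new starting URLs, in the spirit of the WebRACE feature mentioned in item 5. It is not how WebRACE or any other crawler listed here is actually implemented. The names `LinkExtractor`, `Frontier`, `crawl` and `FAKE_WEB` are hypothetical, and the "web" is an in-memory dictionary so that nothing is fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(base_url, html):
    """Return absolute URLs for all links found in the page."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]

class Frontier:
    """Queue of URLs still to visit; add() may be called at any time with new seeds."""
    def __init__(self, seeds=()):
        self._queue = deque()
        self._seen = set()
        for url in seeds:
            self.add(url)

    def add(self, url):
        # Each URL is queued at most once.
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)

    def pop(self):
        return self._queue.popleft()

    def __bool__(self):
        return bool(self._queue)

def crawl(frontier, fetch, max_pages=100):
    """Visit pages breadth-first; fetch(url) returns HTML text or None."""
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.pop()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        for link in extract_links(url, html):
            frontier.add(link)
    return visited

# Stand-in for the real web: a made-up mapping from URL to page HTML.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a>',
    "http://example.com/b": '<a href="/">home</a>',
    "http://example.org/": "<p>no links here</p>",
}

if __name__ == "__main__":
    frontier = Frontier(["http://example.com/"])
    print(crawl(frontier, FAKE_WEB.get))
    frontier.add("http://example.org/")  # a new starting URL arrives later
    print(crawl(frontier, FAKE_WEB.get))
```

A real crawler would also need to fetch pages over HTTP, respect robots.txt, and limit its request rate. Those parts are left out here to keep the sketch short.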
