Robots exclusion standard

From Simple English Wikipedia, the free encyclopedia
Revision as of 00:50, 14 February 2012 by SassoBot (r2.7.2) (Robot: Adding bar, ca, cs, da, de, en, es, fi, fr, he, id, it, ja, ko, nl, pl, pt, ru, sv, tr, zh)

The robots exclusion standard (also called the robots exclusion protocol or robots.txt protocol) is a way of telling Web crawlers and other Web robots what parts of a Web site they can see.

To give robots instructions about which pages of a Web site they can access, site owners put a text file called robots.txt in the main directory of their Web site, e.g. http://www.example.com/robots.txt.[1] This text file tells robots which parts of the site they can and cannot access. However, robots can ignore robots.txt files, and malicious (bad) robots are especially likely to do so.[2] If the robots.txt file does not exist, Web robots assume that they can see all parts of the site.
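
The check that a well-behaved robot makes can be sketched in Python with the standard urllib.robotparser module. This is only a small sketch: the function name may_fetch, the robot name ExampleBot, and the /private/ directory are made up for this example. The text of the robots.txt file is passed in as a string, and None stands for a site that has no robots.txt file.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: Optional[str], user_agent: str, url: str) -> bool:
    """Return True if a polite robot may fetch the given URL.

    robots_txt is the text of the site's robots.txt file,
    or None if the site has no such file (an assumption of this sketch).
    """
    if robots_txt is None:
        # No robots.txt file: the robot assumes it can see the whole site.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: all robots must stay out of /private/.
rules = "User-agent: *\nDisallow: /private/\n"

print(may_fetch(rules, "ExampleBot", "http://www.example.com/private/notes.html"))  # False
print(may_fetch(rules, "ExampleBot", "http://www.example.com/index.html"))          # True
print(may_fetch(None, "ExampleBot", "http://www.example.com/private/notes.html"))   # True
```

Nothing in this check forces a robot to obey the answer. A robot that never runs such a check can still ask for any page.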

Examples of robots.txt files
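
Below are four common kinds of robots.txt file, written as Python strings so that they can be checked with the standard urllib.robotparser module. The directory names, the robot names BadBot and GoodBot, and the helper function allowed are made up for this sketch. They do not come from any real Web site.

```python
from urllib.robotparser import RobotFileParser

# Example 1: every robot may see every part of the site.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# Example 2: no robot may see any part of the site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Example 3: every robot must stay out of two directories.
BLOCK_SOME = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""

# Example 4: only the robot named BadBot is kept out of the site.
BLOCK_ONE_ROBOT = """\
User-agent: BadBot
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if the rules let user_agent fetch path on www.example.com."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, "http://www.example.com" + path)


print(allowed(ALLOW_ALL, "GoodBot", "/private/a.html"))       # True
print(allowed(BLOCK_ALL, "GoodBot", "/index.html"))           # False
print(allowed(BLOCK_SOME, "GoodBot", "/private/a.html"))      # False
print(allowed(BLOCK_SOME, "GoodBot", "/index.html"))          # True
print(allowed(BLOCK_ONE_ROBOT, "BadBot", "/index.html"))      # False
print(allowed(BLOCK_ONE_ROBOT, "GoodBot", "/index.html"))     # True
```

In each file, a User-agent line names the robot that the rules are for (* means all robots), and each Disallow line names a part of the site that the robot should not visit. An empty Disallow line means nothing is forbidden.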

References

  1. "Robot Exclusion Standard". HelpForWebBeginners.com. Retrieved 2012-02-13.
  2. "About /robots.txt". Robotstxt.org. Retrieved 2012-02-13.