Robots exclusion standard

From Simple English Wikipedia, the free encyclopedia
Revision as of 12:50, 16 January 2017 by 81.128.132.202 (talk)

The robots exclusion standard (also called the robots exclusion protocol or robots.txt protocol) is a way of telling Web crawlers and other Web robots which parts of a Web site they can see.

To give robots instructions about which pages of a Web site they may access, site owners put a text file called robots.txt in the main directory of their Web site, e.g. http://www.example.com/robots.txt.[1] This text file tells robots which parts of the site they can and cannot access. However, robots can ignore robots.txt files; malicious (bad) robots often do.[2] If the robots.txt file does not exist, Web robots assume that they may see all parts of the site.
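A well-behaved crawler reads the robots.txt file before fetching any page. Python's standard library includes urllib.robotparser for this. Below is a small sketch that checks two made-up URLs (the example.com addresses and the /private/ path are only for illustration) against a simple robots.txt:

```python
from urllib import robotparser

# Parse a simple robots.txt that blocks one directory for all robots.
# (The rules and URLs here are made up for illustration.)
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A polite crawler asks before fetching each URL.
print(rp.can_fetch("*", "http://www.example.com/private/data.html"))  # False
print(rp.can_fetch("*", "http://www.example.com/index.html"))         # True
```

In a real crawler you would call rp.set_url(...) and rp.read() to download the site's actual robots.txt instead of passing the lines by hand.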

Examples of robots.txt files
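A robots.txt file is made of groups of rules. Each group starts with a User-agent line saying which robots it is for, followed by Disallow lines saying which paths those robots should not visit. This example tells all robots (*) to stay out of two directories (the directory names here are only examples):

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
```

An empty Disallow line means robots may visit everything, while Disallow: / alone blocks the whole site.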

References

  1. "Robot Exclusion Standard". HelpForWebBeginners.com. Retrieved 2012-02-13.
  2. "About /robots.txt". Robotstxt.org. Retrieved 2012-02-13.