What does a robots.txt do?

As the file extension indicates, robots.txt is a plain, human-readable text file. Its purpose is to tell search engines such as Google or Bing which pages of a website should not be included in the search engine's index. The technical details of a robots.txt follow the specifications of the Robots Exclusion Standard.

However, the contents of a robots.txt are purely advisory. Whether the excluded parts of a website actually stay out of a search engine's index depends on the web crawlers honoring the directives in the robots.txt. In particular, a robots.txt cannot protect the content of a website from access by unauthorized persons.

Example of a robots.txt:

# robots.txt for example.com
# I exclude these web crawlers
User-agent: Sidewinder
Disallow: /

User-agent: Microsoft.URL.Control
Disallow: /

# These directories/files should not
# be crawled
User-agent: *
Disallow: /default.html
Disallow: /Temp/ # these contents will disappear soon
Disallow: /Privat/Familie/Geburtstage.html # Not secret, but should not be listed in search engines.
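How a well-behaved crawler evaluates such rules can be sketched with Python's standard urllib.robotparser module. The snippet below feeds it a simplified version of the example above (inline comments stripped, since support for them varies) and checks a few URLs; the URLs themselves are made up for illustration.

```python
from urllib import robotparser

# Simplified version of the example robots.txt above.
ROBOTS_TXT = """\
User-agent: Sidewinder
Disallow: /

User-agent: *
Disallow: /Temp/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT)

# Sidewinder is excluded from the whole site;
# all other crawlers only from /Temp/.
print(rp.can_fetch("Sidewinder", "http://example.com/index.html"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/Temp/a.html"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/index.html"))   # True
```

A crawler would normally load the file from http://example.com/robots.txt via set_url() and read() instead of parsing a string; the decision logic is the same either way.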

Further reference:

https://de.wikipedia.org/wiki/Robots_Exclusion_Standard

The following video explains the benefits of a robots.txt in connection with the Google search engine: