Robots.txt Tutorial

Example Robots.txt Format

Allow indexing of everything

User-agent: *
Disallow:

Disallow indexing of everything

User-agent: *
Disallow: /

Disallow indexing of a specific folder

User-agent: *
Disallow: /folder/

Disallow Googlebot from indexing a folder, except for one file in that folder

User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html
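
How an Allow line can override a broader Disallow line can be sketched in a few lines of Python. This is a minimal sketch, assuming the precedence Google documents, where the most specific (longest) matching path wins and Allow wins a tie; other crawlers may evaluate rules differently. The function name is_allowed and the (directive, path) rule tuples are hypothetical, chosen only for illustration.

```python
def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, prefix) pairs such as
    ("disallow", "/folder1/"). Assumption: the longest matching
    prefix wins, and "allow" wins when two matches tie in length.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if prefix == "":
            continue  # an empty value (e.g. "Disallow:") matches nothing
        if path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and directive == "allow"
            if longer or tie_for_allow:
                best_len = len(prefix)
                allowed = directive == "allow"
    return allowed


# Rules from the Googlebot example above.
googlebot_rules = [
    ("disallow", "/folder1/"),
    ("allow", "/folder1/myfile.html"),
]

one_file = is_allowed("/folder1/myfile.html", googlebot_rules)   # allowed
rest_of_folder = is_allowed("/folder1/other.html", googlebot_rules)  # blocked
elsewhere = is_allowed("/index.html", googlebot_rules)           # allowed
```

The same function also covers the first two examples: a lone empty Disallow blocks nothing, and a lone Disallow: / blocks every path.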

Robots.txt Wildcard Matching

Google and Microsoft's Bing allow the use of wildcards in robots.txt files.

To block access to all URLs that include a question mark (?), you could use the following entry:

User-agent: *
Disallow: /*?

You can use the $ character to match the end of the URL. For instance, to block any URLs that end with .asp, you could use the following entry:

User-agent: Googlebot
Disallow: /*.asp$
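
The two wildcard characters can be illustrated by translating a pattern into a regular expression. This is a rough sketch, assuming that * stands for any run of characters, that a trailing $ anchors the end of the URL, and that a pattern otherwise matches as a prefix. The helper names robots_pattern_to_regex and path_matches are hypothetical, and real crawlers may differ in details such as percent-encoding.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]  # drop the end-of-URL marker
    # Escape each literal piece and join the pieces with ".*" for each "*".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def path_matches(pattern, path):
    """Return True if `path` (path plus query string) matches `pattern`."""
    # re.match anchors at the start, so unanchored patterns act as prefixes.
    return robots_pattern_to_regex(pattern).match(path) is not None


has_query = path_matches("/*?", "/page?id=1")          # contains "?"
no_query = path_matches("/*?", "/page")                # no "?"
ends_asp = path_matches("/*.asp$", "/dir/file.asp")    # ends with .asp
asp_with_query = path_matches("/*.asp$", "/dir/file.asp?x=1")  # does not end with .asp
```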

Create a robots.txt file

The simplest robots.txt file uses two rules:

User-agent: the robot the following rule applies to
Disallow: the URL you want to block

These two lines are considered a single entry in the file. You can include as many entries as you want. You can include multiple Disallow lines and multiple user-agents in one entry.
Each section in the robots.txt file is separate and does not build upon previous sections. For example:

User-agent: *
Disallow: /folder1/

User-agent: Googlebot
Disallow: /folder2/

In this example, only the URLs matching /folder2/ would be disallowed for Googlebot. The /folder1/ rule belongs to the entry for all other bots and does not apply to Googlebot.
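
One way to check this behavior is with Python's standard urllib.robotparser module. This is only a sketch: it shows how that particular parser picks an entry for a user-agent, which is not necessarily identical to what any given search engine does, and the bot name SomeOtherBot and the page paths are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# The two entries from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/

User-agent: Googlebot
Disallow: /folder2/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has its own entry, so only /folder2/ is blocked for it.
googlebot_folder1 = parser.can_fetch("Googlebot", "/folder1/page.html")
googlebot_folder2 = parser.can_fetch("Googlebot", "/folder2/page.html")

# Any other bot falls back to the "*" entry, which blocks /folder1/ only.
other_folder1 = parser.can_fetch("SomeOtherBot", "/folder1/page.html")
other_folder2 = parser.can_fetch("SomeOtherBot", "/folder2/page.html")

print(googlebot_folder1, googlebot_folder2, other_folder1, other_folder2)
```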

User-agents and bots

A user-agent is a specific search engine robot. The Web Robots Database lists many common bots. You can set an entry to apply to a specific bot (by listing the name) or you can set it to apply to all bots (by listing an asterisk). An entry that applies to all bots looks like this:

User-agent: *

Google uses several different bots (user-agents). The bot it uses for web search is Googlebot. Other Google bots, such as Googlebot-Mobile and Googlebot-Image, follow the rules you set up for Googlebot, but you can also set up specific rules for these bots.

Blocking user-agents

The Disallow line lists the pages you want to block. You can list a specific URL or a pattern. The entry should begin with a forward slash (/).

To block the entire site, use a forward slash.
```
Disallow: /
```
To block a directory and everything in it, follow the directory name with a forward slash.
```
Disallow: /junk-directory/
```
To block a page, list the page.
```
Disallow: /private_file.html
```
To remove a specific image from Google Images, add the following:
```
User-agent: Googlebot-Image
Disallow: /images/dogs.jpg
```
To remove all images on your site from Google Images:
```
User-agent: Googlebot-Image
Disallow: /
```
To block files of a specific file type (for example, .gif), use the following:
```
User-agent: Googlebot
Disallow: /*.gif$
```
To prevent pages on your site from being crawled, while still displaying AdSense ads on those pages, disallow all bots other than Mediapartners-Google. This keeps the pages from appearing in search results, but allows the Mediapartners-Google robot to analyze the pages to determine the ads to show. The Mediapartners-Google robot doesn't share pages with the other Google user-agents. For example:
```
User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /
```

Note that the paths in directives are case-sensitive. For instance, Disallow: /junk_file.asp would block http://www.example.com/junk_file.asp, but would allow http://www.example.com/Junk_file.asp. Googlebot ignores whitespace (in particular, empty lines) and unknown directives in robots.txt.
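
The case-sensitivity of paths can be seen with a quick check against Python's standard urllib.robotparser module. This is a small sketch using the example URLs above. It reflects how that parser compares paths, under the assumption that it treats case the same way Googlebot does here.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /junk_file.asp",
])

# Same path except for the case of the first letter.
lowercase_ok = parser.can_fetch("Googlebot", "http://www.example.com/junk_file.asp")
capitalized_ok = parser.can_fetch("Googlebot", "http://www.example.com/Junk_file.asp")

print(lowercase_ok, capitalized_ok)
```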
