
12.19 robotparser -- Parser for robots.txt

This module provides a single class, RobotFileParser, which answers questions about whether or not a particular user agent can fetch a URL on the Web site that published the robots.txt file. For more details on the structure of robots.txt files, see http://www.robotstxt.org/wc/norobots.html.

 
class RobotFileParser( )

This class provides a set of methods to read, parse and answer questions about a single robots.txt file.

 
set_url( url)
Sets the URL referring to a robots.txt file.
 
read( )
Reads the robots.txt URL and feeds it to the parser.
 
parse( lines)
Parses the lines argument, which should be a list of lines from a robots.txt file.
 
can_fetch( useragent, url)
Returns True if the useragent is allowed to fetch the url according to the rules contained in the parsed robots.txt file.
 
mtime( )
Returns the time the robots.txt file was last fetched. This is useful for long-running web spiders that need to check for new robots.txt files periodically.
 
modified( )
Sets the time the robots.txt file was last fetched to the current time.

 

The following example demonstrates basic use of the RobotFileParser class.

>>> import robotparser
>>> rp = robotparser.RobotFileParser()
>>> rp.set_url("http://www.musi-cal.com/robots.txt")
>>> rp.read()
>>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
False
>>> rp.can_fetch("*", "http://www.musi-cal.com/")
True

  

 
