- By 2010-08-21
Safari | Python Developer's Handbook -> Accessing URLs
Accessing URLs

URL stands for uniform resource locator. URLs are strings, such as http://www.lessaworld.com/, that you type into your Web browser to jump to a Web page. Python provides the urllib and urlparse modules for processing URLs.

Tip

Applications that parse Web pages tend to break whenever the page design changes. These problems should ease as more structured formats (such as XML) are used to produce pages.

The urllib Module

The urllib module is a high-level interface for retrieving data across the World Wide Web. It supports HTTP, FTP, and gopher connections by using sockets. The module defines functions for programs that must act as active users of the Web, and it is normally used as an outer interface to other modules, such as httplib, ftplib, and gopherlib.

To retrieve a Web page, use the urllib.urlopen(url [,data]) function. It returns a stream object that can be manipulated as easily as a regular file object:

    >>> import urllib
    >>> page = urllib.urlopen("http://www.bog.frb.fed.us")
    >>> page.readline()

This stream object has two additional attributes: url and headers. The first is the URL that you opened, and the second is a dictionary-like object that contains the page headers, as illustrated in the next example.

    >>> page.url
    'http://www.bog.frb.fed.us'
    >>> for key, value in page.headers.items():
    ...     print key, " = ", value
    ...
    server = Microsoft-IIS/4.0
    content-type = text/html
    content-length = 461
    date = Thu, 15 Jun 2000 15:31:32 GMT

Next, you have a couple of other functions that are made available by the urllib module.
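The session above is written for the Python 2 urllib module. As a minimal sketch for readers on Python 3, where the same function is available as urllib.request.urlopen, the example below reads the first line, the URL, and the headers from the returned stream object. The helper name fetch_summary and the data: URL are assumptions made for illustration; the data: URL only keeps the sketch runnable without network access, and any http:// address could be passed instead.

```python
import urllib.request


def fetch_summary(url):
    """Open url and return (url, headers, first_line).

    fetch_summary is a hypothetical helper, not part of urllib.
    """
    # urlopen returns a file-like stream object; using it as a context
    # manager closes the connection when the block ends.
    with urllib.request.urlopen(url) as page:
        first_line = page.readline()  # bytes in Python 3, not str
        # page.headers behaves like a dictionary; header names are
        # lowercased here so lookups do not depend on their casing.
        headers = {key.lower(): value for key, value in page.headers.items()}
        return page.url, headers, first_line


if __name__ == "__main__":
    # Assumed stand-in for a real Web page so the demo works offline.
    demo_url = "data:text/plain,first%20line%0Asecond%20line"
    url, headers, first_line = fetch_summary(demo_url)
    print(url)
    print(first_line)
    for key, value in headers.items():
        print(key, "=", value)
```

Note that readline() returns bytes in Python 3, so text processing needs an explicit decode step with the page's character encoding.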
Note

If you have Python 2.0 installed, keep in mind that its urllib module can scan environment variables for proxy configuration. Python 2.0's version of urllib also supports https:// URLs over SSL.

The urlparse Module

The urlparse module manipulates a URL string, parsing it into tuples. It can break a URL into its components, combine the components again, and convert relative addresses to absolute addresses. Let's take a look at the functions that this module provides:

urlparse.urlparse()

syntax: urlparse.urlparse(urlstring [,default_scheme [,allow_fragments]])

The next example copies a Web page into a local file: ...
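The three urlparse operations described in this section (splitting a URL, putting it back together, and resolving a relative address) can be sketched as follows. This is a hedged illustration for Python 3, where the module's functions are available from urllib.parse; the helper names round_trip and resolve, and every path, query, and fragment under www.lessaworld.com, are hypothetical and chosen only for the demonstration.

```python
from urllib.parse import urljoin, urlparse, urlunparse


def round_trip(url):
    """Split url into its six components, then join them again."""
    parts = urlparse(url)
    return urlunparse(parts)


def resolve(base, relative):
    """Convert a relative address to an absolute one, given a base URL."""
    return urljoin(base, relative)


if __name__ == "__main__":
    # Hypothetical URL; only the host name appears in the text above.
    url = "http://www.lessaworld.com/index.html?lang=en#top"

    parts = urlparse(url)
    print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)

    # The second argument supplies a scheme when the URL has none.
    print(urlparse("//www.lessaworld.com/index.html", "http").scheme)

    # With allow_fragments=False the "#top" part is left in the path.
    print(urlparse("http://www.lessaworld.com/index.html#top",
                   allow_fragments=False).path)

    print(round_trip(url))
    print(resolve("http://www.lessaworld.com/books/index.html",
                  "../about.html"))
```

The parse result is a tuple of six strings (scheme, network location, path, parameters, query, fragment), which is why it can be handed straight back to urlunparse.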