Python Developer's Handbook

Accessing URLs

URL stands for Uniform Resource Locator. URLs are strings, such as http://www.lessaworld.com/, that you type into a Web browser to go to a Web page.

Python provides two useful modules for processing URLs: urllib and urlparse.

Tip

Applications that parse Web pages often break whenever a page's design changes. These problems should become less common as more structured formats (such as XML) are used to produce pages.



The urllib Module

The urllib module is a high-level interface for retrieving data across the World Wide Web. It supports HTTP, FTP, and gopher connections, which it makes over sockets, and its functions are meant for programs that act as active Web clients. It normally serves as a front end to lower-level modules such as httplib, ftplib, and gopherlib.

To retrieve a Web page, use the urllib.urlopen(url [,data]) function. It returns a file-like object that you can read just like any regular file object. For HTTP URLs, supplying the optional data argument sends it to the server in a POST request. For example:

						
>>> import urllib
>>> page = urllib.urlopen("http://www.bog.frb.fed.us")
>>> page.readline()

					

This object has two additional attributes: url and headers. The first holds the URL that was opened. The second is a dictionary-like object (a mimetools.Message instance) that contains the response headers, as the next example shows.

						
>>> page.url
'http://www.bog.frb.fed.us'
>>> for key, value in page.headers.items():
       print key, " = ", value

server  =  Microsoft-IIS/4.0
content-type  =  text/html
content-length  =  461
date  =  Thu, 15 Jun 2000 15:31:32 GMT
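In Python 3, urlopen() moved to urllib.request, and the session above can be sketched as follows. To keep the sketch runnable without network access, it opens a temporary local file through a file:// URL instead of a Web address. The file name and contents are made up for illustration. The returned object still behaves like a file and still exposes url and headers. Note that it yields bytes rather than str.

```python
# Python 3 sketch: urllib.urlopen() became urllib.request.urlopen().
# A temporary file:// URL replaces the Web address so it runs offline;
# "page.html" and CONTENT are hypothetical.
import tempfile
from pathlib import Path
from urllib.request import urlopen

CONTENT = b"<html>\n<body>Hello</body>\n</html>\n"

with tempfile.TemporaryDirectory() as tmp:
    page_path = Path(tmp) / "page.html"
    page_path.write_bytes(CONTENT)

    # The response is file-like: read(), readline(), and iteration work.
    with urlopen(page_path.as_uri()) as page:
        first_line = page.readline()             # bytes, not str, in Python 3
        page_url = page.url                      # the URL that was opened
        length = page.headers["Content-Length"]  # header lookup ignores case
        header_items = list(page.headers.items())

for key, value in header_items:
    print(key, " = ", value)
```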
				
					

The urllib module also provides the following functions.

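urllib.urlretrieve(url [,filename [,hook]]) Copies a network object to a local file and returns a (filename, headers) tuple. If filename is omitted, a temporary file is created. The optional hook function is called as the data blocks arrive.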

								
>>> urllib.urlretrieve('http://www.lessaworld.com', 'copy.html')

							

urllib.urlcleanup() Cleans up the cache used by urllib.urlretrieve.
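In Python 3, urlretrieve() and urlcleanup() live in urllib.request, where the documentation lists them as part of its legacy interface. The sketch below shows both. A temporary file:// URL again stands in for a Web page so the example runs offline. The file names and the hook function are illustrative. The hook receives a block count, the block size, and the total size (or -1 when the size is unknown).

```python
# Python 3 sketch of urlretrieve() with a progress hook, then urlcleanup().
# File names and the hook are hypothetical; a file:// URL keeps it offline.
import tempfile
from pathlib import Path
from urllib.request import urlcleanup, urlretrieve

CONTENT = b"<html><body>Copied page</body></html>"
progress = []  # (block_number, block_size, total_size) reported by the hook

def hook(block_number, block_size, total_size):
    """Called once when the transfer starts and again after each block."""
    progress.append((block_number, block_size, total_size))

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "index.html"
    source.write_bytes(CONTENT)
    target = Path(tmp) / "copy.html"

    # Like the Python 2 version, it returns a (filename, headers) tuple.
    filename, headers = urlretrieve(source.as_uri(), str(target), hook)
    copied = Path(filename).read_bytes()

urlcleanup()  # discard temporary files from earlier urlretrieve() calls
```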

urllib.quote(string [,safe]) Replaces special characters in string with %xx escape codes. Letters, digits, and the characters _.- are never quoted. The optional safe parameter lists additional characters that should not be quoted; it defaults to '/'.

								
>>> urllib.quote('This & that @ home')
'This%20%26%20that%20%40%20home'
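In Python 3, quote() is in urllib.parse. The short sketch below shows how safe behaves: characters listed there are left alone rather than escaped, and '/' is kept by default. The sample strings are arbitrary.

```python
# Python 3 sketch: characters in `safe` are NOT escaped; the default is "/".
from urllib.parse import quote

text = "This & that @ home"
default = quote(text)               # 'This%20%26%20that%20%40%20home'
keep_at = quote(text, safe="@")     # 'This%20%26%20that%20@%20home'

path = "/docs/my file.html"
keep_slash = quote(path)            # '/docs/my%20file.html'
no_safe = quote(path, safe="")      # '%2Fdocs%2Fmy%20file.html'

for result in (default, keep_at, keep_slash, no_safe):
    print(result)
```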

							

urllib.quote_plus(string [,safe]) Works like quote(), but also replaces spaces with plus signs, as required when quoting HTML form values.
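The sketch below contrasts quote() and quote_plus() in Python 3, where both are in urllib.parse, and uses unquote_plus() to undo the latter. Note that in Python 3 quote_plus() defaults safe to an empty string, so '/' is escaped too. The sample string is arbitrary.

```python
# Python 3 sketch: quote() vs. quote_plus(), and unquote_plus() to reverse it.
from urllib.parse import quote, quote_plus, unquote_plus

text = "This & that @ home/away"
pct = quote(text)              # 'This%20%26%20that%20%40%20home/away'
plus = quote_plus(text)        # 'This+%26+that+%40+home%2Faway' ('/' escaped too)
restored = unquote_plus(plus)  # turns '+' back into spaces, then unescapes

print(pct)
print(plus)
print(restored)
```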

urllib.unquote(string) Replaces %xx escapes with the characters they represent, reversing urllib.quote().

								
>>> urllib.unquote('This%20%26%20that%20%40%20home')
'This & that @ home'

							

urllib.urlencode(dict) Converts a dictionary (or a sequence of two-element tuples) into a URL-encoded string suitable for use as a query string.

								
>>> params = {'sex': 'female', 'name': 'renata lessa'}
>>> urllib.urlencode(params)
'sex=female&name=renata+lessa'
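In Python 3, urlencode() is in urllib.parse. The sketch below assumes Python 3.7 or later, where dictionaries keep insertion order, so the output order follows the dict literal (older versions did not guarantee this). It also shows the doseq flag and parse_qs(), which decodes a query string back into a dict of lists. The 'tag' key is made up for illustration.

```python
# Python 3 sketch: urlencode() and its inverse, parse_qs().
# Assumes Python 3.7+ (dicts preserve insertion order).
from urllib.parse import parse_qs, urlencode

params = {"sex": "female", "name": "renata lessa"}
query = urlencode(params)      # 'sex=female&name=renata+lessa'

# doseq=True expands a list value into repeated keys ('tag' is hypothetical).
tags = urlencode({"tag": ["python", "web"]}, doseq=True)   # 'tag=python&tag=web'

decoded = parse_qs(query)      # {'sex': ['female'], 'name': ['renata lessa']}

print(query)
print(tags)
print(decoded)
```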
						
							

Note

If you have Python 2.0 installed, keep in mind that its urllib module can read proxy settings from environment variables (such as http_proxy).

Python 2.0's urllib module also supports https:// URLs over SSL, provided Python was built with SSL support.
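In Python 3, the environment lookup is exposed as urllib.request.getproxies(), and ProxyHandler() calls it when given no mapping. The sketch below sets a made-up http_proxy value only for the duration of a with block (via unittest.mock.patch.dict) and reads it back. On Windows and macOS, getproxies() falls back to system settings only when no *_proxy variables are set.

```python
# Python 3 sketch: proxy settings read from *_proxy environment variables.
# The proxy address is hypothetical; patch.dict restores os.environ afterwards.
import os
from unittest import mock
from urllib.request import ProxyHandler, getproxies

FAKE_PROXY = "http://proxy.example.com:3128"

with mock.patch.dict(os.environ, {"http_proxy": FAKE_PROXY}):
    proxies = getproxies()      # maps scheme -> proxy URL, e.g. 'http' -> FAKE_PROXY
    handler = ProxyHandler()    # no argument: it calls getproxies() itself

print(proxies.get("http"))
print(handler.proxies.get("http"))
```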



The urlparse Module

The urlparse module manipulates URL strings. It can break a URL up into its components, combine the components back into a URL string, and convert relative addresses into absolute ones.

Let's take a look at the functions that are provided by this module:

urlparse.urlparse(urlstring [,default_scheme [,allow_fragments]]) Parses a URL into six elements (addressing scheme, network location, path, parameters, query, and fragment identifier) and returns them as a tuple. default_scheme is used only when the URL does not specify a scheme. If allow_fragments is false, fragment identifiers are not recognized:

								
>>> from urlparse import urlparse
>>> urlparse('http://www.python.org/FAQ.html')
('http', 'www.python.org', '/FAQ.html', '', '', '')
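In Python 3, urlparse() lives in urllib.parse. It returns a ParseResult, a named tuple whose six fields can be read by index or by name. In Python 3 the default-scheme parameter is called scheme. The sketch below uses www.example.com as a placeholder host.

```python
# Python 3 sketch: urlparse() returns a ParseResult named tuple.
from urllib.parse import urlparse

parts = urlparse("http://www.python.org/FAQ.html")
print(parts)
# ParseResult(scheme='http', netloc='www.python.org', path='/FAQ.html',
#             params='', query='', fragment='')

full = urlparse("http://user@www.example.com:8080/doc;type=a?q=urllib#top")
print(full.path, full.params, full.query, full.fragment)  # /doc type=a q=urllib top
print(full.hostname, full.port, full.username)            # www.example.com 8080 user

# `scheme` is used only when the URL has none of its own.
implied = urlparse("//www.python.org/FAQ.html", scheme="http")

# allow_fragments=False leaves '#top' inside the path.
kept = urlparse("http://www.python.org/FAQ.html#top", allow_fragments=False)
print(implied.scheme, kept.path, repr(kept.fragment))      # http /FAQ.html#top ''
```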

							

urlparse.urlunparse(tuple) Constructs a URL string from a tuple as returned by urlparse().
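In Python 3, urlunparse() is in urllib.parse. For ordinary URLs like the ones below, parsing and then unparsing returns the same string. The named-tuple method _replace() offers a convenient way to change one component before rebuilding. The query values are made up.

```python
# Python 3 sketch: rebuild URLs from six components with urlunparse().
from urllib.parse import urlparse, urlunparse

url = "http://www.python.org/FAQ.html"
rebuilt = urlunparse(urlparse(url))      # 'http://www.python.org/FAQ.html'

# _replace() is the standard named-tuple method for swapping one field.
with_query = urlunparse(urlparse(url)._replace(query="lang=en"))
# 'http://www.python.org/FAQ.html?lang=en'

# A plain 6-tuple works too: (scheme, netloc, path, params, query, fragment).
from_tuple = urlunparse(("http", "www.python.org", "/doc/lib", "", "x=1", ""))
# 'http://www.python.org/doc/lib?x=1'

print(rebuilt, with_query, from_tuple, sep="\n")
```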

urlparse.urljoin(base, url [,allow_fragments]) Combines an absolute (base) URL with a relative URL.

								
>>> from urlparse import urljoin
>>> urljoin('http://www.python.org', 'doc/lib')
'http://www.python.org/doc/lib'
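The sketch below shows urljoin() in Python 3, where it lives in urllib.parse, across a few common cases. The detail that most often surprises people is the trailing slash. The last path segment of the base is treated as a document and replaced, unless the base ends in '/'. The paths under www.python.org are only illustrative.

```python
# Python 3 sketch: resolving relative references with urljoin().
from urllib.parse import urljoin

base_dir = "http://www.python.org/doc/lib/"
base_file = "http://www.python.org/doc/lib/index.html"

cases = {
    "sibling file": urljoin(base_file, "module-urllib.html"),
    "parent dir": urljoin(base_dir, "../tut/"),
    "root-relative": urljoin(base_dir, "/FAQ.html"),
    # Without a trailing slash, 'doc' is treated as a file and replaced.
    "no trailing slash": urljoin("http://www.python.org/doc", "lib"),
    # An absolute URL ignores the base entirely.
    "absolute URL wins": urljoin(base_dir, "http://www.example.com/x"),
}

for label, result in cases.items():
    print(f"{label:18} {result}")
```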

							

The next example copies a Web page into a local file:

...
