Safari | Python Developer's Handbook -> Code Examples






Code Examples

The following code examples demonstrate the concepts covered in this chapter.

HTML Parsing Tool (File: parsing.py)

This program uses exchange.html as its source of information. The idea is to read the file, replace all occurrences of the domain name "lessaworld" with "bebemania", and add hyperlinks for all email and Web page references found there.

Listing 9.1 File: exchange.html
<HTML>
<HEAD>
<TITLE>Exchange Rates Home Page</TITLE>
</HEAD>
<BODY>
<p align=justify>
<b>List of current files that we have available at this site:</b></p>
<br>
http://www.lessaworld.com/exchange/real.txt <br>
http://www.lessaworld.com/exchange/pound.txt <br>
http://www.lessaworld.com/exchange/dollar.txt <br><br>

Many people are currently working to keep these exchange rates updated.<br>
Andre (andre@bebemania.com.br) handles all the Brazilian Real operations,
 meanwhile,Joao Pedro (jp@bebemania.com.br) takes care of pounds and
 dollars.<br><br>

</BODY>
</HTML>

The following code implements the parsing program.

Listing 9.2 File: parsing.py
 1:
 2: import re, sys
 3:
 4: TextOriginal = open("exchange.html").read()
 5:
 6: TextIn = re.sub("lessaworld", "bebemania", TextOriginal)
 7:
 8: operation_result = re.search(r'<title>(.*?)</title>', TextIn, re.IGNORECASE)
 9: if operation_result:
10:     HTML_TITLE = operation_result.group(1)
11:
12: link_pattern = re.compile(r'((ftp|http)://[\w-]+(?:\.[\w-]+)*(?:/[\w-]*)*(?:\.[\w-]*)*)')
13: links = re.findall(link_pattern, TextIn)
14: TextIn = re.sub(link_pattern, r"<a href=\1>\1</a>", TextIn)
15:
16: email_pattern = re.compile(r'([a-zA-Z][\w-]*@[\w-]+(?:\.[\w-]+)*)')
17: emails = re.findall(email_pattern, TextIn)
18: TextIn = re.sub(email_pattern, r"<a href=mailto:\1>\1</a>", TextIn)
19:
20: FileOut = open("newexchange.html", "w")
21: FileOut.write(TextIn)
22: FileOut.close()
23:
24: print '"%s" is done.'% (HTML_TITLE)

Line 4: Opens and reads the original file.

Line 6: Replaces occurrences of "lessaworld" with "bebemania".

Lines 8-10: Locates the Web page title.

Line 10: Group 1 is the text matched by the parenthesized part of the regular expression on line 8.

Line 12: Creates a regular expression that locates all the Web addresses in the text.

Line 13: Creates a list of all the elements (links) that were found by the matching.

Line 14: Adds the hyperlinks for all the Web links that were found.

Line 16: Creates a regular expression that locates all the email addresses in the text.

Line 17: Creates a list of all the elements (emails) that were found by the matching.

Line 18: Adds the hyperlinks for all the email addresses that were found.

Lines 20-22: Creates a new file with the new content.
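
One detail of lines 13 and 17 is worth a small illustration. When a pattern contains more than one capturing group, re.findall returns a list of tuples (one item per group) rather than a list of strings. The link pattern on line 12 has two capturing groups, so each entry in links is a pair. The following is a minimal sketch in Python 3 syntax; the sample string and the names sample, pairs, and urls are assumptions made for illustration and are not part of the book's listing.

```python
import re

# Same link pattern as line 12 of Listing 9.2: it has two capturing groups,
# the whole URL and the scheme (ftp or http).
link_pattern = re.compile(
    r'((ftp|http)://[\w-]+(?:\.[\w-]+)*(?:/[\w-]*)*(?:\.[\w-]*)*)')

# Hypothetical sample text, modeled on a line of exchange.html.
sample = "See http://www.bebemania.com/exchange/real.txt for details."

# With more than one capturing group, findall returns one tuple per match.
pairs = link_pattern.findall(sample)

# Keep only the first group (the full URL) from each tuple.
urls = [full for full, scheme in pairs]

print(pairs)
print(urls)
```

The email pattern on line 16 has a single capturing group, so emails is a plain list of strings.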

To execute the routine, call it from the OS prompt, and then check the resulting file (newexchange.html) in your browser.

						
S:\python> python parsing.py
"Exchange Rates Home Page" is done.
S:\python>
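
For readers on Python 3, the same steps can be sketched as a function that works on a string instead of a file, which makes the result easy to inspect. This is an illustrative sketch, not the book's code: the function name convert, the constant names, and the sample text are assumptions, the scheme group is made non-capturing, and the href values are quoted (the original listing writes them unquoted).

```python
import re

# Patterns adapted from Listing 9.2. The scheme group is non-capturing here,
# so \1 always refers to the full URL.
LINK = re.compile(r'((?:ftp|http)://[\w-]+(?:\.[\w-]+)*(?:/[\w-]*)*(?:\.[\w-]*)*)')
EMAIL = re.compile(r'([a-zA-Z][\w-]*@[\w-]+(?:\.[\w-]+)*)')
TITLE = re.compile(r'<title>(.*?)</title>', re.IGNORECASE)


def convert(text):
    """Return (title, new_text), following the steps of Listing 9.2."""
    # Step 1: replace the domain name.
    text = text.replace("lessaworld", "bebemania")
    # Step 2: locate the page title; None if the page has no <title>.
    match = TITLE.search(text)
    title = match.group(1) if match else None
    # Step 3: wrap Web addresses and email addresses in hyperlinks.
    text = LINK.sub(r'<a href="\1">\1</a>', text)
    text = EMAIL.sub(r'<a href="mailto:\1">\1</a>', text)
    return title, text


# Hypothetical input, modeled on a few lines of exchange.html.
sample = ("<TITLE>Exchange Rates Home Page</TITLE>\n"
          "http://www.lessaworld.com/exchange/real.txt <br>\n"
          "Andre (andre@bebemania.com.br)\n")

title, result = convert(sample)
print(title)
print(result)
```

Returning None for a missing title avoids the situation in which the final print of the listing refers to a name that was never assigned.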
				
					

TV Network Audiences (File: audience.py)

The next example demonstrates the use of the Queue module. The idea is to have several threads running and sharing information at the same time. The program starts several threads that execute some time-consuming operations, while the main thread is generating numbers that are used by all the other threads.

Listing 9.3 File: audience.py
 1:
 2: import threading, time
 3: import Queue, random
 4:
 5: class VCR(threading.Thread):
 6:     channels = ["KDSF", "FOKS", "CBA", "ESTN"]
 7:
 8:     def __init__(self, queue, channel, seconds):
 9:         self.__queue = queue
10:         self.seconds = seconds
11:         self.network = VCR.channels[channel-1]
12:         threading.Thread.__init__(self)
13:     def run(self):
14:         for i in range(self.seconds):
15:             time.sleep(0.0001)
16:             self.public = self.__queue.get()
17:             print "After %d seconds, %d people were watching %s" % \
18:               (self.seconds, self.public, self.network)
19:
20: queue = Queue.Queue(0)
21:
22: VCR(queue, 1, 60).start()
23: VCR(queue, 2, 40).start()
24: VCR(queue, 3, 35).start()
25: VCR(queue, 4, 75).start()
26:
27: audience = 0
28: while audience < random.randint(200,300):
29:     queue.put(audience)
30:     audience = audience + 1
31:     print "The audience now has %d people." % (audience)
32:     time.sleep(0.001)
33:
34: time.sleep(10)

Line 5: Defines a subclass of the Thread class.

Line 6: Creates a class variable.

Line 13: Implements the functionality that is executed when the thread is started.

Line 15: Pauses the execution, in order to let other threads run simultaneously.

Line 16: Gets the next value from the Queue, waiting if the Queue is empty.

Line 20: Initializes the Queue object that is shared by all threads.

Lines 22-25: Starts all the threads.

Lines 28-32: Implements a routine that keeps generating numbers to be passed to the threads.

Line 29: Puts a value on the queue to be collected by one of the threads.

Line 34: Pauses the main thread so that the other threads can end normally.
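
Because get() blocks until an item is available, a thread that asks for more items than the main thread produces would wait indefinitely, and a fixed pause at the end cannot guarantee that every thread has finished. A common alternative is to send one sentinel value per thread and then join the threads. The following is a minimal sketch of that idea in Python 3, where the module is named queue rather than Queue; the function names worker and run, the None sentinel, and the results queue are assumptions made for illustration and do not appear in the book's listing.

```python
import queue
import threading


def worker(name, jobs, results):
    """Consume numbers from jobs until the None sentinel arrives."""
    while True:
        value = jobs.get()          # blocks until an item is available
        if value is None:           # sentinel: no more work for this thread
            break
        results.put((name, value))


def run(total=20, names=("KDSF", "FOKS", "CBA", "ESTN")):
    jobs = queue.Queue()
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(name, jobs, results))
               for name in names]
    for thread in threads:
        thread.start()
    # The main thread generates the numbers, as in Listing 9.3.
    for audience in range(total):
        jobs.put(audience)
    # One sentinel per worker, queued after all the real values.
    for _ in threads:
        jobs.put(None)
    # Wait for every worker to finish instead of sleeping.
    for thread in threads:
        thread.join()
    # All workers are done, so the results queue can be drained safely.
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected


collected = run()
print(len(collected))
```

Each value is delivered to exactly one worker, so the number of collected items equals the number of values produced, although which worker receives which value varies from run to run.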


Last updated on 1/30/2002
Python Developer's Handbook, © 2002 Sams Publishing

