190.html ( File view )

  • By 2010-08-21
  • View(s):12
  • Download(s):0
  • Point(s): 1
			






Safari | Python Developer's Handbook -> XML Processing





< BACKMake Note | BookmarkCONTINUE >
152015024128143245168232148039199167010047123209178152124239215162148041040017229250144100

XML Processing

The first standard that you will learn how to manipulate in Python is XML.

The Web already has a standard for defining markup languages like HTML, which is called SGML. HTML is actually defined in SGML. SGML could have been used as this new standard, and browsers could have been extended with SGML parsers. However, SGML is quite complex to implement and contains a lot of features that are very rarely used.

SGML is much more than a Web standard because it was around long before the Web. HTML is an application of SGML, and XML is a subset.

SGML also lacks character sets support, and it is difficult to interpret an SGML document without having the definition of the markup language (the DTDDocument Type Definition) available.

Consequently, it was decided to develop a simplified version of SGML, which was called XML. The main point of XML is that you, by defining your own markup language, can encode the information of your documents more precisely than is possible with HTML. This meas that programs processing these documents can "understand" them much better and therefore process the information in ways that are impossible with HTML (or ordinary text processor documents).

Introduction to XML

The Extensible Markup Language (XML) is a subset of SGML. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

XML describes a class of data objects called XML documents and partially describes the behavior of computer programs that process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language (ISO 8879). By construction, XML documents are conforming SGML documents. An XML parser can check if an XML document is formal without the aid of a DTD.

XML documents are made up of storage units called elements, which contain either parsed or unparsed data, and are delimited by tags. Parsed data is made up of characters, some of which form character data, and some of which form markup elements. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.

A software module called an XML parser is used to read XML documents and provide access to their content and structure. It is assumed that an XML parser is doing its work on behalf of another module, called the application. This specification describes the required behavior of an XML parser in terms of how it must read XML data and the information it must provide to the application. For more information, check out

Extensible Markup Language (XML) Recommendation

W3C RecommendationExtensible Markup Language (XML) 1.0

http://www.w3.org/TR/REC-xml.html

Writing an XML File

As you can see next, it is simple to define your own markup language with XML. The next block of code is the content of a file called survey.xml. This code defines a specific markup language for a given survey.

						
<!DOCTYPE SURVEY SYSTEM  "SURVEY.DTD">
<SURVEY>
  <CLIENT>
     <NAME>         Lessaworld Corp.           </NAME>
     <LOCATION>     Pittsburgh, PA             </LOCATION>
     <CONTACT>      Andre Lessa                </CONTACT>
     <EMAIL>        webmaster@lessaworld.com   </EMAIL>
     <TELEPHONE>   (412)555-5555                </TELEPHONE>
  </CLIENT>
  <SECTION SECTION_ID="1">
  <QUESTION QUESTION_ID="1" QUESTION_LEVEL="1">
   <QUESTION_DESC>What is your favorite language?</QUESTION_DESC>
   <Op1>Python</Op1>
   <Op2>Perl</Op2>
  </QUESTION>
  <QUESTION QUESTION_ID="2" QUESTION_LEVEL="1">
    <QUESTION_DESC>Do you use this language at work?</QUESTION_DESC>
    <Op1>Yes</Op1>
    <Op2>No</Op2>
  </QUESTION> <QUESTION QUESTION_ID="3" QUESTION_LEVEL="1">
    <QUESTION_DESC>Did you expect the Spanish inquisition?</QUESTION_DESC>
    <Op1>No</Op1> 
    <Op2>Of course not</Op2>
  </QUESTION> 
   </SECTION>
</SURVEY>

					

In order to complement the XML markup language shown previously, we need a Document Type Definition (DTD), just like the following one. The DTD can be part of the XML file, or it can be stored as an independent file, as we are doing here. Note the first line of the XML file, where we are passing the name of the DTD file (survey.dtd). Also, it seems that XML is standardizing the use of XML Schemas rather the DTDs.

						
<!ELEMENT SURVEY      (CLIENT, SECTION+)>

<!ELEMENT CLIENT (NAME, LOCATION, CONTACT?, EMAIL?, TELEPHONE?)>
<!ELEMENT NAME         (#PCDATA)>
<!ELEMENT LOCATION     (#PCDATA)>
<!ELEMENT CONTACT      (#PCDATA)> 
<!ELEMENT EMAIL        (#PCDATA)>
<!ELEMENT TELEPHONE    (#PCDATA)>

<!ELEMENT SECTION     (QUESTION+)>
<!ELEMENT QUESTION    (QUESTION_DESC, Op1, Op2)>

<!ELEMENT QUESTION_DESC   (#PCDATA)>
<!ELEMENT Op1              (#PCDATA)>
<!ELEMENT Op2              (#PCDATA)>

<!ATTLIST SECTION    SECTION_ID     CDATA #IMPLIED>
<!ATTLIST QUESTION   QUESTION_ID    CDATA #IMPLIED
                     QUESTION_LEVEL CDATA #IMPLIED>

					

Now, let's understand how a DTD works. For a simple example, like this one, we need two special tags called <!ELEMENT> and <!ATTLIST>.

The <!ELEMENT> definition tag is used to define the elements presented in the XML file. The general syntax is

						
lt;!ELEMENT NAME
 CONTENTS>
					

The first argument (NAME) gives the name of the element, and the second one (CONTENTS) lists the element names that are allowed to be underneath the element that we are defining.

The ordering that we use to list the contents is important. When we say, for example,

						

...
...
(Not finished, please download and read the complete file)
			
...
Expand> <Close

Want complete source code? Download it here

Point(s): 1

Download
0 lines left, continue to read
Sponsored links

File list

Tips: You can preview the content of files by clicking file names^_^
Name Size Date
0672319942.html3.37 kB01-06-02 20:06
1.html5.10 kB01-06-02 20:06
10.html3.69 kB01-06-02 20:06
100.html5.41 kB01-06-02 20:06
101.html7.96 kB01-06-02 20:06
102.html3.75 kB01-06-02 20:06
103.html3.75 kB01-06-02 20:06
104.html5.81 kB01-06-02 20:06
105.html16.46 kB01-06-02 20:06
106.html25.87 kB01-06-02 20:06
107.html7.44 kB01-06-02 20:06
108.html20.95 kB01-06-02 20:06
109.html10.02 kB01-06-02 20:06
11.html3.66 kB01-06-02 20:06
110.html9.58 kB01-06-02 20:06
111.html9.96 kB01-06-02 20:06
112.html11.34 kB01-06-02 20:06
113.html7.87 kB01-06-02 20:06
114.html13.56 kB01-06-02 20:06
115.html3.76 kB01-06-02 20:06
116.html3.76 kB01-06-02 20:06
117.html4.27 kB01-06-02 20:06
118.html4.27 kB01-06-02 20:06
119.html9.45 kB01-06-02 20:06
12.html3.66 kB01-06-02 20:06
120.html8.62 kB01-06-02 20:06
121.html65.99 kB01-06-02 20:06
122.html28.42 kB01-06-02 20:06
123.html14.92 kB01-06-02 20:06
124.html7.17 kB01-06-02 20:06
125.html24.21 kB01-06-02 20:06
126.html10.82 kB01-06-02 20:06
127.html10.54 kB01-06-02 20:06
128.html3.86 kB01-06-02 20:06
129.html3.86 kB01-06-02 20:06
13.html12.82 kB01-06-02 20:06
130.html5.85 kB01-06-02 20:06
131.html5.30 kB01-06-02 20:06
132.html29.95 kB01-06-02 20:06
133.html61.54 kB01-06-02 20:06
134.html40.34 kB01-06-02 20:06
135.html9.11 kB01-06-02 20:06
136.html15.53 kB01-06-02 20:06
137.html3.86 kB01-06-02 20:06
138.html3.86 kB01-06-02 20:06
139.html5.80 kB01-06-02 20:06
14.html11.79 kB01-06-02 20:06
140.html14.32 kB01-06-02 20:06
141.html26.46 kB01-06-02 20:06
142.html21.93 kB01-06-02 20:06
143.html17.03 kB01-06-02 20:06
144.html7.95 kB01-06-02 20:06
145.html28.59 kB01-06-02 20:06
146.html52.14 kB01-06-02 20:06
147.html8.37 kB01-06-02 20:06
148.html3.63 kB01-06-02 20:06
149.html3.63 kB01-06-02 20:06
15.html10.93 kB01-06-02 20:06
150.html4.42 kB01-06-02 20:06
151.html16.23 kB01-06-02 20:06
152.html32.55 kB01-06-02 20:06
153.html13.13 kB01-06-02 20:06
154.html28.33 kB01-06-02 20:06
155.html40.15 kB01-06-02 20:06
156.html23.47 kB01-06-02 20:06
157.html7.73 kB01-06-02 20:06
158.html10.61 kB01-06-02 20:06
159.html3.72 kB01-06-02 20:06
16.html11.42 kB01-06-02 20:06
160.html3.72 kB01-06-02 20:06
161.html3.64 kB01-06-02 20:06
162.html3.64 kB01-06-02 20:06
163.html4.70 kB01-06-02 20:06
164.html60.18 kB01-06-02 20:06
165.html42.25 kB01-06-02 20:06
166.html17.91 kB01-06-02 20:06
167.html9.76 kB01-06-02 20:06
168.html13.52 kB01-06-02 20:06
169.html10.35 kB01-06-02 20:06
17.html17.40 kB01-06-02 20:06
170.html9.08 kB01-06-02 20:06
171.html3.61 kB01-06-02 20:06
172.html3.61 kB01-06-02 20:06
173.html6.95 kB01-06-02 20:06
174.html27.86 kB01-06-02 20:06
175.html28.55 kB01-06-02 20:06
176.html16.39 kB01-06-02 20:06
177.html24.60 kB01-06-02 20:06
178.html10.20 kB01-06-02 20:06
179.html3.62 kB01-06-02 20:06
18.html12.04 kB01-06-02 20:06
180.html3.62 kB01-06-02 20:06
181.html7.21 kB01-06-02 20:06
182.html11.83 kB01-06-02 20:06
183.html17.37 kB01-06-02 20:06
184.html87.57 kB01-06-02 20:06
185.html25.23 kB01-06-02 20:06
186.html6.62 kB01-06-02 20:06
187.html3.73 kB01-06-02 20:06
188.html3.73 kB01-06-02 20:06
189.html4.76 kB01-06-02 20:06
19.html5.49 kB01-06-02 20:06
190.html74.34 kB01-06-02 20:06
191.html9.56 kB01-06-02 20:06
192.html31.86 kB01-06-02 20:06
193.html67.73 kB01-06-02 20:06
194.html69.48 kB01-06-02 20:06
195.html32.75 kB01-06-02 20:06
196.html10.74 kB01-06-02 20:06
197.html3.54 kB01-06-02 20:06
198.html3.54 kB01-06-02 20:06
199.html3.81 kB01-06-02 20:06
2.html5.10 kB01-06-02 20:06
20.html9.25 kB01-06-02 20:06
200.html3.81 kB01-06-02 20:06
201.html8.82 kB01-06-02 20:06
202.html7.07 kB01-06-02 20:06
203.html50.32 kB01-06-02 20:06
204.html8.07 kB01-06-02 20:06
205.html7.53 kB01-06-02 20:06
206.html3.60 kB01-06-02 20:06
207.html3.60 kB01-06-02 20:06
208.html7.22 kB01-06-02 20:06
209.html21.63 kB01-06-02 20:06
21.html5.84 kB01-06-02 20:06
210.html24.67 kB01-06-02 20:06
211.html30.17 kB01-06-02 20:06
212.html159.15 kB01-06-02 20:06
213.html18.72 kB01-06-02 20:06
214.html6.54 kB01-06-02 20:06
215.html7.22 kB01-06-02 20:06
216.html6.76 kB01-06-02 20:06
217.html3.11 kB01-06-02 20:06
218.html3.54 kB01-06-02 20:06
219.html4.14 kB01-06-02 20:06
22.html3.69 kB01-06-02 20:06
220.html4.14 kB01-06-02 20:06
221.html4.82 kB01-06-02 20:06
222.html50.92 kB01-06-02 20:06
223.html3.87 kB01-06-02 20:06
224.html57.67 kB01-06-02 20:06
225.html37.66 kB01-06-02 20:06
226.html5.04 kB01-06-02 20:06
227.html3.85 kB01-06-02 20:06
228.html3.85 kB01-06-02 20:06
229.html4.47 kB01-06-02 20:06
23.html3.69 kB01-06-02 20:06
230.html21.41 kB01-06-02 20:06
231.html19.56 kB01-06-02 20:06
232.html26.27 kB01-06-02 20:06
233.html10.07 kB01-06-02 20:06
234.html22.22 kB01-06-02 20:06
235.html36.83 kB01-06-02 20:06
236.html49.23 kB01-06-02 20:06
237.html16.63 kB01-06-02 20:06
238.html6.96 kB01-06-02 20:06
239.html3.09 kB01-06-02 20:06
24.html4.86 kB01-06-02 20:06
240.html3.43 kB01-06-02 20:06
241.html3.59 kB01-06-02 20:06
242.html3.59 kB01-06-02 20:06
243.html17.49 kB01-06-02 20:06
244.html9.38 kB01-06-02 20:06
245.html16.62 kB01-06-02 20:06
246.html9.81 kB01-06-02 20:06
247.html11.50 kB01-06-02 20:06
248.html8.95 kB01-06-02 20:06
249.html8.93 kB01-06-02 20:06
25.html11.48 kB01-06-02 20:06
250.html10.89 kB01-06-02 20:06
251.html8.21 kB01-06-02 20:06
252.html5.14 kB01-06-02 20:06
253.html3.61 kB01-06-02 20:06
254.html3.61 kB01-06-02 20:06
255.html4.61 kB01-06-02 20:06
256.html4.61 kB01-06-02 20:06
257.html43.07 kB01-06-02 20:06
258.html10.12 kB01-06-02 20:06
259.html7.99 kB01-06-02 20:06
26.html25.88 kB01-06-02 20:06
260.html17.17 kB01-06-02 20:06
261.html10.99 kB01-06-02 20:06
262.html15.70 kB01-06-02 20:06
263.html36.19 kB01-06-02 20:06
264.html61.93 kB01-06-02 20:06
265.html39.81 kB01-06-02 20:06
266.html15.13 kB01-06-02 20:06
267.html16.34 kB01-06-02 20:06
268.html3.55 kB01-06-02 20:06
269.html3.55 kB01-06-02 20:06
27.html27.92 kB01-06-02 20:06
270.html14.30 kB01-06-02 20:06
271.html14.21 kB01-06-02 20:06
272.html9.44 kB01-06-02 20:06
273.html8.41 kB01-06-02 20:06
274.html5.45 kB01-06-02 20:06
275.html5.45 kB01-06-02 20:06
276.html8.08 kB01-06-02 20:06
277.html9.78 kB01-06-02 20:06
278.html6.04 kB01-06-02 20:06
279.html6.89 kB01-06-02 20:06
28.html12.02 kB01-06-02 20:06
280.html11.77 kB01-06-02 20:06
281.html10.18 kB01-06-02 20:06
282.html4.98 kB01-06-02 20:06
283.html4.98 kB01-06-02 20:06
284.html4.12 kB01-06-02 20:06
285.html5.78 kB01-06-02 20:06
286.html12.42 kB01-06-02 20:06
287.html6.87 kB01-06-02 20:06
29.html45.69 kB01-06-02 20:06
3.html5.51 kB01-06-02 20:06
30.html10.03 kB01-06-02 20:06
31.html23.06 kB01-06-02 20:06
32.html22.26 kB01-06-02 20:06
33.html18.88 kB01-06-02 20:06
34.html18.25 kB01-06-02 20:06
35.html15.70 kB01-06-02 20:06
36.html5.50 kB01-06-02 20:06
37.html11.83 kB01-06-02 20:06
38.html3.69 kB01-06-02 20:06
39.html3.69 kB01-06-02 20:06
4.html5.51 kB01-06-02 20:06
40.html8.24 kB01-06-02 20:06
41.html12.92 kB01-06-02 20:06
42.html4.51 kB01-06-02 20:06
43.html3.90 kB01-06-02 20:06
44.html3.52 kB01-06-02 20:06
45.html5.99 kB01-06-02 20:06
46.html4.33 kB01-06-02 20:06
47.html4.19 kB01-06-02 20:06
48.html4.01 kB01-06-02 20:06
49.html3.95 kB01-06-02 20:06
5.html4.69 kB01-06-02 20:06
50.html3.63 kB01-06-02 20:06
51.html3.75 kB01-06-02 20:06
52.html5.56 kB01-06-02 20:06
53.html4.41 kB01-06-02 20:06
54.html6.99 kB01-06-02 20:06
55.html3.52 kB01-06-02 20:06
56.html3.95 kB01-06-02 20:06
57.html3.69 kB01-06-02 20:06
58.html4.27 kB01-06-02 20:06
59.html3.88 kB01-06-02 20:06
6.html4.69 kB01-06-02 20:06
60.html3.58 kB01-06-02 20:06
61.html4.17 kB01-06-02 20:06
62.html3.93 kB01-06-02 20:06
63.html3.68 kB01-06-02 20:06
64.html4.16 kB01-06-02 20:06
65.html4.04 kB01-06-02 20:06
66.html4.97 kB01-06-02 20:06
67.html3.91 kB01-06-02 20:06
68.html3.55 kB01-06-02 20:06
69.html3.51 kB01-06-02 20:06
7.html10.67 kB01-06-02 20:06
70.html3.52 kB01-06-02 20:06
71.html3.83 kB01-06-02 20:06
72.html4.02 kB01-06-02 20:06
73.html21.61 kB01-06-02 20:06
74.html12.94 kB01-06-02 20:06
75.html31.26 kB01-06-02 20:06
76.html11.98 kB01-06-02 20:06
77.html4.24 kB01-06-02 20:06
78.html4.26 kB01-06-02 20:06
79.html10.95 kB01-06-02 20:06
8.html10.67 kB01-06-02 20:06
80.html12.11 kB01-06-02 20:06
81.html4.88 kB01-06-02 20:06
82.html6.48 kB01-06-02 20:06
83.html5.64 kB01-06-02 20:06
84.html13.30 kB01-06-02 20:06
85.html7.00 kB01-06-02 20:06
86.html3.90 kB01-06-02 20:06
87.html4.59 kB01-06-02 20:06
88.html5.00 kB01-06-02 20:06
89.html17.71 kB01-06-02 20:06
9.html3.69 kB01-06-02 20:06
90.html6.00 kB01-06-02 20:06
91.html3.68 kB01-06-02 20:06
92.html3.68 kB01-06-02 20:06
93.html12.98 kB01-06-02 20:06
94.html10.60 kB01-06-02 20:06
95.html19.93 kB01-06-02 20:06
96.html10.34 kB01-06-02 20:06
97.html6.27 kB01-06-02 20:06
98.html6.77 kB01-06-02 20:06
99.html16.97 kB01-06-02 20:06
front_matter.html4.16 kB01-06-02 20:06
index.html3.37 kB01-06-02 20:06
new_toc.html15.79 kB01-06-02 20:06
rindex1.html53.02 kB01-06-02 20:05
rindex10.html5.16 kB01-06-02 20:05
rindex11.html3.96 kB01-06-02 20:05
rindex12.html15.07 kB01-06-02 20:05
rindex13.html84.43 kB01-06-02 20:05
rindex14.html10.85 kB01-06-02 20:05
rindex15.html26.04 kB01-06-02 20:05
rindex16.html58.33 kB01-06-02 20:05
rindex17.html3.52 kB01-06-02 20:05
rindex18.html16.83 kB01-06-02 20:05
rindex19.html52.89 kB01-06-02 20:05
rindex2.html13.47 kB01-06-02 20:05
rindex20.html17.55 kB01-06-02 20:05
rindex21.html13.56 kB01-06-02 20:05
rindex22.html10.63 kB01-06-02 20:05
rindex23.html16.92 kB01-06-02 20:05
rindex24.html4.90 kB01-06-02 20:05
rindex25.html2.50 kB01-06-02 20:05
rindex3.html43.59 kB01-06-02 20:06
rindex4.html21.76 kB01-06-02 20:06
rindex5.html22.31 kB01-06-02 20:06
rindex6.html36.57 kB01-06-02 20:06
rindex7.html14.06 kB01-06-02 20:06
rindex8.html11.93 kB01-06-02 20:06
rindex9.html25.33 kB01-06-02 20:06
toc.html24.97 kB01-06-02 20:06
<(ebook>0.00 BPython) O'Reilly
<0672319942>0.00 B25-08-02 17:24
<graphics>0.00 B25-08-02 17:24
<images>0.00 B25-08-02 17:24
<oreillyi>0.00 B25-08-02 17:24
oreillyM.css4.42 kB31-05-02 17:16
oreillyN.css4.42 kB31-05-02 17:16
00.gif41.00 B31-05-02 17:16
01fig01.gif58.03 kB28-01-02 08:54
01fig02.gif39.76 kB28-01-02 08:54
01fig03.gif22.06 kB28-01-02 08:54
01fig04.gif64.75 kB28-01-02 08:54
02fig01.gif54.48 kB28-01-02 08:54
02fig02.gif16.29 kB28-01-02 08:54
02fig03.gif14.11 kB28-01-02 08:54
02fig04.gif47.82 kB28-01-02 08:54
06fig01.gif32.19 kB28-01-02 08:54
06fig02.gif58.39 kB28-01-02 08:54
06fig03.gif48.96 kB28-01-02 08:54
07fig01.gif47.96 kB28-01-02 08:54
07fig02.gif8.99 kB28-01-02 08:54
07fig03.gif9.46 kB28-01-02 08:54
07fig04.gif11.51 kB28-01-02 08:54
07fig05.gif8.27 kB28-01-02 08:54
07fig06.gif31.46 kB28-01-02 08:54
12fig01.gif34.54 kB28-01-02 08:54
15fig01.gif1.30 kB28-01-02 08:54
15fig02.gif1.50 kB28-01-02 08:54
15fig03.gif5.93 kB28-01-02 08:54
15fig04.gif924.00 B28-01-02 08:54
15fig05.gif1.55 kB28-01-02 08:54
15fig06.gif1.62 kB28-01-02 08:54
15fig07.gif1.69 kB28-01-02 08:54
15fig08.gif1.02 kB28-01-02 08:54
15fig09.gif2.54 kB28-01-02 08:54
15fig10.gif1.92 kB28-01-02 08:54
15fig11.gif2.44 kB28-01-02 08:54
15fig12.gif2.48 kB28-01-02 08:54
15fig13.gif2.36 kB28-01-02 08:54
15fig14.gif5.43 kB28-01-02 08:54
15fig15.gif5.51 kB28-01-02 08:54
15fig16.gif2.18 kB28-01-02 08:54
15fig17.gif12.22 kB28-01-02 08:54
16fig01.gif28.99 kB28-01-02 08:54
16fig02.gif23.02 kB28-01-02 08:54
16fig03.gif32.48 kB28-01-02 08:54
16fig04.gif17.27 kB28-01-02 08:54
16fig05.gif23.88 kB28-01-02 08:54
16fig06.gif23.85 kB28-01-02 08:54
16fig07.gif36.76 kB28-01-02 08:54
16fig08.gif20.66 kB28-01-02 08:54
16fig09.gif71.63 kB28-01-02 08:54
16fig10.gif23.23 kB28-01-02 08:54
16fig11.gif22.97 kB28-01-02 08:54
16fig12.gif22.06 kB28-01-02 08:54
16fig13.gif49.73 kB28-01-02 08:54
16fig14.gif60.34 kB28-01-02 08:54
18fig01.gif63.94 kB28-01-02 08:54
18fig02.gif24.07 kB28-01-02 08:54
ccc.gif109.00 B28-01-02 08:54
spacer.gif41.00 B31-05-02 17:16
view.gif41.00 B31-05-02 17:16
0672319942_s.jpg2.91 kB29-01-02 11:29
...
Sponsored links
  • Sent successfully!
  • 1 point

190.html (1.65 MB)

Need 1 point
Your Point(s)

Your Point isn't enough.

Get point immediately by PayPal

More(Debit card / Credit card / PayPal Credit / Online Banking)

Submit your source codes. Get more point

LOGIN

Don't have an account? Register now
Need any help?
Mail to: support@codeforge.com

切换到中文版?

Where are you going?

^_^"Oops ...

Sorry!This guy is mysterious, its blog hasn't been opened, try another, please!
OK

Warm tip!

CodeForge to FavoriteFavorite by Ctrl+D