Note: View this document in Internet Explorer 5 or later, or Netscape 6.
The document contains examples of XML and XSL, which many other browser versions will not display.
1. Some Benefits of XML
2. Turning existing HTML Documents into XML
3. XML Is a Meta-Language
What does a Web Browser do with an XML Doc?
Exercise 1: What If an XML Doc is Missing an End Tag?
4. How Do We Create an HTML Doc from the XML Doc?
5. What if You Modify XML Doc, But Insert a Typo?
Exercise 2: Add Style to the HTML Doc
References
2000
BMW-M5 5 4.9 400 4.7 13.2 107.4 69400
FIREBIRD-TRANSAM 2 5.6 320 5.0 13.5 107.4 26630
The data is about automobiles, but what do all the numbers mean?
Worse, in 10 years no one may be able to read the file if the documentation of
the fields is lost.
An application that reads this file requires a custom parser and needs a priori knowledge
of the file format.
With XML, you can create a file that looks like this:
<?xml version="1.0"?>
<autos year="2000">
  <auto name="BMW M5">
    <doors>5</doors>
    <engine displacement="4.9"
            horsepower="400"/>
    <performance>
      <zeroto60>4.7</zeroto60>
      <quartermile second="13.2"
                   mph="107.4"/>
    </performance>
    <price>69400</price>
  </auto>
  <auto name="FIREBIRD-TRANSAM">
    <doors>2</doors>
    <engine displacement="5.6"
            horsepower="320"/>
    <performance>
      <zeroto60>5.0</zeroto60>
      <quartermile second="13.5"
                   mph="107.4"/>
    </performance>
    <price>26630</price>
  </auto>
</autos>
[Above xml document is in this file...]
Because you created tags like <auto> yourself, it is clearer what all those fields mean!
Incidentally, XML is case-sensitive, so it matters if you use "auto" or "Auto" or "AUTO".
The XML format is easier to understand, and is self-documenting.
Application programs can easily access the file without knowing the file's syntax:
What is the price of the FIREBIRD-TRANSAM?
Application programs can be designed to interpret this information, just as a human reader would.
<zeroto60>5.0</zeroto60> is an element.
<zeroto60> is a tag.
In <quartermile second="13.5" mph="107.4"/>, second and mph are attributes.
The application author no longer needs to write a custom parser.
This idea is embodied in the Document Object Model (DOM).
DOM returns 26630 when asked " What is the price of the FIREBIRD-TRANSAM? "
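The query above can be sketched with a DOM implementation outside the browser. This is a minimal illustration in Python, using the standard library's xml.dom.minidom rather than the browser DOM this document uses; the function name price_of is hypothetical, and the XML is a shortened copy of the example file.

```python
from xml.dom.minidom import parseString

# Shortened copy of the autos document from this section.
AUTOS_XML = """<?xml version="1.0"?>
<autos year="2000">
  <auto name="BMW M5">
    <doors>5</doors>
    <price>69400</price>
  </auto>
  <auto name="FIREBIRD-TRANSAM">
    <doors>2</doors>
    <price>26630</price>
  </auto>
</autos>"""


def price_of(xml_text, auto_name):
    """Return the <price> text of the <auto> whose name attribute matches."""
    doc = parseString(xml_text)
    for auto in doc.getElementsByTagName("auto"):
        if auto.getAttribute("name") == auto_name:
            price = auto.getElementsByTagName("price")[0]
            return price.firstChild.data
    return None  # no <auto> with that name


print(price_of(AUTOS_XML, "FIREBIRD-TRANSAM"))  # prints 26630
```

The code never looks at line positions or column widths; it asks for elements and attributes by name, which is the point of the DOM.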
XML may become the language for all data interchange among applications on computers on the Internet - whether for web pages, database queries, remote procedure calls, or whatever.
[Figure: an example of applying an HTML template to the XML document above.]
See "XML in 10 points", explained by the standards people at W3C.
Replace
<!DOCTYPE HTML ...>
with
<?xml version="1.0" standalone="yes"?>
<img src="logo.jpg" alt="logo drawing" />
These elements may require some experimentation. For instance, some
browsers treat <br /> or <hr /> the same as <br> or
<hr>.
Others accept /> only if a space precedes it.
The browsers are not yet compliant with the standards.
HTML browsers may not accept XML-style empty elements with a trailing slash
(e.g., <hr/>), so such elements are not backward compatible.
If you want your XML document read without parsing, you can add dummy end tags
to empty elements, so <hr> becomes <hr></hr>.
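To an XML parser, the trailing-slash form and the dummy-end-tag form are the same empty element, while the HTML-style form with no end tag is rejected. A small sketch in Python, assuming the standard library's xml.etree.ElementTree as a stand-in for the browser's parser:

```python
import xml.etree.ElementTree as ET

# The two XML spellings of an empty element parse to the same tree.
short_form = ET.fromstring("<p>line<hr/></p>")
long_form = ET.fromstring("<p>line<hr></hr></p>")
same_tree = ET.tostring(short_form) == ET.tostring(long_form)

# An HTML-style <hr> with no end tag is not well-formed XML.
try:
    ET.fromstring("<p>line<hr></p>")
    html_style_ok = True
except ET.ParseError:
    html_style_ok = False

print(same_tree, html_style_ok)  # prints True False
```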
You cannot use XML "raw".
You must define a vocabulary of tags for your use.
You may optionally create a Document Type Definition (DTD) - a grammar telling a parser how to parse a document using your tags.
This "meta-language" approach makes the language extensible. People with common interests can define their own markup languages and standardize just within their own community (e.g., chemists). However, displaying such a language may require a browser extension; for example, rendering MathML markup in Internet Explorer requires a plug-in such as MathPlayer.
An XML parser by default checks that a document is well-formed.
Tags must have a start and matching end. There are two syntaxes: either <tag>...</tag>, or <tag/> for an empty element.
Tags must nest. This is ok, and can be mapped to a tree:
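The nesting rule can be checked mechanically. The sketch below uses Python's xml.etree.ElementTree as an example of a non-validating parser; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if a non-validating parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Properly nested: each element closes before its parent does.
nested = "<auto><performance><zeroto60>4.7</zeroto60></performance></auto>"
# Overlapping: <performance> closes while <zeroto60> is still open.
overlapping = "<auto><performance><zeroto60>4.7</performance></zeroto60></auto>"

print(is_well_formed(nested))       # prints True
print(is_well_formed(overlapping))  # prints False
```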
Try this:
Save the xml doc above to a local file.
Edit the file to reverse the order of two tags.
Display the result to see what happens.
Edit the file to delete one of the tags you reordered.
Display the result to see what happens.
This is done by creating an HTML file with missing parts (to be filled in from the XML), then doing one of two things:
Insert scripting into the HTML file that populates the missing parts.
Create an XSL (Extensible Stylesheet Language) file, which transforms XML to HTML.
Here is part of an html file for our xml example:
<table width="100%" border="1">
  <tr>
    <td>Automobile Buyer's Guide for
    </td>
    <td>Model: </td>
  </tr>
  <tr>
    <td>Body style</td>
    <td><span id="Doors"></span></td>
  </tr>
  <tr>
    <td>Engine displacement</td>
    <td><span id="Displacement"></span> liters</td>
  </tr>
</table>
</body>
</html>
View this file in an XML-aware browser.
Now we need a way to populate it. So we do this:
<html>
<head><script>
function init() {
  // Some stuff to fill in the missing HTML parts
}
</script></head>
<body onload="init()">
...
</body>
</html>
The BODY tag says to call the function init() once the page has finished loading. The JavaScript init() function then creates (via new ActiveXObject("Microsoft.XMLDOM")) an instance of the XML Document Object Model object.
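function init() {
  // Create the XML DOM object and load the document (Internet Explorer only)
  var xml = new ActiveXObject("Microsoft.XMLDOM");
  xml.async = false;  // wait for the load to finish before reading
  xml.load("autos.xml");
  // Get the autos object
  var autos = xml.documentElement;
  // Get the auto object
  var auto = xml.getElementsByTagName("auto").item(0);
  // Get the number of doors (tag names are case-sensitive)
  Doors.innerHTML = xml.getElementsByTagName("doors").item(0).text;
  // Get engine info
  var engine = xml.getElementsByTagName("engine").item(0);
  Displacement.innerHTML = engine.getAttribute("displacement") + " ";
}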
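The same DOM calls exist outside Internet Explorer. As a rough, browser-independent sketch, the Python below uses xml.dom.minidom to pull the door count and engine displacement from the first auto and fill two table rows. The ROWS template, its {doors} and {displacement} placeholders, and the function name fill_rows are assumptions made for illustration; they are not part of the original page.

```python
from xml.dom.minidom import parseString

# Shortened copy of the autos document from this section.
AUTOS_XML = """<autos year="2000">
  <auto name="BMW M5">
    <doors>5</doors>
    <engine displacement="4.9" horsepower="400"/>
  </auto>
</autos>"""

# Hypothetical template; {doors} and {displacement} mark the missing parts.
ROWS = (
    "<tr><td>Body style</td><td>{doors}</td></tr>\n"
    "<tr><td>Engine displacement</td><td>{displacement} liters</td></tr>"
)


def fill_rows(xml_text):
    """Fill the template from the first <auto> in the document."""
    doc = parseString(xml_text)
    auto = doc.getElementsByTagName("auto").item(0)
    # Element text lives in a child text node, hence firstChild.data.
    doors = auto.getElementsByTagName("doors").item(0).firstChild.data
    engine = auto.getElementsByTagName("engine").item(0)
    return ROWS.format(doors=doors,
                       displacement=engine.getAttribute("displacement"))


print(fill_rows(AUTOS_XML))
```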
Hint: If you want to create XML and HTML files like this on your server, be sure that your server's configuration files return a MIME type of "text/xml" for files ending with the .xml extension. Otherwise, when your browser interprets your HTML file, " xml.load("autos.xml") ;" will either cause an access violation or return an illegal object.
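How the MIME type is configured depends on the server. For local experiments, one option is a small static file server that pins the type itself. The sketch below uses Python's standard http.server module; the class name XmlAwareHandler, the function name serve, and port 8000 are illustrative choices, not anything this document prescribes.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class XmlAwareHandler(SimpleHTTPRequestHandler):
    """Static file handler that always serves .xml files as text/xml."""

    # Copy the inherited extension map and pin the .xml entry.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map,
                      ".xml": "text/xml"}


def serve(port=8000):
    """Serve the current directory until interrupted."""
    HTTPServer(("", port), XmlAwareHandler).serve_forever()


# To try it, call serve() from the directory that holds autos.xml.
```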
Suppose that you have a bad memory for tag names, and try to create the XML file from memory, and you use <CARS> instead of <autos>. Will an XML parser like that in IE complain? (Try it!)
To catch illegal tag names, illegal attribute names, or illegal content inside tags, you need a grammar.
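The gap is easy to see with any non-validating parser. In this Python sketch, xml.etree.ElementTree (which checks well-formedness only and does not validate against a DTD) accepts the misnamed root element without complaint:

```python
import xml.etree.ElementTree as ET

# Well-formed, but uses <CARS> where the vocabulary expects <autos>.
wrong_vocabulary = '<CARS year="2000"><auto name="BMW M5"/></CARS>'

# No error is raised: only well-formedness is checked.
root = ET.fromstring(wrong_vocabulary)

print(root.tag)             # prints CARS
print(root.tag == "autos")  # prints False
```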
How Grammars are Specified in XML
In XML the grammar is represented by a Document Type Definition (DTD). The grammar of our XML example can be specified something like this:
<!ELEMENT autos (auto+)>
<!ATTLIST autos
  year CDATA #REQUIRED>
<!ELEMENT auto (doors, engine, performance, price)>
<!ATTLIST auto
  name CDATA #REQUIRED>
<!ELEMENT doors (#PCDATA)>
<!ELEMENT engine EMPTY>
<!ATTLIST engine
  displacement CDATA #REQUIRED
  horsepower CDATA #REQUIRED>
<!ELEMENT performance (zeroto60, quartermile)>
<!ELEMENT zeroto60 (#PCDATA)>
<!ELEMENT quartermile EMPTY>
<!ATTLIST quartermile
  second CDATA #REQUIRED
  mph CDATA #REQUIRED>
<!ELEMENT price (#PCDATA)>
[Above DTD is in this file...]
The final step is to name the DTD in the XML document by adding a DOCTYPE declaration as the second line. The name after DOCTYPE must match the root element, autos:
<?xml version="1.0"?>
<!DOCTYPE autos SYSTEM "autos.dtd">
Notes on the example DTD:
- <! is an escape character -- it marks something that is not part of the xml doc itself.
Think of <!-- ... --> to mark a comment in HTML!
- !ELEMENT defines the tags in the language -- "autos", "engine", and so on.
- Syntax like
<!ELEMENT auto (doors, engine, performance, price)>
means the child of auto is a doors tag followed by an engine tag, and so on.
- doors+ means one or more instances of the doors tag
- doors? means zero or one instance of the doors tag
- doors* means zero or more instances of the doors tag
- !ATTLIST says what attributes are legal for an element
- !ENTITY introduces identifiers like %AutoName, which are made equivalent to some primitive XML type (like NMTOKEN - a "name token").
- The keyword after an attribute's type says whether the attribute is needed or not, or has a default, and so on.
- #IMPLIED = the attribute is optional and no default value exists
- #REQUIRED = the attribute must be included in every element
- #FIXED = the attribute always has the quoted value that follows
- Modify the html doc to link to a CSS style sheet that dresses up the doc.
- Add a few more rows to the table with more info from the XML doc (0-60 time, etc.)
Send e-mail to randysmith@mpc.edu