Craig S. Mullins
              
Database Performance Management


December 1999

 

XML Marks The Spot

By Craig S. Mullins

XML is getting a lot of publicity these days. If you believe everything you read, then XML is going to solve all of our interoperability problems, completely replace SQL, and possibly even deliver world peace. Okay, that last one is an exaggeration, but you get the point. In actuality, XML stands for eXtensible Markup Language. The need for extensibility, structure, and validation is the basis for the evolution of the web towards XML. XML, like HTML, is based upon SGML (Standard Generalized Markup Language), which allows documents to be self-describing through the specification of tag sets and the structural relationships between the tags. HTML is a small, fixed set of tags and attributes, so HTML users bypass the self-describing aspect of a document entirely. XML, on the other hand, retains the key SGML advantage of self-description, while avoiding the complexity of full-blown SGML.

So What?

XML allows tags to be defined by users that describe the data in the document. This capability provides users a means to describe the structure and nature of the data in the document. In essence, the document becomes self-describing.

The simple syntax of XML makes it easy to process by machine while remaining understandable to humans. HTML uses tags to describe the appearance of data on a page. For example, the markup “<b>text</b>” specifies that the “text” data should appear in boldface. XML uses tags to describe the data itself, instead of its appearance. For example, consider the following XML describing a customer address:

<CUSTOMER>
<first_name>Craig</first_name>
<middle_initial>S.</middle_initial>
<last_name>Mullins</last_name>
<company_name>BMC Software, Inc.</company_name>
<street_address>2101 CityWest Blvd.</street_address>
<city>Houston</city>
<state>TX</state>
<zip_code>77042</zip_code>
<country>U.S.A.</country>
</CUSTOMER>
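
To illustrate the "easy to process by machine" point, here is a minimal Python sketch that reads the customer document above with the standard library's xml.etree.ElementTree module. The function name customer_to_dict and the CUSTOMER_XML variable are illustrative assumptions, not part of any XML standard.

```python
import xml.etree.ElementTree as ET

# The customer document from the article, held in a string for illustration.
CUSTOMER_XML = """<CUSTOMER>
<first_name>Craig</first_name>
<middle_initial>S.</middle_initial>
<last_name>Mullins</last_name>
<company_name>BMC Software, Inc.</company_name>
<street_address>2101 CityWest Blvd.</street_address>
<city>Houston</city>
<state>TX</state>
<zip_code>77042</zip_code>
<country>U.S.A.</country>
</CUSTOMER>"""


def customer_to_dict(xml_text):
    """Return a {tag: text} mapping for each child of the root element."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


customer = customer_to_dict(CUSTOMER_XML)
# The program looks data up by what it is (its tag), not by where it sits.
print(customer["city"], customer["state"], customer["zip_code"])
```

Because each value travels with its tag, the program needs no separate record layout to find the city or the zip code.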

XML is actually a meta-language for defining other markup languages. The tags of each such language are declared in a kind of dictionary called a Document Type Definition (DTD), which stores the tag definitions for a specific industry or field of knowledge. So the tags a document uses, and the way they may be nested, must be declared in a DTD, which the document embeds or references through its document type declaration (DOCTYPE), such as:

<!DOCTYPE CUSTOMER [
<!ELEMENT CUSTOMER (first_name, middle_initial, last_name, company_name, street_address, city, state, zip_code, country*)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_initial (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT company_name (#PCDATA)>
<!ELEMENT street_address (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip_code (#PCDATA)>
<!ELEMENT country (#PCDATA)>
]>
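
As a rough illustration of what such declarations constrain, the Python sketch below checks a document against the root name given in the DOCTYPE line and the ordered list of child elements in the first ELEMENT declaration. It is a simplified stand-in, not a DTD validator: Python's standard library does not validate against DTDs, and the names REQUIRED, CONTENT_MODEL, and matches_customer_model are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Child elements in the order the content model lists them.
REQUIRED = ["first_name", "middle_initial", "last_name", "company_name",
            "street_address", "city", "state", "zip_code"]

# The model ends in country*, meaning zero or more country elements.
CONTENT_MODEL = re.compile(
    "".join(re.escape(name) + " " for name in REQUIRED) + "(country )*"
)


def matches_customer_model(xml_text):
    """Check the root name and child order only (not a full DTD validation)."""
    root = ET.fromstring(xml_text)
    if root.tag != "CUSTOMER":
        return False
    # Flatten the child tags into one string and match it against the model.
    sequence = "".join(child.tag + " " for child in root)
    return CONTENT_MODEL.fullmatch(sequence) is not None


# Hypothetical test documents built from the required tag names.
body = "".join(f"<{name}>x</{name}>" for name in REQUIRED)
without_country = f"<CUSTOMER>{body}</CUSTOMER>"
reordered = f"<CUSTOMER><country>U.S.A.</country>{body}</CUSTOMER>"

print(matches_customer_model(without_country))
print(matches_customer_model(reordered))
```

A real validating parser would read the declarations themselves rather than a hand-copied list, but the idea is the same: the declarations say which elements may appear and in what order.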

The DTD for an XML document can be either part of the document or stored in an external file. The XML code samples shown are meant to be examples only. By examining them you can quickly see how the document itself describes its contents. For data management professionals, this is beneficial because it removes the trouble of trying to track down the meaning of data elements. One of the biggest problems associated with database management and processing is tracking down and maintaining the meaning of stored data. If the data can be stored in documents using XML, the documents themselves will describe their data content.

The important thing to remember about XML is that it solves a different problem than HTML. HTML is a markup language, but XML is a meta-language. In other words, XML is a language for defining other languages. The idea is to use XML to define a language specifically tailored for each requirement you encounter. It is essential that you understand this paradigm shift in order to understand the power of XML.

Some Skepticism

However, there are some problems with XML. For example, standard web browsers do not currently understand the descriptive tags. This problem will be alleviated in time as XML-capable web browsers come to market.

Another problem with XML is not really the fault of XML, but of market hype. There is a lot of confusion surrounding XML in the industry. Some folks believe that XML will provide metadata where none currently exists or that XML will replace SQL as a data access method for relational data. Neither of these assertions is true.

There is no way that any technology, XML included, can conjure up information that does not exist. Humans must create the metadata tags in XML for the data to be described. XML enables self-describing documents. It does not describe your data for you.

And XML does not do what SQL does. Hence, XML cannot replace SQL. SQL is the standard access method for relational data. It is used to “tell” a relational DBMS what data is to be retrieved. XML is a document description language. It describes the contents of data. XML may be useful for defining databases, but not for accessing them.
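
The division of labor can be sketched in a few lines of Python: SQL selects the rows, and XML only describes what was selected. This is a hedged illustration using the standard sqlite3 and xml.etree.ElementTree modules. The customer table, its sample rows, and the function fetch_customers_as_xml are hypothetical and exist only for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def fetch_customers_as_xml(conn, state):
    """Use SQL to retrieve rows, then use XML only to describe the result."""
    cursor = conn.execute(
        "SELECT first_name, last_name, city, state "
        "FROM customer WHERE state = ?",
        (state,),
    )
    columns = [col[0] for col in cursor.description]
    root = ET.Element("CUSTOMERS")
    for row in cursor:
        customer = ET.SubElement(root, "CUSTOMER")
        # Each column name becomes a tag that describes its value.
        for name, value in zip(columns, row):
            ET.SubElement(customer, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(first_name TEXT, last_name TEXT, city TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [("Craig", "Mullins", "Houston", "TX"),
     ("Pat", "Example", "Boston", "MA")],
)

result = fetch_customers_as_xml(conn, "TX")
print(result)
```

The WHERE clause decides which data comes back, and nothing in the XML output could have made that decision. The two technologies complement each other rather than compete.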

Summary

But skepticism aside, XML is definitely the wave of the immediate future. The future of the web will be defined using XML. The benefits of self-describing documents are just too many for XML to be ignored. Furthermore, being able to use XML to generate an application-specific language is powerful. This capability will drive XML to the forefront of computing.

 

 

From Database Trends, December 1999.
 
© 1999 Craig S. Mullins. All rights reserved.