Correctness in an XML document

An XML document is correct if it is:

  • Well-formed: A well-formed document conforms to all of XML's syntax rules. For example, if a non-empty element has an opening tag with no closing tag, it is not well-formed. A document that is not well-formed is not considered to be XML; a parser is required to refuse to process it.
  • Valid: A valid document has data that conforms to a particular set of user-defined content rules that describe correct data values and locations. For example, if an element in a document is required to contain text that can be interpreted as being an integer numeric value, and it instead has the text “hello”, is empty or has other elements in its content, then the document is not valid.


The standard XML-based interchange syntax of Topic Maps is called XML Topic Maps (XTM).

Well-formed documents

An XML document is text, that is, a sequence of characters. The specification requires support for the Unicode encodings UTF-8 and UTF-16 (UTF-32 is not mandatory). Other, non-Unicode encodings, such as ISO-8859, are permitted and are indeed widely used and supported.
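As a rough illustration of the encodings mentioned above, the following Python sketch serializes the same small document in three encodings and parses each one with the standard library's xml.etree.ElementTree module. The helper name parse_encoded and the <word> element are made up for this example; how encodings beyond UTF-8 and UTF-16 are handled depends on the parser in use.

```python
import xml.etree.ElementTree as ET


def parse_encoded(text: str, encoding: str) -> str:
    """Encode a tiny document in `encoding`, parse it, and return the root's text.

    `parse_encoded` and the <word> element are hypothetical names for this sketch.
    """
    document = f'<?xml version="1.0" encoding="{encoding}"?><word>{text}</word>'
    # The parser receives raw bytes and must work out the encoding itself,
    # from the byte order mark (if any) and the XML declaration.
    root = ET.fromstring(document.encode(encoding))
    return root.text


for encoding in ("UTF-8", "UTF-16", "ISO-8859-1"):
    print(encoding, parse_encoded("café", encoding))
```

In each case the parser should hand back the same characters, because the declared encoding only affects how the bytes are decoded, not the content of the document.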

A well-formed document must conform to the following rules, among others:

  • One and only one root element exists for the document. However, the XML declaration, processing instructions, and comments can precede the root element.
  • Non-empty elements are delimited by both a start-tag and an end-tag.
  • Empty elements may be marked with an empty-element (self-closing) tag, such as <IAmEmpty/>. This is equivalent to <IAmEmpty></IAmEmpty>.
  • All attribute values are quoted with either single (') or double (") quotes. A value opened with a single quote must be closed with a single quote, and a value opened with a double quote must be closed with a double quote.
  • Tags may be nested but may not overlap. Each non-root element must be completely contained in another element.
  • The document complies with its character encoding declaration. The encoding is usually declared in the XML declaration, but it can also be provided by the transport protocol, such as HTTP. If no encoding is declared, a Unicode encoding is assumed, identified by the Unicode Byte Order Mark. If the mark is absent, UTF-8 is the default.
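Several of the rules above can be tried out with any conforming parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are invented for illustration and cover only a few of the rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text`, False if it reports a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample documents, one per rule from the list above.
samples = {
    "single root": "<a><b/></a>",
    "comment before root": "<?xml version='1.0'?><!-- note --><a/>",
    "two roots": "<a/><b/>",
    "missing end-tag": "<a><b></a>",
    "overlapping tags": "<a><b><c></b></c></a>",
    "unquoted attribute": "<a id=1/>",
    "mismatched quotes": "<a id='1\"/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")

# The two spellings of an empty element parse to the same result.
short_form = ET.tostring(ET.fromstring("<IAmEmpty/>"))
long_form = ET.tostring(ET.fromstring("<IAmEmpty></IAmEmpty>"))
print("empty forms equivalent:", short_form == long_form)
```

Only the first two samples should be accepted; the parser rejects the rest rather than trying to repair them.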

Element names are case-sensitive. For example, the following is a well-formed matching pair:

    <Step> ... </Step>

whereas this is not:

    <Step> ... </step>
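To see how a parser reacts to the two pairs above, one can ask it for its error message. This is a small sketch with Python's standard xml.etree.ElementTree module; the helper name parse_error is hypothetical, and the exact wording of the message depends on the underlying parser.

```python
import xml.etree.ElementTree as ET


def parse_error(text: str):
    """Return the parser's error message for `text`, or None if it parses cleanly."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        return str(error)
    return None


print(parse_error("<Step> ... </Step>"))  # start-tag and end-tag use the same case
print(parse_error("<Step> ... </step>"))  # end-tag differs in case only
```

The first call should report no error, while the second should report that the end-tag does not match the open element.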

The careful choice of names for XML elements will convey the meaning of the data in the markup. This increases human readability while retaining the rigor needed for software parsing.

Meaningful names convey the semantics of elements and attributes to a human reader without reference to external documentation. However, they can lead to verbosity, which complicates authoring and increases file size.

Valid documents

An XML document that complies with a particular schema, in addition to being well-formed, is said to be valid.

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic constraints imposed by XML itself. A number of standard and proprietary XML schema languages have emerged for the purpose of formally expressing such schemas, and some of these languages are themselves XML-based.
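The difference between well-formed and valid can be sketched in a few lines of Python. The code below is not a schema language or a real validator: it hard-codes a single made-up rule (the root element must be named count and contain only an integer), standing in for the kind of constraint a schema would express. The names classify and count are assumptions for this example.

```python
import re
import xml.etree.ElementTree as ET


def classify(text: str) -> str:
    """Classify `text` against one hypothetical rule: root <count> must hold an integer."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Syntax errors are caught by the parser before any content rule applies.
        return "not well-formed"

    has_child_elements = len(root) > 0
    content = (root.text or "").strip()
    is_integer = re.fullmatch(r"[+-]?[0-9]+", content) is not None

    if root.tag != "count" or has_child_elements or not is_integer:
        return "well-formed but not valid"
    return "valid"


for sample in (
    "<count>42</count>",
    "<count>hello</count>",
    "<count/>",
    "<count><n>1</n></count>",
    "<count>42",
):
    print(sample, "->", classify(sample))
```

In practice such rules are written once in a schema and checked by a validating parser or a schema processor, rather than re-implemented by hand for each application.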

Before the advent of generalised data description languages such as SGML and XML, software designers had to define special file formats or small languages to share data between programs. This required writing detailed specifications and special-purpose parsers and writers.

XML's regular structure and strict parsing rules allow software designers to leave parsing to standard tools. Because XML provides a general, data-model-oriented framework for developing application-specific languages, designers need only concentrate on the rules for their data, at a relatively high level of abstraction.

XTM-conformant Topic Maps must comply with the XTM 1.0 Document Type Definition (DTD) to be valid.

Take a look at our XTM Tutorial to get an introduction to XTM.

This page is based on the article “XML” from the free encyclopedia Wikipedia and is licensed under the terms of the GNU Free Documentation License. A list of versions and authors is available in Wikipedia.

 
wiki/help/xml.txt · Last modified: 2006/04/04 16:22 by 84.184.169.160