Comparison of data-serialization formats

Commentators have offered various positive and negative critiques of XML (Extensible Markup Language), a general-purpose specification for creating custom markup languages, identifying circumstances in which XML offers advantages as well as potential disadvantages.^[1]

Advantages

XML provides a basic syntax that can be used to share information between different kinds of computers, different applications, and different organizations.^[2] XML data is stored in plain text format.^[3] This software- and hardware-independent way of storing data allows otherwise incompatible systems to share data without passing it through many layers of conversion. It also makes it easier to expand or upgrade to new operating systems, new applications, or new browsers without losing any data.
With XML, your data can be made available to all kinds of "reading machines" (handheld computers, voice machines, news feeds, etc.), which can make it more accessible to blind people and people with other disabilities.^[3]
XML provides a gateway for communication between applications, even applications on wildly different systems. As long as applications can share data (through HTTP, file sharing, or another mechanism), and have an XML parser, they can share structured information that is easily processed. Databases can trade tables, business applications can trade updates, and document systems can share information.^[2]
It supports Unicode, allowing almost any information in any written human language to be communicated.
It can represent common computer science data structures: records, lists and trees.
Its self-documenting format describes structure and field names as well as specific values.
The strict syntax and parsing requirements make the necessary parsing algorithms extremely simple, efficient, and consistent.
Content-based XML markup enhances searchability, making it possible for agents and search engines to categorize data instead of wasting processing power on context-based full-text searches.^[2]
XML is heavily used as a format for document storage and processing, both online and offline.
It is based on international standards.
It can be updated incrementally.
It allows validation using schema languages such as XSD and Schematron, which makes unit testing, firewalling, acceptance testing, contractual specification, and software construction easier.
The hierarchical structure is suitable for most (but not all) types of documents.
It is platform-independent, thus relatively immune to changes in technology.
Forward and backward compatibility are relatively easy to maintain despite changes in DTD or Schema.
Its predecessor, SGML, has been in use since 1986, so there is extensive experience and software available.
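As an illustration of the claims that XML can represent records, lists, and trees, supports Unicode, and names its own fields, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` module. The sample document and the `parse_book` helper are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. Element and attribute names describe each
# field ("self-documenting"), and the title uses non-ASCII (Unicode) text.
DOC = """<book id="b1">
  <title>Données 日本語</title>
  <authors>
    <author>Ada</author>
    <author>Grace</author>
  </authors>
  <chapter title="Intro">
    <section title="Background"/>
    <section title="Scope"/>
  </chapter>
</book>"""


def parse_book(text: str) -> dict:
    """Map the XML onto the three classic structures: record, list, tree."""
    root = ET.fromstring(text)
    return {
        # Attribute and child element become fields of a record.
        "id": root.get("id"),
        "title": root.findtext("title"),
        # Repeated sibling elements become a list.
        "authors": [a.text for a in root.findall("authors/author")],
        # Nested elements become a tree (here, chapter -> section titles).
        "chapters": {
            ch.get("title"): [s.get("title") for s in ch.findall("section")]
            for ch in root.findall("chapter")
        },
    }


if __name__ == "__main__":
    print(parse_book(DOC))
```

Any other XML parser would serve equally well here. ElementTree is used only because it ships with Python.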

Disadvantages

It is difficult for the end-user to understand its capabilities.
XML syntax is redundant or large relative to binary representations of similar data,^[4] especially with tabular data.
The redundancy may affect application efficiency through higher storage, transmission and processing costs.^[5]^[6]
XML syntax is verbose, especially for human readers, relative to other alternative 'text-based' data transmission formats.^[7]^[8]
The hierarchical model for representation is limited in comparison to an object-oriented graph.^[9]^[10]
Expressing overlapping (non-hierarchical) node relationships requires extra effort.^[11]
XML namespaces are problematic to use and namespace support can be difficult to correctly implement in an XML parser.^[12]
XML is commonly depicted as "self-documenting" but this depiction ignores critical ambiguities.^[13]^[14]
The distinction between content and attributes in XML seems unnatural to some and makes designing XML data structures harder.^[15]
Transformations, even identity transforms, can change the format of a document (whitespace, attribute ordering, attribute quoting, whitespace around attributes, newlines). These changes can make diffing the XML source very difficult unless the documents are first converted to Canonical XML.
XML encourages the use of non-relational data structures (i.e., non-normalized data).
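To make the point about verbosity with tabular data concrete, the following sketch serializes the same three hypothetical rows once as element-per-field XML and once as CSV. CSV is chosen here only as a convenient compact text baseline. The row data and the helper names are illustrative, and the exact size ratio depends on the data and on design choices such as using attributes instead of child elements.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical tabular data: three rows sharing the same three columns.
ROWS = [
    {"id": "1", "name": "Ada", "year": "1815"},
    {"id": "2", "name": "Grace", "year": "1906"},
    {"id": "3", "name": "Alan", "year": "1912"},
]


def to_xml(rows: list[dict]) -> str:
    """One element per row and one child element per field."""
    root = ET.Element("rows")
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for key, value in row.items():
            # The field name is repeated in both the start and end tag
            # of every single value.
            ET.SubElement(row_el, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_csv(rows: list[dict]) -> str:
    """Field names appear once, in the header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    xml_text, csv_text = to_xml(ROWS), to_csv(ROWS)
    print("XML bytes:", len(xml_text.encode("utf-8")))
    print("CSV bytes:", len(csv_text.encode("utf-8")))
```

For these three rows the XML form is several times larger than the CSV form. Most of the difference comes from tag names repeated for every value.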
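The following sketch illustrates the point about identity transforms and diffing. It uses Python's standard-library `xml.etree.ElementTree`, whose `canonicalize` function (Python 3.8 and later) produces Canonical XML (C14N 2.0). The two sample strings and the helper names are hypothetical. The strings encode the same logical document but differ in attribute order, quote style, whitespace inside a tag, and empty-element syntax.

```python
import xml.etree.ElementTree as ET

# Two hypothetical serializations of the same logical document.
ORIGINAL = "<doc b='2'  a=\"1\"><x/></doc>"
REWRITTEN = '<doc a="1" b="2"><x></x></doc>'


def identity_transform(text: str) -> str:
    """Parse and re-serialize without changing any data."""
    return ET.tostring(ET.fromstring(text), encoding="unicode")


def canonical(text: str) -> str:
    """Return the Canonical XML (C14N 2.0) form; needs Python 3.8+."""
    return ET.canonicalize(text)


if __name__ == "__main__":
    round_tripped = identity_transform(ORIGINAL)
    # Quoting, spacing and empty-element syntax change on the round trip,
    # so a plain textual diff reports a change even though no data changed.
    print(round_tripped)
    print(round_tripped == ORIGINAL)  # False
    # After canonicalization the two variants compare equal.
    print(canonical(ORIGINAL) == canonical(REWRITTEN))  # True
```

Canonicalizing both sides before running a textual diff is one way to suppress such purely syntactic differences. Whitespace *between* elements is character data, however, and is kept by default.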

[1] (See e.g., XML-QL Proposal discussing XML benefits, When to use XML, "XML Sucks" on c2.com, Daring to Do Less with XML)
[2] The named reference simonlwhyxml was invoked but never defined.
[3] "How Can XML be Used?". W3schools.com. Retrieved 2009-07-31.
[4] Harold, Elliotte Rusty (2002). Processing XML with Java(tm): A Guide to SAX, DOM, JDOM, JAXP, and TrAX. Addison-Wesley. ISBN 0201771861. XML documents are too verbose compared with binary equivalents.
[5] Harold, Elliotte Rusty (2002). XML in a Nutshell: A Desktop Quick Reference. O'Reilly. ISBN 0596002920. XML documents are very verbose, and searching is inefficient for high-performance, large-scale database applications.
[6] However, the Binary XML effort aims to alleviate these problems by using a binary representation of the XML document. For example, the parsing speed of the Java reference implementation of the Fast Infoset standard is better by a factor of 10 than Java Xerces, and by a factor of 4 than the Piccolo driver, one of the fastest Java-based XML parsers.
[7] Bierman, Gavin (2005). Database Programming Languages: 10th International Symposium, DBPL 2005, Trondheim, Norway. Springer. ISBN 3540309519. XML syntax is too verbose for human readers in certain applications; proposes a dual syntax for human readability.
[8] However, many purportedly "less verbose" text formats actually cite XML as both inspiration and prior art. See e.g., http://yaml.org/spec/current.html, http://innig.net/software/sweetxml/index.html, http://www.json.org/xml.html.
[9] A hierarchical model gives only a fixed, monolithic view of the tree structure: for example, either actors under movies or movies under actors, but not both.
[10] Lim, Ee-Peng (2002). Digital Libraries: People, Knowledge, and Technology. Springer. ISBN 3540002618. Proceedings of the 5th International Conference on Asian Digital Libraries, ICADL 2002, held in Singapore in December 2002. Discusses some of the limitations of a fixed hierarchy.
[11] Searle, Leroy F. (2004). Voice, Text, Hypertext: Emerging Practices in Textual Studies. University of Washington Press. ISBN 0295983051. Proposes an alternative system for encoding overlapping elements.
[12] (See e.g., http://www-128.ibm.com/developerworks/library/x-abolns.html)
[13] "The Myth of Self-Describing XML" (PDF). Retrieved 2007-05-12.
[14] (See e.g., Use–mention distinction, Naming collision, Polysemy)
[15] "Does XML Suck?". Retrieved 2007-12-15. (See "8. Complexity: Attributes and Content")