SXML

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Qwertyus at 14:23, 10 January 2015 (benefit and criticism, with source). It may differ significantly from the current revision.
SXML
Filename extension: .sxml, .scm
Type code: TEXT
Type of format: markup language

SXML is an alternative syntax for writing XML data, using the form of S-expressions, that facilitates working with XML data in Lisp and Scheme. An associated suite of tools implements XPath, SAX and XSLT for SXML in Scheme.[1]

Textual correspondence between SXML and XML for a sample XML snippet is shown below:

XML:

 <tag attr1="value1"
      attr2="value2">
   <nested>Text node</nested>
   <empty/>
 </tag>

SXML:

 (tag (@ (attr1 "value1")
         (attr2 "value2"))
   (nested "Text node")
   (empty))

Compared to other alternative representations for XML and its associated languages, SXML has the benefit of being directly parsable by existing Scheme implementations. The associated tools and documentation were criticized by David Mertz in his IBM developerWorks column for being inconsistent, incomplete and academic in nature.[2]

Example

Take the following simple XHTML page:

 <html xmlns="http://www.w3.org/1999/xhtml"
         xml:lang="en" lang="en">
    <head>
       <title>An example page</title>
    </head>
    <body>
       <h1 id="greeting">Hi, there!</h1>
       <p>This is just an &gt;&gt;example&lt;&lt; to show XHTML &amp; SXML.</p>
    </body>
 </html>

After translating it to SXML, the same page now looks like this:

 (*TOP* (@ (*NAMESPACES* (x "http://www.w3.org/1999/xhtml")))
  (x:html (@ (xml:lang "en") (lang "en"))
    (x:head
       (x:title "An example page"))
    (x:body
       (x:h1 (@ (id "greeting")) "Hi, there!")
       (x:p  "This is just an >>example<< to show XHTML & SXML."))))

Each element's tag pair is replaced by a set of parentheses. The tag's name is not repeated at the end; it is simply the first symbol in the list. The element's contents follow, and are either elements themselves or strings. XML attributes require no special syntax: in SXML they are represented as just another node, which has the special name @. This cannot cause a name clash with an actual "@" tag, because @ is not allowed as a tag name in XML. This is a common pattern in SXML: whenever a tag is used to indicate a special status or something that is not possible in XML, a name is used that does not constitute a valid XML identifier.

We can also see that there is no need to escape otherwise meaningful characters such as & and > as the entities &amp; and &gt;. All string content is treated as pure content, with no tags or entities in it, so it is escaped automatically when the document is written out as XML. This also makes it much easier to insert autogenerated content, and removes the danger of forgetting to escape user input before displaying it to other users (which could lead to cross-site scripting attacks or other annoyances).
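The point about escaping can be illustrated with a minimal sketch of a serializer. It is written in Python rather than Scheme and is not part of the SXML tool suite: the function name to_xml and the choice to model an SXML node as a nested Python list, with the element name as a plain string in first position, are assumptions made for this example only.

```python
from xml.sax.saxutils import escape, quoteattr

def to_xml(node):
    """Serialize an SXML-like nested list to an XML string (illustrative only)."""
    if isinstance(node, str):
        # Text nodes are pure content: &, < and > are escaped here, in one place.
        return escape(node)
    name, *rest = node
    attrs = []
    # An attribute list, if present, is a first child whose head is "@".
    if rest and isinstance(rest[0], list) and rest[0] and rest[0][0] == "@":
        attrs = rest[0][1:]
        rest = rest[1:]
    attr_text = "".join(f" {key}={quoteattr(value)}" for key, value in attrs)
    if not rest:
        return f"<{name}{attr_text}/>"
    body = "".join(to_xml(child) for child in rest)
    return f"<{name}{attr_text}>{body}</{name}>"

paragraph = ["p", "This is just an >>example<< to show XHTML & SXML."]
print(to_xml(paragraph))
# <p>This is just an &gt;&gt;example&lt;&lt; to show XHTML &amp; SXML.</p>
```

Because escaping happens only inside the serializer, code that builds the tree never handles entities at all.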

SXML shortcomings

SXML can be parsed by a program in any programming language, and then be represented using any desired data structure. Precisely as with XML, implementations vary:[citation needed] XML applications that can process data in a single serial pass typically use SAX-style interfaces that stay very close to the raw input data stream, while applications that must access parts of the data in a non-linear, random-access fashion use DOM interfaces that mirror the hierarchical structure instead.

It has been claimed[by whom?] that because the underlying structure is based on singly linked lists, nodes have no default access to either the parent node or the sibling nodes, only to their child nodes. But this confuses the underlying structure with a linear representation of that structure. Any disk file is a linear sequence of bytes or characters, but that mundane fact places almost no limits on what structures can be represented.

As a simple example, saying that the "underlying structure" of the following expression is either a 21-character string or a singly linked list of 11 nodes (4 numbers, 3 arithmetic operators, and 4 grouping delimiters) is at best a gross oversimplification:

   ( 1 + 2 ) * ( 3 + 4 )
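The difference between the linear views and the structure can be made concrete with a small sketch. The tokenizer and the recursive-descent parser below are hypothetical helpers written for this illustration (in Python, assuming only non-negative integers, +, * and parentheses); they show the same text as a 21-character string, as a flat list of 11 tokens, and as a tree.

```python
import re

def tokenize(expr):
    """Split the expression into numbers, operators and parentheses."""
    return re.findall(r"\d+|[()+*]", expr)

def parse(tokens):
    """Parse sums and products into nested tuples of the form (op, left, right)."""
    pos = 0

    def atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = sum_()
            pos += 1  # skip the closing ")"
            return node
        return int(tok)

    def product():
        nonlocal pos
        node = atom()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = ("*", node, atom())
        return node

    def sum_():
        nonlocal pos
        node = product()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("+", node, product())
        return node

    return sum_()

text = "( 1 + 2 ) * ( 3 + 4 )"
tokens = tokenize(text)
tree = parse(tokens)
print(len(text), len(tokens), tree)
# 21 11 ('*', ('+', 1, 2), ('+', 3, 4))
```

The grouping delimiters disappear from the tree: they exist only in the linear representation.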

Because SXML is syntactically so similar to S-expressions, it is trivial to load it into a Lisp or Scheme program just as if it were a generic S-expression. Doing so, however, turns each parenthesized group into a singly linked list, a data structure that is far from optimal for the kinds of processing commonly anticipated for XML-like structures. Similarly, in any programming language it is trivial to load an entire SXML document into one long string, but that would be a poor choice for most purposes.
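As a rough illustration of such a generic load outside Lisp, the following Python sketch reads SXML text into nested lists. The names read_sxml and Symbol are invented for this example, and the reader is deliberately simplified: it handles only parentheses, bare symbols and double-quoted strings, and treats a backslash in a string as quoting the next character literally, which is not the full Scheme string syntax.

```python
import re

# One token: a parenthesis, a double-quoted string, or a bare symbol.
TOKEN = re.compile(r'\s*(?:([()])|"((?:[^"\\]|\\.)*)"|([^\s()"]+))')

class Symbol(str):
    """A bare name such as tag or @, as opposed to a quoted string."""

def read_sxml(text):
    """Read a single S-expression into nested Python lists (simplified reader)."""
    text = text.strip()
    stack = [[]]  # stack[0] collects the top-level result
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot read input at offset {pos}")
        pos = match.end()
        paren, string, symbol = match.groups()
        if paren == "(":
            stack.append([])
        elif paren == ")":
            finished = stack.pop()
            if not stack:
                raise ValueError("unbalanced closing parenthesis")
            stack[-1].append(finished)
        elif string is not None:
            # Simplification: a backslash just quotes the following character.
            stack[-1].append(re.sub(r"\\(.)", r"\1", string))
        else:
            stack[-1].append(Symbol(symbol))
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one complete expression")
    return stack[0][0]

tree = read_sxml('''
(tag (@ (attr1 "value1")
        (attr2 "value2"))
  (nested "Text node")
  (empty))
''')
print(tree)
```

The result mirrors the parenthesized groups one-to-one, which is exactly why it offers no parent or sibling access on its own.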

In reality, XML, SXML, SGML, or almost any other data representation is loaded into data structures that facilitate the required operations. DOM and other interfaces provide methods to get directly from an element to its parent, its preceding and following siblings, and its numbered children, and to access attributes by name. Practical DOM implementations make likely operations very fast.[3]

If a program does not do this, then typical operations such as getting the Nth child of an element, or the preceding element in a long list, or the element with a given ID, remain possible but are far from optimal.
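A minimal sketch of such a structure, again in Python and with invented names (Element, build, previous_sibling), is shown below. It assumes the SXML data has already been read into nested lists, and it is only one of many possible designs, not how any particular DOM or SXML implementation works: each element records its parent and its position among its siblings, children are held in an array for direct access by number, and elements with an id attribute are entered in a dictionary.

```python
class Element:
    """An element with direct links to its parent, children and attributes."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.attrs = {}
        self.children = []  # elements and strings, addressable by index
        self.index = None   # position within parent.children (None for the root)

    def previous_sibling(self):
        """Return the preceding child node of the same parent, or None."""
        if self.parent is None or self.index == 0:
            return None
        return self.parent.children[self.index - 1]

def build(sxml, ids, parent=None):
    """Convert a nested list into linked Element objects, filling the id index."""
    name, *rest = sxml
    elem = Element(name, parent)
    for item in rest:
        if isinstance(item, str):
            elem.children.append(item)
        elif item and item[0] == "@":
            elem.attrs.update((key, value) for key, value in item[1:])
        else:
            child = build(item, ids, elem)
            child.index = len(elem.children)
            elem.children.append(child)
    if "id" in elem.attrs:
        ids[elem.attrs["id"]] = elem
    return elem

ids = {}
body = build(
    ["body",
     ["h1", ["@", ["id", "greeting"]], "Hi, there!"],
     ["p", "Some text."]],
    ids,
)
h1 = ids["greeting"]              # lookup by ID without walking the tree
p = body.children[1]              # Nth child by index
print(h1.parent.name, p.previous_sibling().name)
# body h1
```

The one-time cost of building the links and the index is what buys constant-time parent, sibling and ID access afterwards.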

Citations

  1. ^ Kiselyov, Oleg; Lisovsky, Kirill (2002). XML, XPath, XSLT Implementations as SXML, SXPath, and SXSLT (PDF). International Lisp Conference.
  2. ^ Mertz, David (23 October 2003). "XML Matters: Investigating SXML and SSAX". IBM developerWorks. Archived from the original on 4 December 2004. Retrieved 10 January 2015.
  3. ^ DeRose, Steven (2005). "Architecture and Speed of Common XML Operations". Proceedings of Extreme Markup Languages. Montreal.