Canonical XML
Canonical XML is a profile or subset of XML. Any XML document can be converted to Canonical XML, thus normalizing away specific kinds of minor differences while remaining an XML document. Because those specific differences are generally not considered to be "meaningful", converting to Canonical XML is a good way to determine whether two XML documents are logically "the same document" despite differences of detail.
For example, XML permits whitespace to occur at various points within start-tags, and attributes to be specified in any order. Such differences are seldom if ever used to convey meaning, and so these forms are generally considered equivalent:
<p class="a" secure="1">
<p secure = "1" class='a' >
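In converting an arbitrary XML document to Canonical XML, attributes are written in a normative order (sorted by namespace URI and then by local name, which for unprefixed attributes amounts to ordering by name), and with normative spacing and quoting (a single space before each attribute, no whitespace around the equals sign, and double quotes around the value). Thus, the second form above would be converted to the first.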
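As a rough illustration, the two start-tag forms above can be run through a canonicalizer and compared. The sketch below uses `canonicalize` from Python's standard `xml.etree.ElementTree` module (Python 3.8 or later), which implements Canonical XML 2.0 rather than the earlier versions; for this simple case the result should be the same. The variable names are illustrative, and each start-tag has been given an end-tag so that it forms a complete document.

```python
from xml.etree.ElementTree import canonicalize

# The two start-tag forms from the text, closed so each is a complete document.
first = '<p class="a" secure="1"></p>'
second = '<p secure = "1" class=\'a\' ></p>'

canonical_first = canonicalize(first)
canonical_second = canonicalize(second)

# Attribute order, spacing and quoting are normalized,
# so both inputs yield the same canonical string.
print(canonical_second)
print(canonical_first == canonical_second)
```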
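Canonical XML also specifies a number of other details, among them:
- UTF-8 encoding is used
- Line ends are represented by the character 0x0A
- Whitespace in attribute values is normalized
- Entity references are expanded
- CDATA marked sections are not used; they are replaced by their character content
- Empty elements are written as start/end pairs rather than with the special empty-element syntax
- Default attributes are stated explicitly
- Superfluous namespace declarations are removed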
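Several of these rules can be observed on a small document. The following is a minimal sketch, again assuming Python's `xml.etree.ElementTree.canonicalize` (a Canonical XML 2.0 implementation); the sample document and its element names are made up for illustration. It exercises attribute-value whitespace normalization, the empty-element rule, CDATA replacement, and expansion of a predefined entity reference. Default attributes and namespace handling are not shown, since they depend on a DTD and on version-specific namespace rules.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical sample: XML declaration, a line break inside an attribute
# value, an empty-element tag, a CDATA section and a predefined entity.
source = (
    '<?xml version="1.0"?>'
    "<doc note = 'a\nb'>"
    "<empty/>"
    "<![CDATA[1 < 2]]>"
    " &quot;q&quot;"
    "</doc>"
)

canonical = canonicalize(source)
print(canonical)
```

In the output, the XML declaration is gone, the line break in the attribute value has become a space, `<empty/>` is written as a start/end pair, the CDATA section appears as ordinary escaped character data, and `&quot;` appears as a literal quotation mark.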
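Converting a document to Canonical XML is idempotent. That is, the first conversion typically produces a string that differs from the original document, while repeating the conversion produces no further change.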
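Idempotence is easy to check in practice. A minimal sketch, assuming Python's `xml.etree.ElementTree.canonicalize` as the canonicalizer and a made-up input document:

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical input with non-canonical spacing, quoting and attribute order.
original = "<item  b='2' a='1'/>"

once = canonicalize(original)   # first pass changes the string
twice = canonicalize(once)      # second pass leaves it unchanged

print(once != original)
print(once == twice)
```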
According to the W3C, if two XML documents have the same canonical form, then the two documents are logically equivalent within the given application context (except for limitations regarding a few unusual cases).
However, in a special context users might care about special semantics beyond the generic logical equivalence with which Canonical XML is associated. For example, a steganography system could conceal information in an XML document by varying whitespace, attribute quoting and order, the use of hexadecimal vs. decimal numeric character references, and so on. Obviously converting such a file to Canonical XML would lose those specialized semantics. On the other hand, XML files that differ in their use of upper- vs. lower-case, or that use archaic versus modern spelling, and so on, might be considered equivalent for certain purposes. Such contexts are beyond the scope of Canonical XML.
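Both directions of this limitation can be seen in a small experiment. The sketch below assumes Python's `xml.etree.ElementTree.canonicalize` and uses made-up one-element documents: the choice between hexadecimal and decimal character references disappears on canonicalization, while a difference in letter case is preserved.

```python
from xml.etree.ElementTree import canonicalize

# The same character written as a hexadecimal and as a decimal reference.
hex_ref = "<msg>&#x41;</msg>"
dec_ref = "<msg>&#65;</msg>"

# Any information hidden in the choice of notation is lost.
refs_collapse = canonicalize(hex_ref) == canonicalize(dec_ref)

# Documents differing only in letter case remain distinct, even if an
# application would treat them as equivalent.
upper = "<msg>A</msg>"
lower = "<msg>a</msg>"
case_preserved = canonicalize(upper) != canonicalize(lower)

print(refs_collapse, case_preserved)
```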