Office Open XML
Office Open XML (commonly abbreviated as OOXML) is a file format specification for the storage of electronic documents such as memos, reports, books, spreadsheets, charts, presentations and word processing documents. The specification was developed by Microsoft for its Microsoft Office 2007 product suite and was standardized by Ecma International as Ecma 376 in December 2006.[1] OOXML is currently undergoing a standardization process within the international standards organizations ISO and IEC.
Office Open XML format uses a ZIP container for packaging XML and other data files.[2] Microsoft has stated that its primary goal was backward compatibility with existing documents and full support of the feature set of Microsoft Office.[3]
Microsoft has assured the European Union that the Office Open XML standard meets the European Union definitions of an Open Standard, meaning the specification is freely available and implementable by anyone.
File format and structure
The Office Open XML file is an Open Packaging Convention package containing the individual files that form the basis of the document. In addition to XML files with Office markup data, the ZIP package can include embedded (binary) files in formats such as PNG, BMP, AVI or PDF.
According to Microsoft, Office Open XML is backward compatible with Microsoft Office versions 2000, XP and 2003 when the Microsoft Office Compatibility Pack is used.[4][3]
Document markup languages
Office Open XML is a container format for several specialized XML-based document markup languages, roughly corresponding to individual applications within the Microsoft Office product line:
- WordprocessingML for word processing documents
- SpreadsheetML for spreadsheets
- PresentationML for presentations
- DataDiagramingML for technical diagrams
- FormTemplate for electronic forms
In terms of its schema, Office Open XML can be characterized as highly generic and highly systematic, with an emphasis on reducing load time. Currently, however, XML-based office documents still seem to be a lot slower than binary formats.[5] For speed, OOXML uses very short element names for common elements, and spreadsheets save dates as index numbers (starting from 1899 or from 1904). To be systematic and generic, OOXML typically uses separate child elements for data and metadata (element names ending in Pr for properties) rather than multiple attributes, which allows structured properties. OOXML does not use mixed content; instead it uses elements to put a series of text runs (element name r) into paragraphs (element name p). The result is terse and highly nested, in contrast to HTML, for example, which is fairly flat, designed for humans to write in text editors and more or less congenial for humans to read.
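The run-and-paragraph structure described above can be illustrated with a minimal sketch. The fragment is a hand-written sample in the style of WordprocessingML, not an excerpt from the specification. The namespace URI is the one used by the published Ecma 376 schemas, and the t (text), rPr and b elements are assumptions added for illustration, since the text above names only p, r and the Pr suffix.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace from the published Ecma 376 schemas
# (an assumption here; the article does not state it).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written sample: two paragraphs (p), each a series of runs (r).
# Run properties sit in a child element (rPr), not in attributes.
SAMPLE = f"""
<w:body xmlns:w="{W}">
  <w:p>
    <w:r><w:t>Hello, </w:t></w:r>
    <w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>
  </w:p>
  <w:p>
    <w:r><w:t>Second paragraph.</w:t></w:r>
  </w:p>
</w:body>
"""

def paragraphs(xml_text):
    """Return the plain text of each paragraph by joining its runs."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for p in root.iter(f"{{{W}}}p"):
        texts = [
            t.text or ""
            for r in p.findall(f"{{{W}}}r")
            for t in r.findall(f"{{{W}}}t")
        ]
        result.append("".join(texts))
    return result

print(paragraphs(SAMPLE))  # ['Hello, world', 'Second paragraph.']
```

Because text never appears directly inside a paragraph element, a reader only has to walk p, then r, then the text element, which is the nesting the schema trades against human readability.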
Container structure
Office Open XML files conform to the Open Packaging Convention and different applications have characteristic directory structures and file names within these packages. An OPC-aware application will use the relationships files rather than directory names and file names to locate individual files. In OPC terminology, a file is a part. A part also has accompanying metadata, in particular MIME metadata.
A basic Office Open XML file contains an XML file called [Content_Types].xml at the root level of the ZIP package, along with three folders: _rels, docProps, and a directory specific to the document type (for example, in a .docx word processing file this is the word directory). The word directory contains the document.xml file, which holds the core content of the document.
- [Content_Types].xml file
- This file describes the content of the ZIP package. It also contains a mapping for file extensions and overrides for specific URIs.
- _rels Folder
- The _rels folders are where one goes to find the relationships for any given part within the package. To find the relationships for a specific part, one looks for the _rels folder that is a sibling of one's part. If the part has relationships, the _rels folder will contain a file that has one's original part name with .rels appended to it. For example, if the content types part had any relationships, there would be a file called [Content_Types].xml.rels inside the _rels folder.
- _rels/.rels
- The root level _rels folder always contains a part called .rels. This URI (/_rels/.rels) and /[Content_Types].xml are the only two reserved URIs for parts in files that adhere to Office Open XML conventions. This is where the "package relationships" are located. Whenever one opens a file using these conventions, one always starts by going to the _rels/.rels file. All relationship files are represented with XML; opened in a text editor, such a file shows XML that outlines each relationship for that part. In a minimal Word document containing only the basic document.xml, the top level parts are two metadata parts and the document.xml part.
- word/document.xml
- This is the main part for any Word document. If one views it in an XML editor, one will see a pretty basic XML file. The body of the word processing document is contained in this part.
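The layout described in this list can be sketched with Python's standard zipfile module. The example builds a small package in memory rather than reading a real Office file. The part bodies are placeholders, not valid OOXML markup, and the docProps/core.xml name is an assumption used only to populate the docProps folder.

```python
import io
import zipfile

def build_sample_package():
    """Create an in-memory ZIP laid out like the minimal .docx described above.

    The part bodies are placeholders, not valid OOXML markup.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", "<Types/>")
        z.writestr("_rels/.rels", "<Relationships/>")
        z.writestr("docProps/core.xml", "<coreProperties/>")  # assumed part name
        z.writestr("word/document.xml", "<document/>")
    buf.seek(0)
    return buf

def top_level_entries(package):
    """List the root-level files and folders of a package (path or file object)."""
    with zipfile.ZipFile(package) as z:
        return sorted({name.split("/")[0] for name in z.namelist()})

print(top_level_entries(build_sample_package()))
# ['[Content_Types].xml', '_rels', 'docProps', 'word']
```

With a file on disk, the same function could be called with a path, for example top_level_entries("report.docx"), where report.docx is a hypothetical file name. An OPC-aware reader would still locate parts through the relationships files rather than through these directory names.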
Relationships
Relationship files in Office Open XML
An example relationship file in Office Open XML (for example word/_rels/document.xml.rels):
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
  <Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2006/relationships/image" Target="http://en.wikipedia.org/images/wiki-en.png" TargetMode="External" />
  <Relationship Id="rId2" Type="http://schemas.microsoft.com/office/2006/relationships/hyperlink" Target="http://www.wikipedia.org" TargetMode="External" />
</Relationships>
Relationship files allow navigation of the package without having to open up each part. For example, images that are referenced in a word processing document can be found in the relationship file by looking for all relationships that are of type http://schemas.microsoft.com/office/2006/relationships/image. To point to a different image, one just edits the relationship.
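As a minimal sketch of this kind of lookup, the following Python code filters a relationships file by type using the standard library. The XML and its namespace URIs are copied from the example relationship file in this section. The published Ecma 376 schemas are understood to use different URIs (under schemas.openxmlformats.org), so the constants would need adjusting for real files. The function name targets_of_type is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace and type URIs as they appear in the article's sample file.
NS = "http://schemas.microsoft.com/package/2005/06/relationships"
IMAGE_TYPE = "http://schemas.microsoft.com/office/2006/relationships/image"

RELS_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
  <Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2006/relationships/image" Target="http://en.wikipedia.org/images/wiki-en.png" TargetMode="External" />
  <Relationship Id="rId2" Type="http://schemas.microsoft.com/office/2006/relationships/hyperlink" Target="http://www.wikipedia.org" TargetMode="External" />
</Relationships>"""

def targets_of_type(rels_xml, rel_type):
    """Return the Target of every relationship whose Type matches rel_type."""
    root = ET.fromstring(rels_xml.encode("utf-8"))
    return [
        rel.get("Target")
        for rel in root.findall(f"{{{NS}}}Relationship")
        if rel.get("Type") == rel_type
    ]

print(targets_of_type(RELS_XML, IMAGE_TYPE))
# ['http://en.wikipedia.org/images/wiki-en.png']
```

Only the relationships file is parsed; the document part itself is never opened, which is the point made above.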
Hyperlink relations
The following code shows an example of inline markup for a hyperlink:
<w:hyperlink w:rel="rId2" w:history="1">
In this example, the URL is represented by "rId2". The actual URL is located by the corresponding "rId2" item in the accompanying relationships file. Linked images, templates, and other items are referenced in the same way. The locations of referenced items can be updated by editing the relationships file.
Embedded or linked media file relations
Pictures can be embedded or linked in the XML files using a tag:
<v:imagedata w:rel="rId1" o:title="example" />
This is the reference to the image file. In Office Open XML, all references are done via relationships. For example, a document.xml part has a relationship to the image part. The actual URI is located by the corresponding "rId1" item in the accompanying relationships file. There is a _rels folder in the ZIP package, in the same directory as document.xml. Inside _rels is a file called document.xml.rels. In this file there will be a relationship definition that contains a type, an ID and a location. The ID is the referenced ID used in the XML document. The type will be a reference schema definition for the media type, and the location will be an internal location within the ZIP package or an external location defined with a URL.
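The lookup just described (find the sibling _rels folder, open the .rels file, match the ID) can be sketched as follows. This is an illustration under stated assumptions: the package is built in memory, the media/image1.png target is a made-up internal location, the namespace is the one from the sample relationships file above, and treating a missing TargetMode as "Internal" is a choice made for this sketch.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# Namespace as it appears in the article's sample relationships file.
NS = "http://schemas.microsoft.com/package/2005/06/relationships"

RELS_XML = f"""<Relationships xmlns="{NS}">
  <Relationship Id="rId1"
      Type="http://schemas.microsoft.com/office/2006/relationships/image"
      Target="media/image1.png" />
</Relationships>"""

def rels_path_for(part_name):
    """Relationships part for a part: sibling _rels folder, part name plus .rels."""
    directory, name = posixpath.split(part_name.lstrip("/"))
    return posixpath.join(directory, "_rels", name + ".rels")

def resolve(package, part_name, rel_id):
    """Return (target, target_mode) for rel_id as referenced from part_name."""
    with zipfile.ZipFile(package) as z:
        root = ET.fromstring(z.read(rels_path_for(part_name)))
    for rel in root.iter(f"{{{NS}}}Relationship"):
        if rel.get("Id") == rel_id:
            # Missing TargetMode is treated as "Internal" (assumption of this sketch).
            return rel.get("Target"), rel.get("TargetMode", "Internal")
    return None

def sample_package():
    """Build a tiny in-memory package with one part and its relationships file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", "<document/>")  # placeholder body
        z.writestr("word/_rels/document.xml.rels", RELS_XML)
    buf.seek(0)
    return buf

print(rels_path_for("word/document.xml"))  # word/_rels/document.xml.rels
print(resolve(sample_package(), "word/document.xml", "rId1"))
# ('media/image1.png', 'Internal')
```

Applied to the package root, rels_path_for("/") yields _rels/.rels, which matches the reserved package relationships part described under Container structure.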
Standardization
Microsoft stated that Office Open XML would be an open standard, and submitted it to the Ecma standardization process. On 2005-12-08 Ecma created Technical Committee 45 (TC45); the press release issued by Ecma the following day stated that TC45 was formed to "produce a formal standard for office productivity applications that is fully compatible with the Office Open XML Formats, submitted by Microsoft". The proposal was co-sponsored by Apple Inc., Barclays Capital, BP, the British Library, Essilor, Intel, Microsoft, NextPage, Statoil ASA and Toshiba.[6]
The TC45 committee is co-chaired by two Microsoft employees;[7] it also includes members from Apple, Canon, Intel, NextPage, Novell, Pioneer, Statoil ASA, Toshiba and the United States Library of Congress.[1]
At the General Assembly meeting on 2006-12-07, Ecma International approved Office Open XML as an Ecma standard (Ecma 376).[1] The General Assembly also approved submitting the standard for adoption under the ISO/IEC JTC 1 process.
A full copy of Ecma 376, or a copy split into parts, can be downloaded from Ecma International.
As an ISO external Category A liaison, Ecma have submitted Ecma 376 to the ISO Fast Track process, the same process available to National Standard Organisations. To meet the requirements of this process[8] Ecma have submitted the documents, "Explanatory report on Office Open XML Standard (Ecma-376) submitted to JTC 1 for fast-track"[9] and "Licensing conditions that Microsoft offers for Office Open XML".[10]
The fast track process allows a 30-day review period by national standardizing bodies (NBs). During this period NBs may identify to the JTC 1 Secretariat any perceived contradiction with other JTC 1, ISO or IEC standards. If such a contradiction is alleged, "the JTC 1 Secretariat and ITTF shall make a best effort to resolve the matter".[8] A package of materials distributed by JTC 1 on Wednesday, February 28, 2007 indicates that six nations may have lodged their formal disapproval of fast track consideration, while another five may have expressed concerns but not objections.[11] The full text of the national bodies' submissions is available from the ISO/IEC JTC1 SC32 website[1] at http://jtc1sc32.org/doc/recent/JTC001-N-8530.zip
Ecma responded to the issues raised in the contradiction period in a review of the national bodies' comments.[12] However, the JTC 1 directives [8] state that regardless of whether or not resolution is reached on the question of contradiction, a five-month ballot commences immediately. So, on 2007-04-02 the ISO JTC 1 Secretariat duly informed Ecma International that the five-month DIS 29500 (Office Open XML) ballot period had started and would end on 2007-09-02.[13]
At the end of the five-month ballot period, national standards bodies have the chance to raise issues at a meeting of a specially-convened "ballot resolution group", whose members are representatives of the national bodies. JTC 1 states that any national body which has voted "no" during the 5-month ballot has a duty to delegate a representative to this meeting. (JTC 1 Directives [8] clause 13.7). During this meeting, points of contention may be resolved by agreeing to alteration of the text, by Ecma providing satisfactory explanatory comments, by withdrawal of an objection, or by otherwise reaching agreement. JTC 1 states that decisions should be reached preferably by consensus, but that any unavoidable votes should be taken according to normal JTC 1 procedures (JTC 1 Directives [8] clause 13.8). According to these procedures a vote is passed if:
- At least two-thirds of the P-members voting shall have approved
- Not more than one-quarter of the total number of votes cast are negative
Abstentions are excluded from the count. (JTC 1 Directives [8] clause 9.6)
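The two approval criteria can be written out as a small check. This is a sketch of the arithmetic only, not of the JTC 1 procedure as a whole. It reads "total number of votes cast" as covering all voting national bodies, which is an interpretation, and the tallies in the usage lines are hypothetical numbers chosen to exercise each criterion.

```python
from fractions import Fraction

def ballot_passes(p_yes, p_no, total_yes, total_no):
    """Apply the two criteria above; abstentions are simply left out of the counts.

    p_yes / p_no: approvals and disapprovals among P-members that voted.
    total_yes / total_no: approvals and disapprovals among all votes cast.
    """
    p_votes = p_yes + p_no
    total_votes = total_yes + total_no
    if p_votes == 0 or total_votes == 0:
        return False  # nothing to count; treated as not approved in this sketch
    two_thirds_ok = Fraction(p_yes, p_votes) >= Fraction(2, 3)
    quarter_ok = Fraction(total_no, total_votes) <= Fraction(1, 4)
    return two_thirds_ok and quarter_ok

# Hypothetical tallies, not actual ballot results.
print(ballot_passes(20, 10, 60, 18))  # True: exactly 2/3 approve, 18/78 negative
print(ballot_passes(19, 11, 60, 18))  # False: fewer than 2/3 of P-members approve
print(ballot_passes(20, 10, 60, 21))  # False: 21/81 negative exceeds 1/4
```

Exact fractions are used so that the boundary cases (exactly two-thirds, exactly one-quarter) are not affected by floating-point rounding.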
If this meeting fails to agree a final text, the proposal of OOXML for fast-tracking fails and the procedure is terminated: if the meeting does agree a text, any required changes are applied by the editor and OOXML is passed for publication as an ISO standard.
Licensing
The Office Open XML format was initially made available under a free and perpetual license[14].
As there was concern that free and open source software (FOSS) could not use the format under the proposed license[15], Microsoft provided a covenant not to sue.[16] The covenant received a mixed reception, with some in the FOSS community identifying problems[17] and others (such as Lawrence Rosen) endorsing it.[18]
Microsoft also added the Office Open XML format to its Microsoft Open Specification Promise, in which Microsoft "irrevocably promises not to assert any Microsoft Necessary Claims against you for making, using, selling, offering for sale, importing or distributing any implementation to the extent it conforms to a Covered Specification ("Covered Implementation")". The Office Open XML 1.0 - Ecma 376 specification and its predecessor, the Office 2003 XML format, are among the covered specifications.[19]
The Office Open XML format can therefore be used under any one of three arrangements: the free and perpetual license, the covenant not to sue, or the Open Specification Promise.
In support of the licensing arrangements, Microsoft commissioned an analysis from the London legal firm Baker & McKenzie.[20]
Microsoft has assured the European Union that the specifications can be implemented by any interested party, including open-source developers, without additional obligations and/or costs.[21]
The Microsoft Open Specification Promise was included in documents submitted to ISO in support of the Ecma 376 fast track submission[10].
Microsoft states that its "Open Specification Promise" does not "apply to things that are merely referenced in the specification".[19] This means that several formats which Microsoft says are not needed to fully implement OOXML [citation needed], but which may still be embedded in OOXML, are not free of patent claims. During the ISO standards process, the format was criticized for including mentions of the proprietary Windows Metafile format in the specification. Ecma International responded that such formats were only listed as examples of what could be embedded. However, the objection raised by Kenya, identified as [KE13] in that Ecma response document [2], describes the various mentions of Microsoft proprietary sub-formats that are "merely referenced" by OOXML as "normative". In standards terminology, "normative" means "considered to be a prescriptive part of the standard". [3]
Adoption
Office Open XML is the default Office 2007 format if macros are not enabled. Microsoft has also released a compatibility pack for older versions.[22] Using the compatibility pack, users can create and edit Office Open XML files from within Office 2000, Office XP and Office 2003. The compatibility pack can also be used as a standalone converter in combination with Office 97.
There is not yet a converter for the Office Open XML format in Microsoft Office for Mac OS. However, on May 10, 2007, Microsoft's Mac OS BU released a standalone beta version of a converter to allow document format conversion. Mac OS BU developers had previously advised users of Office 2007 to save their files in the old Office binary format[23] until a file converter was released. Beta testing has started on Microsoft Office 2008 for Mac, which will support Office Open XML. The final version is scheduled for release in the second half of 2007.[24]
- Corel has announced that by mid-2007 its WordPerfect Office suite will support Office Open XML as well as OpenDocument.[25]
- Novell has created an Office Open XML plugin for OpenOffice.org; the plugin is released as open source software and will be submitted for inclusion into the OpenOffice.org project.[27]
- Maarten Balliauw has created a set of PHP classes to create SpreadsheetML markup language documents.[28]
- Panergy Ltd. has developed a converter from WordprocessingML markup language to Rich Text Format (RTF). The converter, called docXConverter, allows Word versions that are not supported by Microsoft's compatibility pack, e.g. Word 97, to open OOXML files containing WordprocessingML markup language. DocXConverter can be used to transfer WordprocessingML data to other applications that read RTF data.[29]
- Wouter van Vugt has developed a package explorer that allows you to edit XML parts and validate parts against the Ecma schemas.[30]
- Apple Inc.'s TextEdit will support Office Open XML in the next version of OS X, Leopard.[31]
- Datawatch supports Office Open XML spreadsheets in its report mining tool Monarch v9.0.[32]
Criticism
The Office Open XML standard has been the subject of wide and varied debate in the computing industry, with contributions from members of the free software movement, industry analysts[33], independent software vendors, and Microsoft's competitors Sun Microsystems and IBM, many of whom support the OpenDocument format instead.
The essential premise behind some of this criticism, apart from several technical issues, is that Microsoft is attempting to achieve ISO standardization of a proprietary format in order to prevent the widespread adoption of the OpenDocument format, which could threaten the dominance of Microsoft's own Office suite.
Voiced criticisms include:
- Duplication of, overlap with, and inability to merge with the OpenDocument Format. Objectors complain that user confusion regarding the two standards would be even greater because of the similarity of the "Office Open XML" name to both "OpenDocument" and "OpenOffice".[34]
- At 6,000 pages, the specification is too large to evaluate in the 30-day contradiction-only review and the five-month ballot period.[35]
- Reliance on application-defined behaviors to support important functionality that should be documented or supported via existing standards. For example, book 4 section 6.1.2.19 defines the "equationxml" attribute of "shape" elements, "used to rehydrate an equation using the Office Open XML Math syntax"; however, the "actual format of the contents of this attribute are application-defined".[34]
- A serial date format, not in ISO 8601, is used in spreadsheet cells. The format incorrectly treats 1900 as a leap year in order to remain backwards compatible with previous versions of Excel, which also reproduced a bug introduced by the once-dominant Lotus 1-2-3.[36]
- Use of DrawingML and VML instead of SVG, and of a new mathematical format instead of MathML. MathML and SVG are W3C recommendations.
- Internal inconsistencies and omissions. For example, book 4 section 2.18.4 lists styles such as "apples", "scaredCat", and "heebieJeebies", but does not fully define these styles. Missing properties include height, width, color depth, and orientation.[34]
- Inconsistent notations for percentage units. In book 4, section 2.18.85 uses predefined symbols (like "pct15" for 15%) in 5 or 2.5 percent increments, section 2.15.1.95 uses a decimal number giving the percentage, section 2.18.97 uses a number in fiftieths of a percent, and section 5.1.12.41 uses a number in thousandths of a percent.[34]
- Inflexible numbering format. For example, book 4 section 2.18.66 describes a numbering format that is fixed to a few countries and contradicts both the W3C XSLT recommendation and Unicode ISO 10646 standard.[34]
- Non-standard, inflexible paper size naming. For example, book 4 sections 3.3.1.61 define a "paperSize" attribute for which values 1 through 68 are predefined standard paper sizes such as A4 paper.[34]
- Non-standard language codes and color names.[34]
- Non-extensible bitmasks, despite the fact that many element attributes are defined as bitmasks. For example, book 4 section 2.8.2.16 "sig (Supported Unicode Subranges and Code Pages)" describes the <w:sig> element, the attributes of which are all bitmasks.[34]
- Legacy document rendering compatibility is identified using (deprecated) tags, for example "autoSpaceLikeWord95" (book 4 section 2.15.3.6), "lineWrapLikeWord6" (book 4 section 2.15.3.31), and "suppressTopSpacingWP" for a 16-year-old version of WordPerfect. These correspond to the options in the "Compatibility" tab of Word.[34]
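The serial date criticism in the list above can be made concrete with a short sketch. It assumes the conventions commonly documented for Excel-compatible spreadsheets: in the 1900-based system serial 1 is 1900-01-01 and serial 60 stands for the non-existent 1900-02-29, while in the 1904-based system serial 0 is 1904-01-01. The function and parameter names are illustrative, not taken from the specification, and time-of-day fractions are ignored.

```python
from datetime import date, timedelta

def serial_to_date(serial, date1904=False):
    """Convert a whole-number spreadsheet serial to a calendar date.

    Assumes Excel-compatible conventions (see the lead-in above).
    """
    if date1904:
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial == 60:
        # The 1900 system counts a 29 February 1900 that never existed.
        raise ValueError("serial 60 is the fictitious 1900-02-29")
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    # From serial 61 on, every value is one higher than a true day count,
    # so the effective base shifts back by one day.
    return date(1899, 12, 30) + timedelta(days=serial)

print(serial_to_date(59))                    # 1900-02-28
print(serial_to_date(61))                    # 1900-03-01
print(serial_to_date(39448))                 # 2008-01-01
print(serial_to_date(37986, date1904=True))  # 2008-01-01
```

The jump from serial 59 to serial 61 for consecutive real dates is the leap-year quirk; an ISO 8601 date string would not need this special case.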
Ecma International has responded to these criticisms in an answer to the national bodies represented in the ISO/IEC JTC 1 29500 standardization process. [37] This response document from Ecma International also quotes the objections raised by the national bodies, and shows them to have drawn heavily from existing material on the Web created by opponents of Office Open XML, particularly from the Grokdoc[34] site. This has raised suspicions among some commentators that the national bodies' documents were not written by them, but by Microsoft competitors, and particularly by IBM [38]. Critics of the format are not satisfied by the Ecma response[39].
The standardisation process has continued into its five-month ballot period, during which, or at the end of which, any raised objections can be addressed.[40]
References
- ^ a b c "Ecma International approves Office Open XML standard" (Press release). Ecma International. 2006-12-07. Retrieved 2006-12-08.
- ^ Tom Ngo (2006-12-11). "Office Open XML Overview" (PDF). Ecma International. p. 6. Retrieved 2007-01-23.
- ^ a b "Q&A: Microsoft Co-Sponsors Submission of Office Open XML Document Formats to Ecma International for Standardization". Microsoft PressPass (Press release). Microsoft. 2005-11-21. Retrieved 2007-01-23.
- ^ "How to use earlier versions of Excel, PowerPoint, and Word to open and save files from 2007 Office programs". Microsoft. Retrieved 2007-02-09.
- ^ George Ou (2007-04-27). "MS Office 2007 versus Open Office 2.2 shootout". ZDnet.com. Retrieved 2007-04-27.
- ^ "The new open standard safeguards the continued use of billions of existing documents". Ecma International. Retrieved 2007-01-28.
- ^ "TC45 - Office Open XML Formats". Ecma International. Retrieved 2007-02-08.
- ^ a b c d e f "ISO/IEC JTC 1 Directives, 5th Edition, Version 2.0". iso. Retrieved 2007-01-28.
- ^ Explanatory report on Office Open XML Standard (Ecma-376) submitted to JTC 1 for fast-track.
- ^ a b Licensing conditions that Microsoft offers for Office Open XML
- ^ Scott M. Fulton, III (2007-03-12). "ISO to Fast-Track Office Open XML Process". Betaworld.
- ^ "Response Document: National Body Comments from 30-Day Review of the Fast Track Ballot for ISO/IEC DIS 29500 (ECMA-376) "Office Open XML File Formats"" (PDF). Ecma International. 2007-02-28. Retrieved 2007-04-03.
- ^ "Office Open XML reaches next step in ISO/IEC process". Ecma International. 2007-04-02. Retrieved 2007-04-03.
- ^ Paoli, Jean. "Clarification of License Terms for Office XML Schema". Microsoft. Retrieved 2007-01-23.
- ^ "Open XML Incompatible With GPL". eweek. Retrieved 2007-01-29.
- ^ "Microsoft Covenant Regarding Office 2003 XML Reference Schemas". Microsoft. Retrieved 2006-07-11.
- ^ "2 Escape Hatches in MS's Covenant Not to Sue". Groklaw. Retrieved 2007-01-29.
- ^ Berlind, David (2005-11-28). "Top open source lawyer blesses new terms on Microsoft's XML file format". ZDNet. Retrieved 2007-01-27.
- ^ a b "Microsoft Open Specification Promise". Microsoft. 2006-09-12. Retrieved 2007-04-22.
- ^ Baker & McKenzie (2006). "Standardisation and Licensing of Microsoft's Office Open XML Reference Schema" (PDF). Baker & McKenzie. Retrieved 2007-02-01.
- ^ Pan-European eGovernment Services Committee (2006-12-06). "Conclusions and recommendations on Open Document Formats (see §3.3)". IDABC Expert Group (European Union). Retrieved 2006-02-13.
- ^ "Microsoft Office Compatibility Pack for Word, Excel, and PowerPoint 2007 File Formats". Microsoft. 2006-11-06. Retrieved 2007-11-18.
- ^ sherjo (2006-12-06). "Converters Coming! Free and (Fairly) Fast". The Office for Mac Team Blog.
- ^ Ina Fried (2007-03-30). "Microsoft starts testing Office 2008 for Mac". cnet.com. Retrieved 2007-04-02.
- ^ "Corel WordPerfect Office To Support Open Document Format and Microsoft Office Open XML". corel. Retrieved 2007-01-30.
- ^ "GNOME Office / Gnumeric". GNOME.org. Retrieved 2006-07-28.
- ^ "Download OpenOffice.org–OpenXML translator". Novell. Retrieved 2007-03-02.
- ^ "Office 2007 SpreadsheetML classes in PHP". Maarten Balliauw. Retrieved 2007-02-01.
- ^ "docXConverter - Features". panergy. Retrieved 2007-01-31.
- ^ "Package Explorer V2.0". Wouter van Vugt. Retrieved 2007-01-31.
- ^ "OS X leopard Text Edit to Support Office 2007?". uneasysilence.
- ^ "Datawatch Announces Availability of Monarch V.9.0; Supports Microsoft® Windows Vista™ and Extends Excel Capabilities". 2007-02-27.
- ^ David Berlind (2006-12-20). "Most contrived tech awards" (HTML). ZDNet. p. 1. Retrieved 2007-02-05.
- ^ a b c d e f g h i j "EOOXML objections". grokdoc. Retrieved 2007-01-02.
- ^ "Six thousand pages, one month, no chance..." Retrieved 2007-02-03.
- ^ Spolsky, Joel (2006-06-16). "My First BillG Review". Joel on Software. Retrieved 2007-01-31.
- ^ Ecma International (2007-03-02). "Response document - National Body Comments from 30-Day Review of the Fast Track Ballot for ISO/IEC DIS 29500 (ECMA-376) "Office Open XML File Formats"" (PDF). computerworld.com.
- ^ Brian Jones. "A few updates on the OpenXML formats". Retrieved 2007-05-04.
- ^ Edward Macnaghten (2007-03-03). "When is a standard not a standard?". Free Software Magazine.
- ^ Eric Lai (2007-03-12). "Microsoft guns Open XML onto ISO fast track: Agreement over the weekend could lead to vote by August". computerworld.com.
See also
- List of document markup languages
- Comparison of document markup languages
- Microsoft Office 2003 XML formats
- Comparison of OpenDocument and Office Open XML formats
- Comparison of OpenDocument and Office Open XML licensing
External links
General Office Open XML
- OpenXMLDeveloper.org, Microsoft's Office Open XML site for developers
- Open XML Community site Office Open XML for Microsoft customers and partners
- "MS Fights to Own Your Office Docs", Wired article on Office Open XML
- ExcelPackage, Open source, server-side creation of Excel 2007 files (SpreadsheetML)
OOXML criticism
- EOOXML objections on Grokdoc
- Groklaw on OOXML
- "How to hire Guillaume Portes" Redux
- FFII opposes Fasttrack adoption of Microsoft OOXML format as ISO standard
Converters
- Open source Office Open XML to ODF translator add-in for Microsoft Office XP, 2003 and 2007
- docXConverter from Panergy
- docx2doc Convert from docx to doc online
- Open XML Writer, an open source text editor for creating Office Open XML word processing files (.docx)
- Microsoft Office Compatibility Pack Microsoft's official converter for Office 2000, XP and 2003