Analyzed Layout and Text Object

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Addbot (talk | contribs) at 10:17, 22 March 2013 (Bot: Migrating 1 interwiki links, now provided by Wikidata on d:q2819247). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

ALTO (Analyzed Layout and Text Object) is an open XML standard for describing the OCR text and layout information of printed documents. It is often used together with the METS standard.

Structure

An ALTO file consists of three major sections as children of the root <alto> element:[1]

  • The <Description> section contains metadata about the ALTO file itself and processing information describing how the file was created.
  • The <Styles> section contains the text and paragraph styles with their individual descriptions:
    • <TextStyle> holds font descriptions
    • <ParagraphStyle> holds paragraph descriptions, e.g. alignment information
  • The <Layout> section contains the content information. It is subdivided into <Page> elements.

   <?xml version="1.0"?>
   <alto>
     <Description>
       <MeasurementUnit/>
       <sourceImageInformation/>
       <Processing/>
     </Description>
     <Styles>
       <TextStyle/>
       <ParagraphStyle/>
     </Styles>
     <Layout>
       <Page>
         <TopMargin/>
         <LeftMargin/>
         <RightMargin/>
         <BottomMargin/>
         <PrintSpace/>
       </Page>
     </Layout>
   </alto>
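
The three-section structure above can be inspected programmatically. The following is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. The names ALTO_SKELETON, summarize_alto and page_regions are hypothetical helpers introduced here for illustration, and the input is the attribute-free skeleton shown above rather than a real OCR output. Real ALTO files typically declare an XML namespace on the root element, in which case the tag names would need to be namespace-qualified; this sketch assumes no namespace.

```python
import xml.etree.ElementTree as ET

# Skeleton ALTO document mirroring the example above
# (assumption: no namespace and no attributes, as in the article's sample).
ALTO_SKELETON = """<?xml version="1.0"?>
<alto>
  <Description>
    <MeasurementUnit/>
    <sourceImageInformation/>
    <Processing/>
  </Description>
  <Styles>
    <TextStyle/>
    <ParagraphStyle/>
  </Styles>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace/>
    </Page>
  </Layout>
</alto>"""


def summarize_alto(xml_text):
    """Map each top-level section of <alto> to the tag names of its children."""
    root = ET.fromstring(xml_text)
    if root.tag != "alto":
        raise ValueError("expected <alto> root element, got <%s>" % root.tag)
    return {section.tag: [child.tag for child in section] for section in root}


def page_regions(xml_text):
    """Return, for every <Page> under <Layout>, the tag names of its regions."""
    root = ET.fromstring(xml_text)
    return [[region.tag for region in page]
            for page in root.findall("./Layout/Page")]


if __name__ == "__main__":
    for section, children in summarize_alto(ALTO_SKELETON).items():
        print(section, "->", ", ".join(children))
    for number, regions in enumerate(page_regions(ALTO_SKELETON), start=1):
        print("Page", number, "->", ", ".join(regions))
```

Running the script lists the children of <Description>, <Styles> and <Layout>, followed by the margin and print-space regions of each page. Both helpers re-parse the text for simplicity; a caller working with large files would more likely parse once and pass the root element around.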

References