Analyzed Layout and Text Object

From Wikipedia, the free encyclopedia
This is an old revision of this page, last edited on 26 November 2015 at 22:51 by 50.126.125.240 (talk) (Metadata Encoding and Transmission Standard). It may differ significantly from the current revision.

ALTO (Analyzed Layout and Text Object) is an open XML Schema for describing OCR text and layout information. It was originally developed by the EU-funded METAe project and is maintained by the Library of Congress. It is often used together with the Metadata Encoding and Transmission Standard (METS).

Structure

An ALTO file consists of three major sections as children of the root <alto> element:[1]

  • The <Description> section contains metadata about the ALTO file itself and processing information describing how the file was created.
  • The <Styles> section contains the text and paragraph styles with their individual descriptions:
    • <TextStyle> holds font descriptions
    • <ParagraphStyle> holds paragraph descriptions, e.g. alignment information
  • The <Layout> section contains the content information. It is subdivided into <Page> elements.
    <?xml version="1.0"?>
    <alto>
      <Description>
        <MeasurementUnit/>
        <sourceImageInformation/>
        <Processing/>
      </Description>
      <Styles>
        <TextStyle/>
        <ParagraphStyle/>
      </Styles>
      <Layout>
        <Page>
          <TopMargin/>
          <LeftMargin/>
          <RightMargin/>
          <BottomMargin/>
          <PrintSpace/>
        </Page>
      </Layout>
    </alto>
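
The three-section structure can be checked programmatically. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree; it is an illustration, not part of the ALTO specification. The names ALTO_SKELETON, local_name, and summarize_alto are hypothetical helpers introduced here, and the input is the namespace-free skeleton shown above. Real ALTO files normally declare an XML namespace, which the sketch strips before comparing element names.

```python
import xml.etree.ElementTree as ET

# The skeleton from the Structure section (no namespace declared, as above).
ALTO_SKELETON = """<?xml version="1.0"?>
<alto>
  <Description>
    <MeasurementUnit/>
    <sourceImageInformation/>
    <Processing/>
  </Description>
  <Styles>
    <TextStyle/>
    <ParagraphStyle/>
  </Styles>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace/>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop a leading '{namespace}' part, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_alto(xml_text):
    """Map each top-level section of an ALTO document to its child element names."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "alto":
        raise ValueError("root element is not <alto>")
    return {
        local_name(section.tag): [local_name(child.tag) for child in section]
        for section in root
    }


summary = summarize_alto(ALTO_SKELETON)
for section, children in summary.items():
    print(section, children)
```

Stripping the namespace keeps the sketch independent of any particular ALTO schema version; a stricter tool would instead validate the file against the published schema.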

References

  1. Structure of ALTO Files