Document processing
Document processing is a field of research and a set of production processes aimed at making an analog document digital. Document processing does not simply aim to photograph or scan a document to obtain a digital image, but also to make it digitally intelligible. This includes extracting the structure of the document or the layout and then the content, which can take the form of text or images. The process can involve traditional computer vision algorithms, convolutional neural networks or manual labor. The problems addressed are related to semantic segmentation, object detection, optical character recognition (OCR), handwritten text recognition (HTR) and, more broadly, transcription, whether automatic or not.[1] The term can also include the phase of digitizing the document using a scanner and the phase of interpreting the document, for example using natural language processing (NLP) or image classification technologies. It is applied in many industrial and scientific fields for the optimization of administrative processes, mail processing and the digitization of analog archives and historical documents.
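As an illustration of the traditional computer vision side of layout extraction, the sketch below splits a binarized page into text lines with a horizontal projection profile. This is only one classical technique among many, not a method prescribed by the sources cited here. The function name `segment_text_lines`, the `min_ink` threshold and the toy page are assumptions made for the example.

```python
from typing import List, Tuple


def segment_text_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) pairs for each band of rows containing ink.

    `image` is a binarized page: 1 marks an ink pixel, 0 marks background.
    `min_ink` is a hypothetical noise threshold (ink pixels per row).
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins on this row
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))   # the line ended on the previous row
            start = None
    if start is not None:                  # a line that touches the bottom edge
        lines.append((start, len(profile) - 1))
    return lines


# Toy 6x6 "page" with two bands of ink separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
]
print(segment_text_lines(page))  # [(1, 2), (5, 5)]
```

Each detected row band could then be passed to an OCR or HTR step. Skewed or multi-column pages would need additional handling that this sketch does not attempt.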
Background
Document processing was initially, and to some extent still is, a kind of production-line work dealing with the treatment of documents, such as letters and parcels, with the aim of sorting them or extracting data from them, often at large scale. This work could be performed in-house or through business process outsourcing.[2][3] Document processing can indeed involve some kind of externalized manual labor, such as mechanical turk.
As an example of manual document processing, as recently as 2007,[4] the processing of "millions of visa and citizenship applications" relied on "approximately 1,000 contract workers" working to "manage mailroom and data entry."
While document processing involved data entry via keyboard well before the use of a computer mouse or a computer scanner, a 1990 New York Times article regarding what it called the "paperless office" stated that "document processing begins with the scanner."[5] In this context, a former Xerox vice president, Paul Strassman, expressed a critical opinion, saying that computers add to rather than reduce the volume of paper in an office.[5] It was said that the engineering and maintenance documents for an airplane weigh "more than the airplane itself."[citation needed]
Automatic document processing
As the state of the art advanced, document processing transitioned to handling "document components ... as database entities."[6]
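A minimal sketch of what treating document components as database entities could look like, using Python's built-in `sqlite3` module. The `component` table, its columns and the sample rows are hypothetical and chosen only for illustration. They are not taken from the cited article.

```python
import sqlite3

# In-memory database; the schema below is an assumption made for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE component (
        id          INTEGER PRIMARY KEY,
        document_id TEXT    NOT NULL,
        kind        TEXT    NOT NULL,   -- e.g. heading, paragraph, signature
        position    INTEGER NOT NULL,   -- reading order within the document
        content     TEXT    NOT NULL
    )
    """
)

# Each part of a scanned letter becomes its own row instead of one opaque blob.
components = [
    ("letter-001", "heading", 0, "Visa application"),
    ("letter-001", "paragraph", 1, "I am writing to apply for a visa."),
    ("letter-001", "signature", 2, "J. Doe"),
    ("letter-002", "heading", 0, "Change of address"),
]
conn.executemany(
    "INSERT INTO component (document_id, kind, position, content) VALUES (?, ?, ?, ?)",
    components,
)


def components_of_kind(connection: sqlite3.Connection, kind: str):
    """Return (document_id, content) for every stored component of one kind."""
    cursor = connection.execute(
        "SELECT document_id, content FROM component "
        "WHERE kind = ? ORDER BY document_id, position",
        (kind,),
    )
    return cursor.fetchall()


print(components_of_kind(conn, "heading"))
```

Once components are stored this way, they can be queried individually, for example to retrieve every heading across a collection without reopening the scanned images.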
A technology called automatic document processing, or sometimes intelligent document processing, emerged as a specific form of intelligent process automation (IPA), combining artificial intelligence techniques such as machine learning (ML), natural language processing (NLP) or intelligent character recognition (ICR) to extract data from several types of documents.[7][8]
Application
Automatic document processing applies to a whole range of documents, whether structured or not. For instance, in the world of business and finance, technologies may be used to process paper-based invoices, forms, purchase orders, contracts, and currency bills.[9]
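As a rough illustration of data extraction from an invoice, the sketch below pulls a few fields out of text that is assumed to have already been produced by an OCR step. The field names, the regular expressions and the sample text are all hypothetical. Production systems typically combine rules of this kind with layout analysis or learned models.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for a simple English-language invoice layout.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"\bInvoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"\bDate\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    # \b keeps "Subtotal" from being mistaken for "Total".
    "total": re.compile(
        r"\bTotal\b\s*(?:due)?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_invoice_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when a field is absent."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# Sample text standing in for OCR output of a scanned invoice.
ocr_text = """
ACME Supplies Ltd.
Invoice No: INV-2041
Date: 2021-01-31
Subtotal: 1,150.00
Total due: $1,200.50
"""

fields = extract_invoice_fields(ocr_text)
print(fields)
```

Fields that cannot be found are returned as `None`, so that a downstream step can route the document to manual data entry instead of storing a guessed value.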
In medicine, document processing methods have been developed to facilitate patient follow-up and streamline administrative procedures, in particular by digitizing medical or laboratory analysis reports. The goal is also to standardize medical databases.[10] Algorithms are also used directly to assist physicians in medical diagnosis, e.g. by analyzing magnetic resonance images.[11]
See also
- Document automation
- Document modelling
- Data processing
- Document imaging
- Duplex scanning
- Text mining
- Workflow
References
- ^ Len Asprey; Michael Middleton (2003). Integrative Document & Content Management: Strategies for Exploiting Enterprise Knowledge. Idea Group Inc (IGI). ISBN 9781591400554.
- ^ Vinod V. Sople (2009-05-25). Business Process Outsourcing: A Supply Chain of Expertises. PHI Learning Pvt. Ltd. ISBN 978-8120338159.
- ^ Mark Kobayashi-Hillary (2005-12-05). Outsourcing to India: The Offshore Advantage. Springer Science & Business Media. ISBN 9783540247944.
- ^ Julia Preston (December 2, 2007). "Immigration Contractor Trims Wages". The New York Times.
- ^ a b Lawrence M. Fisher (July 7, 1990). "Paper, Once Written Off, Keeps a Place in the Office". The New York Times.
- ^ Al Young; Dayle Woolstein; Jay Johnson (February 1996). "Unknown Title". Object Magazine. p. 51.
- ^ Floriana Esposito; Stefano Ferilli; Teresa M. A. Basile; Nicola Di Mauro (2005-04-07). "Intelligent Document Processing" (PDF). Department of Computer Science – University of Bari. Retrieved 2018-09-08.
- ^ Floriana Esposito; Stefano Ferilli; Teresa M. A. Basile; Nicola Di Mauro (2005-04-01). "Intelligent Document Processing". Proceedings, Eighth International Conference on Document Analysis and Recognition, Seoul, South Korea, 2005. pp. 1100–1104. doi:10.1109/ICDAR.2005.144.
- ^ US active US7873576B2, John E. Jones; William J. Jones & Frank M. Csultis, "Financial document processing system", published 2011-01-18, issued 2011-01-18
- ^ Adamo, Francesco; Attivissimo, Filippo; Di Nisio, Attilio; Spadavecchia, Maurizio (February 2015). "An automatic document processing system for medical data extraction". Measurement. 61: 88–99. doi:10.1016/j.measurement.2014.10.032. Retrieved 31 January 2021.
- ^ Changwan, Kim; Seong-Il, Lee; Won Joon, Cho (September 2020). "Volumetric assessment of extrusion in medial meniscus posterior root tears through semi-automatic segmentation on 3-tesla magnetic resonance images". Orthopaedics & Traumatology: Surgery & Research. 101 (5): 963–968. doi:10.1016/j.rcot.2020.06.003. Retrieved 31 January 2021.