Automatic indexing

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Chris Capoccia (talk | contribs) at 02:20, 4 December 2019. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Automatic indexing is the computerized process of scanning large volumes of documents against a controlled vocabulary, taxonomy, thesaurus or ontology and using those controlled terms to index large electronic document depositories quickly and effectively. As the number of documents grows exponentially with the proliferation of the Internet, automatic indexing will become essential to finding relevant information in a sea of irrelevant information. Put another way, automatic indexing analyzes an item to extract the information that will be permanently kept in an index, and that extraction depends on the selection of keywords.[1]
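The matching of documents against a controlled vocabulary can be illustrated with a minimal Python sketch. It is not taken from the cited sources: the vocabulary, the sample documents, and the function name `index_documents` are all hypothetical, and plain whole-word matching stands in for the much richer linguistic analysis a production indexer would need.

```python
import re
from collections import defaultdict

# Hypothetical controlled vocabulary: preferred term -> surface forms to look for.
VOCABULARY = {
    "information retrieval": ["information retrieval", "document retrieval"],
    "thesaurus": ["thesaurus", "thesauri"],
    "ontology": ["ontology", "ontologies"],
}


def index_documents(documents, vocabulary=VOCABULARY):
    """Map each preferred term to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        lowered = text.lower()
        for preferred, variants in vocabulary.items():
            for variant in variants:
                # Whole-word match, so "ontology" does not fire inside "deontology".
                if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
                    index[preferred].add(doc_id)
                    break  # one matching variant is enough for this term
    return dict(index)


# Hypothetical sample documents.
docs = {
    "d1": "Thesauri support document retrieval in large archives.",
    "d2": "An ontology describes concepts and their relations.",
}
index = index_documents(docs)
```

Here both "thesauri" and "document retrieval" in `d1` are recorded under their preferred terms, which is the point of indexing with controlled terms rather than with the raw words of the text.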

The automated process can encounter problems, which are primarily caused by two factors: 1) the complexity of the language; and 2) the computing technology's lack of intuitiveness and its difficulty in extrapolating concepts from statements.[2] These are primarily linguistic challenges, and specific problems involve semantic and syntactic aspects of language.[2] The problems show up when the system's output is compared against defined keywords. With these keywords, the accuracy of the system can be determined in terms of Hits, Misses, and Noise. A Hit is an exact match between the keywords chosen by the computer and by a human indexer; a Miss is a keyword that the computerized system missed but a human would have selected; Noise is a keyword that the computer selected but a human would not have. Taking human indexing as the 100% baseline, Hits should account for more than 85%, which leaves Misses and Noise combined at 15% or less.[1]
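The Hits, Misses, and Noise comparison can be sketched in a few lines of Python. The source does not spell out the exact formula, so this sketch assumes that the three categories together make up 100% and reports each as a share of that total; the function name `compare_indexing` and the sample keyword sets are hypothetical.

```python
def compare_indexing(human_terms, machine_terms):
    """Compare machine-assigned keywords with a human indexer's keywords."""
    human, machine = set(human_terms), set(machine_terms)
    hits = human & machine    # chosen by both the human and the machine
    misses = human - machine  # the human chose them, the machine did not
    noise = machine - human   # the machine chose them, the human did not
    total = len(hits) + len(misses) + len(noise)
    if total == 0:
        return {"hits": 0.0, "misses": 0.0, "noise": 0.0}
    # Assumption: each category is expressed as a share of all three combined.
    return {
        "hits": len(hits) / total,
        "misses": len(misses) / total,
        "noise": len(noise) / total,
    }


# Hypothetical keyword sets for a single document.
human = {"indexing", "thesaurus", "ontology", "taxonomy"}
machine = {"indexing", "thesaurus", "ontology", "internet"}
scores = compare_indexing(human, machine)
meets_target = scores["hits"] > 0.85
```

In this example three keywords are Hits, "taxonomy" is a Miss, and "internet" is Noise, so the Hit share is 3 out of 5 and the 85% target is not met.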

History

Some scholars note that automatic indexing attracted attention as early as the 1950s, particularly with the demand for faster and more comprehensive access to scientific and engineering literature.[3] This demand was heightened by the information explosion, which was predicted in the 1960s[4] and came about through the emergence of information technology and the World Wide Web. The phenomenon required indexing systems that could cope with storing and organizing vast amounts of data and could facilitate information access.[5][6] New electronic hardware further advanced automated indexing, since it overcame the barrier imposed by old paper archives and allowed information to be encoded at the molecular level.[4] With this new hardware, tools were developed to help users manage files. These tools fall into different categories, such as PDM suites like Outlook or Lotus Notes and mind mapping tools such as MindManager and FreeMind. They allow users to focus on storage and on building a cognitive model.[7] Automatic indexing is also partly driven by the emergence of computational linguistics, which steered research that eventually produced techniques such as the application of computer analysis to the structure and meaning of languages.[3][8] It has been further spurred by research and development in artificial intelligence and self-organizing systems, also referred to as thinking machines.[3]

References

  1. ^ a b Hlava, Marjorie M. (31 January 2005). "Automatic Indexing: A Matter of Degree". Bulletin of the American Society for Information Science and Technology. 29 (1): 12–15. doi:10.1002/bult.261.
  2. ^ a b Cleveland, Ana; Cleveland, Donald (2013). Introduction to Indexing and Abstracting: Fourth Edition. Santa Barbara, CA: ABC-CLIO. p. 289. ISBN 9781598849769.
  3. ^ a b c Riaz, Muhammad (1989). Advanced Indexing and Abstracting Practices. Delhi: Atlantic Publishers & Distributors. p. 263.
  4. ^ a b Torres-Moreno, Juan-Manuel (2014). Automatic Text Summarization. Hoboken, NJ: John Wiley & Sons. pp. xii. ISBN 9781848216686.
  5. ^ Kapetanios, Epaminondas; Sugumaran, Vijayan; Spiliopoulou, Myra (2008). Natural Language and Information Systems: 13th International Conference on Applications of Natural Language to Information Systems, NLDB 2008 London, UK, June 24-27, 2008, Proceedings. Berlin: Springer Science & Business Media. p. 350. ISBN 3540698574.
  6. ^ Basch, Reva (1996). Secrets of the Super Net Searchers: The Reflections, Revelations, and Hard-won Wisdom of 35 of the World's Top Internet Researchers. Medford, NJ: Information Today, Inc. p. 271. ISBN 0910965226.
  7. ^ Jayaweera, Y. D.; Johar, Md Gapar Md; Perera, S. N. "Open Journal Systems".
  8. ^ Armstrong, Susan (1994). Using Large Corpora. Cambridge, MA: MIT Press. p. 291. ISBN 0262510820.