Machine-generated data

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Catch22uofi at 22:50, 20 December 2010.

Machine-generated data (MGD) is the generic term for information that is automatically created by a computer process, application, or other machine without human intervention. Although machine-generated data can be triggered by a human action, it excludes data manually entered by an end user.[1] Machine-generated data crosses all industry sectors, and humans increasingly generate it unknowingly.[2]

Relevance of Machine-Generated Data

Machine-generated data tends to be amorphous; typically, users never modify it. Machines often generate this data as a consistent response to an event that has occurred. Because the event is historical, the data is rarely updated or modified. Partly for this reason, U.S. courts consider machine-generated data highly reliable.[3]

Handling Machine-Generated Data

In 2009, Gartner projected that data would grow by 650% over the following five years.[4] Most of this growth is a byproduct of machine-generated data.[1]
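
To put the five-year figure in yearly terms, the following minimal sketch computes the constant annual growth rate that would compound to it. It assumes that "grow by 650%" means the final volume is 7.5 times the starting volume and that growth is spread evenly across the years; neither assumption comes from the cited source, and the function name is illustrative.

```python
def implied_annual_growth(total_growth_pct: float, years: int) -> float:
    """Return the constant yearly growth rate (as a fraction) that
    compounds to the given total percentage growth over `years` years."""
    # Assumption: growth "by 650%" means final = start * (1 + 6.5) = 7.5x.
    final_multiple = 1.0 + total_growth_pct / 100.0
    return final_multiple ** (1.0 / years) - 1.0


rate = implied_annual_growth(650.0, 5)
print(f"Implied annual growth: {rate:.1%}")  # roughly 50% per year
```

Under these assumptions, the projection corresponds to data volume growing by about half each year.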

Processing Machine-Generated Data

Because machine-generated data is fairly static yet voluminous, data owners rely on highly scalable tools to process and analyze the resulting datasets. Almost all machine-generated data is structured,[1] so ETL (extract, transform, load) processing can be fairly simple. The challenge lies mostly in analytics: given high performance requirements and large data sizes, traditional database indexing and partitioning limit the size and history of the dataset that can be processed. Column-oriented databases offer an alternative, because a given analysis reads only the particular "columns" of the dataset it needs.[5]
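
The difference between row-oriented and column-oriented access can be sketched in a few lines of Python. This is a toy in-memory illustration rather than how any particular database is implemented; the event records and field names (`ts`, `status`, `bytes`, `path`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical web-log events; field names are illustrative only.
rows = [
    {"ts": 1, "status": 200, "bytes": 512, "path": "/"},
    {"ts": 2, "status": 404, "bytes": 128, "path": "/missing"},
    {"ts": 3, "status": 200, "bytes": 2048, "path": "/img.png"},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per field (a column layout)."""
    columns = defaultdict(list)
    for record in records:
        for field, value in record.items():
            columns[field].append(value)
    return dict(columns)


columns = to_columns(rows)

# Row layout: every full record is visited even though one field is needed.
total_from_rows = sum(record["bytes"] for record in rows)

# Column layout: only the "bytes" column is touched.
total_from_columns = sum(columns["bytes"])

print(total_from_rows, total_from_columns)
```

Both sums agree, but the column layout reads a single contiguous list, which is the property that makes aggregations over a few fields of a very wide, very long dataset cheaper.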

Examples of Machine-Generated Data

  • Web logs [6]
  • Call detail records [6]
  • Financial instrument trades [6]
  • Network event logs [6]
  • SIEM (security information and event management) logs
  • Telemetry collected by the government [6]
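
As an illustration of why structured machine-generated data is straightforward to load, the sketch below parses one web log line into named fields. It assumes the log uses the widely used Common Log Format; the sample line, the `parse_clf` helper, and the field names are invented for this example and do not come from the cited sources.

```python
import re
from typing import Optional

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line: str) -> Optional[dict]:
    """Parse one Common Log Format line; return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent; treat it as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample line (192.0.2.1 is a documentation-only address).
sample = '192.0.2.1 - - [20/Dec/2010:22:50:00 +0000] "GET /index.html HTTP/1.1" 200 1024'
record = parse_clf(sample)
print(record)
```

Because every line follows the same layout, a single pattern is enough to turn the raw text into typed fields ready for loading into an analytical store.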

Reference List

  1. Monash, Three Broad Categories of Data
  2. Deloach, Machine Generated Data
  3. Federal Evidence Review, Machine Generated Data was Not Statement and Raised no Hearsay
  4. ScienceLogic
  5. Wikipedia, Column Oriented DBMS
  6. Monash, Examples of Machine Generated Data