Machine-generated data

Machine Generated Data (MGD), a superset of Computer Generated Data, is the generic term for information that is created automatically by a computer process, application, or other machine without human intervention. Although Machine Generated Data can be created as a result of some action by a human, it excludes data manually entered by an end user.^[1] Machine generated data spans all industry sectors, and humans increasingly generate it unknowingly.^[2]

Relevance of Machine Generated Data

Machine generated data tends to be amorphous, and it is typically never modified. Because the data is always created by the same process, it may never need to be updated unless a data quality issue arises. Because that process is repeatable and reproducible, U.S. court systems consider machine generated data highly reliable.^[3] Thus, machine generated data can be considered extremely static once it is produced.

Handling Machine Generated Data

In 2009, Gartner projected that data would grow by 650% over the following five years.^[4] Most of that growth is a byproduct of machine generated data.^[1]
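As a rough illustration of what a 650% increase over five years implies, the sketch below converts the total into an equivalent yearly rate. It assumes that "grow by 650%" means the volume ends at 7.5 times its starting size and that growth compounds evenly each year; neither assumption comes from the cited projection, and the function name is hypothetical.

```python
def implied_annual_rate(total_growth_pct: float, years: int) -> float:
    """Return the constant yearly rate that compounds to the given total growth."""
    # Assumption: 650% growth means the final size is 1 + 6.5 = 7.5 times the start.
    multiplier = 1 + total_growth_pct / 100
    return multiplier ** (1 / years) - 1


rate = implied_annual_rate(650, 5)
print(f"Implied growth: {rate:.1%} per year")
```

Under these assumptions, the data volume would grow by roughly half each year.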

Processing Machine Generated Data

Given the fairly static yet voluminous nature of Machine Generated Data, data owners rely on highly scalable tools to process and analyze the resulting dataset. Almost all machine generated data is structured,^[1] so the ETL processing can be fairly simple. The challenge lies mostly with data analytics. Given high performance requirements and large data sizes, traditional database indexing and partitioning limit the size and history of the dataset that can be processed. Columnar databases offer an alternative approach, because a particular analysis accesses only the "columns" of the dataset that it needs.^[5]
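The difference between row-oriented and column-oriented access can be sketched in a few lines. The example below is a toy in-memory illustration and not how any particular columnar database is implemented; the record fields (caller, callee, duration_s) and the helper to_columns are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical call detail records, stored row by row.
rows: List[Dict[str, Any]] = [
    {"caller": "A", "callee": "B", "duration_s": 120},
    {"caller": "A", "callee": "C", "duration_s": 45},
    {"caller": "B", "callee": "C", "duration_s": 300},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot a list of row dictionaries into one list per column."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: each whole record is visited to reach a single field.
total_from_rows = sum(record["duration_s"] for record in rows)

# Column layout: only the one column the analysis needs is read.
total_from_columns = sum(columns["duration_s"])

print(total_from_rows, total_from_columns)
```

Both totals agree; the point of the column layout is that the aggregate reads only the duration_s list and never touches the caller or callee values.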

Examples of Machine Generated Data

Web logs ^[6]
Call detail records ^[6]
Financial instrument trades ^[6]
Network event logs ^[6]
SIEM logs
Telemetry collected by the government ^[6]
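
To illustrate why structured machine generated data such as a web log is fairly simple to extract, the sketch below parses one line in the widely used Common Log Format. The sample line, the field names, and the parse_line helper are assumptions chosen for illustration; real logs often use extended or custom formats that this pattern would not match.

```python
import re
from typing import Any, Dict, Optional

# Pattern for one Common Log Format line (assumed format, see lead-in).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[Dict[str, Any]]:
    """Return the fields of a log line, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record: Dict[str, Any] = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means that no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_line(sample)
print(record)
```

Because every line is produced by the same process, a single pattern covers the whole log, and lines that do not match can be treated as data quality issues.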

Notes

Reference List

[1] Monash, Three Broad Categories of Data
[2] Deloach, Machine Generated Data
[3] Federal Evidence Review, Machine Generated Data was Not Statement and Raised no Hearsay
[4] ScienceLogic
[5] Wikipedia, Column Oriented DBMS
[6] Monash, Examples of Machine Generated Data

