Machine-generated data
Machine-generated data (MGD), a superset of computer-generated data, is the generic term for information that is automatically created by a computer process, application, or other machine without human intervention. Although machine-generated data can be created as a result of some human action, it excludes data manually entered by an end user.[1] Machine-generated data spans all industry sectors, and people increasingly generate such data unknowingly.[2]
Relevance of Machine-Generated Data
Machine-generated data tends to be amorphous; typically, the data is never modified. Because it is always created by the same process, it may never need to be updated unless a data quality issue arises. Because that process is repeatable and reproducible, U.S. courts consider machine-generated data highly reliable.[3] Thus, one may consider machine-generated data to be extremely static once it is produced.
Handling Machine-Generated Data
In 2009, Gartner projected that data would grow by 650% over the following five years.[4] Most of that growth is a byproduct of machine-generated data.[1]
Processing Machine-Generated Data
Given the fairly static yet voluminous nature of machine-generated data, data owners rely on highly scalable tools to process and analyze the resulting datasets. Almost all machine-generated data is structured,[1] so the ETL processing can be fairly simple. The challenge lies mostly in data analytics. Given high performance requirements and large data sizes, traditional database indexing and partitioning limit the size and history of the dataset that can be processed. Columnar databases offer an alternative approach, since only particular "columns" of the dataset are accessed during a particular analysis.[5]
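The difference between row-oriented and column-oriented access can be illustrated with a minimal Python sketch. It is not how any particular columnar database is implemented; the record layout (caller ID, call duration, cell tower ID) and all values are hypothetical, chosen only to show that an aggregate over one field needs to touch a single column when the data is stored column by column.

```python
from array import array

# Hypothetical call detail records: (caller_id, duration_seconds, cell_tower_id).
# The field names and values are illustrative assumptions, not real data.
rows = [
    (1001, 62, 7),
    (1002, 305, 7),
    (1003, 18, 9),
    (1001, 240, 9),
]


def total_duration_row_store(records):
    """Row-oriented scan: every whole record is visited to read one field."""
    return sum(record[1] for record in records)


# Column-oriented layout: each field is stored contiguously in its own array.
columns = {
    "caller_id": array("q", (r[0] for r in rows)),
    "duration_seconds": array("q", (r[1] for r in rows)),
    "cell_tower_id": array("q", (r[2] for r in rows)),
}


def total_duration_column_store(cols):
    """Columnar scan: only the 'duration_seconds' column is read."""
    return sum(cols["duration_seconds"])


row_total = total_duration_row_store(rows)
column_total = total_duration_column_store(columns)
print(row_total, column_total)
```

Both functions return the same total; the columnar version simply never reads the caller or cell tower values, which is the property that matters when a dataset has many fields and a long history.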
Examples of Machine-Generated Data
- Web logs [6]
- Call detail records [6]
- Financial instrument trades [6]
- Network event logs [6]
- SIEM logs
- Telemetry collected by the government [6]
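To illustrate how regular such records tend to be, the following sketch extracts fields from one web log line. It assumes the widely used Common Log Format; the sample line, the pattern name `CLF_PATTERN`, and the helper `parse_log_line` are hypothetical and introduced only for this example.

```python
import re
from datetime import datetime

# Pattern for the Common Log Format:
# host ident user [time] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample line (192.0.2.0/24 is a documentation-only address range).
sample = '192.0.2.10 - - [10/Oct/2009:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326'
parsed = parse_log_line(sample)
print(parsed)
```

Because every line is produced by the same process, one pattern covers the whole log, which is why the extraction step for this kind of data can stay simple.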