Machbase
Machbase is a time series database[1] that processes large volumes of machine data generated in various environments and analyzes it in real time. It is a columnar database optimized for fast ingestion, search, and analysis of time series data.
History
The rapid growth of data volumes became a pressing issue with the rise of the IoT[2]. Conventional databases are limited in their ability to process time series data, which has its own characteristics: very large volume, greater importance of newer data, and repetitive patterns. A database with a new architecture was therefore needed. In the fall of 2013, Machbase was designed and developed as a time series database to process this machine data.
Technical Overview
Real-time data processing
Many big data[3] solutions rely on batch processing, which introduces latency. In contrast, Machbase collects, stores, indexes, analyzes, and visualizes massive amounts of log data in real time, and it is designed to minimize the delay between data arrival and analysis. Machbase also has a simple, out-of-the-box installation process: users can begin processing their log data in real time once Machbase is installed and configured.
Simple structure
To minimize the complexity and incompatibility that arise when a new big data solution is introduced into an existing system, Machbase does not require the three to five tiers of components often used to build a database management system for collecting, storing, indexing, analyzing, and visualizing data.
Conventional SQL syntax supported
Machbase supports standard SQL syntax, including the JOIN, GROUP BY, ORDER BY, and HAVING clauses. Subqueries (including inline views) and various standard SQL functions can be used. It also supports keyword and pattern search. Users can issue search commands directly in SQL without learning a new language.
Architecture
Write Once, Read Many
Once log data is inserted into the database, it is rarely changed or deleted. To preserve integrity, Machbase is designed not to update stored data, so that log data cannot be altered by malicious third parties.
Lockless concurrency control
One of the most important requirements for processing log data is that data-modifying operations (INSERT, UPDATE, and DELETE) execute without colliding with SELECT operations. To achieve this, Machbase is designed so that SELECT operations do not acquire locks, and therefore SELECT operations never conflict with other operations.
Machbase's architecture indexes data in near real time even while large volumes of data are being inserted every second. This is a crucial feature for time series analysis, because it lays the groundwork for searching large volumes of data quickly.
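The lock-free SELECT behavior described in this section can be illustrated with a toy append-only store. This is only a sketch of the general idea and is not Machbase's actual implementation: the class name `AppendOnlyLog`, its methods, and the single-writer assumption are all hypothetical. The writer publishes a row by advancing a counter after the row is written, and a reader takes a snapshot of that counter instead of acquiring a lock.

```python
class AppendOnlyLog:
    """Toy append-only store (hypothetical): one writer adds rows, readers never lock."""

    def __init__(self):
        self._rows = []       # rows are never updated or removed in place
        self._committed = 0   # number of rows currently visible to readers

    def append(self, row):
        # Assumes a single writer thread.
        self._rows.append(row)   # step 1: write the row
        self._committed += 1     # step 2: publish it to readers

    def select(self, predicate=lambda row: True):
        visible = self._committed  # snapshot of the visibility boundary, no lock taken
        return [r for r in self._rows[:visible] if predicate(r)]


log = AppendOnlyLog()
for i in range(5):
    log.append({"id": i, "level": "ERROR" if i % 2 else "INFO"})

errors = log.select(lambda r: r["level"] == "ERROR")
```

Because rows below the boundary are never modified, a reader that has taken its snapshot sees a consistent prefix of the log regardless of later inserts.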
Real-Time Data Compression
Machine data is generated constantly, so the database's storage space will eventually become insufficient, and the database will no longer be able to retain enough data for processing. To compress and store big data without sacrificing performance, Machbase uses two methods: logical and physical compression. First, logical compression encodes repeated values, which saves more storage space the more often the same values recur. Second, physical compression compresses data into fixed-size partitions before writing them to disk. This reduces I/O costs and improves storage efficiency, compressing data to a size that can be hundreds of times smaller than the original source data.
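The two compression stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Machbase's storage format: dictionary encoding stands in for logical compression, zlib stands in for physical compression, and the function names, the 4-byte code width, and the partition size of 1024 values are all hypothetical choices.

```python
import zlib


def dictionary_encode(values):
    """Logical compression: replace each repeated value with a small integer code."""
    codes = {}
    encoded = []
    for v in values:
        # A new value gets the next unused code; a repeated value reuses its code.
        encoded.append(codes.setdefault(v, len(codes)))
    dictionary = list(codes)  # position in this list is the code (insertion order)
    return dictionary, encoded


def compress_partitions(encoded, partition_size=1024):
    """Physical compression: zlib-compress fixed-size partitions of codes."""
    partitions = []
    for start in range(0, len(encoded), partition_size):
        chunk = encoded[start:start + partition_size]
        raw = b"".join(code.to_bytes(4, "little") for code in chunk)
        partitions.append(zlib.compress(raw))
    return partitions


def decompress_partitions(dictionary, partitions):
    """Reverse both stages and return the original values."""
    values = []
    for blob in partitions:
        raw = zlib.decompress(blob)
        for i in range(0, len(raw), 4):
            values.append(dictionary[int.from_bytes(raw[i:i + 4], "little")])
    return values


hosts = ["web-01", "web-02", "web-01", "db-01"] * 1000
dictionary, encoded = dictionary_encode(hosts)
partitions = compress_partitions(encoded)
restored = decompress_partitions(dictionary, partitions)
```

Partitioning lets a reader decompress only the partitions a query touches, which is one reason fixed-size blocks are a common choice in columnar stores.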
Unmatched Analytical Performance
With its analytical technology, Machbase can search and statistically analyze millions of stored records at high speed while inserting billions of records per second. Its indexing technology makes both insertion and analysis fast, which can support real-time business decisions. Machbase can use two or more indexes in a single query, so further speedups can be expected when the data is processed in parallel.
Support SQL Syntax for Time Series Data
Because of the nature of log data, recent data is much more valuable than older data, and it is accessed several times more often. Machbase therefore offers the following features for time series data. First, Machbase records a nanosecond-precision timestamp in the "_arrival_time" column at the moment each record is inserted, so all data can be searched by time or filtered with time-based conditions. Second, SELECT operations return the most recent data first, which is equivalent to sorting the results in descending order on the "_arrival_time" column. Third, Machbase provides a DURATION keyword, because analyses of machine data typically target a particular time span. With this keyword, users can analyze data without writing complicated time expressions in the WHERE clause.
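The behavior described above can be emulated with Python's built-in sqlite3 module to show what a newest-first, time-windowed query amounts to. This sketch does not use Machbase or its DURATION syntax: the table name `log`, the helper `recent`, and the fixed "current time" are assumptions made for illustration, and the explicit WHERE and ORDER BY clauses spell out by hand what the text says Machbase does implicitly.

```python
import sqlite3

NS_PER_SECOND = 1_000_000_000

conn = sqlite3.connect(":memory:")
# _arrival_time holds nanoseconds since the epoch as a 64-bit integer.
conn.execute("CREATE TABLE log (_arrival_time INTEGER, msg TEXT)")

now_ns = 1_700_000_000 * NS_PER_SECOND  # fixed "current time" so the example is reproducible
rows = [(now_ns - age * NS_PER_SECOND, f"event {age}s ago") for age in (600, 5, 90, 30)]
conn.executemany("INSERT INTO log VALUES (?, ?)", rows)


def recent(conn, now_ns, duration_seconds):
    """Return messages from the last duration_seconds, newest first."""
    cutoff = now_ns - duration_seconds * NS_PER_SECOND
    cur = conn.execute(
        "SELECT msg FROM log "
        "WHERE _arrival_time >= ? AND _arrival_time <= ? "
        "ORDER BY _arrival_time DESC",
        (cutoff, now_ns),
    )
    return [msg for (msg,) in cur]


last_two_minutes = recent(conn, now_ns, 120)
```

In a general-purpose database the time window and sort order have to be written out in every query, which is the boilerplate a dedicated time span keyword is meant to remove.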
Support Selective Delete
For log data, the DELETE operation is generally not allowed after data insertion. However, when the DBMS[4] is embedded in an appliance, storage space is limited and end users often neglect to monitor it. In such cases, companies must deal with "disk full" and related errors. To address this, Machbase provides a function that deletes records under specific conditions. Companies that embed Machbase in their appliances can thus keep the data size at a certain level by running the deletion regularly with cron or a similar scheduler.
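A retention job of the kind described above might look like the following sketch, which a scheduler such as cron could run periodically. It uses Python's built-in sqlite3 module rather than Machbase, and the table name `log`, the function `trim_oldest`, and the "keep the newest N rows" policy are assumptions chosen for illustration; the source does not specify the conditions Machbase supports.

```python
import sqlite3


def trim_oldest(conn, keep_rows):
    """Delete all but the newest keep_rows records; return the number deleted."""
    cur = conn.execute(
        "DELETE FROM log WHERE rowid NOT IN "
        "(SELECT rowid FROM log ORDER BY _arrival_time DESC LIMIT ?)",
        (keep_rows,),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (_arrival_time INTEGER, msg TEXT)")
conn.executemany("INSERT INTO log VALUES (?, ?)", [(t, f"event {t}") for t in range(10)])

deleted = trim_oldest(conn, keep_rows=3)
remaining = [t for (t,) in conn.execute("SELECT _arrival_time FROM log ORDER BY _arrival_time DESC")]
```

Deleting only from the oldest end keeps the store effectively append-only for recent data while bounding disk usage.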
Automated Data Collection
Machbase provides a "COLLECTOR" that automatically collects and transmits log data. It collects not only structured data such as syslog and web server logs, but also logs in user-defined formats.
Multi-node clusters
In the fall of 2017, Machbase Inc. launched MACHBASE Enterprise Edition to overcome the limitations of the Machbase Standard Edition. The product handles high-volume data ingestion and queries in a distributed environment. In the Enterprise Edition, several Machbase processes, each called a node, reside on a general-purpose server, and multiple processes are dedicated to analyzing large amounts of data. The multi-node cluster is a separate product line from the Standard Edition and is scalable and designed for high availability when storing and analyzing large amounts of data.
References
- "Time series database". Wikipedia. 2017-11-27.
- "Internet of things". Wikipedia. 2017-12-06.
- "Big data". Wikipedia. 2017-12-03.
- "Database". Wikipedia. 2017-12-06.