Relational data stream management system

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Purple Data at 11:08, 7 October 2014 (Added the use case for SQL as a processing language for streaming data as well as static data. Same construct and standards as explained on the SQL page, but with different execution behavior.).

A relational data stream management system (RDSMS) is a distributed, in-memory data stream management system (DSMS) that is designed to use standards-compliant SQL queries to process unstructured and structured data streams in real time. Unlike SQL queries executed in a traditional RDBMS, which return a result and exit, SQL queries executed in an RDSMS do not exit; they generate results continuously as new data become available. Continuous SQL queries in an RDSMS use SQL window functions to analyze, join and aggregate data streams over fixed or sliding windows. Windows can be specified as time-based or row-based.

RDSMS SQL Query Examples

Continuous SQL queries in an RDSMS conform to the ANSI SQL standards. The most common RDSMS SQL query is performed with the declarative SELECT statement. A continuous SQL SELECT operates on data across one or more data streams, with optional keywords and clauses that include FROM with an optional JOIN subclause to specify the rules for joining multiple data streams, the WHERE clause and comparison predicate to restrict the records returned by the query, GROUP BY to group records with common values into a smaller set, HAVING to filter records resulting from a GROUP BY, and ORDER BY to sort the results.

The following is an example of a continuous data stream aggregation using a SELECT query that aggregates a sensor stream from a weather monitoring station. The query computes the minimum, maximum and average temperature values over each one-second period, returning a continuous stream of aggregated results at one-second intervals.

SELECT STREAM
    FLOOR(WEATHERSTREAM.ROWTIME TO SECOND) AS FLOOR_SECOND,
    MIN(TEMP) AS MIN_TEMP,
    MAX(TEMP) AS MAX_TEMP,
    AVG(TEMP) AS AVG_TEMP
FROM WEATHERSTREAM
GROUP BY FLOOR(WEATHERSTREAM.ROWTIME TO SECOND);
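
To make the behavior of the query above concrete, the following is a minimal Python sketch that emulates its semantics outside of any RDSMS. It is an illustration only, not how a particular product implements the query. The names `Reading`, `Aggregate` and `aggregate_per_second` are hypothetical, timestamps are assumed to be plain seconds, and rows are assumed to arrive in time order.

```python
from math import floor
from typing import Iterable, Iterator, List, NamedTuple, Optional


class Reading(NamedTuple):
    """One row of the (hypothetical) weather stream."""
    rowtime: float  # arrival time in seconds; stands in for ROWTIME
    temp: float     # stands in for TEMP


class Aggregate(NamedTuple):
    floor_second: int
    min_temp: float
    max_temp: float
    avg_temp: float


def aggregate_per_second(stream: Iterable[Reading]) -> Iterator[Aggregate]:
    """Emit one aggregate row per one-second bucket.

    Assumes rows arrive in non-decreasing rowtime order, so a bucket can be
    closed as soon as a row from a later second is seen.
    """
    current: Optional[int] = None
    temps: List[float] = []
    for reading in stream:
        bucket = floor(reading.rowtime)  # emulates FLOOR(ROWTIME TO SECOND)
        if current is not None and bucket != current:
            # The previous second is complete: emit its aggregates.
            yield Aggregate(current, min(temps), max(temps), sum(temps) / len(temps))
            temps = []
        current = bucket
        temps.append(reading.temp)
    # A real stream never ends; this flush only matters for finite test input.
    if current is not None and temps:
        yield Aggregate(current, min(temps), max(temps), sum(temps) / len(temps))
```

Because the function is a generator, a consumer receives each aggregate row as soon as its second has closed, which mirrors the way a continuous query keeps producing results rather than returning once and exiting.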

RDSMS SQL queries also operate on data streams over time-based or row-based windows. The following example shows a second continuous SQL query using the WINDOW clause with a one-second duration. The WINDOW clause changes the behavior of the query so that it outputs a result for each new record as it arrives. Hence the output is a stream of incrementally updated results, emitted without waiting for a time interval to close.

SELECT STREAM
    ROWTIME,
    MIN(TEMP) OVER W1 AS WMIN_TEMP,
    MAX(TEMP) OVER W1 AS WMAX_TEMP,
    AVG(TEMP) OVER W1 AS WAVG_TEMP
FROM WEATHERSTREAM
WINDOW W1 AS ( RANGE INTERVAL '1' SECOND PRECEDING );
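
The sliding-window behavior of the query above can be sketched in Python as follows. This is an emulation of the semantics under stated assumptions, not an RDSMS implementation: the function name `sliding_window_stats` is hypothetical, each input row is assumed to be a `(rowtime, temp)` pair with `rowtime` in seconds, rows are assumed to arrive in time order, and the window is assumed to include rows exactly `width` seconds old.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Row = Tuple[float, float]                   # (rowtime in seconds, temp)
Result = Tuple[float, float, float, float]  # (rowtime, min, max, avg)


def sliding_window_stats(stream: Iterable[Row], width: float = 1.0) -> Iterator[Result]:
    """Emit one result per incoming row, computed over the preceding `width` seconds.

    Assumes rows arrive in non-decreasing rowtime order.
    """
    window: Deque[Row] = deque()
    for rowtime, temp in stream:
        window.append((rowtime, temp))
        # Evict rows older than `width` seconds relative to the newest row.
        while window[0][0] < rowtime - width:
            window.popleft()
        temps = [t for _, t in window]
        yield rowtime, min(temps), max(temps), sum(temps) / len(temps)
```

Unlike the per-second aggregation, this sketch yields a result for every input row, so the number of output rows equals the number of input rows. Recomputing the aggregates over the whole window on each row keeps the sketch short; a production system would more plausibly maintain them incrementally.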

See also