Window
Window is one of the most important tools in stream processing. It splits unbounded stream data into bounded ports. Set operations such as aggregation, sort, and join are only applied to bounded data using windows.
We will explain about windows and related operations in the future but currently you can refer to the following materials to get understand them:
- Streaming 101: The world beyond batch
- Streaming 102: The world beyond batch
- Streaming Systems (book)
- Windows | Apache Flink
Window in SpringQL
Window types
Currently, SpringQL supports the follwing window types:
- Event-time-based:
- fixed window (also called tumbling window)
- sliding window
Count-based windows, global windows, and session windows are not supported.
Window pane
You can create 0 or 1 window in a pump. WINDOW
clause creates a window.
CREATE PUMP p AS
INSERT INTO s2 ...
SELECT ... FROM s1 ...
FIXED WINDOW DURATION_SECS(10);
A window has panes inside. A pane works as a bound of rows from unbounded rows.
You do not explicitly write about panes in SpringQL but SpringQL users may want to know about panes to understand windows' behavior.
When a pane start at for time-based windows?
Users cannot specify the start time of panes.
For example, a pane in a 10-sec window can start from 12:00:00
but cannot from 12:00:05
.
For 1-sec, 2-sec, 3-sec, 5-sec, ... and 60-sec (factors of 60) time-based windows, you may intuitively list start times.
For 7-sec window, when are the start times?
In the current implementation, 1970-01-01 00:00:00
(origin of unixtime) is used as an origin point. So the start times of 7-sec window panes are: 1970-01-01 00:00:00
, 1970-01-01 00:00:07
, ...
Triggers
SpringQL currently provides very limited functionality for triggers.
Downstreams of a window pump observe only the result of the window operation. They cannot see any intermediate values before panes (splits of a window) get closed.
Handling late data in SpringQL
WINDOW
clauses in SpringQL take "allowed latency" argument.
CREATE PUMP p AS
INSERT INTO s2 ...
SELECT ... FROM s1 ...
FIXED WINDOW DURATION_SECS(10), DURATION_SECS(1);
In the above statement, you create 10-sec fixed window allowing 1-sec late data. Late data are allowed to get in a window pane only if their latencies are less than 1 second.
^ At first, no pane is created inside the window.
^ The window gets 12:00:00
row. It updates the watermark and [12:00:00, 12:00:10)
pane is created from the new watermark.
^ 12:00:10
row updates the watermark and creates [12:00:10, 12:00:20)
pane. The [12:00:00, 12:00:10)
pane is not closed by the new watermark because the window allows 1-sec latency.
^ 12:00:05
comes after 12:00:10
but it is in-time due to the 1-sec allowed latency. The row does not update the watermark because the timestamp is less than the watermark.
12:00:11
row updates the watermark and the [12:00:00, 12:00:10)
pane is closed by the new watermark because the row exceeds the allowed latency of the pane. The new watermark creates [12:00:11, 12:00:21)
pane.