A while back I needed to identify time windows within a time series dataset which led me to discover some interesting SQL functions I was not aware of. In short, what I refer to as a window is a start and end timestamp during which certain defined conditions were true. This post will explore a bit more of the problem definition and go over an example SQL query to tackle it.
I’m using TimescaleDB to store my data, but the query should work for any database supporting SQL queries.
Vehicle Data
Let’s take a vehicle as our example data producer in our time series data. While a vehicle is operational it constantly produces hundreds or thousands of time series data points at various rates from different sensors. For example, information like speed, steering wheel angle, gear position, and wheel speed for each wheel may be reported every 20 milliseconds. Storing this data produces a nice time series dataset and allows us to query when the vehicle was in a certain state. We can use those time results to look further at different sensors that the vehicle may possess like cameras, lidar, and ultrasonic sensors. We can ask and answer questions like: When did the vehicle experience hard braking while turning? When did the vehicle experience wheel slippage? When were the windshield wipers on? The answer to these questions is going to produce sets of windows of time that we can then use to reconstruct the entire state of the vehicle. It’s impossible to answer some of these questions simply by doing camera frame analysis or from lidar.
The Window
We define a window to be a set of start and end timestamps during which a certain condition is true. To bound the windows, this means that we also need to define a maximum gap
of allowed time to pass before the window ends and a new window starts. That is, if more than the allowed gap of time has passed since the condition was true, the next true condition results in starting a new window and closing the last window at the last true condition point.
Having recently started dabbling with Apache Flink, the window in this query can be expressed as a Session Window in the Flink world. The Flink documentation and the following post on Flink Session Windows can provide further examples if needed. The maximum gap is expressed as a “session gap” in the following diagram and each point represents a true condition in our time series data.
Building the Query
Start by identifying when the condition we care about is true. After this stage, we will have all timestamps when the condition we care about is true.
SELECT time
FROM (full_time_series_source)
WHERE (filter_conditions)
Next, we identify how much time has passed between points and store that in a tsDelta
column. We use the LAG()
SQL window function to do so. Also added here though unused, is the total elapsed time since the first condition was met in a total
column. You may have a case that uses this value as well.
SELECT time,
time - LAG(time, 1) OVER w AS tsDelta,
time - FIRST_VALUE(time) OVER w total
FROM (previous_result)
WINDOW w AS (ORDER BY time)
After we have the time delta between points we can begin forming the windows by checking each against our maximum allowed gap of time. We treat points with time deltas smaller than the max allowed gap as part of the same window. Each point is placed in a window identified by windowId
.
SELECT time,
COUNT(CASE WHEN tsDelta >= interval '(max_gap)' THEN 1 END)
OVER (ORDER BY time) AS windowId
FROM (previous_result)
With each point now placed in its appropriate window we GROUP BY
that window ID and find where each window starts and ends.
SELECT windowId, MIN(time) as winstart, MAX(time) as winend
FROM (previous_result)
GROUP BY windowId
The final step is to select the desired shape of our result output.
Full Query
Putting all this together into a query we can run in TimescaleDB. The query uses 60 seconds as the max allowed gap.
WITH time_condition AS (
SELECT time
FROM (MY_TABLE)
WHERE (MY_CONDITIONS) AND (MORE_CONDITIONS)
), time_with_delta AS (
SELECT time,
time - LAG(time, 1) OVER w AS tsDelta,
time - FIRST_VALUE(time) OVER w total
FROM time_condition
WINDOW w AS (ORDER BY time)
), windowid_time AS (
SELECT time,
COUNT(CASE WHEN tsDelta >= interval '60s' THEN 1 END)
OVER (ORDER BY time) AS windowId
FROM time_with_delta
), window_definitions AS (
SELECT windowId, MIN(time) as winstart, MAX(time) as winend
FROM windowid_time
GROUP BY windowId
)
SELECT windowid, (winstart - INTERVAL '10' second) as winstart, winend
FROM window_definitions
WHERE winstart != winend
ORDER BY windowId ASC;
The final SELECT
above will discard single event windows, but of course, can be removed if needed. It also prefixes the window by 10 seconds, but that too will vary on a case-by-case basis.