window: grouping rows by time window

Quick summary

window(time_column, window_duration[, slide_duration[, start_time]]) is a Spark SQL function that groups rows into time windows. It supports both tumbling windows and sliding windows.

Syntax

window(time_column, window_duration[, slide_duration[, start_time]])

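Parameters

Parameter	Type	Description
time_column	TimestampType	The column or expression used as the timestamp for windowing by time. The time column must be of `TimestampType`.
window_duration	STRING	The width of the window, given as an "interval value" (see interval literals for more details). The duration is a fixed length of time and does not vary over time according to the calendar.
slide_duration	STRING	The sliding interval of the window, given as an "interval value". A new window is generated every `slide_duration`. Must be less than or equal to `window_duration`. This duration is likewise absolute and does not vary according to the calendar.
start_time	STRING	The offset from 1970-01-01 00:00:00 UTC at which window intervals start. For example, to have hourly tumbling windows that start 15 minutes past the hour (12:15-13:15, 13:15-14:15, ...), set `start_time` to 15 minutes.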

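Window starts are inclusive and window ends are exclusive: for example, 12:05 falls in the window [12:05, 12:10) but not in [12:00, 12:05). Windows support microsecond precision. Windows on the order of months are not supported.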
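The boundary rule and the `start_time` offset can be sketched in plain Python. This is only an illustration of the arithmetic described above, not Spark's implementation; the helper name `tumbling_window` is hypothetical, and all timestamps are assumed to be in UTC.

```python
from datetime import datetime, timedelta, timezone

# Windows are aligned relative to 1970-01-01 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumbling_window(ts, window, start_time=timedelta(0)):
    """Return the half-open [start, end) window containing ts (hypothetical helper)."""
    # Time elapsed since the epoch, shifted by the start_time offset.
    elapsed = ts - EPOCH - start_time
    # Floor to a whole number of window lengths to find the window start.
    start = EPOCH + start_time + (elapsed // window) * window
    return start, start + window


# 12:05 belongs to [12:05, 12:10), not to [12:00, 12:05).
five_min = tumbling_window(
    datetime(2021, 1, 1, 12, 5, tzinfo=timezone.utc), timedelta(minutes=5)
)

# Hourly windows shifted by 15 minutes: 12:10 belongs to [11:15, 12:15).
hourly = tumbling_window(
    datetime(2021, 1, 1, 12, 10, tzinfo=timezone.utc),
    timedelta(hours=1),
    start_time=timedelta(minutes=15),
)

print(five_min[0].time(), five_min[1].time())
print(hourly[0].time(), hourly[1].time())
```

Because the end of each window is exclusive, a timestamp that lands exactly on a boundary is always assigned to the window that starts there.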

Examples

> SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '5 minutes') ORDER BY a, start;
  A1    2021-01-01 00:00:00 2021-01-01 00:05:00 2
  A1    2021-01-01 00:05:00 2021-01-01 00:10:00 1
  A2    2021-01-01 00:00:00 2021-01-01 00:05:00 1
> SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '10 minutes', '5 minutes') ORDER BY a, start;
  A1    2020-12-31 23:55:00 2021-01-01 00:05:00 2
  A1    2021-01-01 00:00:00 2021-01-01 00:10:00 3
  A1    2021-01-01 00:05:00 2021-01-01 00:15:00 1
  A2    2020-12-31 23:55:00 2021-01-01 00:05:00 1
  A2    2021-01-01 00:00:00 2021-01-01 00:10:00 1

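Common errors and pitfalls

Month-level windows are not supported: window does not support windows on the order of months (such as "1 month"), because the length of a month is not fixed on the calendar. Use fixed-length interval values instead (days, hours, minutes).
Sliding windows overlap: when slide_duration is smaller than window_duration, the same record falls into several windows and is counted once in each of them, so the aggregated results contain repeated counts. Choose the sliding interval according to your business requirements.
Accessing window.start and window.end: after using window() in GROUP BY, read the window boundaries through window.start and window.end. The window interval is half-open: [start, end).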
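To make the overlap concrete, the following plain-Python sketch lists every sliding window that contains a given timestamp and recomputes the counts of the second example above ('10 minutes' window, '5 minutes' slide). It is an assumption-laden illustration rather than Spark's code: `sliding_windows` is a hypothetical helper, timestamps are treated as UTC, and no `start_time` offset is applied.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def sliding_windows(ts, window, slide):
    """Return every [start, end) window containing ts, earliest first (hypothetical helper)."""
    # Latest window start that is <= ts; starts are multiples of slide.
    start = EPOCH + ((ts - EPOCH) // slide) * slide
    windows = []
    # Step back one slide at a time while the window still covers ts.
    # The end is exclusive, so the window must end strictly after ts.
    while start + window > ts:
        windows.append((start, start + window))
        start -= slide
    return windows[::-1]


# Same rows as in the sliding-window example above.
rows = [
    ("A1", "2021-01-01 00:00:00"),
    ("A1", "2021-01-01 00:04:30"),
    ("A1", "2021-01-01 00:06:00"),
    ("A2", "2021-01-01 00:01:00"),
]

fmt = "%Y-%m-%d %H:%M:%S"
counts = Counter()
for key, text in rows:
    ts = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    for start, _end in sliding_windows(ts, timedelta(minutes=10), timedelta(minutes=5)):
        counts[(key, start.strftime(fmt))] += 1

for (key, start), cnt in sorted(counts.items()):
    print(key, start, cnt)
```

With a 10-minute window and a 5-minute slide, each of the 4 input rows lands in 2 windows, so the per-window counts add up to 8 rather than 4. Summing counts across overlapping windows therefore overstates the number of rows.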

Since: 2.0.0
