面试真题
字节跳动
1.最高峰同时直播人数

字节跳动大数据面试SQL-最高峰同时直播人数

一、题目

有如下数据记录直播平台主播上播及下播时间,根据该数据计算出平台最高峰同时直播人数。

+----------+----------------------+----------------------+
| user_id  |      start_time      |       end_time       |
+----------+----------------------+----------------------+
| 1        | 2024-04-29 01:00:00  | 2024-04-29 02:01:05  |
| 2        | 2024-04-29 01:05:00  | 2024-04-29 02:03:18  |
| 3        | 2024-04-29 02:00:00  | 2024-04-29 04:03:22  |
| 4        | 2024-04-29 03:15:07  | 2024-04-29 04:33:21  |
| 5        | 2024-04-29 03:34:16  | 2024-04-29 06:10:45  |
| 6        | 2024-04-29 05:22:00  | 2024-04-29 07:01:08  |
| 7        | 2024-04-29 06:11:03  | 2024-04-29 09:26:05  |
| 3        | 2024-04-29 08:00:00  | 2024-04-29 12:34:27  |
| 1        | 2024-04-29 11:00:00  | 2024-04-29 16:03:18  |
| 8        | 2024-04-29 15:00:00  | 2024-04-29 17:01:05  |
+----------+----------------------+----------------------+

二、分析

查询同时最大人数,考察的是对拉链转化为日志的处理方式以及聚合函数的累积计算。

维度评分
题目难度⭐️⭐️⭐️⭐️
题目清晰度⭐️⭐️⭐️⭐️⭐️
业务常见度⭐️⭐️⭐️⭐️⭐️

三、SQL

平台最高峰同时直播人数

1.首先对原始数据进行处理,生成主播上下播的日志数据,同时增加人数变化字段,主播上播为1,主播下播-1。新数据包含 user_id,action_time,change_cnt

--开播记录
select
user_id,
start_time as action_time,
1 as change_cnt
from t_livestream_log
union all 
--下播记录
select
user_id,
end_time as action_time,
-1 as change_cnt
from t_livestream_log

查询结果

+--------------+----------------------+-----------------+
| _u1.user_id  |   _u1.action_time    | _u1.change_cnt  |
+--------------+----------------------+-----------------+
| 1            | 2024-04-29 01:00:00  | 1               |
| 1            | 2024-04-29 02:01:05  | -1              |
| 2            | 2024-04-29 01:05:00  | 1               |
| 2            | 2024-04-29 02:03:18  | -1              |
| 3            | 2024-04-29 02:00:00  | 1               |
| 3            | 2024-04-29 04:03:22  | -1              |
| 4            | 2024-04-29 03:15:07  | 1               |
| 4            | 2024-04-29 04:33:21  | -1              |
| 5            | 2024-04-29 03:34:16  | 1               |
| 5            | 2024-04-29 06:10:45  | -1              |
| 6            | 2024-04-29 05:22:00  | 1               |
| 6            | 2024-04-29 07:01:08  | -1              |
| 7            | 2024-04-29 06:11:03  | 1               |
| 7            | 2024-04-29 09:26:05  | -1              |
| 3            | 2024-04-29 08:00:00  | 1               |
| 3            | 2024-04-29 12:34:27  | -1              |
| 1            | 2024-04-29 11:00:00  | 1               |
| 1            | 2024-04-29 16:03:18  | -1              |
| 8            | 2024-04-29 15:00:00  | 1               |
| 8            | 2024-04-29 17:01:05  | -1              |
+--------------+----------------------+-----------------+

2.对操作日志按照操作时间进行累积求和

with t as(
--开播记录
select
user_id,
start_time as action_time,
1 as change_cnt
from t_livestream_log
union all 
--下播记录
select
user_id,
end_time as action_time,
-1 as change_cnt
from t_livestream_log
)
select
user_id,
action_time,
change_cnt,
sum(change_cnt)over(order by action_time asc) as online_cnt
from t

查询结果

+----------+----------------------+-------------+-------------+
| user_id  |     action_time      | change_cnt  | online_cnt  |
+----------+----------------------+-------------+-------------+
| 1        | 2024-04-29 01:00:00  | 1           | 1           |
| 2        | 2024-04-29 01:05:00  | 1           | 2           |
| 3        | 2024-04-29 02:00:00  | 1           | 3           |
| 1        | 2024-04-29 02:01:05  | -1          | 2           |
| 2        | 2024-04-29 02:03:18  | -1          | 1           |
| 4        | 2024-04-29 03:15:07  | 1           | 2           |
| 5        | 2024-04-29 03:34:16  | 1           | 3           |
| 3        | 2024-04-29 04:03:22  | -1          | 2           |
| 4        | 2024-04-29 04:33:21  | -1          | 1           |
| 6        | 2024-04-29 05:22:00  | 1           | 2           |
| 5        | 2024-04-29 06:10:45  | -1          | 1           |
| 7        | 2024-04-29 06:11:03  | 1           | 2           |
| 6        | 2024-04-29 07:01:08  | -1          | 1           |
| 3        | 2024-04-29 08:00:00  | 1           | 2           |
| 7        | 2024-04-29 09:26:05  | -1          | 1           |
| 1        | 2024-04-29 11:00:00  | 1           | 2           |
| 3        | 2024-04-29 12:34:27  | -1          | 1           |
| 8        | 2024-04-29 15:00:00  | 1           | 2           |
| 1        | 2024-04-29 16:03:18  | -1          | 1           |
| 8        | 2024-04-29 17:01:05  | -1          | 0           |
+----------+----------------------+-------------+-------------+

3.求取累计求和中的最大值,即为当天最高峰同时直播人数 查询语句

with t as(
--开播记录
select
user_id,
start_time as action_time,
1 as change_cnt
from t_livestream_log
union all 
--下播记录
select
user_id,
end_time as action_time,
-1 as change_cnt
from t_livestream_log
)
select 
max(online_cnt) as max_online_cnt
from
(
select
user_id,
action_time,
change_cnt,
sum(change_cnt)over(order by action_time asc) as online_cnt
from t
) tt

查询结果

+-----------------+
| max_online_cnt  |
+-----------------+
| 3               |
+-----------------+

四、建表语句和数据插入

CREATE TABLE IF NOT EXISTS t_livestream_log (
    user_id INT, -- 主播ID
    start_time STRING, -- 开始时间
    end_time STRING -- 结束时间
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','  -- 假设字段使用逗号分隔
STORED AS ORC;
 
insert into t_livestream_log(user_id, start_time, end_time) values 
(1,'2024-04-29 01:00:00','2024-04-29 02:01:05'),
(2,'2024-04-29 01:05:00','2024-04-29 02:03:18'),
(3,'2024-04-29 02:00:00','2024-04-29 04:03:22'),
(4,'2024-04-29 03:15:07','2024-04-29 04:33:21'),
(5,'2024-04-29 03:34:16','2024-04-29 06:10:45'),
(6,'2024-04-29 05:22:00','2024-04-29 07:01:08'),
(7,'2024-04-29 06:11:03','2024-04-29 09:26:05'),
(3,'2024-04-29 08:00:00','2024-04-29 12:34:27'),
(1,'2024-04-29 11:00:00','2024-04-29 16:03:18'),
(8,'2024-04-29 15:00:00','2024-04-29 17:01:05');
 
  • 本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab)“发表;

  • 欢迎扫码关注公众号,添加博主好友加入数据仓库技术交流群;

  • 如果觉得本文对您有帮助,欢迎扫码打赏

扫码备注 “数据仓库”加入数据仓库技术交流群

微信扫码备注「数据仓库」
加入数据仓库技术交流群

关注公众号数据仓库技术

微信关注公众号「数据仓库技术」

微信扫码buy me a coffee

微信扫码打赏

支付宝扫码buy me a coffee

支付宝扫码打赏