字节跳动大数据面试SQL-股票波峰波谷
一、题目
有如下数据,记录每天每只股票的收盘价格,请查出每只股票的波峰和波谷的日期和价格; 波峰:股票价格高于前一天和后一天价格时为波峰 波谷:股票价格低于前一天和后一天价格是为波谷
样例数据
+------------+-------------+--------+
| ts_code | trade_date | close |
+------------+-------------+--------+
| 000001.SZ | 20220104 | 16.66 |
| 000002.SZ | 20220104 | 20.49 |
| 000001.SZ | 20220105 | 17.15 |
| 000002.SZ | 20220105 | 21.17 |
| 000001.SZ | 20220106 | 17.12 |
| 000002.SZ | 20220106 | 21.05 |
| 000001.SZ | 20220107 | 17.2 |
| 000002.SZ | 20220107 | 21.89 |
| 000001.SZ | 20220110 | 17.19 |
| 000002.SZ | 20220110 | 22.16 |
| 000001.SZ | 20220111 | 17.41 |
| 000002.SZ | 20220111 | 22.3 |
| 000001.SZ | 20220112 | 17.0 |
| 000002.SZ | 20220112 | 22.05 |
| 000001.SZ | 20220113 | 16.98 |
| 000002.SZ | 20220113 | 21.53 |
| 000001.SZ | 20220114 | 16.33 |
| 000002.SZ | 20220114 | 20.7 |
| 000001.SZ | 20220117 | 16.22 |
| 000002.SZ | 20220117 | 20.87 |
+------------+-------------+--------+
二、分析
需要比较当天价格与前一天、后一天的价格进行比较,常规想法为进行关联,股票ID相等、日期为当天日期减1,为前一天价格,日期为当天价格加1,为后一天价格,然后进行计算;简化方法为使用lag和lead函数,可以避免进行表关联; lag()函数
LAG(col,n,DEFAULT) 用于统计窗口内往上第n行。参数1为列名,参数2为往上第n行(可选,默认为1),参数3为默认值(当往上第n行为NULL时候,取默认值,如不指定,则为NULL)
lead()函数
LEAD(col,n,DEFAULT) 用于统计窗口内往下第n行。参数1为列名,参数2为往下第n行(可选,默认为1),参数3为默认值(当往下第n行为NULL时候,取默认值,如不指定,则为NULL)
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️ |
三、SQL
1.计算前一天和后一天的价格
我们使用lag函数和lead函数,对每支股票分组,开窗计算出每天股票记录的前一天和后一天记录中的价格。
执行SQL
select
ts_code,
trade_date,
close,
lag(close,1)over(partition by ts_code order by trade_date asc) as lastday_close,
lead(close,1)over(partition by ts_code order by trade_date asc) as nextday_close
from t3_stock_test;
执行结果
+------------+-------------+--------+----------------+----------------+
| ts_code | trade_date | close | lastday_close | nextday_close |
+------------+-------------+--------+----------------+----------------+
| 000001.SZ | 20220104 | 16.66 | NULL | 17.15 |
| 000001.SZ | 20220105 | 17.15 | 16.66 | 17.12 |
| 000001.SZ | 20220106 | 17.12 | 17.15 | 17.2 |
| 000001.SZ | 20220107 | 17.2 | 17.12 | 17.19 |
| 000001.SZ | 20220110 | 17.19 | 17.2 | 17.41 |
| 000001.SZ | 20220111 | 17.41 | 17.19 | 17.0 |
| 000001.SZ | 20220112 | 17.0 | 17.41 | 16.98 |
| 000001.SZ | 20220113 | 16.98 | 17.0 | 16.33 |
| 000001.SZ | 20220114 | 16.33 | 16.98 | 16.22 |
| 000001.SZ | 20220117 | 16.22 | 16.33 | NULL |
| 000002.SZ | 20220104 | 20.49 | NULL | 21.17 |
| 000002.SZ | 20220105 | 21.17 | 20.49 | 21.05 |
| 000002.SZ | 20220106 | 21.05 | 21.17 | 21.89 |
| 000002.SZ | 20220107 | 21.89 | 21.05 | 22.16 |
| 000002.SZ | 20220110 | 22.16 | 21.89 | 22.3 |
| 000002.SZ | 20220111 | 22.3 | 22.16 | 22.05 |
| 000002.SZ | 20220112 | 22.05 | 22.3 | 21.53 |
| 000002.SZ | 20220113 | 21.53 | 22.05 | 20.7 |
| 000002.SZ | 20220114 | 20.7 | 21.53 | 20.87 |
| 000002.SZ | 20220117 | 20.87 | 20.7 | NULL |
+------------+-------------+--------+----------------+----------------+
2.判断是否为波峰和波谷
如果当天价格大于昨天和明天的价格则是波峰,如果当天价格小于昨天价格和明天的价格则是波谷,不符合条件为其他
执行SQL
select
ts_code,
trade_date,
close,
lastday_close,
nextday_close,
case when close > lastday_close and close > nextday_close then '波峰'
when close < lastday_close and close < nextday_close then '波谷'
else '其他' end as `point_type`
from
(
select
ts_code,
trade_date,
close,
lag(close,1)over(partition by ts_code order by trade_date asc) as lastday_close,
lead(close,1)over(partition by ts_code order by trade_date asc) as nextday_close
from t3_stock_test
) t
执行结果
+------------+-------------+--------+----------------+----------------+-------------+
| ts_code | trade_date | close | lastday_close | nextday_close | point_type |
+------------+-------------+--------+----------------+----------------+-------------+
| 000001.SZ | 20220104 | 16.66 | NULL | 17.15 | 其他 |
| 000001.SZ | 20220105 | 17.15 | 16.66 | 17.12 | 波峰 |
| 000001.SZ | 20220106 | 17.12 | 17.15 | 17.2 | 波谷 |
| 000001.SZ | 20220107 | 17.2 | 17.12 | 17.19 | 波峰 |
| 000001.SZ | 20220110 | 17.19 | 17.2 | 17.41 | 波谷 |
| 000001.SZ | 20220111 | 17.41 | 17.19 | 17.0 | 波峰 |
| 000001.SZ | 20220112 | 17.0 | 17.41 | 16.98 | 其他 |
| 000001.SZ | 20220113 | 16.98 | 17.0 | 16.33 | 其他 |
| 000001.SZ | 20220114 | 16.33 | 16.98 | 16.22 | 其他 |
| 000001.SZ | 20220117 | 16.22 | 16.33 | NULL | 其他 |
| 000002.SZ | 20220104 | 20.49 | NULL | 21.17 | 其他 |
| 000002.SZ | 20220105 | 21.17 | 20.49 | 21.05 | 波峰 |
| 000002.SZ | 20220106 | 21.05 | 21.17 | 21.89 | 波谷 |
| 000002.SZ | 20220107 | 21.89 | 21.05 | 22.16 | 其他 |
| 000002.SZ | 20220110 | 22.16 | 21.89 | 22.3 | 其他 |
| 000002.SZ | 20220111 | 22.3 | 22.16 | 22.05 | 波峰 |
| 000002.SZ | 20220112 | 22.05 | 22.3 | 21.53 | 其他 |
| 000002.SZ | 20220113 | 21.53 | 22.05 | 20.7 | 其他 |
| 000002.SZ | 20220114 | 20.7 | 21.53 | 20.87 | 波谷 |
| 000002.SZ | 20220117 | 20.87 | 20.7 | NULL | 其他 |
+------------+-------------+--------+----------------+----------------+-------------+
3.查询结果
执行SQL
select
ts_code,
trade_date,
close,
point_type
from
(
select
ts_code,
trade_date,
close,
lastday_close,
nextday_close,
case when close > lastday_close and close > nextday_close then '波峰'
when close < lastday_close and close < nextday_close then '波谷'
else '其他' end as `point_type`
from
(
select
ts_code,
trade_date,
close,
lag(close,1)over(partition by ts_code order by trade_date asc) as lastday_close,
lead(close,1)over(partition by ts_code order by trade_date asc) as nextday_close
from t3_stock_test
) t
) tt
where tt.point_type in('波峰','波谷')
执行结果
+------------+-------------+--------+-------------+
| ts_code | trade_date | close | point_type |
+------------+-------------+--------+-------------+
| 000001.SZ | 20220105 | 17.15 | 波峰 |
| 000001.SZ | 20220106 | 17.12 | 波谷 |
| 000001.SZ | 20220107 | 17.2 | 波峰 |
| 000001.SZ | 20220110 | 17.19 | 波谷 |
| 000001.SZ | 20220111 | 17.41 | 波峰 |
| 000002.SZ | 20220105 | 21.17 | 波峰 |
| 000002.SZ | 20220106 | 21.05 | 波谷 |
| 000002.SZ | 20220111 | 22.3 | 波峰 |
| 000002.SZ | 20220114 | 20.7 | 波谷 |
+------------+-------------+--------+-------------+
四、建表语句和数据插入
create table t3_stock_test(
ts_code string comment '股票代码',
trade_date string comment '交易日期',
close float comment '收盘价'
);
INSERT INTO `t3_stock_test` VALUES
('000001.SZ','20220104',16.66),
('000002.SZ','20220104',20.49),
('000001.SZ','20220105',17.15),
('000002.SZ','20220105',21.17),
('000001.SZ','20220106',17.12),
('000002.SZ','20220106',21.05),
('000001.SZ','20220107',17.2),
('000002.SZ','20220107',21.89),
('000001.SZ','20220110',17.19),
('000002.SZ','20220110',22.16),
('000001.SZ','20220111',17.41),
('000002.SZ','20220111',22.3),
('000001.SZ','20220112',17),
('000002.SZ','20220112',22.05),
('000001.SZ','20220113',16.98),
('000002.SZ','20220113',21.53),
('000001.SZ','20220114',16.33),
('000002.SZ','20220114',20.7),
('000001.SZ','20220117',16.22),
('000002.SZ','20220117',20.87);
- 本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab)“发表;