常见大数据面试SQL-去掉最大最小值的部门平均薪水
一、题目
有员工薪资表t3_salary,包含员工ID(emp_id),部门ID(depart_id),薪水(salary),请计算去除最高最低薪资后的平均薪水;(每个部门员工数不少于3人)
+---------+------------+-----------+
| emp_id | depart_id | salary |
+---------+------------+-----------+
| 1001 | 1 | 5000.00 |
| 1002 | 1 | 10000.00 |
| 1003 | 1 | 20000.00 |
| 1004 | 1 | 30000.00 |
| 1005 | 1 | 6000.00 |
| 1006 | 1 | 10000.00 |
| 1007 | 1 | 11000.00 |
| 1008 | 2 | 3000.00 |
| 1009 | 2 | 7000.00 |
| 1010 | 2 | 9000.00 |
| 1011 | 2 | 30000.00 |
+---------+------------+-----------+
二、分析
- 找到部门内的最高最低薪资,去掉这些行,考察点为row_number()开窗函数;
- 本题的难点在于同时去掉最高和最低,所以需要按照不同的排序进行处理,所以需要进行两次开窗,排序;
- 去除最高最低薪资之后,按照部门分组进行分组取平均即可;
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️ |
三、SQL
1.分别按照正序和倒序进行开窗,得到部门最高最低薪资记录
执行SQL
select emp_id,
depart_id,
salary,
row_number() over (partition by depart_id order by salary asc) as asc_order,
row_number() over (partition by depart_id order by salary desc) as desc_order
from t3_salary;
查询结果
+---------+------------+-----------+------------+-------------+
| emp_id | depart_id | salary | asc_order | desc_order |
+---------+------------+-----------+------------+-------------+
| 1004 | 1 | 30000.00 | 7 | 1 |
| 1003 | 1 | 20000.00 | 6 | 2 |
| 1007 | 1 | 11000.00 | 5 | 3 |
| 1002 | 1 | 10000.00 | 4 | 4 |
| 1006 | 1 | 10000.00 | 3 | 5 |
| 1005 | 1 | 6000.00 | 2 | 6 |
| 1001 | 1 | 5000.00 | 1 | 7 |
| 1011 | 2 | 30000.00 | 4 | 1 |
| 1010 | 2 | 9000.00 | 3 | 2 |
| 1009 | 2 | 7000.00 | 2 | 3 |
| 1008 | 2 | 3000.00 | 1 | 4 |
+---------+------------+-----------+------------+-------------+
2.去掉最高最低薪资,这里只能有第一行,不能取排序最大值进行去重,那样处理起来太复杂(不能确定排序最大值是多少,这也是进行两次开窗排序的原因)
执行SQL
select emp_id,
depart_id,
salary,
asc_order,
desc_order
from (select emp_id,
depart_id,
salary,
row_number() over (partition by depart_id order by salary asc) as asc_order,
row_number() over (partition by depart_id order by salary desc) as desc_order
from t3_salary) t
where asc_order > 1
and desc_order > 1
查询结果
+---------+------------+-----------+------------+-------------+
| emp_id | depart_id | salary | asc_order | desc_order |
+---------+------------+-----------+------------+-------------+
| 1003 | 1 | 20000.00 | 6 | 2 |
| 1007 | 1 | 11000.00 | 5 | 3 |
| 1002 | 1 | 10000.00 | 4 | 4 |
| 1006 | 1 | 10000.00 | 3 | 5 |
| 1005 | 1 | 6000.00 | 2 | 6 |
| 1010 | 2 | 9000.00 | 3 | 2 |
| 1009 | 2 | 7000.00 | 2 | 3 |
+---------+------------+-----------+------------+-------------+
3.查询部门平均薪水
执行SQL
select depart_id,
avg(salary) as avg_salary
from (select emp_id,
depart_id,
salary,
row_number() over (partition by depart_id order by salary asc) as asc_order,
row_number() over (partition by depart_id order by salary desc) as desc_order
from t3_salary) t
where asc_order > 1
and desc_order > 1
group by depart_id
查询结果
+------------+---------------+
| depart_id | avg_salary |
+------------+---------------+
| 1 | 11400.000000 |
| 2 | 8000.000000 |
+------------+---------------+
引伸:
1.题目中每个部门员工>=3人,则去除最高最低之后至少还有1人,如果部门人员为2人或者1人该如何处理?
2.如果最高和最低薪资不止一人,该如何处理?(row_number、rank究竟用哪个)
四、建表语句和数据插入
--建表语句
CREATE TABLE t3_salary (
emp_id bigint,
depart_id bigint,
salary decimal(16,2)
);
--插入数据
insert into t3_salary (emp_id,depart_id,salary)
values
(1001,1,5000.00),
(1002,1,10000.00),
(1003,1,20000.00),
(1004,1,30000.00),
(1005,1,6000.00),
(1006,1,10000.00),
(1007,1,11000.00),
(1008,2,3000.00),
(1009,2,7000.00),
(1010,2,9000.00),
(1011,2,30000.00);
本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab)“发表;