面试真题
三一重工
1.部门人员数据分析

三一重工大数据面试SQL-部门人员数据分析

一、题目

现有一张员工在职所在部门信息表,包含员工ID、所属部门、开始日期、结束日期,请查询出如下内容

1.2024年1月31日A部门在职员工数;

2.2024年1月份A部门员工最多时有多少员工;

3.2024年1月份A部门平均有多少员工;

+--------------+-------------+-------------+-------------+
| employee_id  | department  | start_date  |  end_date   |
+--------------+-------------+-------------+-------------+
| 1            | A           | 2023-12-20  | 2024-01-22  |
| 2            | A           | 2024-01-02  | 2024-01-11  |
| 2            | B           | 2024-01-11  | 2024-01-25  |
| 2            | A           | 2024-01-25  | 9999-01-01  |
| 3            | A           | 2023-12-20  | 9999-01-01  |
| 4            | A           | 2024-02-02  | 9999-01-01  |
| 5            | A           | 2023-06-20  | 2023-12-22  |
+--------------+-------------+-------------+-------------+

二、分析

  1. 题目本身是一个拉链表的结构,可以认为是一个左闭又开的数据。即开始日期算做在部门内,离开日期不算在部门日期;
  2. 第1问:查询时点数据,我们可以根据记录中的开始日期和结束日期与时间判断,如果时点在区间内,则代表用户在该部门;
  3. 第2问:2024年1月份A部门员工最多时有多少员工,调整表结构为进入离开部门的记录表,并进行计数,进入部门+1,离开部门-1,然后对所有行为进行累积求和,取出出现在1月份的最大值即可;
  4. 第3问:2024年1月份A部门平均有多少员工,存在两种计算方式:1.计算出A部门1月份每天员工数,然后进行求和。2.计算出A部门在1月份的总人 * 天 然后除以1月份天数(31天)。其中计算总人 * 天数可以使用第2问的过程数据,即每个人数状态 * 持续天数来计算;
维度评分
题目难度⭐️⭐️⭐️⭐️⭐️
题目清晰度⭐️⭐️⭐️⭐️⭐️
业务常见度⭐️⭐️⭐️⭐️⭐️

三、SQL

1.2024年1月31日A部门在职员工数

通过判断2024-01-31>=start_date并且2024-01-31 < end_date 证明在部门内。sql如下

select count(1) as a_depart_num
from t1_employee_department_info
where department = 'A'
and start_date <= '2024-01-31'
and end_date >'2024-01-31';

查询结果

+---------------+
| a_depart_num  |
+---------------+
| 2             |
+---------------+

2.2024年1月份A部门员工最多时有多少员工;

2.1 生成员工进入离开部门表

生成员工加入离开部门表,表内包含employee_id, department, enter_or_leave,action_date。其中enter_type 1代表进入,-1代表离开,进入时间用start_date,离开时间用end_date。通过查询2遍员工部门表,并通过union all 来整合到一起。

--加入部门记录
select
  employee_id as employee_id,
  department as department,
  1 as enter_or_leave,
  start_date as action_date
from t1_employee_department_info
where department = 'A'
union all
-- 离开部门记录
select
  employee_id,
  department,
  -1 as enter_or_leave,
  end_date as action_date
from t1_employee_department_info
where department = 'A'

查询结果

+--------------+-------------+-----------------+--------------+
| employee_id  | department  | enter_or_leave  | action_date  |
+--------------+-------------+-----------------+--------------+
| 1            | A           | 1               | 2023-12-20   |
| 2            | A           | 1               | 2024-01-02   |
| 2            | A           | 1               | 2024-01-25   |
| 3            | A           | 1               | 2023-12-20   |
| 4            | A           | 1               | 2024-02-02   |
| 5            | A           | 1               | 2023-06-20   |
| 1            | A           | -1              | 2024-01-22   |
| 2            | A           | -1              | 2024-01-11   |
| 2            | A           | -1              | 9999-01-01   |
| 3            | A           | -1              | 9999-01-01   |
| 4            | A           | -1              | 9999-01-01   |
| 5            | A           | -1              | 2023-12-22   |
+--------------+-------------+-----------------+--------------+

2.2 添加一条无状态数据,保证1月份有记录,增加两条A部门月初月末无人员变动记录

因为后面使用数据累积,也就是只有在数据变化的时候的才有记录,所以我们增加两条无人员变动记录,employee_id = 0 ,enter_or_leave = 0 代表该用户既不是进入,也不是离开。 时间分别是月初和月末。这样即能保证1月份肯定有数据,也能保证有1月份的初始和结束状态。

--加入部门记录
select
  employee_id,
  department,
  1 as enter_or_leave,
  start_date as action_date
from t1_employee_department_info
where department = 'A'
union all
-- 离开部门记录
select
  employee_id,
  department,
  -1 as enter_or_leave,
  end_date as action_date
from t1_employee_department_info
where department = 'A'
--月初记录
union all 
select 
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-01' as action_date
union all 
--月末记录
select 
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-31' as action_date

** 查询结果**

+--------------+-------------+-----------------+--------------+
| employee_id  | department  | enter_or_leave  | action_date  |
+--------------+-------------+-----------------+--------------+
| 1            | A           | 1               | 2023-12-20   |
| 2            | A           | 1               | 2024-01-02   |
| 2            | A           | 1               | 2024-01-25   |
| 3            | A           | 1               | 2023-12-20   |
| 4            | A           | 1               | 2024-02-02   |
| 5            | A           | 1               | 2023-06-20   |
| 1            | A           | -1              | 2024-01-22   |
| 2            | A           | -1              | 2024-01-11   |
| 2            | A           | -1              | 9999-01-01   |
| 3            | A           | -1              | 9999-01-01   |
| 4            | A           | -1              | 9999-01-01   |
| 5            | A           | -1              | 2023-12-22   |
| 0            | A           | 0               | 2024-01-01   |
| 0            | A           | 0               | 2024-01-31   |
+--------------+-------------+-----------------+--------------+

2.3 使用累加方式计算每次变动之后A部门的人数

3)使用sum()over(order by **) 的方式,对A部门的每次人数变化进行累积求和

with t as (
--加入部门记录
select
  employee_id,
  department,
  1 as enter_or_leave,
  start_date as action_date
from t1_employee_department_info
where department = 'A'
union all
-- 离开部门记录
select
  employee_id,
  department,
  -1 as enter_or_leave,
  end_date as action_date
from t1_employee_department_info
where department = 'A'
--月初记录
union all 
select 
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-01' as action_date
union all 
--月末记录
select 
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-31' as action_date
)
select 
employee_id,
department,
enter_or_leave,
action_date,
sum(enter_or_leave)over(order by action_date asc) as depart_emp_cnt
from t

查询结果

+--------------+-------------+-----------------+--------------+-----------------+
| employee_id  | department  | enter_or_leave  | action_date  | depart_emp_cnt  |
+--------------+-------------+-----------------+--------------+-----------------+
| 5            | A           | 1               | 2023-06-20   | 1               |
| 1            | A           | 1               | 2023-12-20   | 3               |
| 3            | A           | 1               | 2023-12-20   | 3               |
| 5            | A           | -1              | 2023-12-22   | 2               |
| 0            | A           | 0               | 2024-01-01   | 2               |
| 2            | A           | 1               | 2024-01-02   | 3               |
| 2            | A           | -1              | 2024-01-11   | 2               |
| 1            | A           | -1              | 2024-01-22   | 1               |
| 2            | A           | 1               | 2024-01-25   | 2               |
| 0            | A           | 0               | 2024-01-31   | 2               |
| 4            | A           | 1               | 2024-02-02   | 3               |
| 2            | A           | -1              | 9999-01-01   | 0               |
| 3            | A           | -1              | 9999-01-01   | 0               |
| 4            | A           | -1              | 9999-01-01   | 0               |
+--------------+-------------+-----------------+--------------+-----------------+

2.4 时间段限定在1月份,对depart_emp_cnt 求最大值。

with t as (
--加入部门记录
    select employee_id,
           department,
           1          as enter_or_leave,
           start_date as action_date
    from t1_employee_department_info
    where department = 'A'
    union all
-- 离开部门记录
    select employee_id,
           department,
           -1       as enter_or_leave,
           end_date as action_date
    from t1_employee_department_info
    where department = 'A'
--月初记录
    union all
    select 0            as employee_id,
           'A'          as department,
           0            as enter_or_leave,
           '2024-01-01' as action_date
    union all
--月末记录
    select 0            as employee_id,
           'A'          as department,
           0            as enter_or_leave,
           '2024-01-31' as action_date
)
select max(depart_emp_cnt) as max_emp_cnt
from (
select
employee_id,
department,
enter_or_leave,
action_date,
sum(enter_or_leave)over(order by action_date asc) as depart_emp_cnt
from t) tt
where action_date >= '2024-01-01'
and action_date <= '2024-01-31'

查询结果

+--------------+
| max_emp_cnt  |
+--------------+
| 3            |
+--------------+

3.2024年1月份A部门平均有多少员工;

3.1 求每个阶段员工人数持续天数

求平均有多少员工,我们可以根据2.3的结果进行计算,查看每个阶段(两次员工变化之间的日期为同一个阶段)的人数和持续天数。然后相乘,再求和得出最终1月份在职员工的人*天,然后除以1月份天数得出平均在职人数。使用lead函数,计算出下一次变动日期,然后使用下一次变化日期-当前日期即为当前状态持续时间。这里因为在月末的时候,人数持续一天,所以我们需要在原始记录中增加2024-02-01一条无人员变动记录。

with t as (
--加入部门记录
select
  employee_id,
  department,
  1 as enter_or_leave,
  start_date as action_date
from t1_employee_department_info
where department = 'A'
union all
-- 离开部门记录
select
  employee_id,
  department,
  -1 as enter_or_leave,
  end_date as action_date
from t1_employee_department_info
where department = 'A'
--月初记录
union all
select
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-01' as action_date
union all
--月末记录
select
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-31' as action_date
union all
--2月初记录
select
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-02-01' as action_date
),
t2 as(
select
employee_id,
department,
enter_or_leave,
action_date,
sum(enter_or_leave) over( order by action_date asc) as depart_emp_cnt
from t)
select
employee_id,
department,
enter_or_leave,
action_date,
depart_emp_cnt,
datediff(lead(action_date)over(order by action_date asc),action_date) as keep_days
from t2

查询结果

+--------------+-------------+-----------------+--------------+-----------------+------------+
| employee_id  | department  | enter_or_leave  | action_date  | depart_emp_cnt  | keep_days  |
+--------------+-------------+-----------------+--------------+-----------------+------------+
| 5            | A           | 1               | 2023-06-20   | 1               | 183        |
| 1            | A           | 1               | 2023-12-20   | 3               | 0          |
| 3            | A           | 1               | 2023-12-20   | 3               | 2          |
| 5            | A           | -1              | 2023-12-22   | 2               | 10         |
| 0            | A           | 0               | 2024-01-01   | 2               | 1          |
| 2            | A           | 1               | 2024-01-02   | 3               | 9          |
| 2            | A           | -1              | 2024-01-11   | 2               | 11         |
| 1            | A           | -1              | 2024-01-22   | 1               | 3          |
| 2            | A           | 1               | 2024-01-25   | 2               | 6          |
| 0            | A           | 0               | 2024-01-31   | 2               | 1          |
| 0            | A           | 0               | 2024-02-01   | 2               | 1          |
| 4            | A           | 1               | 2024-02-02   | 3               | 2912777    |
| 2            | A           | -1              | 9999-01-01   | 0               | 0          |
| 3            | A           | -1              | 9999-01-01   | 0               | 0          |
| 4            | A           | -1              | 9999-01-01   | 0               | NULL       |
+--------------+-------------+-----------------+--------------+-----------------+------------+

3.2计算最终结果

查询1月份的日期数据,然后对depart_emp_cnt* last_date 求和,再除以31 即为1月份的平均在职人数

with t as (
--加入部门记录
select
  employee_id,
  department,
  1 as enter_or_leave,
  start_date as action_date
from t1_employee_department_info
where department = 'A'
union all
-- 离开部门记录
select
  employee_id,
  department,
  -1 as enter_or_leave,
  end_date as action_date
from t1_employee_department_info
where department = 'A'
--月初记录
union all
select
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-01' as action_date
union all
--月末记录
select
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-01-31' as action_date
union all
--2月初记录
select
0 as employee_id,
'A' as department,
0 as enter_or_leave,
'2024-02-01' as action_date
),
t2 as(
select
employee_id,
department,
enter_or_leave,
action_date,
sum(enter_or_leave) over( order by action_date asc) as depart_emp_cnt
from t),
t3 as (
select
employee_id,
department,
enter_or_leave,
action_date,
depart_emp_cnt,
datediff(lead(action_date)over(order by action_date asc),action_date) as keep_days
from t2)
select
sum(depart_emp_cnt* keep_days)/31 as avg_emp_cnt
from t3
where action_date >='2024-01-01' and action_date <= '2024-01-31'

查询结果

+--------------------+
|    avg_emp_cnt     |
+--------------------+
| 2.193548387096774  |
+--------------------+

四、建表语句和数据插入

CREATE TABLE IF NOT EXISTS t1_employee_department_info (
    employee_id INT, -- 员工ID
    department STRING, -- 所属部门
    start_date STRING, -- 开始日期
    end_date STRING -- 结束日期
);
 
insert into t1_employee_department_info(employee_id, department, start_date,end_date) values 
(1, 'A', '2023-12-20','2024-01-22'),
(2, 'A', '2024-01-02','2024-01-11'),
(2, 'B', '2024-01-11','2024-01-25'),
(2, 'A', '2024-01-25','9999-01-01'),
(3, 'A', '2023-12-20','9999-01-01'),
(4, 'A', '2024-02-02','9999-01-01'),
(5, 'A', '2023-06-20','2023-12-22');

本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab)“发表;