Skip to main content

字节跳动大数据面试SQL-共同使用ip用户检测问题

一、题目

现有用户登录日志表,记录了每个用户登录的IP地址,请查询共同使用过3个及以上IP的用户对;

+----------+-----------------+----------------------+
| user_id | ip | time_stamp |
+----------+-----------------+----------------------+
| 2 | 223.104.41.101 | 2023-08-24 07:00:00 |
| 4 | 223.104.41.122 | 2023-08-24 10:00:00 |
| 5 | 223.104.41.126 | 2023-08-24 11:00:00 |
| 4 | 223.104.41.126 | 2023-08-24 13:00:00 |
| 1 | 223.104.41.101 | 2023-08-24 16:00:00 |
| 3 | 223.104.41.101 | 2023-08-24 16:02:00 |
| 2 | 223.104.41.104 | 2023-08-24 16:30:00 |
| 1 | 223.104.41.121 | 2023-08-24 17:00:00 |
| 2 | 223.104.41.122 | 2023-08-24 17:05:00 |
| 3 | 223.104.41.103 | 2023-08-24 18:11:00 |
| 2 | 223.104.41.103 | 2023-08-24 19:00:00 |
| 1 | 223.104.41.104 | 2023-08-24 19:00:00 |
| 3 | 223.104.41.122 | 2023-08-24 19:07:00 |
| 1 | 223.104.41.122 | 2023-08-24 21:00:00 |
+----------+-----------------+----------------------+

二、分析

  1. 题目给出的数据是登录记录,需要使用IP进行关联,找到使用相同IP的记录;
  2. 因为要使用ip进行关联,首先保证每个用户同一个IP只有一条记录,否则关联会导致结果数据重复;
  3. 自关联,会导致使用相同IP的用户,出现1-2,2-1两条记录、1-1,2-2自己的记录,这些记录需要去重和剔除;
  4. 计算共同使用过的IP数量,得出结果;
维度评分
题目难度⭐️⭐️⭐️
题目清晰度⭐️⭐️⭐️⭐️⭐️
业务常见度⭐️⭐️⭐️⭐️

三、SQL

1.将所有用户登录记录按照用户ID和登录IP去重

执行SQL

select user_id,
ip
from t6_login_log
group by user_id, ip

查询结果

+----------+-----------------+
| user_id | ip |
+----------+-----------------+
| 1 | 223.104.41.101 |
| 1 | 223.104.41.104 |
| 1 | 223.104.41.121 |
| 1 | 223.104.41.122 |
| 2 | 223.104.41.101 |
| 2 | 223.104.41.103 |
| 2 | 223.104.41.104 |
| 2 | 223.104.41.122 |
| 3 | 223.104.41.101 |
| 3 | 223.104.41.103 |
| 3 | 223.104.41.122 |
| 4 | 223.104.41.122 |
| 4 | 223.104.41.126 |
| 5 | 223.104.41.126 |
+----------+-----------------+

2.通过IP地址进行自关联,去重,剔除相同用户。

执行SQL

with tmp as
(select user_id,
ip
from t6_login_log
group by user_id, ip)
select t1.user_id,
t2.user_id,
t1.ip
from tmp as t1
join
tmp as t2
on t1.ip = t2.ip
where t1.user_id < t2.user_id

查询结果

+-------------+-------------+-----------------+
| t1.user_id | t2.user_id | t1.ip |
+-------------+-------------+-----------------+
| 1 | 2 | 223.104.41.101 |
| 1 | 3 | 223.104.41.101 |
| 2 | 3 | 223.104.41.101 |
| 2 | 3 | 223.104.41.103 |
| 1 | 2 | 223.104.41.104 |
| 1 | 2 | 223.104.41.122 |
| 1 | 3 | 223.104.41.122 |
| 1 | 4 | 223.104.41.122 |
| 2 | 3 | 223.104.41.122 |
| 2 | 4 | 223.104.41.122 |
| 3 | 4 | 223.104.41.122 |
| 4 | 5 | 223.104.41.126 |
+-------------+-------------+-----------------+

3.根据用户组计算使用共同IP的个数

执行SQL

with tmp as
(select user_id,
ip
from t6_login_log
group by user_id, ip)
select t1.user_id,
t2.user_id,
count(t1.ip)
from tmp as t1
join
tmp as t2
on t1.ip = t2.ip
where t1.user_id < t2.user_id
group by t1.user_id,
t2.user_id

查询结果

+-------------+-------------+------+
| t1.user_id | t2.user_id | _c2 |
+-------------+-------------+------+
| 1 | 2 | 3 |
| 1 | 3 | 2 |
| 1 | 4 | 1 |
| 2 | 3 | 3 |
| 2 | 4 | 1 |
| 3 | 4 | 1 |
| 4 | 5 | 1 |
+-------------+-------------+------+

4.查询共同使用过3个以上IP的用户对

执行SQL

with tmp as
(select user_id,
ip
from t6_login_log
group by user_id, ip)
select t1.user_id,
t2.user_id
from tmp as t1
join
tmp as t2
on t1.ip = t2.ip
where t1.user_id < t2.user_id
group by t1.user_id,
t2.user_id
having count(t1.ip) >= 3

查询结果

+-------------+-------------+
| t1.user_id | t2.user_id |
+-------------+-------------+
| 1 | 2 |
| 2 | 3 |
+-------------+-------------+

四、建表语句和数据插入

--建表语句
CREATE TABLE t6_login_log (
user_id bigint COMMENT '用户ID',
ip string COMMENT '用户登录ip地址',
time_stamp string COMMENT '登录时间'
) COMMENT '用户登录记录表';
-- 插入数据
insert into t6_login_log(user_id,ip,time_stamp)
values
(1,'223.104.41.101','2023-08-24 16:00:00'),
(1,'223.104.41.121','2023-08-24 17:00:00'),
(1,'223.104.41.104','2023-08-24 19:00:00'),
(1,'223.104.41.122','2023-08-24 21:00:00'),
(1,'223.104.41.122','2023-08-24 22:00:00'),
(2,'223.104.41.101','2023-08-24 07:00:00'),
(2,'223.104.41.103','2023-08-24 19:00:00'),
(2,'223.104.41.104','2023-08-24 16:30:00'),
(2,'223.104.41.122','2023-08-24 17:05:00'),
(3,'223.104.41.103','2023-08-24 18:11:00'),
(3,'223.104.41.122','2023-08-24 19:07:00'),
(3,'223.104.41.101','2023-08-24 16:02:00'),
(4,'223.104.41.126','2023-08-24 13:00:00'),
(5,'223.104.41.126','2023-08-24 11:00:00'),
(4,'223.104.41.122','2023-08-24 10:00:00');