数据库写sql面试题

第一题

表结构:uid,subject_id,score。求:找出所有科目成绩都大于某一学科平均成绩的学生

1
2
3
4
5
6
7
8
9
10
--数据集
1001 01 90
1001 02 90
1001 03 90
1002 01 85
1002 02 85
1002 03 70
1003 01 70
1003 02 70
1003 03 85

解题思路:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
(1)求出每个学科平均成绩
select uid,
score,
avg(score) over(partition by subject_id) AS avg_score from
score;t1

这里比较关注的是OVER 窗口函数:
AVG(score) OVER (PARTITION BY subject_id): 这是窗口函数的一部分,表示按 subject_id 分组计算每组的平均分数。PARTITION BY subject_id: 将数据分成多个分区,每个分区对应一个 subject_id。AVG(score): 对每个分区内的 score 列计算平均值。

2)根据是否大于平均成绩记录 flag,大于则记为 0 否则记为 1
select uid,
if(score>avg_score,0,1) AS flag from
t1;t2

3)根据学生 id 进行分组统计 flag 的和,和为 0 则是所有学科都大于平均成绩
select uid
from
t2 group by
uid having
sum(flag)=0;

4)最终SQL
select uid
from (select
uid,
if(score>avg_score,0,1) flag from
(select uid,
score,
avg(score) over(partition by subject_id) avg_score from
score)t1)t2
group by
uid having
sum(flag)=0;

第二题

有 50W 个京东店铺,每个顾客访问任何一个店铺的任何一个商品时都会产生一条访问日志,访问日志存储的表名为 Visit,访客的用户 id 为 user_id,被访问的店铺名称为shop,请统计:每个店铺访问次数 top3 的访客信息。输出店铺名称、访客 id、访问次数。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
--数据集
u1 a
u2 b
u1 b
u1 a
u3 c
u4 b
u1 a
u2 c
u5 b
u4 b
u6 c
u2 c
u1 b
u2 a
u2 a
u3 a
u5 a
u5 a
u5 a

解题思路:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
(1)查询每个店铺被每个用户访问次数
select shop,user_id,count(*) ct
from visit
group by shop,user_id;t1

(2)计算每个店铺被用户访问次数排名
select shop,user_id,ct,rank() over(partition by shop order by ct) rk
from t1;t2
这里也要关注一下rank()函数,RANK() OVER (PARTITION BY shop ORDER BY ct): 这是窗口函数的一部分,表示按 shop 分组计算每组内的时间戳排名。
PARTITION BY shop: 将数据分成多个分区,每个分区对应一个 shop。
ORDER BY ct: 在每个分区内按上面的count(*)排序。
RANK(): 计算每个分区内按时间戳排序后的排名。

(3)取每个店铺排名前 3
select shop,user_id,ct
from t2
where rk<=3;

(4)最终SQL
select shop,
user_id,
ct from
(select
shop,
user_id,
ct,
rank() over(partition by shop order by ct) rk
from (select
shop,
user_id,
count(*) ct
from visit
group by
shop,
user_id)t1
)t2
where rk<=3;

第三题

请用 sql 写出所有用户中在今年 10 月份第一次购买商品的金额,表 ordertable 字段(购买用户:userid,金额:money,购买时间:paymenttime(格式:2017-10-01),订单 id:orderid)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
(1)查询出用户最小使用时间
select
userid,
min(paymenttime) paymenttime
from
ordertable
where
date_format(paymenttime,'yyyy-MM')='2017-10'
group by
userid;t1
(2) 拿到用户金额
select
t1.userid,
t1.paymenttime,
od.money
from t1
join
ordertable od
on
t1.userid=od.userid
and
t1.paymenttime=od.paymenttime;
3)最终SQL
select
t1.userid,
t1.paymenttime,
od.money
from (select
userid,
min(paymenttime) paymenttime
from
ordertable
where
date_format(paymenttime,'yyyy-MM')='2017-10'
group by
userid)t1
join
ordertable od
on
t1.userid=od.userid
and
t1.paymenttime=od.paymenttime;

第四题

有日志如下,请写出代码求得所有用户和活跃用户的总数及平均年龄。(活跃用户指连续两天都有访问记录的用户)

1
2
3
4
5
6
7
8
9
10
11
12
--数据集
日期dt 用户user_id 年龄age
2019-02-11,test_1,23
2019-02-11,test_2,19
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-02-12,test_2,19
2019-02-13,test_1,23
2019-02-15,test_2,19
2019-02-16,test_2,19

解题思路:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
(1)按照日期以及用户分组,按照日期排序并给出排名
select dt,
user_id,
min(age) age,
rank() over(partition by user_id order by dt) rk
from
user_age
group by
dt,user_id;t1

(2)计算日期及排名的差值
select
user_id,
age,
date_sub(dt,rk) flag
from
t1;t2
这里注意一下:DATE_SUB(dt, INTERVAL rk DAY) AS flag: 使用 DATE_SUB 函数从日期 dt 中减去 rk 天数,生成新的字段 flag。

(3)过滤出差值大于等于 2 的,即为连续两天活跃的用户
select
user_id,
min(age) age
from
t2 group by
user_id,flag
having
count(*)>=2;t3

(4)对数据进行去重处理(一个用户可以在两个不同的时间点连续登录),例如:a 用户在 110111号以及120号和1214天登录。
select
user_id,
min(age) age
from
t3 group by
user_id;t4

(5)计算活跃用户(两天连续有访问)的人数以及平均年龄
select
count(*) ct,
cast(sum(age)/count(*) as decimal(10,2))
from t4;

(6) 对全量数据集进行按照用户去重
select
user_id,
min(age) age
from
user_age
group by
user_id;t5

(7)计算所有用户的数量以及平均年龄
select
count(*) user_count,
cast((sum(age)/count(*)) as decimal(10,1))
from t5;

(8) 将第 5 步以及第 7 步两个数据集进行 union all 操作
select
0 user_total_count,
0 user_total_avg_age,
count(*) twice_count,
cast(sum(age)/count(*)
twice_count_avg_age
from
(
select
user_id,
min(age) age
as decimal(10,2))
from (select
user_id,
min(age) age
from
( select
user_id,
age,
date_sub(dt,rk) flag
from (
select dt,
user_id,
min(age) age,
rank() over(partition by user_id order by dt) rk
from
user_age
group by
dt,user_id
)t1
)t2 group by
user_id,flag
having
count(*)>=2)t3
group by
user_id
)t4
union all
select
count(*) user_total_count,
cast((sum(age)/count(*)) as decimal(10,1)),
0 twice_count,
0 twice_count_avg_age
from (
user_id
)t5;t6

(9) 求和并拼接为最终SQL
select
sum(user_total_count),
sum(user_total_avg_age),
sum(twice_count),
sum(twice_count_avg_age)
from (select
0 user_total_count,
0 user_total_avg_age,
count(*) twice_count, cast(sum(age)/count(*) as
twice_count_avg_age
from
(
select
user_id,
min(age) age
from (select
user_id,
min(age) age
from
( select
user_id,
age,
date_sub(dt,rk) flag
decimal(10,2))
from (
select dt,
user_id,
min(age) age,
rank() over(partition by user_id order by dt) rk
from
user_age
group by
dt,user_id
)t1
)t2 group by
user_id,flag
having
count(*)>=2)t3
group by
user_id )t4
union all
select
count(*) user_total_count,
cast((sum(age)/count(*)) as decimal(10,1)),
0 twice_count,
0 twice_count_avg_age
from (
user_id
)t5)t6;

以上只是个人的写法,如果有更好的或者实际情况中有更符合现实的可以更加优化,让我们一起追求数据库查询的艺术。

感谢您请我喝咖啡~O(∩_∩)O,如果要联系请直接发我邮箱chenx1242@163.com,我会回复你的
-------------本文结束感谢您的阅读-------------