Linux运维工程师笔试题第二十一套

第一题

表结构:uid,subject_id,score。求:找出所有科目成绩都大于某一学科平均成绩的学生

解题思路:

(1）求出每个学科平均成绩
    select uid,
    score,
    avg(score) over(partition by subject_id) AS avg_score from
    score;t1

    这里比较关注的是OVER 窗口函数：
	AVG(score) OVER (PARTITION BY subject_id): 这是窗口函数的一部分，表示按 subject_id 分组计算每组的平均分数。PARTITION BY subject_id: 将数据分成多个分区，每个分区对应一个 subject_id。AVG(score): 对每个分区内的 score 列计算平均值。

（2）根据是否大于平均成绩记录 flag，大于则记为 0 否则记为 1
    select uid,
    if(score>avg_score,0,1) AS flag from
    t1;t2

（3）根据学生 id 进行分组统计 flag 的和，和为 0 则是所有学科都大于平均成绩
    select uid
    from
    t2 group by
    uid having
       sum(flag)=0;

（4）最终SQL
    select uid
    from (select
    uid,
    if(score>avg_score,0,1) flag from
    (select uid,
    score,
    avg(score) over(partition by subject_id) avg_score from
       score)t1)t2
    group by
    uid having
       sum(flag)=0;

第二题

有 50W 个京东店铺，每个顾客访问任何一个店铺的任何一个商品时都会产生一条访问日志，访问日志存储的表名为 Visit，访客的用户 id 为 user_id，被访问的店铺名称为shop，请统计:每个店铺访问次数 top3 的访客信息。输出店铺名称、访客 id、访问次数。

--数据集
u1 a
u2 b
u1 b
u1 a
u3 c
u4 b
u1 a
u2 c
u5 b
u4 b
u6 c
u2 c
u1 b
u2 a
u2 a
u3 a
u5 a
u5 a
u5 a

解题思路:

(1)查询每个店铺被每个用户访问次数
      select shop,user_id,count(*) ct
      from visit
      group by shop,user_id;t1

(2)计算每个店铺被用户访问次数排名
      select shop,user_id,ct,rank() over(partition by shop order by ct) rk
      from t1;t2
      这里也要关注一下rank（)函数，RANK() OVER (PARTITION BY shop ORDER BY ct): 这是窗口函数的一部分，表示按 shop 分组计算每组内的时间戳排名。
			PARTITION BY shop: 将数据分成多个分区，每个分区对应一个 shop。
			ORDER BY ct: 在每个分区内按上面的count(*)排序。
			RANK(): 计算每个分区内按时间戳排序后的排名。

(3)取每个店铺排名前 3 的
      select shop,user_id,ct
      from t2
      where rk<=3;

(4)最终SQL
      select shop,
      user_id,
      ct from
         (select
             shop,
             user_id,
             ct,
             rank() over(partition by shop order by ct) rk
      from (select
                shop,
                user_id,
                count(*) ct
             from visit
             group by
      shop,
                user_id)t1
         )t2
      where rk<=3;

第三题

请用 sql 写出所有用户中在今年 10 月份第一次购买商品的金额，表 ordertable 字段(购买用户:userid，金额:money，购买时间:paymenttime(格式:2017-10-01)，订单 id:orderid)

(1)查询出用户最小使用时间
    select
      userid,
      min(paymenttime) paymenttime
    from
    ordertable
    where
      date_format(paymenttime,'yyyy-MM')='2017-10'
    group by
      userid;t1
(2) 拿到用户金额
    select
      t1.userid,
      t1.paymenttime,
      od.money
    from t1
    join
      ordertable od
    on
    t1.userid=od.userid
    and
    t1.paymenttime=od.paymenttime;
（3）最终SQL
    select
      t1.userid,
      t1.paymenttime,
      od.money
    from (select
    userid,
      min(paymenttime) paymenttime
    from
      ordertable
    where
      date_format(paymenttime,'yyyy-MM')='2017-10'
    group by
      userid)t1
    join
      ordertable od
    on
      t1.userid=od.userid
      and
      t1.paymenttime=od.paymenttime;

第四题

有日志如下，请写出代码求得所有用户和活跃用户的总数及平均年龄。(活跃用户指连续两天都有访问记录的用户)

--数据集
日期dt   用户user_id 年龄age
2019-02-11,test_1,23
2019-02-11,test_2,19
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-02-12,test_2,19
2019-02-13,test_1,23
2019-02-15,test_2,19
2019-02-16,test_2,19

解题思路:

(1)按照日期以及用户分组，按照日期排序并给出排名
    select dt,
      user_id,
      min(age) age,
      rank() over(partition by user_id order by dt) rk
    from
      user_age
    group by
      dt,user_id;t1

(2)计算日期及排名的差值
    select
      user_id,
    age,
      date_sub(dt,rk) flag
    from
    t1;t2
    这里注意一下：DATE_SUB(dt, INTERVAL rk DAY) AS flag: 使用 DATE_SUB 函数从日期 dt 中减去 rk 天数，生成新的字段 flag。

(3)过滤出差值大于等于 2 的，即为连续两天活跃的用户
    select
    user_id,
      min(age) age
    from
    t2 group by
      user_id,flag
    having
      count(*)>=2;t3

(4)对数据进行去重处理(一个用户可以在两个不同的时间点连续登录)，例如:a 用户在 1 月10号1月11号以及1月20号和1月21号4天登录。
    select
      user_id,
      min(age) age
    from
    t3 group by
    user_id;t4

(5)计算活跃用户(两天连续有访问)的人数以及平均年龄
    select
      count(*) ct,
      cast(sum(age)/count(*) as decimal(10,2))
    from t4;

(6) 对全量数据集进行按照用户去重
    select
      user_id,
    min(age) age
    from
      user_age
    group by
      user_id;t5

(7)计算所有用户的数量以及平均年龄
    select
      count(*) user_count,
      cast((sum(age)/count(*)) as decimal(10,1))
    from t5;

(8) 将第 5 步以及第 7 步两个数据集进行 union all 操作
    select
      0 user_total_count,
      0 user_total_avg_age,
      count(*) twice_count,
      cast(sum(age)/count(*)
    twice_count_avg_age
    from
    (
      select
      user_id,
      min(age) age
    as decimal(10,2))
    from (select
    user_id,
      min(age) age
    from
    ( select
      user_id,
      age,
      date_sub(dt,rk) flag
    from (
    select dt,
          user_id,
          min(age) age,
          rank() over(partition by user_id order by dt) rk
       from
          user_age
       group by
          dt,user_id
    )t1
    )t2 group by
      user_id,flag
    having
      count(*)>=2)t3
    group by
      user_id
    )t4
    union all
    select
      count(*) user_total_count,
    cast((sum(age)/count(*)) as decimal(10,1)),
      0 twice_count,
      0 twice_count_avg_age
    from (
           user_id
      )t5;t6

(9) 求和并拼接为最终SQL
    select
       sum(user_total_count),
       sum(user_total_avg_age),
       sum(twice_count),
       sum(twice_count_avg_age)
    from (select
    0 user_total_count,
    0 user_total_avg_age,
    count(*) twice_count, cast(sum(age)/count(*) as
    twice_count_avg_age
    from
    (
      select
      user_id,
      min(age) age
    from (select
    user_id,
      min(age) age
    from
    ( select
      user_id,
      age,
      date_sub(dt,rk) flag
    decimal(10,2))
    from (
    select dt,
          user_id,
          min(age) age,
          rank() over(partition by user_id order by dt) rk
    from
    user_age
       group by
          dt,user_id
       )t1
    )t2 group by
      user_id,flag
    having
      count(*)>=2)t3
    group by
    user_id )t4
    union all
    select
      count(*) user_total_count,
      cast((sum(age)/count(*)) as decimal(10,1)),
      0 twice_count,
      0 twice_count_avg_age
    from (
           user_id
      )t5)t6;

以上只是个人的写法，如果有更好的或者实际情况中有更符合现实的可以更加优化，让我们一起追求数据库查询的艺术。