南大通用GBase 8a OLAP分析函数cume_dist的使用样例

GBase 8a支持cume_dist 函数,用于计算小于等于或者大于等于(根据order的顺序)该值的百分比。

语法

cume_disk() over([partition by ] order by [desc])

说明

其中partition是否开窗,否则全部数值统一处理。

order by 的顺序,ASC(默认) = 小于等于,DESC =大于等于

样例

小于等于

如下例子是默认的升序,第一行表示:小于等于数值1的行数比例为20%

gbase> select id,cume_dist()over(order by id) cr from t2;
+------+-----+
| id   | cr  |
+------+-----+
|    1 | 0.2 |
|    2 | 0.4 |
|    3 | 0.6 |
|    4 | 0.8 |
|    5 |   1 |
+------+-----+
5 rows in set (Elapsed: 00:00:00.01)

大于等于

排序为desc,第一行表示:大于等于数值5的行数比例为20%。

gbase> select id,cume_dist()over(order by id desc) cr from t2;
+------+-----+
| id   | cr  |
+------+-----+
|    5 | 0.2 |
|    4 | 0.4 |
|    3 | 0.6 |
|    2 | 0.8 |
|    1 |   1 |
+------+-----+
5 rows in set (Elapsed: 00:00:00.02)

带开窗partition

每个partition内不分别计算百分比。

gbase> select * from t4;
+------+------+
| id   | type |
+------+------+
|    1 | A    |
|    2 | A    |
|    3 | A    |
|    1 | B    |
|    2 | B    |
|    3 | B    |
|    4 | B    |
+------+------+
7 rows in set (Elapsed: 00:00:00.00)

gbase> select type,id,cume_dist()over(partition by type order by id) cr from t4;
+------+------+-------------------+
| type | id   | cr                |
+------+------+-------------------+
| A    |    1 | 0.333333333333333 |
| A    |    2 | 0.666666666666667 |
| A    |    3 |                 1 |
| B    |    1 |              0.25 |
| B    |    2 |               0.5 |
| B    |    3 |              0.75 |
| B    |    4 |                 1 |
+------+------+-------------------+
7 rows in set (Elapsed: 00:00:00.07)

与Percent_rank的对比

percent_rank是计算相对位置,包含起点0,而cume_dist是包含等于的,所以不会出现0。如果数据只有1行,那么percent_rank为

起点0, 而cume_dist为1(100%)。

gbase> select id,cume_dist()over(order by id) cr,percent_rank()over(order by id) pr  from t2;
+------+-----+------+
| id   | cr  | pr   |
+------+-----+------+
|    1 | 0.2 |    0 |
|    2 | 0.4 | 0.25 |
|    3 | 0.6 |  0.5 |
|    4 | 0.8 | 0.75 |
|    5 |   1 |    1 |
+------+-----+------+
5 rows in set (Elapsed: 00:00:00.01)

一行数据

gbase> select id,cume_dist()over(order by id) cr,percent_rank()over(order by id) pr  from t5;
+------+----+----+
| id   | cr | pr |
+------+----+----+
|    1 |  1 |  0 |
+------+----+----+
1 row in set (Elapsed: 00:00:00.02)