ITPUB论坛－专业的IT技术社区

标题: mysql group by 的奇怪地方？是bug 还是它的特性 [打印本页]

作者: carcase 时间: 2009-6-18 10:08
标题: mysql group by 的奇怪地方？是bug 还是它的特性
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7397 to server version: 5.1.31

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use nowhill;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> create table test(a int,b int,c int);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test(1,1,1),(1,1,2),(1,1,3),(1,2,5),(1,2,1),(1,2,6);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '1,1,1),(1,1,2),(1,1,3),(1,2,5),(1,2,1),(1,2,6)' at line 1
mysql> insert into test values(1,1,1),(1,1,2),(1,1,3),(1,2,5),(1,2,1),(1,2,6);
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> select * from test;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 1 | 2 |
| 1 | 1 | 3 |
| 1 | 2 | 5 |
| 1 | 2 | 1 |
| 1 | 2 | 6 |
+------+------+------+
6 rows in set (0.00 sec)

mysql> select a,b,c from test group by a,b;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 5 |
+------+------+------+
2 rows in set (0.00 sec)

mysql> select * from (select a,b,c from test  order by a,b,c) group by a,b;
ERROR 1248 (42000): Every derived table must have its own alias
mysql> select * from (select a,b,c from test  order by a,b,c) b group by a,b;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
+------+------+------+
2 rows in set (0.00 sec)

select * from (select a,b,c from test  order by a,b,c) b group by a,b;
这样的sql 在orace 下是行不通的
但是在mysql 下起到了意向不到的效果

相当得到了 a,b两列分组后的 c的最小值  ，如果排序相反，还可以得到最大值，

如:
mysql> select * from (select a,b,c from test  order by a,b,c desc) b group by a,b;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | 1 | 3 |
| 1 | 2 | 6 |
+------+------+------+
2 rows in set (0.00 sec)

是bug  还是 mysql的特性

刚用mysql 请指教？

作者: carcase 时间: 2009-6-18 10:13
在oracle下测试：

SQL> select a,b,c from (select a,b,c,row_number() over(partition by a,b order by a,b,c desc) rn from test) where rn=1;

      A       B       C
---------- ---------- ----------
      1       1       3
      1       2       6

SQL> select * from test;

      A       B       C
---------- ---------- ----------
      1       1       1
      1       1       2
      1       1       3
      1       2       3
      1       2       6
      1       2       1

6 rows selected

SQL> select a,b,c from (select a,b,c,row_number() over(partition by a,b order by a,b,c desc) rn from test) where rn=1;

      A       B       C
---------- ---------- ----------
      1       1       3
      1       2       6

SQL> select * from (select a,b,c from test  order by a,b,c desc) b group by a,b
  2  ;

select * from (select a,b,c from test  order by a,b,c desc) b group by a,b

ORA-00979: 不是 GROUP BY 表达式

oracle 还需要用到分析函数，mysql  用group by 就实现了部分功能

奇怪哦
请指教
如果不是mysql的bug ，我就用这个功能了

作者: 巫婆_ 时间: 2009-6-18 10:55
看看

作者: carcase 时间: 2009-6-18 11:38
论坛的人气好差哦 mysql用的人这么少吗？？

作者: hjwb1216 时间: 2009-6-18 16:37
首先确认楼主的问题
1.这个不是mysql的bug，这个是mysql对group by功能的扩展。
2.得到的不是你说的最小值和最大值之类的。

对于group by在mysql中的使用和Oracle的差异性很大，准确的说不光和Oracle和别的数据库差异性一样，这些有点不太遵循标准SQL。我们知道常规的sql，对于group by来说一定要结合聚合函数，而且选择的字段除了聚合函数外，还必须在group by中出现，否则报错，但是在mysql中扩展了这样的功能
首先对于不加聚合函数的sql来说，它的功能结合了limit来得出结果，仔细想想的时候有点Oracle分析函数的味道，limit的时候得到的并不是最大最小的值，而是某一下group by结果集中第一行，也就是刚才说的相当与先group by，然后在每个group by下面进行limit 1。
其次，刚才还说了常规的group by结合聚合函数的时候，选择的字段除了聚合函数外，必须在group by中存在，但是在mysql中不是这样了，它具有隐含字段的功能，例如：

(root:im-mysql:16:34:45)[test]> select a,b,c,count(*) from test1 group by a,b;
+------+------+------+----------+
| a | b | c | count(*) |
+------+------+------+----------+
| 1 | 1 | 1 | 3 |
| 1 | 2 | 5 | 3 |
+------+------+------+----------+

对于没有选择的字段，上面是c，c的值还是和上面说到的一样，是根据得到的结果集然后根据每个group by 进行limit 1得到的结果。

作者: jinguanding 时间: 2009-6-20 20:01
标题: 回复 #5 hjwb1216 的帖子
兄弟分析的非常对......LZ感怀疑也是好事情

作者: carcase 时间: 2009-6-22 15:27
谢谢大家解答

其实问这个问题的目的是解决一个问题
现有数据
mysql> select * from test;
+------+------+------+---------------------+
| a | b | c | d                |
+------+------+------+---------------------+
| 1 | 1 | 1 | 2009-06-22 15:04:55 |
| 1 | 1 | 2 | 2009-06-22 15:05:09 |
| 1 | 1 | 3 | 2009-06-22 15:05:16 |
| 1 | 2 | 5 | 2009-06-22 15:05:32 |
| 1 | 2 | 1 | 2009-06-22 15:05:47 |
| 1 | 2 | 6 | 2009-06-22 15:06:06 |
+------+------+------+---------------------+
6 rows in set (0.00 sec)

怎样通过sql语句，进行一次表扫描得到
根据a,b 列 groupby ，取出c的最小值，同时取出c的最小值时的时间
相当于得到这两行数据

  1 | 1 | 1 | 2009-06-22 15:04:55
1 | 2 | 1 | 2009-06-22 15:05:47 |

如果上面的问题能解决，看看能不能，进行一次全扫描取出
根据a,b group by,取出 c的最小值，最大值，同时取出c的最小值和最大值的时间
类似于
这样的数据
a    b    最小值    最小值的时间    最大值       最大值的时间
1 | 1 | 1 | 2009-06-22 15:04:55 3 | 2009-06-22 15:05:16
   1 | 2 | 1 | 2009-06-22 15:05:47  6 | 2009-06-22 15:06:06

谢谢大家

作者: carcase 时间: 2009-6-22 15:40
to  hjwb1216 ;
谢谢确认，因为初次用觉得奇怪，非常感谢!
我可以放心的用这个功能了

是根据得到的结果集然后根据每个group by 进行limit 1得到的结果  这句话好啊

我就是想利用下这个limit 1 的功能

mysql> select * from test;
+------+------+------+---------------------+
| a | b | c | d                |
+------+------+------+---------------------+
| 1 | 1 | 1 | 2009-06-22 15:04:55 |
| 1 | 1 | 2 | 2009-06-22 15:05:09 |
| 1 | 1 | 3 | 2009-06-22 15:05:16 |
| 1 | 2 | 5 | 2009-06-22 15:05:32 |
| 1 | 2 | 1 | 2009-06-22 15:05:47 |
| 1 | 2 | 6 | 2009-06-22 15:06:06 |
+------+------+------+---------------------+
6 rows in set (0.00 sec)

怎样通过sql语句，进行一次表扫描得到
根据a,b 列 groupby ，取出c的最小值，同时取出c的最小值时的时间
相当于得到这两行数据

  1 | 1 | 1 | 2009-06-22 15:04:55
1 | 2 | 1 | 2009-06-22 15:05:47 |

为了实现上面的功能
我用了limit 1 的功能
测试结果如下：
mysql> select * from test;
+------+------+------+---------------------+
| a | b | c | d                |
+------+------+------+---------------------+
| 1 | 1 | 1 | 2009-06-22 15:04:55 |
| 1 | 1 | 2 | 2009-06-22 15:05:09 |
| 1 | 1 | 3 | 2009-06-22 15:05:16 |
| 1 | 2 | 5 | 2009-06-22 15:05:32 |
| 1 | 2 | 1 | 2009-06-22 15:05:47 |
| 1 | 2 | 6 | 2009-06-22 15:06:06 |
+------+------+------+---------------------+
6 rows in set (0.00 sec)

mysql> select * from (select a,b,c,d from test order by a,b,c) t group by a,b;          得到了最小值和最小值时的时间
+------+------+------+---------------------+
| a | b | c | d                |
+------+------+------+---------------------+
| 1 | 1 | 1 | 2009-06-22 15:04:55 |
| 1 | 2 | 1 | 2009-06-22 15:05:47 |
+------+------+------+---------------------+
2 rows in set (0.00 sec)

mysql> select * from (select a,b,c,d from test order by a,b,c desc) t group by a,b;  得到了最大值和最大值的时间
+------+------+------+---------------------+
| a | b | c | d                |
+------+------+------+---------------------+
| 1 | 1 | 3 | 2009-06-22 15:05:16 |
| 1 | 2 | 6 | 2009-06-22 15:06:06 |
+------+------+------+---------------------+
2 rows in set (0.00 sec)

虽然方法有点土，的确实现了这个功能;

不知这种方法效率如何
相比以下的方法：
mysql> select * from test where (a,b,c) in(select a,b,min(c) from test group by a,b);  取得最小值和最小值的时间
+------+------+------+---------------------+
| a | b | c | d                |
+------+------+------+---------------------+
| 1 | 1 | 1 | 2009-06-22 15:04:55 |
| 1 | 2 | 1 | 2009-06-22 15:05:47 |
+------+------+------+---------------------+
2 rows in set (0.00 sec)

mysql> select * from test where (a,b,c) in(select a,b,max(c) from test group by a,b);  得到了最大值和最大值的时间
+------+------+------+---------------------+
| a | b | c | d                |
+------+------+------+---------------------+
| 1 | 1 | 3 | 2009-06-22 15:05:16 |
| 1 | 2 | 6 | 2009-06-22 15:06:06 |
+------+------+------+---------------------+

因为用惯了oracle，总是不用重复扫描表
那个大哥帮帮忙看看谢谢了

作者: carcase 时间: 2009-6-22 15:55
根据a,b group by,取出 c的最小值，最大值，同时取出c的最小值和最大值的时间
类似于
这样的数据
a    b    最小值    最小值的时间    最大值       最大值的时间
1 | 1 | 1 | 2009-06-22 15:04:55 3 | 2009-06-22 15:05:16
   1 | 2 | 1 | 2009-06-22 15:05:47  6 | 2009-06-22 15:06:06

为了实现上面的功能
我用如下两种方式，就是把最大值，最小值的的方式进行连接，效率一定是低的

方法如下：
mysql> select t1.a,
->       t1.b,
->       t1.c '最小值',
->       t1.d '取得最小值的时间',
->       t2.c '最大值',
->       t2.d '取得最大值的时间'
-> from
-> (select a,b,c,d from (select a,b,c,d from test order by a,b,c) t group by a,b) t1,
-> (select a,b,c,d from (select a,b,c,d from test order by a,b,c desc) t group by a,b) t2
->  where t1.a = t2.a
-> and t1.b = t2.b;
+------+------+--------+---------------------+--------+---------------------+
| a | b | 最小值 | 取得最小值的时间 | 最大值 | 取得最大值的时间 |
+------+------+--------+---------------------+--------+---------------------+
| 1 | 1 |    1 | 2009-06-22 15:04:55 |    3 | 2009-06-22 15:05:16 |
| 1 | 2 |    1 | 2009-06-22 15:05:47 |    6 | 2009-06-22 15:06:06 |
+------+------+--------+---------------------+--------+---------------------+
2 rows in set (0.00 sec)

mysql> select t1.a,
->       t1.b,
->       t1.c '最小值',
->       t1.d '取得最小值的时间',
->       t2.c '最大值',
->       t2.d '取得最大值的时间'
-> from (select a, b, c, d from test where (a, b, c) in (select a, b, min(c) from test group by a, b)) t1,
->       (select a, b, c, d from test where (a, b, c) in (select a, b, max(c) from test group by a, b)) t2
->  where t1.a = t2.a
-> and t1.b = t2.b;
+------+------+--------+---------------------+--------+---------------------+
| a | b | 最小值 | 取得最小值的时间 | 最大值 | 取得最大值的时间 |
+------+------+--------+---------------------+--------+---------------------+
| 1 | 1 |    1 | 2009-06-22 15:04:55 |    3 | 2009-06-22 15:05:16 |
| 1 | 2 |    1 | 2009-06-22 15:05:47 |    6 | 2009-06-22 15:06:06 |
+------+------+--------+---------------------+--------+---------------------+
2 rows in set (0.00 sec)

看了一下执行计划
mysql> explain
-> select t1.a,
->       t1.b,
->       t1.c '最小值',
->       t1.d '取得最小值的时间',
->       t2.c '最大值',
->       t2.d '取得最大值的时间'
-> from (select a, b, c, d from test where (a, b, c) in (select a, b, min(c) from test group by a, b)) t1,
->       (select a, b, c, d from test where (a, b, c) in (select a, b, max(c) from test group by a, b)) t2
->  where t1.a = t2.a
-> and t1.b = t2.b;
+----+--------------------+------------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type       | table    | type | possible_keys | key  | key_len | ref  | rows | Extra                         |
+----+--------------------+------------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | PRIMARY          | <derived2> | ALL  | NULL       | NULL | NULL | NULL | 2 |                               |
|  1 | PRIMARY          | <derived4> | ALL  | NULL       | NULL | NULL | NULL | 2 | Using where; Using join buffer  |
|  4 | DERIVED          | test    | ALL  | NULL       | NULL | NULL | NULL | 6 | Using where                   |
|  5 | DEPENDENT SUBQUERY | test    | ALL  | NULL       | NULL | NULL | NULL | 6 | Using temporary; Using filesort |
|  2 | DERIVED          | test    | ALL  | NULL       | NULL | NULL | NULL | 6 | Using where                   |
|  3 | DEPENDENT SUBQUERY | test    | ALL  | NULL       | NULL | NULL | NULL | 6 | Using temporary; Using filesort |
+----+--------------------+------------+------+---------------+------+---------+------+------+---------------------------------+
6 rows in set (0.01 sec)

mysql> explain
-> select t1.a,
->       t1.b,
->       t1.c '最小值',
->       t1.d '取得最小值的时间',
->       t2.c '最大值',
->       t2.d '取得最大值的时间'
-> from
-> (select a,b,c,d from (select a,b,c,d from test order by a,b,c) t group by a,b) t1,
-> (select a,b,c,d from (select a,b,c,d from test order by a,b,c desc) t group by a,b) t2
->  where t1.a = t2.a
-> and t1.b = t2.b;
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows | Extra                         |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | PRIMARY    | <derived2> | ALL  | NULL       | NULL | NULL | NULL | 2 |                               |
|  1 | PRIMARY    | <derived4> | ALL  | NULL       | NULL | NULL | NULL | 2 | Using where; Using join buffer  |
|  4 | DERIVED    | <derived5> | ALL  | NULL       | NULL | NULL | NULL | 6 | Using temporary; Using filesort |
|  5 | DERIVED    | test    | ALL  | NULL       | NULL | NULL | NULL | 6 | Using filesort                |
|  2 | DERIVED    | <derived3> | ALL  | NULL       | NULL | NULL | NULL | 6 | Using temporary; Using filesort |
|  3 | DERIVED    | test    | ALL  | NULL       | NULL | NULL | NULL | 6 | Using filesort                |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
6 rows in set (0.00 sec)

我觉得相对于oracle 而言这两种方法效率太差了
当时偶水平有限，
那位兄弟帮帮忙，有没有一次性就可以取出这样的结果出来
不要扫描表那么多次
谢谢了

作者: congcong2010 时间: 2010-11-16 17:01
今天也正好碰到了这个问题，非常疑惑，原来是mysql扩展了group by的原因

欢迎光临 ITPUB论坛－专业的IT技术社区 (http://www.itpub.net/)