Let's discuss how this SQL executes!

oldwain · posted on 2001-10-10 10:23

1. Replace it with an equivalent EXISTS statement

SELECT COUNT(*)
FROM TABLE_A A
WHERE NOT EXISTS
( SELECT 1
   FROM TABLE_B B
   WHERE B.DH='XXX'
   AND SUBSTR(A.COL,1,7) = SUBSTR(B.COL, 1, 7)
)

Note that when a.col or b.col can be NULL, this differs semantically from the original statement; also consider using an outer join.

2. Creating function-based indexes on SUBSTR(A.COL,1,7) and SUBSTR(B.COL,1,7) will help
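
To make the NULL caveat in point 1 concrete, here is a minimal sketch. It assumes the original statement was a NOT IN subquery (that statement is not shown in this part of the thread), and it uses Python's built-in sqlite3 module as a stand-in for Oracle. The sample rows are invented for illustration only.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (col TEXT);
    CREATE TABLE table_b (col TEXT, dh TEXT);
    -- Hypothetical sample data, including NULLs on both sides.
    INSERT INTO table_a VALUES ('AAAAAAA-1'), ('BBBBBBB-1'), ('CCCCCCC-1'), (NULL);
    INSERT INTO table_b VALUES ('AAAAAAA-9', 'XXX'), (NULL, 'XXX'), ('BBBBBBB-9', 'YYY');
""")

def count(sql):
    """Run a single-value COUNT query and return the number."""
    return conn.execute(sql).fetchone()[0]

# Assumed shape of the original statement: NOT IN over the 7-character prefix.
not_in = count("""
    SELECT COUNT(*) FROM table_a a
    WHERE substr(a.col, 1, 7) NOT IN
          (SELECT substr(b.col, 1, 7) FROM table_b b WHERE b.dh = 'XXX')
""")

# The NOT EXISTS rewrite from the post above.
not_exists = count("""
    SELECT COUNT(*) FROM table_a a
    WHERE NOT EXISTS
          (SELECT 1 FROM table_b b
           WHERE b.dh = 'XXX'
             AND substr(a.col, 1, 7) = substr(b.col, 1, 7))
""")

# The outer-join (anti-join) variant: keep rows of A that found no partner in B.
outer_join = count("""
    SELECT COUNT(*) FROM table_a a
    LEFT JOIN table_b b
           ON b.dh = 'XXX'
          AND substr(a.col, 1, 7) = substr(b.col, 1, 7)
    WHERE b.dh IS NULL
""")

print(not_in, not_exists, outer_join)  # prints: 0 3 3
```

With a NULL prefix in the subquery result, NOT IN rejects every row, while NOT EXISTS and the outer join keep the three unmatched rows, including the one whose col is NULL. Which behaviour is wanted depends on the data, so it is worth checking before swapping one form for the other.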

zhuyg · posted on 2001-10-10 10:28

NOT IN is the slowest.
Use an outer join instead.
NOT EXISTS is the fastest way to deal with your problem.

SELECT COUNT(*)
FROM TABLE_A A
WHERE not exists (SELECT 'X'
                  FROM TABLE_B B
               WHERE B.DH='XXX' and
                     SUBSTR(A.COL,1,7) = SUBSTR(B.COL,1,7) )

Create indexes on TABLE_A and TABLE_B, but also consider how to deal with the SUBSTR function, since it will prevent you from using an ordinary index on the column.
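
As a rough illustration of the point about SUBSTR and indexes, the sketch below compares the query plan for a prefix predicate before and after creating an index on the expression itself. It uses Python's built-in sqlite3 module as a stand-in (SQLite calls this an index on an expression; Oracle's function-based index is the analogous feature). The index name idx_b_prefix is made up for the example, and the plan text is SQLite's, not Oracle's.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_b (col TEXT, dh TEXT)")

# A predicate on the 7-character prefix, as in the thread's query.
query = "SELECT COUNT(*) FROM table_b WHERE substr(col, 1, 7) = 'AAAAAAA'"

def plan(sql):
    """Return SQLite's query plan for sql as one string (last column is the detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Without a suitable index the planner has to scan the table.
before = plan(query)

# Index on the expression itself (hypothetical name idx_b_prefix).
conn.execute("CREATE INDEX idx_b_prefix ON table_b (substr(col, 1, 7))")

# Now the planner can search the index, because the predicate uses the same expression.
after = plan(query)

print("before:", before)
print("after: ", after)
```

The expression in the WHERE clause has to match the indexed expression for the index to be considered, which is why the index is built on the same substr(col, 1, 7) call the query uses.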

kezizi · posted on 2001-10-10 10:44

If you still need my help, please run the following SQL and let me know the result and how long it takes.

create table x as select substr(a.col,1,7) s, count(*) c from a group by substr(a.col,1,7);
select count(*), sum(c) from x;

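ttdb · posted on 2001-10-10 11:11

In this particular case, which is faster:
tidycc's MINUS
or oldwain's NOT EXISTS?

I'd like to know the result ^^

Of course, you could also compare the difference before and after creating the function-based index.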

alan_yang · posted on 2001-10-10 11:53

Originally posted by kezizi:
> If you still need my help, please run the following SQL and let me know the result and how long it takes.
>
> create table x as select substr(a.col,1,7) s, count(*) c from a group by substr(a.col,1,7);
> select count(*), sum(c) from x;

The first SQL took about 1000 seconds,
and the second SQL returned the following result:
COUNT(S) SUM(C)
15       12126713

And kezizi, do you have any good advice?

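BTW, thank you all for your generous help!
What if my statement also needs to compute other aggregates, such as the sum of another column?
For example:
SELECT COUNT(*),SUM(A.COL2)
   FROM TABLE_A A
   WHERE  .......(omitted)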

kezizi · posted on 2001-10-10 12:17

count(*) = 15?!! So small?

1. Creating an index on table_a and/or table_b will not help because of the table size, and it creates trouble later when the data is updated.
2. The difference between 'not in' and 'exists' is a good point that I hadn't paid attention to. Actually, internally, in any kind of database, 'exists' uses (to a certain degree) the same algorithm to get the data as the 'method 2' I mentioned before: a sort join or another join method instead of the nested loop join used for 'not in'.

If 30 minutes is acceptable, then just do the following (much better than several hours).
By the way, the following SQL can be optimized as well, but I don't believe any tuning will speed it up by another 10 times, because it performs two full table scans (one on each table) and then a join on small tables, whose time can be ignored. Mathematically, there is no other algorithm significantly better than this one.

create table x as select substr(a.col,1,7) s, count(*) c from a group by substr(a.col,1,7);
create table y as select distinct substr(b.col,1,7) s from b where b.dh = 'xxx';
select sum(c) from x where not exists (select 1 from y where y.s = x.s);
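
The staged approach can be checked on a toy data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle and invented sample rows; it builds the two small summary tables, correlates y to the outer x in the final step, and compares the answer with the direct NOT EXISTS query. Carrying SUM(col2) through the summary table (column t, a name made up here) is an assumption about how the same idea might extend to other aggregates, not something stated in the thread.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (col TEXT, col2 INTEGER);
    CREATE TABLE table_b (col TEXT, dh TEXT);
    -- Hypothetical sample data.
    INSERT INTO table_a VALUES
        ('AAAAAAA-1', 10), ('AAAAAAA-2', 20),
        ('BBBBBBB-1', 30), ('BBBBBBB-2', 40), ('CCCCCCC-1', 50);
    INSERT INTO table_b VALUES
        ('AAAAAAA-9', 'XXX'), ('AAAAAAA-8', 'XXX'), ('BBBBBBB-9', 'YYY');

    -- Step 1: one full scan of A, collapsed to one row per 7-character prefix.
    CREATE TABLE x AS
        SELECT substr(col, 1, 7) AS s, COUNT(*) AS c, SUM(col2) AS t
        FROM table_a
        GROUP BY substr(col, 1, 7);

    -- Step 2: one scan of B, keeping only the distinct prefixes for dh = 'XXX'.
    CREATE TABLE y AS
        SELECT DISTINCT substr(col, 1, 7) AS s
        FROM table_b
        WHERE dh = 'XXX';
""")

# Step 3: anti-join on the two small tables; the subquery is correlated to the outer x.
staged = conn.execute("""
    SELECT SUM(c), SUM(t) FROM x
    WHERE NOT EXISTS (SELECT 1 FROM y WHERE y.s = x.s)
""").fetchone()

# Reference answer: the direct NOT EXISTS query against the base tables.
direct = conn.execute("""
    SELECT COUNT(*), SUM(a.col2) FROM table_a a
    WHERE NOT EXISTS
          (SELECT 1 FROM table_b b
           WHERE b.dh = 'XXX'
             AND substr(a.col, 1, 7) = substr(b.col, 1, 7))
""").fetchone()

print(staged, direct)  # prints: (3, 120) (3, 120)
```

On this sample both routes give the same count and sum. The staged version only pays off when the number of distinct prefixes is small compared with the row count, as with the 15 groups over roughly 12 million rows reported above.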

kezizi · posted on 2001-10-10 12:19

By the way, sum(c) is the result you want.

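alan_yang · posted on 2001-10-10 12:32

Haha, you think 15 is too small? That's because I grouped by substr(a.col1,1,7);
without the grouping, wouldn't it be the same as that later figure of over ten million?

Thank you very much! Under your wise guidance, the problem has been solved!

I'd really like to learn more from you, since I regularly do a lot of query and reporting work
on large data volumes.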

kezizi · posted on 2001-10-10 12:36

I was wrong. Creating an index on b(dh) and/or on substr(b.col,1,7) will help with step two and save you a few minutes.

You can also save several minutes by tuning the configuration, reducing fragmentation, and refining the SQL, but it will not be easy to speed it up by another 10 times.

alan_yang · posted on 2001-10-10 12:41

Originally posted by kezizi:
> I was wrong. Creating an index on b(dh) and/or on substr(b.col,1,7) will help with step two and save you a few minutes.
>
> You can also save several minutes by tuning the configuration, reducing fragmentation, and refining the SQL, but it will not be easy to speed it up by another 10 times.

Running the query takes about 30 minutes!
Thanks a lot!
BTW, you may want to check your messages!
Can we talk privately?
Hope to make friends with you!
And I hope to learn more from you about SQL tuning!