|
count(*) = 15?!!, so small?
1. create index on table_a and/or table_b will not help because of the table size, plus it creates trouble later when data is updated
2. the different between 'not in' and 'exists' is a good point where I didn't pay attention. acturelly Internally in any kind of database, the 'exists' is use the same algorithm (in certain degree) to get data as the 'method 2' i mentioned before. ( use sort join or other join instead of inner loop join as in 'not in')
if 30 minutes is acceptable, then just do following (much better than several hours).
by the way, There following sql can be optimized as well, but I don't believe any tuning will speed it up for another 10 times. because the following sql perform full table scan for 2 times (on two table) and perform a join on small tables whose time can be ignored. Mathematiclly there are no other significant algorithm better than this one
create table x as select substr(a.col,1,7) s ,count(*) c from a group by s
create table y as select distinct substr(b.col,1,7) s from b where b.dh = 'xxx'
select sum(c) from x where not exists (select 1 from x,y where x.s = y.s) |
|