|
Indexing Only Some of the Rows
In addition to transparently helping out queries that use built-in functions like UPPER, LOWER, and so on,
function-based indexes can be used to selectively index only some of the rows in a table. As we’ll discuss
a little later, B*Tree indexes do not contain entries for entirely NULL keys. That is, if you have an index I
on a table T
Create index I on t(a,b);
and you have a row where A and B are both NULL, there will be no entry in the index structure. This
comes in handy when you are indexing just some of the rows in a table.
Consider a large table with a NOT NULL column called PROCESSED_FLAG that may take one of two
values, Y or N, with a default value of N. New rows are added with a value of N to signify not processed, and
as they are processed, they are updated to Y to signify processed. We would like to index this column to
be able to retrieve the N records rapidly, but there are millions of rows and almost all of them are going to
have a value of Y. The resulting B*Tree index will be large, and the cost of maintaining it as we update
from N to Y will be high. This table sounds like a candidate for a bitmap index (this is low cardinality, after
all!), but this is a transactional system and lots of people will be inserting records at the same time with
the processed column set to N and, as we discussed earlier, bitmaps are not good for concurrent
modifications. When we factor in the constant updating of N to Y in this table as well, then bitmaps
would be out of the question, as this process would serialize entirely.
So, what we would really like is to index only the records of interest (the N records). We’ll see how to
do this with function-based indexes, but before we do, let’s see what happens if we just use a regular
index. Using the standard BIG_TABLE script described in the setup section, we’ll update the TEMPORARY
column, flipping the Ys to Ns and the Ns to Ys
ops$tkyte@ORA11GR2> update big_table set temporary = decode(temporary,'N','Y','N');
1000000 rows updated.
and we’ll check out the ratio of Ys to Ns
ops$tkyte@ORA11GR2> select temporary, cnt,
2 round( (ratio_to_report(cnt) over ()) * 100, 2 ) rtr
3 from (
4 select temporary, count(*) cnt
5 from big_table
6 group by temporary
7 )
8 /
T CNT RTR
- ---------- ----------
N 1779 .18
Y 998221 99.82
As we can see, of the 1,000,000 records in the table, only about one-fifth of 1 percent of the data
should be indexed. If we use a conventional index on the TEMPORARY column (which is playing the role of
the PROCESSED_FLAG column in this example), we would discover that the index has 1,000,000 entries,
consumes almost 14MB of space, and has a height of 3:
ops$tkyte@ORA11GR2> create index processed_flag_idx
2 on big_table(temporary);
Index created.
ops$tkyte@ORA11GR2> analyze index processed_flag_idx
2 validate structure;
Index analyzed.
ops$tkyte@ORA11GR2> select name, btree_space, lf_rows, height
2 from index_stats;
NAME BTREE_SPACE LF_ROWS HEIGHT
------------------------------ ----------- ---------- ----------
PROCESSED_FLAG_IDX 14528892 1000000 3
Any retrieval via this index would incur three I/Os to get to the leaf blocks. This index is not only
wide, but also tall. To get the first unprocessed record, we will have to perform at least four I/Os (three
against the index and one against the table).
How can we change all of this? We need to make it so the index is much smaller and easier to
maintain (with less runtime overhead during the updates). Enter the function-based index, which
allows us to simply write a function that returns NULL when we don’t want to index a given row and
returns a non-NULL value when we do. For example, since we are interested just in the N records, let’s
index just those:
ops$tkyte@ORA11GR2> drop index processed_flag_idx;
Index dropped.
ops$tkyte@ORA11GR2> create index processed_flag_idx
2 on big_table( case temporary when 'N' then 'N' end );
Index created.
ops$tkyte@ORA11GR2> analyze index processed_flag_idx
2 validate structure;
Index analyzed.
ops$tkyte@ORA11GR2> select name, btree_space, lf_rows, height
2 from index_stats;
NAME BTREE_SPACE LF_ROWS HEIGHT
------------------------------ ----------- ---------- ----------
PROCESSED_FLAG_IDX 40012 1779 2
That is quite a difference—the index is some 40KB, not 14MB. The height has decreased as well. If
we use this index, we’ll perform one less I/O than we would using the previous taller index. |
|