Question about dropping a partition from a hash-partitioned table

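ZALBB · posted 2012-9-29 21:06

gszoracle posted 2012-9-29 20:55
I haven't had a test environment lately, so I can't run experiments.
Insert 200 distinct records into a table with 8 partitions and observe how the records are distributed;
then take the same 20 ...

Suppose the data is evenly distributed when the partition count is a power of 2. When a partition is dropped, its data is deleted along with it, so the data in the remaining partitions is still balanced. Where, then, is the imbalance that would require moving data between partitions? That argument does not hold. The more likely reason is the one I raised above: a partition count that is not a power of 2 makes later data distribute unevenly, so Oracle allows only TRUNCATE and not DROP. Of course, this implicitly assumes that the original partition count was a power of 2.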

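gszoracle · posted 2012-9-29 21:17

ZALBB posted 2012-9-29 21:06
Suppose the data is evenly distributed when the partition count is a power of 2. When a partition is dropped, its data is deleted along with it, and at that point the remaining partitions
...

By Oracle's algorithm, the data across the partitions is unbalanced at that point.
Take two partitions with 8 records each: the data is balanced. If you add a partition, Oracle merely splits the records of the first partition in two according to its algorithm: 4 records stay in partition 1, the other 4 go to partition 3, and partition 2 still holds 8. The three partitions hold different numbers of records, yet in Oracle's view this is balanced. Conversely, with a partition count that is not a power of 2, equal record counts mean the data is unbalanced.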
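
The arithmetic in this example can be reproduced with a small Python sketch. It assumes a linear-hashing style mapping, which matches the 8/8 to 4/8/4 split described here but is not taken from Oracle's code; `partition_for` and `row_counts` are hypothetical helper names, and the integers 0 to 15 stand in for evenly spread hash values.

```python
from collections import Counter


def partition_for(hash_value, partitions):
    """Map a hash value to a 1-based partition number (assumed model)."""
    # Largest power of two that does not exceed the partition count.
    low = 1 << (partitions.bit_length() - 1)
    bucket = hash_value % (2 * low)
    # Buckets that do not exist yet fold back onto the lower half.
    if bucket >= partitions:
        bucket -= low
    return bucket + 1


def row_counts(hashes, partitions):
    """Count rows per partition, listed in partition order."""
    counts = Counter(partition_for(h, partitions) for h in hashes)
    return [counts[p] for p in range(1, partitions + 1)]


hashes = range(16)  # stand-in for 16 evenly spread hash values
print(row_counts(hashes, 2))  # two partitions
print(row_counts(hashes, 3))  # after adding a third partition
```

With these 16 stand-in rows the counts come out as [8, 8] for two partitions and [4, 8, 4] for three; a fourth partition brings them back to [4, 4, 4, 4].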

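ZALBB · posted 2012-9-30 08:59

gszoracle posted 2012-9-29 21:17
By Oracle's algorithm, the data across the partitions is unbalanced at that point.
Take two partitions with 8 records each: the data is balanced. If you add ...

The documentation says data distribution is best when the number of hash partitions is POWER(2, N). So, following your reasoning, if dropping a partition made Oracle automatically rebalance the data in the remaining partitions, then Oracle really ought to forbid creating a non-POWER(2, N) number of hash partitions when the table is created. Only that would truly guarantee an even distribution.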

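gszoracle · posted 2012-9-30 13:16

ZALBB posted 2012-9-30 08:59
The documentation says data distribution is best when the number of hash partitions is POWER(2, N). So, following your reasoning, if dropping a partition made Oracle automatically rebalance the remaining ...

When the number of hash partitions is not POWER(2, N), the data distribution is poor. The partition count can be viewed as POWER(2, N) + m with 0 <= m <= POWER(2, N): m = 0 gives POWER(2, N) partitions, and m = POWER(2, N) gives POWER(2, N+1) partitions. While 0 <= m < POWER(2, N), each added partition merely splits one of the earlier partitions in two. As I understand it, the database's internal distribution algorithm is not necessarily the same for different values of N. That leaves one question: after a partition is dropped, when data that belonged to it is inserted again, which partition should it go to?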
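
The decomposition into POWER(2, N) + m can be written out as follows. This is only a sketch of the arithmetic in this post; `decompose` is a hypothetical helper, and it normalizes m to the range 0 <= m < POWER(2, N), so the case m = POWER(2, N) shows up as m = 0 at N + 1.

```python
def decompose(partitions):
    """Return (N, m) such that partitions == 2**N + m and 0 <= m < 2**N."""
    n = partitions.bit_length() - 1  # exponent of the largest power of two
    return n, partitions - (1 << n)


for p in range(4, 9):
    n, m = decompose(p)
    print(f"{p} partitions = 2^{n} + {m}")
```

Going from 4 to 8 partitions, m runs 0, 1, 2, 3 and then resets to 0 with N raised from 2 to 3.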

Yong Huang · posted 2012-10-1 04:56

Of all the books and articles I've read, Jonathan Lewis's "Practical Oracle8i" (pp.251-258) is still the best at describing the algorithm for how data is distributed across hash partitions, which partition's data is moved to a new partition when adding a partition, and which two partitions are "merged" when coalescing.

First, the number of values has to be large for the data to come close to an even distribution. So if you test, make sure you have at least a few thousand values, not 8 records for instance. And even thousands of values does not mean they're absolutely evenly split.

Second, the book says you can use dbms_utility.get_hash_value to predict which partition to go to, for a string type column. Use base 0. And the result plus one is the partition number.

When you add a new hash partition, the partition whose data is partially moved to the new partition is the remainder (the m in gszoracle's message #24) plus one. For instance, first you have 6 partitions. 6=4+2. When you add a new partition, it's partition 3 (i.e. 2+1) that contributes some rows to the new partition 7.
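
The rule above can be checked against a simple model. The sketch below assumes a linear-hashing style mapping that agrees with the 6-to-7 example in this post; it is not Oracle's implementation, `partition_for` and `moves_on_add` are hypothetical names, and consecutive integers stand in for hash values.

```python
def partition_for(hash_value, partitions):
    """Map a hash value to a 1-based partition number (assumed model)."""
    low = 1 << (partitions.bit_length() - 1)  # largest power of two <= count
    bucket = hash_value % (2 * low)
    if bucket >= partitions:  # bucket not created yet: fold onto lower half
        bucket -= low
    return bucket + 1


def moves_on_add(hashes, partitions):
    """Return the set of (old, new) partition pairs for rows that move
    when the partition count grows from `partitions` to `partitions + 1`."""
    moves = set()
    for h in hashes:
        before = partition_for(h, partitions)
        after = partition_for(h, partitions + 1)
        if before != after:
            moves.add((before, after))
    return moves


for p in range(4, 8):
    print(p, "->", p + 1, moves_on_add(range(64), p))
```

In this model every step moves rows out of exactly one partition: starting from 6 partitions, the only pair reported is (3, 7).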

When you coalesce, it's the top partition and the remainder-numbered partition. In the above case (without adding partition 7), it's 6 and 2 being merged.
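
A rough way to see the coalesce case is to count rows per partition before and after. As before, this is a sketch under an assumed linear-hashing style mapping, not Oracle's code; `partition_for` and `row_counts` are hypothetical names and the integers 0 to 63 stand in for evenly spread hash values.

```python
from collections import Counter


def partition_for(hash_value, partitions):
    """Map a hash value to a 1-based partition number (assumed model)."""
    low = 1 << (partitions.bit_length() - 1)  # largest power of two <= count
    bucket = hash_value % (2 * low)
    if bucket >= partitions:  # bucket does not exist: fold onto lower half
        bucket -= low
    return bucket + 1


def row_counts(hashes, partitions):
    """Count rows per partition, listed in partition order."""
    counts = Counter(partition_for(h, partitions) for h in hashes)
    return [counts[p] for p in range(1, partitions + 1)]


hashes = range(64)
print(row_counts(hashes, 6))  # six partitions
print(row_counts(hashes, 5))  # after coalescing down to five
```

With six partitions the model gives [8, 8, 16, 16, 8, 8]; with five it gives [8, 16, 16, 16, 8], so the rows of partition 6 end up in partition 2 and the other partitions are untouched.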

Jonathan doesn't think dropping a hash partition would "make much sense". Unfortunately, he mentioned truncating a hash partition in the same paragraph without comments on why a truncate is allowed, by design.

gszoracle · posted 2012-10-1 22:08

Yong Huang 发表于 2012-10-1 04:56
Of all books or articles I read, Jonathan Lewis's "Practical Oracle8i" (pp.251-258) is still the bes ...

In 10g, ora_hash() can be used instead of DBMS_utility.get_hash_value().

gszoracle · posted 2012-10-1 22:12

It's a real honor to be discussing this with two moderators!

Yong Huang · posted 2012-10-2 01:08

> In 10g, ora_hash() can be used instead of DBMS_utility.get_hash_value().

Thanks for that info. Indeed dbms_utility.get_hash_value('Hello', 0 , x) returns
0 when x=1
1 when x=2
2 when x=3
3 when x=4
0 when x=5
5 when x=6
2 when x=7
3 when x=8
and ora_hash('Hello', x) returns the same values respectively.
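
One thing that can be checked without a database: the eight values listed are all consistent with a single integer reduced modulo x. The sketch below searches for the smallest such integer. This is an observation about the numbers in this post only, not a statement about how either function is implemented; `reported` and `smallest_match` are hypothetical names.

```python
# Values reported above, keyed by x.
reported = {1: 0, 2: 1, 3: 2, 4: 3, 5: 0, 6: 5, 7: 2, 8: 3}


def smallest_match(values, limit=1000):
    """Return the smallest h below `limit` with h % x == r for every pair,
    or None if no such h exists."""
    for h in range(limit):
        if all(h % x == r for x, r in values.items()):
            return h
    return None


print(smallest_match(reported))  # 275
```

Any integer of the form 275 + 840k fits as well, since 840 is the least common multiple of 1 through 8.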

Actually, I read Jonathan's book but I didn't test the hash partition assignment.

gszoracle · posted 2012-10-2 16:33

Yong Huang posted 2012-10-2 01:08
> In 10g, ora_hash() can be used instead of DBMS_utility.get_hash_value().

Thanks for that info. Indeed d ...

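A few years ago I worked on a project with a very large data volume. The plan was to load the data into a hash-partitioned table using direct-path loads and partition exchange. To make sure the data would land in the correct partitions, I ran tests on this. My current understanding is based on those tests; it may not be right and my wording may not be precise, so corrections are welcome.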

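Bluefox_ora · posted 2012-10-6 10:23

After reading everyone's replies, I think the missing drop for hash partitions comes down to cost. Dropping a partition would surely require rebalancing all of the remaining data, whereas adding a partition only splits one existing partition in two, which is far cheaper than a drop. That is why Oracle implemented add but not drop.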
