|
本帖最后由 Cherish_j_wang 于 2012-3-9 20:59 编辑
之利用Hashkey解决宽字段检索的性能问题 场景 就像你所知道的那样,宽字段(我们暂且将长度超过2000的字段)是不适合建立Index的,就算建立了Index对IO的消耗也不会少。那么,当我们不得不对超宽字段Search时(where col=constant),那么这种情况,我们如何来做优化呢?
优化思路
上面情形的优化灵感来自于Join中的Hash Join,我们是否可以建立一个Hash key字段,这个字段来自于宽字段的hash函数值,然后我们在该字段上建立Index。如此以来,我们就可以将常量的Hash值与表中的Hash值进行匹配,这样就可以大大提高查询效率。
根据这个优化思路,我们有多种方法来实现:
1. 使用Trigger来自动维护HashKey:这个太复杂,至少要实现更新和插入触发器
2. 使用计算列的方式来自动维护HashKey:这个比较简单
3. 开发人员手动维护这个HashKey(也就是在Update和Insert宽字段的时,同时生成HashKey):这个是最简单的
注意:Hash Key很有可能存在Hash值对撞的可能性,我们下面会将如何解决这个问题
脚本
虽然我们实现的方法有多种,在此,我们以方法2为例,上脚本:
HashKey_V2.sql
(2.05 KB, 下载次数: 784)
- use test
- go
- --drop test table if exists
- if OBJECT_ID('dbo.test_for_hashkey','U') is not null
- drop table dbo.test_for_hashkey
- GO
- --create test table
- create table dbo.test_for_hashkey
- (
- id int identity(1,1) primary key
- ,SearchKeyword varchar(2000) not null
- ,SearchKeyword_hashkey as checksum(SearchKeyword) PERSISTED not null
- );
- --create index of hask key column
- create index ix_DBA_SearchKeyword_hashkey
- ON dbo.test_for_hashkey(SearchKeyword_hashkey);
- GO
- --data init.
- declare
- @loop int
- ,@do int
- ,@SearchKeyword varchar(2000)
- ;
- select
- @loop = 100000
- ,@do = 1
- ;
- while @do <= @loop
- begin
- set
- @SearchKeyword = REPLICATE(newid(),55)
- ;
- insert into dbo.test_for_hashkey
- select @SearchKeyword
- ;
- set @do = @do + 1
- end
- go
- -------------------performance testing
- declare
- @SearchKeyword varchar(2000)
- ,@SearchKeyword_hashkey int
- ;
- select TOP 1
- @SearchKeyword = SearchKeyword
- FROM dbo.test_for_hashkey WITH(NOLOCK)
- where id = 20120;
- set
- @SearchKeyword_hashkey = CHECKSUM(@SearchKeyword)
- ;
- --statistics time & io opening
- set statistics time on
- set statistics io on
- select *
- FROM dbo.test_for_hashkey WITH(NOLOCK)
- where SearchKeyword = @SearchKeyword
- ;
- /*如果直接使用SearchKeyword来检索,性能消耗
- Table 'test_for_hashkey'. Scan count 3, logical reads 27602, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- SQL Server Execution Times:
- CPU time = 109 ms, elapsed time = 121 ms.
- */
- select *
- FROM dbo.test_for_hashkey WITH(NOLOCK)
- where SearchKeyword_hashkey = @SearchKeyword_hashkey
- --这个条件是为了防止hash对撞
- and SearchKeyword = @SearchKeyword
- ;
- /*如果使用hash key来检索,性能消耗
- Table 'test_for_hashkey'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- SQL Server Execution Times:
- CPU time = 0 ms, elapsed time = 0 ms.
- */
- set statistics time off
- set statistics io off
复制代码
说明
优化前
/*如果直接使用SearchKeyword来检索,性能消耗
Table 'test_for_hashkey'. Scan count 3, logical reads 27602, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 109 ms, elapsed time = 121 ms.
*/
优化后
/*如果使用hash key来检索,性能消耗
Table 'test_for_hashkey'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
*/
|
|