|
PostgreSQL 中英文分词的性能,以及带索引的情况下的插入,查询性能。
测试环境:
CPU:Intel(R) Xeon(R) CPU X7460 @ 2.66GHz
MEM:60G
OS:CentOS 6.x x64
PostgreSQL 9.4.1
中文分词使用 http://blog.163.com/digoal@126/b ... 040201422410175698/
性能数据汇总:
英语分词性能:~ 900万 words每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
中文分词性能:~ 400万 字每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
英文分词+插入性能:~ 666万 字每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
中文分词+插入性能:~ 290万 字每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
查询性能和查询条件,数据量都有关系,没有很好的评估标准,大多数查询可以在毫秒级返回。
分词性能测试过程如下:
为了测试效果,首先要修改to_tsvector的函数稳定性,避免一次计算多次调用。
对稳定性不了解的话,介绍请参考我以前写的文章或视频。
digoal=> \df+ to_tsvector
List of functions
Schema | Name | Result data type | Argument data types | Type | Security | Volatility | Owner | Language | Source
code | Description
------------+-------------+------------------+---------------------+--------+----------+------------+----------+----------+---------
---------+-----------------------
pg_catalog | to_tsvector | tsvector | regconfig, text | normal | invoker | immutable | postgres | internal | to_tsvec
tor_byid | transform to tsvector
pg_catalog | to_tsvector | tsvector | text | normal | invoker | stable | postgres | internal | to_tsvec
tor | transform to tsvector
(2 rows)
digoal=# alter function to_tsvector(regconfig,text) stable;
ALTER FUNCTION
Time: 0.632 ms
创建英语和中文的测试函数:
英文272个单词,中文包含291个字或单词.
create or replace function f_testts_en() returns void as $$
declare
begin
perform to_tsvector('english', 'Before you can use PostgreSQL you need to install it, of course. It is possible that PostgreSQL is already installed at your site, either because it was included in your operating system distribution or because the system administrator already installed it. If that is the case, you should obtain information from the operating system documentation or your system administrator about how to access PostgreSQL.
If you are not sure whether PostgreSQL is already available or whether you can use it for your experimentation then you can install it yourself. Doing so is not hard and it can be a good exercise. PostgreSQL can be installed by any unprivileged user; no superuser (root) access is required.
If you are installing PostgreSQL yourself, then refer to Chapter 15 for instructions on installation, and return to this guide when the installation is complete. Be sure to follow closely the section about setting up the appropriate environment variables.
If your site administrator has not set things up in the default way, you might have some more work to do. For example, if the database server machine is a remote machine, you will need to set the PGHOST environment variable to the name of the database server machine. The environment variable PGPORT might also have to be set. The bottom line is this: if you try to start an application program and it complains that it cannot connect to the database, you should consult your site administrator or, if that is you, the documentation to make sure that your environment is properly set up. If you did not understand the preceding paragraph then read the next section.');
return;
end;
$$ language plpgsql strict;
分词结果:
'15':126 'access':63,113 'administr':38,59,158,242 'alreadi':19,39,73 'also':209 'applic':225 'appropri':152 'avail':74 'bottom':21
5 'cannot':232 'case':46 'chapter':125 'close':145 'complain':229 'complet':140 'connect':233 'consult':239 'cours':12 'databas':180
,201,236 'default':166 'distribut':33 'document':55,249 'either':24 'environ':153,194,205,255 'exampl':177 'exercis':101 'experiment
':83 'follow':144 'good':100 'guid':135 'hard':94 'includ':28 'inform':50 'instal':9,20,40,87,105,119,130,138 'instruct':128 'line':
216 'machin':182,186,203 'make':251 'might':169,208 'name':198 'need':7,189 'next':271 'obtain':49 'oper':31,53 'paragraph':267 'pgh
ost':193 'pgport':207 'possibl':15 'postgresql':5,17,64,71,102,120 'preced':266 'program':226 'proper':257 'read':269 'refer':123 'r
emot':185 'requir':115 'return':132 'root':112 'section':147,272 'server':181,202 'set':149,161,191,213,258 'site':23,157,241 'start
':223 'superus':111 'sure':69,142,252 'system':32,37,54,58 'thing':162 'tri':221 'understand':264 'unprivileg':108 'use':4,79 'user'
:109 'variabl':154,195,206 'way':167 'whether':70,76 'work':173
(1 row)
create or replace function f_testts_zh() returns void as $$
declare
begin
perform to_tsvector('testzhcfg','
<PostgreSQL发展历程,社区介绍,资源介绍 2小时>
<PostgreSQL SQL基础 2天>
第一天
PostgreSQL的安装, 架构基础介绍, 如何创建和访问数据库;
SQL语言基础(创建表, 操作表数据, 表关联查询, 聚合查询);
SQL高级应用(视图, 外键, 事务, 窗口函数, 继承);
SQL语义结构介绍(关键字, 常量, 操作符, 特殊字符, 注释, 操作符的优先级);
值表达式介绍(列, 位置参数, 数组下标, field引用, 操作符调用, 函数调用, 聚合表达式, 窗口函数调用, 类型转换, collate表达式, 标量子查询, 数组构造器, 行构造器, 表达式逻辑运算规则);
函数调用用法介绍(位置参数调用, 命名参数调用, 混合调用);
数据定义(默认值, 约束, 系统列, 表结构和定义修改, 权限, schema, 继承, 分区, 外部数据);
第二天
数据查询(from子句, where子句, group by having子句, 窗口子句, select子句, 结果集合操作, 排序, 限制, 位移, values子句, CET用法);
数据类型(数字, 货币, 字符, 字节, 时间, 布尔, 枚举, 几何, 网络, 比特, 文本搜索, UUID, XML, JSON, 数组, 复合类型, 范围类型, 对象类型, 伪类型);
函数与操作符(逻辑, 比较, 数学运算, 字符串, 字节流, 比特位, 样式匹配, 格式化, 时间, 枚举, 几何, 网络, 全文检索, XML, JSON, 序列, 条件, 数组, 范围, 聚合, 窗口, 子查询, 行与数组的比较, SRF函数, 系统信息函数, 系统管理函数, 触发器函数, 事件触发器函数);
类型转换;
索引用法;
全文检索用法;
并行控制;
SQL性能优化;
<PostgreSQL 数据库管理 3天>
第一天
PostgreSQL源代码安装(源码结构介绍, 配置介绍, 自定义FLAG介绍, 内核参数优化, 安装);
数据库服务器配置(OS USER, 初始化集群, 启动数据库集群, 关闭数据库集群, 升级数据库软件, 防止欺骗, 数据封装方法, SSL, SSH隧道)
数据库集群进程结构介绍(postmaster, logger, checkpointer, writer, wal writer, autovacuum launcher, autovacuum worker, stats collector, backend, worker);
认证和连接(客户端认证方法介绍, 认证配置, 常用认证方法使用举例, 常见认证错误排错);
pgAdmin III 使用介绍(安装, 使用, 调试函数, 维护数据);
数据库配置(guc.c, 参数优先级, 参数含义, 开发参数, guc隐含参数, 如何修改参数并使之生效, 如何查看当前参数值, 如何查看参数值的范围);
用户管理(创建角色, 角色权限管理, 角色membership管理, 角色成员权限管理);
数据库逻辑结构(cluster, database, schema, object, field, access privilege);
数据库物理结构(tablespace, datafile, segment, block, controlfile, xlog, archivelog);
第二天
数据库管理(创建表空间, 创建数据库, 创建数据库模板, 数据库配置, 删除数据库, 跨数据库的数据访问);
基于角色的对象权限管理(表, 列, 序列, 数据库, 域, 外部数据, 函数, 语言, 大对象, schema, 表空间, 类型);
默认权限和继承权限(default privilege, inherit privilege);
数据库安全(认证, 网络, 数据存储, 密码, 注入, 欺骗);
数据库监控(状态监控, 趋势监控, 预警, 常用监控工具和插件nagios, zabbix, pg_statsinfo);
多版本并发控制和垃圾回收;
日常维护(垃圾回收, 数据重组, 索引维护, VM维护, 预防XID溢出, 日志维护);
第三天
数据迁移(逻辑备份和还原, 增量数据迁移的方法, 异构数据迁移的方法(如oracle to postgresql));
如何打数据库补丁;
数据库版本升级(小版本升级方法, 大版本升级方法);
建模与Benchmark(如何根据业务形态建立数据库测试模型, 测试工具的使用, 根据测试模型和硬件标准输出benchmark);
数据库日志分析(错误代码介绍, 日志的筛选过滤, 日志的维护);
数据库审计(参数层面的审计开关, 定制化审计(如触发器结合HSTORE), 审计数据的维护手段);
本地化(字符集, collate, 编码转换, 如何防止乱码);
数据库性能分析(OS性能分析报表, SQL性能分析报表);
数据库巡检(如何定制巡检项, 指标, 如何分析巡检报告);');
return;
end;
$$ language plpgsql strict;
分词结果:
'access':352 'archivelog':363 'autovacuum':271,273 'backend':277 'benchmark':486,500 'block':360 'by':123 'c':309 'cet':137 'checkp
ointer':267 'cluster':347 'collate':75,530 'collector':276 'controlfile':361 'database':348 'datafile':358 'default':405 'field':63,
351 'flag':230 'from':118 'group':122 'guc':308,316 'having':124 'hstore':523 'iii':297 'inherit':407 'json':154,181 'launcher':272
'logger':266 'membership':338 'nagios':429 'object':350 'oracle':468 'os':239,537 'pg_statsinfo':431 'pgadmin':296 'postgresql':1,9,
15,215,221,470 'postmaster':265 'privilege':353,406,408 'schema':112,349,397 'segment':359 'select':127 'sql':10,23,35,43,212,540 's
rf':191 'ssh':258 'ssl':257 'stats':275 'tablespace':357 'to':469 'user':240 'uuid':152 'values':135 'vm':444 'wal':269 'where':120
'worker':274,278 'writer':268,270 'xid':447 'xlog':362 'xml':153,180 'zabbix':430 '下标':62 '业务':487 '举例':291 '乱码':534 '事件':
201 '事务':40 '二天':116,364 '介绍':5,7,19,46,57,91,226,228,231,264,284,299,506 '代码':505 '优先级':54,311 '优化':214,234 '位移':134
'位置':59,92 '使':321 '使用':290,298,301,494 '信息':194 '修改':110,319 '值':55,103,325,328 '全文检索':179,208 '关联':31 '关键字':47
'关闭':246 '内核':232 '几何':148,177 '函数':41,67,71,88,163,192,195,198,200,203,303,393 '分区':114 '分析':503,548 '列':58,106,388 '
创建':20,25,332,367,370,372 '初始化':241 '删除':377 '匹配':173 '升级':249,476,479,483 '历程':3 '参数':60,93,96,233,310,312,315,318,3
20,324,327,514 '发展':2 '含义':313 '启动':243 '命名':95 '回收':436,439 '垃圾':435,438 '域':391 '基础':11,18 '增量':458 '备份':456 '
合':156 '大':395,481 '天':12,14,218,220,452 '如':467,520 '子句':119,121,125,126,128,136 '字':170 '字符':51,143 '字符串':169 '字符集'
:529 '字节':144 '存储':414 '安全':410 '安装':16,223,235,300 '定义':101,109,229 '定制':518,545 '审计':513,516,519,524 '客户端':281 '
码':415 '对象':160,384,396 '封装':255 '小':477 '小时':8 '层面':515 '巡检':544,546,549 '工具':427 '布尔':146 '常用':287,425 '常见':29
2 '常量':48 '并发':433 '并行':210 '序列':182,389 '应用':37 '建模':485 '建立':489 '开关':517 '开发':314 '异':462 '引用':64 '形态':488
'性能':213 '性能分析':536,538,541 '成员':341 '手段':527 '打':471 '报告':550 '报表':539,542 '指标':547 '排序':132 '排错':295 '控制':
211,434 '插件':428 '搜索':151 '操作':27,131 '操作符':49,53,65,164 '数字':141 '数学':167 '数据':29,100,115,139,254,305,381,392,413,44
0,453,459,464,525 '数据库':22,216,236,244,247,250,260,306,344,354,365,371,373,375,378,380,390,409,418,472,474,490,501,512,535,543 '
据查询':117 '数组':61,80,155,184,189 '文本':150 '方法':256,283,289,461,466,480,484 '日志':449,502,507,510 '时间':145,175 '服务器':23
7 '本地化':528 '权限':111,335,342,385,402,404 '条件':183 '构':463 '构造器':81,83 '枚举':147,176 '架构':17 '查看':323,326 '查询':32,3
4,79,187 '标':77 '标准':498 '样式':172 '格式化':174 '模型':492,496 '模板':374 '欺骗':253,417 '比较':166,190 '注入':416 '注释':52 '测
试':491,495 '测试工具':493 '混合':98 '源代码':222 '源码':224 '溢出':448 '版本':432,475,478,482 '物理':355 '特殊':50 '状态':420 '生效
':322 '用户':330 '用法':90,138,207,209 '监控':419,421,423,426 '硬件':497 '社区':4 '空间':369,399 '第一':13,219 '第三':451 '筛选':508
'管理':197,217,331,336,339,343,366,386 '类型':73,140,157,159,161,162,204,400 '系统':105,193,196 '索引':206,442 '约束':104 '结合':52
2 '结构':45,108,225,263,346,356 '结果':129 '继承':42,113,403 '维护':304,437,443,445,450,511,526 '编码':531 '网络':149,178,412 '聚合'
:33,69,186 '节流':171 '范围':158,185,329 '行':82,188 '补丁':473 '表':26,28,30,107,368,387,398 '表达式':56,70,76,84 '规则':87 '视图':
38 '角色':333,334,337,340,383 '触发器':199,202,521 '认证':279,282,285,288,293,411 '访问':21,382 '语义':44 '语言':394 '语言基础':24 '
调用':66,68,72,89,94,97,99 '调试':302 '货币':142 '资源':6 '趋势':422 '跨':379 '转换':74,205,532 '软件':251 '输出':499 '迁移':454,460
,465 '过滤':509 '运算':86,168 '还原':457 '进程':262 '连接':280 '逻辑':85,165,345,455 '配置':227,238,286,307,376 '重组':441 '量子':78
'错误':294,504 '键':39 '防止':252,533 '限制':133 '隐含':317 '隧道':259 '集合':130 '集群':242,245,248,261 '预警':424 '预防':446 '高
':36 '默认':102,401
(1 row)
英文分词性能测试结果:
英语分词性能:~ 900万 words每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
postgres@db-192-168-173-33-> vi test.sql
select f_testts_en();
postgres@db-192-168-173-33-> pgbench -M prepared -n -f test.sql -P 5 -c 36 -j 36 -T 30000000
progress: 5.0 s, 33072.6 tps, lat 1.083 ms stddev 0.371
progress: 10.0 s, 33218.9 tps, lat 1.082 ms stddev 0.366
progress: 15.0 s, 33213.4 tps, lat 1.083 ms stddev 0.365
progress: 20.0 s, 33202.3 tps, lat 1.083 ms stddev 0.364
progress: 25.0 s, 33204.5 tps, lat 1.083 ms stddev 0.364
progress: 30.0 s, 33215.0 tps, lat 1.083 ms stddev 0.366
progress: 35.0 s, 33220.8 tps, lat 1.082 ms stddev 0.367
progress: 40.0 s, 33210.7 tps, lat 1.083 ms stddev 0.364
progress: 45.0 s, 33220.4 tps, lat 1.082 ms stddev 0.365
progress: 50.0 s, 33217.2 tps, lat 1.083 ms stddev 0.365
progress: 55.0 s, 33231.6 tps, lat 1.082 ms stddev 0.367
progress: 60.0 s, 33234.4 tps, lat 1.082 ms stddev 0.367
progress: 65.0 s, 33232.7 tps, lat 1.082 ms stddev 0.367
progress: 70.0 s, 33212.2 tps, lat 1.083 ms stddev 0.369
progress: 75.0 s, 33232.4 tps, lat 1.082 ms stddev 0.367
progress: 80.0 s, 33222.7 tps, lat 1.082 ms stddev 0.367
progress: 85.0 s, 33221.1 tps, lat 1.082 ms stddev 0.365
progress: 90.0 s, 33220.4 tps, lat 1.082 ms stddev 0.365
progress: 95.0 s, 33213.6 tps, lat 1.083 ms stddev 0.364
progress: 100.0 s, 33227.5 tps, lat 1.082 ms stddev 0.368
progress: 105.0 s, 33230.8 tps, lat 1.082 ms stddev 0.366
progress: 110.0 s, 33228.1 tps, lat 1.082 ms stddev 0.366
中文分词性能测试结果:
中文分词性能:~ 400万 字每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
postgres@db-192-168-173-33-> vi test.sql
select f_testts_zh();
postgres@db-192-168-173-33-> pgbench -M prepared -n -f test.sql -P 5 -c 36 -j 36 -T 30000000
progress: 5.0 s, 13874.6 tps, lat 2.582 ms stddev 0.892
progress: 10.0 s, 13947.0 tps, lat 2.580 ms stddev 0.875
progress: 15.0 s, 13949.0 tps, lat 2.580 ms stddev 0.874
progress: 20.0 s, 13950.1 tps, lat 2.579 ms stddev 0.871
progress: 25.0 s, 13951.1 tps, lat 2.579 ms stddev 0.876
progress: 30.0 s, 13950.6 tps, lat 2.579 ms stddev 0.873
progress: 35.0 s, 13950.5 tps, lat 2.579 ms stddev 0.877
progress: 40.0 s, 13951.8 tps, lat 2.579 ms stddev 0.876
progress: 45.0 s, 13953.2 tps, lat 2.579 ms stddev 0.878
progress: 50.0 s, 13949.9 tps, lat 2.579 ms stddev 0.876
progress: 55.0 s, 13947.4 tps, lat 2.580 ms stddev 0.875
progress: 60.0 s, 13947.8 tps, lat 2.580 ms stddev 0.877
progress: 65.0 s, 13948.4 tps, lat 2.580 ms stddev 0.877
progress: 70.0 s, 13952.1 tps, lat 2.579 ms stddev 0.878
progress: 75.0 s, 13948.1 tps, lat 2.580 ms stddev 0.877
progress: 80.0 s, 13950.3 tps, lat 2.579 ms stddev 0.875
progress: 85.0 s, 13951.3 tps, lat 2.579 ms stddev 0.875
progress: 90.0 s, 13950.9 tps, lat 2.579 ms stddev 0.879
progress: 95.0 s, 13949.6 tps, lat 2.579 ms stddev 0.876
progress: 100.0 s, 13950.1 tps, lat 2.579 ms stddev 0.876
progress: 105.0 s, 13951.9 tps, lat 2.579 ms stddev 0.874
progress: 110.0 s, 13950.7 tps, lat 2.579 ms stddev 0.878
分词+插入性能测试:
使用gist或gin索引,索引的适用场景和优化手段可以参考我以前写的文章。
(静态数据考虑GIN,动态数据考虑GIST,GIN可以通过增加WORK_MEM优化动态性能)。
digoal=> create table test_ts(ts tsvector);
CREATE TABLE
digoal=> create index idx_test_ts on test_ts using gist(ts);
CREATE INDEX
create or replace function f_testts_en() returns void as $$
declare
begin
insert into test_ts values (to_tsvector('Before you can use PostgreSQL you need to install it, of course. It is possible that PostgreSQL is already installed at your site, either because it was included in your operating system distribution or because the system administrator already installed it. If that is the case, you should obtain information from the operating system documentation or your system administrator about how to access PostgreSQL.
If you are not sure whether PostgreSQL is already available or whether you can use it for your experimentation then you can install it yourself. Doing so is not hard and it can be a good exercise. PostgreSQL can be installed by any unprivileged user; no superuser (root) access is required.
If you are installing PostgreSQL yourself, then refer to Chapter 15 for instructions on installation, and return to this guide when the installation is complete. Be sure to follow closely the section about setting up the appropriate environment variables.
If your site administrator has not set things up in the default way, you might have some more work to do. For example, if the database server machine is a remote machine, you will need to set the PGHOST environment variable to the name of the database server machine. The environment variable PGPORT might also have to be set. The bottom line is this: if you try to start an application program and it complains that it cannot connect to the database, you should consult your site administrator or, if that is you, the documentation to make sure that your environment is properly set up. If you did not understand the preceding paragraph then read the next section.'));
return;
end;
$$ language plpgsql strict;
create or replace function f_testts_zh() returns void as $$
declare
begin
insert into test_ts values ( to_tsvector('testzhcfg','
<PostgreSQL发展历程,社区介绍,资源介绍 2小时>
<PostgreSQL SQL基础 2天>
第一天
PostgreSQL的安装, 架构基础介绍, 如何创建和访问数据库;
SQL语言基础(创建表, 操作表数据, 表关联查询, 聚合查询);
SQL高级应用(视图, 外键, 事务, 窗口函数, 继承);
SQL语义结构介绍(关键字, 常量, 操作符, 特殊字符, 注释, 操作符的优先级);
值表达式介绍(列, 位置参数, 数组下标, field引用, 操作符调用, 函数调用, 聚合表达式, 窗口函数调用, 类型转换, collate表达式, 标量子查询, 数组构造器, 行构造器, 表达式逻辑运算规则);
函数调用用法介绍(位置参数调用, 命名参数调用, 混合调用);
数据定义(默认值, 约束, 系统列, 表结构和定义修改, 权限, schema, 继承, 分区, 外部数据);
第二天
数据查询(from子句, where子句, group by having子句, 窗口子句, select子句, 结果集合操作, 排序, 限制, 位移, values子句, CET用法);
数据类型(数字, 货币, 字符, 字节, 时间, 布尔, 枚举, 几何, 网络, 比特, 文本搜索, UUID, XML, JSON, 数组, 复合类型, 范围类型, 对象类型, 伪类型);
函数与操作符(逻辑, 比较, 数学运算, 字符串, 字节流, 比特位, 样式匹配, 格式化, 时间, 枚举, 几何, 网络, 全文检索, XML, JSON, 序列, 条件, 数组, 范围, 聚合, 窗口, 子查询, 行与数组的比较, SRF函数, 系统信息函数, 系统管理函数, 触发器函数, 事件触发器函数);
类型转换;
索引用法;
全文检索用法;
并行控制;
SQL性能优化;
<PostgreSQL 数据库管理 3天>
第一天
PostgreSQL源代码安装(源码结构介绍, 配置介绍, 自定义FLAG介绍, 内核参数优化, 安装);
数据库服务器配置(OS USER, 初始化集群, 启动数据库集群, 关闭数据库集群, 升级数据库软件, 防止欺骗, 数据封装方法, SSL, SSH隧道)
数据库集群进程结构介绍(postmaster, logger, checkpointer, writer, wal writer, autovacuum launcher, autovacuum worker, stats collector, backend, worker);
认证和连接(客户端认证方法介绍, 认证配置, 常用认证方法使用举例, 常见认证错误排错);
pgAdmin III 使用介绍(安装, 使用, 调试函数, 维护数据);
数据库配置(guc.c, 参数优先级, 参数含义, 开发参数, guc隐含参数, 如何修改参数并使之生效, 如何查看当前参数值, 如何查看参数值的范围);
用户管理(创建角色, 角色权限管理, 角色membership管理, 角色成员权限管理);
数据库逻辑结构(cluster, database, schema, object, field, access privilege);
数据库物理结构(tablespace, datafile, segment, block, controlfile, xlog, archivelog);
第二天
数据库管理(创建表空间, 创建数据库, 创建数据库模板, 数据库配置, 删除数据库, 跨数据库的数据访问);
基于角色的对象权限管理(表, 列, 序列, 数据库, 域, 外部数据, 函数, 语言, 大对象, schema, 表空间, 类型);
默认权限和继承权限(default privilege, inherit privilege);
数据库安全(认证, 网络, 数据存储, 密码, 注入, 欺骗);
数据库监控(状态监控, 趋势监控, 预警, 常用监控工具和插件nagios, zabbix, pg_statsinfo);
多版本并发控制和垃圾回收;
日常维护(垃圾回收, 数据重组, 索引维护, VM维护, 预防XID溢出, 日志维护);
第三天
数据迁移(逻辑备份和还原, 增量数据迁移的方法, 异构数据迁移的方法(如oracle to postgresql));
如何打数据库补丁;
数据库版本升级(小版本升级方法, 大版本升级方法);
建模与Benchmark(如何根据业务形态建立数据库测试模型, 测试工具的使用, 根据测试模型和硬件标准输出benchmark);
数据库日志分析(错误代码介绍, 日志的筛选过滤, 日志的维护);
数据库审计(参数层面的审计开关, 定制化审计(如触发器结合HSTORE), 审计数据的维护手段);
本地化(字符集, collate, 编码转换, 如何防止乱码);
数据库性能分析(OS性能分析报表, SQL性能分析报表);
数据库巡检(如何定制巡检项, 指标, 如何分析巡检报告);'));
return;
end;
$$ language plpgsql strict;
英语分词+插入性能:
英文分词+插入性能:~ 666万 字每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
postgres@db-192-168-173-33-> pgbench -M prepared -n -f test.sql -P 5 -c 36 -j 36 -T 30000000
progress: 5.0 s, 24825.6 tps, lat 1.443 ms stddev 1.082
progress: 10.0 s, 24609.2 tps, lat 1.462 ms stddev 1.112
progress: 15.0 s, 24766.2 tps, lat 1.452 ms stddev 1.106
progress: 20.0 s, 24578.8 tps, lat 1.463 ms stddev 1.095
progress: 25.0 s, 24761.1 tps, lat 1.453 ms stddev 1.106
progress: 30.0 s, 24558.6 tps, lat 1.465 ms stddev 1.129
progress: 35.0 s, 22420.1 tps, lat 1.604 ms stddev 1.376
progress: 40.0 s, 25480.7 tps, lat 1.412 ms stddev 1.108
progress: 45.0 s, 25528.4 tps, lat 1.409 ms stddev 1.097
progress: 50.0 s, 25239.3 tps, lat 1.425 ms stddev 1.113
progress: 55.0 s, 25236.4 tps, lat 1.425 ms stddev 1.117
progress: 60.0 s, 25392.1 tps, lat 1.417 ms stddev 1.079
progress: 65.0 s, 25521.0 tps, lat 1.409 ms stddev 1.100
progress: 70.0 s, 25154.2 tps, lat 1.430 ms stddev 1.117
progress: 75.0 s, 25260.6 tps, lat 1.424 ms stddev 1.105
progress: 80.0 s, 25364.2 tps, lat 1.418 ms stddev 1.086
progress: 85.0 s, 25383.5 tps, lat 1.417 ms stddev 1.094
progress: 90.0 s, 25392.8 tps, lat 1.417 ms stddev 1.116
progress: 95.0 s, 25651.1 tps, lat 1.402 ms stddev 1.102
progress: 100.0 s, 25663.7 tps, lat 1.401 ms stddev 1.082
progress: 105.0 s, 24486.3 tps, lat 1.469 ms stddev 1.204
progress: 110.0 s, 22847.7 tps, lat 1.574 ms stddev 1.362
progress: 115.0 s, 24687.0 tps, lat 1.457 ms stddev 1.248
progress: 120.0 s, 25433.6 tps, lat 1.414 ms stddev 1.095
progress: 125.0 s, 25667.7 tps, lat 1.401 ms stddev 1.094
progress: 130.0 s, 25581.7 tps, lat 1.406 ms stddev 1.116
progress: 135.0 s, 25411.5 tps, lat 1.415 ms stddev 1.128
progress: 140.0 s, 25507.5 tps, lat 1.410 ms stddev 1.105
progress: 145.0 s, 25503.0 tps, lat 1.410 ms stddev 1.087
progress: 150.0 s, 25533.5 tps, lat 1.409 ms stddev 1.110
progress: 155.0 s, 25708.8 tps, lat 1.399 ms stddev 1.087
progress: 160.0 s, 25574.1 tps, lat 1.406 ms stddev 1.118
progress: 165.0 s, 25580.5 tps, lat 1.406 ms stddev 1.098
progress: 170.0 s, 25428.9 tps, lat 1.414 ms stddev 1.111
progress: 175.0 s, 25522.0 tps, lat 1.409 ms stddev 1.102
progress: 180.0 s, 23641.7 tps, lat 1.521 ms stddev 1.276
progress: 185.0 s, 23065.6 tps, lat 1.559 ms stddev 1.425
progress: 190.0 s, 24623.5 tps, lat 1.461 ms stddev 1.218
progress: 195.0 s, 25486.2 tps, lat 1.411 ms stddev 1.142
progress: 200.0 s, 25507.4 tps, lat 1.410 ms stddev 1.108
中文分词+插入性能:
中文分词+插入性能:~ 290万 字每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
postgres@db-192-168-173-33-> pgbench -M prepared -n -f test.sql -P 5 -c 36 -j 36 -T 30000000
progress: 5.0 s, 10934.3 tps, lat 3.277 ms stddev 2.349
progress: 10.0 s, 10855.9 tps, lat 3.315 ms stddev 2.359
progress: 15.0 s, 10828.8 tps, lat 3.323 ms stddev 2.426
progress: 20.0 s, 10849.6 tps, lat 3.316 ms stddev 2.361
progress: 25.0 s, 10909.1 tps, lat 3.299 ms stddev 2.320
progress: 30.0 s, 10855.7 tps, lat 3.315 ms stddev 2.388
progress: 35.0 s, 10857.4 tps, lat 3.315 ms stddev 2.367
progress: 40.0 s, 10799.4 tps, lat 3.332 ms stddev 2.503
progress: 45.0 s, 10962.9 tps, lat 3.283 ms stddev 2.390
progress: 50.0 s, 10843.2 tps, lat 3.319 ms stddev 2.466
progress: 55.0 s, 9665.7 tps, lat 3.723 ms stddev 3.115
progress: 60.0 s, 9950.5 tps, lat 3.618 ms stddev 2.918
progress: 65.0 s, 10822.1 tps, lat 3.324 ms stddev 2.470
progress: 70.0 s, 10764.6 tps, lat 3.343 ms stddev 2.517
progress: 75.0 s, 10887.8 tps, lat 3.306 ms stddev 2.481
progress: 80.0 s, 10897.6 tps, lat 3.303 ms stddev 2.474
progress: 85.0 s, 10814.6 tps, lat 3.328 ms stddev 2.458
progress: 90.0 s, 10805.0 tps, lat 3.330 ms stddev 2.552
progress: 95.0 s, 10921.5 tps, lat 3.296 ms stddev 2.422
progress: 100.0 s, 10820.0 tps, lat 3.326 ms stddev 2.518
progress: 105.0 s, 9833.8 tps, lat 3.659 ms stddev 3.212
progress: 110.0 s, 9609.7 tps, lat 3.744 ms stddev 3.084
progress: 115.0 s, 10159.1 tps, lat 3.543 ms stddev 2.917
progress: 120.0 s, 10849.9 tps, lat 3.316 ms stddev 2.520
progress: 125.0 s, 10820.0 tps, lat 3.327 ms stddev 2.505
progress: 130.0 s, 10828.0 tps, lat 3.323 ms stddev 2.549
progress: 135.0 s, 10819.1 tps, lat 3.326 ms stddev 2.536
查询性能:
查询性能视查询条件,数据内容而定。一般可以在毫秒级返回。当然还要关注实际查询条件和索引存储结构,有些查询组合可能使查询时间较长,因为需要扫描的数据链路更长。
digoal=> explain analyze select * from test_ts where ts @@ tsquery '源代码' limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
------
Limit (cost=0.41..0.85 rows=1 width=4426) (actual time=0.156..0.156 rows=1 loops=1)
-> Index Scan using idx_test_ts on test_ts (cost=0.41..1039026.29 rows=2371193 width=4426) (actual time=0.154..0.154 rows=1 loo
ps=1)
Index Cond: (ts @@ '''源代码'''::tsquery)
Planning time: 0.222 ms
Execution time: 0.236 ms
(5 rows)
Time: 0.881 ms
digoal=> explain analyze select * from test_ts where ts @@ tsquery '源代码 & !集群' limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Limit (cost=178.00..180.01 rows=1 width=4426) (actual time=363.066..363.066 rows=0 loops=1)
-> Bitmap Heap Scan on test_ts (cost=178.00..180.01 rows=1 width=4426) (actual time=363.064..363.064 rows=0 loops=1)
Recheck Cond: (ts @@ '''源代码'' & !''集群'''::tsquery)
-> Bitmap Index Scan on idx_1 (cost=0.00..178.00 rows=1 width=0) (actual time=363.061..363.061 rows=0 loops=1)
Index Cond: (ts @@ '''源代码'' & !''集群'''::tsquery)
Planning time: 0.167 ms
Execution time: 363.105 ms
(7 rows)
Time: 363.683 ms
digoal=> explain analyze select * from test_ts where ts @@ tsquery '德哥' limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Limit (cost=181.88..183.68 rows=1 width=4426) (actual time=0.019..0.019 rows=0 loops=1)
-> Bitmap Heap Scan on test_ts (cost=181.88..21478.97 rows=11856 width=4426) (actual time=0.018..0.018 rows=0 loops=1)
Recheck Cond: (ts @@ '''德哥'''::tsquery)
-> Bitmap Index Scan on idx_1 (cost=0.00..178.92 rows=11856 width=0) (actual time=0.015..0.015 rows=0 loops=1)
Index Cond: (ts @@ '''德哥'''::tsquery)
Planning time: 0.180 ms
Execution time: 0.052 ms
(7 rows)
Time: 0.592 ms
存储空间:
digoal=> select count(*) from test_ts;
count
---------
2371193
(1 row)
约6.9亿中文字,分词后约6.4亿中英文词组。
分词占用10GB,索引占用5GB
digoal=> \dt+ test_ts
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------+-------+--------+-------+-------------
digoal | test_ts | table | digoal | 10 GB |
(1 row)
digoal=> \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-------------+-------+--------+---------+---------+-------------
digoal | idx_test_ts | index | digoal | test_ts | 4959 MB |
(1 row)
[参考]
1. http://blog.163.com/digoal@126/b ... 040201422410175698/
2. http://blog.163.com/digoal@126/b ... 040201552102814822/ |
|