您现在的位置是:网站首页> 编程资料编程资料
postgresql关于like%xxx%的优化操作_PostgreSQL_
2023-05-27
423人已围观
简介 postgresql关于like%xxx%的优化操作_PostgreSQL_
任何一个关系型数据库关于模糊匹配(like)的优化都是一件痛苦的事,相对而言,诸如like 'abc%'之类的还好一点,可以通过创建索引来优化,但对于like 'c%'之类的,真的就没有办法了。
这里介绍一种postgresql关于like 'c%'的优化方法,是基于全文检索的特性来实现的。
测试数据准备(环境centos6.5 + postgresql 9.6.1)。
postgres=# create table ts(id int,name text); CREATE TABLE postgres=# \d ts Table "public.ts" Column | Type | Modifiers --------+---------+----------- id | integer | name | text | postgres=# insert into ts select n,n||'_pjy' from generate_series(1,2000) n; INSERT 0 2000 postgres=# insert into ts select n,n||'_mdh' from generate_series(1,2000000) n; INSERT 0 2000000 postgres=# insert into ts select n,n||'_lmm' from generate_series(1,2000000) n; INSERT 0 2000000 postgres=# insert into ts select n,n||'_syf' from generate_series(1,2000000) n; INSERT 0 2000000 postgres=# insert into ts select n,n||'_wbd' from generate_series(1,2000000) n; INSERT 0 2000000 postgres=# insert into ts select n,n||'_hhh' from generate_series(1,2000000) n; INSERT 0 2000000 postgres=# insert into ts select n,n||'_sjw' from generate_series(1,2000000) n; INSERT 0 2000000 postgres=# insert into ts select n,n||'_jjs' from generate_series(1,2000000) n; INSERT 0 2000000 postgres=# insert into ts select n,n||'_ymd' from generate_series(1,2000000) n; INSERT 0 2000000 postgres=# insert into ts select n,n||'_biu' from generate_series(1,2000000) n; INSERT 0 2000000 postgres=# insert into ts select n,n||'_dfl' from generate_series(1,2000000) n; INSERT 0 2000000 postgres=# select count(*) from ts; count ---------- 20002000 (1 row)
开始测试:
postgres=# explain analyze select * from ts where name like '%pjy%'; QUERY PLAN ----------------------------------------------------------------------------------------------------------- Seq Scan on ts (cost=0.00..358144.05 rows=2000 width=15) (actual time=0.006..1877.087 rows=2000 loops=1) Filter: (name ~~ '%pjy%'::text) Rows Removed by Filter: 20000000 Planning time: 0.031 ms Execution time: 1877.178 ms (5 rows)
关键一步:
postgres=# create index idx_name on ts using gin (to_tsvector('english',name)); CREATE INDEX postgres=# vacuum analyze ts; VACUUM postgres=# \d ts Table "public.ts" Column | Type | Modifiers --------+---------+----------- id | integer | name | text | Indexes: "idx_name" gin (to_tsvector('english'::regconfig, name)) postgres=# explain analyze select * from ts where to_tsvector('english',name) @@ to_tsquery('pjy'); QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on ts (cost=39.75..8187.70 rows=2000 width=15) (actual time=0.016..0.016 rows=0 loops=1) Recheck Cond: (to_tsvector('english'::regconfig, name) @@ to_tsquery('pjy'::text)) -> Bitmap Index Scan on idx_name (cost=0.00..39.25 rows=2000 width=0) (actual time=0.016..0.016 rows=0 loops=1) Index Cond: (to_tsvector('english'::regconfig, name) @@ to_tsquery('pjy'::text)) Planning time: 0.094 ms Execution time: 0.036 ms (6 rows)大家可以看到,执行时间从2秒下降到了0.04毫秒!!!
关于pg的全文检索,tsvector和tsquery,这里就不详细介绍了,大家可以自己查阅手册。
补充:postgresql子查询优化(提升子查询)
问题背景
在开发项目过程中,客户要求使用gbase8s数据库(基于informix),简单的分页页面响应很慢。排查发现分页sql是先查询出数据在外面套一层后再取多少条,如果去掉嵌套的一层,直接获取则很快。日常使用中postgresql并没有这样的操作也很快,这是为什么呢?
说明
在数据库实现早期,查询优化器对子查询一般采用嵌套执行的方式,即父查询中的每一行,都要执行一次子查询,这样子查询会执行很多次,效率非常低。
本篇主要讲postgresql针对子查询的优化。
项目中使用子查询的地方非常多,如何写出高效的sql,掌握子查询的优化是非常有必要的。
执行计划对比(gbase8s vs postgresql):
gbase8s慢sql执行计划:
--gbase8s执行计划
SET EXPLAIN ON ; SET EXPLAIN FILE TO '/home/gbasedbt/sqexplain.out' ; select skip 0 first 15 * from ( select * from T_SZGL_JDRY order by T_SZGL_JDRY.updatetime desc ) Estimated Cost: 3207 Estimated # of Rows Returned: 6172 1) gbasedbt.t_szgl_jdry: INDEX PATH (1) Index Name: gbasedbt.i_t_szgl_jdry_updatetime Index Keys: updatetime (Reverse) (Serial, fragments: ALL) QUERY: (OPTIMIZATION TIMESTAMP: 12-21-2017 03:20:43) ------ select skip 0 first 15 * from ( select * from T_SZGL_JDRY order by T_SZGL_JDRY.updatetime desc ) Estimated Cost: 232 Estimated # of Rows Returned: 6172 1) (Temp Table For Collection Subquery): SEQUENTIAL SCAN Query statistics: ----------------- The final cost of the plan is reduced because of the FIRST n specification in the query. Table map : ---------------------------- Internal name Table name ---------------------------- t1 t_szgl_jdry t2 (Temp Table For Collection Subquery) type table rows_prod est_rows rows_scan time est_cost ------------------------------------------------------------------- scan t1 6173 6172 6173 00:00.05 3207 --查询执行用 222 ms,15行受影响
gbase8s修改后执行计划
select skip 0 first 15 * from T_SZGL_JDRY order by T_SZGL_JDRY.updatetime desc Estimated Cost: 7 Estimated # of Rows Returned: 6172 1) gbasedbt.t_szgl_jdry: INDEX PATH (1) Index Name: gbasedbt.i_t_szgl_jdry_updatetime Index Keys: updatetime (Reverse) (Serial, fragments: ALL) Query statistics: ----------------- The final cost of the plan is reduced because of the FIRST n specification in the query. Table map : ---------------------------- Internal name Table name ---------------------------- t1 t_szgl_jdry type table rows_prod est_rows rows_scan time est_cost ------------------------------------------------------------------- scan t1 15 6172 15 00:00.00 8 QUERY: (OPTIMIZATION TIMESTAMP: 12-21-2017 03:23:25) ------ select 1 from sysusers Estimated Cost: 2 Estimated # of Rows Returned: 1 1) gbasedbt.sysusers: SEQUENTIAL SCAN ... --查询执行用 18 ms,15行受影响
第一个执行计划中 (1) (Temp Table For Collection Subquery): SEQUENTIAL SCAN)可以看出是将子查询的结果查询出来后,在这个基础上获取了15条记录
对比postgresql执行计划
--分页执行计划-不嵌套
db_jcxxzypt=# explain select * from db_jcxx.t_jcxxzy_tjaj order by d_slrq limit 15 offset 0; QUERY PLAN ------------------------------------------------------------------------- Limit (cost=0.44..28.17 rows=15 width=879) -> Index Scan using idx_ttjaj_dslrq on t_jcxxzy_tjaj (cost=0.44..32374439.85 rows=17507700 width=879) (2 rows) --子查询执行计划-嵌套一层 db_jcxxzypt=# explain db_jcxxzypt-# select * from ( db_jcxxzypt(# select * from db_jcxx.t_jcxxzy_tjaj order by d_slrq db_jcxxzypt(# )tab1 limit 15 offset 0; QUERY PLAN ------------------------------------------------------------------------- Limit (cost=0.44..28.32 rows=15 width=879) -> Index Scan using idx_ttjaj_dslrq on t_jcxxzy_tjaj (cost=0.44..32374439.85 rows=17507700 width=879) (2 rows) --子查询执行计划-嵌套两层 db_jcxxzypt=# explain db_jcxxzypt-# select * from ( db_jcxxzypt(# select * from ( db_jcxxzypt(# select * from db_jcxx.t_jcxxzy_tjaj order by d_slrq db_jcxxzypt(# )tab1 )tab2 limit 15 offset 0; QUERY PLAN ------------------------------------------------------------------------- Limit (cost=0.44..28.32 rows=15 width=879) -> Index Scan using idx_ttjaj_dslrq on t_jcxxzy_tjaj (cost=0.44..32374439.85 rows=17507700 width=879) (2 rows)
postgresql的子查询即使嵌套多层,执行计划还是和未嵌套一样。原因就是postgresql在重写sql的阶段上拉子查询(提升子查询),把子查询合并到父查询中。
postgresql子查询优化
子查询可分为三类:一、([not]in/all/any/some),二、([not]exists),三、其他子查询(sjp子查询 选择、投影、连接)
子查询可以出现在目标列、form子句、where子句、join/on子句、group by子句、having子句、orderby子句等位置。
db_jcxxzypt=# explain select * from t_jcxxzy_tjaj aj ,(select * from t_jcxxzy_ajdsr) dsr where dsr.c_ajbm = '1301020400000120090101'; QUERY PLAN ------------------------------------------------------------------------- Nested Loop (cost=0.56..1252119.58 rows=17507700 width=1098) -> Index Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr (cost=0.56..8.57 rows=1 width=219) Index Cond: (c_ajbm = '1301020400000120090101'::bpchar) -> Seq Scan on t_jcxxzy_tjaj aj (cost=0.00..1077034.00 rows=17507700 width=879) (4 rows) Time: 1.101 ms
postgresql子链接([not]in,[not]exists,all,some,any)
子查询和子链接区别:子查询是不在表达式中的子句,子链接在表达式中的子句
--in子链接
(1).
db_jcxxzypt=# explain select * from t_jcxxzy_tjaj aj where aj.c_ajbm in (select dsr.c_ajbm from t_jcxxzy_ajdsr dsr); 转化为: select * from t_jcxxzy_tjaj aj join t_jcxxzy_ajdsr dsr aj.c_ajbm = dsr.c_ajbm; QUERY PLAN ------------------------------------------------------------------------- Hash Semi Join (cost=362618.61..5537768.07 rows=7957409 width=879) Hash Cond: (t_jcxxzy_tjaj.c_ajbm = t_jcxxzy_ajdsr.c_ajbm) -> Seq Scan on t_jcxxzy_tjaj (cost=0.00..1077034.00 rows=17507700 width=879) -> Hash (cost=237458.59..237458.59 rows=6817202 width=23) -> Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr (cost=0.56..237458.59 rows=6817202 wi dth=23) (5 rows) --in等价于=any hash semi join表示执行的是两张表的hash半连接, 原始sql中没有(t_jcxxzy_tjaj.c_ajbm = t_jcxxzy_ajdsr.c_ajbm),表明此in子查询被优化,优化后采用hash semi join算法。 (2).相关子查询 --当加入条件where aj.d_slrq='2001-06-14'后不能提升子链接,如果把where aj.d_slrq ='2001-06-14'放到父查询 是支持子链接优化的 db_jcxxzypt=# explain db_jcxxzypt-# select * from t_jcxxzy_tjaj aj where c_ajbm in (select c_ajbm from t_jcxxzy_ajdsr dsr where aj.d_slrq='2001-06-14') ; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on t_jcxxzy_tjaj aj (cost=0.00..2227874766580.75 rows=8753850 width=879) Filter: (SubPlan 1) SubPlan 1 -> Result (cost=0.56..237458.59 rows=6817202 width=23) One-Time Filter: (aj.d_slrq = '2001-06-14'::date) -> Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr dsr (cost=0.56..237458.59 rows=6817 202 width=23) (6 rows (3). -- not in不能提升子链接 db_jcxxzypt=# explain select * from db_jcxx.t_jcxxzy_tjaj where c_ajbm not in (select c_ajbm from db_jcxx.t_jcxxzy_ajdsr); QUERY PLAN ------------------------------------------------------------------------- Seq Scan on t_jcxxzy_tjaj (cost=0.56..2875921362927.06 rows=8753850 width=879) Filter: (NOT (SubPlan 1)
相关内容
- PostgreSQL模糊匹配走索引的操作_PostgreSQL_
- PostgreSQL忘记postgres账号密码的解决方法_PostgreSQL_
- postgresql模糊匹配大杀器(推荐)_PostgreSQL_
- PostgreSQL 实现子查询返回多行的案例_PostgreSQL_
- Visual Studio Code(VS Code)查询PostgreSQL拓展安装教程图解_PostgreSQL_
- PostgreSQL利用递归优化求稀疏列唯一值的方法_PostgreSQL_
- 浅析postgresql 数据库 TimescaleDB 修改分区时间范围_PostgreSQL_
- Postgresql分布式插件plproxy的使用详解_PostgreSQL_
- 在PostgreSQL中设置表中某列值自增或循环方式_PostgreSQL_
- 基于PostgreSQL密码重置操作_PostgreSQL_
点击排行
本栏推荐
