PostgreSQL pg_stats used to estimate top N freps values and explain rows
標籤
PostgreSQL , 陣列 , pg_stats , 統計資訊 , TOP N , 詞頻
背景
本文要聊的是如何利用統計資訊規避一些複雜的精確統計.
去年寫過一篇關於 PostgreSQL 9.2 新增array元素統計收集的相關文章. 本文也會講到.
ofollow,noindex" target="_blank">《PostgreSQL 9.2 add array elements statistics》
Release 9.2 Release Date: 2012-09-10 * Move the frequently accessed members of the PGPROC shared memory array to a separate array (Pavan Deolasee, Heikki Linnakangas, Robert Haas)
在日常的資料庫統計中, count(*), 排名這類的統計非常多, 同時這類統計的開銷也非常大, 特別是當表的資料量巨大時.
接下來模擬幾個場景。
1、統計某條件下的記錄條數.
digoal=# create table test_1 (id serial4 primary key, info text, appid int, crt_time timestamp); CREATE TABLE digoal=# insert into test_1 (info,appid,crt_time) select md5(random()::text),round(10000*random())::int,clock_timestamp() from generate_series(1,2000000); INSERT 0 2000000
預設的統計目標值為100
digoal=# show default_statistics_target ; -[ RECORD 1 ]-------------+---- default_statistics_target | 100 digoal=# analyze test_1; ANALYZE
為了得到準確的統計資訊, 如果沒有開啟autovacuum, 最好手動收集一次統計資訊.
digoal=# select * from pg_stat_all_tables where relname='test_1'; -[ RECORD 1 ]-----+------------------------------ relid| 91368 schemaname| public relname| test_1 seq_scan| 1 seq_tup_read| 0 idx_scan| 0 idx_tup_fetch| 0 n_tup_ins| 2000000 n_tup_upd| 0 n_tup_del| 0 n_tup_hot_upd| 0 n_live_tup| 2000000 n_dead_tup| 0 last_vacuum| last_autovacuum| last_analyze| 2013-08-11 11:15:04.975523+08 last_autoanalyze| 2013-08-11 11:14:53.663951+08 vacuum_count| 0 autovacuum_count| 0 analyze_count| 1 autoanalyze_count | 1
查詢appid=1的記錄數有多少
digoal=# select count(*) from test_1 where appid=1; count ------- 189 (1 row)
使用explain輸出appid=1的記錄數有多少, 這裡顯示為197. 和使用count(*)得到的存在一點差異.
digoal=# explain select * from test_1 where appid=1; QUERY PLAN ------------------------------------------------------------- Seq Scan on test_1(cost=0.00..45619.00 rows=197 width=49) Filter: (appid = 1) (2 rows)
如果將統計目標調整為10000, 取樣行數會大大增加, 消耗的資源加大.
digoal=# alter table test_1 alter column appid SET STATISTICS 10000; ALTER TABLE digoal=# analyze verbose test_1; INFO:analyzing "public.test_1" INFO:"test_1": scanned 20619 of 20619 pages, containing 2000000 live rows and 0 dead rows; 2000000 rows in sample, 2000000 estimated total rows ANALYZE
但是統計行數更加準確了, 現在為188. 只相差1行.
digoal=# explain select * from test_1 where appid=1; QUERY PLAN ------------------------------------------------------------- Seq Scan on test_1(cost=0.00..45619.00 rows=188 width=49) Filter: (appid = 1) (2 rows)
範圍查詢看看是否準確呢?
digoal=# explain select * from test_1 where appid>1000; QUERY PLAN ----------------------------------------------------------------- Seq Scan on test_1(cost=0.00..45619.00 rows=1800419 width=49) Filter: (appid > 1000) (2 rows)
也非常準確.
digoal=# select count(*) from test_1 where appid>1000; count --------- 1800263 (1 row)
誤差為 : 0.00008665
組合條件的輸出行評估 :
digoal=# explain select * from test_1 where appid>1000 and crt_time>now(); QUERY PLAN ------------------------------------------------------------- Seq Scan on test_1(cost=0.00..55619.00 rows=180 width=49) Filter: ((appid > 1000) AND (crt_time > now())) (2 rows) digoal=# select * from test_1 where appid>1000 and crt_time>now(); id | info | appid | crt_time ----+------+-------+---------- (0 rows)
2、分組排行, 例如要查詢哪個appid的記錄條數最多.
digoal=# alter table test_1 alter column appid SET STATISTICS 100; ALTER TABLE digoal=# analyze verbose test_1; INFO:analyzing "public.test_1" INFO:"test_1": scanned 20619 of 20619 pages, containing 2000000 live rows and 0 dead rows; 30000 rows in sample, 2000000 estimated total rows ANALYZE digoal=# select most_common_vals,most_common_freqs from pg_stats where tablename='test_1' and attname='appid'; -[ RECORD 1 ]-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- most_common_vals| {3677,6460,1210,1291,2052,3374,3633,4386,4729,4756,5602,320,383,479,906,1003,1018,1102,1243,2594,2625,2762,3092,3243,3376,3511,4842,5595,5967,6135,6412,6821,6824,6966,7828,7984,8118,8310,8378,8952,9012,9840,9922,22,68,359,632,899,933,1034,1227,1369,1554,1615,1706,1744,1824,1995,2034,2056,2215,2412,2770,2988,3488,3722,3780,3834,3937,4079,4124,4224,4424,4723,4811,4870,5287,5490,5596,5609,5665,5751,5881,6236,6562,6656,6694,6827,6865,6980,6996,7008,7021,7097,7274,7285,7289,7330,7367,7449} most_common_freqs | {0.0004,0.000366667,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667}
值太均勻, 所以這個不準確.
digoal=# select appid,count(*) from test_1 group by appid order by count(*) desc limit 100; appid | count -------+------- 9853 |253 6502 |249 1688 |249 464 |249 9974 |248 1540 |248 6622 |247 6669 |247 4643 |247 1046 |246 3051 |246 6359 |246 9103 |246 348 |246 2213 |244 138 |244 8135 |244 3980 |244 5870 |243 9349 |243 6210 |243 4575 |243 3421 |242 207 |242 3224 |242 7056 |242 4561 |242 8770 |241 3011 |241 3731 |241 4951 |241 1066 |240 5501 |240 9354 |240 7430 |240 7621 |240 2058 |240 5460 |240 6578 |239 7431 |239 5473 |239 7305 |239 9563 |239 3275 |239 2968 |239 8825 |239 3426 |238 3850 |238 6835 |238 5928 |238 8567 |238 4083 |238 1137 |238 4862 |238 4238 |238 1058 |238 6745 |237 5854 |237 3196 |237 3165 |237 724 |237 9643 |237 7326 |237 6661 |237 3685 |236 2590 |236 9685 |236 8366 |236 3931 |236 7074 |236 6140 |236 4402 |236 4635 |236 7628 |236 5967 |236 24 |236 987 |236 2472 |236 8724 |236 6404 |236 9504 |235 5816 |235 1261 |235 5551 |235 874 |235 1880 |235 5248 |235 404 |235 5738 |235 583 |235 7799 |235 2362 |235 1789 |235 7707 |235 3091 |234 9245 |234 6107 |234 8657 |234 7460 |234 2252 |234 (100 rows)
當傾斜較大時(符合現實世界理論,二八原則), 就特別準.
digoal=#insert into test_1 (info,appid,crt_time) select 'test',1,now() from generate_series(1,100000); INSERT 0 100000 digoal=# analyze verbose test_1; INFO:analyzing "public.test_1" INFO:"test_1": scanned 21262 of 21262 pages, containing 2101000 live rows and 0 dead rows; 30000 rows in sample, 2101000 estimated total rows ANALYZE digoal=# select appid,count(*) from test_1 group by appid order by count(*) desc limit 5; appid | count -------+-------- 1 | 101189 9853 |253 6502 |249 464 |249 1688 |249 (5 rows)
3、陣列的元素值排行統計
例如有一個表中記錄了客戶允許的應用程式資訊, 應用程式儲存為一個數組, 程式以id形式存到陣列中.
如果要統計全國終端中最火爆的程式排行.
digoal=# create table test_2(id serial primary key, appid int[], crt_time timestamp); CREATE TABLE
假設appid為0-10的程式比較火爆, 模擬100秒插入請求.
vi test.sql insert into test_2(appid) select array_agg(appid) appid_agg from (select round(10*random())::int as appid from generate_series(1,20)) t; pg93@db-172-16-3-33-> pgbench -M prepared -n -r -f ./test.sql -c 16 -j 4 -T 100 digoal transaction type: Custom query scaling factor: 1 query mode: prepared number of clients: 16 number of threads: 4 duration: 100 s number of transactions actually processed: 389701 tps = 3896.755035 (including connections establishing) tps = 3897.686073 (excluding connections establishing) statement latencies in milliseconds: 4.103117insert into test_2(appid) select array_agg(appid) appid_agg from (select round(10*random())::int as appid from generate_series(1,20)) t;
假設appid為10以上的程式不火爆, 模擬10秒插入請求.
vi test.sql insert into test_2(appid) select array_agg(appid) appid_agg from (select round(1000*random())::int as appid from generate_series(1,20)) t; pg93@db-172-16-3-33-> pgbench -M prepared -n -r -f ./test.sql -c 16 -j 4 -T 10 digoal transaction type: Custom query scaling factor: 1 query mode: prepared number of clients: 16 number of threads: 4 duration: 10 s number of transactions actually processed: 37256 tps = 3718.362208 (including connections establishing) tps = 3725.838361 (excluding connections establishing) statement latencies in milliseconds: 4.290334insert into test_2(appid) select array_agg(appid) appid_agg from (select round(1000*random())::int as appid from generate_series(1,20)) t;
總記錄數如下 :
digoal=# select count(*) from test_2; count -------- 472583 (1 row)
收集統計資訊
digoal=# analyze verbose test_2; INFO:analyzing "public.test_2" INFO:"test_2": scanned 8184 of 8184 pages, containing 472583 live rows and 0 dead rows; 30000 rows in sample, 472583 estimated total rows ANALYZE
通過group by和order by獲取實際的排名 :
digoal=# select appid,count(*) from (select unnest(appid) as appid from test_2) t group by appid order by count(*) desc limit 20; appid | count -------+-------- 9 | 872831 6 | 871908 3 | 871867 7 | 871551 8 | 871436 4 | 871391 1 | 871051 5 | 870770 2 | 870692 10 | 435583 0 | 435342 387 |831 69 |824 665 |822 703 |816 651 |811 520 |809 435 |809 783 |806 671 |806 (20 rows)
接下來要用9.2新增的array 統計資訊檢視排名了 :
注意元素的收集個數為default_statistics_target 的10倍, 這個可以參考原始碼 :
+/* +* Set up static pointer for use by subroutines.We wait till here in +* case std_compute_stats somehow recursively invokes us (probably not +* possible, but ...) +*/ +array_extra_data = extra_data; + +/* +* We want statistics_target * 10 elements in the MCELEM array. This +* multiplier is pretty arbitrary, but is meant to reflect the fact that +* the number of individual elements tracked in pg_statistic ought to be +* more than the number of values for a simple scalar column. +*/ +num_mcelem = stats->attr->attstattarget * 10;
所以預設default_statistics_target=100的情況下, 將會收集1000個數組型元素的出現概率.
digoal=# select array_length(most_common_elems,1),array_length(most_common_elem_freqs,1) from pg_stats where tablename='test_2' and attname='appid'; -[ RECORD 1 ]+----- array_length | 1000 array_length | 1003 most_common_elem_freqs中最後3個的值代表min, max, null_freqs. 具體見
《PostgreSQL 9.2 add array elements statistics》
如果把統計資訊修改為50, 那麼統計資訊的元素個數將變成500.
digoal=# alter table test_2 alter column appid set statistics 50; ALTER TABLE digoal=# analyze verbose test_2; INFO:analyzing "public.test_2" INFO:"test_2": scanned 8184 of 8184 pages, containing 472583 live rows and 0 dead rows; 30000 rows in sample, 472583 estimated total rows ANALYZE digoal=# select array_length(most_common_elems,1),array_length(most_common_elem_freqs,1) from pg_stats where tablename='test_2' and attname='appid'; -[ RECORD 1 ]+---- array_length | 500 array_length | 503
下面是統計資訊 :
digoal=# select most_common_elems,most_common_elem_freqs from pg_stats where tablename='test_2' and attname='appid'; most_common_elems| {0,1,2,3,4,5,6,7,8,9,10,12,13,15,16,17,18,22,23,24,27,30,31,32,34,35,36,37,40,41,42,44,49,50,52,55,56,60,63,67,69,73,77,78,81,85,89,90,94,95,97,100,101,103,104,108,109,114,116,117,120,121,123,125,128,132,133,134,136,139,140,143,144,146,149,150,151,152,153,154,155,158,159,162,165,167,170,172,173,174,175,180,182,183,184,186,189,192,196,197,200,202,208,209,210,211,214,217,221,222,226,227,232,233,234,239,241,242,243,245,246,249,250,253,254,259,260,263,264,265,268,269,272,273,275,276,277,281,283,287,288,289,291,294,295,296,297,300,301,304,305,307,311,315,316,317,318,322,324,325,333,334,335,338,340,342,344,346,348,351,352,353,354,355,357,359,362,363,365,366,367,368,370,374,376,377,379,382,384,387,389,391,393,397,398,400,401,402,406,407,410,415,418,419,420,422,423,424,425,428,430,431,432,433,434,436,437,438,441,442,443,445,446,448,449,450,451,455,456,457,458,460,463,464,465,467,470,471,472,474,476,478,481,483,484,486,488,490,491,492,494,495,496,498,499,503,504,505,510,512,513,514,515,518,520,522,523,529,530,532,538,539,542,543,545,553,556,559,560,562,565,567,568,569,571,572,574,575,576,579,580,582,587,588,589,593,596,597,599,601,605,606,607,608,609,610,612,616,617,619,621,622,624,626,628,631,632,637,638,642,644,646,648,655,656,657,660,661,663,664,665,666,667,669,670,672,675,678,679,681,682,683,685,687,689,691,692,693,696,697,698,699,701,702,703,707,708,709,712,714,716,718,719,720,722,723,724,725,727,728,729,734,735,736,738,741,742,743,744,746,747,752,753,759,760,761,762,763,767,769,770,772,773,775,776,777,778,779,782,784,785,787,788,789,790,791,792,795,797,799,800,801,804,805,807,809,811,815,816,818,819,824,825,827,828,829,830,831,832,833,836,838,840,842,843,844,847,848,850,851,854,858,860,861,862,864,865,868,869,870,873,874,875,880,881,884,886,893,894,895,898,900,904,906,907,912,913,914,922,924,928,931,933,939,941,942,944,946,947,954,955,956,957,960,961,962,963,965,967,968,970,972,975,978,987,988,991,994,996,997} most_common_elem_freqs | {0.590833,0.810967,0.809233,0.808433,0.807467,0.806667,0.808067,0.8102,0.806233,0.808367,0.588033,0.00186667,0.0019,0.0018,0.0018,0.00183333,0.00193333,0.00176667,0.002,0.00176667,0.00223333,0.00163333,0.00213333,0.0019,0.00193333,0.0017,0.00166667,0.00166667,0.00163333,0.00186667,0.00176667,0.0022,0.00206667,0.00196667,0.00183333,0.00163333,0.0018,0.002,0.0017,0.0017,0.00166667,0.00183333,0.00173333,0.0017,0.0017,0.00163333,0.00183333,0.00173333,0.00193333,0.0017,0.0017,0.0018,0.00173333,0.00176667,0.00166667,0.00213333,0.0021,0.00173333,0.00193333,0.00186667,0.00166667,0.0019,0.00166667,0.00213333,0.0021,0.00176667,0.00163333,0.00173333,0.00176667,0.00176667,0.00186667,0.00193333,0.00176667,0.00173333,0.0019,0.00193333,0.00176667,0.002,0.00176667,0.00173333,0.00173333,0.00186667,0.00173333,0.00166667,0.0021,0.00163333,0.00193333,0.00176667,0.00166667,0.00176667,0.00163333,0.00193333,0.00166667,0.00166667,0.00193333,0.00176667,0.0017,0.00163333,0.00176667,0.00196667,0.002,0.00196667,0.00213333,0.0017,0.00193333,0.00173333,0.00176667,0.00166667,0.0018,0.00163333,0.0021,0.00216667,0.00203333,0.00193333,0.0019,0.00176667,0.00193333,0.00176667,0.00193333,0.00176667,0.00166667,0.00186667,0.00196667,0.00173333,0.0019,0.002,0.00193333,0.0021,0.00166667,0.0017,0.00166667,0.00196667,0.0019,0.0019,0.00163333,0.0017,0.00163333,0.0023,0.00173333,0.00186667,0.00193333,0.00173333,0.0017,0.00166667,0.00166667,0.00173333,0.0018,0.00166667,0.00183333,0.0018,0.00173333,0.00163333,0.0017,0.00176667,0.00176667,0.00166667,0.0019,0.00163333,0.00166667,0.00163333,0.0017,0.00196667,0.0021,0.00166667,0.0019,0.00173333,0.00166667,0.002,0.00233333,0.0017,0.00163333,0.00183333,0.0018,0.00186667,0.00183333,0.00186667,0.0019,0.0018,0.00166667,0.00196667,0.0018,0.00176667,0.0017,0.00166667,0.00163333,0.00166667,0.00163333,0.0017,0.00166667,0.0018,0.00166667,0.00173333,0.0017,0.0017,0.002,0.00176667,0.00166667,0.00186667,0.00166667,0.00183333,0.00163333,0.00176667,0.00183333,0.00183333,0.00173333,0.0018,0.0018,0.0017,0.00166667,0.0018,0.0018,0.0017,0.00186667,0.00166667,0.00193333,0.0018,0.0017,0.0019,0.00163333,0.00173333,0.00203333,0.0017,0.00166667,0.002,0.00163333,0.00186667,0.002,0.0019,0.00163333,0.0017,0.00186667,0.00163333,0.00173333,0.00166667,0.0018,0.00203333,0.00166667,0.00166667,0.00186667,0.0018,0.0018,0.00163333,0.0019,0.00186667,0.00166667,0.00203333,0.00176667,0.00166667,0.002,0.00163333,0.00186667,0.00183333,0.00186667,0.00176667,0.00166667,0.00196667,0.0021,0.0018,0.00186667,0.00193333,0.00196667,0.0017,0.00176667,0.0018,0.0017,0.0017,0.00183333,0.00173333,0.0021,0.002,0.0019,0.00193333,0.0018,0.00166667,0.00173333,0.00183333,0.0017,0.00176667,0.0018,0.00196667,0.00206667,0.0018,0.00173333,0.00186667,0.00173333,0.00186667,0.00166667,0.0017,0.00183333,0.00173333,0.0018,0.00163333,0.00203333,0.0018,0.00166667,0.00216667,0.00183333,0.0017,0.00166667,0.0018,0.00163333,0.00183333,0.0019,0.00163333,0.00183333,0.0018,0.00206667,0.0017,0.00176667,0.00196667,0.00163333,0.00166667,0.00206667,0.0017,0.0018,0.00196667,0.00166667,0.00193333,0.00173333,0.00166667,0.0017,0.00173333,0.0018,0.00206667,0.00166667,0.00166667,0.00186667,0.00163333,0.0017,0.00163333,0.00186667,0.00173333,0.00166667,0.00183333,0.0018,0.002,0.00163333,0.00176667,0.00206667,0.00173333,0.00183333,0.00173333,0.00163333,0.00173333,0.00183333,0.00163333,0.00163333,0.00203333,0.00173333,0.0017,0.00196667,0.00176667,0.00166667,0.00173333,0.0018,0.0018,0.00173333,0.00183333,0.0019,0.002,0.00176667,0.002,0.00173333,0.00203333,0.0018,0.00163333,0.00173333,0.00166667,0.00166667,0.00213333,0.00173333,0.00163333,0.00183333,0.002,0.00203333,0.00166667,0.0019,0.00173333,0.00166667,0.00193333,0.0017,0.00166667,0.00176667,0.0017,0.0017,0.00166667,0.00166667,0.0018,0.00163333,0.0017,0.00203333,0.00193333,0.00176667,0.00176667,0.0019,0.0018,0.00176667,0.00166667,0.0018,0.0019,0.00196667,0.00173333,0.00173333,0.0018,0.0018,0.00166667,0.00166667,0.00213333,0.0018,0.00206667,0.00173333,0.00163333,0.00166667,0.00166667,0.00176667,0.00166667,0.0017,0.00186667,0.00176667,0.00183333,0.00176667,0.002,0.00163333,0.0017,0.0018,0.00206667,0.00186667,0.0018,0.00173333,0.00196667,0.0017,0.0017,0.0018,0.00173333,0.0017,0.0017,0.00203333,0.0018,0.0018,0.00163333,0.00173333,0.00176667,0.00163333,0.0017,0.00213333,0.0017,0.00186667,0.00206667,0.00163333,0.0017,0.00176667,0.00163333,0.0021,0.0018,0.0021,0.00173333,0.00196667,0.00166667,0.00183333,0.0017,0.0018,0.00196667,0.00166667,0.00166667,0.00206667,0.00166667,0.00163333,0.00186667,0.0018,0.00186667,0.0017,0.00203333,0.00206667,0.00233333,0.0018,0.00216667,0.0017,0.00163333,0.0019,0.00173333,0.0018,0.00196667,0.00186667,0.0018,0.00176667,0.0017,0.0017,0.0018,0.0017,0.00176667,0.00166667,0.00166667,0.00196667,0.00186667,0.0021,0.00163333,0.0019,0.00176667,0.0018,0.00163333,0.00163333,0.810967,0}
注意這裡不是按照appid頻率排序的, 所以還需要處理一下.
digoal=# select * from (select row_number() over(partition by r) as rn,ele from (select unnest(most_common_elems::text::int[]) ele,2 as r from pg_stats where tablename='test_2' and attname='appid') t) t1 join (select row_number() over(partition by r) as rn,freq from (select unnest(most_common_elem_freqs) freq,2 as r from pg_stats where tablename='test_2' and attname='appid') t) t2 on (t1.rn=t2.rn) order by t2.freq desc limit 20; rn| ele | rn|freq -----+-----+-----+------------ 2 |1 |2 |0.810967 8 |7 |8 |0.8102 3 |2 |3 |0.809233 4 |3 |4 |0.808433 10 |9 |10 |0.808367 7 |6 |7 |0.808067 5 |4 |5 |0.807467 6 |5 |6 |0.806667 9 |8 |9 |0.806233 1 |0 |1 |0.590833 11 |10 |11 |0.588033 474 | 939 | 474 | 0.00233333 169 | 348 | 169 | 0.00233333 138 | 281 | 138 |0.0023 21 |27 |21 | 0.00223333 32 |44 |32 |0.0022 476 | 942 | 476 | 0.00216667 296 | 593 | 296 | 0.00216667 112 | 227 | 112 | 0.00216667 56 | 108 |56 | 0.00213333 (20 rows)
前10完全準確, 但是由於前8的記錄數偏差太小, 所以前8的排名順序可能不準確.
例如從統計資訊中取出的排名 :
rn| ele | rn|freq -----+-----+-----+------------ 2 |1 |2 |0.810967 8 |7 |8 |0.8102 3 |2 |3 |0.809233 4 |3 |4 |0.808433 10 |9 |10 |0.808367 7 |6 |7 |0.808067 5 |4 |5 |0.807467 6 |5 |6 |0.806667 9 |8 |9 |0.806233 1 |0 |1 |0.590833 11 |10 |11 |0.588033
實際排名 :
appid | count -------+-------- 9 | 872831 6 | 871908 3 | 871867 7 | 871551 8 | 871436 4 | 871391 1 | 871051 5 | 870770 2 | 870692 10 | 435583 0 | 435342
非常OK , 這對於大資料的統計來說, 無疑是非常重要的參考.
注意
- 資料統計資訊, 佔用空間超過ARRAY_WIDTH_THRESHOLD的陣列不會進入統計範疇.
src/backend/utils/adt/array_typanalyze.c
+/* + * To avoid consuming too much memory, IO and CPU load during analysis, and/or + * too much space in the resulting pg_statistic rows, we ignore arrays that + * are wider than ARRAY_WIDTH_THRESHOLD (after detoasting!).Note that this + * number is considerably more than the similar WIDTH_THRESHOLD limit used + * in analyze.c's standard typanalyze code. + */ +#define ARRAY_WIDTH_THRESHOLD 0x10000 .... +/* Skip too-large values. */ +if (toast_raw_datum_size(value) > ARRAY_WIDTH_THRESHOLD) +continue; +else +analyzed_rows++;