SQL Server 查詢統計資訊的相關取樣資訊
SQL Server 查詢統計資訊的相關取樣資訊
有時候我們會遇到,由於統計資訊不準確導致優化器生成了一個錯誤的執行計劃(或者這樣表達:一個較差的執行計劃),從而引起了系統性能問題。那麼如果我們懷疑這個錯誤的執行計劃是由於統計資訊不準確引起的。那麼我們如何判斷統計資訊不準確呢?當然首先得去檢視實際執行計劃中,統計資訊的相關資料是否與實際情況有較大的出入,下面我們拋開這個大命題,僅僅從統計資訊層面去檢視統計資訊的更新時間,統計資訊的取樣行數等情況。
1:首先,我們要查查統計資訊是什麼時候更新的。
2:其次,我們檢視統計資訊的取樣資訊:取樣選取的行數、 自上次更新統計資訊以來前導統計資訊列(構建直方圖的列)的總修改次數。。。
檢視統計資訊的最後更新時間。
方法1:
--檢視統計資訊的更新時間 DECLARE @TableName NVARCHAR(128); SET @TableName = 'dbo.pbCutClothCost'; SELECT @TableName AS Table_Name, name AS Stats_Name , STATS_DATE(object_id, stats_id) AS Last_Stats_Update FROM sys.stats WHERE object_id = OBJECT_ID(@TableName) ORDER BY 2 DESC;
如上所示,我們通過這個指令碼檢視某個表,所有的統計資訊最後一次更新時間。如果你需要檢視某個具體的統計資訊的最後更新時間,那麼在這個SQL的基礎上增加相關查詢條件即可。
方法2:
-- 檢視統計資訊的更新時間
EXEC sp_autostats 'dbo.pbCutClothCost' ;
方法3:
還有一種方法可以通過 sys.dm_db_stats_properties 返回統計資訊的更新時間,不過這個DMF只有SQL Server 2008 R2 SP2這個版本之後的才有。
列名 |
資料型別 |
Description |
object_id |
int |
要返回統計資訊物件屬性的物件(表或索引檢視)的 ID 。 |
stats_id |
int |
統計資訊物件的 ID 。 在表或索引檢視中是唯一的。 有關詳細資訊,請參閱 sys.stats (Transact-SQL) 。 |
last_updated |
datetime2 |
上次更新統計資訊物件的日期和時間。 有關詳細資訊,請參閱此頁中的 備註 部分。 |
rows |
bigint |
上次更新統計資訊時表或索引檢視中的總行數。 如果篩選統計資訊或者統計資訊與篩選索引對應,該行數可能小於表中的行數。 |
rows_sampled |
bigint |
用於統計資訊計算的抽樣總行數。 |
Step |
int |
直方圖中的值範圍數(步長)( Number of steps in the histogram )。 有關詳細資訊,請參閱 DBCC SHOW_STATISTICS (Transact-SQL) 。 |
unfiltered_rows |
bigint |
應用篩選表示式(用於篩選的統計資訊)之前表中的總行數。 如果未篩選統計資訊,則 unfiltered_rows 等於行列中返回的值。 |
modification_counter |
bigint |
自上次更新統計資訊以來前導統計資訊列(構建直方圖的列)的總修改次數。 |
persisted_sample_percent |
float |
持久樣本百分比用於未顯式指定取樣百分比的統計資訊更新。 如果值為零,則不為此統計資訊設定持久樣本百分比。 |
檢視取樣的相關資訊
SELECT sch.name + '.' + so.name AS table_name , so.object_id , ss.name AS stat_name , ds.stats_id , ds.last_updated , ds.rows , ds.rows_sampled , ds.steps , ds.unfiltered_rows , ds.modification_counter FROM sys.stats ss JOIN sys.objects so ON ss.object_id = so.object_id JOIN sys.schemas sch ON so.schema_id = sch.schema_id CROSS APPLY sys.dm_db_stats_properties(ss.object_id,ss.stats_id) ds WHERE so.name = N'pbCutClothCost' AND LEFT(ss.name, 4) != '_WA_';
如上截圖,索引IX_CutClothCost的統計資訊有更新,是因為在執行上面指令碼,我更新了這個索引的統計資訊。通過rows與實際記錄數對比、 modification_counter資訊,我們從而有個大概的判斷,這些統計 資訊是否過時。是否取樣的比例太小等。
SELECT sch.name + '.' + so.name AS table_name , so.object_id , ss.name AS stat_name , ds.stats_id , ds.last_updated , ds.rows , ds.rows_sampled , ds.rows_sampled/ds.rows *100 AS sample_rate , ds.steps , ds.unfiltered_rows , ds.modification_counter FROM sys.stats ss JOIN sys.objects so ON ss.object_id = so.object_id JOIN sys.schemas sch ON so.schema_id = sch.schema_id CROSS APPLY sys.dm_db_stats_properties(ss.object_id,ss.stats_id) ds WHERE so.name = N'pbCutClothCost' AND LEFT(ss.name, 4) != '_WA_';
當然也可以使用DBCC SHOW_STATISTICS來檢視統計資訊的詳細資訊。
DBCC SHOW_STATISTICS ( 'dbo.pbCutClothCost' , IX_pbCutClothCost_N1 )
另外一個維度來判別統計資訊過時,是通過時間維度(最後一次統計資訊更新距今的時間)。這個相關上面維度的資料而言,往往不是特別準確,但是也有參考意義。
SELECT sch.name + '.' + so.name AS "Table", ss.name AS"Statistic", CASE WHEN ss.auto_Created = 0 AND ss.user_created = 0 THEN 'Index Statistic' WHEN ss.auto_created = 0 AND ss.user_created = 1 THEN 'USER Created' WHEN ss.auto_created = 1 AND ss.user_created = 0 THEN 'Auto Created' WHEN ss.AUTO_created = 1 AND ss.user_created = 1 THEN 'Not Possible' END AS "Statistic Type", CASE WHEN ss.has_filter = 1 THEN 'Filtered INDEX' WHEN ss.has_filter = 0 THEN 'No Filter' END AS "Filtered", CASE WHEN ss.filter_definition IS NULL THEN '' WHEN ss.filter_definition IS NOT NULL THEN ss.filter_definition END AS "Filter Definition", sp.last_updated AS "Stats Last Updated", sp.rows AS "Rows", sp.rows_sampled AS "Rows Sampled", sp.unfiltered_rows AS "Unfiltered Rows", sp.modification_counter AS "Row Modifications", sp.steps AS "Histogram Steps" FROM sys.stats ss JOIN sys.objects so ON ss.object_id = so.object_id JOIN sys.schemas sch ON so.schema_id = sch.schema_id OUTER APPLY sys.dm_db_stats_properties(so.object_id, ss.stats_id) AS sp WHERE so.TYPE = 'U' AND sp.last_updated < GETDATE() - 7 ORDER BY sp.last_updated DESC;
以前收集過一個查詢過時的統計資訊的指令碼(忘記出自哪裡了),這個是通過 Max(ApproximateRows) > 500 AND Max(RowModCtr) > (Max(ApproximateRows)*0.2 + 500 ) 來找出過時的統計資訊。如下所示
DECLARE @Major INT , @Minor INT , @build INT , @revision INT , @i INT , @str NVARCHAR(100) , @str2 NVARCHAR(10); SET @str = CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(100)); SET @str2 = LEFT(@str, CHARINDEX('.', @str)); SET @i = LEN(@str); SET @str = RIGHT(@str, @i - CHARINDEX('.', @str)); SET @Major = CAST(REPLACE(@str2, '.', '') AS INT); SET @str2 = LEFT(@str, CHARINDEX('.', @str)); SET @i = LEN(@str); SET @str = RIGHT(@str, @i - CHARINDEX('.', @str)); SET @Minor = CAST(REPLACE(@str2, '.', '') AS INT); SET @str2 = LEFT(@str, CHARINDEX('.', @str)); SET @i = LEN(@str); SET @str = RIGHT(@str, @i - CHARINDEX('.', @str)); SET @build = CAST(REPLACE(@str2, '.', '') AS INT); SET @revision = CAST(@str AS INT); IF @Major < 10 SET @i = 1; ELSE IF @Major > 10 SET @i = 0; ELSE IF @Minor = 50 AND @build >= 4000 SET @i = 0; ELSE SET @i = 1; IF @i = 1 BEGIN EXEC sp_executesql N';WITH StatTables AS( SELECT so.schema_id AS ''schema_id'', so.name AS ''TableName'', so.object_id AS ''object_id'', CASE indexproperty(so.object_id, dmv.name, ''IsStatistics'') WHEN 0 THEN dmv.rows ELSE (SELECT TOP 1 row_count FROM sys.dm_db_partition_stats ps (NOLOCK) WHERE ps.object_id=so.object_id AND ps.index_id in (1,0)) END AS ''ApproximateRows'', dmv.rowmodctr AS ''RowModCtr'' FROM sys.objects so (NOLOCK) INNER JOIN sysindexes dmv (NOLOCK) ON so.object_id = dmv.id LEFT JOIN sys.indexes si (NOLOCK) ON so.object_id = si.object_id AND so.type in (''U'',''V'') AND si.index_id = dmv.indid WHERE so.is_ms_shipped = 0 AND dmv.indid<>0 AND so.object_id not in (SELECT major_id FROM sys.extended_properties (NOLOCK) WHERE name = N''microsoft_database_tools_support'') ), StatTableGrouped AS ( SELECT ROW_NUMBER() OVER(ORDER BY TableName) AS seq1, ROW_NUMBER() OVER(ORDER BY TableName DESC) AS seq2, TableName, cast(Max(ApproximateRows) AS bigint) AS ApproximateRows, cast(Max(RowModCtr) AS bigint) AS RowModCtr, schema_id,object_id FROM StatTables st GROUP BY schema_id,object_id,TableName HAVING (Max(ApproximateRows) > 500 AND Max(RowModCtr) > (Max(ApproximateRows)*0.2 + 500 )) ) SELECT @@SERVERNAME AS InstanceName, seq1 + seq2 - 1 AS NbOccurences, SCHEMA_NAME(stg.schema_id) AS ''SchemaName'', stg.TableName, CASE OBJECTPROPERTY(stg.object_id, ''TableHasClustIndex'') WHEN 1 THEN ''Clustered'' WHEN 0 THEN ''Heap'' ELSE ''Indexed View'' END AS ClusteredHeap, CASE objectproperty(stg.object_id, ''TableHasClustIndex'') WHEN 0 THEN (SELECT count(*) FROM sys.indexes i (NOLOCK) where i.object_id= stg.object_id) - 1 ELSE (SELECT count(*) FROM sys.indexes i (NOLOCK) where i.object_id= stg.object_id) END AS IndexCount, (SELECT count(*) FROM sys.columns c (NOLOCK) WHERE c.object_id = stg.object_id ) AS ColumnCount , (SELECT count(*) FROM sys.stats s (NOLOCK) WHERE s.object_id = stg.object_id) AS StatCount , stg.ApproximateRows, stg.RowModCtr, stg.schema_id, stg.object_id FROM StatTableGrouped stg'; END; ELSE BEGIN EXEC sp_executesql N';WITH StatTables AS( SELECT so.schema_id AS ''schema_id'', so.name AS ''TableName'', so.object_id AS ''object_id'' , ISNULL(sp.rows,0) AS ''ApproximateRows'' , ISNULL(sp.modification_counter,0) AS ''RowModCtr'' FROM sys.objects so (NOLOCK) JOIN sys.stats st (NOLOCK) ON so.object_id=st.object_id CROSS APPLY sys.dm_db_stats_properties(so.object_id, st.stats_id) AS sp WHERE so.is_ms_shipped = 0 AND st.stats_id<>0 AND so.object_id not in (SELECT major_id FROM sys.extended_properties (NOLOCK) WHERE name = N''microsoft_database_tools_support'') ), StatTableGrouped AS ( SELECT ROW_NUMBER() OVER(ORDER BY TableName) AS seq1, ROW_NUMBER() OVER(ORDER BY TableName DESC) AS seq2, TableName, cast(Max(ApproximateRows) AS bigint) AS ApproximateRows, cast(Max(RowModCtr) AS bigint) AS RowModCtr, count(*) AS StatCount, schema_id,object_id FROM StatTables st GROUP BY schema_id,object_id,TableName HAVING (Max(ApproximateRows) > 500 AND Max(RowModCtr) > (Max(ApproximateRows)*0.2 + 500 )) ) SELECT @@SERVERNAME AS InstanceName, seq1 + seq2 - 1 AS NbOccurences, SCHEMA_NAME(stg.schema_id) AS ''SchemaName'', stg.TableName, CASE OBJECTPROPERTY(stg.object_id, ''TableHasClustIndex'') WHEN 1 THEN ''Clustered'' WHEN 0 THEN ''Heap'' ELSE ''Indexed View'' END AS ClusteredHeap, CASE objectproperty(stg.object_id, ''TableHasClustIndex'') WHEN 0 THEN (SELECT count(*) FROM sys.indexes i (NOLOCK) where i.object_id= stg.object_id) - 1 ELSE (SELECT count(*) FROM sys.indexes i (NOLOCK) where i.object_id= stg.object_id) END AS IndexCount, (SELECT count(*) FROM sys.columns c (NOLOCK) WHERE c.object_id = stg.object_id ) AS ColumnCount , stg.StatCount, stg.ApproximateRows, stg.RowModCtr, stg.schema_id, stg.object_id FROM StatTableGrouped stg'; END;
參考資料:
https://www.sqlskills.com/blogs/erin/new-statistics-dmf-in-sql-server-2008r2-sp2/