Thursday, May 6, 2021

Tuesday, June 2, 2020

SQL SERVER – Rows Sampled – sys.dm_db_stats_properties to update statistics

https://blog.sqlauthority.com/2020/06/02/sql-server-rows-sampled-sys-dm_db_stats_properties/

Yesterday, I wrote a blog post, SQL SERVER – Statistics Modification Counter – sys.dm_db_stats_properties, where I discussed how to find how many rows have been modified based on sys.dm_db_stats_properties. I received lots of follow-up questions and comments on the blog. Here is a question asked during yesterday's Comprehensive Database Performance Health Check about the rows sampled column of sys.dm_db_stats_properties.

Rows Sampled – sys.dm_db_stats_properties

In the previous blog post, we discussed how to figure out how many rows have been modified since the statistics were last updated. Today we are going to extend the same script so that, in addition to the same details, it lists the number of rows in the table, the number of rows used in the statistical sampling, and the percentage of the table's rows that were sampled.
SELECT      OBJECT_SCHEMA_NAME(obj.object_id) SchemaName, obj.name TableName,
            stat.name, modification_counter,
            [rows], rows_sampled,
            CAST(rows_sampled * 100.0 / NULLIF([rows], 0) AS DECIMAL(5,2)) AS [% Rows Sampled],
            last_updated
FROM        sys.objects AS obj
INNER JOIN  sys.stats AS stat ON stat.object_id = obj.object_id
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
WHERE       obj.is_ms_shipped = 0
ORDER BY    modification_counter DESC;
Running the above query will give you a result very similar to the following.
[Screenshot: query output showing rows, rows_sampled, and % Rows Sampled for each statistic]
To make sure that the query only returns user objects, I have also included an additional WHERE condition, is_ms_shipped = 0, which filters out objects shipped by Microsoft.
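The % Rows Sampled arithmetic deserves a second look: with integer columns, plain integer division truncates the result, and an empty table would divide by zero. Here is a quick sketch of the safe version of that calculation (the row counts are made up for illustration):

```python
def pct_rows_sampled(rows_sampled: int, rows: int):
    """Percentage of table rows used to build the statistics.

    Guards against empty tables the same way NULLIF([rows], 0)
    would in T-SQL (returns None instead of dividing by zero).
    """
    if rows == 0:
        return None
    # 100.0 forces float division; an all-integer expression would truncate
    return rows_sampled * 100.0 / rows

# e.g. 62,000 rows sampled out of 1,000,000
print(pct_rows_sampled(62_000, 1_000_000))  # 6.2
```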

How to use this script?

There are many different reasons for the poor performance of a T-SQL query, and this script is one of the tools that can give us an indication of the cause.
If you consistently see poor performance for queries that touch one particular table, there is a good chance that the SQL Server Query Optimizer is not producing an efficient execution plan for the query. One of the reasons might be out-of-date statistics or sampled statistics.
You can decide that statistics are out of date based on a very old date in the last_updated column and a very high modification_counter relative to the table's row count. When your table is huge and has a very high number of rows, the SQL Server engine only takes a sample of the data to build the statistics. You can spot that from the rows_sampled column and a low value in % Rows Sampled.
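The judgment above can be turned into a rough rule of thumb. Before SQL Server 2016, the default auto-update threshold was roughly 500 rows plus 20% of the table, so a modification counter well past that is a reasonable staleness signal. The sketch below codifies that heuristic; the 30-day age cutoff is my own illustrative assumption, not official guidance:

```python
from datetime import datetime, timedelta

def stats_look_stale(last_updated: datetime,
                     modification_counter: int,
                     table_rows: int,
                     max_age_days: int = 30) -> bool:
    """Rough heuristic mirroring the classic 500 + 20% auto-update
    threshold, plus an age check. Thresholds are illustrative only."""
    too_old = datetime.now() - last_updated > timedelta(days=max_age_days)
    threshold = 500 + 0.20 * table_rows
    too_many_changes = modification_counter > threshold
    return too_old or too_many_changes

# 1M-row table with 300k modifications since the last stats update
print(stats_look_stale(datetime.now(), 300_000, 1_000_000))  # True
```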
In both of the above scenarios, it is a good idea to update your statistics with a full scan. You can read more about how to do that here: SQL SERVER – Update Table Statistics in Parallel with FULLSCAN.
After updating your statistics, if the query still performs poorly, you may have a very different cause for the slowness. To fix the query performance, I would suggest you start with SQL SERVER – Query Specific Wait Statistics and Performance Tuning.
Let me know if you find this script helpful.
Reference: Pinal Dave (https://blog.sqlauthority.com)

Tuesday, September 3, 2019

What is the Best Value for Fill Factor in SQL Server (the higher the fill factor, the higher the fragmentation)

https://www.mssqltips.com/sqlservertip/5908/what-is-the-best-value-for-fill-factor-in-sql-server/

Problem
I understand what the SQL Server fill factor is, but how do I choose an appropriate fill factor for any given index?
Solution
This prior tip will help you understand what an index fill factor is and how it affects the size of a SQL Server index. In short, the bigger the fill factor, the smaller the disk footprint of the index, and vice versa. At the end of that tip it advises that indexes and their fill factors should be well planned. This tip will focus on how to choose an appropriate fill factor by analyzing the impact of different fill factor values and by studying a few facts about the data in the index columns.

SQL Server Fill Factor Example

To better understand how different fill factors affect indexes, we will create 4 identical tables. Each table will have 100,000 identical rows and a clustered index on a GUID column, created with fill factors of 70, 80, 90, and 99 respectively.
This script will create the tables and put the rows in them.
DROP TABLE IF EXISTS dbo.RandomValue70;
CREATE TABLE dbo.RandomValue70(num INT IDENTITY, i CHAR(36), j CHAR(36));
INSERT INTO dbo.RandomValue70 (i, j) SELECT TOP 100000 NEWID(), NEWID() FROM sys.all_columns CROSS JOIN sys.columns;
CREATE UNIQUE CLUSTERED INDEX PK_RandomValue70 ON dbo.RandomValue70(i) WITH (FILLFACTOR=70, PAD_INDEX=ON)
GO
 
DROP TABLE IF EXISTS dbo.RandomValue80;
CREATE TABLE dbo.RandomValue80(num INT IDENTITY, i CHAR(36), j CHAR(36));
INSERT INTO dbo.RandomValue80 (i, j) SELECT i, j FROM dbo.RandomValue70;
CREATE UNIQUE CLUSTERED INDEX PK_RandomValue80 ON dbo.RandomValue80(i) WITH (FILLFACTOR=80, PAD_INDEX=ON)
GO
 
DROP TABLE IF EXISTS dbo.RandomValue90;
CREATE TABLE dbo.RandomValue90(num INT IDENTITY, i CHAR(36), j CHAR(36));
INSERT INTO dbo.RandomValue90 (i, j) SELECT i, j FROM dbo.RandomValue70;
CREATE UNIQUE CLUSTERED INDEX PK_RandomValue90 ON dbo.RandomValue90(i) WITH (FILLFACTOR=90, PAD_INDEX=ON)
GO 
 
DROP TABLE IF EXISTS dbo.RandomValue99;
CREATE TABLE dbo.RandomValue99(num INT IDENTITY, i CHAR(36), j CHAR(36));
INSERT INTO dbo.RandomValue99 (i, j) SELECT i, j FROM dbo.RandomValue70;
CREATE UNIQUE CLUSTERED INDEX PK_RandomValue99 ON dbo.RandomValue99(i) WITH (FILLFACTOR=99, PAD_INDEX=ON)
GO   
This query tells us about the 4 tables that were just created and their indexes.
SELECT
  tbl.name TableName
, idx.name IndexName, idx.fill_factor
, CAST(Fragmentation.avg_page_space_used_in_percent AS DECIMAL(4,1)) ActualFillFactor
, CAST(Fragmentation.avg_fragmentation_in_percent AS DECIMAL(4,1)) CurrentFragmentation
, Fragmentation.fragment_count
, CAST(Fragmentation.avg_fragment_size_in_pages AS DECIMAL(8,1)) AvgFragmentSize 
FROM
  sys.tables tbl
    INNER JOIN
  sys.indexes idx ON tbl.object_id = idx.object_id
    CROSS APPLY
  sys.dm_db_index_physical_stats(DB_ID(), tbl.object_id, idx.index_id, 0, 'SAMPLED') Fragmentation
WHERE 
  tbl.name LIKE 'RandomValue[0-9]%';   
During the writing of this post, the results showed that the actual fill factor came in very close to the requested fill factor and that the index fragmentation was minimal. Note that the actual fill factor isn't likely to match exactly, as the rows won't necessarily line up perfectly with the requested percentage and a single row cannot be split across multiple pages.
[Screenshot: fill factor and fragmentation results for the 4 sample tables]

How SQL Server fill factor affects queries

To study how these different fill factors affect queries, these 4 identical statements will be executed. The only difference is the fill factor on the index that was set above.
SET STATISTICS IO ON;
SET NOCOUNT ON;
SELECT COUNT(*) FROM dbo.RandomValue70 WHERE i BETWEEN '001' AND '199';
SELECT COUNT(*) FROM dbo.RandomValue80 WHERE i BETWEEN '001' AND '199';
SELECT COUNT(*) FROM dbo.RandomValue90 WHERE i BETWEEN '001' AND '199';
SELECT COUNT(*) FROM dbo.RandomValue99 WHERE i BETWEEN '001' AND '199';   
Upon executing the statements above, the result is 4 identical row counts and the IO stats, visible from the messages output tab.
Table 'RandomValue70'. Scan count 1, logical reads 151
Table 'RandomValue80'. Scan count 1, logical reads 131
Table 'RandomValue90'. Scan count 1, logical reads 118
Table 'RandomValue99'. Scan count 1, logical reads 107
From reviewing this output, we learn that the higher the fill factor on the index, the lower the reads. This makes sense: more rows are stored on each page, so fewer pages need to be read. Any time page reads can be reduced, query performance benefits. All of which tells us that for read performance we want the highest fill factor possible.
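The read counts above follow directly from page arithmetic: if a page holds roughly N rows when 100% full, a fill factor of f leaves only floor(N * f/100) rows per page, so more pages are needed for the same data. A back-of-the-envelope sketch (the 96-rows-per-page figure is an assumption based on the roughly 76-byte rows in these tables, not a measured value):

```python
import math

def pages_needed(total_rows: int, rows_per_full_page: int,
                 fill_factor: int) -> int:
    """Estimate leaf pages required at a given fill factor."""
    rows_per_page = math.floor(rows_per_full_page * fill_factor / 100)
    return math.ceil(total_rows / rows_per_page)

# Pages drop as the fill factor rises, matching the read counts above
for ff in (70, 80, 90, 99):
    print(ff, pages_needed(100_000, 96, ff))
```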
Why not, then, just put every index at 99 or 100 and reduce the reads as much as possible?  The answer to this question is fragmentation and how quickly indexes can become fragmented.

How SQL Server fill factors affect fragmentation

Continuing with this same script it is now time to add some new rows to the data set. This script will add 5,000 identical rows to each table. They will be randomly distributed throughout the cluster. After adding the new rows, rerun the index information script from above.
INSERT INTO dbo.RandomValue70 (i, j) SELECT TOP 5000 NEWID(), NEWID() FROM dbo.RandomValue70;
INSERT INTO dbo.RandomValue80 (i, j) SELECT i, j FROM dbo.RandomValue70 WHERE num > 100000;
INSERT INTO dbo.RandomValue90 (i, j) SELECT i, j FROM dbo.RandomValue70 WHERE num > 100000;
INSERT INTO dbo.RandomValue99 (i, j) SELECT i, j FROM dbo.RandomValue70 WHERE num > 100000;   
Upon adding just 5% more rows randomly throughout the table the fragmentation difference becomes immediately clear. A higher fill factor made it less likely the storage engine could find space on an existing page to put the new rows and instead had to split pages to make room. This led to a lower actual fill factor and a higher fragmentation rate.
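The page-split behavior described above can be imitated with a toy simulation: pages start filled to the fill factor, random keys arrive, and a full page splits in two. The page capacity and seed here are invented for illustration, and this is a crude model rather than the real storage engine, but the trend matches the results:

```python
import math, random

def simulate_splits(initial_rows: int, new_rows: int,
                    page_capacity: int, fill_factor: int,
                    seed: int = 42) -> int:
    """Count page splits when random inserts hit pages pre-filled
    to the given fill factor. A toy model, not the real engine."""
    rng = random.Random(seed)
    per_page = math.floor(page_capacity * fill_factor / 100)
    n_pages = math.ceil(initial_rows / per_page)
    pages = [per_page] * n_pages          # current row count per page
    splits = 0
    for _ in range(new_rows):
        p = rng.randrange(len(pages))     # random key -> random page
        if pages[p] >= page_capacity:     # no room: split the page
            half = pages[p] // 2
            pages[p] -= half
            pages.insert(p + 1, half)
            splits += 1
        pages[p] += 1
    return splits

low = simulate_splits(100_000, 5_000, 100, 70)
high = simulate_splits(100_000, 5_000, 100, 99)
print(low, high)   # far fewer splits at fill factor 70
```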
[Screenshot: fragmentation results after inserting 5,000 random rows]
Rerunning the SELECT statements from above yields very different results with the heavily fragmented index now having the highest number of reads.
Table 'RandomValue70'. Scan count 1, logical reads 151
Table 'RandomValue80'. Scan count 1, logical reads 131
Table 'RandomValue90'. Scan count 1, logical reads 130
Table 'RandomValue99'. Scan count 1, logical reads 210
Rows don’t just get inserted; sometimes existing rows change. These statements will change about 7% of the rows to a new random value.
DECLARE @Mod INT, @Remainder INT;
SELECT @Mod = 15, @Remainder = 2;
UPDATE dbo.RandomValue70 SET i = NEWID() WHERE num % @Mod = @Remainder;
UPDATE R80 SET i = R70.i FROM dbo.RandomValue80 R80 INNER JOIN dbo.RandomValue70 R70 ON R80.num = R70.num WHERE R70.num % @Mod = @Remainder;
UPDATE R90 SET i = R70.i FROM dbo.RandomValue90 R90 INNER JOIN dbo.RandomValue70 R70 ON R90.num = R70.num WHERE R70.num % @Mod = @Remainder;
UPDATE R99 SET i = R70.i FROM dbo.RandomValue99 R99 INNER JOIN dbo.RandomValue70 R70 ON R99.num = R70.num WHERE R70.num % @Mod = @Remainder;
   
Rerunning the index information query again shows that the fragmentation has gotten even worse.
[Screenshot: fragmentation results after the updates]
A lower fill factor does not mean that there will never be fragmentation. Eventually, every index whose underlying data changes will fragment. The UPDATE snippet was run several more times with smaller @Mod values, and eventually even the indexes with the lowest fill factor were heavily fragmented.
[Screenshot: fragmentation after repeated update runs]

How to choose the best SQL Server fill factor value

An index will provide the most value with the highest possible fill factor without getting too much fragmentation on the index. Without a specific reason to choose an alternate value, consider 90 as a starting point and adjust from there. It's rare to go lower than that and it's especially rare to go below 80.
Consider how often index maintenance can be run on the indexes. If there are very limited windows to perform index maintenance to reduce fragmentation then smaller values should be considered to keep fragmentation down between maintenance windows.
The following is a list of occasions where a higher number can be considered.
  • Does the underlying table get read significantly and updated minimally? This might be something like a master table or lookup table. These tables rarely change and are therefore not susceptible to significant fragmentation. They are read often and will benefit greatly from the fuller pages.
  • Is the underlying table part of a data warehouse or other reporting environment that is read constantly and only updated periodically? The fragmentation problem can be mitigated by performing index maintenance after the ETL runs and all the queries will benefit from the higher fill factor.
  • Is the index sorted on a column whose values constantly increase and whose values never change after the row is created? This would be something that defaults to current date and time, an identity column, or a number based on a sequence. There isn't a reason to leave significant space in the pages as no new rows will ever arrive in the database that would use that space. All the new rows will instead go to the end of the index as new pages.
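The checklist above can be condensed into a starting-point heuristic. The numbers below (90 as the general default, 95 for read-mostly tables, 100 for ever-increasing keys, 80 when maintenance windows are scarce) are my reading of the guidance in this tip, not an official formula, and should be tuned per index:

```python
def suggested_fill_factor(read_mostly: bool,
                          ever_increasing_key: bool,
                          rare_maintenance: bool) -> int:
    """Starting-point fill factor per the guidance above; tune per index."""
    if ever_increasing_key:
        return 100   # new rows only ever append; no mid-page inserts
    if read_mostly:
        return 95    # lookup/reporting tables: fuller pages, few splits
    if rare_maintenance:
        return 80    # leave headroom between infrequent rebuilds
    return 90        # general-purpose starting point

print(suggested_fill_factor(False, True, False))   # 100
```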
Hopefully this gives you some better insight for the SQL Server fill factor setting and things to consider when setting the values.

-- Server-wide default fill factor setting
SELECT *
FROM sys.configurations
WHERE name = 'fill factor (%)';

-- Fill factor currently set on each index in the database
SELECT OBJECT_NAME(object_id) Name, type_desc, fill_factor
FROM sys.indexes;

-- Page splits per second since server start (sysperfinfo is deprecated;
-- sys.dm_os_performance_counters is the current equivalent)
SELECT cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page Splits/sec'
  AND object_name LIKE '%Access Methods%';