SQL Server Statistics Updates: A Deep Dive

This article takes a detailed look at how SQL Server updates statistics. Understanding these mechanisms is important for supporting query performance and for giving the optimizer the information it needs to make good decisions. We will walk through the various update modes, thresholds, and best practices for statistics management.

Understanding SQL Server Statistics

SQL Server statistics are objects that contain distribution information about the values in one or more columns of a table or indexed view. The query optimizer uses these statistics to estimate how many rows a query predicate will return, and those estimates play a key part in choosing an efficient execution plan. When statistics are inaccurate or out of date, the optimizer can make poor plan choices and queries can run slowly.
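
To see what a statistics object actually stores (a header, a density vector, and a histogram), you can inspect it directly. The following is a minimal sketch; dbo.Orders and IX_Orders_CustomerID are hypothetical names used only for illustration.

    -- Show the header, density vector, and histogram for one statistics object
    -- (dbo.Orders and IX_Orders_CustomerID are placeholder names)
    DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerID');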

When are Statistics Updated?

SQL Server statistics are updated in the following ways:

  • Auto Update Statistics: The AUTO_UPDATE_STATISTICS database option is ON by default. When enough rows in a table have been modified, SQL Server updates the affected statistics automatically; the number of changes required to trigger an update depends on the table size.
  • Auto Update Statistics Asynchronously: The AUTO_UPDATE_STATISTICS_ASYNC option (OFF by default) lets statistics updates run in the background so they affect query performance less. The query optimizer keeps using the existing statistics until the asynchronous update finishes.
  • Manual: Statistics can be updated manually with the UPDATE STATISTICS command. This is useful when you know the data distribution has changed substantially, or when you want to make sure statistics are fresh before running a performance-critical query. You can check which automatic options are enabled for a database as shown below.
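
The following sketch checks the database-level settings that control automatic updates; MyDatabase is a placeholder database name.

    -- Check whether automatic statistics updates are enabled for a database
    SELECT name,
           is_auto_update_stats_on,
           is_auto_update_stats_async_on
    FROM sys.databases
    WHERE name = 'MyDatabase';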

Auto Update Statistics Thresholds

The number of modifications required to trigger an automatic statistics update depends on the size of the table.

SQL Server 2016 and later (compatibility level 130 or higher):
  • For tables with fewer than 500 rows, 500 modifications trigger a statistics update.
  • For tables with more than 500 rows, the threshold is calculated using the formula `SQRT(1000 * number of rows)`.
  • This formula yields a dynamic threshold that grows with the square root of the table size, so it scales much better for large tables.
SQL Server 2008 to SQL Server 2014 (and compatibility levels below 130):
  • Statistics are refreshed once at least 500 rows plus 20 percent of the table's rows have changed, that is, a threshold of 500 + (20 percent of the number of rows in the table).
  • This older threshold works poorly for very large tables, where a 20 percent change can mean an enormous number of rows. For example, a table with 1,000,000 rows needs about 200,500 modifications under the old rule, but only about SQRT(1000 * 1,000,000) ≈ 31,623 under the newer formula.
  • To find out when statistics were last updated for a particular table or index, use the STATS_DATE function, as shown below.
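
The following sketch lists each statistics object on a table along with its last update time; dbo.Orders is a placeholder table name.

    -- List every statistics object on a table and when it was last updated
    SELECT s.name                               AS statistics_name,
           STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID('dbo.Orders');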

Asynchronous vs. Synchronous Updates

As noted above, the AUTO_UPDATE_STATISTICS_ASYNC option determines whether automatic statistics updates run synchronously or asynchronously.

  • Asynchronous Updates (ON): The query optimizer uses the existing statistics, even if they are outdated, while the statistics update runs in the background. This minimizes the impact on query compilation, but can lead to poor plans when the statistics are badly out of date.
  • Synchronous Updates (OFF): The statistics update blocks query compilation until it completes, so the query is compiled with up-to-date statistics. This guarantees the optimizer has the latest information, but it can increase compilation time, particularly on large tables.
  • Asynchronous updates are often preferred to reduce the impact on query response time. Synchronous updates are a better fit when accurate plans are critical and the data distribution changes rapidly. The sketch below shows how the option is enabled.
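
A minimal sketch for enabling asynchronous updates; MyDatabase is a placeholder database name. Note that AUTO_UPDATE_STATISTICS_ASYNC only takes effect when AUTO_UPDATE_STATISTICS is also ON.

    -- Enable background (asynchronous) automatic statistics updates
    ALTER DATABASE MyDatabase
    SET AUTO_UPDATE_STATISTICS_ASYNC ON;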

Manual Statistics Updates

The UPDATE STATISTICS command gives fine-grained control over statistics updates. The basic syntax is as follows:

    UPDATE STATISTICS table_name [index_name | statistics_name]
        [WITH
            [FULLSCAN | SAMPLE number PERCENT | SAMPLE number ROWS | RESAMPLE]
            [, NO_RECOMPUTE]
            [, INCREMENTAL = { ON | OFF }]
        ]

Key options include:


  • FULLSCAN: Computes statistics by scanning every row of the table or indexed view. This produces the most accurate statistics but can be time-consuming on large tables.
  • SAMPLE number PERCENT: Computes statistics from a sample of the specified percentage of rows in the table or indexed view. It is faster than FULLSCAN but may be less accurate.
  • SAMPLE number ROWS: Computes statistics by sampling a given number of rows of a table or an indexed view.
  • RESAMPLE: Recomputes the statistics using the most recent sample rate. This is useful when the data distribution has changed but the overall data volume remains roughly constant.
  • NO_RECOMPUTE: Disables automatic updating for the specified statistics. Use this with caution, because it can lead to stale statistics and degraded query performance.
  • INCREMENTAL = { ON | OFF }: Applies to partitioned tables. When ON, statistics are created per partition and refreshed incrementally; when OFF, statistics are created and refreshed for the whole table. Usage examples follow this list.
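
The examples below are a minimal sketch of common manual updates; dbo.Orders and IX_Orders_CustomerID are hypothetical names.

    -- Update all statistics on one table using a full scan
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

    -- Update a single statistics object using a 25 percent sample
    UPDATE STATISTICS dbo.Orders IX_Orders_CustomerID WITH SAMPLE 25 PERCENT;

    -- Refresh out-of-date statistics across the whole database
    EXEC sp_updatestats;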

Best Practices for Managing Statistics


  • Keep AUTO_UPDATE_STATISTICS and AUTO_UPDATE_STATISTICS_ASYNC turned on: Together, these options offer a good compromise between query performance and statistics accuracy.
  • Monitor statistics update activity: Track statistics updates with Extended Events or SQL Server Profiler. This helps you identify tables whose statistics change frequently, which may indicate a need to adjust indexes or your update strategy.
  • Use manual updates on volatile tables: For tables whose data changes rapidly (for example, staging or frequently reloaded tables), schedule manual UPDATE STATISTICS runs instead of relying only on the automatic thresholds.
  • Use FULLSCAN only when needed: FULLSCAN produces the most reliable statistics but can take a long time. Apply it only when necessary, for example when sampled statistics appear skewed or after a large data load.
  • Update statistics after index maintenance: Rebuilding indexes or changing their structure can significantly alter the distribution information. Perform the index maintenance first, then update statistics, so the optimizer works with accurate data.
  • Be aware of filtered statistics: Filtered statistics can help optimize queries that use certain predicates, but they also add statistics-maintenance overhead. Use them sparingly; a small sketch follows this list.
  • Review update strategies regularly: As data volumes and query patterns change, revisit your statistics update strategy and adjust it to keep query performance optimal.
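
As a rough illustration of filtered statistics, the sketch below creates a statistics object restricted by a predicate; dbo.Orders, OrderDate, and the Status = 'Open' filter are hypothetical names used only for illustration.

    -- Create filtered statistics covering only a subset of rows
    CREATE STATISTICS stat_Orders_Open
    ON dbo.Orders (OrderDate)
    WHERE Status = 'Open';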


Troubleshooting Statistics Issues



  • Slow query performance: Outdated or inaccurate statistics are a common cause of slow query performance. Check STATS_DATE for the relevant tables and indexes, and consider updating statistics manually (a database-wide triage query follows this list).
  • Unexpected query plans: If the query optimizer is choosing a suboptimal query plan, it may be due to inaccurate statistics. Examine the query plan and look for cardinality estimation errors.
  • High CPU utilization: Frequent statistics updates can consume significant CPU resources. Monitor CPU utilization and consider adjusting the AUTO_UPDATE_STATISTICS settings or using sampled statistics updates.
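
The following sketch, assuming the sys.dm_db_stats_properties DMF is available (SQL Server 2008 R2 SP2 / 2012 SP1 and later), lists statistics with many pending modifications so you can decide which ones to refresh first.

    -- Find statistics objects with many modifications since their last update
    SELECT OBJECT_NAME(s.object_id) AS table_name,
           s.name                   AS statistics_name,
           sp.last_updated,
           sp.rows,
           sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
      AND sp.modification_counter > 0
    ORDER BY sp.modification_counter DESC;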

Conclusion

Properly managing SQL Server statistics is essential for maintaining query performance. By understanding the different update modes, thresholds, and best practices, you can ensure that the query optimizer has accurate information to make informed decisions. Regularly monitor statistics update activity and adjust your strategies as needed to optimize query performance and minimize resource consumption.





