This section includes an experiment evaluating the impact of two query methods on four performance metrics. Additionally, it highlights techniques used to improve data retrieval efficiency, along with a side-by-side comparison of the original and optimized code.
Optimization Experiment
The following experiment was conducted to assess the impact of two query optimization methods on various performance metrics. The original code is shown below:
Query Optimization Methods
Query Performance Metrics
Results
Discussion
It's essential to recognize that while the values of performance metrics can vary with each query execution, the overall patterns they exhibit tend to remain consistent.
Conclusion and Recommendation
Overall, reordering JOIN tables is seen to improve CPU and elapsed time without significantly impacting subtree cost and logical reads. Users must always assess JOIN order based on table size with smaller tables generally joined first.
Prioritize creating indexes on frequently joined tables to enhance data retrieval efficiency and lower CPU and elapsed times. However, use the method cautiously due to optimization trade-offs and consider that having multiple indexes does not necessarily yield better results. Another argument against indexing can be found in : When not to use indexes. When data is written to the database, the original table (the clustered index) is updated first and then all of the indexes off of that table are updated. Every time a write is made to the database, the indexes are unusable until they have updated. If the database is constantly receiving writes then the indexes will never be usable. This is why indexes are typically applied to databases in data warehouses that get new data updated on a scheduled basis(off-peak hours) and not production databases which might be receiving new writes all the time.
If only one performance metric were to be prioritized, it should be Elapsed Time as it directly reflects user experience and system responsiveness. In this case, the index combination of Orders and USZips had the lowest value after applying JOIN reordering likely because these tables have the most number of rows at 10,194 and 33,787, respectively.
Optimization Techniques
Adding index and reordering joined tables
Description
Hardcoding to remove subqueries
In this project, we are sure that the minimum year is 2021 and all years have January (1) as a starting month. Hardcoding these values can save the code from running subqueries. Performance improvements appear in CPU time (578 → 297 ms), elapsed time (751 → 553 ms) and estimated subtree cost (14.3979 → 13.4211).
Creating a temp table for repeatedly-used CTEs
In some use case pages, the same CTE is used for several queries. Temporary tables are more efficient to use because they can be reused across multiple queries within the same session, and support indexing They are particularly ideal when the data is expensive to compute and needs to be accessed frequently. Note that USZips is indexed prior to executing both code, and subsequent code must refer to temp table results with a hashtag (#factory_zips) as per SQL Server syntax.
Using a CTE rather than a subquery
By separating the CTE, performed was improved from 16 ms to 0 ms CPU time and 125 ms to 93 ms Elapsed time, and better readability was achieved.
Grouping with WHERE instead of HAVING
While performance did not vary greatly for this change, it is generally preferred to use WHERE rather than HAVING since the former gets run first in the SQL order of operations.