Skip to content
Gallery
Gwendolyn Ang - BI Portfolio
Share
Explore
Database Design, Querying & Analysis

icon picker
Bonus: Query Optimization

This section includes an experiment evaluating the impact of two query methods on four performance metrics. Additionally, it highlights techniques used to improve data retrieval efficiency, along with a side-by-side comparison of the original and optimized code.

Optimization Experiment

The following experiment was conducted to assess the impact of two query optimization methods on various performance metrics. The original code is shown below:

Query Optimization Methods

Query Performance Metrics

Results

Order
Clustered Index
Est. Sub-tree Cost
CPU Time (ms)
Elapsed Time (ms)
Logical Reads
Original
None
1.36472
46
130
USZips - 325, Orders - 209, Cust. - 1825, USStates - 1
Reordered
None
1.36472
16
109
USZips - 325, Orders - 209, Cust.- 1825, USStates - 1
Original
Orders
1.34768
47
147
USZips - 312 Orders - 212, Cust.- 16, USStates - 1
Reordered
Orders
1.34768
15
75
USZips - 312 Orders - 212, Cust.- 16, USStates - 1
Original
Customers
1.34768
31
104
USZips - 325, Orders - 210, Cust.- 18, USStates - 1
Reordered
Customers
1.34768
15
105
USZips - 325, Orders - 210, Cust.- 18, USStates - 1
Original
USZips
1.34768
62
142
USZips - 314, Orders - 210, Cust.- 16, USStates - 1
Reordered
USZips
1.34768
0
109
USZips - 314, Orders - 210, Cust.- 16, USStates - 1
Original
Orders, USZips
1.34768
78
142
USZips - 314, Orders - 212, Cust.- 16, USStates - 1
Reordered
Orders, USZips
1.34768
15
91
USZips - 314, Orders - 212, Cust.- 16, USStates - 1
Original
USZips, Customers
1.34768
16
108
USZips - 314, Orders - 210, Cust.- 18, USStates - 1
Reordered
USZips, Customers
1.34768
0
108
USZips - 314, Orders - 210, Cust.- 18, USStates - 1
Original
Customers, Orders
1.34768
31
112
USZips - 312, Orders - 212, Cust.- 18, USStates - 1
Reordered
Customers, Orders
1.34768
0
102
USZips - 312, Orders - 212, Cust.- 18, USStates - 1
Original
USZips, Orders, Customers
1.34768
31
136
USZips - 314, Orders - 212, Cust.- 18, USStates - 1
Reordered
USZips, Orders, Customers
1.34768
16
110
USZips - 314, Orders - 212, Cust.- 18, USStates - 1
There are no rows in this table

Discussion

It's essential to recognize that while the values of performance metrics can vary with each query execution, the overall patterns they exhibit tend to remain consistent.
Metric
JOIN Table Order
Indexing
Estimated Subtree Cost
Does not impact as values for Original and Reordered iterations were equal for each index combination
Indexing reduced cost from 1.365 to 1.348 as it allows the database to locate the necessary rows more efficiently without scanning the entire table.
The value of 1.348 remaining consistent across different index combinations indicates that query optimization capabilities has been maximized.
CPU Time
Generally improves when reordered particularly in configurations with multiple indexes
Indexing improves CPU times by facilitating faster data retrieval and workload reduction.
Indexes of USZips, USZips and Customers, and Customers and Orders had the lowest CPU times.
Elapsed Time
Shows significant improvements when join orders are optimized
The presence of indexes significantly decreases elapsed time, as they streamline data access and reduce the need for full table scans.
Indexes of Orders, Orders and USZips, and Customers and Orders had the lowest elapsed times.
Logical Reads
Does not impact as values for Original and Reordered iterations were equal for each index combination
Vastly reduces Logical Reads of the Customers table from 1825 to only 16-18 which indicates that scanning this entire table is very costly. Drops are likewise seen in USZips (325 → 312-314).
As for Orders, logical reads increased from 209 → 210-214, showing that although indexing can speed up data retrieval, they can also introduce overhead from index maintenance. Users should use indexes with the optimization trade-offs in mind.
Similar results between single index iterations, and multiple index ones show that in this case, multiple indexes might not necessarily lead to further improvements in logical reads.
There are no rows in this table

Conclusion and Recommendation

Overall, reordering JOIN tables is seen to improve CPU and elapsed time without significantly impacting subtree cost and logical reads. Users must always assess JOIN order based on table size with smaller tables generally joined first.
Prioritize creating indexes on frequently joined tables to enhance data retrieval efficiency and lower CPU and elapsed times. However, use the method cautiously due to optimization trade-offs and consider that having multiple indexes does not necessarily yield better results. Another argument against indexing can be found in :
When not to use indexes. When data is written to the database, the original table (the clustered index) is updated first and then all of the indexes off of that table are updated. Every time a write is made to the database, the indexes are unusable until they have updated. If the database is constantly receiving writes then the indexes will never be usable. This is why indexes are typically applied to databases in data warehouses that get new data updated on a scheduled basis(off-peak hours) and not production databases which might be receiving new writes all the time.
If only one performance metric were to be prioritized, it should be Elapsed Time as it directly reflects user experience and system responsiveness. In this case, the index combination of Orders and USZips had the lowest value after applying JOIN reordering likely because these tables have the most number of rows at 10,194 and 33,787, respectively.


Optimization Techniques

Adding index and reordering joined tables

Description





Hardcoding to remove subqueries

In this project, we are sure that the minimum year is 2021 and all years have January (1) as a starting month. Hardcoding these values can save the code from running subqueries. Performance improvements appear in CPU time (578 → 297 ms), elapsed time (751 → 553 ms) and estimated subtree cost (14.3979 → 13.4211).

Creating a temp table for repeatedly-used CTEs

In some use case pages, the same CTE is used for several queries. Temporary tables are more efficient to use because they can be reused across multiple queries within the same session, and support indexing They are particularly ideal when the data is expensive to compute and needs to be accessed frequently. Note that USZips is indexed prior to executing both code, and subsequent code must refer to temp table results with a hashtag (#factory_zips) as per SQL Server syntax.

Using a CTE rather than a subquery

By separating the CTE, performed was improved from 16 ms to 0 ms CPU time and 125 ms to 93 ms Elapsed time, and better readability was achieved.

Grouping with WHERE instead of HAVING

While performance did not vary greatly for this change, it is generally preferred to use WHERE rather than HAVING since the former gets run first in the SQL order of operations.
Want to print your doc?
This is not the way.
Try clicking the ⋯ next to your doc name or using a keyboard shortcut (
CtrlP
) instead.