Gwendolyn Ang - BI Portfolio

Explore

Database Design, Querying & Analysis

Bonus: Query Optimization

This section includes an experiment evaluating the impact of two query methods on four performance metrics. Additionally, it highlights techniques used to improve data retrieval efficiency, along with a side-by-side comparison of the original and optimized code.

Optimization Experiment

The following experiment was conducted to assess the impact of two query optimization methods on various performance metrics. The original code is shown below:

Query Optimization Methods

Query Performance Metrics

Results

Order

Clustered Index

Est. Sub-tree Cost

CPU Time (ms)

Elapsed Time (ms)

Logical Reads

Original

None

1.36472

130

USZips - 325, Orders - 209, Cust. - 1825, USStates - 1

Reordered

None

1.36472

109

USZips - 325, Orders - 209, Cust.- 1825, USStates - 1

Original

Orders

1.34768

147

USZips - 312 Orders - 212, Cust.- 16, USStates - 1

Reordered

Orders

1.34768

USZips - 312 Orders - 212, Cust.- 16, USStates - 1

Original

Customers

1.34768

104

USZips - 325, Orders - 210, Cust.- 18, USStates - 1

Reordered

Customers

1.34768

105

USZips - 325, Orders - 210, Cust.- 18, USStates - 1

Original

USZips

1.34768

142

USZips - 314, Orders - 210, Cust.- 16, USStates - 1

Reordered

USZips

1.34768

109

USZips - 314, Orders - 210, Cust.- 16, USStates - 1

Original

Orders, USZips

1.34768

142

USZips - 314, Orders - 212, Cust.- 16, USStates - 1

Reordered

Orders, USZips

1.34768

USZips - 314, Orders - 212, Cust.- 16, USStates - 1

Original

USZips, Customers

1.34768

108

USZips - 314, Orders - 210, Cust.- 18, USStates - 1

Reordered

USZips, Customers

1.34768

108

USZips - 314, Orders - 210, Cust.- 18, USStates - 1

Original

Customers, Orders

1.34768

112

USZips - 312, Orders - 212, Cust.- 18, USStates - 1

Reordered

Customers, Orders

1.34768

102

USZips - 312, Orders - 212, Cust.- 18, USStates - 1

Original

USZips, Orders, Customers

1.34768

136

USZips - 314, Orders - 212, Cust.- 18, USStates - 1

Reordered

USZips, Orders, Customers

1.34768

110

USZips - 314, Orders - 212, Cust.- 18, USStates - 1

There are no rows in this table

⁠

Discussion

It's essential to recognize that while the values of performance metrics can vary with each query execution, the overall patterns they exhibit tend to remain consistent.

Metric

JOIN Table Order

Indexing

Estimated Subtree Cost

Does not impact as values for Original and Reordered iterations were equal for each index combination

⁠

Indexing reduced cost from 1.365 to 1.348 as it allows the database to locate the necessary rows more efficiently without scanning the entire table.

The value of 1.348 remaining consistent across different index combinations indicates that query optimization capabilities has been maximized.

CPU Time

Generally improves when reordered particularly in configurations with multiple indexes

Indexing improves CPU times by facilitating faster data retrieval and workload reduction.

Indexes of USZips, USZips and Customers, and Customers and Orders had the lowest CPU times.

Elapsed Time

Shows significant improvements when join orders are optimized

The presence of indexes significantly decreases elapsed time, as they streamline data access and reduce the need for full table scans.

Indexes of Orders, Orders and USZips, and Customers and Orders had the lowest elapsed times.

Logical Reads

Does not impact as values for Original and Reordered iterations were equal for each index combination

⁠

Vastly reduces Logical Reads of the Customers table from 1825 to only 16-18 which indicates that scanning this entire table is very costly. Drops are likewise seen in USZips (325 → 312-314).

As for Orders, logical reads increased from 209 → 210-214, showing that although indexing can speed up data retrieval, they can also introduce overhead from index maintenance. Users should use indexes with the optimization trade-offs in mind.

Similar results between single index iterations, and multiple index ones show that in this case, multiple indexes might not necessarily lead to further improvements in logical reads.

There are no rows in this table

⁠

Conclusion and Recommendation

Overall, reordering JOIN tables is seen to improve CPU and elapsed time without significantly impacting subtree cost and logical reads. Users must always assess JOIN order based on table size with smaller tables generally joined first.

Prioritize creating indexes on frequently joined tables to enhance data retrieval efficiency and lower CPU and elapsed times. However, use the method cautiously due to optimization trade-offs and consider that having multiple indexes does not necessarily yield better results. Another argument against indexing can be found in

this article⁠

When not to use indexes. When data is written to the database, the original table (the clustered index) is updated first and then all of the indexes off of that table are updated. Every time a write is made to the database, the indexes are unusable until they have updated. If the database is constantly receiving writes then the indexes will never be usable. This is why indexes are typically applied to databases in data warehouses that get new data updated on a scheduled basis(off-peak hours) and not production databases which might be receiving new writes all the time.

If only one performance metric were to be prioritized, it should be Elapsed Time as it directly reflects user experience and system responsiveness. In this case, the index combination of Orders and USZips had the lowest value after applying JOIN reordering likely because these tables have the most number of rows at 10,194 and 33,787, respectively.

Optimization Techniques

Adding index and reordering joined tables

Description

Hardcoding to remove subqueries

In this project, we are sure that the minimum year is 2021 and all years have January (1) as a starting month. Hardcoding these values can save the code from running subqueries. Performance improvements appear in CPU time (578 → 297 ms), elapsed time (751 → 553 ms) and estimated subtree cost (14.3979 → 13.4211).

Creating a temp table for repeatedly-used CTEs

In some use case pages, the same CTE is used for several queries. Temporary tables are more efficient to use because they can be reused across multiple queries within the same session, and support indexing They are particularly ideal when the data is expensive to compute and needs to be accessed frequently. Note that USZips is indexed prior to executing both code, and subsequent code must refer to temp table results with a hashtag (#factory_zips) as per SQL Server syntax.

Using a CTE rather than a subquery

By separating the CTE, performed was improved from 16 ms to 0 ms CPU time and 125 ms to 93 ms Elapsed time, and better readability was achieved.

Grouping with WHERE instead of HAVING

While performance did not vary greatly for this change, it is generally preferred to use WHERE rather than HAVING since the former gets run first in the SQL order of operations.

Want to print your doc?
This is not the way.

Try clicking the ⋯ next to your doc name or using a keyboard shortcut (

CtrlP

) instead.