SQL Aggregations & Joins

Summary - Aggregation & Joins

Aggregation Functions in SQL

Aggregation Functions in SQL are used to perform calculations on a set of values and return a single summary value. They are essential for generating summary reports and insights from large datasets.

Common Aggregation Functions

COUNT(): Returns the number of rows that match a specified condition.
Example: To count the number of employees in a company:
SELECT COUNT(*) FROM employees;

SUM(): Adds up all the values in a specified column.
Example: To calculate the total sales amount:
SELECT SUM(sales_amount) FROM sales;

AVG(): Calculates the average value of a specified column.
Example: To find the average score of students:
SELECT AVG(score) FROM exam_results;

MIN(): Returns the smallest value in a specified column.
Example: To find the minimum price of products:
SELECT MIN(price) FROM products;

MAX(): Returns the largest value in a specified column.
Example: To find the highest salary in a department:
SELECT MAX(salary) FROM employees WHERE department_id = 10;

Understanding Levels of Data

Data in SQL can be understood at different levels, which helps in organizing and analyzing it effectively.

Types of Data Levels

Row Level: Each row in a table represents a single record or entry. For example, in a customers table, each row contains data for one customer.
Column Level: Each column in a table represents a specific attribute or property of the data. For instance, a column named customer_name would contain the names of all customers.
Table Level: A table is a collection of rows and columns, representing a specific type of data. For example, an orders table might include all orders made by customers.
Understanding these levels helps in structuring SQL queries and data retrieval.

Changing Level of Data using Aggregation

Changing the level of data using aggregation refers to the process of summarizing data from a detailed level (like individual rows) to a higher level (like groups of rows). This is often done to analyze trends or patterns.

Example

If you have sales data at the row level (each sale is a row), you can aggregate this data to find the total sales per month:
SELECT MONTH(sale_date) AS sale_month, SUM(sales_amount) AS total_sales
FROM sales
GROUP BY MONTH(sale_date);

This query changes the level of data from individual sales to total sales by month.

Aggregation using Group By and Having Keywords

GROUP BY Clause

The GROUP BY clause is used in SQL to arrange identical data into groups. This is typically used with aggregate functions to summarize data.

Example of GROUP BY

To count the number of orders by each customer:
SELECT customer_id, COUNT(order_id) AS order_count
FROM orders
GROUP BY customer_id;

HAVING Clause

The HAVING clause is used to filter groups based on aggregate values, similar to the WHERE clause but applied after grouping.

Example of HAVING

To find customers who have placed more than five orders:
SELECT customer_id, COUNT(order_id) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(order_id) > 5;

Inner Join

An Inner Join combines rows from two or more tables based on a related column between them. It returns only the rows where there is a match in both tables.

Example

To retrieve customer names and their corresponding orders:
SELECT customers.customer_name, orders.order_id
FROM customers
INNER JOIN orders ON customers.customer_id = orders.customer_id;

This query only returns customers who have made orders.

Left Join

A Left Join (or Left Outer Join) returns all rows from the left table and the matched rows from the right table. If there is no match, NULL values are returned for columns from the right table.

Example

To list all customers and their orders, including customers with no orders:
SELECT customers.customer_name, orders.order_id
FROM customers
LEFT JOIN orders ON customers.customer_id = orders.customer_id;

This will show all customers, with NULL for those who haven’t placed any orders.

Right Join

A Right Join (or Right Outer Join) is the opposite of a Left Join. It returns all rows from the right table and the matched rows from the left table. If there is no match, NULL values are returned for columns from the left table.

Example

To list all orders and the corresponding customer names, including orders with no associated customers:
SELECT customers.customer_name, orders.order_id
FROM customers
RIGHT JOIN orders ON customers.customer_id = orders.customer_id;

This will show all orders, with NULL for those that do not have an associated customer.

Full Outer Join

A Full Outer Join returns all rows from both tables, with NULLs in place where there is no match. This join combines the results of both Left and Right Joins.

Example

To retrieve all customers and all orders, regardless of whether they have a match:
SELECT customers.customer_name, orders.order_id
FROM customers
FULL OUTER JOIN orders ON customers.customer_id = orders.customer_id;

This will show all customers and all orders, filling in NULLs where appropriate.

Cross Join

A Cross Join returns the Cartesian product of two tables, meaning it combines every row from the first table with every row from the second table. This join does not require any condition.

Example

If you want to pair every customer with every product:
SELECT customers.customer_name, products.product_name
FROM customers
CROSS JOIN products;


This results in a combination of all customers with all products.

Union and Union ALL

The UNION operator combines the results of two or more SELECT queries. It removes duplicate rows by default. UNION ALL, on the other hand, includes all duplicates.

Example of UNION

To combine results from two tables:
SELECT customer_name FROM customers
UNION
SELECT supplier_name FROM suppliers;

Example of UNION ALL

To include all entries, even duplicates:
SELECT customer_name FROM customers
UNION ALL
SELECT supplier_name FROM suppliers;


In this case, every entry from both tables is included, regardless of duplication.

Sequence of Execution in SQL Queries

When executing an SQL query, there is a specific order of operations that the SQL engine follows. Understanding this sequence helps in writing effective queries.

Order of Operations

FROM: Identifies the tables to retrieve data from.
JOIN: Combines rows from different tables based on join conditions.
WHERE: Filters records based on specified criteria.
GROUP BY: Groups rows sharing the same values in specified columns into summary rows.
HAVING: Filters groups based on aggregate values.
SELECT: Specifies which columns to include in the result.
ORDER BY: Sorts the result set based on specified columns.

Example

SELECT department_id, COUNT(employee_id) AS employee_count
FROM employees
WHERE hire_date >= '2020-01-01'
GROUP BY department_id
HAVING COUNT(employee_id) > 10
ORDER BY employee_count DESC;

In this example, the query retrieves departments with more than ten employees hired after January 1, 2020, grouping by department and sorting the result by employee count in descending order.
Grasping Aggregation and Joins is vital for effective data analysis and combining data across tables. For more detailed information, refer to the Notion link:
Want to print your doc?
This is not the way.
Try clicking the ⋯ next to your doc name or using a keyboard shortcut (
CtrlP
) instead.