Aggregation Functions in SQL
Aggregation Functions in SQL are used to perform calculations on a set of values and return a single summary value. They are essential for generating summary reports and insights from large datasets.
Common Aggregation Functions
COUNT(): Returns the number of rows that match a specified condition. Example: To count the number of employees in a company: SELECT COUNT(*) FROM employees;
SUM(): Adds up all the values in a specified column. Example: To calculate the total sales amount: SELECT SUM(sales_amount) FROM sales;
AVG(): Calculates the average value of a specified column. Example: To find the average score of students: SELECT AVG(score) FROM exam_results;
MIN(): Returns the smallest value in a specified column. Example: To find the minimum price of products: SELECT MIN(price) FROM products;
MAX(): Returns the largest value in a specified column. Example: To find the highest salary in a department: SELECT MAX(salary) FROM employees WHERE department_id = 10;
Understanding Levels of Data
Data in SQL can be understood at different levels, which helps in organizing and analyzing it effectively.
Types of Data Levels
Row Level: Each row in a table represents a single record or entry. For example, in a customers table, each row contains data for one customer. Column Level: Each column in a table represents a specific attribute or property of the data. For instance, a column named customer_name would contain the names of all customers. Table Level: A table is a collection of rows and columns, representing a specific type of data. For example, an orders table might include all orders made by customers. Understanding these levels helps in structuring SQL queries and data retrieval.
Changing Level of Data using Aggregation
Changing the level of data using aggregation refers to the process of summarizing data from a detailed level (like individual rows) to a higher level (like groups of rows). This is often done to analyze trends or patterns.
Example
If you have sales data at the row level (each sale is a row), you can aggregate this data to find the total sales per month:
SELECT MONTH(sale_date) AS sale_month, SUM(sales_amount) AS total_sales
FROM sales
GROUP BY MONTH(sale_date);
This query changes the level of data from individual sales to total sales by month.
Aggregation using Group By and Having Keywords
GROUP BY Clause
The GROUP BY clause is used in SQL to arrange identical data into groups. This is typically used with aggregate functions to summarize data.
Example of GROUP BY
To count the number of orders by each customer:
SELECT customer_id, COUNT(order_id) AS order_count
FROM orders
GROUP BY customer_id;
HAVING Clause
The HAVING clause is used to filter groups based on aggregate values, similar to the WHERE clause but applied after grouping.
Example of HAVING
To find customers who have placed more than five orders:
SELECT customer_id, COUNT(order_id) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(order_id) > 5;
Inner Join
An Inner Join combines rows from two or more tables based on a related column between them. It returns only the rows where there is a match in both tables.
Example
To retrieve customer names and their corresponding orders:
SELECT customers.customer_name, orders.order_id
FROM customers
INNER JOIN orders ON customers.customer_id = orders.customer_id;
This query only returns customers who have made orders.
Left Join
A Left Join (or Left Outer Join) returns all rows from the left table and the matched rows from the right table. If there is no match, NULL values are returned for columns from the right table.
Example
To list all customers and their orders, including customers with no orders:
SELECT customers.customer_name, orders.order_id
FROM customers
LEFT JOIN orders ON customers.customer_id = orders.customer_id;
This will show all customers, with NULL for those who haven’t placed any orders.
Right Join
A Right Join (or Right Outer Join) is the opposite of a Left Join. It returns all rows from the right table and the matched rows from the left table. If there is no match, NULL values are returned for columns from the left table.
Example
To list all orders and the corresponding customer names, including orders with no associated customers:
SELECT customers.customer_name, orders.order_id
FROM customers
RIGHT JOIN orders ON customers.customer_id = orders.customer_id;
This will show all orders, with NULL for those that do not have an associated customer.
Full Outer Join
A Full Outer Join returns all rows from both tables, with NULLs in place where there is no match. This join combines the results of both Left and Right Joins.
Example
To retrieve all customers and all orders, regardless of whether they have a match:
SELECT customers.customer_name, orders.order_id
FROM customers
FULL OUTER JOIN orders ON customers.customer_id = orders.customer_id;
This will show all customers and all orders, filling in NULLs where appropriate.
Cross Join
A Cross Join returns the Cartesian product of two tables, meaning it combines every row from the first table with every row from the second table. This join does not require any condition.
Example
If you want to pair every customer with every product:
SELECT customers.customer_name, products.product_name
FROM customers
CROSS JOIN products;
This results in a combination of all customers with all products.
Union and Union ALL
The UNION operator combines the results of two or more SELECT queries. It removes duplicate rows by default. UNION ALL, on the other hand, includes all duplicates.
Example of UNION
To combine results from two tables:
SELECT customer_name FROM customers
UNION
SELECT supplier_name FROM suppliers;
Example of UNION ALL
To include all entries, even duplicates:
SELECT customer_name FROM customers
UNION ALL
SELECT supplier_name FROM suppliers;
In this case, every entry from both tables is included, regardless of duplication.
Sequence of Execution in SQL Queries
When executing an SQL query, there is a specific order of operations that the SQL engine follows. Understanding this sequence helps in writing effective queries.
Order of Operations
FROM: Identifies the tables to retrieve data from. JOIN: Combines rows from different tables based on join conditions. WHERE: Filters records based on specified criteria. GROUP BY: Groups rows sharing the same values in specified columns into summary rows. HAVING: Filters groups based on aggregate values. SELECT: Specifies which columns to include in the result. ORDER BY: Sorts the result set based on specified columns. Example
SELECT department_id, COUNT(employee_id) AS employee_count
FROM employees
WHERE hire_date >= '2020-01-01'
GROUP BY department_id
HAVING COUNT(employee_id) > 10
ORDER BY employee_count DESC;
In this example, the query retrieves departments with more than ten employees hired after January 1, 2020, grouping by department and sorting the result by employee count in descending order.
Grasping Aggregation and Joins is vital for effective data analysis and combining data across tables. For more detailed information, refer to the Notion link: