
w23 NOV 22 MAD 6123 Lab Test 2 MONGO DB Aggregation Pipeline


How to hand this in:

Make a Word document named StudentName_StudentID.docx.

Put screenshots and all code into that document.

Upload to:

Welcome to a pivotal chapter in your journey through the world of database management!

Learning Outcomes:

In SQL, we can apply aggregate functions such as SUM and AVG (and, in some dialects, MEDIAN) to grouped column data.

In this lab, you will move from the structured queries of SQL to MongoDB's more flexible Aggregation Pipeline.

Think of this as learning a new language that allows you to converse with your data in more nuanced and intricate ways.

The Aggregation Pipeline is MongoDB's powerful tool that enables you to perform complex data processing and analysis, similar to the formulas and functions you've mastered in SQL.

As we dive into this topic, you'll discover how to elegantly transform, summarize, and analyze large datasets using MongoDB's intuitive and expressive framework.

This is not just about learning new commands; it's about adopting a new perspective on how to interact with and understand your data.

Let's start exploring the rich capabilities of MongoDB's Aggregation Pipeline!

Here is the sample data we will use for this activity:

https://www.dropbox.com/scl/fi/unihrkr15ge4p9eql1ign/sample-real-estate-data.xlsx?rlkey=9r65bkhwzsmjplfmkrde4rf1q&dl=0

This section shows how to bring data from an Excel sheet into MongoDB and then use shell commands to determine the minimum, maximum, and median prices for each type and description of house.

Let's assume your Excel sheet has a structure like this:

Table 1

| Type | Description | Price |
|---|---|---|
| Apartment | Cozy | 1200 |
| Villa | Spacious | 2500 |
| Apartment | Modern | 1500 |
| ... | ... | ... |

To analyze this data in MongoDB, you'd first need to import it into a MongoDB collection.

Once that's done, you can use the aggregation framework to calculate the minimum, maximum, and median prices.

Here’s a step-by-step guide:

Using mongoimport:

https://coda.io/@peter-sigurdson/mongoimport-getting-data-from-a-flat-text-file-into-mongo-db

Import Data from Excel to MongoDB:

Convert your Excel data to a CSV file.

Use a MongoDB tool like mongoimport to import the CSV into a MongoDB collection (let's name it houses).
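Before running the import, it can help to see what documents the CSV should turn into. The sketch below is plain JavaScript (it runs in Node.js or mongosh without a database). `csvToDocuments` is a hypothetical helper, not part of mongoimport, and it assumes a simple CSV with a header row and no quoted fields or embedded commas. The point it illustrates is that Price has to end up as a number for the price statistics to work.

```javascript
// Hypothetical helper: parses simple CSV (header row first, no quoted
// fields or embedded commas) into plain objects, one per data row.
function csvToDocuments(csvText) {
  const lines = csvText.trim().split(/\r?\n/);
  const headers = lines[0].split(",").map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const values = line.split(",").map((v) => v.trim());
    const doc = {};
    headers.forEach((header, i) => {
      doc[header] = values[i];
    });
    // Store Price as a number: as the string "1200" it would be skipped
    // by $avg and compared as text by $min and $max.
    doc.Price = Number(doc.Price);
    return doc;
  });
}

// The three sample rows from Table 1, as a CSV export might contain them.
const csv = `Type,Description,Price
Apartment,Cozy,1200
Villa,Spacious,2500
Apartment,Modern,1500`;

const houses = csvToDocuments(csv);
console.log(houses);
// In mongosh you could load these directly with: db.houses.insertMany(houses)
```

For a small practice dataset, `insertMany` is an alternative to mongoimport; for the full spreadsheet, mongoimport remains the simpler route.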

Perform Aggregation Queries:

To find the minimum, maximum, and median prices for each type and description, you’d use MongoDB's aggregation pipeline.

Here's an example of how you would write these queries:

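db.houses.aggregate([
  {
    $group: {
      _id: { type: "$Type", description: "$Description" }, // Group by type and description
      minPrice: { $min: "$Price" }, // Lowest price in each group
      maxPrice: { $max: "$Price" }, // Highest price in each group
      medianPrice: { $median: { input: "$Price", method: "approximate" } }, // Median price (MongoDB 7.0+)
      avgPrice: { $avg: "$Price" } // Mean (average) price
    }
  }
]);

In this command:

$group groups the documents by Type and Description. The field paths "$Type", "$Description", and "$Price" must match the CSV column headers exactly, because MongoDB field names are case-sensitive.

$min and $max return the lowest and highest price in each group, $median returns the median, and $avg returns the mean. The mean is not the same as the median: a single very expensive listing pulls the average up but affects the median far less. $median was added in MongoDB 7.0 and requires a method (use "approximate"); on older servers, delete that line and the pipeline still reports the minimum, maximum, and average.

This command is like asking the database, "For each type and description of house, tell me the lowest, highest, and median price."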

By running this command in the MongoDB shell, you'll get the desired statistics for your housing data.

Remember, the key is in how you structure your aggregation pipeline to sift through the data and find the insights you need.
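To check what the grouping should return, here is a plain JavaScript sketch (runs in Node.js or mongosh, no database needed) that computes the same per-group statistics in memory. The helpers `median` and `groupStats` and the listing data are hypothetical. The sketch computes the exact textbook median (the mean of the two middle values for an even count); MongoDB's `$median` uses the method you specify (`"approximate"`), so on larger groups its result may differ slightly.

```javascript
// Sketch of the per-group statistics the $group stage reports,
// computed in memory for a hypothetical array of house documents.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b); // numeric, not text, order
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid] // odd count: the middle value
    : (sorted[mid - 1] + sorted[mid]) / 2; // even count: mean of the two middle values
}

function groupStats(docs) {
  const groups = new Map(); // group key -> array of prices
  for (const doc of docs) {
    const key = JSON.stringify({ type: doc.Type, description: doc.Description });
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc.Price);
  }
  return [...groups].map(([key, prices]) => ({
    _id: JSON.parse(key),
    minPrice: Math.min(...prices),
    maxPrice: Math.max(...prices),
    medianPrice: median(prices),
    avgPrice: prices.reduce((sum, p) => sum + p, 0) / prices.length,
  }));
}

// Hypothetical listings: one "Cozy" apartment is far pricier than the others.
const listings = [
  { Type: "Apartment", Description: "Cozy", Price: 1000 },
  { Type: "Apartment", Description: "Cozy", Price: 1100 },
  { Type: "Apartment", Description: "Cozy", Price: 4000 },
  { Type: "Villa", Description: "Spacious", Price: 2500 },
];

const stats = groupStats(listings);
console.log(stats);
```

For the "Cozy" apartments, the mean (about 2033) is pulled up by the single 4000 listing while the median stays at 1100, which is why an average cannot stand in for the median.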

Translating the concept of SQL's `GROUP BY` to MongoDB's aggregation framework is a key step in harnessing the power of MongoDB for data analysis.

In SQL, you might use `GROUP BY` to calculate the average price per house type. In MongoDB, you would use the Aggregation Pipeline to achieve the same goal. Let's walk through how you would do this in MongoDB shell scripting.

MongoDB Aggregation Pipeline

The Aggregation Pipeline in MongoDB is a framework for data aggregation, modeled as a pipeline of stages.

Each stage transforms the data as it passes through the pipeline. To calculate the average price per house type, you would use the `$group` stage together with the `$avg` accumulator operator.
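The stage-by-stage model can be sketched in plain JavaScript: each stage is a function from an array of documents to a new array, and the pipeline feeds each stage's output into the next. The helper names (`matchStage`, `groupAvgStage`, `sortStage`, `runPipeline`) are hypothetical and only loosely imitate `$match`, `$group` with `$avg`, and `$sort`; this illustrates the idea and is not how MongoDB implements it.

```javascript
// Hypothetical stage helpers: each returns a function that takes an array of
// documents and returns a new array for the next stage.
const matchStage = (predicate) => (docs) => docs.filter(predicate);

const groupAvgStage = (keyField, valueField, outField) => (docs) => {
  const totals = new Map(); // group key -> { total, count }
  for (const doc of docs) {
    const entry = totals.get(doc[keyField]) ?? { total: 0, count: 0 };
    entry.total += doc[valueField];
    entry.count += 1;
    totals.set(doc[keyField], entry);
  }
  return [...totals].map(([key, { total, count }]) => ({
    _id: key,
    [outField]: total / count,
  }));
};

// Ascending order, like a $sort on this field with value 1.
const sortStage = (field) => (docs) =>
  [...docs].sort((a, b) => a[field] - b[field]);

// Feed the output of each stage into the next one.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const sampleHouses = [
  { type: "Apartment", price: 1200 },
  { type: "Villa", price: 2500 },
  { type: "Apartment", price: 1500 },
];

// Loosely mirrors:
// [{ $match: { price: { $gte: 1300 } } },
//  { $group: { _id: "$type", averagePrice: { $avg: "$price" } } },
//  { $sort: { averagePrice: 1 } }]
const result = runPipeline(sampleHouses, [
  matchStage((d) => d.price >= 1300),
  groupAvgStage("type", "price", "averagePrice"),
  sortStage("averagePrice"),
]);
console.log(result);
```

Because the filter runs first, the 1200 apartment never reaches the grouping step, so the apartment average here is 1500 rather than 1350. Stage order matters in the same way in a real MongoDB pipeline.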

Example Scenario

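Let's say you have a collection named `houses` with documents that have fields like `type` and `price`. Your goal is to calculate the average price for each house type. (Field names are case-sensitive: if you imported the spreadsheet above, the fields are named `Type` and `Price`, so use `"$Type"` and `"$Price"` in the commands that follow.)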

MongoDB Shell Scripting Command

1. **Open MongoDB Shell:** First, make sure the MongoDB server is running, then open the MongoDB shell (`mongosh`; older installations use `mongo`).

2. **Use the Appropriate Database:** If your collection is in a database other than the default `test` database, switch to it with the `use` command: `use yourDatabaseName`

3. **Write and Execute the Aggregation Command:** Use the following script to calculate the average price per house type:

db.houses.aggregate([
  {
    $group: {
      _id: "$type",                    // Group by the "type" field
      averagePrice: { $avg: "$price" } // Calculate the average price
    }
  }
]);

In this script:
- `db.houses`: Refers to your collection named `houses`.
- `.aggregate([])`: Starts the aggregation pipeline.
- `$group`: The aggregation stage where the grouping of documents is defined.
- `_id: "$type"`: Indicates that the grouping should be done based on the `type` field in your documents.
- `$avg: "$price"`: Calculates the average of the `price` field for each group.

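4. **Interpreting the Results:** The output will be a list of documents where each document represents a house type along with the calculated average price for that type. `$group` does not guarantee any particular order, so add a stage such as `{ $sort: { averagePrice: 1 } }` after `$group` if you want the results sorted.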

Example Output

The output will look something like this (assuming fictional data):

{ "_id" : "Apartment", "averagePrice" : 1500 }
{ "_id" : "Villa", "averagePrice" : 2500 }
{ "_id" : "Bungalow", "averagePrice" : 1800 }

This output means, for instance, that the average price of all apartments in your collection is 1500.
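One caution when reading these averages: `$avg` skips values that are not numbers, so an import that stored some prices as text can shift the results without raising any error. The plain JavaScript sketch below mimics MongoDB's documented `$avg` behavior inside `$group` (non-numeric and missing values are ignored, and a group with no numeric values gets `null`). The helper `averageByType` and the documents are hypothetical.

```javascript
// Sketch of $avg semantics inside $group: values that are not numbers
// (strings, missing fields) are skipped, and a group with no numeric
// values at all gets null.
function averageByType(docs) {
  const groups = new Map(); // type -> numeric prices only
  for (const doc of docs) {
    if (!groups.has(doc.type)) groups.set(doc.type, []);
    if (typeof doc.price === "number") groups.get(doc.type).push(doc.price);
  }
  return [...groups].map(([type, prices]) => ({
    _id: type,
    averagePrice:
      prices.length > 0
        ? prices.reduce((sum, p) => sum + p, 0) / prices.length
        : null,
  }));
}

// Hypothetical documents, including two problem cases.
const pricedHouses = [
  { type: "Apartment", price: 1200 },
  { type: "Apartment", price: 1800 },
  { type: "Apartment", price: "1500" }, // string: ignored by the average
  { type: "Bungalow" }, // no price field at all
  { type: "Villa", price: 2500 },
];

console.log(averageByType(pricedHouses));
```

Here the apartment average comes out as 1500 from only two of the three apartments, and the bungalow gets `null`; checking field types after an import avoids both surprises.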

Conclusion

By using the aggregation pipeline in MongoDB, you can perform complex data analysis tasks like calculating averages grouped by certain fields, akin to what you might do with `GROUP BY` in SQL.

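The same pattern works with other accumulators, such as `$sum`, `$min`, and `$max`, and you can combine `$group` with other stages such as `$match` (to filter documents first) and `$sort` (to order the results).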
