Lab Workbook: Creating Predicate Joins in MongoDB


MongoDB is a popular document-oriented NoSQL database that allows developers to store and retrieve data in a flexible JSON-like format. One of the main advantages of MongoDB is that it supports complex data structures, including nested arrays and objects, which can be difficult to model in a traditional relational database.

However, there are times when we need to join data from multiple collections in MongoDB, just like we would in a relational database. In this lab, we will explore how to create predicate joins in MongoDB using the $lookup operator.


Before we begin, you should have the following:
A basic understanding of MongoDB and its query language.
A MongoDB server running locally or remotely.
The mongo shell or a MongoDB client tool such as Compass.

Getting Started

To follow along with this lab, we will be using a sample database that contains two collections: orders and customers.
The orders collection contains information about customer orders, while the customers collection contains information about the customers who placed those orders.
Here is an example document from the orders collection:
{ "_id": ObjectId("6178bfbf5d1d2e5c5b7f8b5a"), "order_number": "1001", "customer_id": ObjectId("6178bf9f5d1d2e5c5b7f8b58"), "order_date": ISODate("2021-10-27T00:00:00Z"), "order_total": 50.0}

And here is an example document from the customers collection:
{ "_id": ObjectId("6178bf9f5d1d2e5c5b7f8b58"), "first_name": "John", "last_name": "Doe", "email": "", "phone": "555-555-5555", "address": { "street": "123 Main St", "city": "Anytown", "state": "CA", "zip": "12345" }}
The customer_id field in the orders collection is a reference to the _id field in the customers collection.

Creating a Predicate Join

To create a predicate join in MongoDB, we can use the $lookup operator, which performs a left outer join between two collections based on a matching condition.
Here is an example of a predicate join that retrieves all orders along with the corresponding customer information:
db.orders.aggregate([ { $lookup: { from: "customers", localField: "customer_id", foreignField: "_id", as: "customer" } }]);

Let's break down each part of this query:

db.orders.aggregate: This starts an aggregation pipeline on the orders collection, which allows us to perform complex queries.
$lookup: This is the operator that performs the join. It takes an object with the following properties:
from: This is the name of the collection we want to join with (customers in this case).
localField: This is the name of the field in the orders collection that we want to use to match documents in the customers collection (customer_id in this case).
foreignField: This is the name of the field in the customers collection that we want to use to match documents in the orders collection (_id in this case).
as: This is the name of the field where the joined data will be stored (customer in this case).
[]: This is an empty array that specifies that we want to return all documents.
When we run this query, MongoDB will perform a left outer join between the orders and customers collections based on the customer_id and _id fields, respectively.
The result will be an array of documents that contain both the order information and the corresponding customer information:
[ { "_id": ObjectId("6178bfbf5d1d2e5c5b7f8b5a"), "order_number": "1001", "customer_id": ObjectId("6178bf9f5d1d2e5c5b7f8b58"), "order_date": ISODate("2021-10-27T00:00:00Z"), "order_total": 50.0, "customer": [ { "_id": ObjectId("6178bf9f5d1d2e5c5b7f8b58"), "first_name": "John", "last_name": "Doe", "email": "", "phone": "555-555-5555", "address": { "street": "123 Main St", "city": "Anytown", "state": "CA", "zip": "12345" } } ] }]


Predicate joins in MongoDB are a powerful feature that allow us to combine data from multiple collections into a single result set.
By using the $lookup operator, we can perform left outer joins based on matching conditions, just like we would in a traditional relational database. This can be particularly useful when working with complex data structures that are difficult to model in a relational database.

Lab Workbook - Part 2: Mongoose Code for Predicate Joins Using $lookup

In this lab workbook, we will create a fully self-contained Mongoose code that creates a target database and collections, and implements methods that do predicate joins using $lookup. We will use a movie ticket app as an example, with collections for customers and movie tickets.

Step 1: Install Required Packages

First, we need to install the required packages: mongoose and dotenv.
npm install mongoose dotenv

Step 2: Create Database Connection

Next, we need to create a connection to our MongoDB database using Mongoose. We will use the connect() method to connect to the database, and the disconnect() method to disconnect from the database.
const mongoose = require('mongoose');require('dotenv').config();
const { DB_USERNAME, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME } = process.env;
(async () => { try { await mongoose.connect(`mongodb://${DB_USERNAME}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}`, { useNewUrlParser: true, useUnifiedTopology: true, }); console.log('Connected to database!'); } catch (error) { console.error(error); } finally { await mongoose.disconnect(); console.log('Disconnected from database!'); }})();
Note that we are using the dotenv package to load environment variables from a .env file. We are also using async/await and a try/catch block to handle errors and ensure that the database connection is closed even if there is an error.

Step 3: Define Schemas and Models

Next, we need to define the schemas and models for our collections. We will define a schema for both the customers and movieTickets collections.
const mongoose = require('mongoose');
const customerSchema = new mongoose.Schema({ name: { type: String, required: true, }, email: { type: String, required: true, }, phone: { type: String, required: true, }, address: { street: { type: String, required: true, }, city: { type: String, required: true, }, state: { type: String, required: true, }, zip: { type: String, required: true, }, },});
const movieTicketSchema = new mongoose.Schema({ movieId: { type: mongoose.Schema.Types.ObjectId, ref: 'Movie', required: true, }, customerId: { type: mongoose.Schema.Types.ObjectId, ref: 'Customer', required: true, }, showtime: { type: Date, required: true, },});
const Customer = mongoose.model('Customer', customerSchema);const MovieTicket = mongoose.model('MovieTicket', movieTicketSchema);
module.exports = { Customer, MovieTicket };
Note that we are using the ref option in the movieTicketSchema to reference the Movie and Customer collections. We are also exporting the Customer and MovieTicket models for use in other parts of our code.

Step 4: Seed Data

Next, we need to seed some data into our collections. We will use the insertMany() method to insert multiple documents into each collection at once.

const { Customer, MovieTicket } = require('./models');
(async () => { try { const customerData = [ { name: 'John Doe', email: '', phone: '555-555-5555', address: { street: '123 Main St', city: 'Anytown', state: 'CA', zip: '12345', }, },
Want to print your doc?
This is not the way.
Try clicking the ⋯ next to your doc name or using a keyboard shortcut (
) instead.