Process and analyze data with MongoDB's powerful aggregation framework — match, group, sort, and transform.
So far, you have used MongoDB to find, create, update, and delete individual documents. But what if you need to answer questions like: "What is the total revenue per product category?" or "Who are the top 5 customers by spending?" or "What is the average order value per month?" Simple CRUD queries cannot answer these questions. You need aggregation — the ability to process multiple documents, compute results across groups, and return transformed data.
In SQL databases, you would use GROUP BY, SUM(), AVG(), COUNT(), HAVING, and subqueries to perform these analytics. MongoDB provides an equivalent (and in many ways more powerful) system called the Aggregation Pipeline.
The aggregation pipeline works like a factory assembly line. Raw documents enter the pipeline on one end, pass through a series of stages, and emerge as transformed results on the other end. Each stage performs one specific operation on the documents: filtering, grouping, sorting, reshaping, joining, or computing new fields. The output of one stage becomes the input of the next.
[All Documents] → $match → $group → $sort → $limit → [Final Results]
For example, to find the top 3 product categories by total revenue from completed orders, your pipeline would be:
amount field for each groupWhat enters the pipeline as potentially millions of order documents exits as exactly 3 category summary objects. This is the power of aggregation — it transforms raw operational data into meaningful analytics.
The aggregation pipeline is executed on the MongoDB server, not in your Node.js application. This is crucial for performance. Instead of fetching millions of documents over the network and processing them in JavaScript, you send the pipeline definition to MongoDB and it does all the heavy computation internally, returning only the final results. For large datasets, this difference can be orders of magnitude faster.
The aggregation pipeline has dozens of stages, but you will use a handful of core stages in the vast majority of your queries. Understanding these core stages deeply is more valuable than memorizing all of them.
$match — Filters documents, identical to a find() query. Only documents that match the condition pass through to the next stage. Always place $match as early as possible in your pipeline — it reduces the number of documents subsequent stages need to process, dramatically improving performance. MongoDB can use indexes on $match when it is the first stage.
{ $match: { status: 'completed', createdAt: { $gte: new Date('2024-01-01') } } }$group — Groups documents by a specified key and computes aggregate values. This is the most important aggregation stage. The _id field specifies the grouping key, and additional fields use accumulator operators to compute values across the group.
{ $group: { _id: '$category', totalSales: { $sum: '$amount' }, count: { $sum: 1 } } }$sort — Orders documents by one or more fields. Use 1 for ascending and -1 for descending. After a $group stage, you typically sort by the computed aggregate to find the top or bottom results.
{ $sort: { totalSales: -1 } } // Highest sales first$project — Reshapes documents by including, excluding, renaming, or computing new fields. Think of it as SELECT in SQL — you choose exactly which fields appear in the output. You can also create computed fields using expressions.
{ $project: { _id: 0, category: '$_id', totalSales: 1, averageOrder: { $divide: ['$totalSales', '$count'] } } }$limit and $skip — Pagination stages. $limit: 10 takes the first 10 documents, $skip: 20 skips the first 20. Combined with $sort, these give you paginated, sorted results.
$unwind — Deconstructs an array field. If a document has tags: ['js', 'node', 'express'], $unwind: '$tags' creates three separate documents, each with one tag. Essential when you need to group by array elements.
$lookup — Performs a left outer join with another collection. MongoDB's equivalent of SQL JOIN. This is how you combine data from related collections in a single aggregation query.
Stages are applied strictly in order. The same stage can appear multiple times in a pipeline — for example, you might $match early to filter broadly, then $match again after a $group to filter the aggregated results (equivalent to SQL's HAVING clause).
// ── Example 1: Top 5 Categories by Revenue ──────────
const topCategories = await Order.aggregate([
// Stage 1: Only completed orders
{ $match: { status: 'completed' } },
// Stage 2: Group by category, compute totals
{ $group: {
_id: '$category',
totalRevenue: { $sum: '$amount' },
orderCount: { $sum: 1 },
avgOrderValue: { $avg: '$amount' }
}},
// Stage 3: Sort by revenue (highest first)
{ $sort: { totalRevenue: -1 } },
// Stage 4: Top 5 only
{ $limit: 5 },
// Stage 5: Clean up output
{ $project: {
_id: 0,
category: '$_id',
totalRevenue: 1,
orderCount: 1,
avgOrderValue: { $round: ['$avgOrderValue', 2] }
}}
]);
// Result: [{ category: 'electronics', totalRevenue: 50000, orderCount: 120, avgOrderValue: 416.67 }, ...]
// ── Example 2: Monthly Sales Report ──────────────────
const monthlySales = await Order.aggregate([
{ $match: {
createdAt: { $gte: new Date('2024-01-01'), $lt: new Date('2025-01-01') }
}},
{ $group: {
_id: { month: { $month: '$createdAt' }, year: { $year: '$createdAt' } },
revenue: { $sum: '$amount' },
orders: { $sum: 1 }
}},
{ $sort: { '_id.year': 1, '_id.month': 1 } }
]);
// ── Example 3: Unwind + Group (Tags Analysis) ────────
const popularTags = await Product.aggregate([
{ $unwind: '$tags' }, // Split array into individual docs
{ $group: {
_id: '$tags',
count: { $sum: 1 },
avgPrice: { $avg: '$price' }
}},
{ $sort: { count: -1 } },
{ $limit: 10 }
]);
// Each product had a tags array — now we know which tags are most commonThe $group stage is where the real computation happens. Its _id field determines the grouping key — all documents with the same value for that key are collected into one group. The remaining fields use accumulator operators to compute values across the documents in each group.
Here are the accumulators you will use most often:
$sum — Adds up values. Use $sum: 1 to count documents in each group (each document contributes 1 to the total). Use $sum: '$fieldName' to sum a numeric field. For example, $sum: '$amount' computes the total amount for each group.
{ $group: { _id: '$category', total: { $sum: '$amount' }, count: { $sum: 1 } } }$avg — Computes the average value of a numeric field across the group. Useful for average order value, average rating, average response time.
{ $group: { _id: '$product', avgRating: { $avg: '$rating' } } }$min and $max — Find the smallest and largest values in the group. Useful for finding date ranges (earliest/latest order), price ranges (cheapest/most expensive), or performance bounds (fastest/slowest response).
{ $group: { _id: '$category', cheapest: { $min: '$price' }, mostExpensive: { $max: '$price' } } }$push — Collects values into an array. Every value of the specified field from documents in the group is pushed into the resulting array. Be careful with large groups — this can create very large arrays in memory.
{ $group: { _id: '$department', employees: { $push: '$name' } } }
// Result: { _id: 'engineering', employees: ['Alice', 'Bob', 'Charlie'] }$first and $last — Take the first or last value encountered in the group. Most useful after a $sort stage, where $first gives you the top-ranked document's value and $last gives you the bottom-ranked.
Grouping by null aggregates all documents into a single group. This is how you compute totals across the entire collection:
{ $group: { _id: null, totalOrders: { $sum: 1 }, totalRevenue: { $sum: '$amount' } } }
// Result: [{ _id: null, totalOrders: 15000, totalRevenue: 2500000 }]Compound grouping keys group by multiple fields. Pass an object as _id:
{ $group: { _id: { category: '$category', year: { $year: '$createdAt' } }, total: { $sum: '$amount' } } }This creates a group for every unique combination of category and year — like SQL's GROUP BY category, YEAR(created_at).
MongoDB is a document database — data that belongs together is typically stored together in one document. But sometimes data spans multiple collections. An orders collection might reference users by userId, or a posts collection might reference authors by authorId. When you need to combine data from related collections in a single query, you use the $lookup stage.
$lookup performs a left outer join between the current collection and another collection. For every document in the current pipeline, it finds all matching documents in the "from" collection and adds them as an array field.
{ $lookup: {
from: 'users', // The collection to join with
localField: 'authorId', // Field in the current documents
foreignField: '_id', // Field in the 'users' collection
as: 'author' // Output array field name
}}This stage looks at each document's authorId field, finds all documents in the users collection whose _id matches, and puts them into an author array on the current document. Since _id values are unique, the author array will typically contain exactly one element for a valid reference or zero elements if the referenced user does not exist (which is why it is a left outer join — documents without a match still pass through).
The result looks like:
{ title: 'My Post', authorId: 'user_123', author: [{ _id: 'user_123', name: 'Alice', email: 'alice@test.com' }] }Since $lookup always produces an array (even for one-to-one relationships), you typically follow it with $unwind to flatten the array into a single object:
{ $unwind: '$author' }
// Now: { title: 'My Post', authorId: 'user_123', author: { _id: 'user_123', name: 'Alice', email: 'alice@test.com' } }Then use $project to shape the final output:
{ $project: { title: 1, authorName: '$author.name', authorEmail: '$author.email' } }
// Final: { title: 'My Post', authorName: 'Alice', authorEmail: 'alice@test.com' }$lookup is powerful but expensive on large collections. Each document in the pipeline triggers a query against the "from" collection. To maintain performance: ensure the foreignField is indexed, place $match before $lookup to reduce the number of lookups, and avoid chaining multiple $lookup stages unless necessary. For high-traffic queries, consider denormalizing the data (storing the author name directly on the post document) to avoid the join entirely.
Which aggregation stage is equivalent to SQL's WHERE clause?