Model relationships between data — references, populate, embedded documents, and when to use each approach.
In relational databases like PostgreSQL or MySQL, relationships between tables are handled with JOINs. You have a users table and a posts table, and they are linked by a foreign key (posts.user_id references users.id). When you need a post along with its author's name, you JOIN the two tables in a single SQL query.
MongoDB takes a fundamentally different approach. There are no JOINs in the traditional sense (although the $lookup aggregation stage can perform a left outer join between collections). Instead, MongoDB offers two primary strategies for modeling relationships:
1. Embedding (denormalization): Store the related data directly inside the parent document as a nested object or array. A user document might contain an addresses array with full address objects nested inside it. No second query is needed.
2. Referencing (normalization): Store just the ObjectId of the related document. A post document contains an author field with the ObjectId of the user who wrote it. To get the author's details you need a second query, either written yourself or issued for you by Mongoose's populate() method.
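To make the two shapes concrete, here is a plain JavaScript sketch of the same kind of data modeled both ways. No database is involved: the in-memory `users` array stands in for a collection, `findById` is a hypothetical helper playing the role of the second query, and ids are plain strings rather than real ObjectIds.

```javascript
// 1. Embedding: the addresses live inside the user document itself.
const embeddedUser = {
  _id: 'u1',
  name: 'Alice',
  addresses: [
    { street: '1 Main St', city: 'Springfield', zip: '12345' },
  ],
};

// 2. Referencing: the post stores only the author's id.
const post = { _id: 'p1', title: 'Getting Started with MongoDB', author: 'u1' };

// Stand-in for the users collection (hypothetical, in-memory only).
const users = [{ _id: 'u1', name: 'Alice', email: 'alice@example.com' }];

// Hypothetical helper playing the role of the "second query".
function findById(collection, id) {
  return collection.find((doc) => doc._id === id) ?? null;
}

// Embedded data is available with no extra lookup...
const firstCity = embeddedUser.addresses[0].city;

// ...while referenced data needs a second lookup by id.
const author = findById(users, post.author);

console.log(firstCity);   // Springfield
console.log(author.name); // Alice
```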
A common misconception is that MongoDB is "anti-relational" — that it cannot handle related data. This is false. MongoDB handles relationships well; it just gives you different tools and trade-offs compared to SQL databases. The key skill is knowing when to embed and when to reference.
MongoDB documents have a 16 MB size limit (the maximum BSON document size). This is rarely an issue for references, since each ObjectId is only 12 bytes, but it matters for embedding. If you embed thousands of large objects inside a single document, you could approach this limit. This is one reason unbounded arrays of embedded documents are discouraged.
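A quick back-of-the-envelope calculation shows where the limit starts to matter. This sketch assumes the limit is 16 MiB (16,777,216 bytes, the value MongoDB reports as maxBsonObjectSize) and uses JSON byte length as a rough stand-in for BSON size. Real BSON adds type bytes and, for arrays, an index key per element, so actual capacity is lower. `maxEmbeddedItems` and `approxBytes` are hypothetical helpers.

```javascript
// MongoDB's maximum BSON document size: 16 MiB.
const MAX_DOC_BYTES = 16 * 1024 * 1024; // 16,777,216 bytes

// Hypothetical helper: rough upper bound on how many items of a given size fit
// in one embedded array after reserving room for the document's other fields.
function maxEmbeddedItems(avgItemBytes, otherFieldsBytes = 0) {
  if (avgItemBytes <= 0) throw new RangeError('avgItemBytes must be positive');
  const available = MAX_DOC_BYTES - otherFieldsBytes;
  return available > 0 ? Math.floor(available / avgItemBytes) : 0;
}

// Hypothetical helper: JSON byte length as a crude proxy for BSON size.
// BSON adds type bytes and per-element keys, so real sizes are larger.
function approxBytes(value) {
  return Buffer.byteLength(JSON.stringify(value), 'utf8');
}

// A sample embedded comment with a 2,000-character body.
const sampleComment = { author: 'Alice', text: 'x'.repeat(2000), likes: 0 };
const perComment = approxBytes(sampleComment);

console.log(maxEmbeddedItems(12));               // 1398101 (12-byte items, ignoring BSON overhead)
console.log(perComment);                         // 2038
console.log(maxEmbeddedItems(perComment, 1024)); // 8231
```

The same 16 MiB holds vastly more small references than comment-sized sub-documents, which is why an ever-growing array of embedded comments is the risky case.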
Choosing between embedding and referencing is one of the most important decisions in MongoDB schema design. Here are the guidelines used in production applications.
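Embed when: the related data is small and bounded (a user's handful of addresses), is almost always read together with the parent, and is rarely needed on its own.
Reference when: the related data can grow without bound (a user's posts), is large, or is often queried or updated independently of the parent.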
Rule of thumb: Embed for tight, bounded coupling. Reference for loose, unbounded coupling. When in doubt, start with references — they are easier to change later. Moving from embedded to referenced is harder than the reverse.
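The rule of thumb can be written down as a small decision helper. This is a hypothetical function, not part of Mongoose, and its two flags are simplifications of "tight" (read together) and "bounded" coupling.

```javascript
// Hypothetical helper (not part of Mongoose) encoding the rule of thumb:
// embed only for tight, bounded coupling; otherwise reference.
function chooseStrategy({ bounded, readTogether } = {}) {
  // Strict checks mean unknown (undefined) inputs fall through to
  // 'reference', matching "when in doubt, start with references".
  if (bounded === true && readTogether === true) {
    return 'embed';
  }
  return 'reference';
}

console.log(chooseStrategy({ bounded: true, readTogether: true }));   // embed (a user's few addresses)
console.log(chooseStrategy({ bounded: false, readTogether: false })); // reference (a user's many posts)
console.log(chooseStrategy({ bounded: true }));                       // reference (coupling unknown)
```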
Mongoose's populate() method is the primary tool for working with references. It automatically replaces an ObjectId with the actual document it points to.
Setting up references in your schema:
const mongoose = require('mongoose');
const postSchema = new mongoose.Schema({
  title: String,
  content: String,
  author: {
    type: mongoose.Schema.Types.ObjectId,
    ref: 'User' // tells Mongoose which model to use for populate()
  },
  comments: [{
    type: mongoose.Schema.Types.ObjectId,
    ref: 'Comment'
  }]
});
const Post = mongoose.model('Post', postSchema);
The ref property is a string matching the model name you used in mongoose.model('User', userSchema). It tells Mongoose where to look when you call populate().
Using populate():
// Inside an async function, with the Post, User and Comment models registered:
// Basic populate: replaces the author ObjectId with the full user document
const post = await Post.findById(postId).populate('author');
// post.author is now { _id: '...', name: 'Alice', email: 'alice@example.com', ... }
// Select specific fields: only get name and email, not the entire user document
const postSummary = await Post.findById(postId).populate('author', 'name email');
// postSummary.author is now { _id: '...', name: 'Alice', email: 'alice@example.com' }
// Populate multiple fields
const postWithComments = await Post.findById(postId)
  .populate('author', 'name')
  .populate('comments');
// Nested populate: populate the author of each comment
const postWithCommentAuthors = await Post.findById(postId)
  .populate({
    path: 'comments',
    populate: { path: 'author', select: 'name' }
  });
Under the hood, populate() executes an additional query for each populated path. Post.findById(id).populate('author') runs two queries: one to find the post, and one to find the user with the matching _id. For arrays of references (like comments), Mongoose uses a single $in query to fetch all referenced documents at once rather than one query per comment, and each level of nested populate adds one more query.
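The batching idea behind that single $in query can be sketched in memory: collect every referenced id, fetch them in one batch, then swap ids for documents. This illustrates the technique only and is not Mongoose's implementation; `findByIds`, `populateField` and the arrays are hypothetical stand-ins for collections, and real populate() also handles missing documents, field selection and nested paths in its own way.

```javascript
// In-memory stand-ins for two collections (hypothetical data).
const comments = [
  { _id: 'c1', text: 'Great post!' },
  { _id: 'c2', text: 'Thanks for sharing.' },
  { _id: 'c3', text: 'Unrelated comment.' },
];
const posts = [
  { _id: 'p1', title: 'Intro', comments: ['c1', 'c2'] },
  { _id: 'p2', title: 'Follow-up', comments: ['c2'] },
];

let lookups = 0; // counts how many batch "queries" are issued

// Simulates a single { _id: { $in: ids } } query against a collection.
function findByIds(collection, ids) {
  lookups += 1;
  const wanted = new Set(ids);
  return collection.filter((doc) => wanted.has(doc._id));
}

// Replace ids in `field` (scalar or array) with the referenced documents,
// using ONE batched lookup for all parent documents combined.
function populateField(parents, field, collection) {
  const allIds = parents.flatMap((p) => [].concat(p[field]));
  const found = findByIds(collection, [...new Set(allIds)]);
  const byId = new Map(found.map((doc) => [doc._id, doc]));
  const resolve = (id) => byId.get(id) ?? null; // in this sketch, missing refs become null
  return parents.map((p) => ({
    ...p, // copy, so the original parents keep their ids
    [field]: Array.isArray(p[field]) ? p[field].map(resolve) : resolve(p[field]),
  }));
}

const populated = populateField(posts, 'comments', comments);
console.log(populated[0].comments.map((c) => c.text)); // [ 'Great post!', 'Thanks for sharing.' ]
console.log(lookups); // 1
```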
Important caveat: populate() is NOT a JOIN. It runs in the application (Mongoose issues separate queries and stitches the results together), not inside the database. For complex queries with several levels of nested populate(), consider MongoDB's $lookup aggregation stage instead, which executes on the database server.
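For comparison, a database-side version of Post.findById(postId).populate('author', 'name email') can be expressed as an aggregation pipeline. The sketch below only builds the pipeline array; you would pass it to Post.aggregate(). It assumes the default collection name users for the User model, and `buildPostWithAuthorPipeline` is a hypothetical helper.

```javascript
// Hypothetical helper: builds a pipeline that joins a post with its author on
// the database side. Pass the result to Post.aggregate(pipeline).
function buildPostWithAuthorPipeline(postId) {
  return [
    // Equivalent of findById. In real code postId must be an ObjectId,
    // because Mongoose does not cast values inside aggregation stages.
    { $match: { _id: postId } },
    {
      $lookup: {
        from: 'users',        // collection name (assumed default for the User model)
        localField: 'author', // field on the post
        foreignField: '_id',  // field on the user
        as: 'author',         // $lookup always produces an array
      },
    },
    // Turn the one-element array into a single embedded object;
    // preserveNullAndEmptyArrays keeps posts whose author no longer exists.
    { $unwind: { path: '$author', preserveNullAndEmptyArrays: true } },
    // Keep only the author fields we need, like populate's 'name email'.
    {
      $project: {
        title: 1,
        content: 1,
        'author._id': 1,
        'author.name': 1,
        'author.email': 1,
      },
    },
  ];
}

// Placeholder string id, used here only to inspect the pipeline's shape.
const pipeline = buildPostWithAuthorPipeline('507f1f77bcf86cd799439011');
console.log(pipeline.map((stage) => Object.keys(stage)[0]));
// [ '$match', '$lookup', '$unwind', '$project' ]
```

In application code you would convert the id first, for example Post.aggregate(buildPostWithAuthorPipeline(new mongoose.Types.ObjectId(postId))). Unlike populate(), aggregate() returns plain objects rather than Mongoose documents.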
// Complete example: a User model and a Post model that references it
const mongoose = require('mongoose');
// ---- User Schema ----
const userSchema = new mongoose.Schema({
  name: { type: String, required: true },
  email: { type: String, required: true, unique: true },
  role: { type: String, enum: ['user', 'admin'], default: 'user' },
});
const User = mongoose.model('User', userSchema);
// ---- Post Schema (with author reference) ----
const postSchema = new mongoose.Schema({
  title: { type: String, required: true },
  content: { type: String, required: true },
  author: {
    type: mongoose.Schema.Types.ObjectId,
    ref: 'User', // Reference to User model
    required: true,
  },
  tags: [String], // Embedded array (simple values)
  createdAt: { type: Date, default: Date.now },
});
const Post = mongoose.model('Post', postSchema);
// ---- Creating data ----
async function seedData() {
  // Create a user
  const alice = await User.create({
    name: 'Alice',
    email: 'alice@example.com',
  });
  // Create posts referencing the user
  await Post.create([
    {
      title: 'Getting Started with MongoDB',
      content: 'MongoDB is a document database...',
      author: alice._id, // Store just the ObjectId
      tags: ['mongodb', 'database'],
    },
    {
      title: 'Mongoose Relationships',
      content: 'Learn how to model relationships...',
      author: alice._id,
      tags: ['mongoose', 'relationships'],
    },
  ]);
}
// ---- Querying with populate ----
async function getPosts() {
  // WITHOUT populate: author is just an ObjectId
  const rawPosts = await Post.find();
  console.log(rawPosts[0].author);
  // Logs the ObjectId, e.g. 507f1f77bcf86cd799439011
  // WITH populate: author is a user document with only the selected fields
  const posts = await Post.find()
    .populate('author', 'name email') // Only get name and email
    .sort('-createdAt'); // Newest first
  console.log(posts[0].author);
  // Output: { _id: '507f...', name: 'Alice', email: 'alice@example.com' }
  console.log(posts[0].author.name);
  // Output: Alice
  return posts;
}
// ---- Finding posts by a specific user ----
async function getPostsByUser(userId) {
  return Post.find({ author: userId })
    .populate('author', 'name')
    .sort('-createdAt');
}
// ---- Running the example ----
// Assumes a MongoDB server on localhost; the database name is a placeholder.
// Seeding twice hits the unique email index, so drop the database between runs.
async function main() {
  await mongoose.connect('mongodb://127.0.0.1:27017/relationships-demo');
  try {
    await seedData();
    await getPosts();
  } finally {
    await mongoose.disconnect();
  }
}
main().catch(console.error);
Here are the common relationship patterns and how to model them in MongoDB with Mongoose.
One-to-One: A user has one profile. You can embed or reference. Embed when the data is always needed together (the user document includes a profile sub-document). Reference when you sometimes need just the user without the profile, or when the profile is very large.
// Schema definition objects below would be passed to new mongoose.Schema(...)
const { ObjectId } = mongoose.Schema.Types; // shorthand used in these snippets
// Embedded
const embeddedUserSchema = { name: String, profile: { bio: String, avatar: String } };
// Referenced
const referencedUserSchema = { name: String, profile: { type: ObjectId, ref: 'Profile' } };
One-to-Many (bounded): A user has a few addresses (at most 5). Embed the addresses inside the user. The array is small and always accessed with the user.
const userSchema = { name: String, addresses: [{ street: String, city: String, zip: String }] };
One-to-Many (unbounded): A user has many posts (potentially thousands). Reference the user in each post. Never embed thousands of posts inside a user document.
const postSchema = { title: String, author: { type: ObjectId, ref: 'User' } };
// Query: Post.find({ author: userId }).populate('author')
Many-to-Many: Students enroll in courses; courses have many students. Store an array of references on one or both sides. Storing them on both sides makes lookups cheap in either direction, but every enrollment must then update two documents:
const studentSchema = { name: String, courses: [{ type: ObjectId, ref: 'Course' }] };
const courseSchema = { title: String, students: [{ type: ObjectId, ref: 'Student' }] };
// Or, for very large datasets, use a junction collection: one document per enrollment referencing both a student and a course
Avoid deep nesting. A post with comments, each comment with replies, each reply with likes... this gets hard to query and update. Flatten when possible: make comments their own collection with a postId reference, and give replies a parentCommentId field. This is easier to query (find all comments for a post) and to update (add a like to a specific reply).
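Flattened comments can still be displayed as a thread. The in-memory sketch below assumes each comment document carries a postId and an optional parentCommentId (null for top-level comments); in practice the flat list would come from a single query on a hypothetical Comment model, and `buildThread` is a hypothetical helper that rebuilds the reply tree in application code.

```javascript
// Flat comment documents, as they might be stored in their own collection.
// Top-level comments have parentCommentId: null.
const flatComments = [
  { _id: 'c1', postId: 'p1', parentCommentId: null, text: 'Nice post' },
  { _id: 'c2', postId: 'p1', parentCommentId: 'c1', text: 'Agreed' },
  { _id: 'c3', postId: 'p1', parentCommentId: 'c2', text: 'Same here' },
  { _id: 'c4', postId: 'p1', parentCommentId: null, text: 'Any questions?' },
  { _id: 'c5', postId: 'p2', parentCommentId: null, text: 'Other post' },
];

// Build a nested thread for one post from the flat list in a single pass.
function buildThread(comments, postId) {
  const forPost = comments.filter((c) => c.postId === postId);
  // Index every comment first, so a reply can find its parent regardless of order.
  const nodes = new Map(forPost.map((c) => [c._id, { ...c, replies: [] }]));
  const roots = [];
  for (const node of nodes.values()) {
    const parent = node.parentCommentId && nodes.get(node.parentCommentId);
    if (parent) {
      parent.replies.push(node);
    } else {
      roots.push(node); // top-level comment (or orphan whose parent is missing)
    }
  }
  return roots;
}

const thread = buildThread(flatComments, 'p1');
console.log(thread.length);                        // 2
console.log(thread[0].replies[0].replies[0].text); // Same here
```

Because each reply is its own document, liking one becomes a single targeted update, for example Comment.updateOne({ _id: replyId }, { $inc: { likes: 1 } }), assuming a hypothetical Comment model with a numeric likes field.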