
Mongoose Relationships & Populate

Model relationships between data — references, populate, embedded documents, and when to use each approach.

Relationships in MongoDB

In relational databases like PostgreSQL or MySQL, relationships between tables are handled with JOINs. You have a users table and a posts table, and they are linked by a foreign key (posts.user_id references users.id). When you need a post along with its author's name, you JOIN the two tables in a single SQL query.

MongoDB takes a fundamentally different approach. It has no JOINs in the SQL sense, although the $lookup aggregation stage performs a similar left outer join between collections. For everyday schema design, MongoDB offers two primary strategies for modeling relationships:

1. Embedding (denormalization) — Store the related data directly inside the parent document as a nested object or array. A user document might contain an addresses array with full address objects nested inside it. No second query needed.

2. Referencing (normalization) — Store just the ObjectId of the related document. A post document contains an author field with the ObjectId of the user who wrote it. To get the author's details, you need a second query (or use Mongoose's populate() method).

A common misconception is that MongoDB is "anti-relational" — that it cannot handle related data. This is false. MongoDB handles relationships well; it just gives you different tools and trade-offs compared to SQL databases. The key skill is knowing when to embed and when to reference.

MongoDB documents have a 16MB size limit. This is rarely an issue for references (ObjectIds are tiny), but it matters for embedding. If you embed thousands of large objects inside a single document, you could approach this limit. This is one reason unbounded arrays of embedded documents are discouraged.
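
To get a feel for the limit, here is a back-of-the-envelope sketch. It assumes the limit is 16 MiB (16,777,216 bytes) of BSON and uses an item size you estimate yourself. The helpers maxEmbeddedItems and approxBytes are hypothetical, and JSON length is only a rough proxy for BSON size, so treat the result as a ceiling rather than an exact figure.

```javascript
// Rough budget for embedding, assuming the limit is 16 MiB of BSON.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024; // 16,777,216 bytes

// Hypothetical helper: upper bound on how many embedded items of a given
// size fit in one document. Real BSON adds per-field overhead, so the true
// number is somewhat lower.
function maxEmbeddedItems(itemBytes, parentOverheadBytes = 0) {
  if (itemBytes <= 0) throw new RangeError('itemBytes must be positive');
  const available = MAX_DOCUMENT_BYTES - parentOverheadBytes;
  return Math.max(0, Math.floor(available / itemBytes));
}

// Hypothetical helper: estimate an item's size from its JSON text.
// JSON length is only a proxy for BSON size, not an exact measure.
function approxBytes(value) {
  return Buffer.byteLength(JSON.stringify(value), 'utf8');
}

console.log(maxEmbeddedItems(2 * 1024)); // 8192 posts of ~2 KB each

const address = { street: '1 Main St', city: 'Springfield', zip: '12345' };
console.log(maxEmbeddedItems(approxBytes(address))); // a very large number
```

A bounded array of small objects such as addresses barely dents the budget, while a few thousand medium-sized posts can exhaust it.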

Embedding vs Referencing

Choosing between embedding and referencing is one of the most important decisions in MongoDB schema design. Here are the guidelines commonly followed in production applications.

Embed when:

  • The data is always accessed together. If you always need a user's addresses when you fetch the user, embed the addresses inside the user document. One query gets everything.
  • The child data does not exist independently. An address has no meaning outside the context of its user. A product option (size, color) has no meaning outside its product.
  • The data is small and bounded. A user has at most 5 addresses. A product has at most 10 options. You know the array will not grow indefinitely.
  • You need atomic updates. Updating a parent and its embedded children happens in a single document write, which MongoDB guarantees to be atomic. No transactions needed.

Reference when:

  • The data is shared across multiple documents. A category that applies to thousands of products should be its own document, referenced by each product. Embedding the same category data in every product wastes space and makes updates painful.
  • The data is large or unbounded. A user might have 10,000 posts. Embedding all of them inside the user document is a terrible idea: at just 2 KB per post that is about 20 MB, past the 16MB limit, and every query for the user would drag all those posts along.
  • The data is accessed independently. Posts are often queried on their own (recent posts, popular posts, posts by category) — they need to be their own collection.
  • You need many-to-many relationships. Students and courses: a student enrolls in many courses, and a course has many students. This calls for arrays of references on one or both sides, or a separate junction collection.

Rule of thumb: Embed for tight, bounded coupling. Reference for loose, unbounded coupling. When in doubt, start with references — they are easier to change later. Moving from embedded to referenced is harder than the reverse.
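
The checklists and rule of thumb above can be condensed into a small decision function. This is a hypothetical helper, not part of Mongoose: the flag names are illustrative, and the smallLimit default of 100 is an arbitrary assumption you would tune to your own data.

```javascript
// Hypothetical helper that encodes the embed-vs-reference guidelines.
// You supply facts about the child data; the flag names are illustrative.
function chooseStrategy({
  alwaysAccessedTogether = false, // read with the parent on every query?
  accessedIndependently = false,  // queried on its own (recent posts, etc.)?
  sharedAcrossDocuments = false,  // e.g. one category used by many products
  manyToMany = false,             // e.g. students <-> courses
  maxItems = Infinity,            // known upper bound; Infinity = unbounded
  smallLimit = 100,               // assumption: what "small" means for you
} = {}) {
  // Any "reference when" condition decides it.
  if (accessedIndependently || sharedAcrossDocuments || manyToMany) {
    return 'reference';
  }
  // Large or unbounded arrays belong in their own collection.
  if (!Number.isFinite(maxItems) || maxItems > smallLimit) {
    return 'reference';
  }
  // Tight, bounded coupling: embed.
  if (alwaysAccessedTogether) return 'embed';

  // When in doubt, start with references.
  return 'reference';
}

console.log(chooseStrategy({ alwaysAccessedTogether: true, maxItems: 5 })); // embed (addresses)
console.log(chooseStrategy({ accessedIndependently: true })); // reference (posts)
console.log(chooseStrategy({ alwaysAccessedTogether: true, sharedAcrossDocuments: true, maxItems: 1 })); // reference (category)
```

The order of the checks mirrors the text: any reason to reference wins, and embedding is chosen only for data that is small, bounded, and always read with its parent.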

References with populate()

Mongoose's populate() method is the primary tool for working with references. It automatically replaces an ObjectId with the actual document it points to.

Setting up references in your schema:

javascript
const postSchema = new mongoose.Schema({
  title: String,
  content: String,
  author: {
    type: mongoose.Schema.Types.ObjectId,
    ref: 'User'  // tells Mongoose which model to use for populate()
  },
  comments: [{
    type: mongoose.Schema.Types.ObjectId,
    ref: 'Comment'
  }]
});

The ref property is a string matching the model name you used in mongoose.model('User', userSchema). It tells Mongoose where to look when you call populate().

Using populate():

javascript
// Run inside an async function (or an ES module with top-level await).

// Basic populate: replaces the author ObjectId with the full user document
const post = await Post.findById(postId).populate('author');
// post.author is now { _id: '...', name: 'Alice', email: 'alice@example.com', ... }

// Select specific fields: only get name and email, not the entire user document
const postWithContact = await Post.findById(postId).populate('author', 'name email');
// postWithContact.author is now { _id: '...', name: 'Alice', email: 'alice@example.com' }

// Populate multiple fields
const postWithComments = await Post.findById(postId)
  .populate('author', 'name')
  .populate('comments');

// Nested populate: populate the author of each comment
// (assumes the Comment schema has its own author field with ref: 'User')
const postWithCommentAuthors = await Post.findById(postId)
  .populate({
    path: 'comments',
    populate: { path: 'author', select: 'name' }
  });

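Under the hood, populate() runs one additional query per populated path. Post.findById(id).populate('author') runs two queries: one to find the post, and one to find the user with the matching _id. Mongoose batches the ids for each path: for an array of references (like comments), or for the author field across every post returned by Post.find(), it collects all the referenced ObjectIds and fetches them with a single $in query rather than one query per document. A nested populate adds one more batched query for each nested level.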
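
To make the batching concrete, here is a toy, in-memory imitation of populating one path. It is not Mongoose's implementation: populatePath, the plain-array "collection", the string ids, and the findByIds callback standing in for a $in query are all simplifications for illustration.

```javascript
// Toy stand-in for populate(): collect every referenced id from the parent
// documents, fetch them all in ONE batched lookup, then swap ids for documents.
function populatePath(parents, path, findByIds) {
  // 1. Gather the distinct ids referenced by `path` (single value or array).
  const ids = new Set();
  for (const doc of parents) {
    const value = doc[path];
    for (const id of Array.isArray(value) ? value : [value]) {
      if (id != null) ids.add(id);
    }
  }

  // 2. One batched lookup, the in-memory analogue of { _id: { $in: [...ids] } }.
  const byId = new Map(findByIds([...ids]).map((d) => [d._id, d]));

  // 3. Replace ids with documents (without mutating the inputs). In this toy,
  //    a missing single reference becomes null and missing array entries are dropped.
  return parents.map((doc) => {
    const value = doc[path];
    const resolved = Array.isArray(value)
      ? value.map((id) => byId.get(id)).filter(Boolean)
      : byId.get(value) ?? null;
    return { ...doc, [path]: resolved };
  });
}

const users = [
  { _id: 'u1', name: 'Alice' },
  { _id: 'u2', name: 'Bob' },
];
const posts = [
  { _id: 'p1', title: 'Getting Started with MongoDB', author: 'u1' },
  { _id: 'p2', title: 'Mongoose Relationships', author: 'u1' },
  { _id: 'p3', title: 'Schema Design', author: 'u2' },
];

let lookups = 0;
const findUsersByIds = (ids) => {
  lookups += 1; // count round trips to the "database"
  return users.filter((u) => ids.includes(u._id));
};

const populated = populatePath(posts, 'author', findUsersByIds);
console.log(populated.map((p) => `${p.title} by ${p.author.name}`));
console.log(lookups); // 1
```

Three posts with two distinct authors still cost a single lookup, which is why populate() scales with the number of populated paths rather than the number of documents.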

Important caveat: populate is NOT a JOIN. It happens at the application level, not the database level: Mongoose sends separate queries and stitches the results together in your Node.js process. For very complex queries with multiple nested populations, consider using MongoDB's $lookup aggregation stage instead, which performs the join inside the database.
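
For comparison, a $lookup version of Post.find().populate('author', 'name email').sort('-createdAt') might look like the sketch below. It assumes the default collection name 'users' (Mongoose pluralizes the model name User), and postsWithAuthorPipeline is a hypothetical builder; you would run its result with Post.aggregate(pipeline). Unlike populate(), aggregation returns plain objects rather than Mongoose documents.

```javascript
// Hypothetical builder for an aggregation pipeline that joins each post
// with its author inside the database.
function postsWithAuthorPipeline({ usersCollection = 'users' } = {}) {
  return [
    { $sort: { createdAt: -1 } }, // newest first
    {
      $lookup: {
        from: usersCollection, // collection name, not model name
        localField: 'author',  // ObjectId stored on the post
        foreignField: '_id',   // matched against users._id
        as: 'author',          // $lookup always produces an array
      },
    },
    // Turn the one-element array back into a single object; keep posts
    // whose author no longer exists (their author field is then absent).
    { $unwind: { path: '$author', preserveNullAndEmptyArrays: true } },
    // Keep only the author fields we need, like populate's select argument.
    {
      $project: {
        title: 1, content: 1, tags: 1, createdAt: 1,
        'author._id': 1, 'author.name': 1, 'author.email': 1,
      },
    },
  ];
}

// Usage with a connected Post model (not run here):
// const posts = await Post.aggregate(postsWithAuthorPipeline());
console.log(JSON.stringify(postsWithAuthorPipeline(), null, 2));
```

The whole join runs as one aggregate command, at the cost of writing the projection and array unwrapping yourself.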

Complete Example: Posts with Authors

javascript
const mongoose = require('mongoose');

// ---- User Schema ----
const userSchema = new mongoose.Schema({
  name: { type: String, required: true },
  email: { type: String, required: true, unique: true },
  role: { type: String, enum: ['user', 'admin'], default: 'user' },
});
const User = mongoose.model('User', userSchema);

// ---- Post Schema (with author reference) ----
const postSchema = new mongoose.Schema({
  title: { type: String, required: true },
  content: { type: String, required: true },
  author: {
    type: mongoose.Schema.Types.ObjectId,
    ref: 'User',         // Reference to User model
    required: true,
  },
  tags: [String],         // Embedded array (simple values)
  createdAt: { type: Date, default: Date.now },
});
const Post = mongoose.model('Post', postSchema);

// ---- Creating data ----
async function seedData() {
  // Create a user
  const alice = await User.create({
    name: 'Alice',
    email: 'alice@example.com',
  });

  // Create posts referencing the user
  await Post.create([
    {
      title: 'Getting Started with MongoDB',
      content: 'MongoDB is a document database...',
      author: alice._id,  // Store just the ObjectId
      tags: ['mongodb', 'database'],
    },
    {
      title: 'Mongoose Relationships',
      content: 'Learn how to model relationships...',
      author: alice._id,
      tags: ['mongoose', 'relationships'],
    },
  ]);
}

// ---- Querying with populate ----
async function getPosts() {
  // WITHOUT populate: author is just an ObjectId
  const rawPosts = await Post.find();
  console.log(rawPosts[0].author);
  // Output: an ObjectId such as 507f1f77bcf86cd799439011 (no user data)

  // WITH populate — author is the full user object
  const posts = await Post.find()
    .populate('author', 'name email')  // Only get name and email
    .sort('-createdAt');                // Newest first
  
  console.log(posts[0].author);
  // Output: { _id: '507f...', name: 'Alice', email: 'alice@example.com' }

  console.log(posts[0].author.name);
  // Output: 'Alice'

  return posts;
}

// ---- Finding posts by a specific user ----
async function getPostsByUser(userId) {
  return Post.find({ author: userId })
    .populate('author', 'name')
    .sort('-createdAt');
}

Relationship Patterns

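Here are the common relationship patterns and how to model them in MongoDB with Mongoose. In the snippets below, each object literal is a schema definition you would pass to new mongoose.Schema(), and ObjectId is shorthand for mongoose.Schema.Types.ObjectId.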

One-to-One — A user has one profile. You can embed or reference. Embed when the data is always needed together (user document includes a profile sub-document). Reference when you sometimes need just the user without the profile, or the profile is very large.

javascript
// Option 1: embedded
const userSchemaEmbedded = { name: String, profile: { bio: String, avatar: String } };
// Option 2: referenced
const userSchemaReferenced = { name: String, profile: { type: ObjectId, ref: 'Profile' } };

One-to-Many (bounded) — A user has a few addresses (max 5). Embed the addresses inside the user. The array is small and always accessed with the user.

javascript
const userSchema = { name: String, addresses: [{ street: String, city: String, zip: String }] };

One-to-Many (unbounded) — A user has many posts (could be thousands). Reference the user in each post. Never embed thousands of posts inside a user document.

javascript
const postSchema = { title: String, author: { type: ObjectId, ref: 'User' } };
// Query: Post.find({ author: userId }).populate('author')

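Many-to-Many — Students enroll in courses; courses have many students. Store an array of references on one or both sides. If you store them on both sides, each enrollment changes two documents, so your code must keep the arrays in sync (use a transaction if they must never disagree):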

javascript
const studentSchema = { name: String, courses: [{ type: ObjectId, ref: 'Course' }] };
const courseSchema = { title: String, students: [{ type: ObjectId, ref: 'Student' }] };
// Or use a junction collection for very large datasets
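
A junction collection stores one small document per enrollment instead of growing arrays on both sides. The sketch below models that with a plain in-memory array so it runs without a database; the { student, course } shape and the helpers enroll, coursesFor, and studentsIn are hypothetical. In Mongoose you would give an Enrollment schema two ObjectId refs and a unique compound index, for example enrollmentSchema.index({ student: 1, course: 1 }, { unique: true }).

```javascript
// In-memory stand-in for an "enrollments" junction collection.
// Each entry links one student to one course.
const enrollments = [];

function enroll(studentId, courseId) {
  // Mirror a unique compound index on { student, course }: no duplicates.
  const exists = enrollments.some(
    (e) => e.student === studentId && e.course === courseId
  );
  if (!exists) enrollments.push({ student: studentId, course: courseId });
}

// Both directions are a simple filter on the junction collection, so
// neither student nor course documents grow as enrollments increase.
const coursesFor = (studentId) =>
  enrollments.filter((e) => e.student === studentId).map((e) => e.course);
const studentsIn = (courseId) =>
  enrollments.filter((e) => e.course === courseId).map((e) => e.student);

enroll('s1', 'math');
enroll('s1', 'physics');
enroll('s2', 'math');
enroll('s1', 'math'); // duplicate, ignored

console.log(coursesFor('s1')); // [ 'math', 'physics' ]
console.log(studentsIn('math')); // [ 's1', 's2' ]
```

Each enrollment is a single insert into one collection, so there are no two arrays to keep in sync.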

Avoid deep nesting. A post that embeds comments, with each comment embedding replies and each reply embedding likes, quickly becomes hard to query and update. Flatten when possible: make comments their own collection with a postId reference, and give replies a parentCommentId field. This is easier to query (find all comments for a post) and to update (add a like to a specific reply).
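
Flattening does not stop you from displaying threads. Once a post's comments are loaded with one query (for example Comment.find({ postId })), the nesting can be rebuilt in application code. This sketch assumes each comment has _id, postId, and a parentCommentId that is null for top-level comments; buildThread is a hypothetical helper.

```javascript
// Rebuild a nested reply tree from a flat list of comments.
// Assumes parentCommentId is null (or missing) for top-level comments.
function buildThread(comments) {
  // Index copies of each comment by id, each with an empty replies array.
  const byId = new Map(comments.map((c) => [c._id, { ...c, replies: [] }]));
  const roots = [];

  for (const node of byId.values()) {
    const parent = node.parentCommentId ? byId.get(node.parentCommentId) : null;
    if (parent) parent.replies.push(node);
    else roots.push(node); // top-level, or an orphan whose parent was deleted
  }
  return roots;
}

const flat = [
  { _id: 'c1', postId: 'p1', text: 'Great post!', parentCommentId: null },
  { _id: 'c2', postId: 'p1', text: 'Agreed.', parentCommentId: 'c1' },
  { _id: 'c3', postId: 'p1', text: 'Thanks!', parentCommentId: 'c2' },
  { _id: 'c4', postId: 'p1', text: 'Question about $lookup', parentCommentId: null },
];

const thread = buildThread(flat);
console.log(thread.length); // 2
console.log(thread[0].replies[0].replies[0].text); // Thanks!
```

Because the index is built before any linking, the order of the flat list does not matter: a reply can appear before its parent.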

