Django select_related vs prefetch_related: The Visual Guide

Django select_related vs prefetch_related: The Visual Guide

I was pair-programming with a junior developer on our team. She had a view that listed blog posts with their authors and tags. Simple page — title, author name, list of tags. About 50 posts on the page.

She opened Django Debug Toolbar and froze. “Why is this page making 152 database queries?”

One query for the posts. Fifty queries to fetch each post’s author. Fifty more to fetch each post’s tags. Plus a couple extra for pagination. The classic N+1 problem, hiding behind Django’s lazily-loaded relationships.

“Just add select_related ," someone on the team said.

“Wait — which one?” she asked. “There’s also prefetch_related . What's the difference?"

I realized I’d never actually seen anyone explain the difference visually — showing what SQL each one produces, what data flows where, and why picking the wrong one doesn’t just fail to help but can actually make things worse. So I drew it on a whiteboard. That whiteboard sketch eventually became this article.

The Setup: Models We’ll Use

Let’s use a real-world example that hits both relationship types. A blog with posts, authors, and tags:

class Author(models.Model):
    name = models.CharField(max_length=100)
    email = models.EmailField()


class Tag(models.Model):
    name = models.CharField(max_length=50)


class Post(models.Model):
    title = models.CharField(max_length=200)
    content = models.TextField()
    author = models.ForeignKey(Author, on_delete=models.CASCADE, related_name='posts')
    tags = models.ManyToManyField(Tag, related_name='posts')
    created_at = models.DateTimeField(auto_now_add=True)
Two relationships here:
author is a ForeignKey — each post has exactly one author
tags is a ManyToManyField — each post can have multiple tags, each tag can be on multiple posts

This distinction is everything. It determines which method you use.

The N+1 Problem: What Happens Without Either

N+1 Problem - Query Explosion N+1 Problem - Query Explosion

# views.py
def post_list(request):
    posts = Post.objects.all()[:50]
    for post in posts:
        print(post.author.name)         # Hits the database
        print(list(post.tags.all()))    # Hits the database again

Here’s what Django does behind the scenes:

-- Query 1: Get all posts
SELECT * FROM blog_post LIMIT 50;

-- Query 2: Get author for post 1
SELECT * FROM blog_author WHERE id = 3;

-- Query 3: Get author for post 2
SELECT * FROM blog_author WHERE id = 7;

-- ... 48 more author queries ...

-- Query 52: Get tags for post 1
SELECT tag.* FROM blog_tag
  INNER JOIN blog_post_tags ON tag.id = blog_post_tags.tag_id
  WHERE blog_post_tags.post_id = 1;

-- Query 53: Get tags for post 2
SELECT tag.* FROM blog_tag
  INNER JOIN blog_post_tags ON tag.id = blog_post_tags.tag_id
  WHERE blog_post_tags.post_id = 2;

-- ... 48 more tag queries ...

Total: 101 queries. One for posts, fifty for authors, fifty for tags. Each author and tag fetch is a separate round trip to the database. On a page with 50 posts, this takes about 120ms of pure database time — most of it wasted on connection overhead, not actual data retrieval.

Further reading: How detect N+1 problem

This is the N+1 problem. One query to get N items, then N more queries to get each item’s related data. Django does this because querysets are lazy — related objects aren’t loaded until you access them. It’s a reasonable default for cases where you don’t always need the related data, but it’s devastating when you do.

Django select_related — JOIN Path Django select_related — JOIN Path

posts = Post.objects.select_related('author').all()[:50]

Here’s the SQL:

SELECT
    blog_post.id,
    blog_post.title,
    blog_post.content,
    blog_post.author_id,
    blog_post.created_at,
    blog_author.id,
    blog_author.name,
    blog_author.email
FROM blog_post
INNER JOIN blog_author ON blog_post.author_id = blog_author.id
LIMIT 50;

Further reading: more select_related examples

One query. Django joins the post and author tables, pulls all the columns from both, and creates fully populated Post objects with their Author objects already attached. When you access post.author.name, there's no database hit.
— the data is already in memory.

What it looks like conceptually:

Database sends back ONE result set:
┌─────────────────────────────────────────────────────┐
 post.id  post.title  author.id  author.name      
├─────────┼────────────┼───────────┼──────────────────┤
 1        "Hello"     3          "Alice"          
 2        "World"     7          "Bob"            
 3        "Django"    3          "Alice"          
└─────────────────────────────────────────────────────┘

Django creates Post objects with Author already attached:
Post(id=1, title="Hello", author=Author(id=3, name="Alice"))
Post(id=2, title="World", author=Author(id=7, name="Bob"))
Post(id=3, title="Django", author=Author(id=3, name="Alice"))

50 queries → 1 query. That’s the magic.

When Django select_related Works

select_related only works on relationships that point to a single object:

It works by creating a SQL INNTER JOIN , which means the result set has one row per parent object. This is efficient because the result set stays the same size as the original query.

SQL INNER JOIN SQL INNER JOIN

When select_related Breaks

You cannot use select_related for:

  • ManyToManyFieldpost.tags (one post → many tags)
  • Reverse ForeignKeyauthor.posts (one author → many posts)

Why? Because a JOIN on a many-relationship multiplies the rows. If post 1 has 3 tags, the JOIN would create 3 rows for that one post. With 50 posts averaging 4 tags each, your 50-row result becomes 200 rows — with the post data duplicated in every row. The more tags per post, the worse it gets.

Django’s documentation says exactly this: to avoid the much larger result set from joining across a ‘many’ relationship, select_related is limited to single-valued relationships.

Further reading: Common Mistakes With select_related

Django prefetch_related Django prefetch_related

posts = Post.objects.prefetch_related('tags').all()[:50]

Here’s the SQL:

-- Query 1: Get all posts
SELECT * FROM blog_post LIMIT 50;

-- Query 2: Get ALL tags for ALL those posts in one shot
SELECT
    blog_tag.id,
    blog_tag.name,
    blog_post_tags.post_id
FROM blog_tag
INNER JOIN blog_post_tags ON blog_tag.id = blog_post_tags.tag_id
WHERE blog_post_tags.post_id IN (1, 2, 3, 4, ... 50);

Two queries. Not 51. Not 101. Just 2.

  • The first query gets the posts.
  • The second query gets ALL tags for ALL those posts at once, using an IN clause with the post IDs.
  • Django then matches them up in Python: "tag X belongs to post 1 and post 3, tag Y belongs to post 2" — and attaches them to the correct objects.

What it looks like conceptually:

Query 1 returns posts:
┌─────────┬────────────┐
 post.id  post.title 
├─────────┼────────────┤
 1        "Hello"    
 2        "World"    
 3        "Django"   
└─────────┴────────────┘

Query 2 returns ALL tags for those posts:
┌─────────┬─────────────┬──────────┐
 tag.id   tag.name     post_id  
├─────────┼─────────────┼──────────┤
 10       "python"     1        
 11       "tutorial"   1        
 10       "python"     2        
 12       "django"     3        
 10       "python"     3        
└─────────┴─────────────┴──────────┘

Django matches them in Python:
Post 1  [Tag("python"), Tag("tutorial")]
Post 2  [Tag("python")]
Post 3  [Tag("django"), Tag("python")]

When prefetch_related Works

prefetch_related works on any relationship:
ManyToManyFieldpost.tags
Reverse ForeignKeyauthor.posts (all posts by an author)
It can even be used for ForeignKey, though select_related is more efficient for that

prefetch_related - The Subtle Trap

Here’s something that catches experienced developers. prefetch_related caches the results and attaches them to the parent objects. But if you filter the related queryset after prefetching, Django ignores the cache and runs a new query:

posts = Post.objects.prefetch_related('tags').all()

for post in posts:
    # This uses the prefetch cache — no extra query
    all_tags = post.tags.all()

    # This IGNORES the prefetch cache — new query per post!
    python_tags = post.tags.filter(name='python')

The .filter() on the prefetched relationship breaks the cache because Django can't know in advance what filters you'll apply. If you need filtered prefetches, use the Prefetch object :

from django.db.models import Prefetch

posts = Post.objects.prefetch_related(
    Prefetch(
        'tags',
        queryset=Tag.objects.filter(name='python'),
        to_attr='python_tags'    # Stores result in a new attribute
    )
).all()

for post in posts:
    # This uses the prefetch — no extra query
    print(post.python_tags)

Further reading: Prefetch Objects: Fine-Grained Control (with examples)

The Prefetch object lets you customize the query used for prefetching. The to_attr parameter stores the result in a list attribute instead of overriding the manager, so you can even have multiple prefetches for the same relationship with different filters.

django select_related vs prefetch_related When to Use Which django select_related vs prefetch_related When to Use Which

Combining Both—The Real-World Pattern

select_related combined with prefetech_related select_related combined with prefetech_related

In practice, you almost always need both. Our blog post list needs the author (ForeignKey) and the tags (ManyToMany):

posts = (
    Post.objects
    .select_related('author')           # JOIN for the FK
    .prefetch_related('tags')            # Separate query for M2M
    .filter(is_published=True)
    .order_by('-created_at')[:50]
)
Total queries: 2.
One joined query for posts + authors.
One batch query for all tags.
Down from 101.

For more complex scenarios, you can chain multiple levels:

# A dashboard showing orders with customer info and order items with products
orders = (
    Order.objects
    .select_related(
        'customer',                      # FK: order → customer
        'customer__company',             # FK: customer → company (chained)
    )
    .prefetch_related(
        'items',                         # Reverse FK: order → items
        'items__product',                # FK on the prefetched items
        Prefetch(
            'items__product__reviews',
            queryset=Review.objects.filter(rating__gte=4),
            to_attr='top_reviews'
        ),
    )
)

select_related follows FK chains with double underscore notation . prefetch_related can also follow chains, and you can mix in Prefetch objects for filtered lookups.

Further reading: more complex examples

The Cheat Sheet

Here’s the decision tree I keep in my head:
Is the relationship a ForeignKey or OneToOneField? → Use select_related. It creates a JOIN and fetches everything in one query.
Is the relationship a ManyToManyField or reverse ForeignKey? → Use prefetch_related. It runs a separate batch query with an IN clause.
Do you need to filter the related objects? → Use Prefetch() object inside prefetch_related with a custom queryset.
Not sure which one to use?prefetch_related works for everything. It's slightly less efficient than select_related for ForeignKey lookups (2 queries instead of 1), but it never gives wrong results. When in doubt, prefetch.

Feature select_related prefetch_related
Primary Use <strong>ForeignKey</strong>, <strong>OneToOne</strong> <strong>ManyToManyField</strong>, <strong>Reverse ForeignKey</strong>
SQL Strategy Single query with <strong>JOIN</strong> Multiple separate queries
Where Join Occurs Database level Application level (Python)
Query Count Always 1 query 2 or more queries

The Performance Numbers

I benchmarked all three approaches on our blog model with 500 posts, each having 1 author and 3–5 tags:

From 1,501 queries and 340ms to 2 queries and 12ms. That’s a 28x speedup from adding two method calls to the queryset. This is the highest-leverage optimization available in Django — nothing else comes close for the effort involved.

Bottom Line

Django’s ORM is lazy by default — it doesn’t fetch related data until you ask for it. This is smart for most cases but devastating when you loop through a queryset and access related objects on every iteration.

The fix is two methods: select_related for single-object relationships (ForeignKey, OneToOne) and prefetch_related for multi-object relationships (ManyToMany, reverse FK). Together, they can turn 1,500 queries into 2.

My rule for every queryset that touches a template or serializer: before you pass it to the response, check every relationship you access in the loop and make sure it’s either selected or prefetched. Django Debug Toolbar makes this trivial — if you see more than 10 queries on a list page, something needs optimizing.

That junior developer on my team? She added two method calls and her 152-query page dropped to 3. She pushed it, and our DBA sent a thank-you message in Slack. That’s the kind of fix that makes your whole day.

What’s the highest query count you’ve ever seen on a single Django page? I’ve personally witnessed 4,300 on an admin view with nested inlines. Drop your record in the comments — I know someone has a worse story.

SUBSCRIBE FOR NEW ARTICLES

@
comments powered by Disqus