Self join
Join a table to itself to read a relationship that lives inside one table.
Overview
A self join joins a table to itself. You reach for it when the rows you need to connect already live in the same table, linked by a column that points from one row to another. The employees table below is the standard case. Each row holds an id, a name, the id of that person’s manager, a department, and a salary.
| id | name | manager_id | dept | salary |
|---|---|---|---|---|
| 1 | Devin | null | Exec | 320 |
| 2 | Maya | 1 | Sales | 150 |
| 3 | Omar | 1 | Sales | 140 |
| 4 | Priya | 2 | Eng | 160 |
Look at the manager_id column. Maya’s is 1, and row 1 is Devin. Maya’s manager is not stored somewhere else. It is another row in this same table. To put every employee next to their manager, you read employees twice and line the two readings up.
How it works
A self join is an ordinary join with one difference: both sides are the same table. Everything you know about joining still holds. You match a column on one side to a column on the other, and the two rows become one. What changes is that the database reads employees twice, so each read needs a name. The aliases e and m are those names. They let you write e.name and m.name without the database asking which copy you meant.
Because it is still a join, you choose its type. Inner keeps only rows that find a match. Left keeps every row on the chosen side, even when nothing matches. That choice is not cosmetic here. It decides whether Devin, who has no manager, survives. Toggle it in the demo and watch.
select e.name as employee, m.name as manager from employees e left join employees m on e.manager_id = m.id;
| id | name | manager_id |
|---|---|---|
| 1 | Devin | null |
| 2 | Maya | 1 |
| 3 | Omar | 1 |
| 4 | Priya | 2 |
| id | name |
|---|---|
| 1 | Devin |
| 2 | Maya |
| 3 | Omar |
| 4 | Priya |
Patterns
The mechanism never changes. A table points at its own rows, and you read it twice to bring them together. Here it is in three more situations, each on a table you can see.
To pair rows that share a value, two employees in the same department, join the table to itself on that value. The condition a.id < b.id does two jobs. It stops a row from pairing with itself, and it returns each pair once instead of both Maya-Omar and Omar-Maya.
select a.name, b.name, a.dept
from employees a
join employees b
on a.dept = b.dept
and a.id < b.id;First, a new table. monthly_revenue holds one row per month with that month’s revenue. To compare each month with the one before it, join the table to itself where one month is one less than the other. A left join keeps the first month, which has nothing before it, with a null change.
select cur.month, cur.revenue,
cur.revenue - prev.revenue as change
from monthly_revenue cur
left join monthly_revenue prev
on prev.month = cur.month - 1;A match does not have to be equality. To count, for each person, how many colleagues earn more, join the table to itself where one salary is higher, then count: on h.salary > e.salary. This is the most expensive shape, for the reason Performance explains.
Trade-offs
The neighbor query has a shorter form: revenue - lag(revenue) over (order by month). So when do you use which? Use a window function when you step along an order: the previous row, a running total, a rank. Use a self join when the relationship is not a simple offset: a key, a range, or any condition a window cannot express. Choosing the right tool is the skill. The syntax is not.
Pitfalls
An inner join drops every row whose pointer is null. Here that is Devin, the person at the top. The query returns rows, it looks correct, and the boss is simply gone. When you want the whole set, use a left join. This is the most common self-join bug, which is why it is the one to watch.
- It reaches one level. A self join gives you a person and their manager, not their manager’s manager. A hierarchy of unknown depth needs a recursive CTE.
- Drop the condition and you get a cross join. With no on clause, every row pairs with every row. Ten thousand rows become a hundred million.
- Name collisions need an alias. Reference a shared column without one and the database stops with an ambiguous column error. It cannot know which copy you meant.
Performance
A self join reads the table twice, so index the column you join on. Comparison joins, like the salary count, can match every row against every other, which grows with the square of the row count. Reach for them deliberately.
Practice
Recap
- A self join joins a table to itself, with an alias for each copy.
- Match the columns that link the rows, then take one from each side.
- Use a left join to keep rows whose pointer is null.
- Add
a.id < b.idto find pairs without self-matches or duplicates. - It reaches one level. Deeper hierarchies need a recursive CTE.
- To step along an order, a window function is usually simpler.