DISTINCT Stinks
DISTINCT
(and GROUP BY
) is a powerful and useful feature in Postgres, however, it is often misused. I have seen many many queries over the years that use DISTINCT
as a quick hack to get the data that is required, when it is not needed at all. According to Postgres
If SELECT DISTINCT is specified, all duplicate rows are removed from the result set
Seems pretty awesome, so how can this be misused you ask? DISTINCT
probably get misused when we JOIN on tables unnecessarily (JOIN only for SELECT). Take this query for example.
SELECT DISTINCT
students.local_student_id,
students.first_name,
students.last_name
FROM students
JOIN matviews.ss_cube USING (student_id)
WHERE ss_cube.academic_year = 2017
LIMIT 100
You can eliminate the DISTINCT if you move the filter on ss_cube to an IN.
SELECT
students.local_student_id,
students.first_name,
students.last_name
FROM students
WHERE students.student_id IN (
SELECT ss_cube.student_id
FROM matviews.ss_cube
WHERE ss_cube.academic_year = 2017
)
LIMIT 100
So when using DISTINCT
or GROUP BY
make sure you look for ways to eliminate them, because they are not always necessary.