DISTINCT Stinks

DISTINCT (and GROUP BY) is a powerful and useful feature in Postgres, however, it is often misused. I have seen many many queries over the years that use DISTINCT as a quick hack to get the data that is required, when it is not needed at all. According to Postgres

If SELECT DISTINCT is specified, all duplicate rows are removed from the result set

Seems pretty awesome, so how can this be misused you ask? DISTINCT probably get misused when we JOIN on tables unnecessarily (JOIN only for SELECT). Take this query for example.

SELECT DISTINCT
  students.local_student_id,
  students.first_name,
  students.last_name
FROM students
JOIN matviews.ss_cube USING (student_id)
WHERE ss_cube.academic_year = 2017
LIMIT 100

You can eliminate the DISTINCT if you move the filter on ss_cube to an IN.

SELECT
  students.local_student_id,
  students.first_name,
  students.last_name
FROM students
WHERE students.student_id IN (
  SELECT ss_cube.student_id
  FROM matviews.ss_cube
  WHERE ss_cube.academic_year = 2017
)
LIMIT 100

So when using DISTINCT or GROUP BY make sure you look for ways to eliminate them, because they are not always necessary.