Queries should be easy to understand, easy to change, and easy to reuse. A good way to accomplish these three things is to

“avoid joining on a table that you are not selecting from” or … “do not use JOIN when you mean EXISTS or IN”

Let me give you some examples. First lets get a list of students.

SELECT
  students.local_student_id,
  students.first_name,
  students.last_name
FROM students
LIMIT 100

Now lets filter for students who are enrolled & rostered in 2017.

SELECT
  students.local_student_id,
  students.first_name,
  students.last_name
FROM students
JOIN matviews.ss_cube USING (student_id)
WHERE ss_cube.academic_year = 2017
GROUP BY students.local_student_id,
  students.first_name,
  students.last_name
LIMIT 100

Notice we are JOINing matviews.ss_cube but we are not SELECTing any data from it. Also notice that we need a GROUP BY (or DISTINCT) to make sure we only get one row of data per student. This breaks our guideline. Lets correct it.

SELECT
  students.local_student_id,
  students.first_name,
  students.last_name
FROM students
WHERE students.student_id IN (
    SELECT ss_cube.student_id
    FROM matviews.ss_cube
    WHERE academic_year = 2017
  )
LIMIT 100

The new query is simpler and we are able to re-use that sub-query in another query if we wish. Now lets see how it is also easier to change.

Lets update the queries to filter all students by a student program.

SELECT
  students.local_student_id,
  students.first_name,
  students.last_name
FROM students
JOIN matviews.ss_cube USING (student_id)
JOIN student_program_aff AS spa USING (student_id)
WHERE ss_cube.academic_year = 2017
  AND spa.student_program_id = 28
GROUP BY students.local_student_id,
  students.first_name,
  students.last_name
LIMIT 100

Notice again we are not SELECT’ing any columns from student_program_aff. Also, this is JOIN’ing every record of matviews.ss_cube to student_program_aff, which could have some serious performance issues. Lets fix it.

SELECT
  students.local_student_id,
  students.first_name,
  students.last_name
FROM students
WHERE students.student_id IN (
    SELECT ss_cube.student_id
    FROM matviews.ss_cube
    WHERE ss_cube.academic_year = 2017
  ) AND students.student_id IN (
    SELECT spa.student_id
    FROM student_program_aff AS spa
    WHERE spa.student_program_id = 28
  )
LIMIT 100

Notice we now have another sub-query we can reuse elsewhere, the query is easier to understand, and is still simple to extend.

Next, lets take things one step further and add the each students primary ethnicity.

SELECT
  students.local_student_id,
  students.first_name,
  students.last_name,
  eth.code_translation AS primary_ethnicity
FROM students
JOIN matviews.ss_cube USING (student_id)
JOIN student_program_aff AS spa USING (student_id)
JOIN codes.ethnicity AS eth ON eth.code_id = students.primary_ethnicity
WHERE ss_cube.academic_year = 2017
  AND spa.student_program_id = 28
GROUP BY students.local_student_id,
  students.first_name,
  students.last_name,
  eth.code_translation
LIMIT 100

This time the JOIN is within the guideline because we are selecting the code_translation, however, notice we had to add the column to the GROUP BY because we are JOIN’ing everything. It is easy to forget this and introduce a bug. Lets do this better.

SELECT
  students.local_student_id,
  students.first_name,
  students.last_name,
  eth.code_translation AS primary_ethnicity
FROM students
JOIN codes.ethnicity AS eth ON eth.code_id = students.primary_ethnicity
WHERE students.student_id IN (
    SELECT ss_cube.student_id
    FROM matviews.ss_cube
    WHERE ss_cube.academic_year = 2017
  ) AND students.student_id IN (
    SELECT spa.student_id
    FROM student_program_aff AS spa
    WHERE spa.student_program_id = 28
  )
LIMIT 100

So the next time you go to JOIN a table, make sure that you are SELECT’ing a column from that table, otherwise maybe you could use an IN or EXISTS.