Only SELECT the columns you need. Only JOIN the tables you need.

Before I start writing a query I like to define the columns that I absolutely need, and then I build the rest of my query off of this. This means you should never, ever, ever use SELECT *.

Using SELECT * can be useful for seeing what is in a table, however, it should not be used when writing your “real” queries. Using SELECT * makes it difficult to see what columns you actually need, and as you start to build onto your queries it makes things more complicated. It also requires the database server to return more data than what you actually need, which is wasteful. For example, take the following query.

-- Bad, lets not use SELECT *
SELECT *
FROM students
JOIN matviews.ss_cube USING (student_id)
JOIN users USING (user_id)
WHERE academic_year = 2017
LIMIT 10

This will return you multiple first_name’s, one for students and one for users. Which one do you actually want?

Instead, lets start with the columns we actually want, and build off of that. Lets say we wanted to get a list of students and their teachers for today. So I would think we would want these columns:

student id
student first name
student last name
teacher id
teacher first name
teacher last name

So lets start with our columns

SELECT
  students.student_id,
  students.first_name AS student_first_name,
  students.last_name AS student_last_name,
  users.user_id,
  users.first_name AS user_first_name,
  users.last_name AS user_last_name
FROM ?

Next we need to figure out where to get this data. Well I know that I want rosters, so I am going to use matviews.ss_cube which has a student_id and a user_id. So lets add that in with our filter for current date.

SELECT
  students.student_id,
  students.first_name AS student_first_name,
  students.last_name AS student_last_name,
  users.user_id,
  users.first_name AS user_first_name,
  users.last_name AS user_last_name
FROM matviews.ss_cube
?
WHERE ss_cube.academic_year = 2017
  AND current_date BETWEEN entry_date AND leave_date
LIMIT 1000

Now lets JOIN only the tables that we need. And don’t forget our timeout and application name.

SET application_name TO 'zrankin via pgAdmin, experimenting with queries';
SET statement_timeout TO '1000';

SELECT
  students.student_id,
  students.first_name AS student_first_name,
  students.last_name AS student_last_name,
  users.user_id,
  users.first_name AS user_first_name,
  users.last_name AS user_last_name
FROM matviews.ss_cube
JOIN students ON students.student_id = ss_cube.student_id
JOIN users ON users.user_id = ss_cube.user_id
WHERE ss_cube.academic_year = 2017
  AND current_date BETWEEN entry_date AND leave_date
ORDER BY students.last_name,
  students.first_name,
  users.last_name,
  users.first_name
LIMIT 1000