Let’s start the journey of learning SQL queries by learning about CRUD Operations. Okay, tell me one thing. Let’s say there is a table in which we are storing information about students. What can we do in that table or its entries?
Correct. Primarily, on any entity stored in a table, there are 4 operations possible:
- Create (or insert a new entry)
- Read (fetching some entries)
- Update (updating information about an entry already stored)
- Delete (deleting an entry)
For learning CRUD operations, we will use Sakila database.
Today we are going to discuss about these operations in detail. Understand that read queries can get a lot more complex, involving aggregate functions, subqueries etc.
What is the Sakila Database
Let me give you all a brief idea about what the Sakila database represents so that it is easy to relate to the conversations that we shall have around this over the coming weeks. The Sakila database represents a digital video rental store, assuming an old movie rental store before Netflix etc came. It’s designed with functionality that would allow for all the operations of such a business, including transactions like renting films, managing inventory, and storing customer and staff information. For example, it has tables regarding films, actors, customers, staff, stores, payments etc.
Create
CREATE Query
First of all, what is SQL? SQL stands for Structured Query Language. It is a language used to interact with relational databases. It allows you to create tables, fetch data from them, update data, manage user permissions etc. Why “Structured Query” because it allows to query over data arranged in a structured way. Eg: In Relational databases, data is structured into tables.
A simple query to create a table in MySQL has:
- column names
- data type of column (integer, varchar, boolean, date, timestamp)
- properties of column (unique, not null, default)
Similarly, the table could also have properties (primary key, key)
CREATE TABLE students (
id INT AUTO_INCREMENT,
firstName VARCHAR(50) NOT NULL,
lastName VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
dateOfBirth DATE NOT NULL,
enrollmentDate TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
psp DECIMAL(3, 2) CHECK (psp BETWEEN 0.00 AND 100.00),
batchId INT,
isActive BOOLEAN DEFAULT TRUE,
PRIMARY KEY (id),
);
Here we are creating a table called students. Inside brackets, we mention the different columns that this table has. Along with each column, we mention the data type of that column. E.g.: firstName is of type VARCHAR(50). After the data type, we mention any constraints on that column. Eg: NOT NULL means that this column cannot be null. if we try to not put a value in this column, we will get an error. UNIQUE means that this column cannot have duplicate values. If we insert a new row in a table or update an existing row that leads to 2 rows having the same value of this column, the query will fail and we will get an error. DEFAULT specifies that if no value is provided for this column, it will take the given value. For example, for enrollmentDate, it will take the value of current_timestamp, which is the time when you are inserting the row. CHECK (PSP BETWEEN 0.00 AND 100.00) means that the value of this column should be between 0.00 and 100.00. If some other value is put, the query will fail.
INSERT Query
Now let’s start with the first set of operations for the day: The Create Operation. As the name suggests, this operation is used to create new entries in a table. Let’s say we want to add a new film to the database. How do we do that?
INSERT
statement in MySQL is used to insert new entries in a table. Let’s see how we can use it to insert a new film in the film table of the Sakila database.
INSERT INTO film (title, description, release_year, language_id, rental_duration, rental_rate, length, replacement_cost, rating, special_features)
VALUES ('The Dark Knight', 'Batman fights the Joker', 2008, 1, 3, 4.99, 152, 19.99, 'PG-13', 'Trailers'),
('The Dark Knight Rises', 'Batman fights Bane', 2012, 1, 3, 4.99, 165, 19.99, 'PG-13', 'Trailers'),
('The Dark Knight Returns', 'Batman fights Superman', 2016, 1, 3, 4.99, 152, 19.99, 'PG-13', 'Trailers');
Let’s dive through the syntax of the query. First, we have the INSERT INTO
clause, which is used to specify the table in which we want to insert the new entry. Then we have the column names in the brackets, which are the columns in which we want to insert the values. Then we have the VALUES
clause, which is used to specify the values that we want to insert in the columns. The values are specified in the same order as the columns are specified in the INSERT INTO
clause. So the first value in the VALUES
clause will be inserted in the first column specified in the INSERT INTO
clause, and so on.
Anyways, an example of a query without column names is as follows:
INSERT INTO film
VALUES (default, 'The Dark Knight', 'Batman fights the Joker', 2008, 1, NULL, 3, 4.99, 152, 19.99, 'PG-13', 'Trailers', default);
NULL is used to specify that the value of that column should be NULL, and default is used to specify that the value of that column should be the default value specified for that column. Example: film_id is an auto-increment column, so we don’t need to specify its value. So we can specify a default for that column, which will insert the next auto-increment value in that column.
Read
SELECT
statement is used to read data from a table. Let’s see how we can use it to read data via different queries on the film table of the Sakila database. A basic select query is as follows:
SELECT * FROM film;
Here we are selecting all the columns from the film table. The * is used to select all the columns. This query will give you the value of each column in each row of the film table. If we want to select only specific columns, then we can specify the column names instead of *. Example:
SELECT title, description, release_year FROM film;
Here we are selecting only the title, description and release_year columns from the film table. Note that the column names are separated by commas. Also, the column names are case-insensitive, so the title and TITLE are the same. Example following query would have also given the same result
SELECT TITLE, DESCRIPTION, RELEASE_YEAR FROM film;
Now, let’s learn some nuances about the SELECT
statement.
Selecting Distinct Values
Let’s say we want to select all the distinct values of the rating
column from the film
table. How do we do that? We can use the DISTINCT
keyword to select distinct values. Example:
SELECT DISTINCT rating FROM film;
This query will give you all the distinct values of the rating
column from the film
table. Note that the DISTINCT
keyword, as all other keywords in MySQL, is case-insensitive, so DISTINCT
and distinct
are the same.
We can also use the DISTINCT
keyword with multiple columns. Example:
SELECT DISTINCT rating, release_year FROM film;
This query will give you all the distinct values of the rating
and release_year
columns from the film
table.
Let’s try to understand the above query with a pseudo code. The pseudo-code for the above query would be as follows:
answer = []
for each row in film:
answer.append(row)
filtered_answer = []
for each row in answer:
filtered_answer.append(row['rating'], row['release_year'])
unique_answer = set(filtered_answer)
return unique_answer
So what you see is that the DISTINCT keyword on multiple columns gives you for all of the rows in the table, the distinct value of a pair of these columns.
Select a statement to print a constant value
Let’s say we want to print a constant value in the output. Eg: The first program that almost every programmer writes: “Hello World”. How do we do that? We can use the SELECT
statement to print a constant value. Example:
SELECT 'Hello World';
SELECT title, 'Hello World' FROM film;
That’s it. No form, nothing. Just the value. You can also combine it with other columns. Example:
Operations on Columns
Let’s say we want to select the title
and length
columns from the film
table. If you see, the value of length is currently in minutes, but we want to select the length in hours instead of minutes. How do we do that? We can use the SELECT
statement to perform operations on columns. Example:
SELECT title, length/60 FROM film;
Later in the course, we will learn about built-in functions in SQL as well. You can use those functions as well to perform operations on columns. Example:
SELECT title, ROUND(length/60) FROM film;
The ROUND function is used to round off a number to the nearest integer. So the above query will give you the title of the film, and the length of the film in hours, rounded off to the nearest integer.
Inserting Data from Another Table
BTW, select can also be used to insert data in a table. Let’s say we want to insert all the films from the film
table into the film_copy
table. We can combine the SELECT
and INSERT INTO
statements to do that. Example:
INSERT INTO film_copy (title, description, release_year, language_id, rental_duration, rental_rate, length, replacement_cost, rating, special_features)
SELECT title, description, release_year, language_id, rental_duration, rental_rate, length, replacement_cost, rating, special_features
FROM film;
Here we are using the SELECT
statement to select all the columns from the film
table, and then using the INSERT INTO
statement to insert the selected data into the film_copy
table. Note that the column names in the INSERT INTO
clause and the SELECT
clause are the same, and the values are inserted in the same order as the columns are specified in the INSERT INTO
clause. So the first value in the SELECT
clause will be inserted in the first column specified in the INSERT INTO
clause, and so on.
Till now, we have been doing basic read operations. SELECT
query with only a FROM
a clause is rarely sufficient. Rarely do we want to return all rows. Often we need to have some kind of filtering logic etc. for the rows that should be returned. Let’s learn how to do that.
WHERE Clause
Let’s say we want to select all the films from the film
table which has a rating of PG-13
. How do we do that? We can use the WHERE
clause to filter rows based on a condition. Example:
SELECT * FROM film WHERE rating = 'PG-13';
Here we are using the WHERE
clause to filter rows based on the condition that the value of the rating
column should be PG-13
. Note that the WHERE
clause is always used after the FROM
clause. In terms of pseudocode, you can think of where clause to work as follows:
answer = []
for each row in film:
if row.matches(conditions in where clause) # new line from above
answer.append(row)
filtered_answer = []
for each row in answer:
filtered_answer.append(row['rating'], row['release_year'])
unique_answer = set(filtered_answer) # assuming we also had DISTINCT
return unique_answer
If you seem where clause can be considered analgous to if
in a programming language. With if as well there are many other operators that are used, right. Can you name which operators do we often use in programming languages with if
?
AND, OR, NOT
Correct. We use things like and
, or
, !
in programming languages to combine multiple conditions. Similarly, we can use AND
, OR
, NOT
operators in SQL as well. Example: We want to get all the films from the film
table which has a rating of PG-13
and a release year of 2006
. We can use the AND
operator to combine multiple conditions.
SELECT * FROM film WHERE rating = 'PG-13' AND release_year = 2006;
Similarly, we can use the OR
operator to combine multiple conditions. Example: We want to get all the films from the film
table which has a rating of PG-13
or a release year of 2006
. We can use the OR
operator to combine multiple conditions.
SELECT * FROM film WHERE rating = 'PG-13' OR release_year = 2006;
Similarly, we can use the NOT
operator to negate a condition. Example: We want to get all the films from the film
table which does not have a rating of PG-13
. We can use the NOT
operator to negate the condition.
SELECT * FROM film WHERE NOT rating = 'PG-13';
A piece of advice on using these operators. If you are using multiple operators, it is always a good idea to use parentheses to make your query more readable. Otherwise, it can be difficult to understand the order in which the operators will be evaluated. Example:
SELECT * FROM film WHERE rating = 'PG-13' OR release_year = 2006 AND rental_rate = 0.99;
Here, it is not clear whether the AND
an operator will be evaluated first or the OR
operator. To make it clear, we can use parentheses. Example:
SELECT * FROM film WHERE rating = 'PG-13' OR (release_year = 2006 AND rental_rate = 0.99);
Till now, we have used only =
for doing comparisons. Like traditional programming languages, MySQL also supports other comparison operators like >
, <
, >=
, <=
, !=
etc. Just one special case, !=
can also be written as <>
in MySQL. Example:
SELECT * FROM film WHERE rating <> 'PG-13';
IN Operator
With comparison operators, we can only compare a column with a single value. What if we want to compare a column with multiple values? For example, we want to get all the films from the film
table which have a rating of PG-13
or R
. One way to do that can be to combine multiple consitions using OR
. A better way will be to use the IN
operator to compare a column with multiple values. Example:
SELECT * FROM film WHERE rating IN ('PG-13', 'R');
Okay, now let’s say we want to get those films that have ratings anything other than the above 2. Any guesses how we may do that?
Correct! We had earlier discussed NOT
. You can also use NOT
before IN
to negate the condition. Example:
SELECT * FROM film WHERE rating NOT IN ('PG-13', 'R');
Think of IN to be like any other operator. Just that it allows comparison with multiple values.
Hope you had a good break. Let’s continue with the session. In this second part of the session, we are going to start the discussion by discussing about another important keyword in SQL, BETWEEN
.
IS NULL Operator
Now we are almost at the end of the discussion about different operators. Do you all remember how we store empties, that is, no value for a particular column for a particular row? We store it as NULL
. Interestingly working with NULLs is a bit tricky. We cannot use the =
operator to compare a column with NULL
. Example:
SELECT * FROM film WHERE description = NULL;
The above query will not return any rows. Why? Because NULL
is not equal to NULL
. In fact, NULL
is not equal to anything. Nor is it not equal to anything. It is just NULL
.
Example:
SELECT NULL = NULL;
The above query will return NULL
. Similarly, 3 = NULL
, 3 <> NULL
, NULL <> NULL
will also return NULL
. So, how do we compare a column with NULL
? We use the IS NULL
operator. Example:
SELECT * FROM film WHERE description IS NULL;
Similarly, we can use the IS NOT NULL
operator to find all the rows where a particular column is not NULL
. Example:
SELECT * FROM film WHERE description IS NOT NULL;
In many assignments, you will find that you will have to use the IS NULL
and IS NOT NULL
operators. Without them you will miss out on rows that had NULL values in them and get the wrong answer. Example: find customers with id other than 2. If you use =
operator, you will miss out on the customer with id NULL
. Example:
SELECT * FROM customers WHERE id != 2;
The above query will not return the customer with id NULL
. So, you will get the wrong answer. Instead, you should use the IS NOT NULL
operator. Example:
SELECT * FROM customers WHERE id IS NOT NULL AND id != 2;
Update
Now let’s move to learn U of CRUD. Update and Delete are thankfully much simpler, so don’t worry, we will be able to breeze through it over the coming 20 minutes. As the name suggests, this is used to update rows in a table. The general syntax is as follows:
UPDATE table_name SET column_name = value WHERE conditions;
Example:
UPDATE film SET release_year = 2006 WHERE id = 1;
The above query will update the release_year
column of the row with id
1 in the film
table to 2006. You can also update multiple columns at once. Example:
UPDATE film SET release_year = 2006, rating = 'PG' WHERE id = 1;
Let’s talk about how the update works. It works as follows:
for each row in film:
if row.matches(conditions in where clause)
row['release_year'] = 2006
row['rating'] = 'PG'
So basically update query iterates through all the rows in the table and updates the rows that match the conditions in the where clause. So, if you have a table with 1000 rows and you run an update query without a where clause, then all the 1000 rows will be updated. So, be careful while running update queries. Example:
UPDATE film SET release_year = 2006;
Delete
Finally, we are at the end of CRUD. Let’s talk about Delete operations. The general syntax is as follows:
DELETE FROM table_name WHERE conditions;
Example:
DELETE FROM film WHERE id = 1;
The above query will delete the row with id
1 from the film
table. If you don’t specify a where clause, then all the rows from the table will be deleted. Example:
DELETE FROM film;
Let’s talk about how delete works as well in terms of code.
for each row in film:
if row.matches(conditions in where clause)
delete row
Delete vs Truncate vs Drop
There are two more commands which are used to delete rows from a table. They are TRUNCATE
and DROP
. Let’s discuss them one by one.
Truncate
The command looks as follows:
TRUNCATE film;
The above query will delete all the rows from the film
table. TRUNCATE command internally works by removing the complete table and then recreating it. So, it is much faster than DELETE. But it has a disadvantage. It cannot be rolled back. We will learn more about rollbacks in the class on Transactions (In short, rollbacks can only happen for incomplete transactions – not committed yet – to be discussed in future classes). But at a high level, this is because as the complete table is deleted as an intermediate step, no log is maintained as to what all rows were deleted, and thus is not easy to revert. So, if you run a TRUNCATE query, then you cannot undo it.
Note: It also resets the primary key ID. For example, if the highest ID in the table before truncating was 10, then the next row inserted after truncating will have an ID of 1.
Drop
The command looks as follows:
DROP TABLE film;
The above query will delete the film
table. The difference between DELETE
and DROP
is that DELETE
is used to delete rows from a table and DROP
is used to delete the entire table. So, if you run a DROP
query, then the entire table will be deleted. All the rows and the table structure will be deleted. So, be careful while running a DROP
query. Nothing will be left of the table after running a DROP
query. You will have to recreate the table from scratch.
Note that, DELETE:
- Removes specified rows one-by-one from a table (may delete all rows if no condition is present in a query but keeps table structure intact).
- It is slower than TRUNCATE.
- Doesn’t reset the key.
- It can be rolled back.
TRUNCATE
- Removes the complete table and then recreates it.
- Faster than DELETE.
- Resets the key.
- It can not be rolled back because the complete table is deleted as an intermediate step.
DROP
- Removes the complete table and the table structure as well.
- It can not be rolled back.
LIKE Operator
LIKE operator is one of the most important and frequently used operators in SQL. Whenever there is a column storing strings, there comes a requirement to do some kind of pattern matching. For example, assume Scaler’s database where we have a batches
table with a column called name
. Let’s say we want to get the list of Academy
batches and the rule is that an Academy batch shall have Academy
somewhere within the name. How do we find those? We can use the LIKE
operator for this purpose. Example:
SELECT * FROM batches WHERE name LIKE '%Academy%';
Similarly, let’s say in our Sakila database, we want to get all the films that have LOVE
their title. We can use the LIKE
operator. Example:
SELECT * FROM film WHERE title LIKE '%LOVE%';
Let’s talk about how the LIKE
operator works. The LIKE
operator works with the help of 2 wildcards in our queries, %
and _
. The %
wildcard matches any number of characters (>= 0 occurrences of any set of characters). The _
wildcard matches exactly one character (any character). Example:
- LIKE ‘cat%’ will match “cat”, “caterpillar”, “category”, etc. but not “wildcat” or “dog”.
- LIKE ‘%cat’ will match “cat”, “wildcat”, “domesticcat”, etc. but not “cattle” or “dog”.
- LIKE ‘%cat%’ will match “cat”, “wildcat”, “cattle”, “domesticcat”, “caterpillar”, “category”, etc. but not “dog” or “bat”.
- LIKE ‘_at’ will match “cat”, “bat”, “hat”, etc. but not “wildcat” or “domesticcat”.
- LIKE ‘c_t’ will match “cat”, “cot”, “cut”, etc. but not “chat” or “domesticcat”.
- LIKE ‘c%t’ will match “cat”, “chart”, “connect”, “cult”, etc. but not “wildcat”, “domesticcat”, “caterpillar”, “category”.
COUNT
The count function takes the values from a particular column and returns the number of values in that set. Umm, but don’t you think it will be exactly the same as the number of rows in the table? Nope. Not true. Aggregate functions only take not null values into account. So, if there are any null values in the column, they will not be counted.
Example: Let’s take a students table with data like follows:
If you will try to run COUNT and give it the values in batch_id column, it will return 3. Because there are 3 not null values in the column. This is different from number of rows in the students table.
Let’s see how do you use this operation in SQL.
SELECT COUNT(batch_id) FROM students;
To understand how aggregate functions work via a pseudocode, let’s see how SQL query optimizer may execute them.
table = []
count = 0
for row in table:
if row[batch_id] is not null:
count += 1
print(count)
Few things to note here: While printing, do we have access to the values of row? Nope. We only have access to the count variable. So, we can only print the count. Extrapolating this point, when you use aggregate functions, you can only print the result of the aggregate function. You cannot print the values of the rows.
SELECT COUNT(batch_id), batch_id FROM students;
This will be an invalid query. Because you are trying to print the values of batch_id
column as well as the count of batch_id
column. But, you can only print the count of batch_id
column.
LIMIT Clause
And now let’s discuss the last clause for the day. The LIMIT clause allows us to limit the number of rows returned by a query. Example:
SELECT * FROM film LIMIT 10;
The above query will return only 10 rows from the film
table. If you want to return 10 rows starting from the 11th row, you can use the OFFSET
keyword. Example:
SELECT * FROM film LIMIT 10 OFFSET 10;
The above query will return 10 rows starting from the 11th row from the film
table. Note that in MySQL, you cannot use the OFFSET
keyword without the LIMIT
keyword. Example:
SELECT * FROM film OFFSET 10;
throws an error.
LIMIT clause is applied at the end. Just before printing the results. Taking the example of pseudocode, it works as follows:
answer = []
for each row in film:
if row.matches(conditions in where clause) # new line from above
answer.append(row)
answer.sort(column_names in order by clause)
filtered_answer = []
for each row in answer:
filtered_answer.append(row['rating'], row['release_year'])
return filtered_answer[start_of_limit: end_of_limit]
Thus, if your query contains ORDER BY clause, then LIMIT clause will be applied after the ORDER BY clause. Example:
SELECT * FROM film ORDER BY title LIMIT 10;
The above query will return 10 rows from the film
table in ascending order of the title
column.
ORDER BY Clause
Now let’s discuss another important clause. ORDER BY clause allows to return values in a sorted order. Example:
SELECT * FROM film ORDER BY title;
The above query will return all the rows from the film table in ascending order of the title column. If you want to return the rows in descending order, you can use the DESC keyword. Example:
SELECT * FROM film ORDER BY title DESC;
You can also sort by multiple columns. Example:
SELECT * FROM film ORDER BY title, release_year;
The above query will return all the rows from the film
table in ascending order of the title
column and then in ascending order of the release_year
column. Consider the second column as tie breaker. If 2 rows have the same value of title, the release year will be used to break tie between them. Example:
SELECT * FROM film ORDER BY title DESC, release_year DESC;
Above query will return all the rows from the film
table in descending order of the title
column and if tie on title
, in descending order of the release_year
column.
By the way, you can ORDER BY
on a column which is not present in the SELECT
clause. Example:
SELECT title FROM film ORDER BY release_year;
Let’s also build the analogy of this with a pseudocode.
answer = []
for each row in film:
if row.matches(conditions in where clause) # new line from above
answer.append(row)
answer.sort(column_names in order by clause)
filtered_answer = []
for each row in answer:
filtered_answer.append(row['rating'], row['release_year'])
return filtered_answer
If you see, the ORDER BY
clause is applied after the WHERE
clause. So, first the rows are filtered based on the WHERE
clause and then they are sorted based on the ORDER BY
clause. And only after that are the columns that have to be printed taken out. And that’s why you can sort based on columns not even in the SELECT
clause.