SHOW TABLES;| Tables_in_airlines |
|---|
| airports |
| carriers |
| flights |
| flights_summary |
| planes |
The airlines SQL database contains information on 48 million flights from the Bureau of Transportation Statistics (you have worked with a small subset of this data in the nycflights13 package). Information from the database can be obtained through SQL queries. For example, the flights contains the following tables:
SHOW TABLES;| Tables_in_airlines |
|---|
| airports |
| carriers |
| flights |
| flights_summary |
| planes |
Make sure the mdsr and DBI packages are installed, then run the following code in R:
library(tidyverse)
library(mdsr)
library(DBI)
db <- dbConnect_scidb("airlines")Let’s try a quick query to see that it works:
query <- "
SELECT name FROM carriers LIMIT 0,5;
"
dbGetQuery(db, query) name
1 Titan Airways
2 Tradewind Aviation
3 Comlux Aviation, AG
4 Master Top Linhas Aereas Ltd.
5 Flair Airlines Ltd.
Here, we are querying SQL through R. We store the query in a string, and then use the dbGetQuery function to send that query to the database and return the results.
tailnum and year columns from the planes table. For safety, we are only going to return the first 5 rows.query <- "
SELECT ... FROM ... LIMIT 0,5;
"
dbGetQuery(db, query)Solution:
query <- "
SELECT tailnum, year FROM planes LIMIT 0,5;
"
dbGetQuery(db, query) tailnum year
1 N10156 2004
2 N102UW 1998
3 N103US 1999
4 N104UW 1999
5 N10575 2002
MIN() function, find the oldest date of manufacture (year) in the planes table.query <- "
SELECT ... FROM ... LIMIT 0,5;
"
dbGetQuery(db, query)Solution:
query <- "
SELECT tailnum, MIN(year) FROM planes LIMIT 0,5;
"
dbGetQuery(db, query) tailnum MIN(year)
1 N10156 1956
WHERE, find the tailnum of the oldest plane in the planes table.query <- "
SELECT ...
FROM ...
WHERE ...
LIMIT 0,5;
"
dbGetQuery(db, query)Solution:
query <- "
SELECT tailnum, year
FROM planes
WHERE year = 1956
LIMIT 0,5;
"
dbGetQuery(db, query) tailnum year
1 N381AA 1956
Questions 2 and 3 required two separate queries: first to find the oldest plane, then to find the tail number. We can be more efficient by using a subquery to choose only the years equal to the smallest year
query <- "
SELECT tailnum, year
FROM planes
WHERE year = (SELECT MIN(year) FROM planes)
LIMIT 0,5;
"
dbGetQuery(db, query) tailnum year
1 N381AA 1956
SUM(1) or COUNT(...) to count rows.Solution:
query <- "
SELECT year, SUM(1) AS N
FROM planes
WHERE year = (SELECT MAX(year) FROM planes)
LIMIT 0,10;
"
dbGetQuery(db, query) year N
1 2013 92
query <- "
SELECT year, COUNT(year) AS N
FROM planes
WHERE year = (SELECT MAX(year) FROM planes)
LIMIT 0,10;
"
dbGetQuery(db, query) year N
1 2013 92
dbDisconnect(db)