Activity: Strings and regular expressions II
Instructions:
- Work with a neighbor to answer the following questions
- To get started, download the class activity template file
- When you are finished, render the file to HTML and submit the HTML file to Canvas (let me know if you encounter any problems)
Episode titles
The following code loads a data frame containing episode information from Season 11 of the British TV show Taskmaster:
library(tidyverse)
episode_info <- read_csv("https://sta279-f25.github.io/class_activities/taskmaster_episodes.csv")
The resulting data frame contains columns for the task, task description, episode, contestant, and score.
Examining the episode column, we see that the entries are strings that look like
"Episode 1: It's not your fault. (18 March 2021)"
That is, the string contains three different pieces of information: the episode number, the episode title, and the air date. In this activity, we will pull out each of these pieces of information.
Questions
1. Extract just the episode numbers from the episode column.
2. Now we want to extract the episode title for each entry. Use positive lookaheads and lookbehinds to extract the episode titles from the episode column. Hint: Parentheses ( ) are special characters in regular expressions. To match a literal parenthesis, you will need to use escape characters, that is, \\( for an opening parenthesis and \\) for a closing one.
3. Finally, use positive lookaheads and lookbehinds to extract the episode air dates from the episode column.
4. Using your answers to the previous questions, modify the episode_info dataset so that the episode column is split into three different columns: episode number, episode title, and episode air date. Here is some example output:
  episode_num                title      air_date
1           1 It's not your fault. 18 March 2021
2           1 It's not your fault. 18 March 2021
3           1 It's not your fault. 18 March 2021
4           1 It's not your fault. 18 March 2021
5           1 It's not your fault. 18 March 2021
6           1 It's not your fault. 18 March 2021
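As a warm-up for the questions above, here is a minimal sketch of lookbehinds, lookaheads, and escaped parentheses applied inside mutate(). It deliberately uses a hypothetical toy data frame (chapters, with made-up chapter strings) rather than the Taskmaster data, so the column names and patterns are illustrative assumptions and will need adapting for the episode column.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data (NOT the Taskmaster data), laid out as
# "label number: title (detail)"
chapters <- tibble(
  chapter = c("Chapter 2: Getting started. (pp. 10-24)",
              "Chapter 10: Wrapping up. (pp. 200-215)")
)

chapters_split <- chapters %>%
  mutate(
    # Digits that are preceded by "Chapter " (positive lookbehind)
    chapter_num = str_extract(chapter, "(?<=Chapter )\\d+"),
    # Text preceded by ": " and followed by " (" ;
    # \\( matches a literal opening parenthesis
    title = str_extract(chapter, "(?<=: ).+(?= \\()"),
    # Text between the literal parentheses
    pages = str_extract(chapter, "(?<=\\().+(?=\\))")
  ) %>%
  # Drop the original combined column
  select(-chapter)

chapters_split
```

The lookarounds only check what surrounds the match, so the delimiters (the colon, the spaces, and the parentheses) are not included in the extracted text. Note that str_extract returns character vectors, so chapter_num is a string here; wrap it in as.integer() if a numeric column is wanted.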
Phone numbers
Below is a vector containing 10 phone numbers:
phone_numbers <- c("(336) 703-2910",
                   "(336) 703-2665",
                   "(336) 703-2920",
                   "(336) 703-2930",
                   "(336) 703-2940",
                   "(336) 703-2950",
                   "(336) 703-2960",
                   "(336) 703-2970",
                   "(336) 703-2980",
                   "(336) 703-2990")
5. Use string functions and regular expressions to convert these phone numbers to the following format:
336-703-2910
For question 5, try doing this both with and without back references. Helpful string functions for this question include str_remove_all, str_replace, and str_replace_all.
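To illustrate the two approaches, here is a small sketch on a hypothetical vector of product codes (codes is made up for this example and is not part of the activity). The first version captures the pieces with groups and reassembles them with back references; the second reaches the same result by removing and replacing individual characters.

```r
library(stringr)

# Hypothetical toy data (NOT the activity's phone numbers)
codes <- c("[AB] 123_456", "[CD] 789_012")

# With back references: capture the three pieces in groups, then
# rebuild the string; \\1, \\2, \\3 refer to the captured groups
with_backrefs <- str_replace(codes,
                             "\\[(\\w+)\\] (\\d+)_(\\d+)",
                             "\\1-\\2-\\3")

# Without back references: drop the opening bracket, then turn
# "] " and "_" into hyphens
no_bracket <- str_remove_all(codes, "\\[")
without_backrefs <- str_replace_all(no_bracket, "\\] |_", "-")

with_backrefs
without_backrefs
```

Both versions give the same strings for this toy input. The back reference version describes the whole layout in one pattern, while the second relies on each delimiter being safe to replace wherever it appears.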