Activity: Strings and regular expressions II

Instructions:

Episode titles

The following code loads a data frame containing episode information from Season 11 of the British TV show Taskmaster:

library(tidyverse)
episode_info <- read_csv("https://sta279-f25.github.io/class_activities/taskmaster_episodes.csv")

The resulting data frame contains columns for the task, task description, episode, contestant, and score.

Examining the episode column, we see that the entries are strings that look like

"Episode 1: It's not your fault. (18 March 2021)"

That is, the string contains three different pieces of information: the episode number, the episode title, and the air date. In this activity, we will pull out each of these pieces of information.

Questions

  1. Extract just the episode numbers from the episode column.

  2. Now we want to extract the episode title for each entry. Use positive lookaheads and lookbehinds to extract the episode titles from the episode column. Hint: Parentheses ( ) are special characters in regular expressions. To match a literal parenthesis, you will need to escape it: in an R string, write \\( and \\).

  3. Finally, use positive lookaheads and lookbehinds to extract the episode air dates from the episode column.

  4. Using your answers to the previous questions, modify the episode_info dataset so that the episode column is split into three different columns: episode number, episode title, and episode air date. Here is some example output:

  episode_num                 title      air_date
1           1 It's not your fault.  18 March 2021
2           1 It's not your fault.  18 March 2021
3           1 It's not your fault.  18 March 2021
4           1 It's not your fault.  18 March 2021
5           1 It's not your fault.  18 March 2021
6           1 It's not your fault.  18 March 2021
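
As a warm-up for the techniques in questions 1 to 4, here is a minimal sketch on toy data. The `rooms` data frame and its column names are hypothetical and have nothing to do with the Taskmaster file; the point is only to show a digit pattern, escaped parentheses inside a lookbehind and lookahead, and `mutate()` creating several new columns at once.

```r
library(tidyverse)

# Hypothetical toy data (an assumption for illustration, not the activity data)
rooms <- tibble(room = c("Room 12 (capacity 40)", "Room 7 (capacity 15)"))

rooms_split <- rooms |>
  mutate(
    # \\d+ matches the first run of one or more digits in each string
    room_num = str_extract(room, "\\d+"),
    # (?<=\\() requires a literal "(" just before the match;
    # (?=\\)) requires a literal ")" just after it
    note = str_extract(room, "(?<=\\().+(?=\\))")
  ) |>
  # drop the original combined column
  select(-room)

rooms_split
```

The lookbehind and lookahead check the surrounding characters without including them in the match, which is why the parentheses themselves do not appear in `note`.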

Phone numbers

Below is a vector containing 10 phone numbers:

phone_numbers <- c("(336) 703-2910",
                   "(336) 703-2665",
                   "(336) 703-2920",
                   "(336) 703-2930",
                   "(336) 703-2940",
                   "(336) 703-2950",
                   "(336) 703-2960",
                   "(336) 703-2970",
                   "(336) 703-2980",
                   "(336) 703-2990")

  5. Use string functions and regular expressions to convert these phone numbers to the following format: 336-703-2910

For question 5, try doing this both with and without back references. Helpful string functions for this question include str_remove_all, str_replace, and str_replace_all.
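
If back references are unfamiliar, the following sketch may help. It uses a hypothetical vector of dates rather than the phone numbers, so it illustrates the mechanics without answering the question: each parenthesized group in the pattern is captured, and `\\1`, `\\2`, `\\3` in the replacement refer to those groups in order.

```r
library(stringr)

# Hypothetical toy data (an assumption for illustration, not the activity data)
dates <- c("2021-03-18", "2021-03-25")

# With back references: capture year, month, and day, then reorder them
dates_swapped <- str_replace(dates, "(\\d{4})-(\\d{2})-(\\d{2})", "\\3/\\2/\\1")

# Without back references: simply remove every hyphen
dates_compact <- str_remove_all(dates, "-")

dates_swapped
dates_compact
```

Note that `str_replace` changes only the first match in each string, while `str_replace_all` and `str_remove_all` act on every match.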