Project
Due: Friday, December 5 at 11:59pm
Instructions:
- Go to Canvas -> Assignments -> Project. Open the GitHub Classroom assignment link
- Follow the instructions to accept the assignment and clone the repository to your local computer
- Commit and push your work regularly
- Finally, request feedback on your assignment on the “Feedback” pull request
Important: Make sure to include all requested files in your repository on GitHub to receive full credit.
Group project
This is a group project. You will work in groups of 2-3 to complete the assignment. You must inform me of your groups by Friday, November 14. If you need help finding a group, please let me know. I reserve the right to assign or re-assign group members as necessary.
Organizing a student research competition
I am currently one of the organizers for the Undergraduate Statistics Project Competition (USPROC), a national statistics competition in which students submit research projects from classes and independent studies. This organization work involves managing data about student submissions, communicating with faculty mentors, and coordinating judging. I use many of the tools we have learned in STA 279 to do this work.
In this project, you will play the role of a competition organizer from start to finish, from handling student submissions to assigning judges and finally determining a winner. I have modified the rules of the competition slightly for this project, but otherwise what you will be doing is very similar.
Submission rules
- One student cannot submit multiple projects
- The length of a project can be at most 5 pages
- There can be at most 5 authors of the project (the student who submits the project, plus at most 4 coauthors)
- Each project must be sponsored by a faculty member, who will attest that the student(s) did complete the project themselves and were undergraduates when they did the work
- Each faculty member can sponsor at most 5 projects for the competition. If more than 5 projects are submitted for a faculty member, then they will be asked to choose their top 5 projects for judging
Judging rules
- Judges must be university faculty, graduate students, or industry professionals
- Each judge can score at most 6 projects
- Every project must be scored by at least 5 judges
- Each project is scored on the following criteria. Each criterion is
worth 10 points:
- Introduction: how well the students introduce and motivate the problem
- Methods: are the chosen statistical methods sufficient and appropriate to address the research question?
- Interpretation: do the students use and interpret their statistical results correctly?
- Discussion: do the students discuss their results and conclusions in the context of the research problem?
- Discretionary: each judge can award additional discretionary points for projects they believe are particularly worthy of praise
- Judges’ scores will be combined to determine the 1st place, 2nd place, and 3rd place winners of the competition. It will be up to you to decide how to appropriately combine these scores
Your task
Provided data
In the GitHub repository, you are provided with 5 CSV files:
- faculty_info.csv: the name and email address of the faculty members sponsoring the student projects in this competition
- judge_info.csv: the name, judge ID, and role (faculty, graduate student, etc.) of everyone who volunteered to judge this competition
- judge_scores.csv: scores from the judges on their assigned projects
  - Note: part of your task will involve you assigning judges to projects. You will almost certainly make different assignments than what I did here. That is ok! You will just use the judge_scores.csv file to find the winners
- student_submissions.csv: information about the different student submissions to the competition
- verification_responses.csv: information provided by faculty sponsors when asked to verify their students’ submissions
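A minimal sketch for loading the provided files with readr (the paths assume the CSVs sit at the top level of your repository; adjust as needed):

```r
library(readr)

# Adjust paths to match your repository layout
faculty_info           <- read_csv("faculty_info.csv")
judge_info             <- read_csv("judge_info.csv")
judge_scores           <- read_csv("judge_scores.csv")
student_submissions    <- read_csv("student_submissions.csv")
verification_responses <- read_csv("verification_responses.csv")
```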
Part 1: Send verification emails
The email addresses included in the faculty_info.csv
file really work (they all go to me). Your first task is to email each
faculty member with information about the project(s) submitted by their
students; each sponsor is asked to (1) verify their students’ work and
tell us if any projects should be disqualified, and (2) choose their top
5 projects, if they have more than 5 projects submitted to the
competition.
Here is an example of what the email should look like:
Dear Professor Sebastian Vigil,
We are delighted to see that you have 3 student(s) who submitted to the competition.
For your convenience, here is a list of your students who have submitted to the competition:
| ID | Student | Coauthors |
|---|---|---|
| 2 | Cody Estudillo | Shaofan Roe-Miller, Dominic Oyebi, Brenda Collazo, Jesse Kuebler, Connie Jernigan |
| 14 | Kiana Blackmon | Mohamed Hines, Andrea Omar, Melana Axalan, Thaaqib al-Vohra, Ashley Makaiwi |
| 70 | Shawn Del Rosario | Derek Rhoads, Kamri Comfort, Cheyenne Cholas, Anna Moland |
Please note that in the case of multiple authors, the corresponding author is listed as Student and other authors are listed as Co-Authors.
We have two requests for you as the faculty sponsor.
Please verify that all of these are your students, that they (and any co-authors) are/were an undergraduate student when completing the work, and that you have approved their project for submission to the competition.
If you have more than 5 submissions in any of the categories, we ask you to provide a list of which projects you consider to be the top five submissions within each category to reduce the burden on our judges for course-wide submissions. Those top 5 will then be sent for judging with the potential to receive an award. While the remaining submissions will not receive an award, they will, along with all submissions, each receive an email from us once judging is complete, thanking them for their submission and encouraging them on their educational path.
Thank you very much for promoting the competition with your students!
Best, (Your name here)
Requirements:
- The student information for each faculty sponsor must be included as a table in the email, as shown in the example
- The emails must be generated and sent automatically, using R. You are not allowed to write the emails all separately by hand. Rather, you must write a script that fills in the email template with the information for each faculty member (their name, the number of projects submitted, and a table of the student information), then sends the emails
- The emails must be sent to the faculty email addresses in the faculty_info.csv file. If correctly sent to these email addresses, they will be forwarded to my inbox, so I will be able to check. Make sure to sign your names!
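One possible sketch (not a required design) uses glue to fill the template and blastula to send it; the column names (name, email), the sender address, and the credentials file name are all assumptions you would adapt to your own setup:

```r
library(glue)
library(blastula)

# Sketch: fill the template for one sponsor and send it.
# `student_table` is a data frame with columns ID, Student, Coauthors
# for that sponsor's submissions.
send_verification_email <- function(faculty_name, faculty_email, student_table) {
  body <- glue(
    "Dear Professor {faculty_name},\n\n",
    "We are delighted to see that you have {nrow(student_table)} student(s) ",
    "who submitted to the competition.\n\n",
    "{paste(knitr::kable(student_table, format = 'pipe'), collapse = '\n')}\n\n",
    "Best,\nYour Names Here"
  )
  smtp_send(
    compose_email(body = md(body)),
    from = "you@school.edu",                  # your address (assumption)
    to = faculty_email,
    subject = "USPROC submission verification",
    credentials = creds_file("email_creds")   # set up once with create_smtp_creds_file()
  )
}
```

You would then loop over the rows of faculty_info.csv, for example with purrr::pwalk() or a for loop, building each sponsor’s student table before calling the function.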
Faculty verification responses
The responses to your verification emails are contained in the
verification_responses.csv file. Each faculty member:
- Reports whether they have more than 5 projects entered in the competition
- If they do have more than 5 projects, they provide the IDs for their top 5 projects for judging (their other projects will not be judged)
- Reports the IDs for any of their projects which they think should be disqualified (e.g., because the students were not undergraduates at the time of the work)
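One way to sketch the filtering step, assuming the response file stores comma-separated ID lists in columns named disqualified_ids and top_five_ids (check the real headers first):

```r
library(dplyr)
library(tidyr)

# Turn a comma-separated ID column into a vector of integer project IDs
ids_from <- function(responses, col) {
  responses |>
    separate_longer_delim({{ col }}, delim = ",") |>
    pull({{ col }}) |>
    trimws() |>
    as.integer() |>
    na.omit()
}

disqualified <- ids_from(verification_responses, disqualified_ids)
admissible   <- filter(student_submissions, !ID %in% disqualified)
# Projects cut because a sponsor had more than 5 submissions would be
# removed similarly, using the top-five lists.
```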
Part 2: Assign judges
Now that you have received submission verification from the faculty
members, it is time to assign judges to the projects. Your task in this
part is to produce a table, which you will write to a CSV file called
judging_assignments.csv.
Requirements:
- Judging assignments should obey the judging and competition rules described above
- You may not waste judges’ time: do not send them inadmissible projects, and do not assign too many projects to any judge
- Your judging_assignments.csv file should contain one row for each judge assigned to the projects, with the following columns:
  - judge_name: the name of the judge
  - judge_id: the judge’s ID number, corresponding to the judge_id column in the judge_info.csv file
  - A column for each project the judge is assigned, containing the project ID number. Reading across the row for each judge will tell me which projects were assigned to that judge
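As one sketch among many valid designs: a round-robin that cycles through the judge IDs gives every admissible project five distinct judges while keeping loads within one project of each other. The objects admissible and judge_info and their column names are assumptions; check that 5 × (number of projects) ≤ 6 × (number of judges) before relying on this.

```r
library(dplyr)
library(tidyr)

# Round-robin sketch: each project gets `per_project` consecutive judges
# from a repeating cycle, so judges are distinct within a project as long
# as there are at least `per_project` judges.
assign_judges <- function(project_ids, judge_ids, per_project = 5, max_per_judge = 6) {
  long <- data.frame(
    project_id = rep(project_ids, each = per_project),
    judge_id   = rep(judge_ids, length.out = per_project * length(project_ids))
  )
  stopifnot(all(table(long$judge_id) <= max_per_judge))  # judging rule
  long
}

# Reshape to one row per judge for judging_assignments.csv
# (column names here are assumptions about the provided files)
assignments_wide <- assign_judges(admissible$ID, judge_info$judge_id) |>
  group_by(judge_id) |>
  mutate(slot = paste0("project_", row_number())) |>
  ungroup() |>
  pivot_wider(names_from = slot, values_from = project_id) |>
  left_join(select(judge_info, judge_id, judge_name), by = "judge_id") |>
  relocate(judge_name)

readr::write_csv(assignments_wide, "judging_assignments.csv")
```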
Judging scores
The judges submit their scores, and the resulting scores are
contained in the judge_scores.csv file. I know that the
judging assignments you made will not match up with the projects
assigned to each judge in judge_scores.csv. That is ok! For
Part 2, ignore the judge_scores.csv file. Then for Part 3
(finding winners), just use the judge_scores.csv file –
don’t worry about your own judging assignments after finishing Part
2.
Part 3: Winning projects
Now that we have the judging scores, we can go ahead and identify the winners! Using the scores, your task is to create a table containing the 1st, 2nd, and 3rd place winner information. Your table should have three rows (one for each winner), and the following columns:
- Award: whether the project won 1st, 2nd, or 3rd place
- ID: the ID of the project
- Student: the full name of the student who submitted the project
- Coauthors: the names of any co-authors on the project
- Faculty: the full name of the faculty sponsor for the project
- School: the school which the students attend
You will save this information in a CSV file called
winning_projects.csv.
Requirements:
- It is up to you to figure out how to combine judging scores to determine the winning projects. However, your choice must be statistically sound and defensible
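As one statistically defensible sketch (you still need to justify your own choice): total each judge’s criterion scores for a project, then rank projects by the mean total across judges. The column names below are assumptions about judge_scores.csv.

```r
library(dplyr)

# Sum the criterion scores within each judge's review, then average
# totals across the judges who scored each project
rankings <- judge_scores |>
  mutate(total = introduction + methods + interpretation +
                 discussion + discretionary) |>
  group_by(project_id) |>
  summarise(mean_total = mean(total), n_judges = n(), .groups = "drop") |>
  arrange(desc(mean_total))

winners <- rankings |>
  slice_head(n = 3) |>
  mutate(Award = c("1st", "2nd", "3rd"), .before = project_id)
```

Medians are a reasonable alternative to means if discretionary points make some judges’ totals much more variable than others; whatever you choose, defend it in your write-up.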
Other requirements
Code:
- All code needed to reproduce the work must be included in the repository and pushed to GitHub. This includes the code for emailing the faculty sponsors, assigning judges, and determining winners.
- Any additional files you create, such as judging_assignments.csv and winning_projects.csv, must also be included in the GitHub repository
- You may not modify any of the original CSV files I provide
- Your code should be organized and commented. Consider dividing different tasks between different files. Consider using appropriate helper functions when needed.
Contributions and sources: Include a
README.md in the repository which describes:
- The contributions for each group member
- Any outside resources used (provide citation for things like R package documentation, discussion forum posts that were helpful, etc.)
- Any use of generative AI. Your disclosure should state what program you used and how you used it, including links to the specific prompts you used, if possible. Properly citing the AI-generated content allows me to understand your process better and gives credit to the assistance received from these tools.
Checklist
- All code necessary to reproduce your work is included in the repository and pushed to GitHub
- judging_assignments.csv and winning_projects.csv files are included and pushed to GitHub
- README.md is included, describing each group member’s contributions, citing any outside sources, and thoroughly describing any use of AI
- Verification emails were sent to the email addresses in faculty_info.csv
- Code organization:
  - Proper use of .R and .qmd files
  - Code divided between multiple files if needed
  - Code is commented
  - Helper functions are used when appropriate
  - Code uses tools we have learned in class
  - Proper use of