Project
- Deadline: January 10th, 2024 at 18:00
- Presentation: January 12th, 2024
Overview
For your individual project you will create your own blog post. The post should illustrate an issue using a unique data set that you collected yourself, and it should demonstrate the tools taught in the course. The deadline for handing in the project is January 10th, 2024 at 18:00. Hand-in is done in the usual fashion: push the project to your GitHub repository under a subdirectory called project and raise an issue. You may push to your repository earlier if you want; we will only assess the version of your code at the time of the deadline. On January 12th, 2024, you will present your project in front of the class. The presentation should be no longer than 5 minutes. Attendance is compulsory for the entire session in which you present.
Inspiration
The following posts give an idea of what I expect from the project:
- The Olympic Medal Table Visualized Gapminder Style
- On a First Name Basis with Statistics Sweden
- Baby Weight Shiny app
- Are #python users more likely to get into Slytherin?
Data sources
During the course, you were introduced to many possible data sources. Additional public web-based data sources include, for example, the Stockholm Open Data Portal or an API to query data from Sweden’s national data portal. Another example of a contemporary website with relevant data is the COVID-19 data page of the Public Health Agency of Sweden (Folkhälsomyndigheten).
Details
1. Find data out in the wild: this can be an open-access SQL database, API data, web scraping data, personal surveillance data (e.g., running watch, log files), data collected as part of a hobby, … The raw data should not be of a sensitive nature (data protection!) and should be accessible and uploadable to GitHub without violating any copyright or access rights.
2. Determine a good story you can tell based on the data, e.g., a specific hypothesis you want to investigate, a cool visualization of numbers, a data journalism type of story, an educational post, or something that might interest your fellow students. Your post can be about a serious matter, but it can also be about a not so serious one. However, decide before writing who your intended readership is (general public, fellow B.Sc. students, R users, ornithologists, …).
3. Read the data, wrangle the data, visualize the data, make simple statistical summaries, and interpret the results in line with the story you chose in step 2.
4. Write a story worth reading for the selected target audience; it can be written in Swedish or English. The story needs to be written so that it can be reproduced, e.g., as a Notebook, and should not be excessively long. As a rough guideline: 1000 to 1500 words of text, no more than 7 figures (tables count as figures), and no more than 7 visible code chunks (if you decide to have any at all).
5. Create a 5-minute presentation about your work to present to your fellow students on 2024-01-12. The main aim of your presentation is to convince your fellow students that they should read your blog post.
The biggest challenge of the project will be to be realistic about what you can achieve within the given deadline. Once you have an estimate of how much that could be, take 50% of that and you are still likely to be busy. Make sure you have a working project early on and then scale up iteratively, so you’re always ready. Start early.
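As a starting point for a small working project, the read, wrangle, and summarize steps described above might look roughly like the sketch below. It is only an illustration under stated assumptions: the running-watch data, the column names, and the helper functions `read_rows`, `wrangle`, and `summarize` are all hypothetical, and only the Python standard library is used so that it runs anywhere. A real project would read from a file, API, or scrape, and would add visualizations.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical raw data, invented for illustration only.
RAW_CSV = """date,distance_km,duration_min
2023-11-01,5.0,27.5
2023-11-03,10.2,58.0
2023-11-05,,31.0
2023-12-02,7.5,40.0
2023-12-09,5.1,26.9
"""


def read_rows(text):
    """Read CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def wrangle(rows):
    """Drop rows with a missing distance and derive month and pace (min/km)."""
    clean = []
    for row in rows:
        if not row["distance_km"]:
            continue  # skip incomplete records
        distance = float(row["distance_km"])
        duration = float(row["duration_min"])
        clean.append({
            "month": row["date"][:7],  # "YYYY-MM"
            "distance_km": distance,
            "pace_min_per_km": duration / distance,
        })
    return clean


def summarize(clean):
    """Number of runs, total distance, and mean pace per month."""
    by_month = defaultdict(list)
    for rec in clean:
        by_month[rec["month"]].append(rec)
    return {
        month: {
            "runs": len(recs),
            "total_km": sum(r["distance_km"] for r in recs),
            "mean_pace": statistics.mean(r["pace_min_per_km"] for r in recs),
        }
        for month, recs in by_month.items()
    }


summary = summarize(wrangle(read_rows(RAW_CSV)))
for month, stats in sorted(summary.items()):
    print(month, stats["runs"], round(stats["total_km"], 1), round(stats["mean_pace"], 2))
```

Keeping each step in its own small function makes it easier to scale up iteratively: the data source or the summary can be swapped out later without touching the rest.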
Technical Details
You will hand in the project as you have done with the homeworks: via GitHub, by pushing it to your repo and raising an issue. The project should be under a subdirectory called project. For the sake of reproducibility, include the code you have written that generates the report, either as a Notebook (.ipynb) or as an R Markdown file (.Rmd).
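If you hand in a Notebook, a rough self-check against the length guideline (1000 to 1500 words, at most 7 code chunks) could look like the sketch below. It assumes the notebook is stored in the common nbformat 4 JSON layout, where each cell has a `cell_type` and a `source`; the function name `notebook_stats` and the example notebook are hypothetical. The word count is approximate because Markdown markup is counted as words, and the sketch cannot tell visible code chunks from hidden ones, so it simply counts all non-empty code cells.

```python
import json


def notebook_stats(nb):
    """Count words in Markdown cells and the number of non-empty code cells.

    `nb` is a parsed .ipynb document (a dict), assumed to follow nbformat 4.
    """
    words = 0
    code_cells = 0
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines
            source = "".join(source)
        if cell.get("cell_type") == "markdown":
            words += len(source.split())
        elif cell.get("cell_type") == "code" and source.strip():
            code_cells += 1
    return {"words": words, "code_cells": code_cells}


# Tiny in-memory notebook, invented for illustration only.
EXAMPLE_IPYNB = """
{
  "cells": [
    {"cell_type": "markdown", "source": ["# My post\\n", "Some text here."]},
    {"cell_type": "code", "source": "print(1)"},
    {"cell_type": "code", "source": ""}
  ],
  "nbformat": 4
}
"""

stats = notebook_stats(json.loads(EXAMPLE_IPYNB))
print(stats)
```

For a real notebook, the JSON would be loaded from the file with `json.load` instead of from the example string.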
Grading
The project will be graded based on the following five dimensions, which have equal weight:
- Technical difficulty of the project, i.e., how hard it was to import the data, how much time was needed to wrangle it, how advanced the methods used for the statistical summaries are, and the use of additional technical shenanigans.
- Coding style and reproducibility of the submitted file (.ipynb or .Rmd). This includes an assessment of whether the code is readable.
- Quality of the visualisations and their interpretation.
- Readability of the project report (is the story concise, are the aims clear, is the readership happy, decent spelling & grammar). In particular: less is sometimes more!
- Quality of the presentation (slides, snore factor, staying in time, …).