Welcome

Jorge Cimentada

Welcome to the world of Data Harvesting

How about you?

We’ll learn to scrape data from the internet.

We’ll talk to APIs

Tutoring on Tuesdays, 18:00 - 20:00. Request a session by email and wait for a confirmation by email. Tutoring is online over video call.
My email: cimentadaj@gmail.com
Classes on Thursdays
Class from 18:00 to 19:30
15-minute break
Class from 19:45 to 20:45

Scoring for the class:

Webscraping – First three classes (Jan 30th / Feb 6th / Feb 13th)

APIs - Next three classes (Feb 20th / Feb 27th / March 6th)

Automating Data Harvesting (Final class) - March 13th (ONLINE CLASS)

Presentation of projects - March 27th / 16:30-19:15

Final project spreadsheet is here.

30 students / 15 groups
Try to find your partner as soon as possible – deadline 13 February (third class)
Final project ideas submission – deadline 27th Feb (fifth class)
2 weeks of work on final project
Final project submission – deadline March 13th (seventh class)
Final project presentation – March 27th / 16:30-19:15
Every team will have 10 minutes to present.

Deliverable: a GitHub repository, private or public.
A clear README on how to reproduce the scraper/API program.
The key is to make it reproducible: I should be able to clone the repository and run whatever is needed to reproduce the scraper.
Document what the output is, where it is saved and what each script in the program does.

Aim for a scraping/API project of medium-to-hard difficulty.
- Scrape several sources of information
- Same website or combining several websites
- Meaningful dataset / Something that might help you on another class
- Remember that most of the mark comes from this project.
API projects: do not commit tokens to your repository. Provide clear instructions on where to place tokens so the project stays reproducible.
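
One common way to keep tokens out of a repository is to read them from an environment variable at run time and to name that variable in the README. The sketch below is only an illustration: the course language is not fixed by these slides, Python is used here for convenience, and both the variable name `MY_API_TOKEN` and the helper `load_api_token` are hypothetical.

```python
import os


def load_api_token(var_name: str = "MY_API_TOKEN") -> str:
    """Return the API token stored in the environment variable `var_name`.

    `MY_API_TOKEN` is a hypothetical name; use whatever your README documents.
    """
    # Read the token from the environment instead of hard-coding it in a script.
    token = os.environ.get(var_name, "").strip()
    if not token:
        # Fail early with a message that tells the user what to do.
        raise RuntimeError(
            f"Environment variable {var_name} is not set. "
            "See the README for where to place your token."
        )
    return token
```

Whoever clones the repository then sets the variable on their own machine (for example `export MY_API_TOKEN=...` in a shell) before running the scripts, so the token itself never appears in version control.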

Project ideas should be discussed with me and approved before the deadline.

Emails can be sent to cimentadaj@gmail.com

Examples from previous classes:

Contribute to the book!