Course Details

  • Number: PPHA 30530, SOCI 40214
  • Lecture: Tuesdays, Thursdays at 1:30-2:50pm
  • Room: PBPL-224
  • Text: There is no required book for the course, but there will be optional and required online readings.
  • Questions/non-git assignment submission: (list for instructors/TA)
  • Class Discussion: Piazza
Jennifer Helsby, postdoctoral fellow, Center for Data Science and Public Policy
  • Email:
  • Office: 219A, Computation Institute, 5735 S Ellis Ave
  • Office Hours: Fridays 11am
Eric Potash, postdoctoral fellow, Center for Data Science and Public Policy
  • Email:
  • Office: 211, Computation Institute, 5735 S Ellis Ave
  • Office Hours: Mondays 5pm
Bill Harper, teaching assistant

Assignments and Grading

Grades for the course will be based on assignments and a final group project.

Software Setup

A virtual machine (VM) will be provided that has all the software required for the course installed.
  • Download: Link is available on Chalk (under course announcements).
  • Setting up the VM: Download VirtualBox and install it on your local machine. To load the class VM in VirtualBox, open the File menu and click Import Appliance.
  • Contents: The class VM is a Debian Linux machine with everything you will need to participate in the course, including nano, git, QGIS, NLTK, pgadmin, IPython, scikit-learn, numpy, scipy, pandas, matplotlib, seaborn, statsmodels, sqlalchemy, flask, beautifulsoup and urllib3. Students are free to use their own machines if they have the software set up; otherwise they should use the virtual machine.
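
For students who plan to use their own machine rather than the VM, a quick check of the Python packages listed above might look like the following. This is a minimal sketch, not an official course script: the import names in COURSE_PACKAGES are assumptions about how each package is imported (for example, scikit-learn as "sklearn" and beautifulsoup as "bs4", assuming BeautifulSoup 4), and the helper name find_missing is hypothetical. It does not check the non-Python tools (nano, git, QGIS, pgadmin).

```python
import importlib.util

# Display name -> import name. The import names are assumptions
# (e.g. scikit-learn imports as "sklearn", BeautifulSoup 4 as "bs4").
COURSE_PACKAGES = {
    "NLTK": "nltk",
    "IPython": "IPython",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "statsmodels": "statsmodels",
    "sqlalchemy": "sqlalchemy",
    "flask": "flask",
    "beautifulsoup": "bs4",
    "urllib3": "urllib3",
}


def find_missing(packages):
    """Return the display names of packages Python cannot locate."""
    missing = []
    for display_name, import_name in packages.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(import_name) is None:
            missing.append(display_name)
    return missing


if __name__ == "__main__":
    missing = find_missing(COURSE_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed Python packages were found.")
```

If the script reports missing packages, installing them or falling back to the class VM are both reasonable options.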

Course Schedule

Note that the schedule will be updated throughout the quarter. Assignments will be posted on the specified dates, and readings will be finalized one week before they are due. Please check this site regularly; we will notify you of any major changes.

January 5 Course introduction and demos

January 7 Command line, intro to Python (variables, types, data structures), basic plotting with matplotlib

Required Reading:

January 12 Python control flow

January 14 Data and functions

January 19 Effective software development skills, version control, git

Required Reading:
Optional Reading:

January 21 DataFrame Operations

January 26 Working with different files (JSON, xls, etc.), writing modules/packages, debugging

Required Reading:

January 28 Census and survey data; introduction to relational databases

February 2 Ethics, data security and privacy

Required Reading:

  • Assignment 2 due

February 4 SQL continued

February 9 Web scraping

Recommended Reading:

February 11 More SQL; Web APIs

February 16 Servers and Deployment

February 18 Mapping, GIS

Recommended Reading:
  • Tufte, "The Visual Display of Quantitative Information"
Required Reading:
  • Krivo and Peterson, “Extremely Disadvantaged Neighborhoods and Urban Crime”, Social Forces (1996)

February 23 Web application programming with Flask

Required Reading:

February 25 Text analysis, natural language processing

March 1 Flask web development continued

March 3 Machine learning 1

March 8 Final project presentations 1

  • Assignment 5 due

Finals week: Final project presentations 2