You will be expected to:
  • Select an area of interest in public policy or social science. Define some questions you’d like to answer using data. Write a short (1 page or less) proposal describing the project you plan to do and potential data sources.
  • Collect datasets through some combination of open data and data collection (e.g. web scraping) and vet them.
  • Perform computational analysis to answer your questions. This work will be done in a git repo that will be shared with the class.
  • Present your results to the class during finals week.
  • Proposal: Due Friday, Feb 12th at 5pm by email to the mailing list. It should include:
    • your group,
    • your area of interest,
    • the data sources you will use, and
    • the questions you hope to answer.
  • Git repository: Due 2/17. We will be grading you based on:
    • (30%) Clear, well-documented code. Unless you use private data that you cannot commit to your repository, we should be able to run your code to replicate your analysis.
    • (10%) A Markdown file in your repository that briefly (the equivalent of 1 page) summarizes the purpose and findings of your project, including some tables and visualizations.
  • Project presentation: During finals week. At least 10 minutes. For groups of 2 or more, at least 20 minutes but no more than 30 minutes. We will be looking for the following (but feel free to use a different outline):
    • Introduction and motivation: Why did you select this problem?
    • Data: Describe your data including how you selected and collected it.
    • (30%) Analysis and Visualization
    • (30%) Results: What does your work mean? What questions does it answer?
    • Further work: If you were to continue this work, what extensions would you pursue?
Example Projects: