You’ll be scraping data from this website, which contains a list of incidents involving commercial aircraft listed by year.
Write a scraper that produces a pandas DataFrame containing the following columns:
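The column list is not shown here, so the sketch below only collects the two fields that the starter CSV in part b provides: the link text (`description`) and the absolute URL (`link`). It uses the standard-library `html.parser` and assumes each incident on a year page is an `<a>` tag. The `href_filter` argument and the example URL are hypothetical, because the real page markup is not given. Fetching the page itself (for example with `urllib.request`) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

import pandas as pd


class LinkCollector(HTMLParser):
    """Collect the text and absolute URL of every <a> tag accepted by href_filter."""

    def __init__(self, base_url, href_filter):
        super().__init__()
        self.base_url = base_url
        self.href_filter = href_filter
        self.rows = []
        self._href = None   # URL of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and self.href_filter(href):
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.rows.append({"description": text, "link": self._href})
            self._href = None


def parse_year_page(html, base_url, href_filter=lambda href: True):
    """Return a DataFrame with one row per matching link on a year page."""
    parser = LinkCollector(base_url, href_filter)
    parser.feed(html)
    parser.close()
    return pd.DataFrame(parser.rows, columns=["description", "link"])
```

Once the real page structure is known, you can tighten `href_filter` so that navigation links (home, next year, and so on) are skipped.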
If you choose not to do the extra credit assignment, you can start from this CSV file, which contains the description and the link for each page. Read the CSV file into a pandas DataFrame as the starting point for part b.
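A minimal loader might look like this. It assumes the CSV has columns named `description` and `link` (after lowercasing). The actual file name and header spelling are not given, so adjust `required` as needed.

```python
import pandas as pd


def load_starting_csv(path_or_buffer, required=("description", "link")):
    """Read the starter CSV and check that the expected columns exist."""
    df = pd.read_csv(path_or_buffer)
    # Normalise headers (strip whitespace, lowercase) so later code can rely on them.
    df.columns = [str(c).strip().lower() for c in df.columns]
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    return df
```

Usage: `links = load_starting_csv("crashes.csv")`, where `crashes.csv` is a placeholder for the file you were given.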
Now write code that follows each link and scrapes additional content from the details page for each individual crash. How will you make sure you rate-limit your requests to the target web server? Once you have implemented rate limiting, scrape the content in the right column of each details page and put it in a DataFrame:
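One simple answer to the rate-limiting question is to enforce a minimum interval between consecutive requests. The sketch below does this with `time.monotonic` and `time.sleep`. The interval you pick (for example 2 seconds) and the User-Agent string are assumptions, not values from the assignment. The clock and sleep functions can be injected, which lets the limiter be tested without real delays or network access.

```python
import time
import urllib.request


class RateLimiter:
    """Block until at least `min_interval` seconds have passed since the previous call."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def fetch_all(urls, limiter, fetch):
    """Fetch each URL in order, waiting on the limiter before every request."""
    pages = {}
    for url in urls:
        limiter.wait()
        pages[url] = fetch(url)
    return pages


def http_get(url, timeout=30):
    """Download a page as text; the User-Agent value is a placeholder."""
    req = urllib.request.Request(url, headers={"User-Agent": "course-scraper (contact: you@example.com)"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage: `pages = fetch_all(links["link"], RateLimiter(2.0), http_get)`. If the site's `robots.txt` declares a `Crawl-delay`, `urllib.robotparser.RobotFileParser.crawl_delay()` can supply `min_interval` instead of a hard-coded value.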
If there are multiple responses for passengers, just save the first one for simplicity. Similarly, if there are no entries (e.g. for registration in the first link), you can simply fill that entry in the DataFrame with 'No data'.
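Suppose the right column has already been parsed into `(label, value)` pairs. Both rules above can then be applied in a single pass. This sketch keeps only the first non-empty value for each label and fills missing fields with `'No data'`. It reads "multiple responses" as a label that appears more than once, which is one possible interpretation. The field names passed in are hypothetical placeholders for the list in the assignment.

```python
def build_record(pairs, fields, missing="No data"):
    """Turn (label, value) pairs into one dict with exactly the requested fields."""
    record = {}
    for label, value in pairs:
        key = label.strip().rstrip(":").strip()   # "Registration:" -> "Registration"
        value = " ".join(value.split())           # collapse whitespace; blank -> ""
        # Keep only the first non-empty value seen for each requested field.
        if key in fields and key not in record and value:
            record[key] = value
    # Any field never seen (or only seen empty) becomes the placeholder.
    return {f: record.get(f, missing) for f in fields}
```

Building the DataFrame is then `pd.DataFrame([build_record(p, fields) for p in all_pairs])`, where `all_pairs` holds the parsed pairs for each details page.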
What were the five deadliest aviation incidents? Report the number of fatalities and the flight origin for each.
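The scraped fatality values are text and may contain `'No data'` or extra detail, so they must be converted to numbers before ranking. This sketch assumes columns named `fatalities` and `origin` (hypothetical names) and keeps the first integer in each fatality entry before calling `DataFrame.nlargest`.

```python
import pandas as pd


def top_deadliest(df, n=5, fatal_col="fatalities", origin_col="origin"):
    """Return the n rows with the most fatalities, showing fatalities and origin."""
    out = df.copy()
    # Keep only the first integer in each entry, e.g. '150 (passengers:140 crew:10)' -> 150.
    # Entries such as 'No data' contain no digits and become NaN.
    out[fatal_col] = pd.to_numeric(
        out[fatal_col].astype(str).str.extract(r"(\d+)", expand=False), errors="coerce"
    )
    # Drop rows whose fatality count is unknown before ranking.
    out = out.dropna(subset=[fatal_col])
    return out.nlargest(n, fatal_col)[[fatal_col, origin_col]]
```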
Which flight origin has the highest number of aviation incidents in the last 25 years?
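One way to answer this question is to take a year from each date, keep incidents from the last 25 years that have a known origin, and count them with `value_counts`. This sketch assumes a `date` column whose text contains a four-digit year and an `origin` column. Both names are hypothetical. It also treats "the last 25 years" as years strictly after `ref_year - 25`, which includes the current, possibly partial, year. Pass `ref_year` explicitly to get reproducible results.

```python
import pandas as pd


def top_origin_since(df, years=25, ref_year=None, date_col="date", origin_col="origin"):
    """Return (origin, count) for the most frequent origin in the last `years` years."""
    if ref_year is None:
        ref_year = pd.Timestamp.today().year
    # Take the first four-digit number in each date string as the year; 'No data' becomes NaN.
    year = pd.to_numeric(
        df[date_col].astype(str).str.extract(r"(\d{4})", expand=False), errors="coerce"
    )
    # Comparisons with NaN are False, so incidents with unknown dates are excluded.
    recent = df[(year > ref_year - years) & (df[origin_col] != "No data")]
    counts = recent[origin_col].value_counts()
    if counts.empty:
        raise ValueError("no incidents with a known origin in the requested window")
    return counts.idxmax(), int(counts.max())
```

Extracting the year with a regex avoids depending on `pd.to_datetime` inferring one format across mixed date strings.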
Save this DataFrame as JSON and commit it to your repo, along with the notebook or Python code used for this assignment.
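`DataFrame.to_json` with `orient="records"` writes one JSON object per incident, which is easy to inspect and diff in a repo. The `indent` argument needs pandas 1.0 or later. The output file name used below is only a placeholder.

```python
import pandas as pd


def save_json(df, path):
    """Write the DataFrame as a pretty-printed list of records."""
    # force_ascii=False keeps non-ASCII place names readable in the file.
    df.to_json(path, orient="records", indent=2, force_ascii=False)
    return path
```

After `save_json(details_df, "crashes.json")`, commit the file alongside your notebook, for example with `git add crashes.json scraper.ipynb` followed by `git commit`.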