Homework 1 Solutions

There are many valid ways to approach this first assignment. Here we show one way to do it, using primarily pandas.

1. Write a piece of code that reads in the data to a format that you can use.

In [27]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
In [2]:
crimes = pd.read_csv('Recent_Crimes.csv')

2. How often did a crime result in an arrest?

In [3]:
arrest_counts = crimes['Arrest'].value_counts()
In [4]:
arrest_counts
Out[4]:
False    15337
True      2888
Name: Arrest, dtype: int64
In [5]:
arrest_counts[1]/(arrest_counts[0] + arrest_counts[1]) * 100.
Out[5]:
15.84636488340192

In the last month, 15.8% of crimes resulted in an arrest.

Another approach would be to use describe() to get summary statistics.

In [6]:
crimes['Arrest'].describe()
Out[6]:
count       18225
mean     0.158464
std      0.365185
min         False
25%             0
50%             0
75%             0
max          True
Name: Arrest, dtype: object
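
The mean row in the describe() output matches the percentage above because the mean of a boolean column is the fraction of True values. A minimal sketch of that shortcut, using a small made-up frame (`toy` is hypothetical and only stands in for the real crimes table):

```python
import pandas as pd

# Hypothetical stand-in for the crimes table: 2 arrests out of 8 records.
toy = pd.DataFrame({'Arrest': [True, False, False, False,
                               True, False, False, False]})

# True counts as 1 and False as 0, so the mean is the arrest fraction.
arrest_pct = toy['Arrest'].mean() * 100

# The same number computed explicitly: arrests divided by total records.
arrest_pct_explicit = toy['Arrest'].sum() / len(toy) * 100
```

On the real data, the equivalent call would be crimes['Arrest'].mean() * 100.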

3. Which types of crimes most often result in arrest?

In [7]:
bycrime = crimes.groupby('Primary Type')
In [8]:
frac_arrest_bycrime = bycrime['Arrest'].mean()  # select the column first so only 'Arrest' is averaged
In [9]:
frac_arrest_bycrime
Out[9]:
Primary Type
ARSON                                0.037037
ASSAULT                              0.153430
BATTERY                              0.162437
BURGLARY                             0.009959
CONCEALED CARRY LICENSE VIOLATION    1.000000
CRIM SEXUAL ASSAULT                  0.025974
CRIMINAL DAMAGE                      0.034698
CRIMINAL TRESPASS                    0.484108
DECEPTIVE PRACTICE                   0.028536
GAMBLING                             1.000000
HOMICIDE                             0.033333
INTERFERENCE WITH PUBLIC OFFICER     0.803030
INTIMIDATION                         0.000000
KIDNAPPING                           0.047619
LIQUOR LAW VIOLATION                 1.000000
MOTOR VEHICLE THEFT                  0.033839
NARCOTICS                            1.000000
NON - CRIMINAL                       0.000000
NON-CRIMINAL                         0.000000
OBSCENITY                            0.500000
OFFENSE INVOLVING CHILDREN           0.032051
OTHER OFFENSE                        0.161111
PROSTITUTION                         1.000000
PUBLIC INDECENCY                     1.000000
PUBLIC PEACE VIOLATION               0.578125
ROBBERY                              0.017149
SEX OFFENSE                          0.097561
STALKING                             0.000000
THEFT                                0.071380
WEAPONS VIOLATION                    0.500000
Name: Arrest, dtype: float64
In [28]:
frac_arrest_bycrime.plot(kind='bar')
Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x10853f518>

We can clearly see six categories that always resulted in an arrest: concealed carry license violation, gambling, liquor law violation, narcotics, prostitution, and public indecency. These are acceptable answers.

However, if we look at the counts per category, we can see that some of these categories had very few crimes. So we might filter out categories by the number of crimes:

In [11]:
num_per_crime = bycrime.count()['Arrest']
In [12]:
num_per_crime[num_per_crime >= 15].index
Out[12]:
Index(['ARSON', 'ASSAULT', 'BATTERY', 'BURGLARY', 'CRIM SEXUAL ASSAULT',
       'CRIMINAL DAMAGE', 'CRIMINAL TRESPASS', 'DECEPTIVE PRACTICE',
       'HOMICIDE', 'INTERFERENCE WITH PUBLIC OFFICER', 'KIDNAPPING',
       'MOTOR VEHICLE THEFT', 'NARCOTICS', 'OFFENSE INVOLVING CHILDREN',
       'OTHER OFFENSE', 'PROSTITUTION', 'PUBLIC PEACE VIOLATION', 'ROBBERY',
       'SEX OFFENSE', 'THEFT', 'WEAPONS VIOLATION'],
      dtype='object', name='Primary Type')

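Among the categories with at least 15 reported crimes, narcotics and prostitution are the only ones where every crime resulted in an arrest, so they are the crimes that most often resulted in arrest in the last month. This answer is better than the one above because it excludes categories with too few incidents for the arrest rate to be meaningful.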
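
The filtering and ranking can also be done in one table. The following is a sketch on made-up data, not the solution's own code: the `toy` frame and the `min_crimes` threshold are hypothetical (the solution above uses 15).

```python
import pandas as pd

# Hypothetical data: 4 narcotics records, 5 thefts, 1 gambling record.
toy = pd.DataFrame({
    'Primary Type': ['NARCOTICS'] * 4 + ['THEFT'] * 5 + ['GAMBLING'],
    'Arrest': [True, True, True, True,
               True, False, False, False, False,
               True],
})

# One pass over the groups: number of records and fraction arrested.
summary = toy.groupby('Primary Type')['Arrest'].agg(['count', 'mean'])

# Drop sparse categories, then rank the rest by arrest rate.
min_crimes = 3  # hypothetical threshold for this toy example
ranked = (summary[summary['count'] >= min_crimes]
          .sort_values('mean', ascending=False))
```

Here GAMBLING has a 100% arrest rate but only one record, so it is dropped before ranking.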

4. What is the number of weapons violations (one of the Primary Types) per district?

In [13]:
weap_viols = crimes[crimes['Primary Type'] == 'WEAPONS VIOLATION']
In [14]:
dist_group = weap_viols.groupby('District')
In [15]:
dist_group.count()['ID']
Out[15]:
District
2      4
3      7
4     11
5     15
6     16
7     12
8      9
9     13
10    10
11    20
12     5
14     1
15     9
16     1
18     2
19     2
20     3
22     3
24     3
25     8
Name: ID, dtype: int64
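
An alternative to count()['ID'] is size(), which counts rows per group directly and does not depend on any particular column being non-null. A small sketch on hypothetical data (the `toy` frame is made up for illustration):

```python
import pandas as pd

# Hypothetical records: three weapons violations and two thefts.
toy = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Primary Type': ['WEAPONS VIOLATION', 'THEFT', 'WEAPONS VIOLATION',
                     'WEAPONS VIOLATION', 'THEFT'],
    'District': [11, 11, 11, 6, 6],
})

# Boolean mask selecting only the weapons violations.
is_weapon = toy['Primary Type'] == 'WEAPONS VIOLATION'

# size() returns the number of rows in each district group.
per_district = toy[is_weapon].groupby('District').size()
```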

5. What is the number of arrests per day of the week? Which day of the week has the most arrests?

In [16]:
arrests = crimes[crimes['Arrest']]  # 'Arrest' is already boolean, so it can be used directly as a mask
In [17]:
daysofweek = pd.to_datetime(arrests.Date).dt.dayofweek  # Monday=0, ..., Sunday=6
In [18]:
daysofweek.value_counts()
Out[18]:
1    489
2    442
5    426
6    411
4    384
3    373
0    363
Name: Date, dtype: int64

pandas numbers the days of the week from Monday=0 to Sunday=6, so day 1 is Tuesday. Tuesday had the most arrests in this time period.
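
To avoid decoding day numbers by hand, the .dt accessor can also return day names. A short sketch on hypothetical timestamps (the `dates` series is made up, and the real Date column may use a different string format):

```python
import pandas as pd

# Hypothetical timestamps; 2015-09-01 fell on a Tuesday.
dates = pd.Series(pd.to_datetime(['2015-09-01', '2015-09-01',
                                  '2015-09-02', '2015-09-07']))

# Integer encoding: Monday=0 through Sunday=6.
day_numbers = dates.dt.dayofweek

# Human-readable names for the same days.
day_names = dates.dt.day_name()

# The day with the most records in this toy series.
busiest = day_names.value_counts().idxmax()
```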

6. Make bar charts that show (a) the result of Question 4 and (b) Question 5.

In [29]:
dist_group.count()['ID'].plot(kind='bar')
Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x108896208>
In [30]:
daysofweek.value_counts().plot(kind='bar', title='Arrests Per Day of Week')
Out[30]:
<matplotlib.axes._subplots.AxesSubplot at 0x108d63c50>

7. Make a scatter plot of latitude versus longitude (we’ll get more into making real maps later in the course) for those crimes where the Primary Type was: deceptive practice.

In [21]:
deceptive_prac = crimes[crimes['Primary Type'] == 'DECEPTIVE PRACTICE']
In [31]:
deceptive_prac.plot(kind='scatter', x='Longitude', y='Latitude')
Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0x108da7438>

8. Make a histogram that shows the number of arrests per beat. The Y-axis should show Frequency and the X-axis should show the count of arrests in that beat.

In [23]:
by_beat = arrests.groupby('Beat')
In [24]:
by_beat.count()['ID'].values
Out[24]:
array([20, 11,  5,  3,  8,  8,  2,  7,  5,  5,  6,  9,  3,  4,  4,  5,  8,
        7,  7,  8,  9, 10,  1,  1,  6, 11,  7,  7, 13,  9,  5,  9, 20,  8,
        5,  8, 12, 18, 10, 18, 35, 19, 24, 15, 17, 12,  9,  7, 24, 13, 27,
       13, 10, 19, 13, 11, 12, 21, 16, 16, 10, 28, 35, 27, 31, 12, 17, 15,
        9, 11, 23, 12,  5,  8,  3, 24, 10, 25,  3, 20, 10, 18, 10, 24,  8,
        3, 11,  1, 15,  6,  8, 10, 10,  9, 11,  9,  6,  3,  9,  7,  9,  3,
        4, 11,  3, 13,  3,  6,  5,  8,  3, 10, 10,  7, 32, 13,  8, 12, 20,
       17,  7, 14, 10,  5,  8, 12, 16, 36, 39,  9, 57, 51, 27, 23, 16,  3,
       55, 39, 20, 44,  7,  6,  1,  4,  8, 13,  9,  3, 16,  3,  4,  6,  3,
        3,  5,  5,  3,  4, 10,  8,  8, 10,  1, 18,  4,  8,  7,  6, 10, 42,
       17, 17, 20, 24, 44,  3,  2,  5,  4, 14,  7,  5,  5, 13,  7,  1,  1,
        3,  2,  1,  6, 14,  9,  6,  7,  7, 14,  8,  1,  1,  1,  7,  5,  2,
       11,  6,  2, 10, 16,  8, 14,  8,  4,  2,  6,  6,  6,  5,  4,  6,  2,
        6, 15,  4,  5,  4,  6,  4,  7,  6,  8,  5,  7, 10,  7, 13,  9,  9,
        7,  8,  3,  3, 15, 17,  2, 10,  2,  9,  7,  5, 15,  4,  6, 23, 19,
       11,  8, 11,  3, 11, 13, 26, 18, 14])
In [25]:
arrests_per_beat = by_beat.count()['ID'].values  # counts of arrests, not rates
In [32]:
_, _, _ = plt.hist(arrests_per_beat, bins=30)
plt.xlabel('Arrests per Beat')
plt.ylabel('Frequency')
Out[32]:
<matplotlib.text.Text at 0x108fe96a0>
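
The histogram is a two-step count: first the number of arrests in each beat, then the number of beats that share each arrest count. A sketch of that computation without plotting, on hypothetical data (the `toy_arrests` frame and its beat numbers are made up):

```python
import pandas as pd

# Hypothetical arrests, one row per arrest, labeled by beat.
toy_arrests = pd.DataFrame({'Beat': [111, 111, 111, 112, 113, 113, 114]})

# Step 1: arrests per beat (beat 111 has 3, 112 has 1, 113 has 2, 114 has 1).
arrests_per_beat = toy_arrests.groupby('Beat').size()

# Step 2: how many beats have each arrest count; this is what the
# histogram bars show when each bin covers a single count.
freq = arrests_per_beat.value_counts().sort_index()
```

With 30 bins over the real data, each bar in the plot groups a range of counts rather than a single value.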