In this guided project, I’ll work in a Jupyter notebook to analyze data on gun deaths in the US. The one restriction at this point is not to use the pandas module.

%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt

# helper function to plot
def plot_dictionary(in_dict):
    plt.figure()
    in_dict_count = np.arange(len(in_dict))
    # list() turns the Python 3 dict views into plain sequences for matplotlib
    plt.bar(in_dict_count, list(in_dict.values()), align='center', width=0.5)
    plt.xticks(in_dict_count, list(in_dict.keys()), rotation=90)
import csv
# the with block closes the file once the rows have been read
with open("guns.csv", "r") as f:
    data = list(csv.reader(f))
print(data[:5])
[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]

Removing Headers From a List of Lists

headers = data[0]
data = data[1:]
print(headers)
print(data[:5])
['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]

Counting Gun Deaths By Year

The year column contains the year in which each gun death occurred. We can use this column to calculate how many gun deaths happened each year.

We can do this by creating a dictionary and keeping a count in it of how many times each element occurs in the year column.

years = [x[1] for x in data]
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] +=1
    else:
        year_counts[year] = 1

print(year_counts)
{'2012': 33563, '2013': 33636, '2014': 33599}
plot_dictionary(year_counts)
plt.title("Gun Deaths By Year")
plt.xlabel('Year'); plt.ylabel('Number of Gun Deaths')
[Figure: bar chart "Gun Deaths By Year"; x-axis: Year, y-axis: Number of Gun Deaths]
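
The counting loop above can also be written with collections.Counter from the standard library. This is only a sketch of an alternative, not the approach used in this project; sample_rows and sample_year_counts are illustrative names, and the rows are copied from the printed output above to stand in for the full data list.

```python
from collections import Counter

# A few rows copied from the output above; in the notebook, `data` holds every row.
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
]

# Counter tallies each distinct value found in column 1 (the year).
sample_year_counts = Counter(row[1] for row in sample_rows)
print(dict(sample_year_counts))
```

A Counter behaves like a dictionary, so it could be passed to plot_dictionary in the same way as year_counts.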

Exploring Gun Deaths by Month and Year

It looks like gun deaths didn’t change much by year from 2012 to 2014. Let’s see if gun deaths in the US change by month and year. To do this, we’ll have to create a datetime.datetime object from the year and month columns. We’ll then be able to count gun deaths by date, as we did by year in the previous section.

We can use the month and year columns of data to create a datetime. We’ll specify a fixed day because the data has no day column. If we create a datetime.datetime object for each row, we can then count how many gun deaths occurred in each month and year using a procedure similar to the one above.

import datetime
dates = [datetime.datetime(year=int(row[1]),month=int(row[2]),day=1) for row in data]
print(dates[:5])
date_counts = {}
for dat in dates:
    if dat in date_counts:
        date_counts[dat] +=1
    else:
        date_counts[dat] = 1
date_counts
[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0)]





{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11, 1, 0, 0): 2758,
 datetime.datetime(2013, 12, 1, 0, 0): 2765,
 datetime.datetime(2014, 1, 1, 0, 0): 2651,
 datetime.datetime(2014, 2, 1, 0, 0): 2361,
 datetime.datetime(2014, 3, 1, 0, 0): 2684,
 datetime.datetime(2014, 4, 1, 0, 0): 2862,
 datetime.datetime(2014, 5, 1, 0, 0): 2864,
 datetime.datetime(2014, 6, 1, 0, 0): 2931,
 datetime.datetime(2014, 7, 1, 0, 0): 2884,
 datetime.datetime(2014, 8, 1, 0, 0): 2970,
 datetime.datetime(2014, 9, 1, 0, 0): 2914,
 datetime.datetime(2014, 10, 1, 0, 0): 2865,
 datetime.datetime(2014, 11, 1, 0, 0): 2756,
 datetime.datetime(2014, 12, 1, 0, 0): 2857}
plot_dictionary(date_counts)
[Figure: bar chart of gun deaths for each month from 2012 to 2014]
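
plot_dictionary draws bars in dictionary insertion order, so the chart is chronological only because the rows in the file happen to be sorted. The sketch below shows one way to sort the dates explicitly and build short tick labels; sample_date_counts, labels, and values are illustrative names, and the three entries are taken from the output above but entered out of order on purpose.

```python
import datetime

# Three entries from the date_counts output above, deliberately out of order.
sample_date_counts = {
    datetime.datetime(2012, 3, 1): 2743,
    datetime.datetime(2012, 1, 1): 2758,
    datetime.datetime(2012, 2, 1): 2357,
}

# Sorting the (date, count) pairs orders them chronologically.
ordered = sorted(sample_date_counts.items())

# "YYYY-MM" labels are shorter than the default datetime representation.
labels = [date.strftime("%Y-%m") for date, _ in ordered]
values = [count for _, count in ordered]
print(labels)
print(values)
```

The labels and values lists could then be handed to plt.bar and plt.xticks in place of the dictionary keys and values.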

Exploring Gun Deaths By Race and Sex

The sex and race columns contain potentially interesting information on how gun deaths in the US vary by gender and race. We can explore both columns with the same dictionary-counting technique we used earlier.

sexes = [row[5] for row in data]
sex_counts = {}
for sex in sexes:
    if sex not in sex_counts:
        sex_counts[sex] = 0
    sex_counts[sex] += 1
sex_counts
{'M': 86349, 'F': 14449}
plot_dictionary(sex_counts)
plt.xlabel("Sex"); plt.ylabel('Number of Gun Deaths')
[Figure: bar chart of gun deaths by sex; x-axis: Sex, y-axis: Number of Gun Deaths]
races = [row[7] for row in data]
race_counts = {}
for race in races:
    if race not in race_counts:
        race_counts[race] = 0
    race_counts[race] += 1
race_counts
{'Asian/Pacific Islander': 1326,
 'White': 66237,
 'Native American/Native Alaskan': 917,
 'Black': 23296,
 'Hispanic': 9022}
# display the race count as a bar plot
plot_dictionary(race_counts)
plt.title('Gun Deaths by Race')
plt.ylabel('Number of Gun Deaths')
[Figure: bar chart "Gun Deaths by Race"; y-axis: Number of Gun Deaths]

Conclusion thus far

Gun deaths in the US seem to disproportionately affect men compared with women (86349 vs 14449). They also seem to disproportionately affect minorities, although we need data on each race’s share of the overall US population before concluding that. (This helps avoid drawing premature conclusions and falling into Simpson’s paradox.)

There appears to be a minor seasonal pattern, with gun deaths peaking in the summer and declining in the winter. It might be useful to filter by intent, to see if different categories of intent have different relationships with season, race, or gender.
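
One way to look at the seasonal pattern more directly is to total the counts for each calendar month across all three years. The sketch below does this for January and July only, using the six corresponding values from the date_counts output above; sample_date_counts and month_totals are illustrative names, and the full date_counts dictionary would be used in the notebook.

```python
import datetime

# January and July counts for 2012-2014, copied from the date_counts output above.
sample_date_counts = {
    datetime.datetime(2012, 1, 1): 2758,
    datetime.datetime(2013, 1, 1): 2864,
    datetime.datetime(2014, 1, 1): 2651,
    datetime.datetime(2012, 7, 1): 3026,
    datetime.datetime(2013, 7, 1): 3079,
    datetime.datetime(2014, 7, 1): 2884,
}

# Add up the counts that share the same month number, ignoring the year.
month_totals = {}
for date, count in sample_date_counts.items():
    month_totals[date.month] = month_totals.get(date.month, 0) + count
print(month_totals)
```

Months differ in length, so dividing each total by the number of days in the month would give a fairer comparison.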

Reading In A Second Dataset

We explored gun deaths by race above. However, our analysis only gives us the total number of gun deaths by race in the US. Unless we know the proportion of each race in the US, we won’t be able to meaningfully compare those numbers.

What we really want to get is a rate of gun deaths per 100000 people of each race. In order to do this, we’ll need to read in data about what percentage of the US population falls into each racial category. Luckily, we can import some census data to help us out.

The data contains the total population of the US, as well as the total population of each racial group in the US. It is stored in the census.csv file and consists of only two rows, a header row and one data row:

# read in the `census.csv` and convert to a list of lists
census = list(csv.reader(open("census.csv","r")))
print(census)
[['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]
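
The population figures can be pulled out of the census row by column name instead of being copied by hand. This is a minimal sketch under the assumption that the file has exactly the two rows printed above; census_header, census_row, and census_value are illustrative names, not part of the original notebook.

```python
# The two rows printed above, copied verbatim.
census_header = ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
                 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic',
                 'Race Alone - Black or African American',
                 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian',
                 'Race Alone - Native Hawaiian and Other Pacific Islander',
                 'Two or More Races']
census_row = ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
              'Total', '0100000US', '', 'United States', '308745538', '197318956',
              '44618105', '40250635', '3739506', '15159516', '674625', '6984195']

def census_value(name):
    """Return the population in the named column as an integer."""
    # list.index finds the first column with this name; the race columns are unique.
    return int(census_row[census_header.index(name)])

white_population = census_value('Race Alone - White')
asian_pacific_population = (
    census_value('Race Alone - Asian')
    + census_value('Race Alone - Native Hawaiian and Other Pacific Islander')
)
print(white_population, asian_pacific_population)
```

Looking values up by name avoids transcription mistakes, at the cost of depending on the exact header text.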

Computing Rates of Gun Deaths Per Race

Earlier we computed the number of gun deaths per race and created the dictionary race_counts:

race_counts
{'Asian/Pacific Islander': 1326,
 'White': 66237,
 'Native American/Native Alaskan': 917,
 'Black': 23296,
 'Hispanic': 9022}

In order to get from the raw counts of gun deaths by race to a rate of gun deaths per 100000 people in each race, we’ll need to divide the total number of gun deaths by the population of each race. From the census dataset, we know that the number of people in the white racial category is 197318956. So we’d perform the following division: white_gun_death_rate = 66237 / 197318956

This gives us the proportion of people in the white census race category who were killed by a gun in the US from 2012 to 2014, which approximates the chance that a given person in that category was killed. If you do this computation, you’ll see that the rate is a very small number, 0.0003356849303419181.

It’s for this reason that it’s typical to express crime statistics as the “rate per 100000”. This tells you the number of people in a given group out of every 100000 that were killed by guns in the US. To get this, we multiply by 100000: rate_per_hundredk = 0.0003356849303419181 * 100000

This gives us about 33.57, which we can interpret as follows: about 33.57 out of every 100000 people in the white census race category in the US were killed by guns between 2012 and 2014.

We’ll need to calculate these same rates for each racial category. The only stumbling block is that the racial categories are named slightly differently in census and in data. We’ll need to manually construct a dictionary that allows us to map between them, and then perform the division.

Here’s a list of the race names in data, and the corresponding race names in census:
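
The two steps above can be checked with a few lines of arithmetic. The variable names white_deaths and white_population are illustrative; the numbers are the ones quoted in the text.

```python
white_deaths = 66237          # gun deaths in the White category, from race_counts
white_population = 197318956  # 'Race Alone - White' from the census row

# Step 1: deaths as a proportion of the population.
white_gun_death_rate = white_deaths / white_population

# Step 2: rescale the proportion to a rate per 100000 people.
rate_per_hundredk = white_gun_death_rate * 100000
print(round(rate_per_hundredk, 2))
```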

  • Asian/Pacific Islander – Race Alone - Asian plus Race Alone - Native Hawaiian and Other Pacific Islander
  • Black – Race Alone - Black or African American
  • Hispanic – Race Alone - Hispanic
  • Native American/Native Alaskan – Race Alone - American Indian and Alaska Native
  • White – Race Alone - White

The dictionary we create serves as this map: each key is a race name from data, and each value is the population of that race taken from census.

mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956
}

race_per_hundredk = {}
for k,v in race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000

race_per_hundredk
{'Asian/Pacific Islander': 8.374309664161762,
 'White': 33.56849303419181,
 'Native American/Native Alaskan': 24.521955573811088,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907}
plot_dictionary(race_per_hundredk)
[Figure: bar chart of gun death rates per 100000 people by race]

Filtering By Intent

We can filter our results, and restrict them to the Homicide intent. This will tell us what the gun-related murder rate per 100000 people in each racial category is. In order to do this, we’ll need to redo our work in generating race_counts, but only count rows where the intent was Homicide.

We can do this by first extracting the intent column, then using the enumerate() function to loop through each index and value in the race column. If the value at the same position in intents is Homicide, we’ll count the value from the race column.

Finally, we’ll use the mapping dictionary to convert from row counts to rates.

intents = [row[3] for row in data]
homicide_race_counts = {}
for i,race in enumerate(races):
    if race not in homicide_race_counts:
        homicide_race_counts[race] = 0
    if intents[i] == "Homicide":
        homicide_race_counts[race] += 1

race_per_hundredk = {}
for k,v in homicide_race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000

race_per_hundredk
{'Asian/Pacific Islander': 3.530346230970155,
 'White': 4.6356417981453335,
 'Native American/Native Alaskan': 8.717729026240365,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914}
plot_dictionary(race_per_hundredk)
plt.title('Homicide Rates of Gun Deaths Per Race')
[Figure: bar chart "Homicide Rates of Gun Deaths Per Race"]
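
The same filtering idea can be wrapped in a small function so that other intents can be counted without repeating the loop. This is a sketch, not the code used above: count_races_for_intent and sample_rows are hypothetical names, the column positions 3 (intent) and 7 (race) come from the headers list, and unlike the loop above it leaves out races with no matching rows instead of recording a zero.

```python
def count_races_for_intent(rows, intent):
    """Count rows per race, keeping only rows whose intent column matches."""
    counts = {}
    for row in rows:
        if row[3] == intent:  # column 3 holds the intent
            race = row[7]     # column 7 holds the race
            counts[race] = counts.get(race, 0) + 1
    return counts

# A few rows copied from the earlier output; the notebook would pass `data`.
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
]

suicide_counts = count_races_for_intent(sample_rows, "Suicide")
homicide_counts = count_races_for_intent(sample_rows, "Homicide")
print(suicide_counts)
print(homicide_counts)
```

The returned counts could then be divided by the populations in mapping, exactly as in the rate calculation above.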

Some findings thus far

It appears that gun-related homicides in the US disproportionately affect people in the Black and Hispanic racial categories.

Some areas to investigate further:

  • The link between month and homicide rate.
  • Homicide rate by gender.
  • The rates of other intents by gender and race.
  • Gun death rates by location and education.