Processing The Gun Violence Dataset

Columns To Keep

Each observation in the gun violence dataset is a single incident of a crime involving a gun. So far, I have focused on only three columns:

n_killed: The number of people killed in the incident
n_injured: The number of people injured in the incident
n_guns_involved: The number of guns involved in the incident

I also used the latitude and longitude columns to group incidents by area.

Processing/Feature Engineering

To compare this dataset with the TEDS-D dataset, I had to determine the CBSA (Core-Based Statistical Area) code for each incident. To do this, I obtained a list of CBSA codes and then used Google's Geocoding API to look up the latitude and longitude of each one from its city and state.

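import numpy as np

# Assumes `gmaps` is a googlemaps.Client created earlier with a valid API key.
def getLatLon(x):
    """Geocode one row's city and state; store NaN if no usable result comes back."""
    city = x['City']
    state = x['State']
    try:
        # geocode() returns a list of matches; an empty list raises IndexError here
        latlng = gmaps.geocode(f'{city}, {state}')[0]['geometry']['location']
        x['Latitude'] = latlng['lat']
        x['Longitude'] = latlng['lng']
    except (IndexError, KeyError):
        x['Latitude'] = np.nan
        x['Longitude'] = np.nan
    return x

# Apply row by row; each row triggers one API request
cities = cities.apply(getLatLon, axis=1)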

Using this list and the latitude and longitude columns of the gun violence dataset, I found the closest CBSA to each incident and assigned that CBSA's code to the incident.
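The nearest-CBSA lookup can be sketched as follows. This is a minimal illustration, not necessarily the code used for this project: it assumes great-circle (haversine) distance as the measure of "closest", a hypothetical CBSA column for the code, and a hypothetical cbsa_code output column. The CBSA codes and coordinates in the example are toy values.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, NumPy-broadcastable."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def assign_nearest_cbsa(incidents, cbsas):
    """Return a copy of incidents with a 'cbsa_code' column (hypothetical name)."""
    # Rows without coordinates cannot be matched, so drop them on both sides.
    incidents = incidents.dropna(subset=["latitude", "longitude"])
    cbsas = cbsas.dropna(subset=["Latitude", "Longitude"]).reset_index(drop=True)
    # Distance matrix of shape (n_incidents, n_cbsas) via broadcasting.
    dist = haversine_km(
        incidents["latitude"].to_numpy()[:, None],
        incidents["longitude"].to_numpy()[:, None],
        cbsas["Latitude"].to_numpy()[None, :],
        cbsas["Longitude"].to_numpy()[None, :],
    )
    nearest = dist.argmin(axis=1)  # column index of the closest CBSA per incident
    out = incidents.copy()
    out["cbsa_code"] = cbsas["CBSA"].to_numpy()[nearest]
    return out

# Toy data: the codes "A" and "B" are placeholders, not real CBSA codes.
cbsas = pd.DataFrame({
    "CBSA": ["A", "B"],
    "Latitude": [40.0, 34.0],
    "Longitude": [-75.0, -118.0],
})
incidents = pd.DataFrame({
    "latitude": [39.9, 34.1],
    "longitude": [-75.2, -118.3],
})
tagged = assign_nearest_cbsa(incidents, cbsas)
```

The full distance matrix is simple but grows with incidents times CBSAs, so a large dataset may need to be processed in chunks.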


From there, I grouped the data by CBSA code and totaled the n_killed, n_injured, and n_guns_involved columns for each area. So that areas with more reported incidents would not dominate the comparison, I divided each total by the number of incidents reported for that area, which gives a per-incident average.
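A minimal sketch of this aggregation in pandas, assuming the incidents already carry a column named cbsa_code (a hypothetical name) and using toy values:

```python
import pandas as pd

COLS = ["n_killed", "n_injured", "n_guns_involved"]

def per_incident_averages(incidents):
    """Total each column per CBSA, then divide by that CBSA's incident count."""
    grouped = incidents.groupby("cbsa_code")
    totals = grouped[COLS].sum()   # sum() skips missing values
    counts = grouped.size()        # number of incident rows per CBSA
    return totals.div(counts, axis=0)

# Toy data: three incidents across two placeholder CBSA codes.
incidents = pd.DataFrame({
    "cbsa_code": ["A", "A", "B"],
    "n_killed": [1, 3, 0],
    "n_injured": [0, 2, 4],
    "n_guns_involved": [1, 1, 2],
})
averages = per_incident_averages(incidents)
```

When a column has no missing values this matches groupby(...).mean(). With missing values the two differ, because the division here counts every incident row while mean() counts only the non-missing entries.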


Finally, I merged the treatment scores with the gun violence dataset by joining on the CBSA code.
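The join can be sketched as below. The column names cbsa_code and treatment_score are hypothetical, the values are toy data, and the inner join is an assumption: the text above does not say how CBSAs present in only one dataset were handled.

```python
import pandas as pd

def merge_on_cbsa(treatment_scores, gun_averages):
    """Join the two per-CBSA tables, keeping only CBSAs present in both."""
    return treatment_scores.merge(
        gun_averages,
        on="cbsa_code",
        how="inner",              # assumption: unmatched CBSAs are dropped
        validate="one_to_one",    # raise if a CBSA code is duplicated on either side
    )

# Toy data with placeholder CBSA codes.
treatment_scores = pd.DataFrame({
    "cbsa_code": ["A", "B", "C"],
    "treatment_score": [0.5, 0.7, 0.9],
})
gun_averages = pd.DataFrame({
    "cbsa_code": ["A", "B"],
    "n_killed": [2.0, 0.0],
})
merged = merge_on_cbsa(treatment_scores, gun_averages)
```

Passing validate="one_to_one" is a cheap guard against silently duplicated rows if either table still has more than one row per CBSA.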
