PREDICTING DRUG AND ALCOHOL ABUSE TREATMENT COMPLETION BY LOOKING AT GUN VIOLENCE

Processing The Gun Violence Dataset

Columns To Keep

Each observation in the gun violence dataset is a single instance of a crime involving a gun. So far, I have only focused on three columns:

n_killed: The number of people killed in this incident
n_injured: The number of people injured in this incident
n_guns_involved: The number of guns involved in this incident

I also used the latitude and longitude columns in the process of grouping incidents by area.

Processing/Feature Engineering

To compare this dataset with the TEDS-D dataset, I had to determine the CBSA code for each incident. To do this, I obtained a list of CBSA codes and then used Google's Geocoding API to get the latitude and longitude of each CBSA.

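import numpy as np

# Assumes `gmaps` is a googlemaps.Client created earlier with a valid API key.
def getLatLon(x):
    city = x['City']
    state = x['State']
    try:
        # Take the first geocoding result for "City, State"
        latlng = gmaps.geocode(f'{city}, {state}')[0]['geometry']['location']
        x['Latitude'] = latlng['lat']
        x['Longitude'] = latlng['lng']
    except (IndexError, KeyError):
        # No result, or an unexpected response shape: leave the coordinates missing
        x['Latitude'] = np.nan
        x['Longitude'] = np.nan
    return x

# Geocode every row of the CBSA list
cities = cities.apply(getLatLon, axis=1)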

Using this list, and the latitude and longitude columns of the gun violence dataset, I was able to find the closest CBSA for each incident and assign it a code.
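One way this nearest-CBSA step could look is sketched below. It is not necessarily the code used for this project: it assumes great-circle (haversine) distance as the measure of "closest", and the `CBSA` column name, the `assign_nearest_cbsa` helper, and the toy rows are all hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs are degrees and may be broadcastable arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def assign_nearest_cbsa(incidents, cities):
    """Hypothetical helper: label each incident with the code of the closest CBSA."""
    # Rows without coordinates cannot be matched, so drop them first
    incidents = incidents.dropna(subset=['latitude', 'longitude'])
    cities = cities.dropna(subset=['Latitude', 'Longitude'])
    # Distance matrix: one row per incident, one column per CBSA
    dist = haversine_km(
        incidents['latitude'].to_numpy()[:, None],
        incidents['longitude'].to_numpy()[:, None],
        cities['Latitude'].to_numpy()[None, :],
        cities['Longitude'].to_numpy()[None, :],
    )
    nearest = dist.argmin(axis=1)  # column index of the closest CBSA per incident
    out = incidents.copy()
    out['CBSA'] = cities['CBSA'].to_numpy()[nearest]
    return out

# Toy data with made-up codes and approximate coordinates
cities = pd.DataFrame({
    'CBSA': ['CBSA_A', 'CBSA_B'],
    'Latitude': [40.71, 34.05],
    'Longitude': [-74.01, -118.24],
})
incidents = pd.DataFrame({
    'latitude': [40.0, 34.0],
    'longitude': [-75.0, -118.0],
})
assigned = assign_nearest_cbsa(incidents, cities)
print(assigned)
```

The sketch builds the full incident-by-CBSA distance matrix at once, which is simple but memory-hungry; for a large incident table it may be necessary to process the incidents in chunks.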


From there I grouped the data by CBSA code and totaled the columns (n_killed, n_injured, n_guns_involved) for each area. So that areas with more reported incidents would not dominate purely through volume, I divided each total by the number of incidents reported for that area, which turns the totals into per-incident averages.
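The grouping and normalization can be illustrated with a short pandas sketch. The `CBSA` column name and the toy values are assumptions made for the example, not data from the project.

```python
import pandas as pd

# Toy incidents that have already been assigned a (made-up) CBSA code
incidents = pd.DataFrame({
    'CBSA': ['CBSA_A', 'CBSA_A', 'CBSA_B'],
    'n_killed': [1, 0, 2],
    'n_injured': [0, 2, 1],
    'n_guns_involved': [1, 1, 3],
})

cols = ['n_killed', 'n_injured', 'n_guns_involved']
grouped = incidents.groupby('CBSA')

totals = grouped[cols].sum()      # per-area totals
n_incidents = grouped.size()      # number of incidents reported per area

# Divide each area's totals by its incident count (aligned on the CBSA index)
per_incident = totals.div(n_incidents, axis=0)
print(per_incident)
```

When no values are missing, this gives the same result as `grouped[cols].mean()`. With missing values the two differ, because `size()` counts every row while `mean()` skips the missing entries.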


Finally, I merged the treatment scores with the gun violence dataset by joining on the CBSA code.
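A minimal sketch of such a join in pandas follows. The `treatment_score` column, the `CBSA` key name, and all values are hypothetical, and the inner join is an assumption about how unmatched areas should be handled.

```python
import pandas as pd

# Per-incident gun violence averages by (made-up) CBSA code
gun_by_cbsa = pd.DataFrame({
    'CBSA': ['CBSA_A', 'CBSA_B'],
    'n_killed': [0.5, 2.0],
    'n_injured': [1.0, 1.0],
    'n_guns_involved': [1.0, 3.0],
})

# Hypothetical treatment scores, one row per CBSA
treatment = pd.DataFrame({
    'CBSA': ['CBSA_A', 'CBSA_C'],
    'treatment_score': [0.6, 0.4],
})

# An inner join keeps only CBSAs present in both tables
merged = treatment.merge(gun_by_cbsa, on='CBSA', how='inner')
print(merged)
```

The key columns need the same dtype in both tables (for example, both strings or both integers), otherwise pandas may raise an error or fail to match rows that look identical.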
