
PREDICTING DRUG AND ALCOHOL ABUSE TREATMENT COMPLETION BY LOOKING AT GUN VIOLENCE


Processing The Gun Violence

Columns To Keep

Each observation in the gun violence dataset is a single instance of a crime involving a gun. So far, I have only focused on three columns:

n_killed: The number of people killed in this incident

n_injured: The number of people injured in this incident

n_guns_involved: The number of guns involved in this incident

I also used the latitude and longitude columns to group incidents by area.
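Below is a minimal sketch of loading only these fields with pandas. The column names `latitude` and `longitude` are assumptions, and so is dropping incidents without coordinates; neither describes the original processing. The inline CSV is made-up sample data standing in for the real file.

```python
import io

import pandas as pd

# Fields used in this analysis. The exact column names are an assumption;
# adjust them to match the real gun violence file.
KEEP_COLUMNS = ['n_killed', 'n_injured', 'n_guns_involved', 'latitude', 'longitude']


def load_gun_violence(source):
    """Read only the needed columns from a CSV path or file-like object."""
    df = pd.read_csv(source, usecols=KEEP_COLUMNS)
    # Assumption: incidents without coordinates cannot be matched to a CBSA,
    # so they are dropped here.
    return df.dropna(subset=['latitude', 'longitude'])


# Tiny made-up CSV with an extra column and one incident missing coordinates.
sample = io.StringIO(
    "incident_id,n_killed,n_injured,n_guns_involved,latitude,longitude\n"
    "1,0,4,1,40.3467,-79.8559\n"
    "2,1,3,,33.909,-118.333\n"
    "3,0,1,2,,\n"
)
incidents = load_gun_violence(sample)
```

`usecols` skips unneeded columns at parse time. For a large file this keeps memory use down compared with loading everything and selecting afterwards.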

Processing/Feature Engineering

To compare this dataset with the TEDS-D dataset, I needed the CBSA (Core-Based Statistical Area) code for each incident. To get it, I obtained a list of CBSA codes, each with a city and state, and then used Google's Geocoding API to look up the latitude and longitude of each one.


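import googlemaps
import numpy as np

# Client for Google's Geocoding API; replace the placeholder with a real key.
gmaps = googlemaps.Client(key='YOUR_API_KEY')

def getLatLon(x):
    # Build a "City, State" query from this row of the CBSA list.
    city = x['City']
    state = x['State']
    try:
        # geocode() returns a list of matches; use the first one.
        latlng = gmaps.geocode(f'{city}, {state}')[0]['geometry']['location']
        x['Latitude'] = latlng['lat']
        x['Longitude'] = latlng['lng']
    except (IndexError, KeyError):
        # No match (empty result list) or an unexpected response shape.
        x['Latitude'] = np.nan
        x['Longitude'] = np.nan
    return x

# cities: DataFrame of CBSA codes with 'City' and 'State' columns.
cities = cities.apply(getLatLon, axis=1)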


Using this list together with the latitude and longitude columns of the gun violence dataset, I found the closest CBSA to each incident and assigned the incident that CBSA's code.
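One way to do this matching is to compute the great-circle (haversine) distance from every incident to every geocoded CBSA point and take the minimum. This sketch assumes that "closest" means shortest haversine distance. The column name `CBSA` and the labels `PIT`/`LA` in the example are hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0


def assign_nearest_cbsa(incidents, cbsas):
    """Give each incident the code of the CBSA whose geocoded point is closest.

    Assumes `incidents` has 'latitude'/'longitude' columns and `cbsas` has
    'CBSA' (hypothetical name), 'Latitude' and 'Longitude' columns.
    """
    # Radians; shapes (n_incidents, 1) and (1, n_cbsas) broadcast to a matrix.
    lat1 = np.radians(incidents['latitude'].to_numpy())[:, None]
    lon1 = np.radians(incidents['longitude'].to_numpy())[:, None]
    lat2 = np.radians(cbsas['Latitude'].to_numpy())[None, :]
    lon2 = np.radians(cbsas['Longitude'].to_numpy())[None, :]
    # Haversine formula; clip guards against rounding pushing a above 1.
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    # Column index of the nearest CBSA for each incident (row).
    nearest = dist_km.argmin(axis=1)
    out = incidents.copy()
    out['CBSA'] = cbsas['CBSA'].to_numpy()[nearest]
    return out


# Example with two incidents and two made-up CBSA points.
incidents = pd.DataFrame({'latitude': [40.3467, 33.909],
                          'longitude': [-79.8559, -118.333]})
cbsas = pd.DataFrame({'CBSA': ['PIT', 'LA'],
                      'Latitude': [40.44, 34.05],
                      'Longitude': [-80.0, -118.24]})
matched = assign_nearest_cbsa(incidents, cbsas)
```

The full distance matrix has one entry per incident/CBSA pair, which can get large for a nationwide dataset. Processing incidents in chunks, or using scikit-learn's `BallTree` with `metric='haversine'`, avoids building the matrix all at once.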

From there, I grouped the data by CBSA code and totaled the n_killed, n_injured, and n_guns_involved columns for each area. Areas report different numbers of incidents. To put them on the same scale, I divided each total by the number of incidents reported for that area, which turns the totals into per-incident averages.
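The aggregation step can be sketched with a pandas `groupby`. This assumes the incidents already carry a `CBSA` column (a hypothetical name) from the matching step. The toy data is made up.

```python
import pandas as pd

COUNT_COLUMNS = ['n_killed', 'n_injured', 'n_guns_involved']


def per_incident_rates(incidents):
    """Total each count per CBSA, then divide by that CBSA's incident count."""
    grouped = incidents.groupby('CBSA')
    totals = grouped[COUNT_COLUMNS].sum()
    counts = grouped.size()  # number of incidents reported in each CBSA
    # div(axis=0) aligns the counts Series with the totals' CBSA index.
    return totals.div(counts, axis=0)


# Toy example: area 'A' has two incidents, area 'B' has one.
toy = pd.DataFrame({'CBSA': ['A', 'A', 'B'],
                    'n_killed': [0, 2, 1],
                    'n_injured': [4, 0, 3],
                    'n_guns_involved': [1, 3, 2]})
rates = per_incident_rates(toy)
```

Dividing the sums by `size()` follows the description above. It differs from `grouped[COUNT_COLUMNS].mean()` only when a column has missing values: `mean()` leaves those rows out of the denominator, whereas `size()` counts every reported incident.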

Finally, I merged the treatment scores with the gun violence dataset by joining on the CBSA code.
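Here is a sketch of the final join. It assumes a treatment-score table with a `CBSA` column and a `treatment_score` column (both hypothetical names), and a per-CBSA gun violence table indexed by CBSA code. The inner join is also an assumption: it keeps only areas that appear in both datasets.

```python
import pandas as pd


def merge_with_treatment(gun_rates, treatment_scores):
    """Join per-CBSA gun violence rates onto the treatment scores by CBSA code."""
    # gun_rates is indexed by CBSA (as a groupby produces); make it a column.
    rates = gun_rates.reset_index()
    # Inner join: keep only CBSAs present in both tables.
    return treatment_scores.merge(rates, on='CBSA', how='inner')


# Made-up example: CBSA 99 has a treatment score but no gun violence data.
gun_rates = pd.DataFrame({'n_killed': [1.0, 0.5]},
                         index=pd.Index([1, 2], name='CBSA'))
scores = pd.DataFrame({'CBSA': [1, 99], 'treatment_score': [0.7, 0.4]})
merged = merge_with_treatment(gun_rates, scores)
```

The CBSA codes must have the same dtype in both tables for the join to work. If one side stores them as strings and the other as integers, cast one of them first.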
