Introduction

The motivation for these interactive maps is to present supplemental information for my dissertation on crime at micro-places in Washington, D.C. If you have any questions, or would like access to any of the original data, feel free to send me an email. I will update this document with a link to the dissertation; for now, feel free to email me or see my web page, http://andrewpwheeler.wordpress.com/, for updates.

Outlier Incidents: Bars and Crime

This map portrays the outlier incidents from the model predicting crime from the number of bars on local and neighboring streets. Clicking on an icon shows the associated information from the model, and a table of the same information is below the map. I have given names to some of the locations as generic descriptors, not as exact references to the locations.

MarID	TotalLic	NeighBar	TotalCrime	PredMeanCrime	PredLogCrime	StdErrPredLogCrime	DevResVal	StdPearResid	Area Name
800252	12	7	22	272	5.606	.454	-1.776	-.487	9th and Florida
800728	2	27	5	316	5.756	.340	-2.478	-.516	Jeff. and Conn.
804518	12	7	48	309	5.734	.453	-1.422	-.448	U St.
813230	3	28	14	365	5.899	.359	-2.129	-.506	Verizon Center
810506	12	0	1	932	6.837	.668	-3.302	-.540	Union Station
807751	1	26	0	215	5.371	.344	-3.279	-.524	Adam’s Morgan
813215	0	27	29	966	6.874	.364	-2.245	-.509	Adam’s Morgan
815376	0	34	6	3911	8.272	.458	-3.287	-.527	Adam’s Morgan
905208	1	28	1	264	5.577	.362	-2.898	-.522	Adam’s Morgan
813850	20	22	63	8	2.061	.843	3.026	4.896	Adam’s Morgan
800897	2	4	249	11	2.440	.090	5.799	10.657	DC USA (Colum. Heights)
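In the table above, PredMeanCrime equals the exponential of PredLogCrime up to rounding, which is what a log link implies. The sketch below shows that conversion for three rows copied from the table. It also turns StdErrPredLogCrime into an approximate 95% interval for the mean by building the interval on the log scale and exponentiating it; that interval assumes the linear predictor is roughly normal and is an illustration, not a quantity reported from the models. The function names are hypothetical.

```python
import math

# Three rows copied from the first table:
# (MarID, PredLogCrime, StdErrPredLogCrime, PredMeanCrime)
ROWS = [
    (800252, 5.606, 0.454, 272),
    (815376, 8.272, 0.458, 3911),
    (800897, 2.440, 0.090, 11),
]


def predicted_mean(log_pred: float) -> float:
    """Invert the (assumed) log link: mu = exp(eta)."""
    return math.exp(log_pred)


def approx_interval(log_pred: float, se: float, z: float = 1.96) -> tuple:
    """Approximate 95% interval for the mean: form eta +/- z*se on the
    log scale (assumes eta is roughly normal) and exponentiate both ends."""
    return math.exp(log_pred - z * se), math.exp(log_pred + z * se)


if __name__ == "__main__":
    for marid, eta, se, reported in ROWS:
        lo, hi = approx_interval(eta, se)
        # Small gaps from the table come from rounding eta to 3 decimals.
        print(f"{marid}: exp(eta) = {predicted_mean(eta):.1f} "
              f"(table: {reported}), approx. 95% interval {lo:.0f} to {hi:.0f}")
```

Because the interval is symmetric on the log scale, it is skewed to the right on the count scale, which is why large standard errors (e.g. .843 for 813850) translate into very wide ranges of plausible crime counts.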

Updated Outlier Incidents

Fitting a more general model of crime, which adds other variables such as recreational areas and shopping areas, improved the fit for most of these same 11 locations. Here is an updated table with these locations and their predictions from the more general model, compared to the previous fit that included only bars. Columns labelled (B) are the estimates and residuals from the original bars model, and columns labelled (G) are from the more general model of crime.

The only location that appears to have a worse fit is 813850, a location from the Adam’s Morgan cluster. Previously it was severely under-predicted, and now it is severely over-predicted (it has the highest predicted crime total in the entire sample under the general model). Because the model variance grows with the predicted value, though, the Pearson and deviance residuals for this location are not very large. For all the other locations the predictions are clearly closer to the observed values, although they are still not very good predictions!
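To see why a very large prediction can still produce a modest residual, here is a sketch of Pearson and deviance residuals for a count model with variance function V(mu) = mu + alpha * mu^2. The dispersion alpha is an assumption for illustration (alpha = 0 is the Poisson case, alpha > 0 a negative binomial); it is not a value reported from the fitted models. The observed and predicted counts for 813850 are taken from the table below.

```python
import math


def variance(mu: float, alpha: float = 0.0) -> float:
    """V(mu) = mu + alpha * mu**2; alpha = 0 is Poisson, alpha > 0 negative binomial."""
    return mu + alpha * mu ** 2


def pearson_residual(y: float, mu: float, alpha: float = 0.0) -> float:
    """Observed minus predicted, scaled by the model standard deviation at mu."""
    return (y - mu) / math.sqrt(variance(mu, alpha))


def deviance_residual(y: float, mu: float, alpha: float = 0.0) -> float:
    """Signed square root of the unit deviance (Poisson or negative binomial)."""
    ylog = y * math.log(y / mu) if y > 0 else 0.0  # y*log(y/mu) -> 0 as y -> 0
    if alpha == 0.0:
        unit = 2.0 * (ylog - (y - mu))
    else:
        ratio = (1.0 + alpha * y) / (1.0 + alpha * mu)
        unit = 2.0 * (ylog - (y + 1.0 / alpha) * math.log(ratio))
    # max() guards against tiny negative values from floating-point error
    return math.copysign(math.sqrt(max(unit, 0.0)), y - mu)


if __name__ == "__main__":
    y = 63  # observed crimes at 813850
    for label, mu in (("B", 8), ("G", 1510)):
        for alpha in (0.0, 2.0):  # alpha = 2.0 is purely illustrative
            print(label, alpha,
                  round(pearson_residual(y, mu, alpha), 2),
                  round(deviance_residual(y, mu, alpha), 2))
```

With alpha > 0 and a non-negative count, (y - mu) / sqrt(mu + alpha * mu^2) can never fall below -1/sqrt(alpha), and it approaches that bound as mu grows. This is one way a severe over-prediction can still yield a Pearson residual of modest size.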

MarID	Description	Crimes	Predicted (B)	Predicted (G)	Pearson (B)	Pearson (G)	Deviance (B)	Deviance (G)
800252	9th and Florida	22	272	86	-0.5	-0.5	-1.8	-1.1
800728	Jeff. and Conn.	5	316	47	-0.5	-0.6	-2.5	-1.6
804518	U St.	48	309	70	-0.4	-0.2	-1.4	-0.4
813230	Verizon Center	14	365	99	-0.5	-0.5	-2.1	-1.5
810506	Union Station	1	932	59	-0.5	-0.7	-3.3	-2.3
807751	Adam’s Morgan	0	215	58	-0.5	-0.6	-3.3	-2.9
813215	Adam’s Morgan	29	966	118	-0.5	-0.5	-2.2	-1.1
815376	Adam’s Morgan	6	3,911	126	-0.5	-0.6	-3.3	-2.0
905208	Adam’s Morgan	1	264	24	-0.5	-0.6	-2.9	-1.9
813850	Adam’s Morgan	63	8	1,510	4.9	-0.7	3.0	-2.1
800897	DC USA (Colum. Heights)	249	11	21	10.7	6.8	5.8	4.0

Identifying outlying locations in the new model was done by examining Pearson and deviance residuals, as well as Cook’s distance and leverage values. The Pearson residuals were decidedly not normal: their distribution had a long tail with a strong right skew. Locations with a Pearson residual over 30 (i.e. large under-predictions) were selected, although the right tail was continuous, so other cut-offs could have been chosen. A few locations were also selected for having large, outlying values of Cook’s distance or leverage. The two figures below show the process of identifying outliers.

The first scatterplot shows the linear predictor on the x axis and the Pearson residual on the y axis. Points with a Pearson residual above 30 are labelled as outliers, although the long tail of Pearson residuals makes the cut-off somewhat arbitrary. The plot also shows that street unit 813850 clearly has the largest predicted value in the distribution. The second scatterplot displays Cook’s distance on the x axis and leverage on the y axis. Most points cluster near the origin, and the two measures appear to be orthogonal. A few locations that were outliers on either or both axes are labelled.
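The screening described above can be sketched as a rule that flags a street unit if any one criterion is met. Only the Pearson cut-off of 30 comes from the text. The Cook's distance uses the common one-step GLM approximation D = r^2 * h / (phi * p * (1 - h)^2), which is an assumption about how such values are usually computed, and the leverage cut-off (2p/n) and Cook's cut-off (0.5) are hypothetical stand-ins for the visual inspection actually used. All MarIDs and values in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    marid: int        # street unit identifier (hypothetical values below)
    pearson: float    # Pearson residual
    leverage: float   # hat-matrix diagonal h_i


def cooks_distance(pearson: float, leverage: float, n_params: int,
                   phi: float = 1.0) -> float:
    """One-step GLM approximation: r^2 * h / (phi * p * (1 - h)^2)."""
    return pearson ** 2 * leverage / (phi * n_params * (1.0 - leverage) ** 2)


def flag_outliers(units, n_params, pearson_cut=30.0, lev_cut=None, cook_cut=0.5):
    """Return MarIDs flagged as large under-predictions, high leverage,
    or high Cook's distance. Only pearson_cut=30 comes from the text."""
    if lev_cut is None:
        lev_cut = 2.0 * n_params / len(units)  # common rule-of-thumb cut-off
    flagged = []
    for u in units:
        cook = cooks_distance(u.pearson, u.leverage, n_params)
        if u.pearson > pearson_cut or u.leverage > lev_cut or cook > cook_cut:
            flagged.append(u.marid)
    return flagged


if __name__ == "__main__":
    # Made-up values: 101 is a large under-prediction, 102 has high leverage,
    # 103 combines a moderate residual with moderate leverage.
    units = [Unit(101, 45.0, 0.01), Unit(102, 0.3, 0.60), Unit(103, 12.0, 0.20)]
    units += [Unit(104 + i, r, 0.05)
              for i, r in enumerate([-0.5, 1.0, -0.4, 2.0, -0.6, 0.8, -0.2])]
    print(flag_outliers(units, n_params=2))
```

Flagging on any single criterion mirrors the text, where some locations were picked for large residuals and others only for leverage or Cook's distance.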

These same locations are presented in the map below.

Below is a brief discussion of these locations and potential explanations for why the model failed to predict them accurately.

813457 - McMillan Reservoir - This street unit was identified as a high-leverage location, likely because it has 4 of the Toxic Release sites. Its observed crime count is nonetheless consistent with the model's prediction. One should be concerned, though, that this site is what drives the observed negative effect of toxic release sites.
813850 - Adam’s Morgan - This place was grossly under-predicted previously and is now grossly over-predicted (it is the highest predicted location in the city). There is a lot going on in the area, and it is an outlier in the number of bars both on the street unit and on neighboring street units. Its standardized residuals are not that large because the variance in the Poisson model increases with the expectation, so large predictions carry a large variance.
The other locations generally have nearby place attractors that can explain their large amount of crime. Because of how the street units are discretized, however, these locations are likely not associated with some of the place generators. This suggests that more spatial lags of generators (e.g. hospitals, schools) should be included in the model. For example, 905861 (Martin Luther King Jr. Ave SE) is near, but not adjacent to, a large apartment complex. It is possible that crime occurring on the grounds of this complex is assigned to that street unit.
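A spatial lag of a place generator is the generator count summed over a street unit's neighbors, the same idea behind the NeighBar column for bars on neighboring streets. The sketch below assumes a precomputed, symmetric neighbor list (for example, units whose areas share a border); the helper name, unit IDs, and counts are hypothetical. Setting `order=2` extends the lag one step further, which is the kind of variable that could pick up a generator that is nearby but not adjacent, as in the 905861 example.

```python
def spatial_lag(counts, neighbors, order=1):
    """Sum `counts` over every unit within `order` neighbor steps of each unit,
    excluding the unit itself (a cumulative lag, not just the outer ring)."""
    lagged = {}
    for unit in counts:
        reached = {unit}
        frontier = {unit}
        for _ in range(order):
            # Step one ring outward, skipping units already counted.
            frontier = {nb for f in frontier for nb in neighbors.get(f, ())} - reached
            reached |= frontier
        lagged[unit] = sum(counts.get(u, 0) for u in reached - {unit})
    return lagged


if __name__ == "__main__":
    # Hypothetical chain of street units 1-2-3-4 with one generator on unit 4.
    neighbors = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
    hospitals = {1: 0, 2: 0, 3: 0, 4: 1}
    print(spatial_lag(hospitals, neighbors))           # {1: 0, 2: 0, 3: 1, 4: 0}
    print(spatial_lag(hospitals, neighbors, order=2))  # {1: 0, 2: 1, 3: 1, 4: 0}
```

Whether a cumulative lag or only the outer ring is more appropriate is a modelling choice; the cumulative version double-counts nothing because each unit is added to `reached` once.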

Interactive Map of Final Data Set

Using Google Fusion Tables, I have uploaded all of the data used to estimate the final models, as well as the geographies of the Voronoi tessellations. The Fusion table, DC Crime Data at Micro Places, can be found at this link, and embedded below is an interactive map that includes the point locations for the 21,506 street units as well as a descriptive pop-up with some of the model information. If you go to the Fusion table, the entire set of data used in the models is available for download.

To view the Thiessen polygons associated with the data, I have created a separate Fusion table, Merge of Thiessen Areas for D.C. Street units and DC Crime Data at Micro Places. This table has 21,514 rows of data rather than 21,506, because clipping the Thiessen polygons to the city boundary turned two of the street unit areas into multi-part polygons (i.e. each has several disconnected areas but is still associated with the same street unit). The identifiers (marid or name) of these two street units are 805196 and 812383.
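Because the merged table stores one row per polygon part, a plain row count overstates the number of street units by the number of extra parts (21,514 - 21,506 = 8). A minimal sketch for counting street units and finding the multi-part ones, assuming each row is a dict with a `marid` key (a hypothetical layout, not the Fusion table's actual export format); the example rows are made up:

```python
from collections import Counter


def count_street_units(rows):
    """Distinct street units, however many polygon parts each one has."""
    return len({row["marid"] for row in rows})


def multipart_units(rows):
    """MarIDs that appear on more than one row (multi-part polygons)."""
    parts = Counter(row["marid"] for row in rows)
    return sorted(marid for marid, n in parts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical rows: unit 1 split into 2 parts, unit 3 into 3 parts.
    rows = [{"marid": m} for m in (1, 1, 2, 3, 3, 3)]
    print(len(rows), count_street_units(rows), multipart_units(rows))  # 6 3 [1, 3]
```

Any per-street-unit summary computed from the merged table should likewise group on marid rather than treat each row as a separate unit.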

Here is a brief review of the meta-data for each of the fields. Again, if you have questions or want access to the original data these derived datasets are based on, feel free to send me an email.

Supplemental Online Maps for Dissertation

By Andrew Wheeler

As of 7/18/2014 - http://andrewpwheeler.wordpress.com/

Introduction

Outlier Incidents: Bars and Crime

Updated Outlier Incidents

Interactive Map of Final Data Set