The motivation for these interactive maps is to present supplemental information for my dissertation on crime at micro-places in Washington, D.C. If you have any questions, or would like access to any of the original data, feel free to send me an email. I will update this document to link to the dissertation; for now, feel free to either email me or see my web page, http://andrewpwheeler.wordpress.com/, for updates.
This map portrays outlying locations from the model predicting crime from the bars on local and neighboring streets. Clicking on an icon shows the associated information from the model, and a table of the same information is below the map. I have given names to some of the locations as generic descriptors, not as precise references to the location.
MarID | TotalLic (local bars) | NeighBar (neighboring bars) | TotalCrime | PredMeanCrime | PredLogCrime | StdErrPredLogCrime | DevResVal (deviance residual) | StdPearResid (std. Pearson residual) | Area Name |
---|---|---|---|---|---|---|---|---|---|
800252 | 12 | 7 | 22 | 272 | 5.606 | .454 | -1.776 | -.487 | 9th and Florida |
800728 | 2 | 27 | 5 | 316 | 5.756 | .340 | -2.478 | -.516 | Jeff. and Conn. |
804518 | 12 | 7 | 48 | 309 | 5.734 | .453 | -1.422 | -.448 | U St. |
813230 | 3 | 28 | 14 | 365 | 5.899 | .359 | -2.129 | -.506 | Verizon Center |
810506 | 12 | 0 | 1 | 932 | 6.837 | .668 | -3.302 | -.540 | Union Station |
807751 | 1 | 26 | 0 | 215 | 5.371 | .344 | -3.279 | -.524 | Adam’s Morgan |
813215 | 0 | 27 | 29 | 966 | 6.874 | .364 | -2.245 | -.509 | Adam’s Morgan |
815376 | 0 | 34 | 6 | 3911 | 8.272 | .458 | -3.287 | -.527 | Adam’s Morgan |
905208 | 1 | 28 | 1 | 264 | 5.577 | .362 | -2.898 | -.522 | Adam’s Morgan |
813850 | 20 | 22 | 63 | 8 | 2.061 | .843 | 3.026 | 4.896 | Adam’s Morgan |
800897 | 2 | 4 | 249 | 11 | 2.440 | .090 | 5.799 | 10.657 | DC USA (Colum. Heights) |
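In the table, PredMeanCrime matches exp(PredLogCrime) up to rounding (for example, exp(5.606) ≈ 272), which is what a count model with a log link produces. The short Python sketch below, using four rows copied from the table, checks that relationship and shows one common way to turn StdErrPredLogCrime into an approximate 95% interval for the mean. The log link reading and the normal-approximation interval are assumptions made for this illustration, not a description of how the dissertation reports uncertainty.

```python
import math

# (MarID, PredMeanCrime, PredLogCrime, StdErrPredLogCrime) copied from the table above
rows = [
    (800252, 272, 5.606, 0.454),
    (810506, 932, 6.837, 0.668),
    (815376, 3911, 8.272, 0.458),
    (800897, 11, 2.440, 0.090),
]

def mean_from_log(log_pred):
    """Invert an (assumed) log link: predicted mean = exp(linear predictor)."""
    return math.exp(log_pred)

def approx_interval(log_pred, se, z=1.96):
    """Approximate 95% interval for the mean: exponentiate log_pred +/- z * se."""
    return math.exp(log_pred - z * se), math.exp(log_pred + z * se)

for marid, pred_mean, log_pred, se in rows:
    lo, hi = approx_interval(log_pred, se)
    print(f"{marid}: exp(log) = {mean_from_log(log_pred):.0f} "
          f"(table {pred_mean}), approx. 95% interval {lo:.0f} to {hi:.0f}")
```

Because the standard errors are on the log scale, the intervals are multiplicative: a standard error of 0.454 multiplies and divides the predicted mean by roughly exp(1.96 × 0.454) ≈ 2.4.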
Fitting a more general model of crime that includes other variables, such as recreational areas or shopping areas, generally improved the fit for these same 11 locations. Here is an updated table with these locations and their predictions from the more general model, compared with the previous fit that included only bars. Columns labelled (B) are the estimates and residuals from the original bars model, and columns labelled (G) are from the more general model of crime.
The only location that appears to have a worse fit is 813850, one of the locations in the Adam’s Morgan cluster. Previously it was severely under-predicted, and now it is severely over-predicted (it is actually the location with the highest predicted crime total in the entire sample for the general model). Because the estimated variance for this location is inflated, though, its Pearson and deviance residuals are not very large. For all other locations the predictions are clearly closer to the observed counts, although they are still not very good predictions!
MarID | Description | Crimes | Predicted (B) | Predicted (G) | Pearson (B) | Pearson (G) | Deviance (B) | Deviance (G) |
---|---|---|---|---|---|---|---|---|
800252 | 9th and Florida | 22 | 272 | 86 | -0.5 | -0.5 | -1.8 | -1.1 |
800728 | Jeff. and Conn. | 5 | 316 | 47 | -0.5 | -0.6 | -2.5 | -1.6 |
804518 | U St. | 48 | 309 | 70 | -0.4 | -0.2 | -1.4 | -0.4 |
813230 | Verizon Center | 14 | 365 | 99 | -0.5 | -0.5 | -2.1 | -1.5 |
810506 | Union Station | 1 | 932 | 59 | -0.5 | -0.7 | -3.3 | -2.3 |
807751 | Adam’s Morgan | 0 | 215 | 58 | -0.5 | -0.6 | -3.3 | -2.9 |
813215 | Adam’s Morgan | 29 | 966 | 118 | -0.5 | -0.5 | -2.2 | -1.1 |
815376 | Adam’s Morgan | 6 | 3,911 | 126 | -0.5 | -0.6 | -3.3 | -2.0 |
905208 | Adam’s Morgan | 1 | 264 | 24 | -0.5 | -0.6 | -2.9 | -1.9 |
813850 | Adam’s Morgan | 63 | 8 | 1,510 | 4.9 | -0.7 | 3.0 | -2.1 |
800897 | DC USA (Colum. Heights) | 249 | 11 | 21 | 10.7 | 6.8 | 5.8 | 4.0 |
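The claim that 813850 is the only location with a worse fit can be checked against the table above by comparing raw prediction errors, |observed - predicted|, under each model. This is a small sketch using only the table's numbers; absolute error is one yardstick among several and, unlike the Pearson and deviance residuals, it ignores the variance attached to each prediction.

```python
# (MarID, observed crimes, predicted (B), predicted (G)) copied from the table above
rows = [
    (800252, 22, 272, 86),
    (800728, 5, 316, 47),
    (804518, 48, 309, 70),
    (813230, 14, 365, 99),
    (810506, 1, 932, 59),
    (807751, 0, 215, 58),
    (813215, 29, 966, 118),
    (815376, 6, 3911, 126),
    (905208, 1, 264, 24),
    (813850, 63, 8, 1510),
    (800897, 249, 11, 21),
]

def worse_under_general(rows):
    """Return MarIDs whose absolute prediction error grows from model (B) to model (G)."""
    return [marid for marid, y, pred_b, pred_g in rows
            if abs(y - pred_g) > abs(y - pred_b)]

print(worse_under_general(rows))  # -> [813850]
```

On this measure DC USA (800897) improves only slightly, from an error of 238 under (B) to 228 under (G).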
Outlying locations in the new model were identified by examining Pearson and deviance residuals, as well as Cook’s distance and leverage values. The Pearson residuals were decidedly not normal and had a long tail with a strong right skew. Locations with a Pearson residual over 30 (i.e., large under-predictions) were selected, although the right tail of the distribution is continuous, so other cut-offs could reasonably be chosen. A few locations were also selected because they had large, outlying values of Cook’s distance or leverage. The two figures below show the process of identifying outliers.
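The diagnostics described above can be sketched in Python with NumPy. This sketch assumes a negative binomial (NB2) model with a log link, variance μ + αμ², and a dispersion α treated as known; the text does not state the exact model, so the variance function, the function names (`nb2_diagnostics`, `flag_outliers`), and the optional leverage and Cook’s distance cut-offs are all hypothetical. Only the Pearson residual cut-off of 30 comes from the text; the leverage and Cook’s distance outliers there were picked from a scatterplot rather than by a stated threshold.

```python
import numpy as np

def nb2_diagnostics(X, y, mu, alpha):
    """Pearson/deviance residuals, leverage, and Cook's distance for a log-link
    NB2 fit. ASSUMPTION: variance is mu + alpha * mu**2 with alpha known."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    n, p = X.shape
    var = mu + alpha * mu**2
    pearson = (y - mu) / np.sqrt(var)

    # NB2 unit deviance; the y * log(y / mu) term is taken as 0 when y == 0
    ylogy = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    dev = 2 * (ylogy - (y + 1 / alpha) * np.log((1 + alpha * y) / (1 + alpha * mu)))
    deviance = np.sign(y - mu) * np.sqrt(np.maximum(dev, 0.0))

    # IRLS working weights for a log link: (dmu/deta)^2 / var = mu^2 / var
    w = mu**2 / var
    Xw = X * np.sqrt(w)[:, None]
    # Leverage = diagonal of W^1/2 X (X'WX)^-1 X' W^1/2
    leverage = np.einsum("ij,ij->i", Xw @ np.linalg.inv(Xw.T @ Xw), Xw)
    # Cook's distance with the dispersion (scale) fixed at 1
    cooks = pearson**2 * leverage / (p * (1 - leverage) ** 2)
    return pearson, deviance, leverage, cooks

def flag_outliers(pearson, leverage, cooks, pearson_cut=30.0,
                  leverage_cut=None, cooks_cut=None):
    """Indices of large under-predictions (Pearson > pearson_cut, the rule in
    the text) plus optional, hypothetical leverage / Cook's distance cut-offs."""
    flags = pearson > pearson_cut
    if leverage_cut is not None:
        flags |= leverage > leverage_cut
    if cooks_cut is not None:
        flags |= cooks > cooks_cut
    return np.flatnonzero(flags)
```

In use, `mu` would be the fitted means from the general model and `X` its design matrix (including the intercept column). The leverages always sum to the number of columns of `X`, which is a handy sanity check.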
The first scatterplot shows the linear predictor on the x axis and the Pearson residual on the y axis. Points marked as outliers, with a Pearson residual above 30, are labelled, although the long tail of Pearson residuals makes the cut-off somewhat arbitrary. The plot also clearly shows that street unit 813850 has the largest predicted value in the distribution. The second scatterplot displays Cook’s distance on the x axis and leverage on the y axis. Most points cluster near the origin, and the two measures appear to be largely unrelated (orthogonal). The few locations that are outliers on either or both axes are labelled.
These same locations are presented in the map below.
Below is a brief discussion of these locations and potential explanations for why the model failed to predict them accurately.
Using Google Fusion Tables, I have uploaded all of the data used to estimate the final models, as well as the geographies of the Voronoi tessellations. The Fusion table, DC Crime Data at Micro Places, can be found at this link, and embedded below is an interactive map that includes the point locations for the 21,506 street units as well as a descriptive pop-up with some of the model information. The entire set of data used in the models is available for download from the Fusion table.
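A Thiessen (Voronoi) polygon contains every location that is closer to its generating point than to any other generating point, so finding which street unit's polygon a location falls in reduces to a nearest-neighbor search. The sketch below illustrates only that definition: the coordinates and unit labels are hypothetical, distances are assumed to be in a projected (planar) coordinate system, and it ignores clipping the polygons to the city boundary.

```python
import math

def thiessen_owner(point, generators):
    """Return the ID of the generator point closest to `point`.
    By definition, `point` lies inside that generator's Thiessen (Voronoi) polygon;
    ties on a polygon edge go to whichever generator is listed first."""
    return min(generators, key=lambda gid: math.dist(point, generators[gid]))

# Hypothetical projected coordinates (e.g., metres) for three street units
units = {"A": (0.0, 0.0), "B": (100.0, 0.0), "C": (0.0, 100.0)}
print(thiessen_owner((30.0, 10.0), units))  # -> A
print(thiessen_owner((80.0, 20.0), units))  # -> B
```

For all 21,506 street units, a spatial index such as a k-d tree would be much faster than this brute-force loop over every generator.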
To view the Thiessen polygons that are associated with the data, I have created a separate Fusion table, Merge of Thiessen Areas for D.C. Street units and DC Crime Data at Micro Places. This table has 21,514 rows of data rather than 21,506, because clipping the Thiessen polygons to the boundary of the city turns two of the street unit areas into multi-part polygons (i.e., several disconnected areas that are still associated with the same street unit), so those street units appear in more than one row. The two street units (marid or name field) are 805196 and 812383.
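The 8 extra rows (21,514 - 21,506) all come from the two split street units. A minimal sketch, assuming the merged table's marid column has been read into a Python list, finds such units by counting rows per marid; the toy list and the number of parts per unit below are made up, and only the two marid values come from the text.

```python
from collections import Counter

def multipart_units(marids):
    """Return street unit IDs that occupy more than one row, i.e. units whose
    clipped Thiessen polygon was split into several disconnected parts."""
    counts = Counter(marids)
    return sorted(m for m, c in counts.items() if c > 1)

# Hypothetical toy example: 805196 split into 3 parts, 812383 into 2
rows = [800252, 805196, 805196, 805196, 812383, 812383, 813850]
print(multipart_units(rows))       # -> [805196, 812383]
print(len(rows) - len(set(rows)))  # extra rows created by splitting -> 3
```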
Here is a brief review of the metadata for each of the fields. Again, if you have questions or want access to the original data these derived datasets are based on, feel free to send me an email.