Retrieves data from the Open Crime Database for the specified years and cities. Latitude and longitude are specified using the WGS 84 (EPSG:4326) co-ordinate reference system.
Usage
get_crime_data(
years = NULL,
cities = NULL,
type = "sample",
cache = TRUE,
quiet = FALSE,
output = "tbl"
)
Arguments
- years
A single integer or vector of integers specifying the years for which data should be retrieved. If NULL (the default), data for the most recent year will be returned.
- cities
A character vector of city names for which data should be retrieved. Case insensitive. If NULL (the default), data for all available cities will be returned.
- type
The type of data to retrieve: either "sample" (the default), "core" or "extended". See Details for what each type contains.
- cache
Should the result be cached and then re-used if the function is called again with the same arguments?
- quiet
Should messages and warnings relating to data availability and processing be suppressed?
- output
Should the data be returned as a tibble by specifying "tbl" (the default) or as a simple features (SF) object using WGS 84 by specifying "sf"?
Details
By default, this function returns a one-percent sample of the 'core' data. This default minimizes the risk of accidentally requesting large files over a network.
Setting type = "core" retrieves the core fields for each offense (e.g. its type, co-ordinates and date/time). Setting type = "extended" retrieves all available fields provided by the police department in each city. The extended data fields have not been harmonized across cities, so they will require further cleaning before most types of analysis.
Requesting all data (more than 17 million rows) may lead to problems with memory capacity. Consider downloading smaller quantities of data (e.g. using type = "sample") for exploratory analysis.
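The exploratory workflow suggested above might look like the following sketch. It assumes the function is loaded from the crimedata package. The year and city are arbitrary choices taken from the Examples section, and the object names are illustrative only.

```r
library(crimedata)

# Step 1: explore the default 1% sample, which keeps the download small
crime_sample <- get_crime_data(years = 2017, quiet = TRUE)

# Step 2: once the analysis code works, request the core data for a
# narrower query (one year, one city) rather than everything at once
crimes_core <- get_crime_data(
  years = 2017,
  cities = "Tucson",
  type = "core",
  quiet = TRUE
)

# Compare the size of the two results before scaling up any further
nrow(crime_sample)
format(object.size(crimes_core), units = "MB")
```

Checking the in-memory size of a narrow request first gives a rough sense of how much memory a wider request is likely to need.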
Setting output = "sf" returns the data in simple features format by calling
sf::st_as_sf(..., crs = 4326, remove = FALSE)
Because remove = FALSE, the longitude and latitude columns are kept alongside the geometry column.
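A minimal sketch of requesting simple features output is shown below. It assumes the crimedata and sf packages are installed. The year and city are arbitrary choices taken from the Examples section, and crimes_sf is an illustrative name.

```r
library(crimedata)

# Request the default 1% sample as a simple features object
crimes_sf <- get_crime_data(
  years = 2017,
  cities = "Tucson",
  output = "sf",
  quiet = TRUE
)

# Inspect the class of the result and the EPSG code of its reference system
class(crimes_sf)
sf::st_crs(crimes_sf)$epsg

# Check whether the original co-ordinate columns are still present
c("longitude", "latitude") %in% names(crimes_sf)
```

Keeping the co-ordinate columns can be convenient when the same object is later passed to code that expects plain numeric longitude and latitude values.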
Examples
# \donttest{
# Retrieve a 1% sample of data for specific years and cities
get_crime_data(
years = 2016:2017,
cities = c("Tucson", "Virginia Beach"),
quiet = TRUE
)
#> # A tibble: 4,910 × 10
#>         uid city_name offense_code offense_type    offense_group offense_against
#>       <int> <fct>     <fct>        <fct>           <fct>         <fct>
#>  1 26330033 Tucson    26U          other fraud     fraud offens… property
#>  2 26330392 Tucson    23F          theft from mot… larceny/thef… property
#>  3 26330434 Tucson    23H          all other larc… larceny/thef… property
#>  4 26330458 Tucson    290          destruction/da… destruction/… property
#>  5 26330469 Tucson    23F          theft from mot… larceny/thef… property
#>  6 26330509 Tucson    23G          theft of motor… larceny/thef… property
#>  7 26330744 Tucson    26A          false pretense… fraud offens… property
#>  8 26330852 Tucson    90Z          all other offe… all other of… other
#>  9 26331052 Tucson    90Z          all other offe… all other of… other
#> 10 26331270 Tucson    22U          other burglary… burglary/bre… property
#> # … with 4,900 more rows, and 4 more variables: date_single <dttm>,
#> # longitude <dbl>, latitude <dbl>, census_block <chr>
# }
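As a further sketch, the tibble returned by the call above can be summarized with base R. This assumes the crimedata package is installed and uses only column names that appear in the printed output (city_name, offense_against and date_single). The object name crimes is illustrative.

```r
library(crimedata)

# Same request as in the example above
crimes <- get_crime_data(
  years = 2016:2017,
  cities = c("Tucson", "Virginia Beach"),
  quiet = TRUE
)

# Drop factor levels that do not occur in this subset, so the tables
# only list categories present in the retrieved rows
crimes <- droplevels(crimes)

# Cross-tabulate the number of offenses by city and broad offense category
table(crimes$city_name, crimes$offense_against)

# Count offenses per calendar year using the date_single column
table(format(crimes$date_single, "%Y"))
```

Because these counts come from a 1% sample, they are only a rough guide to the relative frequency of each category.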