
Retrieves data from the Open Crime Database for the specified years. Latitude and longitude are specified using the WGS 84 (EPSG:4326) co-ordinate reference system.


get_crime_data(
  years = NULL,
  cities = NULL,
  type = "sample",
  cache = TRUE,
  quiet = FALSE,
  output = "tbl"
)



years: A single integer or vector of integers specifying the years for which data should be retrieved. If NULL (the default), data for the most recent year will be returned.


cities: A character vector of city names for which data should be retrieved. Case insensitive. If NULL (the default), data for all available cities will be returned.


type: The type of data to retrieve: "sample" (the default), "core" or "extended".


cache: Should the result be cached and then re-used if the function is called again with the same arguments?


quiet: Should messages and warnings relating to data availability and processing be suppressed?


output: Should the data be returned as a tibble by specifying "tbl" (the default) or as a simple features (SF) object using WGS 84 by specifying "sf"?


A tibble containing data from the Open Crime Database, or a simple features (SF) object if output = "sf".


By default this function returns a one-percent sample of the 'core' data. This default reduces the risk of accidentally requesting large files over a network.

Setting type = "core" retrieves the core fields (e.g. the type, co-ordinates and date/time) for each offense. Setting type = "extended" retrieves all available fields provided by the police department in each city. The extended data fields have not been harmonized across cities, so they will require further cleaning before most types of analysis.

Requesting all data (more than 17 million rows) may lead to problems with memory capacity. Consider downloading smaller quantities of data (e.g. using type = "sample") for exploratory analysis.
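
One way to follow this advice is to explore the sample first and then request core data only for the years and cities of interest. The following is a sketch, assuming this page documents get_crime_data() from the crimedata package, that the package is installed and that a network connection is available. The object names are illustrative only.

```r
library(crimedata)

# Explore with the default 1% sample first (a small download)
crimes_sample <- get_crime_data(years = 2017, quiet = TRUE)

# Check which cities appear in the sample before requesting more data
table(crimes_sample$city_name)

# Then request the full core data for a single city and year only
crimes_tucson <- get_crime_data(
  years = 2017,
  cities = "Tucson",
  type = "core",
  quiet = TRUE
)

nrow(crimes_tucson)
```

Restricting both years and cities keeps the download and the in-memory object small compared with requesting every available row.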

Setting output = "sf" returns the data in simple features format by calling sf::st_as_sf(..., crs = 4326, remove = FALSE).
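
To illustrate what that conversion does, the sketch below applies the same sf::st_as_sf() call to a small hand-built data frame. The data frame and its values are hypothetical stand-ins for the function's output, and the choice of longitude and latitude as the coordinate columns is an assumption based on the column names shown in the example output on this page.

```r
library(sf)

# Hypothetical two-row stand-in for data returned with output = "tbl"
offenses <- data.frame(
  uid = c(1L, 2L),
  longitude = c(-110.97, -110.93),
  latitude = c(32.22, 32.25)
)

offenses_sf <- st_as_sf(
  offenses,
  coords = c("longitude", "latitude"), # assumed coordinate columns
  crs = 4326,                          # WGS 84
  remove = FALSE                       # keep longitude/latitude as columns
)

# Inspect the co-ordinate reference system and the resulting columns
st_crs(offenses_sf)
names(offenses_sf)
```

Because remove = FALSE, the original longitude and latitude columns are kept alongside the new geometry column, so the same object can be used for both spatial and non-spatial analysis.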


# \donttest{
# Retrieve a 1% sample of data for specific years and cities
get_crime_data(
  years = 2016:2017,
  cities = c("Tucson", "Virginia Beach"),
  quiet = TRUE
)
#> # A tibble: 4,910 × 10
#>         uid city_name offense_code offense_type    offense_group offense_against
#>       <int> <fct>     <fct>        <fct>           <fct>         <fct>          
#>  1 26330033 Tucson    26U          other fraud     fraud offens… property       
#>  2 26330392 Tucson    23F          theft from mot… larceny/thef… property       
#>  3 26330434 Tucson    23H          all other larc… larceny/thef… property       
#>  4 26330458 Tucson    290          destruction/da… destruction/… property       
#>  5 26330469 Tucson    23F          theft from mot… larceny/thef… property       
#>  6 26330509 Tucson    23G          theft of motor… larceny/thef… property       
#>  7 26330744 Tucson    26A          false pretense… fraud offens… property       
#>  8 26330852 Tucson    90Z          all other offe… all other of… other          
#>  9 26331052 Tucson    90Z          all other offe… all other of… other          
#> 10 26331270 Tucson    22U          other burglary… burglary/bre… property       
#> # … with 4,900 more rows, and 4 more variables: date_single <dttm>,
#> #   longitude <dbl>, latitude <dbl>, census_block <chr>
# }