RNRFA: an interface between R and the UK National River Flow Archive data APIs

Demo at http://cvitolo.github.io/r_rnrfa

Claudia Vitolo
Post Doctoral Research Fellow, Brunel University London


National River Flow Archive

The UK National River Flow Archive serves daily streamflow data, spatial rainfall averages and information on elevation, geology, land cover and FEH-related catchment descriptors.


There are currently data APIs under development that provide access to the following services:

  • metadata catalogue (JSON),

  • catalogue filters based on a geographical bounding-box,

  • catalogue filters based on metadata entries,

  • gauged daily data and catchment mean rainfall for about 400 stations (WaterML, the OGC standard used to describe hydrological time series).

The RNRFA package

The rnrfa package aims to simplify and speed up access to these data by providing wrapper functions that assemble HTTP GET requests and parse the XML/JSON responses.
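
As a rough illustration of what "assembling an HTTP GET request" involves, the sketch below builds the WaterML2 request URL for one station in base R. The base URL and the `db`, `stn` and `dt` parameters are copied from the request URLs that GDF() and CMR() print later in this demo; the helper name `build_waterml_url()` is hypothetical and is not part of rnrfa.

```r
# Hypothetical helper (not part of rnrfa): assemble the request URL for one
# station. URL pattern copied from the URLs echoed by GDF() and CMR().
build_waterml_url <- function(id, type = c("gdf", "cmr")) {
  type <- match.arg(type)
  base <- "http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2"
  query <- c(db = "nrfa_public", stn = as.character(id), dt = type)
  # Join the parameters as key=value pairs separated by "&"
  paste0(base, "?", paste(names(query), query, sep = "=", collapse = "&"))
}

build_waterml_url(3001, "gdf")
build_waterml_url("3001", "cmr")
```

The package functions hide this step, so in normal use only the station id needs to be supplied.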


If you wish to use NRFA Data, please refer to the following Terms & Conditions: http://nrfa.ceh.ac.uk/costs-terms-and-conditions


To cite this software:
Vitolo C. and Fry M., R interface for the National River Flow Archive (rnrfa, R package version 0.4.0), GitHub repository, https://github.com/cvitolo/r_rnrfa,
DOI: http://dx.doi.org/10.5281/zenodo.14722

Dependencies

The rnrfa package is dependent on a number of CRAN packages. Install them first:

install.packages(c("XML2R", "RCurl", "zoo", "rjson", "rgdal", "sp", "stringr"))


This demo also uses some external packages. To install and load them, run the following commands:

packs <- c("devtools", "parallel", "ggplot2", "DT", "leaflet", "dygraphs")
install.packages(packs)
lapply(packs, require, character.only = TRUE)

Installation

The stable version of the rnrfa package is available from CRAN:

# Install stable version
install.packages("rnrfa")



The development version is available from GitHub via devtools. This is the preferred option for running this demo. Install the development version of rnrfa using the following command:

# Install dev version (install_github() is provided by devtools)
devtools::install_github("cvitolo/r_rnrfa", subdir = "rnrfa")

Load the package

Load the package using the following command:

# Load the rnrfa package
library(rnrfa)
## RNRFA (Versions 0.4.1, built: )
## 
## +----------------------------------------------------------------+
## |  If you wish to use NRFA Data, please refer to the following   |
## |  Terms & Conditions:                                           |
## |  http://nrfa.ceh.ac.uk/costs-terms-and-conditions              |
## +----------------------------------------------------------------+

List of monitoring stations

The function catalogue(), used with no inputs, requests the full list of gauging stations.

# Retrieve information for all the stations in the catalogue:
allStations <- catalogue()

dim(allStations)
## [1] 1539   20
allStations[1:2,1:4]
##     id                   name  location     river
## 1 1001        Wick at Tarroul   Tarroul      Wick
## 2 2001 Helmsdale at Kilphedir Kilphedir Helmsdale

The output is a data object of type data.frame, containing 1539 records (total number of monitored gauging stations) and 20 columns (total number of metadata entries available).

Station information (1)

The function catalogue() can be used to filter stations based on various criteria, listed below:

(1) id = Station identification number
(2) name = Name of the station
(3) location = Area in which the station is located
(4) river = River catchment
(5) stationDescription = General station description, e.g. weirs, ratings, etc.
(6) catchmentDescription = Information on topography, geology, land cover, etc.
(7) hydrometricArea = UK hydrometric area identification number
(8) operator = UK measuring authorities
(9) haName = Hydrometric Area name
(10) gridReference = OS Grid Reference number

Station information (2)

(11) stationType = Type of station (e.g. flume, weir, etc.)
(12) catchmentArea = Catchment area (km²)
(13) gdfStart = Year in which recordings started
(14) gdfEnd = Year in which recordings ended
(15) farText = Information on the regime (e.g. natural, regulated, etc.)
(16) categories = Various tags (e.g. FEH_POOLING, FEH_QMED)
(17) altitude = Altitude measured in metres above Ordnance Datum or, in Northern Ireland, Malin Head.
(18) sensitivity = Sensitivity index calculated as the percentage change in flow associated with a 10 mm increase in stage at the \(Q_{95}\) flow.
(19) lat = a numeric vector of latitude coordinates.
(20) lon = a numeric vector of longitude coordinates.
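
The sensitivity index in entry (18) can be written out as a formula. This is only a reading of the verbal definition above: the rating curve \(Q(h)\) (flow as a function of stage) and the stage \(h_{95}\) at which the flow equals \(Q_{95}\) are notation introduced here for illustration.

\[
S = 100 \times \frac{Q(h_{95} + 10\,\mathrm{mm}) - Q_{95}}{Q_{95}}
\]

For example, with purely illustrative numbers, if \(Q_{95} = 0.50\) m³/s and the flow at a stage 10 mm higher is 0.55 m³/s, then \(S = 100 \times (0.55 - 0.50)/0.50 = 10\%\).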

Selected information

(1) id = Station identification number
(2) name = Name of the station
(9) haName = Hydrometric Area name
(10) gridReference = OS Grid Reference number
(12) catchmentArea = Catchment area (km²)

selectedInfo <- c(1,2,9,10,12)
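
Selecting by position works as long as the column order never changes. As a small sketch, the block below checks which names the positions in selectedInfo correspond to, assuming the catalogue columns appear in the order in which they are numbered above. The vector `catalogueColumns` is typed in by hand from that list rather than read from the package.

```r
# Column names in the order listed above (assumed to match the data.frame)
catalogueColumns <- c("id", "name", "location", "river", "stationDescription",
                      "catchmentDescription", "hydrometricArea", "operator",
                      "haName", "gridReference", "stationType", "catchmentArea",
                      "gdfStart", "gdfEnd", "farText", "categories", "altitude",
                      "sensitivity", "lat", "lon")

selectedInfo <- c(1, 2, 9, 10, 12)

# Names behind the selected positions
catalogueColumns[selectedInfo]

# The reverse lookup: positions of a set of names
selectedNames <- c("id", "name", "haName", "gridReference", "catchmentArea")
match(selectedNames, catalogueColumns)
```

Indexing by name, e.g. `allStations[, selectedNames]`, would be a more robust alternative if the column order ever changed.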

Filter catalogue using station identification numbers

# Filter stations based on identification number
someStations <- catalogue(metadataColumn="id",
                          entryValue=c(3001,3002,3003))

dim(someStations)
## [1]  3 20
someStations[1:2,selectedInfo]
##     id                 name     haName gridReference catchmentArea
## 4 3001        Shin at Lairg Shin Group      NC581062         494.6
## 5 3002 Carron at Sgodachail Shin Group      NH491921         241.1

Filter catalogue using a geographic bounding box

# Define a bounding box:
bbox <- list(lonMin=-3.82, lonMax=-3.63, latMin=52.43, latMax=52.52)

# Filter stations based on bounding box
someStations <- catalogue(bbox)

dim(someStations)
## [1]  9 20
someStations[1:2,selectedInfo]
##      id                         name haName gridReference catchmentArea
## 1 54022    Severn at Plynlimon flume Severn      SN853872           8.7
## 2 54090 Tanllwyth at Tanllwyth Flume Severn      SN843876           0.9
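
Conceptually, a bounding-box filter keeps the stations whose coordinates fall inside the box. The sketch below reproduces that idea on a tiny hand-made data.frame; it is not how catalogue() is implemented, which may well delegate the filtering to the web service. The first row uses the coordinates of grid reference SN853872 as converted later in this demo, the second row is a made-up station placed outside the box, and `in_bbox()` is a hypothetical helper.

```r
bbox <- list(lonMin = -3.82, lonMax = -3.63, latMin = 52.43, latMax = 52.52)

stations <- data.frame(
  id   = c(54022, 99999),
  name = c("Severn at Plynlimon flume", "Illustrative station outside the box"),
  lon  = c(-3.689987, -1.25),   # second value is invented for illustration
  lat  = c(52.47065, 51.75),    # second value is invented for illustration
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose lon/lat fall inside the bounding box
in_bbox <- function(df, bbox) {
  keep <- df$lon >= bbox$lonMin & df$lon <= bbox$lonMax &
          df$lat >= bbox$latMin & df$lat <= bbox$latMax
  df[keep, , drop = FALSE]
}

in_bbox(stations, bbox)$id
```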

Filter catalogue using a selected hydrometric area

# Filter stations belonging to a certain hydrometric area
someStations <- catalogue(metadataColumn="haName", entryValue="Wye (Hereford)")

dim(someStations)
## [1] 34 20
someStations[1:2,selectedInfo]
##         id           name         haName gridReference catchmentArea
## 1101 55001  Wye at Cadora Wye (Hereford)      SO535090          4040
## 1102 55002 Wye at Belmont Wye (Hereford)      SO485387        1895.9

Filter catalogue using catchment area

# Filter stations based on catchment area
someStations <- catalogue(metadataColumn="catchmentArea", entryValue=">1")

dim(someStations)
## [1] 1531   20
someStations[1:2,selectedInfo]
##     id                   name          haName gridReference catchmentArea
## 1 1001        Wick at Tarroul      Wick Group      ND262549         161.9
## 2 2001 Helmsdale at Kilphedir Helmsdale Group      NC998181         551.4
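
The entryValue ">1" combines a comparison operator with a threshold. One plausible way to interpret such a string is sketched below; this is an assumption about the behaviour, not the package's actual code, and whether rnrfa accepts operators other than ">" is not stated in this demo. The helper `matches_entry()` is hypothetical, and the catchment areas are values that appear in the outputs above.

```r
# Hypothetical helper: TRUE where 'values' satisfy 'entryValue'.
# A leading operator triggers a numeric comparison, otherwise exact matching.
matches_entry <- function(values, entryValue) {
  entryValue <- as.character(entryValue)
  first <- entryValue[1]
  op <- regmatches(first, regexpr("^(>=|<=|>|<|=)", first))
  if (length(op) == 0) {
    # No operator: plain membership test, as for id or haName
    return(as.character(values) %in% entryValue)
  }
  threshold <- as.numeric(substring(first, nchar(op) + 1))
  x <- as.numeric(values)
  switch(op,
         ">"  = x >  threshold,
         ">=" = x >= threshold,
         "<"  = x <  threshold,
         "<=" = x <= threshold,
         "="  = x == threshold)
}

catchmentArea <- c("161.9", "551.4", "0.9", "8.7")
matches_entry(catchmentArea, ">1")
matches_entry(c(3001, 3002, 54022), c(3001, 3003))
```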

Filter catalogue using minimum number of recording years

# Filter based on minimum number of recording years
someStations <- catalogue(minRec=30)

dim(someStations)
## [1] 1054   20
someStations[1:2,selectedInfo]
##     id                   name          haName gridReference catchmentArea
## 2 2001 Helmsdale at Kilphedir Helmsdale Group      NC998181         551.4
## 5 3002   Carron at Sgodachail      Shin Group      NH491921         241.1
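
A minimum-record filter can be thought of as a condition on the gdfStart and gdfEnd columns. The sketch below assumes the record length is simply gdfEnd minus gdfStart; whether rnrfa counts years this way (or inclusively) is not documented here. The station ids and years are invented for illustration and are not NRFA data.

```r
# Invented example rows (not NRFA data)
stations <- data.frame(
  id       = c("A", "B", "C"),
  gdfStart = c(1975, 1990, 1953),
  gdfEnd   = c(2014, 2014, 1990),
  stringsAsFactors = FALSE
)

minRec <- 30

# Assumed definition: number of recording years = end year - start year
recordingYears <- stations$gdfEnd - stations$gdfStart
longRecords <- stations[recordingYears >= minRec, ]
longRecords$id
```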

Complex queries can be run combining multiple selection criteria

# Example 1
someStations <- catalogue(bbox,
                          metadataColumn="haName", entryValue="Wye (Hereford)")

# Example 2
someStations <- catalogue(bbox,
                          metadataColumn="catchmentArea", entryValue=">1",
                          minRec=30)

# Example 3
someStations <- catalogue(bbox,
                          metadataColumn="id",
                          entryValue=c(54022,54090,54091,54092,54097),
                          minRec=35)

Conversions

NRFA stations are located using the OS grid reference (column 10, "gridReference"). The rnrfa package converts these references to more standard coordinate systems. By default, the function OSGparse() converts the string to easting and northing in the British/Irish National Grid coordinate system (EPSG code: 27700/29902).

# Convert OS Grid reference to BNG
OSGparse("SN853872")
## $xlon
## [1] 285300
## 
## $ylat
## [1] 287200
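
To show where these numbers come from, here is a minimal sketch of the standard decoding of a British National Grid reference: the first letter identifies a 500 km square, the second a 100 km square within it, and the digits give the offset inside that square. It covers only two-letter British references (not Irish ones), and the function name `osg_to_bng()` is hypothetical; OSGparse() may be implemented differently.

```r
# Hypothetical re-implementation for two-letter British grid references only
osg_to_bng <- function(gridRef) {
  gridLetters <- LETTERS[-9]  # the grid lettering skips "I"
  l1 <- match(substr(gridRef, 1, 1), gridLetters) - 1
  l2 <- match(substr(gridRef, 2, 2), gridLetters) - 1

  # Index of the 100 km square east and north of the grid origin
  e100km <- ((l1 - 2) %% 5) * 5 + (l2 %% 5)
  n100km <- (19 - (l1 %/% 5) * 5) - (l2 %/% 5)

  # Split the digits in two halves and scale them to metres
  digits <- substr(gridRef, 3, nchar(gridRef))
  half <- nchar(digits) / 2
  scale <- 10^(5 - half)
  easting  <- e100km * 1e5 + as.numeric(substr(digits, 1, half)) * scale
  northing <- n100km * 1e5 + as.numeric(substr(digits, half + 1, 2 * half)) * scale

  list(xlon = easting, ylat = northing)
}

osg_to_bng("SN853872")
osg_to_bng(c("SN853872", "SN843876"))
```

For "SN853872" the letters SN select the 100 km square with origin (200000, 200000), and the digits 853 and 872 add 85300 m and 87200 m, which gives the easting 285300 and northing 287200 shown above.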

Conversions

To get coordinates in latitude and longitude (WGS84 coordinate system, EPSG code: 4326) use the parameter CoordSystem = "WGS84".

# Convert OS Grid reference to WGS84
OSGparse("SN853872", CoordSystem = "WGS84")
## $xlon
## [1] -3.689987
## 
## $ylat
## [1] 52.47065

Conversions

OSGparse() also works with multiple references:

OSGparse(someStations$gridReference)
## $xlon
## [1] 285300 284300 284600
## 
## $ylat
## [1] 287200 287600 287300

Get time series data

Station id numbers can be used to retrieve time series data. These data are automatically converted from WaterML2 format to a time series object of class zoo.

The National River Flow Archive serves two types of time series data:

  • Gauged Daily Flows, get data using the function GDF()

  • Catchment Mean Rainfall, get data using the function CMR()

Gauged Daily Flows, GDF()

This function accepts one input, the station id (as a single string or a character vector). Here is how to retrieve daily flows for the Shin at Lairg catchment (id = 3001).

# Fetch time series data and metadata from the waterml2 service
data <- GDF("3001")
## http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=3001&dt=gdf
metadata <- GDFmeta("3001")
## http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=3001&dt=gdf

Gauged Daily Flows, GDF()

plot(data, main=paste("Daily flow data for", metadata$stationName, "catchment"),
     xlab="", ylab=expression(m^3/s))


Catchment Mean Rainfall, CMR()

This function accepts one input, the station id (as a single string or a character vector). Here is how to retrieve rainfall data for the Shin at Lairg catchment (id = 3001).

# Fetch time series data from the waterml2 service
data <- CMR("3001")
## http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=3001&dt=cmr
metadata <- CMRmeta("3001")
## http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=3001&dt=cmr

Catchment Mean Rainfall, CMR()

plot(data, main=paste("Monthly rainfall for",metadata$stationName,"catchment"), 
     xlab="", ylab=expression(mm))


Interoperability

The rnrfa package can be used in combination with other R packages.

For example, to:

  • Display the station data.frame as a searchable, interactive table
  • Create interactive maps
  • Explore flow and rainfall events using interactive plots
  • Run multiple data requests efficiently

Display your data.frame as a searchable, interactive table using DT:

### Retrieve information for all the stations in the catalogue:
allStations <- catalogue()

library(DT)
datatable(allStations[,c(1:3,8,9,10,12:14,17)])

Create interactive maps using leaflet:

### Define a bounding box:
bbox <- list(lonMin=-3.82, lonMax=-3.63, latMin=52.43, latMax=52.52)
### Get a list of stations
someStations <- catalogue(bbox)
library(leaflet)
leaflet(data = someStations) %>% addTiles() %>%
  addMarkers(~lon, ~lat, popup = ~as.character(paste(id,name)))

Explore flow and rainfall events using dygraphs:

# Fetch time series data from the waterml2 service
data <- GDF("3001")

library(dygraphs)
dygraph(data) %>% dyRangeSelector()

Parallel processing (1)

A simple benchmark test: retrieve the daily flows for every station listed in the object someStations, so that the mean flow can be calculated for each. The retrieval can be performed sequentially (using 1 core) or in parallel (using multiple cores).

# Get flow data with a sequential approach, using the lapply function
system.time(s1 <- lapply(someStations$id, GDF))
## http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=54022&dt=gdf 
## http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=54090&dt=gdf 
## http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=54091&dt=gdf 
## http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=54092&dt=gdf 
## http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=54097&dt=gdf 
## http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=55008&dt=gdf 
## http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=55033&dt=gdf 
## http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=55034&dt=gdf 
## http://nrfaapps.ceh.ac.uk/nrfa/xml/waterml2?db=nrfa_public&stn=55035&dt=gdf
##    user  system elapsed 
## 457.122   1.553 490.963

Parallel processing (2)

# Get flow data with a parallel approach
# using mclapply from the parallel package
library(parallel)

# Use detectCores() to find out how many cores are available on your machine,
# then set mc.cores accordingly. mclapply() relies on forking, so on Windows
# it only runs with mc.cores = 1.
system.time(s2 <- mclapply(someStations$id, GDF, mc.cores = detectCores()))
##    user  system elapsed 
## 817.635   3.453 351.801
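
In the timings above, the elapsed time drops from about 491 s to about 352 s, a speed-up of roughly 1.4, while the user time rises because it is summed over the worker processes. The same comparison can be tried without touching the NRFA service, as in the sketch below: `slow_fetch()` is a made-up stand-in for a slow request such as GDF(), and the core count is fixed at 2 (or 1 on Windows) purely for illustration.

```r
library(parallel)

# Made-up stand-in for a slow network request such as GDF()
slow_fetch <- function(id) {
  Sys.sleep(0.2)
  id * 2
}

ids <- 1:4

# mclapply() relies on forking, which is not available on Windows
nCores <- if (.Platform$OS.type == "windows") 1L else 2L

tSeq <- system.time(s1 <- lapply(ids, slow_fetch))["elapsed"]
tPar <- system.time(s2 <- mclapply(ids, slow_fetch, mc.cores = nCores))["elapsed"]

# Both approaches return the same list, in the same order
identical(s1, s2)

# Elapsed times in seconds; the actual values depend on the machine
c(sequential = unname(tSeq), parallel = unname(tPar))
```

Because the work here is waiting rather than computing, the gain depends mostly on how many requests run at once, not on processor speed.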

Simple regression

# Calculate the mean flow for each catchment
someStations$meanGDF <- unlist( lapply(s2, mean) )

# Linear model
library(ggplot2)
ggplot(someStations, aes(x = as.numeric(catchmentArea), y = meanGDF)) + 
  geom_point() + stat_smooth(method = "lm", col = "red") +
  xlab(expression(paste("Catchment area [", km^2, "]"))) + 
  ylab(expression(paste("Mean flow [", m^3/s, "]")))

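
The red line drawn by stat_smooth(method = "lm") is an ordinary least-squares fit. To inspect its coefficients rather than just plot it, the same model can be fitted with lm(), as sketched below. The data are synthetic and exactly linear by construction, chosen only so the result is easy to check; they are not NRFA values, and the column names `area` and `meanFlow` are placeholders.

```r
# Synthetic, exactly linear data for illustration only (not NRFA values)
catchments <- data.frame(area = c(1, 5, 10, 20, 40))
catchments$meanFlow <- 0.05 * catchments$area + 0.1

# Same model that stat_smooth(method = "lm") fits: meanFlow = a + b * area
fit <- lm(meanFlow ~ area, data = catchments)
coef(fit)
```

With the real data, the equivalent call would use as.numeric(catchmentArea) and meanGDF from someStations.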

Typical application