Focal statistics for python

This module aims to provide focal statistics for python, that runs without the installation of extensive GIS packages. The only dependency is numpy. Additionally, the package provides sliding window functionality to implement your own focal statistics functions (see Tutorials). This is implemented in line with numpy.lib.stride_tricks.sliding_window_view(), but extends its functionality.

Documentation

Installation

Required dependencies

Instructions

Install with

conda install --channel conda-forge focal-stats

Focal statistics

Focal statistics are functions calculated on the neighborhood of the data, often referred to as sliding, or rolling window operations on 2D data. Functions are statistical and range from mean and standard deviation to quantiles and mode (majority).

Conceptually, the algorithm iterates over each pixel of a raster, and looks in all four directions to calculate the focal statistic. The neighborhoods can overlap, but might also be stacked next to each other, so a sparse output is generated.

An example from the ArcGIS documentation, considering the focal sum of a raster with a specified neighborhood of 3x3 pixels. The values in this window are summed and placed in the output array at the location of the most central pixel in the window:

focal sum example

The second example shows what the same example looks like when a complete raster is considered:

focal sum example

The statistical operations implemented in this package are:

None

Note

This tutorial was generated from an IPython notebook that can be downloaded here.

Usage example

import focal_stats
import rasterio as rio
import matplotlib.pyplot as plt
import numpy as np
import os
Matplotlib is building the font cache; this may take a moment.
os.chdir("../../../")

Loading raster (containing water table depth (Fan et al., 2017)).

with rio.open("data/wtd.tif") as f:
    a = f.read(1).astype(np.float64)
    a[a == -999.9] = np.nan

Inspecting the data

plt.imshow(a, cmap='Blues', vmax=100)
plt.title("Water table depth")
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7ff3c4aec550>
_images/tutorial_6_1.png

Focal statistics

Calculation of the focal mean:

plt.imshow(focal_stats.focal_mean(a, window_size=15), vmax=100, cmap="Blues")
<matplotlib.image.AxesImage at 0x7ff3c4184b50>
_images/tutorial_9_1.png

This looks quite similar to the input raster, but with smoothing applied. Let’s try a higher window_size, which should increase the smoothing

plt.imshow(focal_stats.focal_mean(a, window_size=101), vmax=100, cmap="Blues")
<matplotlib.image.AxesImage at 0x7ff3c4112c90>
_images/tutorial_11_1.png

This same functionality can be used to reduce the shape of the raster based on this window_size.

x = focal_stats.focal_mean(a, window_size=108, reduce=True)
plt.imshow(x, vmax=100, cmap="Blues")
<matplotlib.image.AxesImage at 0x7ff3c4084b90>
_images/tutorial_13_1.png

The shape of this new raster is exactly 108 times smaller than the input raster. Note that for this to work both x and y-axes need to be divisible by the window size.

None

Note

This tutorial was generated from an IPython notebook that can be downloaded here.

Creating custom focal statistic function

import focal_stats
import rasterio as rio
import matplotlib.pyplot as plt
import numpy as np
import os
os.chdir("../../../")

Loading raster (containing water table depth (Fan et al., 2017)).

with rio.open("data/wtd.tif") as f:
    a = f.read(1).astype(np.float64)
    a[a == -999.9] = np.nan

Inspecting the data

plt.imshow(a, cmap='Blues', vmax=100)
plt.title("Water table depth")
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7f3f63ec3e90>
_images/custom_focal_stats_6_1.png

Creating custom focal mean

Firstly, a windowed version of the input raster needs to be defined.

a_windowed = focal_stats.rolling_window(a, window_size=5)
print(a.shape, a_windowed.shape)
(4320, 5400) (4316, 5396, 5, 5)

This windowed version has a slightly different shape on the first two axes. This is because there is no window behaviour defined on the edges. If this is undesired the original array can be padded with the missing number of columns and rows with numpy.pad. Through this function many different edge value assumptions can be made. Here I use the example of continuing with the closest values.

a_padded = np.pad(a, pad_width=2, mode='edge')
a_windowed_padded = focal_stats.rolling_window(a_padded, window_size=5)
print(a.shape, a_windowed_padded.shape)
(4320, 5400) (4320, 5400, 5, 5)

This has the result that the input and output raster share their first two axes.

Now the only thing that needs to happen is a mean operation on the third and fourth axes:

a_mean = a_windowed.mean(axis=(2, 3))

Plotting this shows that the operation generates an image that is very close to the original raster, with some limited smoothing

plt.imshow(a_mean, cmap="Blues", vmax=100)
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7f3f62503510>
_images/custom_focal_stats_17_1.png

This can be captured in a custom focal_mean function as follows:

def focal_mean(a, window_size):
    a_windowed = focal_stats.rolling_window(a, window_size=window_size)
    return a_windowed.mean(axis=(2, 3))

Resulting in the same image:

plt.imshow(focal_mean(a, window_size=5), cmap="Blues", vmax=100)
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7f3f623f4550>
_images/custom_focal_stats_21_1.png

Note that if a single NaN-value was present in the window, it results in a NaN-value. I dealt with this by inserting 0 in the pixels with NaN-values and using the sum of this array divided by the number of valid values per window (e.g. rolling_sum(~np.isnan(a), window_size=5)).

What’s new

API reference

Focal statistics

Focal statistics functions operating on array-like input data. They share functionality of window_size (determining the size of the sliding window), mask (superseding window_size and allowing for non-rectangular windows), fraction_accepted (nan-behaviour based on the fraction of available data in the input window) and reduce (switch between returning same shape as input data and patching the sliding window without overlapping, leading to a much smaller output array). focal_correlation() calculates the correlation between two arrays in contrast to the other functions that operate on a single array.

focal_min

Focal minimum

focal_max

Focal maximum

focal_mean

Focal mean

focal_std

Focal standard deviation

focal_sum

Focal summation

focal_majority

Focal majority

focal_correlation

Focal correlation

Rolling functions

This module additionally implements standalone rolling functions, accepting ND arrays and accepting window_size, mask and reduce similarly to the focal statistics functionality. It does however not guard against NaN occurrences specifically, staying with the raw numpy behaviour. The functions are intended to be used to define custom focal statistics functionality, potentially in higher than 2D dimensionality.

rolling_window(a, *[, window_size, mask, ...])

Takes an array and returns a windowed version

rolling_sum(a, *[, window_size, mask, reduce])

Takes an array and returns the rolling sum.

rolling_mean(a, *[, window_size, mask, reduce])

Takes an array and returns the rolling mean.

License

focal_stats is published under a MIT license.

Indices and tables