[WUR mini MOOCs] WUR logo

At the end of this tutorial, students would be able to:

  • Understand the sources of uncertainties when validating AGB maps
  • Quantify these uncertainties and compare estimates of AGB from plots and maps to assess map accuracy

Background

Aboveground biomass (AGB) maps provide data and information of the carbon stored by the terrestrial vegetation. AGB maps are created through remote sensing (RS) as satellite images and their signals of the forests and other woody vegetation can be translated to biomass. AGB is also considered to be an Essential Climate Variable (ECV) due to its value for climate and carbon modelling. Have a look at this new global AGB map produced using Synthetic Apertrure Radar (SAR) at 100 m spatial scale (here we look at a resampled coarser map for viewing purposes):

agb_glob <- raster(paste0(dataDir,'/GlobBiomass_25km.tif'))
plot(agb_glob, main='GlobBiomass 2010 (Mg/ha)')

These spatially explicit maps have been created over the past decade mostly for the purpose of quantifying the amount of carbon in the terrestrial vegetation everywhere in the world. Currently, there are at least 15 global maps existing, and with more to come because of the recent investments in satellite missions dedicated to measure forest structure and AGB such as GEDI, NISAR, BIOMASS.

But these maps can be imperfect as there are several reported limitations when using spaceborne images to produce AGB maps i.e. signal saturation, leading to potential bias in the maps. But how do we validate these biases, in other words, how do we validate the maps?

Usually, an “independent” validation is needed using in-situ/ground-truth data i.e. biomass plot data. But like the maps, plot data are not error-free. Have a look at this boxplot of plot uncertainty (in standard deviation; SD) when using small plots that are < 0.3 ha for validating a global map of 1 km pixel size.

sd_samp <- read.csv(paste0(dataDir,'/sd_plt.csv'))
boxplot(sd_samp$sd, main='Uncertainty (Mg/ha) of small plots when compared with AGB map (1 km)')

Functionalities

So before comparing plots and maps, their uncertainties should be accounted! That is the main goal of the Plot-to-Map tool (p2m). Here we will estimate three prominent uncertainty sources in the plot data.

The largest plot uncertainty comes from tree measurement because AGB is indirectly measured; destructive harvest or cutting many trees to weigh them is undesirable! So while estimating the biomass of every tree, p2m can also estimate the uncertainty from tree measurement errors. We adopted the error propagation method of BIOMASS R pakcage (Rejou-Mechain et al. 2017) where errors from tree diameter, height, wood density, and allometric model parameters are all propagated using Monte Carlo simulations and respective probability distribution functions.

For plot data with tree-level measurements, diameter is required and if wood density and tree height are missing, those variables can be obtained from a regional/global database and a Height-Diameter model, respectively. Take a look at the portion of this sample data:

plotTree<- read.csv(paste0(dataDir, '/SampleTree.csv')) 
xy <- read.csv(paste0(dataDir,'/SampleTreeXY.csv'))
head(plotTree)
##     id      genus    species  diameter  size                 fez      gez year
## 1 BSP1 Terminalia  bellirica  3.501409 10000 tropical rainforest tropical 1996
## 2 BSP1   Ziziphus   oenoplia  3.819719 10000 tropical rainforest tropical 1996
## 3 BSP1    Aporosa lindleyana 16.870424 10000 tropical rainforest tropical 1996
## 4 BSP1      Ixora  brachiata  4.138029 10000 tropical rainforest tropical 1996
## 5 BSP1   Wrightia    arborea  5.411268 10000 tropical rainforest tropical 1996
## 6 BSP1      Ixora  brachiata  3.183099 10000 tropical rainforest tropical 1996

This sample data is a research plot in India where plots are 1-ha (size=m2) and is preferred compared to small plots (recall the box plot of SD) since it has the same size with the global map for this exercise at 100 m pixel size.

To estimate the uncertainty from measurement errors, we will use the BIOMASS R package which will give us an estimated AGB per plot and its associated uncertainty (SD), which we can scale into AGB Mg/ha.

# we only select relatively large trees
plotTree <- subset(plotTree, diameter >= 10) 

# taxonomy correction needed for assigning wood density values based on global and regional data (not needed if plot data has WD)
tax <- correctTaxo(genus = plotTree$genus, species = plotTree$species)
plotTree$genus <- tax$genusCorrected
plotTree$species <- tax$speciesCorrected

# get wood density using global database; mean WD of family will be assigned for uncorrected taxa
wd <- getWoodDensity(genus = plotTree$genus, species = plotTree$species, stand = plotTree$id, region='World') 
plotTree$wd <- wd$meanWD
plotTree$sd.wd<- wd$sdWD

# Try changing the argument 'region' for some regional data source and check the difference from the WD usign 'World'.

# Compute height from diameter using a Height-Diameter (HD) model. It is possible (and even suggested) to use actual height data if included in the forest inventory. This will include a 
HDmodel <- modelHD(D = NouraguesHD$D, H = NouraguesHD$H, method='weibull', useWeight = TRUE)
dataHlocal <- retrieveH(D = plotTree$diameter, model = HDmodel)
plotTree$height <- dataHlocal$H
  
# Run MC simulations now that we have diamter, height and WD, and their SDs. The argument "Dpropag" is the propagation method for diameter using data from destructive harvest in Chave 2004 study. This will estimate AGB and its SD at the stand level
mc <- by(plotTree, plotTree$id, function(x) AGBmonteCarlo(D = x$diameter, WD = x$wd, errWD = x$sd.wd, HDmodel = HDmodel, Dpropag = "chave2004"),simplify = F)


# Get AGB and associated SD at stand-level (you may need some scaling if plot data is not one hectare)
agb <- unlist(sapply(mc, "[", 1))
sd <- unlist(sapply(mc, "[", 3))
  
# We need to assign the coordinates per plot ID and summarize the estimates at the plot level
plotTree <- left_join(plotTree, xy, by = c('id' = 'id'))
plotTree <- plotTree[,c("id","x","y", 'size', 'year')] # retain columns of interest
plotTree$x <- as.numeric(plotTree$x)
plotTree$y <- as.numeric(plotTree$y)
  
plotHa <- plotTree %>% group_by(id) %>% summarise_all(funs(mean)) 
  
#scale values per ha
plotHa$agb <- agb
plotHa$sd <- sd
plotHa <- as.data.frame(plotHa[,c("id","x","y", 'size', 'year', 'agb', 'sd')]) # retain columns of interest
plotHa$size <- plotHa$size / 10000
names(plotHa) <- c('pltID', 'POINT_X', 'POINT_Y', 'SIZE_HA', 'AVG_YEAR',   'AGB_T_HA', 'sdTree')
hist(plotHa$sdTree, main='SD of measurement error (Mg/ha)')
boxplot(plotHa$sdTree, main='SD of measurement error (Mg/ha)')

If tree-level data is unavailable, which is often the case when conducting a global map validation (data sources often not sharing tree level data), an alternative to estimate the same uncertainty of measurement error is to use a pre-trained model that would “predict” the measurement error based on the AGB, plot size, and eco-region/biome. The function “BiomePair” will automatically label the corresponding biomes (e.g. Temperate Coniferous Forest) of your plots.

# Load the function
source(paste0(scriptsDir,'/BiomePair.R'))

# Get biomes and zones 
plotsHaSamp <- read.csv(paste0(dataDir,'/SamplePlots.csv'))[1:100,] #sample plots without tree data
plotsHaSamp <- BiomePair(plotsHaSamp) #biome labeller
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\PlotToMap_MOOC\data", layer: "eco_zone"
## with 22 features
## It has 1 fields
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\PlotToMap_MOOC\data", layer: "world_region"
## with 99 features
## It has 2 fields
plotsPred <- plotsHaSamp[,c('AGB_T_HA','SIZE_HA', 'GEZ')]
names(plotsPred) <- c('agb', 'size', 'gez')
plotsPred$size <- as.numeric(plotsPred$size) * 10000 #convert size to m2
plotsPred$gez <- factor(plotsPred$gez, levels = c("Boreal","Subtropical","Temperate","Tropical")) 
load(paste0(dataDir, '/rf1.RData')) #pre-trained RF model from 8000+ plots across all major biomes 
plotsPred <- subset(plotsPred, !is.na(plotsPred$size))
plotsHaSamp$sdTree <- predict(rf1, plotsPred)[[1]]
hist(plotsHaSamp$sdTree, main='SD of measurement error (Mg/ha)')
boxplot(plotsHaSamp$sdTree, main='SD of measurement error (Mg/ha)')

Next, your plot biomass data may need a bit of pre-processing if they will be used to compare with a map from a different year at spatial resolution. At this point, we will use pre-created functions to make things more efficient. First, if the plot data is surveyed before or after the AGB map epoch, the function “TempApply” and “TempVar” will modify the plot biomass based on the 2020 IPCC growth rate data unique to biomes and continents. Functions “HistoShift” and “HistoVis” summarizes the outcomes. We will use the previous sample plots.

source(paste0(scriptsDir,'/TempFix.R'))
source(paste0(scriptsDir,'/TempVis.R'))

# Apply growth data to whole plot data by identifying AGB map year
yr <- 2000+mapYear
gez <- sort(as.vector((unique(plotsHaSamp$GEZ)))) #get unique gez and without NA (sorting removes it also)
plotsHaSamp <- ldply(lapply (1:length(gez), function(x) 
  TempApply(plotsHaSamp, gez[[x]], yr)), data.frame) #make sure mapYear is set!

# Tree growth data uncertainty estimate
plotsHaSamp <- ldply(lapply (1:length(gez), function(x) 
  TempVar(plotsHaSamp, gez[[x]], yr)), data.frame) 


# Histogram of temporal fix effect (before = plot AGB during plot inventory year; after = plot AGB in 2018 after the adjustment; overlap is included just to highlight unchanged histogram)
dir.create(file.path(outDir), recursive = TRUE)
HistoTemp(plotsHaSamp, yr)
# Summary table of the AGBs before and after temporal adjustment per AGB bin
HistoShift(plotsHaSamp, yr) 
##    agb_Mgha_bins n_pre n_post agb_Mgha_pre agb_Mgha_post
## 1         (0,20]    24     18       7.5975      10.59389
## 2        (20,40]    12     18      29.4475      29.35389
## 3        (40,60]     6      5      48.3200      51.90400
## 4        (60,80]     8      6      69.4825      70.46333
## 5       (80,100]    10      5      89.8010      90.92200
## 6      (100,120]     2      8     103.4300     110.59875
## 7      (120,140]     4      5     130.0025     130.06400
## 8      (140,160]     3      4     148.1500     148.61000
## 9      (160,180]     2      2     174.0650     174.06500
## 10     (180,200]     3      3     189.1867     187.55333
## 11     (200,300]     8      8     259.9588     257.57125
## 12     (300,400]     8      8     352.7950     353.33250
## 13     (400,500]     2      2     448.0000     451.25000
## 14     (500,Inf]     8      8     690.8937     690.76625

Another pre-processing step is to scale the plot biomass based on the “forest” variation between the AGB map pixel and plot. This used the global forest change (Hansel et al. 2013) dataset to identify the forest fraction of the map pixel and multiply it to the plot biomass (function “InvDasymetry”). The GFC dataset is also used to remove already deforested plots relative to the AGB map epoch (function “Deforested”).

Then, the uncertainty from the spatial mismatch between pixel and plots (samping error) is estimated using geostatistical approach. We did not include this estimation due to the on-going integration of such functionality to p2m and since our plot size equals the biomass map pixel.

Lastly, weighted comparison of plot data with the AGB map is needed to account for plot uncertainty. The inverse of the plot variance is used to estimate the weighted mean. This is applicable for validating global maps where the comparison is aggregated at coarser scales i.e. 0.1 degree spatial scale From here we load an Rdata of the comparison between the WUR plots and the GEOCARBON map at 0.1 degree for the global climate and carbon modellers. The comparison results can be visualized through binned graphs and with a tabulated summary statistics:

Results

source(paste0(scriptsDir,'/Binned.R'))
plots01 <- get(load(paste0(dataDir,'/GlobBiomass.Rdata')))
Binned(plots01$plotAGB_10, plots01$mapAGB,outDir, 'GlobBiomass 2010', 'samp.png') 

With such comparison at 0.1 degrees, the scatter/random errors can be partially removed ,leaving map bias as the dominant error. Such bias can be predicted using a model-based approach. Once predicted spatially through ecological covariates, bias can be subtracted to the map AGB and partially removed. See paper below currently under review.

Thank you and for more info, you can access Plot-to-map with technical documentation at:

References

  • Araza et al., (2021). A comprehensive framework for assessing the accuracy and uncertainty of global above-ground biomass maps. Under review. ## References:
  • Chave, J., Condit, R., Aguilar, S., Hernandez, A., Lao, S., & Perez, R. (2004). Error propagation and scaling for tropical forest biomass estimates. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 359(1443), 409-420.
  • Rejou-Mechain, M., Tanguy, A., Piponiot, C., Chave, J., & H?rault, B. (2017). biomass: An r package for estimating above-ground biomass and its uncertainty in tropical forests. Methods in Ecology and Evolution, 8(9), 1163-1167.

Repository

  • https://github.com/arnanaraza/PlotToMap