This section we use SOC data at 89 test locations to validate ordinary kriging prediction from 386 training data. Since we will not use the 89 points of the test data set either to fit the model or to predict, these are an independent test of the model. We can compare the predictions to the actual values.
library(plyr)
library(dplyr)
library(gstat)
library(raster)
library(ggplot2)
library(car)
library(classInt)
library(RStoolbox)
library(spatstat)
library(dismo)
library(fields)
library(gridExtra)
The soil organic carbon data (train and test data set) that we have created before is available for download from here.
# Define data folder
dataFolder<-"D:\\Dropbox\\Spatial Data Analysis and Processing in R\\DATA_08\\DATA_08\\"
train<-read.csv(paste0(dataFolder,"train_data.csv"), header= TRUE)
test<-read.csv(paste0(dataFolder,"test_data.csv"), header= TRUE)
Power Transform uses the maximum likelihood-like approach of Box and Cox (1964) to select a transformation of a univariate or multivariate response for normality. First we have to calculate appropriate transformation parameters using powerTransform() function of car package and then use this parameter to transform the data using bcPower() function.
powerTransform(train$SOC)
## Estimated transformation parameter
## train$SOC
## 0.2523339
powerTransform(test$SOC)
## Estimated transformation parameter
## test$SOC
## 0.3379903
train$SOC.bc<-bcPower(train$SOC, 0.2523339)
test$SOC.bc<-bcPower(test$SOC, 0.3379903)
coordinates(train) = ~x+y
coordinates(test) = ~x+y
# Variogram
v<-variogram(SOC.bc~ 1, data = train, cloud=F)
# Intial parameter set by eye esitmation
m<-vgm(1.5,"Exp",40000,0.5)
# least square fit
m.f<-fit.variogram(v, m)
m.f
## model psill range
## 1 Nug 0.5165678 0.00
## 2 Exp 1.0816886 82374.23
We will evaluate the model with k-fold cross validation. We will use krige.cv() function
val<-krige(SOC.bc ~ 1,
train,
test,
model = m.f)
## [using ordinary kriging]
summary(val)
## Object of class SpatialPointsDataFrame
## Coordinates:
## min max
## x -1185457 95251.19
## y 1102846 2514145.42
## Is projected: NA
## proj4string : [NA]
## Number of points: 101
## Data attributes:
## var1.pred var1.var
## Min. :0.1426 Min. :0.7888
## 1st Qu.:1.3910 1st Qu.:0.9471
## Median :1.8303 Median :1.0099
## Mean :1.9214 Mean :1.0212
## 3rd Qu.:2.4667 3rd Qu.:1.0756
## Max. :3.8089 Max. :1.3132
test$SOC.pred<-val$var1.pred
test$SOC.var<-val$var1.var
test$residual<-(test$SOC.bc-test$SOC.pred)
bubble(test, zcol = "residual", maxsize = 2.0, main = "Residuals at Test Data")
# Mean Error (ME)
ME<-round(mean(test$residual),3)
# Mean Absolute Error
MAE<-round(mean(abs(test$residual)),3)
# Root Mean Squre Error (RMSE)
RMSE<-round(sqrt(mean(test$residual^2)),3)
# Mean Squared Deviation Ratio (MSDR)
MSDR<-mean(test$residual^2/test$SOC.var)
ME
## [1] 0.152
MAE
## [1] 0.873
RMSE
## [1] 1.126
MSDR
## [1] 1.267439
Another way to compare actual vs. predicted values is by a linear regression between them. Ideally, this would be a 1:1 line: intercept is 0 (no bias) and the slope is set at 1 (gain is equal).
lm.val <- lm(test$SOC.pred ~ test$SOC.bc)
summary(lm.val)
##
## Call:
## lm(formula = test$SOC.pred ~ test$SOC.bc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.16895 -0.42721 0.07515 0.28345 1.30121
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.19921 0.09196 13.041 < 2e-16 ***
## test$SOC.bc 0.34824 0.03592 9.695 5.07e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5419 on 99 degrees of freedom
## Multiple R-squared: 0.487, Adjusted R-squared: 0.4818
## F-statistic: 93.99 on 1 and 99 DF, p-value: 5.069e-16
plot(test$SOC.bc, test$SOC.pred,main=list("Validation at Test Locations",cex=1),
sub = "1:1 line red, regression green",
xlab = "Observed Box-COX SOC",
ylab = "Predicted Box-COX SOC",
cex.axis=.6,
xlim = c(-2,6),
ylim =c(-2,6),
pch = 21,
cex=1)
abline(0, 1, col="red")
abline(lm.val, col = "green")
rm(list = ls())