As discussed earlier, zero-dimensional points are essential for geographic features such as wells or soil-sampling locations that are best represented by a single reference point. Point vector data often group several points into a multi-point structure with a single attribute record; for example, all the soil-sampling sites could be treated as a single geometry. The main drawback of point features, however, is that they cannot be used to make area or length measurements (as you can with a polygon).
In this section, we will use 650 soil samples from Colorado, Kansas, New Mexico, and Wyoming. These samples were collected by the United States Geological Survey (USGS) as part of the USGS Geochemical Landscapes Project (Smith et al., 2011). We will learn how to join spatial and non-spatial data and create a data frame for spatial interpolation.
This exercise consists of the following major steps:
Part-1: Create a Spatial Point Data Frame
We will use the following spatial data to create the data frame. The data can be downloaded from here.
GP_ID.csv: Contains FIPS codes, state, and county names
Soil sampling locations (GP_GPS.csv): This file contains coordinates of 473 soil sampling sites in Colorado, Kansas, New Mexico, and Wyoming. The soil samples were collected by the United States Geological Survey (USGS) from the A horizon at a sampling density of about 1 sample per 1,600 km² (Smith et al., 2011).
Soil organic carbon (SOC) (GP_SOC.csv): SOC concentrations at these 473 sites. The SOC concentrations were predicted by mid-infrared (MIR) spectroscopy and partial least squares regression (PLSR) analysis as described previously (Janik et al., 2007; Ahmed et al., 2017).
Environmental covariables (gp_point_data.csv)
Prediction grid data
import seaborn as sns
sns.set_style("white")
sns.set(font_scale=1.5)
## Set working directory
import os
path= "E:\GitHub\geospatial-python-github.io\geospatial-python.io\Lesson_07_working_with_point_data"
os.chdir(path)
print("Current Working Directory " , os.getcwd())
import pandas as pd
# Read ID file
f_id = 'Data_07/GP_ID.csv'
id_df = pd.read_csv(f_id)
id_df.head()
import pandas as pd
# Read GPS of soil sampling locations
f_gps = 'Data_07/GP_GPS.csv'
gps = pd.read_csv(f_gps)
gps.head()
import pandas as pd
# Soil OC data
f_soc = 'Data_07/GP_SOC.csv'
soc = pd.read_csv(f_soc)
soc.head()
import pandas as pd
f_co = 'Data_07/gp_point_data.csv'
co_var = pd.read_csv(f_co)
# Convert NLCD to integer
co_var['NLCD']=co_var['NLCD'].astype(int)
co_var.head()
import pandas as pd
f_nlcd = 'Data_07/NLCD_ID.csv'
nlcd_id = pd.read_csv(f_nlcd)
nlcd_id.head()
A GeoDataFrame needs a shapely geometry object for each record. We use the geopandas points_from_xy() function to transform longitude and latitude into a list of shapely Point objects and set it as the geometry while creating the GeoDataFrame. (Note that points_from_xy() is an enhanced wrapper for [Point(x, y) for x, y in zip(df.Longitude, df.Latitude)].)
import geopandas as gpd
gps_point = gpd.GeoDataFrame(
gps, geometry=gpd.points_from_xy(gps.long, gps.lat))
gps_point.head()
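As a quick check, the same geometry column can be built with an explicit shapely list comprehension (a minimal sketch using the long and lat columns of GP_GPS.csv):
from shapely.geometry import Point
import geopandas as gpd
# Equivalent construction with an explicit list of shapely Point objects
manual_points = [Point(x, y) for x, y in zip(gps.long, gps.lat)]
gps_point_manual = gpd.GeoDataFrame(gps.copy(), geometry=manual_points)
# Both approaches should yield identical geometries
gps_point_manual.geometry.geom_equals(gps_point.geometry).all()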
# Plot data
gps_point.plot()
We will transform the point shapefile from WGS 1984 to Albers Equal Area Conic NAD83 (EPSG:5070). We can reproject the data from one CRS to another with the geopandas method (i.e., a function belonging to geopandas objects) to_crs(). The .to_crs() method will only work if your original spatial object has a CRS assigned to it and if that CRS is the correct CRS for the data.
print(gps_point.crs)
We have to define the CRS before reprojecting to a projected coordinate system. We assign the WGS84 latitude-longitude CRS (EPSG:4326) to the point GeoDataFrame:
# Define CRS (WGS84, EPSG:4326)
gps_point = gps_point.set_crs(epsg=4326)
Now we will reproject the gps_point data using the CRS of the GP_STATE shapefile. First we extract the CRS from the state shapefile.
import geopandas as gpd
# US State shape files
fn="Data_07/GP_STATE.shp"
GP_STATE= gpd.read_file(fn)
# CRS of state polygon
GP_crs=GP_STATE.crs
GP_crs
# Reproject to the state polygon CRS (EPSG:5070)
point_proj=gps_point.to_crs(GP_crs)
import matplotlib.pyplot as plt
# Make subplots that are next to each other
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(12, 8))
# Plot the data in WGS84 CRS
gps_point.plot(ax=ax1, facecolor='gray');
# Add title
ax1.set_title("WGS84");
# Plot the reprojected data
point_proj.plot(ax=ax2, facecolor='blue');
# Add title
ax2.set_title("Albers Equal Area Conic");
# Remove empty white space around the plot
plt.tight_layout()
Attribute joins are accomplished using the merge() function. In general, it is recommended to use the merge method called from the spatial dataset. The stand-alone merge() function will work if the GeoDataFrame is in the left argument; if a DataFrame is in the left argument and a GeoDataFrame is in the right position, the result will no longer be a GeoDataFrame.
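A toy illustration of this caveat (hypothetical two-row frames, just to show the returned types):
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
# A small GeoDataFrame and a plain DataFrame sharing a key column
toy_gdf = gpd.GeoDataFrame({'SiteID': [1, 2]}, geometry=[Point(0, 0), Point(1, 1)])
toy_df = pd.DataFrame({'SiteID': [1, 2], 'SOC': [5.0, 7.5]})
print(type(toy_gdf.merge(toy_df, on='SiteID')))  # GeoDataFrame: spatial class preserved
print(type(toy_df.merge(toy_gdf, on='SiteID')))  # plain DataFrame: spatial class lost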
First we will create a pandas DataFrame with all the non-spatial data, using the pd.merge() function.
from functools import reduce
df1 = [id_df, soc,co_var]
df2 = reduce(lambda left,right: pd.merge(left,right,on='SiteID'), df1)
df2
Now we add the NLCD description column to the data frame.
from functools import reduce
d3 = [df2, nlcd_id]
df = reduce(lambda left,right: pd.merge(left,right,on='NLCD'), d3)
df
Now we will merge the GeoDataFrame (point_proj) with the pandas DataFrame (df) on the SiteID of each sampling location.
point_spdf = point_proj.merge(df, on='SiteID')
point_spdf.head()
point_spdf.to_file("Data_07/gp_point_all_data.shp")
Now we will extract the projected coordinates (in meters) from the geometry column of the spatial point data frame.
# Get X (m) and Y (m) Values from geometry column
point_spdf['x'] = point_spdf.geometry.x
point_spdf['y'] = point_spdf.geometry.y
point_spdf.head()
Now we will convert the GeoDataFrame to a pandas DataFrame without the geometry column. We will use this data frame for further analysis.
import pandas as pd
point_df = pd.DataFrame(point_spdf.drop(columns=['geometry']))
# Organize variables
point_mf= point_df[['SiteID', 'long', 'lat', 'x', 'y', 'FIPS','STATE', 'COUNTY',
'SOC', 'DEM','Slope','MAP', 'MAT','NDVI','NLCD','NLCD_DES']]
point_mf.head()
point_mf.to_csv('Data_07/gp_all_data.csv', header =True, index=False)
import seaborn as sns
p, ax = plt.subplots(figsize=(10, 6))
ax=sns.boxplot(y="SOC",
x="NLCD_DES",
data=point_mf,
palette="Blues_d")
ax.axes.xaxis.label.set_text("NLCD")
ax.axes.yaxis.label.set_text("Soil OC (mg/g)")
#plt.title('Mean ± SE of Soil OC')
import plotly.express as px
fig = px.box(point_mf,
x="NLCD_DES",
y="SOC",
width=800,
height=500)
fig.update_layout(
title={
'text': "Sourface Soil Organic (mg/g)",
'y':0.95,
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'},
xaxis_title="NLCD",
yaxis_title="Soil OC",
font=dict(
family="sans-serif, monospace",
size=18,
color="#7f7f7f"))
fig.show()
# Sort data set by SOC in descending order
mf_soc = point_mf.sort_values("SOC", ascending=False)
import seaborn as sns
p, ax = plt.subplots(figsize=(10, 6))
ax=sns.barplot(y="SOC",
x="NLCD_DES",
data=point_mf,
capsize=.2,
#width=.3,
palette="Blues_d")
ax.axes.xaxis.label.set_text("NLCD")
ax.axes.yaxis.label.set_text("Soil OC (mg/g)")
#plt.title('Mean ± SE of Soil OC')
import seaborn as sns
p, ax = plt.subplots(figsize=(10, 7))
ax=sns.barplot(x="SOC",
y="NLCD_DES",
data=point_mf,
capsize=.2,
#width=.3,
palette="Blues_d")
ax.axes.yaxis.label.set_text("")
ax.axes.xaxis.label.set_text("Soil OC (mg/g)")
point_mf.plot(x='NDVI', y='SOC', style ='o')
import seaborn as sns
sns.jointplot(x=point_mf["NDVI"], y=point_mf["SOC"], kind='scatter')
sns.jointplot(x=point_mf["NDVI"], y=point_mf["SOC"], kind='hex')
sns.jointplot(x=point_mf["NDVI"], y=point_mf["SOC"], kind='kde')
import plotly.express as px
fig = px.scatter_3d(point_mf, x='DEM', y='NDVI', z='SOC',
color='SOC',
size='SOC', size_max=18,
opacity=0.7)
# tight layout
fig.update_layout(margin=dict(l=0, r=0, b=0, t=0))
We will create a quantile map to see the spatial distribution of SOC across the four states (the scheme='quantiles' option in geopandas requires the mapclassify package).
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, BoundaryNorm
from matplotlib import cm
from mpl_toolkits.axes_grid1 import make_axes_locatable
## Plot Soil C
# set the range for the choropleth
vmin, vmax = 0, 32
# create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(6, 7))
# State boundary
GP_STATE.plot(ax=ax, color="white", edgecolor="gray")
# Point data
point_spdf.plot(column='SOC',
cmap='Spectral',
ax=ax,
alpha=1,
scheme='quantiles',
k =7,
markersize=30,
)
# remove the axis
#ax.axis('off')
# add a title
ax.set_title('Soil Organic C (mg/g)',
fontdict={'fontsize': '20', 'fontweight' : '3'})
# create an axes on the right side of ax. The width of cax will be 5%
# of ax and the padding between cax and ax will be fixed at 0.05 inch.
# You need to import mpl_toolkits.axes_grid1
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.05)
# Create colorbar as a legend
sm = plt.cm.ScalarMappable(cmap='Spectral',
norm=plt.Normalize(vmin=vmin, vmax=vmax))
# empty array for the data range
sm._A = []
# add the colorbar to the figure
cbar = fig.colorbar(sm, cax=cax)
import plotly.graph_objects as go
point_spdf['text'] = point_spdf['STATE'] + ', ' + point_spdf['COUNTY'] + ', SOC: ' + point_spdf['SOC'].astype(str)
fig = go.Figure(data=go.Scattergeo(
locationmode = 'USA-states',
lon = point_spdf['long'],
lat = point_spdf['lat'],
text = point_spdf['text'],
mode = 'markers',
marker = dict(
size = 6,
opacity = 1,
reversescale = False,
autocolorscale = False,
symbol = 'circle-open',
line = dict(
width=0.8,
color='rgb(102, 102, 102)'
),
colorscale = "YlGnBu",
cmin = 0,
color = point_spdf['SOC'],
cmax = point_spdf['SOC'].max(),
colorbar_title="Soil OC <br> (mg/g) "
)))
fig.update_layout(
title={
'text': "Spatial Distribution of Soil Organic C (mg/g)",
'y':0.95,
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'},
geo = dict(
scope='usa',
projection_type='albers usa',
showland = True,
landcolor = "rgb(250, 250, 250)",
subunitcolor = "rgb(217, 217, 217)",
countrycolor = "rgb(217, 217, 217)",
countrywidth = 0.5,
subunitwidth = 0.5))
fig.show()
In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task (Wikipedia, 2020).
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as st
from scipy.stats import norm
fig = plt.figure(figsize=(7, 6))
sns.distplot(point_mf['SOC'], fit=norm);
# Get the fitted parameters used by the function
(mu, sigma) = norm.fit(point_mf['SOC'])
print( '\n mu = {:.2f} and sigma = {:.2f}\n'.format(mu, sigma))
#Now plot the distribution
plt.legend([r'Normal dist. ($\mu=$ {:.2f} and $\sigma=$ {:.2f})'.format(mu, sigma)],
loc='best')
plt.ylabel('Frequency')
plt.title('SOC distribution')
#Get also the QQ-plot
fig = plt.figure(figsize=(7, 6))
res =st.probplot(point_mf['SOC'], plot=plt)
plt.show()
You can create a nice-looking Q-Q plot with Pingouin, an open-source statistical package written in Python 3 and based mostly on pandas and NumPy.
import numpy as np
import pingouin as pg
ax = pg.qqplot(point_mf['SOC'], dist='norm')
The pingouin.normality() function works with lists, arrays, or pandas DataFrame in wide or long-format.
import numpy as np
import pingouin as pg
print(pg.normality(point_mf['SOC']))
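As noted above, Pingouin also accepts a wide-format DataFrame; a sketch testing several columns at once (assuming the installed version supports wide-format input):
# Shapiro-Wilk normality test for each column (wide-format input)
print(pg.normality(point_mf[['SOC', 'NDVI', 'MAP']]))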
from scipy.stats import skew
skew(point_mf['SOC'])
# Box-Cox transform with lambda = 0, which is equivalent to a log transform
point_mf['SOC_BC'] = st.boxcox(point_mf["SOC"], lmbda=0)
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(7, 6))
sns.distplot(point_mf['SOC_BC'], fit=norm);
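If you would rather let SciPy estimate the optimal lambda instead of fixing it at zero, a minimal sketch (assuming all SOC values are positive):
import scipy.stats as st
# boxcox() without lmbda returns the transformed data and the fitted lambda
soc_bc_opt, fitted_lambda = st.boxcox(point_mf['SOC'])
print('Fitted Box-Cox lambda: {:.3f}'.format(fitted_lambda))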
point_mf[['SOC', 'DEM', 'Slope','MAP', 'MAT','NDVI']].describe()
# Descriptive statistics
soc_stat=point_mf.groupby("NLCD_DES")['SOC'].describe().reset_index()
soc_stat['count_squareroot']=soc_stat['count']**(1/2)
# Standard error of the mean = SD / sqrt(n)
soc_stat['sem'] = soc_stat['std'] / soc_stat['count_squareroot']
# Sort by mean SOC in ascending order
mf_soc_stat=soc_stat.sort_values("mean",ascending=True)
mf_soc_stat
import numpy as np
# Create lists for the plot
NLCD = mf_soc_stat['NLCD_DES']
x_pos = np.arange(len(NLCD))
SOC = mf_soc_stat['mean']
error =mf_soc_stat ['sem']
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(NLCD, SOC, yerr=error, align='center', alpha=0.5, ecolor='black', capsize=10)
ax.set_ylabel('Soil SOC (mg/g)')
ax.set_xticks(x_pos)
ax.set_xticklabels(NLCD)
ax.set_title('Mean ± SE of Soil OC')
ax.yaxis.grid(True)
# Save the figure and show
plt.tight_layout()
# plt.savefig('bar_plot_with_error_bars.png')
plt.show()
Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the "variation" among and between groups) used to analyze the differences among group means in a sample (Source: https://en.wikipedia.org/wiki/Analysis_of_variance).
import numpy as np
import pingouin as pg
aov = pg.anova(data=df, dv='SOC', between='NLCD_DES', detailed=True)
aov
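If the ANOVA is significant, a post-hoc test shows which NLCD classes differ from each other; a sketch using Pingouin's pairwise Tukey-HSD test:
# Pairwise Tukey-HSD comparisons of mean SOC between NLCD classes
posthoc = pg.pairwise_tukey(data=df, dv='SOC', between='NLCD_DES')
posthoc.round(3)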
mf=point_mf[['SOC', 'DEM', 'Slope','MAP', 'MAT','NDVI']]
mf.corr()
You can use the Pingouin package for correlation analysis.
import numpy as np
import pingouin as pg
pg.pairwise_corr(mf, columns=['SOC', 'DEM', 'Slope','MAP', 'MAT','NDVI'], method='pearson')
import matplotlib.pyplot as plt
corrmat = mf.corr()
f, ax = plt.subplots(figsize =(9, 8))
sns.heatmap(corrmat, ax = ax, cmap ="YlGnBu", linewidths = 0.1)
cg = sns.clustermap(corrmat, cmap ="YlGnBu", linewidths = 0.1);
plt.setp(cg.ax_heatmap.yaxis.get_majorticklabels(), rotation = 0)
cg
pd.plotting.scatter_matrix(point_mf[['SOC', 'DEM', 'MAT','NDVI']], alpha=0.2, figsize=(10, 10))
plt.show()
Seaborn makes it easy to create a correlogram (correlation matrix plot). Correlograms are great for exploratory analysis: they let you quickly observe the relationship between every pair of variables in your matrix. Just call the pairplot function.
sns.pairplot(point_mf[['SOC', 'DEM', 'MAT','NDVI']], diag_kind="kde")
"Regression is a fundamental problem in statistics and machine learning. In regression studies, we are typically interested in inferring a real-valued function (called a regression function) whose values correspond to the mean of a dependent (or response or output) variable conditioned on one or more independent (or input) variables. Many different techniques for estimating this regression function have been developed, including parametric, semi-parametric, and nonparametric methods." (Source: Quadrianto N., Buntine W.L. (2011) Regression. In: Sammut C., Webb G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_710)
Ordinary least squares (OLS) is a type of linear least squares method for estimating the unknown parameters in a linear regression model (Wikipedia).
We will use "scikit-learn", a machine learning library for the Python programming language.It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy (Wikipedia).
Simple linear regression summarizes and studies the relationship between two continuous (quantitative) variables. One variable, denoted x, is regarded as the predictor, explanatory, or independent variable. The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
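In symbols, the model fitted below (with NDVI as x and SOC as y) is $y = \beta_0 + \beta_1 x + \varepsilon$, where $\beta_0$ is the intercept, $\beta_1$ the slope, and $\varepsilon$ the error term.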
Separate the target value, or "y", from the features. This label is the value that you will train the model to predict.
x = mf['NDVI'].values.reshape(-1,1)
y = mf['SOC'].values.reshape(-1,1)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
x, y, test_size=0.25, random_state=42)
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)
It is good practice to normalize or standardize features that use different scales and ranges. Although the model might converge without feature normalization, it makes training more difficult, and it makes the resulting model dependent on the choice of units used in the input.
from sklearn.preprocessing import StandardScaler
# Fit the scaler on the training data only, then apply it to both the training and test sets
scaler = StandardScaler().fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)
from sklearn.linear_model import LinearRegression
slr = LinearRegression()
slr=slr.fit(x_train, y_train) #training the algorithm
print('Intercept:', slr.intercept_)
print('Slope:', slr.coef_)
y_train_pred=slr.predict(x_train)
train_pred= pd.DataFrame({'Actual': y_train.flatten(), 'Predicted': y_train_pred.flatten()})
train_pred['residuals']=train_pred['Actual'] - train_pred['Predicted']
train_pred
Now that we have trained our algorithm, it's time to make some predictions. We will use the test data and see how accurately SLR predicts SOC.
y_slr_pred = slr.predict(x_test)
slr_pred = pd.DataFrame({'Observed': y_test.flatten(), 'Predicted': y_slr_pred.flatten()})
slr_pred.describe()
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(6, 6))
ax = sns.regplot(x="Observed", y="Predicted", data=slr_pred)
ax.set_ylim(0,40)
ax.set_xlim(0,40)
ax.plot([0, 40], [0, 40], 'k--', lw=2)
plt.title('SLR Predicted SOC (mg/g)', fontsize=18)
from sklearn import metrics
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_slr_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_slr_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_slr_pred)))
Multiple linear regression (MLR) explores the relationship between several explanatory variables and a response variable. The goal of MLR is to model the linear relationship between the explanatory (independent) variables and the response (dependent) variable.
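With the predictors DEM, Slope, MAT, MAP, and NDVI used below, the model takes the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$, where each $\beta_j$ is the partial regression coefficient of covariate $x_j$.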
Xm =mf[['DEM', 'Slope', 'MAT', 'MAP','NDVI']].values
y = mf[['SOC']].values
x_train, x_test, y_train, y_test = train_test_split(Xm, y, test_size=0.2, random_state=0)
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)
from sklearn.linear_model import LinearRegression
mlr = LinearRegression()
mlr.fit(x_train, y_train)
pred_mlr = pd.DataFrame(['DEM', 'Slope', 'MAT', 'MAP','NDVI'])
coeff = pd.DataFrame(mlr.coef_, index=['Co-efficient']).transpose()
pd.concat([pred_mlr,coeff], axis=1, join='inner')
import sklearn.metrics as metrics
y_train_mlr=mlr.predict(x_train)
explained_variance=metrics.explained_variance_score(y_train, y_train_mlr)
mean_absolute_error=metrics.mean_absolute_error(y_train, y_train_mlr)
mse = metrics.mean_squared_error(y_train, y_train_mlr)
median_absolute_error = metrics.median_absolute_error(y_train, y_train_mlr)
r2 = metrics.r2_score(y_train, y_train_mlr)
print('explained_variance: ', round(explained_variance,4))
print('r2: ', round(r2,4))
print('MAE: ', round(mean_absolute_error,4))
print('MSE: ', round(mse,4))
print('RMSE: ', round(np.sqrt(mse),4))
import numpy as np
import pingouin as pg
pg.linear_regression(mf[['DEM', 'Slope', 'MAT', 'MAP','NDVI']], mf['SOC'])
"Cross-validation is a process for creating a distribution of pairs of training and test sets out of a single data set. In cross validation the data are partitioned into k subsets, S1…Sk, each called a fold. The folds are usually of approximately the same size. The learning algorithm is then applied k times, for i = 1 to k, each time using the union of all subsets other than Si as the training set and using Si as the test set". (Source: (2011) Cross-Validation. In: Sammut C., Webb G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_190)
In k-fold cross-validation the data are split into k subsets (folds). For each fold, we train the model on the other k−1 folds and evaluate it on the held-out fold, so that every fold serves as the test set exactly once; the scores are then averaged across folds. We will use the entire data set (n = 666) in this tutorial.
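A tiny illustration of how KFold partitions the data (a toy sketch with six samples and three folds, unrelated to the soil data):
import numpy as np
from sklearn.model_selection import KFold
toy = np.arange(6)
# Each sample appears in the test fold exactly once across the three splits
for train_idx, test_idx in KFold(n_splits=3, shuffle=False).split(toy):
    print('train:', train_idx, 'test:', test_idx)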
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import cross_validate
import sklearn.metrics as metrics
# 10-fold cross-validation
cv_Kfold = KFold(n_splits=10, random_state=1, shuffle=True)
mlr = LinearRegression()
mlr.fit(Xm, y)
# Negative MSE for each of the 10 folds
cv_scores_Kfold = cross_val_score(mlr, Xm, y, scoring="neg_mean_squared_error", cv=cv_Kfold, n_jobs=1)
# R2 and MAE for each fold
all_scores = pd.DataFrame(cross_validate(mlr, Xm, y, scoring=['r2', 'neg_mean_absolute_error'],
                                         cv=cv_Kfold, n_jobs=-1, verbose=0))
print(all_scores)
# Mean and SD of the fold MSEs
print("Folds: " + str(len(cv_scores_Kfold)) + ", MSE: " + str(np.mean(np.abs(cv_scores_Kfold))) + ", STD: " + str(np.std(cv_scores_Kfold)))
# Cross-validation prediction
y_pred_KFold = cross_val_predict(mlr, Xm, y, cv=10)
Kfold_pred = pd.DataFrame({'Observed': y.flatten(), 'Predicted': y_pred_KFold.flatten()})
Kfold_pred.describe()
plt.figure(figsize=(6, 6))
ax = sns.regplot(x="Observed", y="Predicted", data=Kfold_pred)
ax.set_ylim(0,35)
ax.set_xlim(0,35)
ax.plot([0, 35], [0, 35], 'k--', lw=2)
plt.title('K-Fold Cross-Validation',fontsize=18)
import sklearn.metrics as metrics
print('Mean Absolute Error:', metrics.mean_absolute_error(y, y_pred_KFold))
print('Mean Squared Error:', metrics.mean_squared_error(y, y_pred_KFold))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y, y_pred_KFold)))
print('R2:', metrics.r2_score(y, y_pred_KFold))
Leave-one-out cross-validation is a special case of cross-validation where the number of folds equals the number of instances in the data set. Thus, the learning algorithm is applied once for each instance, using all other instances as a training set and using the selected instance as a single-item test set. This process is closely related to the statistical method of jack-knife estimation (Efron, 1982). (Source:(2011) Leave-One-Out Cross-Validation. In: Sammut C., Webb G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_469).
from sklearn.model_selection import LeaveOneOut
# Leave-one-out cross-validation: one fold per observation
cv_LOV = LeaveOneOut()
cv_scores_LOV = cross_val_score(mlr, Xm, y, scoring="neg_mean_squared_error", cv=cv_LOV, n_jobs=1)
# Mean and SD
print("Folds: " + str(len(cv_scores_LOV)) + ", MSE: " + str(np.mean(np.abs(cv_scores_LOV))) + ", STD: " + str(np.std(cv_scores_LOV)))
# Leave-one-out prediction for every observation
y_pred_LOV = cross_val_predict(mlr, Xm, y, cv=cv_LOV)
LOV_pred = pd.DataFrame({'Observed': y.flatten(), 'Predicted': y_pred_LOV.flatten()})
LOV_pred.describe()
plt.figure(figsize=(6, 6))
ax = sns.regplot(x="Observed", y="Predicted", data=LOV_pred)
ax.set_ylim(0,35)
ax.set_xlim(0,35)
ax.plot([0, 35], [0, 35], 'k--', lw=2)
plt.title('LOV Cross-Validation',fontsize=18)
Now that we have trained our algorithm, it's time to make some predictions. We will use the test data and see how accurately MLR predicts SOC.
y_mlr_pred = mlr.predict(x_test)
mlr_pred = pd.DataFrame({'Observed': y_test.flatten(), 'Predicted': y_mlr_pred.flatten()})
mlr_pred.describe()
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(6, 6))
ax = sns.regplot(x="Observed", y="Predicted", data=mlr_pred)
ax.set_ylim(0,30)
ax.set_xlim(0,30)
ax.plot([0, 30], [0, 30], 'k--', lw=2)
plt.title('MLR Predicted SOC (mg/g)', fontsize=18)
import sklearn.metrics as metrics
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_mlr_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_mlr_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_mlr_pred)))
print('R2:', metrics.r2_score(y_test, y_mlr_pred))
Now we will predict SOC at the grid locations of the four US states.
import pandas as pd
g_id = 'Data_07/gp_grid_data.csv'
grid_df = pd.read_csv(g_id)
grid_df.head()
x_grid = grid_df[['DEM','Slope','MAP','MAT','NDVI']]
# The MLR model was fitted on unscaled covariates, so the grid covariates are used as-is
mlr_grid = mlr.predict(x_grid)
mlr_grid
grid_df['Pred_SOC'] = pd.DataFrame(mlr_grid)
grid_df
import geopandas as gpd
# create data-frame
grid_df_rf = grid_df[['GridID','x','y','Pred_SOC']]
# Convert to spatial point data-frame
grid_rf = gpd.GeoDataFrame(
grid_df_rf, geometry=gpd.points_from_xy(grid_df_rf.x, grid_df_rf.y))
#Save as ESRI shape file
grid_rf.to_file('Data_07/MLR_PRED_SOC.shp')
Converting point data to a raster is a little tricky in Python; you have to follow several steps to convert point data to a raster.
# Input point shape file
mlr_point = 'Data_07/MLR_PRED_SOC.shp'
# Output raster
mlr_output = 'Data_07/MLR_PRED_SOC.tif'
from osgeo import ogr, gdal
from osgeo import gdalconst
# Reference raster
point_ndsm = 'Data_07/DEM.tif'
point_data = gdal.Open(point_ndsm, gdalconst.GA_ReadOnly)
# Get geo-information
geo_transform = point_data.GetGeoTransform()
x_min = geo_transform[0]
y_max = geo_transform[3]
x_max = x_min + geo_transform[1] * point_data.RasterXSize
y_min = y_max + geo_transform[5] * point_data.RasterYSize
x_res = point_data.RasterXSize
y_res = point_data.RasterYSize
mb_v = ogr.Open(mlr_point) # change
mb_l = mb_v.GetLayer()
pixel_width = geo_transform[1]
# Raster conversion: the geotransform here starts at y_min with a positive pixel height,
# so the raster is written bottom-up (hence origin='lower' when the array is plotted later)
target_ds = gdal.GetDriverByName('GTiff').Create(mlr_output, x_res, y_res, 1, gdal.GDT_Float32)
target_ds.SetGeoTransform((x_min, pixel_width, 0, y_min, 0, pixel_width))
band = target_ds.GetRasterBand(1)
band.FlushCache()
gdal.RasterizeLayer(target_ds, [1], mb_l, options=["ATTRIBUTE=Pred_SOC"]) # Change
target_ds = None
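As an alternative sketch, the same rasterization can be done with rasterio (assuming rasterio is installed; the reference grid is again taken from DEM.tif, and the output name MLR_PRED_SOC_rio.tif is just for illustration):
import rasterio
from rasterio import features
# Use the DEM as the reference grid (extent, resolution, CRS)
with rasterio.open('Data_07/DEM.tif') as ref:
    meta = ref.meta.copy()
meta.update(count=1, dtype='float32', nodata=0)
# (geometry, value) pairs to burn into the raster
shapes = zip(grid_rf.geometry, grid_rf['Pred_SOC'])
burned = features.rasterize(shapes=shapes,
                            out_shape=(meta['height'], meta['width']),
                            transform=meta['transform'],
                            fill=0,
                            dtype='float32')
with rasterio.open('Data_07/MLR_PRED_SOC_rio.tif', 'w', **meta) as out:
    out.write(burned, 1)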
# Open GP state polygon data
import geopandas as gpd
fp_state="Data_07/GP_state.shp"
state_BD= gpd.GeoDataFrame.from_file(fp_state)
state_BD.geom_type.head()
from osgeo import ogr, gdal
from osgeo import gdalconst
import numpy as np
filename = "Data_07/MLR_PRED_SOC.tif"
gdal_data = gdal.Open(filename)
gdal_band = gdal_data.GetRasterBand(1)
nodataval = gdal_band.GetNoDataValue()
# convert to a numpy array
data_array = gdal_data.ReadAsArray().astype(float)
data_array
# Replace zeros with missing values if necessary
if np.any(data_array == 0):
    data_array[data_array == 0] = np.nan
%matplotlib inline
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (7,6))
plt.imshow(data_array,
cmap='jet',
origin='lower')
plt.colorbar(cax = fig.add_axes([1, 0.25, 0.05, 0.5]))
plt.title("MLR Predicted \n SOC (mg/g)")
plt.show()