As I discussed earlier that zero-dimensional points are essential for geographical features like wells, soil-sampling locations that can be best expressed by a single reference point. Often point vector data consist of several points into a multi-point structure, with a single attribute record. For example, all the soil sampling points could be considered as a single geometry. The main drawback of a point feature, however, is that they cannot be used to make measurements (as you can with a polygon)
In this section, we will use 650 soil samples from Colorado, Kansas, New Mexico, and Wyoming. These samples were collected by the United States Geological Survey (USGS) as a part of the USGS Geochemical Landscapes Project [Smith et al., 2011]. We will learn how to join spatial and non-spatial data and create a dataframe for spatial interpolation.
This exercise consists of following major steps:
Part-1: Create a Spatial Point Data Frame
We will use following spatial data to create the data frame. The data could be download from here.
GP_ID.: Containts FIPS code, State and County names
Soil sampling locations (GP_GPS.CSV): This file contains coordinates of 473 soil sampling sites in Colorado, Kansas, New Mexico, and Wyoming. The soil samples were collected by United States Geological Survey (USGS) throughout the A horizon with sampling densities of 1 sample per ~1600 km2 (Smith et al., 2011).
Soil organic carbon (SOC) (GP_SOC.csv): SOC concentration of these 473 sites. The concentration of SOC was predicted by means of mid-infrared (MIR) spectroscopy and partial least squares regression (PLSR) analysis described previously (Janik et al., 2007; Ahmed et al., 2017).
Environmental covariables (gp_point_data.csv)
Prediction grid data
import seaborn as sns
sns.set_style("white")
sns.set(font_scale=1.5)
## Create Woking directory
import os
path= "E:\GitHub\geospatial-python-github.io\geospatial-python.io\Lesson_07_working_with_point_data"
os.chdir(path)
print("Current Working Directory " , os.getcwd())
import pandas as pd
# Read ID file
f_id = 'Data_07/GP_ID.csv'
id_df = pd.read_csv(f_id)
id_df.head()
import pandas as pd
# Read GPS of soil sampling locations
f_gps = 'Data_07/GP_GPS.csv'
gps = pd.read_csv(f_gps)
gps.head()
import pandas as pd
# Soil OC data
f_soc = 'Data_07/GP_SOC.csv'
soc = pd.read_csv(f_soc)
soc.head()
import pandas as pd
f_co = 'Data_07/gp_point_data.csv'
co_var = pd.read_csv(f_co)
# Convert NLCD to integer
co_var['NLCD']=co_var['NLCD'].astype(int)
co_var.head()
import pandas as pd
f_nlcd = 'Data_07/NLCD_ID.csv'
nlcd_id = pd.read_csv(f_nlcd)
nlcd_id.head()
A GeoDataFrame needs a shapely object. We use geopandas points_from_xy() to transform Longitude and Latitude into a list of shapely.Point objects and set it as a geometry while creating the GeoDataFrame. (note that points_from_xy() is an enhanced wrapper for [Point(x, y) for x, y in zip(df.Longitude, df.Latitude)])
import geopandas as gpd
gps_point = gpd.GeoDataFrame(
gps, geometry=gpd.points_from_xy(gps.long, gps.lat))
gps_point.head()
# Plot data
gps_point.plot()
We will transform point shape file from WGS 1984 to Albers Equal Area Conic NAD1983 (EPSG:5070). We could reproject the data from one CRS to another CRS using the geopandas method (i.e. function belonging to geopandas objects): to_crs(). Function .to_crs() will only work if your original spatial object has a CRS assigned to it AND if that CRS is the correct CRS for that data!
print(gps_point.crs)
We have define the projection before change CRS to projected coordinatesystem. We assign the WGS84 latitude-longitude CRS (EPSG:4326) to the crs attribute:
# Define CRS
gps_point.crs = ({'init' :'epsg:4326'})
Now we will repoject gps_point data using the CRS of GP_STATE shape file. First we extract CRS from State shape file
import geopandas as gpd
# US State shape files
fn="Data_07/GP_STATE.shp"
GP_STATE= gpd.read_file(fn)
# CRS of state polygon
GP_crs=GP_STATE.crs
GP_crs
# Repojection to epsg:5070 projection
point_proj=gps_point.to_crs(GP_crs)
import matplotlib.pyplot as plt
# Make subplots that are next to each other
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(12, 8))
# Plot the data in WGS84 CRS
gps_point.plot(ax=ax1, facecolor='gray');
# Add title
ax1.set_title("WGS84");
# Plot the one
point_proj.plot(ax=ax2, facecolor='blue');
# Add title
ax2.set_title("Albers Equal Area Conic");
# Remove empty white space around the plot
plt.tight_layout()
Attribute joins are accomplished using the merge() function. In general, it is recommended to use the merge method called from the spatial dataset. The stand-alone merge() function will work if the GeoDataFrame is in the left argument; if a DataFrame is in the left argument and a GeoDataFrame is in the right position, the result will no longer be a GeoDataFrame.
First we will create PD frame with all non-spatial data. We will use pd.merge() function to create a dataframe.
from functools import reduce
df1 = [id_df, soc,co_var]
df2 = reduce(lambda left,right: pd.merge(left,right,on='SiteID'), df1)
df2
Now we add NLCD Description column to the data-frame
from functools import reduce
d3 = [df2, nlcd_id]
df = reduce(lambda left,right: pd.merge(left,right,on='NLCD'), d3)
df
Now, we will merge GeoDataFrame (point_proj) with a pandas DataFrame (df) by SiteID of each each sampling location. Be
point_spdf = point_proj.merge(df, on='SiteID')
point_spdf.head()
point_spdf.to_file("Data_07/gp_point_all_data.shp")
Now we will extract projected geometry (m) from the spatial point data-frame.
# Get X (m) and Y (m) Values from geometry column
point_spdf['x'] = point_spdf.geometry.x
point_spdf['y'] = point_spdf.geometry.y
point_spdf.head()
Now we will convert geopandas data frame to pandas data frame without geometry column. We will use this data-frame for futher analysis.
import pandas as pd
point_df = pd.DataFrame(point_spdf.drop(columns=['geometry']))
# Organize variables
point_mf= point_df[['SiteID', 'long', 'lat', 'x', 'y', 'FIPS','STATE', 'COUNTY',
'SOC', 'DEM','Slope','MAP', 'MAT','NDVI','NLCD','NLCD_DES']]
point_mf.head()
point_mf.to_csv('Data_07/gp_all_data.csv', header =True, index=False)
import seaborn as sns
p, ax = plt.subplots(figsize=(10, 6))
ax=sns.boxplot(y="SOC",
x="NLCD_DES",
data=point_mf,
palette="Blues_d")
ax.axes.xaxis.label.set_text("NLCD")
ax.axes.yaxis.label.set_text("Soil OC (mg/g)")
#plt.title('Mean ± SE of Soil OC')
import plotly.express as px
fig = px.box(point_mf,
x="NLCD_DES",
y="SOC",
width=800,
height=500)
fig.update_layout(
title={
'text': "Sourface Soil Organic (mg/g)",
'y':0.95,
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'},
xaxis_title="NLCD",
yaxis_title="Soil OC",
font=dict(
family="sans-serif, monospace",
size=18,
color="#7f7f7f"))
fig.show()