Geo-processing is a GIS operation used to manipulate spatial data. In this exercise we will learn following Geo-processing operations of vector data in Python.
We will use following data set, and data could available for download from here.
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from geopandas import GeoDataFrame
from shapely.geometry import Point
from shapely.geometry import box
from mpl_toolkits.axes_grid1 import make_axes_locatable
import adjustText as aT
path= "E:/GitHub/geospatial-python-github.io/Lesson_04_geoprocessing_vector_data"
os.chdir(path)
#print("Current Working Directory " , os.getcwd())
Clipping spatial data is a basic GIS task. For vector data, it involves removing unwanted features outside of an area of interest. For example, you might want to do some geospatial modeling covering a area in New York state, but we may have data for USA, in this case you need to apply clipping function to remove area outside of the New York State. It acts like a cookie cutter to cut out a piece of one feature class using one or more of the features in another feature class.
You can do this several ways.In this exercise, we will clip out other state or counties from US State and County polygon shape files, expect our area of interest (for example New York).
# US State shape files
fp_state="Data_04/US_STATE.shp"
state= gpd.read_file(fp_state)
# US County
fp_county="Data_04/US_COUNTY.shp"
county= gpd.read_file(fp_county)
%matplotlib inline
import matplotlib.pyplot as plt
# Make subplots that are next to each other
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(12, 8))
# Plot US State
state.plot(alpha=1, color="white", edgecolor="gray", ax=ax1)
# Add title
ax1.set_title("US State");
# Plot US County
county.plot(alpha=1, color="white", edgecolor="gray", ax=ax2);
# Add title
ax2.set_title("US Couty");
# Remove empty white space around the plot
plt.tight_layout()
# Get state names
state['STATE']
NY_state = state[state['STATE'] == 'New York']
p1=NY_state.plot(color="white", edgecolor="gray")
p1.set_title('New York State)')
county.head()
The most useful function to select a area of interest using normal pandas command to create a sub-set of data.
NY_county = county[county['STATE'] == 'New York']
p2=NY_county.plot(color="white", edgecolor="gray")
p2.set_title('New York County)')
state[state.REGION == 'Northeast']. plot(color="white", edgecolor="gray")
You can select multiple states using follwing command:
GP_states_NAME = ['Colorado', 'Kansas','New Mexico', 'Wyoming']
GP_STATE= state[state['STATE'].isin(GP_states_NAME)]
GP_STATE.head()
GP_STATE.plot(color="white", edgecolor="gray")
# Write as ESRI shape file
GP_STATE.to_file("Data_04/GP_STATE.shp")
To remove the points that are outside of your study area, you can clip the data. Removing or clipping data can make the data smaller and inturn plotting and analysis faster.
One way to clip a points layer is to:
Create a mask where every point that overlaps the polygon that you wish to clip to is set to true
Apply that mask to filter the geopandas dataframe.
To clip the data you first create a unified polygon object that represents the total area covered by your clip layer. If your study area contains only one polygon you can use boundary.geometry[0] to select the first (and only) polygon n the layer. You can also use .unary_union if you have many polygons in your clip boundary. unary.union will combine all of the polygons in your boundary layer into on vector object to use for clipping. Next you can use the .intersects() method to select just the points within the pop_places object that fall within the geometry in the poly object.
We will clip GP_POINT_SOC_PROJ shape file by CO_State shape. file.
gp_point="Data_04/GP_POINT_SOC_PROJ.shp"
gp_point_proj= gpd.read_file(gp_point)
fig, ax = plt.subplots()
GP_STATE.plot(ax=ax, color="white", edgecolor="gray")
gp_point_proj.plot(ax=ax, color='blue', markersize=5)
CO_state = state[state['STATE'] == 'Colorado']
CO_state.plot()
CO_poly = CO_state.geometry.unary_union
CO_point = gp_point_proj[gp_point_proj.geometry.intersects(CO_poly)]
fig, ay = plt.subplots()
CO_state.plot(ax=ay, color="white", edgecolor="gray")
CO_point.plot(ax=ay, color='blue', markersize=5)
ay.set_title('Soil Sampling Location at CO State')
We can save each each state s into separate Shapefiles and named the file according to the state names. Before that you have to group data. The grouping operations can be really handy when dealing with Shapefiles. Doing similar process manually would be really laborious and error-prone. One really useful function that can be used in Pandas/Geopandas is .groupby(). Group by function is useful to group data based on values on selected column(s).
# Group the data by column 'STATE'
grouped = GP_STATE.groupby('STATE')
grouped
# Iterate over the group object
for key, values in grouped:
GP_STATE = values
# Let's see what is the LAST item
GP_STATE
outFolder =path
# Create a new folder called 'Results' (if does not exist) to that folder using os.makedirs() function
resultFolder = os.path.join(outFolder, 'GP_STATES')
if not os.path.exists(resultFolder):
os.makedirs(resultFolder)
# Iterate over the
for key, values in grouped:
# Format the filename (replace spaces with underscores)
outName = "%s.shp" % key.replace(" ", "_")
# Print some information for the user
print("Processing: %s" % key)
# Create an output path
outpath = os.path.join(resultFolder, outName)
# Export the data
values.to_file(outpath)
Dissolve aggregate features based on the attribute. It is an important tools that we may need to perform regularly in spatial data processing.
In this excercise will create GP_state boundary after dissloving states boundary of CO, KS, NM and WY.
gp_state= gpd.read_file("Data_04/GP_STATE.shp")
gp_state.head()
gp_bd = gp_state.dissolve(by='STATE',aggfunc='sum')
gp_bd.plot()
gp_bd.head()
Union or Merge combines two or multiple spatial objects and a create new features where geometry and attributes of input features retain.
We will use state boundary of CO, AK, NY and WY to create a new feature class using .concat function of geopandas package.
CO = gpd.read_file('Data_04/GP_STATES/Colorado.shp')
KS = gpd.read_file('Data_04/GP_STATES/Kansas.shp')
NM = gpd.read_file('Data_04/GP_STATES/New_Mexico.shp')
WY = gpd.read_file('Data_04/GP_STATES/Wyoming.shp')
# Merge files
gdf_merge = gpd.GeoDataFrame(pd.concat([CO, KS,NM, WY]))
gdf_merge.plot()
You can merge hundreds of spatial polygons in a folder with similar geometry and attribute table using .concat function function in a loop. First, you have to create a list these shape files using folder.glob function, then use for loop to read all the files using .read_file function.
from pathlib import Path
folder = Path("Data_04/GP_STATES")
shapefiles = folder.glob("*.shp")
gdf = pd.concat([
gpd.read_file(shp)
for shp in shapefiles
]).pipe(gpd.GeoDataFrame)
gdf.to_file(folder / 'GP_STATES_merge.shp')
gp_state_merge= gpd.read_file("Data_04/GP_STATES/GP_STATES_merge.shp")
gp_state_merge.plot()
gp_state_merge.head()
STATE = gpd.read_file('Data_04/US_STATE.shp')
PARK = gpd.read_file('Data_04/Yellow_Stone.shp')
PARK.plot()
#park_state=gpd.sjoin(STATE,PARK, op='intersects')
park_state=gpd.overlay(STATE,PARK, how='intersection')
park_state.plot()
park_state.head()
In the shapefiles we have polygons which describe the shape of three states: Idaho, Montana and Wyoming. To place a label on each states we ideally need to find an identifiable point which exists within each polygon so that we can say where we want the text to be placed. Now we will plot th intersect plot with state names.
park_state["center"] =park_state["geometry"].centroid
park_state_points = park_state.copy()
park_state_points.set_geometry("center", inplace = True)
ax = park_state.plot(figsize = (6, 6), color = "whitesmoke", edgecolor = "lightgrey", linewidth = 0.5)
texts = []
for x, y, label in zip(park_state_points.geometry.x, park_state_points.geometry.y, park_state_points["STATE_1"]):
texts.append(plt.text(x, y, label, fontsize = 8))
park_map=aT.adjust_text(texts, force_points=0.2, force_text=0.3, expand_points=(1,1), expand_text=(1,1),
arrowprops=dict(arrowstyle="-", color='grey', lw=0.5))
ax.set_title('Yellow Stone National Park')
Buffering creates an envelope of space around selected features in a vector data. It is sometimes referred to as a zone of a specified distance around a polygon, line, or point feature. Buffering is often used for proximity analysis. In this section, we will create 400 m buffer zones around the road network and soil sampling points of CO. Such a buffer could be used later on to examine the extent of farmland or sampling points within the buffer, etc. We will use a small part of road-network of Ononda County to create 100 m buffer around them.
%matplotlib inline
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import shapely.geometry as geoms
from shapely.geometry import box
road = gpd.read_file('Data_04/Ononda_Street_PROJ.shp')
# buffer_lines =road.geometry.apply(lambda g: g.buffer(250, cap_style=1))
road_buffer= road.buffer(250)
fig, ay = plt.subplots()
road_buffer.plot(ax=ay, color="white", edgecolor="gray")
road.plot(ax=ay, color='red', markersize=5)
ay.set_title('250 m Buffer')
points = gpd.read_file('Data_04/GP_POINT_SOC_PROJ.shp')
points.head()
# Remove columns
columns=['SiteID','Longitude','Latitude','StateID','LandCover1','LandCover2','SOC']
points.drop(columns, axis=1, inplace=True)
points.head()
# Create a buffered polygon layer from your plot location points
points_poly =points.copy()
# Buffer each point using a 100 meter circle radius and replace the point geometry with the new buffered geometry
points_poly["geometry"] = points_poly.geometry.buffer(100)
points_poly.head()