Unlike vector data, a raster data consists of cells or pixels organized into rows and columns as a matrix where each cell contains a value representing geographical feature on the earth. The size of the area on the surface that each pixel covers is known as the spatial resolution of the image. For instance, an image that has a 1 m spatial resolution means that each pixel in the image represents a 1 m x 1 m area. There are two types of raster data: continuous and discrete. An example of discrete raster data is Land use raster. Data types are flexible, including discrete and categorical data, such as soil or land-use maps, or continuous data as in digital elevation models, precipitation gradient maps, or pollutant concentration maps, etc.
Generally, two types of raster use in GIS and remote sensing application: a single band, or layer measure of a single characteristic of spatial area and multiple bands raster contains multiple spatially corresponding matrices of cell values representing the same spatial area. An example of a single-band raster data set is a digital elevation model (DEM). Each cell in a DEM contains only one value representing elevation of earth surface. Most satellite imagery has multiple bands, typically containing values within band of the electromagnetic spectrum.
In this exercise, we will use SRTM 90 Digital Elevation Model of Onondaga County, New York State which was downloaded from CGIAR-CSI.
This exercise will cover following raster processing in python:
The data set use in this exercise could be found here.
import os
path= "E:/GitHub/geospatial-python-github.io/Lesson_06_working_with_raster_data/"
os.chdir(path)
print("Current Working Directory " , os.getcwd())
%matplotlib inline
import sys
from glob import glob
from osgeo import ogr, gdal
from osgeo import gdalconst
import subprocess
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
from matplotlib.colors import ListedColormap, BoundaryNorm
import numpy as np
import seaborn as sns
import geopandas as gpd
import pycrs
import fiona
from fiona.crs import from_epsg
from shapely.geometry import box
from shapely.geometry import Point
import shapely.geometry as geoms
import rasterio as rio
from rasterio.plot import show
import rasterio.warp
import rasterio.shutil
from rasterio.warp import calculate_default_transform, reproject, Resampling
from rasterio.plot import plotting_extent
from rasterio.plot import show_hist
from rasterio.mask import mask
from rasterio.merge import merge
from rasterio import Affine, MemoryFile
from rasterio.enums import Resampling
from rasterio import plot
import rasterstats as rs
import georasters as gr
from rastertodataframe import raster_to_dataframe
import earthpy.spatial as es
import earthpy.plot as ep
import earthpy as et
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import cross_validate
print(f"Python {sys.version}")
print(f"geopandas {gpd.__version__}")
print(f"osgeo {gdal.__version__}")
print(f"rasterio {rio.__version__}")
sns.set_style("white")
sns.set(font_scale=1.5)
In this exercise, we will use SRTM 90 Digital Elevation Model of Onondaga County, New York State which was downloaded from CGIAR-CSI. We will use rasterio package to open and visualization raster in Python.
dem_path = "Data_06/RASTER/DEM/Onondaga_DEM.tif"
dem=rio.open(dem_path)
dem.crs
dem.width
# raster column
dem.height
dem.count
type(dem)
DEM_meta=dem.meta
print(DEM_meta)
show(dem,cmap='RdYlGn')
from rasterio.plot import show_hist
show_hist(dem, bins=20, lw=0.0, stacked=False, alpha=0.3,
histtype='stepfilled', title="Histogram")
You can also open a raster layer with gdal.Open()
raster = gdal.Open(dem_path)
type(raster)
gt =raster.GetGeoTransform()
print (gt)
pixelSizeX = gt[1]
pixelSizeY =-gt[5]
print (pixelSizeX)
print (pixelSizeY)
raster.GetProjection()
data = raster.GetRasterBand(1).ReadAsArray()
plt.imshow(data,cmap='gray')
plt.show()
We can export a raster file in python using the rasterio write() function.
Use rio.open() to create a new blank raster ‘template’. Then write the DEM numpy array to to that template using dst.write().
outTIF= "Data_06/RASTER/DEM/Onondaga_DEM_NEW.tif"
dem_meta=dem.profile
with rio.open(outTIF, 'w', **dem_meta) as dst:
dst.write(dem.read(1), 1)
src = rio.open("Data_06/RASTER/DEM/Onondaga_DEM_NEW.tif")
show(src)
Now I will show how to write raster with GDAL driver. We will use a method to save the data in a format provided by the user.
g = gdal.Open ("Data_06/RASTER/DEM/Onondaga_DEM_NEW.tif" )
# Get the projection information
g.GetProjection()
# Get the x, y and number of bands from the original file
x_size, y_size, n_bands = g.RasterXSize, g.RasterYSize, g.RasterCount
data = g.ReadAsArray ()
# Get a handler to a driver
driver = gdal.GetDriverByName ( "HFA" )
data
# Next line creates the output dataset with
# 1. The filename ("Onondaga_DEM.img")
# 2. The raster size (x_size, y_size)
# 3. The number of bands
# 4.The data type (in this case, Byte.
# Other typical values might be gdal.GDT_Int16
# or gdal.GDT_Float32)
dataset_out = driver.Create( "Data_06/RASTER/DEM/Onondaga_DEM_gdal.img", x_size, y_size, n_bands, \
gdal.GDT_Byte )
# Set the output geotransform by reading the input one
dataset_out.SetGeoTransform ( g.GetGeoTransform() )
# Set the output projection by reading the input one
dataset_out.SetProjection ( g.GetProjectionRef() )
# Now, get band # 1, and write our data array.
# Note that the data array needs to have the same type
# as the one specified for dataset_out
dataset_out.GetRasterBand ( 1 ).WriteArray ( data )
# This bit forces GDAL to close the file and write to it
dataset_out = None
!gdalinfo "Data_06/RASTER/DEM/Onondaga_DEM_gdal.img"
For loading multi-bands image in Python, we will use a sub-set of Landsat 8 multispectral image covering Onondaga county of New York state. This image was downloaded from USGS Earth Explore.
b2="Data_06/RASTER/LC08_20150706_subset/LC08_L1TP_015030_20150716_20170226_01_T1_B2.TIF"
b3="Data_06/RASTER/LC08_20150706_subset/LC08_L1TP_015030_20150716_20170226_01_T1_B3.TIF"
b4="Data_06/RASTER/LC08_20150706_subset/LC08_L1TP_015030_20150716_20170226_01_T1_B4.TIF"
b5="Data_06/RASTER/LC08_20150706_subset/LC08_L1TP_015030_20150716_20170226_01_T1_B5.TIF"
b2_r=rio.open(b2)
b3_r=rio.open(b3)
b4_r=rio.open(b4)
b5_r=rio.open(b5)
# Initialize subplots
import matplotlib.pyplot as plt
%matplotlib inline
fig, (ax1, ax2, ax3, ax4) = plt.subplots(ncols=4, nrows=1, figsize=(10, 4), sharey=True)
# Plot 4 bands
show(b2_r, ax=ax1)
show(b3_r, ax=ax2)
show(b4_r, ax=ax3)
show(b5_r, ax=ax4)
# Add titles
ax1.set_title("Band-2")
ax2.set_title("Band-3")
ax3.set_title("Band-4")
ax4.set_title("Band-5")
Some remote sensing datasets are stored with each band in a separate file. However, often you want to use all of the bands together in your analysis. For example you need all of the bands together in the same file or “stack” in order to plot a color RGB image. EarthPy has a stack() function that allows you to take a set of .tif files that are all in the same spatial extent, CRS and resolution and either export them together a single stacked .tif file or work with them in Python directly as a stacked numpy array.
To begin using the EarthPy stack() function, import the needed packages and create an array to be plotted. Below you plot the data as continuous with a colorbar using the plot_bands() function.
We will use es.stack() function of earthpy library to create a raster stack of multi-band raster. It need three steps:
You will use the glob function see the files in landsat data directory
glob ("Data_06/RASTER/LC08_20150706_subset/*")
Now we will create a list for band (2,3,4 and 5)
multi_bands=glob( "Data_06/RASTER/LC08_20150706_subset/*B[2:3:4:5]*.tif")
multi_bands
Now you have a list of all of the landsat bands in your landsat collections folder. You could chose to open each file individually using the rio.open (rasterio library) function.
Remember that Python uses 0 based indexing so band 3 is actually at index [2] not [3].
with rio.open(multi_bands[3]) as src:
landsat_band4 = src.read()
ep.plot_bands(landsat_band4[0],
title="Landsat NIR (Band-5)",
scale=False,
figsize=(5, 6))
plt.show()