I have published a comprehensive tutorial on Geospatial Data Science in R. Because of high demand for a similar tutorial in Python, I have attempted to reproduce my R tutorial in Python. I created this tutorial for students from different disciplines such as agriculture, soil science, environmental health, environmental engineering, and data science. Most of them have no prior knowledge of GIS, remote sensing, or any other area of geoinformatics. But to work with spatial data, they need to know how to process spatial data from different domains and to be familiar with some basic spatial data analysis techniques.
Python is an open-source scripting language that is used in different GIS software packages (such as ArcGIS, QGIS, PostGIS). It is efficient for big data analysis and supports most data formats. Developing a Geospatial Analysis tutorial in Python as comprehensive as the R one is challenging for me, since my Python coding skills are not as good as my R skills.
This tutorial has been tested to work on Windows 10 with Anaconda3 64 bit, using conda.
First, you need to install Python (Anaconda Python) and the Python modules needed to perform various GIS tasks.
Install Anaconda on your computer by double-clicking the installer and choosing the directory you want (this needs admin rights). Install it for all users and use the default settings.
Creating a new environment is not strictly necessary, but installing geospatial packages from different channels may cause dependency conflicts, so it is good practice to install the geospatial stack in a clean, fresh environment.
The following command, run in the command prompt, creates a new environment at C:\PyGeo. Because the environment is created with --prefix, it is identified by its path rather than by a name:
# (base) c:\users\zahmed2>conda create --prefix C:\PyGeo python=3.7
To activate this environment, use
# (base) c:\users\zahmed2>conda activate C:\PyGeo
To deactivate an active environment, use
# (C:\PyGeo) c:\users\zahmed2>conda deactivate
It is easy to install Jupyter notebooks with the following command:
conda install -y jupyter
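Once Jupyter is installed, it can be useful to confirm which environment a notebook or script is actually running in. The following is a minimal sketch using only the standard library; the helper name `environment_info` is hypothetical and is not part of conda or Jupyter.

```python
import sys


def environment_info():
    """Collect basic facts about the running Python interpreter."""
    return {
        # Root directory of the active environment (for example, the PyGeo folder)
        "prefix": sys.prefix,
        # Full path of the Python executable in use
        "executable": sys.executable,
        # Interpreter version as a "major.minor.micro" string
        "version": ".".join(str(part) for part in sys.version_info[:3]),
    }


for key, value in environment_info().items():
    print(f"{key}: {value}")
```

If the printed prefix is not the environment you created, the notebook was most likely started from a different environment.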
numpy: the fundamental package for array computing with Python
pandas: powerful data structures for data analysis, time series, and statistics
scipy: scientific library for Python
matplotlib: Python plotting package
scikit-learn: a set of Python modules for machine learning and data mining
statsmodels: statistical computations and models for Python
bokeh: interactive plots and applications in the browser from Python
h2o: fast, scalable machine learning for Python
tensorflow: an open source machine learning framework
conda install scipy
conda install -c anaconda scikit-learn
conda install -c anaconda pandas
conda install -c anaconda numpy
conda install -c anaconda statsmodels
conda install -c anaconda h2o
conda install -c conda-forge pillow
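After running the install commands, one way to see which of the packages listed above are available is to ask Python whether each module can be located. This is only a sketch: the helper `check_installed` and the `PACKAGES` table are illustrative names, not part of any of these libraries. Note that two import names differ from the package names (scikit-learn is imported as `sklearn`, pillow as `PIL`).

```python
import importlib.util

# Map each package name to the module name used in an import statement.
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "statsmodels": "statsmodels",
    "bokeh": "bokeh",
    "h2o": "h2o",
    "tensorflow": "tensorflow",
    "pillow": "PIL",
}


def check_installed(packages):
    """Return {package: True/False}; True means the module can be located."""
    status = {}
    for package, import_name in packages.items():
        # find_spec locates a top-level module without importing it
        status[package] = importlib.util.find_spec(import_name) is not None
    return status


for package, found in check_installed(PACKAGES).items():
    print(f"{package:15s} {'installed' if found else 'MISSING'}")
```

Any package reported as MISSING can then be installed into the active environment with the matching conda command.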
GeoPandas is an open source project to make working with geospatial data in Python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. GeoPandas further depends on fiona for file access and on descartes and matplotlib for plotting.
GeoPandas depends for its spatial functionality on a large geospatial, open source stack of libraries (GEOS, GDAL, PROJ). See the Dependencies section below for more details. Those base C libraries can sometimes be a challenge to install. Therefore, I advise you to closely follow the recommendations below to avoid installation problems. For installation instructions, please see http://geopandas.org/install.html
Required dependencies:
numpy
pandas (version 0.23.4 or later)
shapely (interface to GEOS)
fiona (interface to GDAL)
pyproj (interface to PROJ)
Shapely is a BSD-licensed Python package for manipulation and analysis of planar geometric objects. It is based on the widely deployed GEOS (the engine of PostGIS) and JTS (from which GEOS is ported) libraries. Shapely is not concerned with data formats or coordinate systems, but can be readily integrated with packages that are.
Fiona reads and writes geographic data files and thereby helps Python programmers integrate geographic information systems with other computer systems. Fiona contains extension modules that link the Geospatial Data Abstraction Library (GDAL).
pyproj is a Python interface to PROJ (a cartographic projections and coordinate transformations library).
six is a Python 2 and 3 compatibility library.
Rtree provides spatial indexing for Python, for quick spatial lookups.
conda install --channel conda-forge geopandas
The GDAL Python package and extensions are a number of tools for programming and manipulating the GDAL Geospatial Data Abstraction Library. Actually, it is two libraries – GDAL for manipulating geospatial raster data and OGR for manipulating geospatial vector data.
conda install gdal
The Python Shapefile Library (PyShp) provides read and write support for the Esri Shapefile format. The Shapefile format is a popular Geographic Information System vector data format created by Esri. You can install PyShp using the following command in your Terminal (Ubuntu) or cmd (Windows):
conda install -c conda-forge pyshp
To read a shapefile, create a new "Reader" object and pass it the name of an existing shapefile. The shapefile format is actually a collection of three files (.shp, .shx, and .dbf). You can specify either the base filename of the shapefile or the complete filename of any of the component files.
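To make the three-file convention concrete, here is a small standard-library sketch that maps either a base filename or any component filename to the three files that make up a shapefile. It does not use PyShp itself; the helper name `shapefile_components` and the path `data/roads` are hypothetical.

```python
from pathlib import Path

# The three mandatory component files of an Esri shapefile
SHAPEFILE_EXTENSIONS = (".shp", ".shx", ".dbf")


def shapefile_components(name):
    """Return {extension: path} for a base name or any component filename."""
    path = Path(name)
    # Strip the extension only if it is one of the shapefile extensions,
    # so that base names containing dots are left untouched.
    if path.suffix.lower() in SHAPEFILE_EXTENSIONS:
        path = path.with_suffix("")
    return {ext: path.with_name(path.name + ext) for ext in SHAPEFILE_EXTENSIONS}


# Both calls resolve to the same three files.
for ext, component in shapefile_components("data/roads.dbf").items():
    print(ext, component, "exists" if component.exists() else "not found")
print(shapefile_components("data/roads") == shapefile_components("data/roads.dbf"))
```

Checking that all three files exist before trying to read a shapefile can give a clearer error message when one component is missing.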
Earthpy is a set of helper functions to make working with spatial data in open source tools easier. This package is maintained by Earth Lab and was originally designed to support the earth analytics education program.
conda install -c conda-forge earthpy
Rasterio can read and write GeoTIFF and other raster formats and provides a Python API based on N-D arrays and GeoJSON. Rasterio can be installed from the conda-forge channel with:
conda install -c conda-forge rasterio
The georasters package is a Python module that provides a fast and flexible tool to work with GIS raster files. It provides the GeoRaster class, which makes working with rasters quite transparent and easy. In a way it tries to do for rasters what GeoPandas does for geometries.
conda install -c conda-forge georasters
rasterstats is a Python module for summarizing geospatial raster datasets based on vector geometries. It includes functions for zonal statistics and interpolated point queries. The command-line interface allows for easy interoperability with other GeoJSON tools.
conda install -c conda-forge rasterstats
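To illustrate what "zonal statistics" means, here is a minimal sketch in plain Python. It is not the rasterstats API: rasterstats works with vector geometries and raster files, whereas this sketch assumes the zones have already been rasterized into a grid of integer labels with the same shape as the value grid. The function name `zonal_stats_sketch` and the sample grids are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def zonal_stats_sketch(values, zones):
    """Summarize raster cell values per zone label (count, min, max, mean)."""
    grouped = defaultdict(list)
    # Walk both grids in parallel and collect each cell value under its zone
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            grouped[zone].append(value)
    return {
        zone: {
            "count": len(cells),
            "min": min(cells),
            "max": max(cells),
            "mean": mean(cells),
        }
        for zone, cells in grouped.items()
    }


# A 2 x 3 raster of values and a matching grid of zone labels
values = [[1.0, 2.0, 10.0],
          [3.0, 20.0, 30.0]]
zones = [[1, 1, 2],
         [1, 2, 2]]

stats = zonal_stats_sketch(values, zones)
print(stats[1])  # {'count': 3, 'min': 1.0, 'max': 3.0, 'mean': 2.0}
print(stats[2])  # {'count': 3, 'min': 10.0, 'max': 30.0, 'mean': 20.0}
```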
The mgwr module provides functionality to calibrate multiscale (M)GWR as well as traditional GWR. It is built upon the sparse generalized linear modeling (spglm) module.
conda install -c conda-forge mgwr
PySAL, the Python Spatial Analysis Library, is an open source, cross-platform library written in Python for geospatial data science, with an emphasis on geospatial vector data. It supports the development of high-level applications for spatial analysis.
conda install -c anaconda pysal
OSMnx is a package to easily download, model, project, visualize, and analyze complex street networks from OpenStreetMap in Python with NetworkX.
conda install -c conda-forge osmnx
Cartopy is a Python package designed for geospatial data processing in order to produce maps and other geospatial data analyses. Cartopy makes use of the powerful PROJ.4, NumPy and Shapely libraries and includes a programmatic interface built on top of Matplotlib for the creation of publication-quality maps.
conda install cartopy
OWSLib is a Python package for client programming with Open Geospatial Consortium (OGC) web service (hence OWS) interface standards, and their related content models.
OWSLib was buried down inside PCL, but has been brought out as a separate project in r481.
conda install -c conda-forge owslib
mapclassify is an open-source Python library for choropleth map classification. It is part of a refactoring of PySAL.
conda install -c conda-forge mapclassify
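Choropleth classification means grouping numeric values into a small number of classes, each of which is drawn in its own colour. The sketch below shows one simple scheme, equal-interval classification, in plain Python. It is an illustration of the idea rather than mapclassify's own implementation, and the function names `equal_interval_bounds` and `classify` are hypothetical.

```python
from bisect import bisect_left


def equal_interval_bounds(values, k):
    """Return k upper class bounds that split [min, max] into equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / k
    bounds = [low + width * i for i in range(1, k)]
    bounds.append(high)  # use the exact maximum as the last bound
    return bounds


def classify(values, bounds):
    """Give each value the index of the first class whose upper bound is >= the value."""
    return [bisect_left(bounds, value) for value in values]


values = [0, 5, 10, 15, 20]
bounds = equal_interval_bounds(values, k=4)
classes = classify(values, bounds)
print(bounds)   # [5.0, 10.0, 15.0, 20]
print(classes)  # [0, 0, 1, 2, 3]
```

Each class index can then be mapped to a colour when the polygons are drawn.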
descartes lets you use geometric objects as matplotlib paths and patches.
conda install -c conda-forge descartes
pysheds provides simple and fast watershed delineation in Python.
conda install -c conda-forge pysheds
RichDEM is a library for high-performance terrain analysis.
conda install -c giswqs richdem
geoplot is a high-level Python geospatial plotting library. It’s an extension to cartopy and matplotlib which makes mapping easy.
conda install -c conda-forge geoplot
folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it in a Leaflet map via folium.
conda install -c conda-forge folium
The Remote Sensing and GIS Software Library (RSGISLib) is a collection of tools for processing remote sensing and GIS datasets. The tools are accessed using Python bindings or an XML interface.
Binary downloads, built using Python 3, are available for Windows, Linux and macOS through conda. You can get conda through the Anaconda or Miniconda Python distribution. The recommended way to install RSGISLib locally is from conda-forge, using the following commands on macOS and Linux:
conda create -n osgeo-env-v1 python=3.7
source activate osgeo-env-v1
conda install -c conda-forge rsgislib
python -m ipykernel install --user --name osgeo-env-v1 --display-name "Python 3.7 (OSGEO-ENV)"
GeostatsPy includes functions that run 2D workflows in GSLIB from Python (i.e. low-tech wrappers), Python translations and in some cases reimplementations of GSLIB methods, along with utilities to move between GSLIB's Geo-EAS data sets and DataFrames, and between grids and 2D NumPy arrays, and other useful operations such as resampling from regular datasets and rescaling distributions.
For installation, download the .whl file (geostatspy-0.0.2-py3-none-any.whl, 20.3 kB) and install it on your system, or install the same version directly with pip:
pip install geostatspy==0.0.2
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.
conda install py-xgboost
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.
The Keras library is a high-level API that runs on top of TensorFlow for building deep learning models. Often, a very complex deep learning network can be built with Keras in only a few lines of code. For installation of TensorFlow GPU in Python 3.7, please see Jeff Heaton's very useful GitHub site. I mostly follow his class website for regression analysis with Keras.
conda install tensorflow
or, for the GPU build:
conda install tensorflow-gpu
H2O is a fully open source, distributed in-memory machine learning platform with linear scalability. H2O supports the most widely used statistical & machine learning algorithms including gradient boosted machines, generalized linear models, deep learning and more.
conda install -c h2oai h2o
mlxtend is a library of Python tools and extensions for data science.
conda install mlxtend --channel conda-forge