Building Spatial Data

Compiling spatial data from raw latitude/longitude coordinate pairs can be tedious and error-prone, especially for social scientists without much background in GIS. This example shows how to convert raw latitude/longitude coordinate pairs to spatial data using our function df_to_gdf.

The data we are using are made-up latitude/longitude coordinate pairs for Person 1 (P1) and Person 2 (P2) in Pennsylvania, USA, from 2020-01-01 to 2020-08-01. See here for information about how we compiled the data. You can download all the data we will be using from this GitHub Repository. Adjust the file paths to match your own working directory when running the following steps.

To begin, we import the libraries we will use to read and manipulate the data.

[1]:
%matplotlib inline

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

Next, we load the example data.

[2]:
df = pd.read_csv('../data/example.csv')
df.head()
[2]:
pid timestamp latitude longitude
0 P2 2020-04-27 10:42:22.162176000 40.993799 -76.669419
1 P2 2020-06-02 01:12:45.308505600 39.946904 -78.926234
2 P2 2020-05-08 23:47:33.718185600 41.237403 -79.252317
3 P2 2020-04-26 14:31:12.100310400 41.991390 -77.467769
4 P2 2020-03-31 15:53:27.777897600 41.492674 -76.542921

We import our geodf module. The geodf module has a function df_to_gdf which takes three parameters:

  • df: This is the name of your DataFrame
  • x: This is the column name of your Longitude
  • y: This is the column name of your Latitude

Make sure that you pass your Longitude and Latitude columns to x and y, respectively.
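Because x and y are just column names, swapping them would most likely not raise an error; the points would simply land in the wrong place. A minimal sketch of a sanity check you could run before converting is shown here. The function name in_pa_bbox and the bounding-box limits are illustrative assumptions (rounded values for Pennsylvania), not part of gps2space.

```python
# Approximate bounding box for Pennsylvania in decimal degrees (WGS84).
# These limits are rounded assumptions for illustration, not survey values.
PA_LON_RANGE = (-80.6, -74.6)
PA_LAT_RANGE = (39.7, 42.3)


def in_pa_bbox(lon, lat):
    """Return True if (lon, lat) falls inside the rough Pennsylvania box."""
    return (PA_LON_RANGE[0] <= lon <= PA_LON_RANGE[1]
            and PA_LAT_RANGE[0] <= lat <= PA_LAT_RANGE[1])


# First row of the example data: longitude is x, latitude is y.
lon, lat = -76.669419, 40.993799

print(in_pa_bbox(lon, lat))  # coordinates passed in the correct order
print(in_pa_bbox(lat, lon))  # swapped order falls outside the box
```

A check like this only works when you know roughly where the data should be; for data from another region, the limits would need to change.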

[3]:
from gps2space import geodf
[4]:
gdf = geodf.df_to_gdf(df, x='longitude', y='latitude')
gdf.head()
[4]:
pid timestamp latitude longitude geometry
0 P2 2020-04-27 10:42:22.162176000 40.993799 -76.669419 POINT (-76.66942 40.99380)
1 P2 2020-06-02 01:12:45.308505600 39.946904 -78.926234 POINT (-78.92623 39.94690)
2 P2 2020-05-08 23:47:33.718185600 41.237403 -79.252317 POINT (-79.25232 41.23740)
3 P2 2020-04-26 14:31:12.100310400 41.991390 -77.467769 POINT (-77.46777 41.99139)
4 P2 2020-03-31 15:53:27.777897600 41.492674 -76.542921 POINT (-76.54292 41.49267)
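In the output above, the geometry column lists longitude first and latitude second, and the displayed values are rounded to five decimals while the latitude and longitude columns keep their full precision. The sketch below imitates that text form with plain Python to make the ordering explicit. The helper to_wkt_point is a hypothetical name used for illustration; it only mimics the display and is not how GeoPandas stores geometries internally.

```python
def to_wkt_point(lon, lat, ndigits=5):
    """Format a longitude/latitude pair as WKT-style text: POINT (x y)."""
    return f"POINT ({lon:.{ndigits}f} {lat:.{ndigits}f})"


# (pid, latitude, longitude) for the first two rows of the example data.
rows = [
    ("P2", 40.993799, -76.669419),
    ("P2", 39.946904, -78.926234),
]

for pid, latitude, longitude in rows:
    # Longitude is the x coordinate, so it goes first.
    print(pid, to_wkt_point(longitude, latitude))
```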

Now that the data are in spatial format, with a geometry column that stores the geographic location information, we can plot the point GeoDataFrame on a map of Pennsylvania, USA. Please note that the spatial data use the WGS84 datum and are not projected.
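Unprojected coordinates are angles in degrees, so a straight-line difference between two points is not a distance in meters or kilometers. The following sketch illustrates this with the first two rows of the example data, using the haversine formula on a spherical Earth. The function name haversine_km and the 6371 km mean radius are assumptions for illustration; this is not how gps2space computes distances.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# (longitude, latitude) of the first two rows of the example data.
p_a = (-76.669419, 40.993799)
p_b = (-78.926234, 39.946904)

# The Euclidean gap is expressed in degrees, which is not a length unit.
degree_gap = math.hypot(p_b[0] - p_a[0], p_b[1] - p_a[1])
print(f"Euclidean gap in degrees: {degree_gap:.3f}")
print(f"Great-circle distance: {haversine_km(*p_a, *p_b):.1f} km")
```

For plotting points over a boundary layer that uses the same coordinate system, the unprojected data are fine; the distinction matters mainly once you start measuring distances or areas.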

[5]:
pa = gpd.read_file('../data/pacounty.shp')
[6]:
ax = pa.boundary.plot(figsize=(12, 12), edgecolor='black', linewidth=0.6)
gdf.plot(ax=ax, color='r')

plt.show();
[Figure: the example points plotted in red over the Pennsylvania boundary map]

We can then save the spatial data to a shapefile (.shp).

[7]:
gdf.to_file('../data/example.shp')