Introduction
Before outlining the workflow for importing GeoParquet files into ArcGIS Pro, we need to understand why GeoParquet is worth using for spatial data integration.
GeoParquet is evolving rapidly, with new software libraries and tools appearing regularly. It builds on Apache Parquet, an open-source, columnar data storage format designed for efficient data storage and retrieval. Parquet offers excellent compression, which significantly reduces file sizes; it provides APIs in languages such as Python, Java, and C++, and it integrates seamlessly with Apache Arrow. However, while Parquet is excellent for storing large, complex datasets, it lacks geospatial support, which led to the creation of GeoParquet: a specification for storing geometries (typically encoded as WKB) together with coordinate reference system metadata in Parquet files.
Here are the key benefits of GeoParquet:
- Efficient compression that reduces cloud storage costs.
- Data skipping and column statistics that improve processing performance, because readers load only the data chunks they need.
- Scalability to very large datasets.
You can learn more about GeoParquet, a cloud-native geospatial format, including its benefits, goals, and features, at https://geoparquet.org/.
Technical Guide
Our goal in this article is to import a GeoParquet file into ArcGIS Pro, since the format is not natively supported in ArcGIS Pro.
After some research, I arrived at the following simple workflow for importing a Parquet file into ArcGIS Pro.
First, download a GeoParquet file from any source, for example:
https://source.coop/tabaqat/riyadh-places/riyadh_places.parquet
To import this Parquet file into ArcGIS Pro, you first need to check whether Python is installed on your device.
Run the `python --version` command in Windows Command Prompt or Windows PowerShell.
data:image/s3,"s3://crabby-images/791e3/791e351f1aa745b914a0ee125d3bc498483b9500" alt="Screenshot 2024-11-30 191216"
If Python is not installed, the following message appears: "Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Manage App Execution Aliases."
data:image/s3,"s3://crabby-images/df163/df163f65693a8dab3c90fe3cbcaf4e43cbbaf0df" alt="Screenshot 2024-11-30 191231"
To install Python, visit https://www.python.org/downloads/release/python-3130 and download the release compatible with your device. On 64-bit Windows, choose the Windows installer (64-bit).
data:image/s3,"s3://crabby-images/e148e/e148e3663a6b2ca5adb82282f5080e32a2504299" alt="Screenshot 2024-11-30 185714"
Open the downloaded python-3.13.0-amd64.exe file and install it. On the installation page, don't forget to check Add python.exe to PATH, then finish the installation with the default settings.
data:image/s3,"s3://crabby-images/3c902/3c902346fc783b4d54764b9d6de29a45c69edce5" alt="Screenshot 2024-11-30 185844"
Once Python is installed, open Windows PowerShell and re-run the `python --version` command.
data:image/s3,"s3://crabby-images/d1673/d1673d9d638c224a2fd5a17611117dddecba5c8c" alt="Screenshot 2024-11-30 191341"
The output shows that the installed Python version is 3.13.0.
The next step is to install the geopandas, pyarrow, and pyogrio Python packages, which together let you import the GeoParquet file into ArcGIS Pro.
But why these three packages specifically?
GeoPandas:
- Supports GeoParquet natively.
- Handles geospatial operations (e.g., reprojection, attribute management).
- Exports to ArcGIS-compatible formats.
PyArrow:
- Efficiently reads/writes Parquet files.
- Optimizes performance with compression and columnar storage.
- Integrates seamlessly with GeoPandas.
PyOgrio:
- Writes the GeoDataFrame to a File Geodatabase using its write_dataframe function.
Install the Python packages by running the following commands in Windows PowerShell.
For geopandas:
`pip install geopandas`
data:image/s3,"s3://crabby-images/ce0b6/ce0b64f704922c178edd130a41d397380513eac3" alt="Screenshot 2024-11-30 193201"
In this example, geopandas 1.0.1 is installed successfully.
data:image/s3,"s3://crabby-images/ee675/ee675ce27e0d8939af05c8ae8eb0f4f13787ed06" alt="Screenshot 2024-11-30 1930491"
To install the pyarrow package, run the following command:
`pip install pyarrow`
data:image/s3,"s3://crabby-images/bd895/bd8959fe6870601a7b48d25a3e7dcd0cd0ca52d7" alt="Screenshot 2024-11-30 194306"
In this example, pyarrow 18.1.0 is installed successfully.
data:image/s3,"s3://crabby-images/d7ee9/d7ee9ba0fc65d8827f4d2c47cfdef8fd17974f42" alt="Screenshot 2024-11-30 193239"
To install the pyogrio package, run the following command:
`pip install pyogrio`
data:image/s3,"s3://crabby-images/85c77/85c77bfdafd03ec302689879f8d93783725c99c3" alt="Screenshot 2024-12-03 185118"
In this example, pyogrio 0.10.0 is installed successfully.
data:image/s3,"s3://crabby-images/3a163/3a1632315917c8c8b8b945552ca323d42f0851ec" alt="Screenshot 2024-12-03 185139"
After installing the required packages, run `python` in Windows PowerShell to start the interactive Python interpreter, where you will execute the following commands.
data:image/s3,"s3://crabby-images/5c474/5c4745e1ccd0fa1b16c4daaea3af6a809fa38bb3" alt="Screenshot 2024-11-30 195327"
Then, import the required libraries:
` import pandas as pd `
` import geopandas as gpd `
` import pyarrow.parquet as pq `
` from shapely import wkb `
` import pyogrio `
data:image/s3,"s3://crabby-images/5b09b/5b09b7349d340927f700a5d434ddfc0a4565785f" alt="Screenshot 2024-11-30 195707"
Read the Parquet file with PyArrow using this command:
`table = pq.read_table(r'C:\Users\Sarah\riyadh_places.parquet')`
Then, convert the Arrow table into a pandas DataFrame:
`df = table.to_pandas()`
data:image/s3,"s3://crabby-images/76bfe/76bfeb74df4dff821b173b65a7e410eec6629a46" alt="Screenshot 2024-11-30 200605"
Check the first few rows to confirm the structure of the GeoParquet file:
` print(df.head()) `
data:image/s3,"s3://crabby-images/9d148/9d148fd158f252ae582e2f2b736394b13ae9ea46" alt="Screenshot 2024-11-30 200809"
In GeoParquet, the geometry column is stored in WKB (Well-Known Binary) format, so after `to_pandas()` it contains raw bytes. We need to decode these bytes into geometry objects that a spatial library such as Shapely or GeoPandas can work with.
Run the following command:
` df['geometry'] = df['geometry'].apply(lambda x: wkb.loads(x)) `
data:image/s3,"s3://crabby-images/9bf86/9bf8650caef231efc0e705ae83eea34dbd95a314" alt="Screenshot 2024-11-30 201413"
Convert the DataFrame into a GeoDataFrame by running this command:
` gdf = gpd.GeoDataFrame(df, geometry='geometry') `
Next, choose the output format: an Esri File Geodatabase (FGDB) or any other format compatible with ArcGIS Pro.
Define the path for the output File Geodatabase:
` output_gdb = r'D:/output_fgdb/riyadh_places.gdb' `
data:image/s3,"s3://crabby-images/2a4f9/2a4f9a34ff7bc0bb99054ecca3a08cfd4f542fd1" alt="Screenshot 2024-12-03 191104"
Then, use the `driver='OpenFileGDB'` argument to ensure that pyogrio uses the correct driver for writing the .gdb file, by executing this command:
`pyogrio.write_dataframe(gdf, output_gdb, driver='OpenFileGDB', layer='riyadh_places')`
data:image/s3,"s3://crabby-images/1d6f1/1d6f1d2c5b5625fda6930b9fbc08e693c8374984" alt="Screenshot 2024-12-03 191303"
A File Geodatabase is generated in the folder specified in the path, as shown here.
data:image/s3,"s3://crabby-images/6a472/6a4723e5fff59bfd0eda1e6632829b8648368a91" alt="Screenshot 2024-12-03 191454"
Open a new project in ArcGIS Pro, click Add Data, and navigate to the output File Geodatabase path to visualize the output FGDB in ArcGIS Pro.
data:image/s3,"s3://crabby-images/24179/24179e7630524931d53ca8495e6d409f6b092594" alt="Screenshot 2024-12-03 191818"
The "riyadh_places" feature layer, derived from the Riyadh Places dataset in the "tabaqat" repository on Source Cooperative, is now successfully imported into ArcGIS Pro.
data:image/s3,"s3://crabby-images/867d1/867d1707c36f494374b9db4db6b756e66d42ec1a" alt="Screenshot 2024-12-03 191927"
Conclusion
By using the GeoPandas, PyArrow, and PyOgrio Python packages, you have converted a GeoParquet file into a format compatible with ArcGIS Pro (FGDB), which lets you work efficiently with GeoParquet data in ArcGIS Pro. As the format gains popularity, ArcGIS Pro may eventually include built-in support for GeoParquet.