# Data Pipeline
This guide covers the full data pipeline: downloading species biomass rasters from the FIA BIGMAP service, converting them to Zarr format, and running analysis.
## Overview

```mermaid
flowchart LR
    A["BIGMAP API"] --> B["download_species()"]
    B --> C["GeoTIFF Files"]
    C --> D["create_zarr()"]
    D --> E["Zarr Store"]
    E --> F["calculate_metrics()"]
    F --> G["Results"]
    E --> H["create_maps()"]
    H --> I["Visualizations"]
```
## Step 1: Download Species Data

Use `download_species()` to fetch biomass rasters from the USDA Forest Service BIGMAP ImageServer:
### By State

```python
from gridfia import GridFIA

api = GridFIA()

# Download all species for an entire state
files = api.download_species(
    state="Montana",
    output_dir="data/montana"
)
print(f"Downloaded {len(files)} species files")

# Download specific species only
files = api.download_species(
    state="Montana",
    species_codes=["0202", "0122"],  # Douglas-fir, Ponderosa pine
    output_dir="data/montana_pines"
)
```
### By County

```python
files = api.download_species(
    state="North Carolina",
    county="Wake",
    species_codes=["0131", "0068"],
    output_dir="data/wake"
)
```
### By Bounding Box

```python
files = api.download_species(
    bbox=(-79.5, 35.5, -78.5, 36.5),
    crs="4326",
    species_codes=["0131"],
    output_dir="data/custom"
)
```
### By Polygon

```python
files = api.download_species(
    polygon="study_area.geojson",
    species_codes=["0131", "0068"],
    output_dir="data/study_area"
)
```
### With Boundary Clipping

Store the actual state/county boundary for later clipping during Zarr creation:

```python
files = api.download_species(
    state="Texas",
    county="Harris",
    species_codes=["0131"],
    output_dir="data/harris",
    use_boundary_clip=True  # Stores boundary for create_zarr()
)
```
## Step 2: Create Zarr Store

Convert downloaded GeoTIFF files to cloud-optimized Zarr format:

```python
# Basic usage
zarr_path = api.create_zarr(
    input_dir="data/montana_pines",
    output_path="data/montana_pines.zarr"
)

# With custom chunking and compression
zarr_path = api.create_zarr(
    input_dir="data/montana_pines",
    output_path="data/montana_pines.zarr",
    chunk_size=(1, 2000, 2000),  # Larger chunks for faster reads
    compression="zstd",
    compression_level=3
)

# With polygon clipping (auto-detected from saved config)
zarr_path = api.create_zarr(
    input_dir="data/harris",
    output_path="data/harris.zarr",
    clip_to_polygon=True  # Uses boundary saved by download_species()
)
```
### Validate the Store

```python
info = api.validate_zarr(zarr_path)
print(f"Shape: {info['shape']}")
print(f"Species: {info['num_species']}")
print(f"CRS: {info['crs']}")
```
### Why Zarr?

- Chunked storage -- only load the data you need
- Compression -- typically a 3-5x reduction in storage size
- Parallel access -- multiple processes can read simultaneously
- Cloud-ready -- works with S3, GCS, and other object storage
- Expandable -- add species without rewriting existing data
### Data Structure

The Zarr store organizes data as a 3D array:

```
Dimensions: (species, height, width)

species[0]: Total biomass (sum of all species)
species[1]: Species 1 (e.g., Loblolly Pine)
species[2]: Species 2 (e.g., Douglas-fir)
...
```

Default chunking is (1, 1000, 1000) -- one species layer at a time, in 1000x1000 pixel spatial tiles (~4MB per chunk with LZ4 compression).
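The per-chunk figure can be sanity-checked with simple arithmetic, assuming float32 (4-byte) pixels; LZ4 then shrinks this further on disk:

```python
# Back-of-envelope size of one default (1, 1000, 1000) chunk,
# assuming float32 pixels (4 bytes each).
species_layers, height, width = 1, 1000, 1000
bytes_per_pixel = 4

chunk_bytes = species_layers * height * width * bytes_per_pixel
print(chunk_bytes)                    # 4000000 bytes
print(round(chunk_bytes / 2**20, 2))  # ~3.81 MiB uncompressed
```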
## Step 3: Calculate Metrics

Run forest metric calculations on the Zarr data:

```python
# List available calculations
calcs = api.list_calculations()
print(f"Available: {calcs}")

# Run specific calculations
results = api.calculate_metrics(
    zarr_path,
    calculations=["species_richness", "shannon_diversity", "total_biomass"],
    output_dir="output/metrics"
)

for result in results:
    print(f"{result.name}: {result.output_path}")
```
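GridFIA computes these metrics internally; to make the shapes concrete, here is a NumPy sketch of what per-pixel metrics over a (species, height, width) stack look like, assuming `species_richness` counts species with nonzero biomass at each pixel and `total_biomass` sums across the species axis:

```python
import numpy as np

# Toy (species, height, width) biomass stack: 3 species on a 2x2 grid.
biomass = np.array([
    [[10.0, 0.0], [5.0, 0.0]],   # species A
    [[ 2.0, 0.0], [0.0, 0.0]],   # species B
    [[ 0.0, 3.0], [0.0, 0.0]],   # species C
])

# Per-pixel species richness: count of species with biomass > 0.
richness = (biomass > 0).sum(axis=0)
print(richness)
# [[2 1]
#  [1 0]]

# Per-pixel total biomass: sum across the species axis.
total = biomass.sum(axis=0)
print(total)
# [[12.  3.]
#  [ 5.  0.]]
```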
### With Statistical Context

```python
api = GridFIA(seed=42)

results = api.calculate_metrics_with_stats(
    zarr_path,
    calculations=["shannon_diversity", "species_richness"],
    n_bootstrap=1000,
    confidence_level=0.95
)

for name, result in results.items():
    print(f"{name}: {result.value:.3f} "
          f"[{result.confidence_interval[0]:.3f}, {result.confidence_interval[1]:.3f}]")
```
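The confidence intervals come from bootstrap resampling. As an illustration of the general idea (not GridFIA's exact implementation), a minimal percentile bootstrap for Shannon diversity, where H = -sum(p * ln p) over species biomass proportions:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def shannon(biomass):
    """Shannon diversity H = -sum(p * ln p) over species proportions."""
    p = biomass / biomass.sum()
    p = p[p > 0]  # 0 * log(0) is treated as 0
    return float(-(p * np.log(p)).sum())

# Toy per-pixel biomass samples for 3 species.
samples = rng.uniform(0, 50, size=(500, 3))

# Percentile bootstrap: resample pixels with replacement,
# recompute the statistic, take the 2.5% / 97.5% quantiles.
n_bootstrap = 1000
stats = np.empty(n_bootstrap)
for i in range(n_bootstrap):
    resample = samples[rng.integers(0, len(samples), len(samples))]
    stats[i] = shannon(resample.sum(axis=0))

ci_lo, ci_hi = np.percentile(stats, [2.5, 97.5])
print(f"H = {shannon(samples.sum(axis=0)):.3f}, 95% CI [{ci_lo:.3f}, {ci_hi:.3f}]")
```

With three roughly equal species, H sits near its maximum of ln(3) ≈ 1.099, and the interval is tight because the toy sample is large.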
## Step 4: Visualize

Generate publication-ready maps:

```python
# Diversity maps
maps = api.create_maps(zarr_path, map_type="diversity", output_dir="maps/")

# Species biomass map
maps = api.create_maps(
    zarr_path,
    map_type="species",
    species=["0202"],
    state="MT"
)
```
## Complete Example

```python
from gridfia import GridFIA

api = GridFIA()

# Download
files = api.download_species(
    state="North Carolina",
    county="Wake",
    species_codes=["0131", "0068", "0110"],
    output_dir="tutorial_data"
)

# Convert
zarr_path = api.create_zarr(
    input_dir="tutorial_data",
    output_path="tutorial_data/wake.zarr"
)

# Analyze
results = api.calculate_metrics(
    zarr_path,
    calculations=["species_richness", "shannon_diversity", "total_biomass"],
    output_dir="output"
)

# Visualize
maps = api.create_maps(zarr_path, map_type="diversity", output_dir="output/maps")

print("Analysis complete!")
```
## Advanced: Direct REST Access

For advanced use cases, you can use the `BigMapRestClient` directly:

```python
from gridfia.external.fia_client import BigMapRestClient

client = BigMapRestClient()

# List available species
species = client.list_available_species()
for s in species[:5]:
    print(f"{s['species_code']}: {s['common_name']}")
```

The client handles:

- Automatic retry with exponential backoff
- Rate limiting to respect server constraints
- Progress tracking for long-running downloads
- Chunked downloads for large areas
!!! note
    Most users should use `GridFIA.download_species()` instead of the REST client
    directly. The client is exposed for advanced use cases that need finer control
    over the download process.
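Retry with exponential backoff, mentioned above, follows a common pattern: wait a base delay after the first failure, then double it on each subsequent attempt. A generic sketch (not GridFIA's actual implementation; `with_retries` and `flaky` are illustrative names):

```python
import time

def with_retries(request, max_attempts=5, base_delay=1.0):
    """Call request(), retrying on ConnectionError with exponential
    backoff: wait base_delay, then 2x, 4x, ... between attempts."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * 2 ** attempt)

# A flaky request that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # "ok" after 2 retries
```

Real clients typically also cap the maximum delay and add random jitter so many clients don't retry in lockstep.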
## See Also

- Configuration - Settings and options
- API Reference: GridFIA - Full method documentation
- Calculations - Available forest metrics