Tutorial: Species Diversity Analysis¶

This tutorial demonstrates how to perform a comprehensive species diversity analysis using GridFIA and BIGMAP data.

Scientific Background¶

Species diversity is a fundamental measure of ecosystem health and resilience. This tutorial covers three key diversity metrics:

Shannon Diversity Index (H')¶

The Shannon diversity index (Shannon, 1948) measures both species richness and evenness:

\[H' = -\sum_{i=1}^{S} p_i \ln(p_i)\]

Where \(p_i\) is the proportion of species \(i\). Higher values indicate greater diversity.

Values typically range from 0 to 5
H' = 0 indicates a monoculture
H' > 3 indicates high diversity

Simpson Diversity Index¶

The Simpson index (Simpson, 1949) has multiple formulations:

Simpson's Dominance (D): \(\sum p_i^2\)

Probability that two individuals belong to the same species
Values range from 0 to 1 (lower = more diverse)

Simpson's Diversity (1-D): \(1 - \sum p_i^2\)

Probability that two individuals belong to different species
Values range from 0 to 1 (higher = more diverse)

GridFIA calculates Simpson's Diversity (1-D) by default.

Pielou's Evenness (J)¶

Pielou's evenness (Pielou, 1966) measures how evenly species are distributed:

\[J = \frac{H'}{\ln(S)}\]

Where S is the number of species.

Values range from 0 to 1
J = 1 indicates perfect evenness
J < 0.5 suggests dominance by few species

When to Use Each Index¶

Index	Best For
Shannon	General biodiversity assessment, sensitive to rare species
Simpson	When dominance patterns are important
Species Richness	Simple count when presence/absence is sufficient
Evenness	Assessing community balance independent of richness

Overview¶

We'll analyze forest species diversity by:

Downloading species biomass data from BIGMAP
Creating a Zarr array for efficient processing
Calculating diversity metrics
Visualizing and interpreting results

Prerequisites¶

GridFIA installed (pip install gridfia)
Basic Python knowledge
~2GB disk space for data

Step 1: Initialize and Explore¶

from gridfia import GridFIA

# Initialize the API
api = GridFIA()

# List available species from BIGMAP
species = api.list_species()
print(f"BIGMAP provides data for {len(species)} tree species")

# Display some common species
for s in species[:10]:
    print(f"  {s.species_code}: {s.common_name}")

Step 2: Download Species Data¶

Download biomass rasters for common North Carolina tree species:

from gridfia import GridFIA

api = GridFIA()

# Define species of interest
species_codes = [
    "0131",  # Loblolly pine
    "0068",  # Eastern white pine
    "0110",  # Shortleaf pine
    "0316",  # Eastern redcedar
    "0611",  # Sweetgum
    "0802",  # White oak
    "0833",  # Northern red oak
]

# Download for Wake County, NC
files = api.download_species(
    state="North Carolina",
    county="Wake",
    species_codes=species_codes,
    output_dir="tutorial_data"
)

print(f"Downloaded {len(files)} species files")
for f in files:
    print(f"  {f}")

Step 3: Create Zarr Store¶

Convert downloaded GeoTIFF files to cloud-optimized Zarr format:

from gridfia import GridFIA

api = GridFIA()

# Create Zarr store from downloaded rasters
zarr_path = api.create_zarr(
    input_dir="tutorial_data",
    output_path="tutorial_data/wake_forest.zarr",
    chunk_size=(1, 1000, 1000)
)

# Validate the store
info = api.validate_zarr(zarr_path)
print(f"Created Zarr store:")
print(f"  Species: {info['num_species']}")
print(f"  Shape: {info['shape']}")
print(f"  CRS: {info['crs']}")

Step 4: Calculate Diversity Metrics¶

Run all diversity calculations:

from gridfia import GridFIA

api = GridFIA()

# List available calculations
print("Available calculations:")
for calc in api.list_calculations():
    print(f"  - {calc}")

# Calculate diversity metrics
results = api.calculate_metrics(
    zarr_path="tutorial_data/wake_forest.zarr",
    calculations=[
        "species_richness",
        "shannon_diversity",
        "simpson_diversity",
        "evenness",
        "dominant_species",
        "total_biomass"
    ],
    output_dir="tutorial_results"
)

# Display results
print("\nCalculation results:")
for result in results:
    print(f"  {result.name}: {result.output_path}")

Step 5: Create Maps¶

Generate publication-ready visualizations:

from gridfia import GridFIA

api = GridFIA()

# Create diversity maps
maps = api.create_maps(
    zarr_path="tutorial_data/wake_forest.zarr",
    map_type="diversity",
    output_dir="tutorial_results/maps",
    dpi=300
)

print(f"Created {len(maps)} map files")

Step 6: Analyze Results¶

Load and analyze the calculated metrics:

import rasterio
import numpy as np
from pathlib import Path

results_dir = Path("tutorial_results")

# Load species richness
with rasterio.open(results_dir / "species_richness.tif") as src:
    richness = src.read(1)
    valid = richness[richness > 0]

print("Species Richness Statistics:")
print(f"  Mean: {valid.mean():.2f} species")
print(f"  Max: {valid.max()} species")
print(f"  Min: {valid.min()} species")

# Load Shannon diversity
with rasterio.open(results_dir / "shannon_diversity.tif") as src:
    shannon = src.read(1)
    valid = shannon[shannon > 0]

print("\nShannon Diversity Statistics:")
print(f"  Mean: {valid.mean():.3f}")
print(f"  Max: {valid.max():.3f}")
print(f"  Min: {valid.min():.3f}")

Step 7: Identify Diversity Hotspots¶

Find areas of exceptional biodiversity:

import rasterio
import numpy as np
from scipy import ndimage

# Load Shannon diversity
with rasterio.open("tutorial_results/shannon_diversity.tif") as src:
    shannon = src.read(1)
    transform = src.transform

# Define hotspots as top 10% diversity areas
valid_shannon = shannon[shannon > 0]
threshold = np.percentile(valid_shannon, 90)
hotspots = shannon > threshold

# Clean up with morphological operations
hotspots = ndimage.binary_opening(hotspots, iterations=2)
hotspots = ndimage.binary_closing(hotspots, iterations=2)

# Label connected components
labeled, num_features = ndimage.label(hotspots)
print(f"Found {num_features} diversity hotspots")

# Calculate hotspot areas (30m pixels)
pixel_area_ha = 30 * 30 / 10000  # hectares per pixel
for i in range(1, min(num_features + 1, 6)):  # Top 5
    size = np.sum(labeled == i) * pixel_area_ha
    print(f"  Hotspot {i}: {size:.1f} hectares")

Complete Workflow Example¶

Here's the entire analysis in one script:

"""
Complete species diversity analysis workflow using GridFIA.
"""
from gridfia import GridFIA
from pathlib import Path

def main():
    # Initialize API
    api = GridFIA()

    # Configuration
    state = "North Carolina"
    county = "Wake"
    species_codes = ["0131", "0068", "0110", "0316", "0611", "0802", "0833"]
    output_dir = Path("diversity_analysis")
    output_dir.mkdir(exist_ok=True)

    # Step 1: Download BIGMAP data
    print("Downloading species data from BIGMAP...")
    files = api.download_species(
        state=state,
        county=county,
        species_codes=species_codes,
        output_dir=output_dir / "downloads"
    )
    print(f"  Downloaded {len(files)} files")

    # Step 2: Create Zarr store
    print("\nCreating Zarr store...")
    zarr_path = api.create_zarr(
        input_dir=output_dir / "downloads",
        output_path=output_dir / "forest.zarr"
    )

    # Validate
    info = api.validate_zarr(zarr_path)
    print(f"  Species: {info['num_species']}")
    print(f"  Shape: {info['shape']}")

    # Step 3: Calculate metrics
    print("\nCalculating diversity metrics...")
    results = api.calculate_metrics(
        zarr_path=zarr_path,
        calculations=[
            "species_richness",
            "shannon_diversity",
            "simpson_diversity",
            "evenness",
            "total_biomass"
        ],
        output_dir=output_dir / "metrics"
    )

    for r in results:
        print(f"  {r.name}: completed")

    # Step 4: Create maps
    print("\nGenerating maps...")
    maps = api.create_maps(
        zarr_path=zarr_path,
        map_type="diversity",
        output_dir=output_dir / "maps"
    )
    print(f"  Created {len(maps)} maps")

    print(f"\nAnalysis complete! Results in: {output_dir}")

if __name__ == "__main__":
    main()

Interpreting Results¶

Species Richness (S)¶

Value	Interpretation
1-3	Monoculture or degraded forest
4-7	Typical managed forest
8+	Mature, mixed forest ecosystem

Shannon Diversity (H')¶

Value	Interpretation
< 1.0	Very low diversity, 1-2 species dominate
1.0-2.0	Low to moderate diversity
2.0-3.0	Moderate to high diversity, healthy forest
> 3.0	Very high diversity, exceptional biodiversity

Simpson Index (1-D)¶

Value	Interpretation
< 0.5	Low diversity, few species dominate
0.5-0.7	Moderate diversity
> 0.7	High diversity

Evenness (J)¶

Value	Interpretation
< 0.5	Strong dominance by few species
0.5-0.7	Moderate evenness
> 0.7	High evenness, balanced community

Ecological Implications¶

High diversity areas often indicate:

Mature forest stands
Ecotone transitions between forest types
Areas with varied topography or hydrology
Minimal human disturbance

Low diversity areas may indicate:

Recent disturbance (fire, harvest, disease)
Plantations or managed stands
Environmental stress (drought, poor soils)
Early successional stages

Example Scripts¶

Complete working examples are in the examples/ directory:

File	Description
`01_quickstart.py`	Minimal working example
`04_calculations.py`	Custom calculation examples
`05_species_analysis.py`	Comprehensive species analysis
`06_wake_county_full.py`	Full workflow with publication outputs
`07_diversity_analysis.py`	Diversity-focused analysis

Next Steps¶

Try different biomass thresholds for species presence
Add more species to the analysis
Compare diversity patterns across counties
Export results to GIS software (QGIS, ArcGIS)
Analyze correlation with environmental variables

References¶

Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.
Simpson, E.H. (1949). Measurement of diversity. Nature, 163(4148), 688.
Pielou, E.C. (1966). The measurement of diversity in different types of biological collections. Journal of Theoretical Biology, 13, 131-144.
Magurran, A.E. (2004). Measuring biological diversity. Blackwell Publishing.
Wilson, B.T., Knight, J.F., and McRoberts, R.E. (2018). Harmonic regression of Landsat time series for modeling attributes from national forest inventory data. ISPRS Journal of Photogrammetry and Remote Sensing, 137: 29-46.

For complete citations and how to cite GridFIA in your work, see CITATIONS.md in the repository.