PyFIA - Forest Inventory Analysis Tools
PyFIA is a modern Python toolkit for analyzing USDA Forest Service Forest Inventory and Analysis (FIA) data using DuckDB as the backend database engine. It provides efficient, Pythonic access to the comprehensive FIA database with a focus on performance, ease of use, and statistical accuracy.
🌟 Key Features
- 🚀 High Performance: Powered by DuckDB for lightning-fast analytical queries
- 📊 Statistical Accuracy: Implements official EVALIDator methodology for valid estimates
- 🐍 Pythonic Interface: Clean, intuitive API for forest data analysis
- 🔍 Comprehensive Queries: Pre-built query library covering all major FIA analyses
- 🤖 AI-Powered: Integrated AI agents for intelligent query assistance
- 📚 Rich Documentation: Extensive guides, examples, and FIA database reference
🚀 Quick Start
Installation
# Clone the repository
git clone https://github.com/your-username/pyfia.git
cd pyfia
# Install dependencies
pip install -r requirements.txt
# Install in development mode
pip install -e .
Basic Usage
import pyfia
# Initialize FIA database connection
fia = pyfia.FIA(database_path="path/to/fia.duckdb")
# Get total live trees in Oregon (2021)
oregon_trees = fia.tree_count(
    evalid=412101,  # Oregon 2021 evaluation
    status="live"
)
print(f"Oregon has {oregon_trees:,.0f} live trees")
# Output: Oregon has 10,481,113,490 live trees
# Get biomass by species group in Colorado
colorado_biomass = fia.biomass_by_species_group(
    evalid=82101,  # Colorado 2021 evaluation
    component="above_ground"
)
print(colorado_biomass.head())
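The `evalid` values used above follow the FIADB convention of a six-digit code: a two-digit state FIPS code, a two-digit year, and a two-digit evaluation type (a leading zero is dropped when the code is stored as an integer, as in `82101`). The helper below is a minimal sketch of that decoding and is not part of PyFIA; `parse_evalid` and `EvalidParts` are hypothetical names, and the sketch assumes the two-digit year falls in the 2000s.

```python
from typing import NamedTuple


class EvalidParts(NamedTuple):
    """Components of an EVALID (hypothetical helper type, not a PyFIA class)."""
    state_fips: int  # state FIPS code, e.g. 41 = Oregon
    year: int        # four-digit year of the evaluation
    eval_type: str   # two-digit evaluation type code, e.g. "01"


def parse_evalid(evalid: int) -> EvalidParts:
    """Split an EVALID into state code, year, and evaluation type."""
    if evalid < 0:
        raise ValueError("EVALID must be non-negative")
    text = f"{evalid:06d}"  # restore a dropped leading zero: 82101 -> "082101"
    if len(text) != 6:
        raise ValueError(f"EVALID must have at most six digits: {evalid}")
    state_fips = int(text[:2])
    year = 2000 + int(text[2:4])  # assumption: evaluations from 2000 onward
    return EvalidParts(state_fips, year, text[4:])


oregon = parse_evalid(412101)
colorado = parse_evalid(82101)
```

Under this convention, `412101` decodes to state 41 (Oregon), year 2021, type `01`, and `82101` decodes to state 8 (Colorado), year 2021, type `01`.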
📚 Documentation Structure
This documentation is organized into several key sections:
🏁 Getting Started
- AI Agent Documentation - Natural language interface for FIA queries
- Architecture - System design and components
🤖 AI Agent
Complete documentation for the natural language AI interface:
- Quick Start - Get started with AI-powered queries
- User Guide - Comprehensive feature guide
- Examples - Real-world usage scenarios
- Architecture - Technical architecture details
- Developer Guide - Extend and customize the agent
- Tools Reference - Available tools and functions
- Result Formatting - Rich output formatting with statistics
- Domain Filtering - Intelligent query filtering
- Troubleshooting - Common issues and solutions
🔍 Query Library
- Query Overview - Complete guide to the query system
- EVALIDator Quick Reference - Essential patterns for translating EVALIDator's Oracle SQL queries
- EVALIDator Methodology - Comprehensive translation guide
- Working Query Bank - Reorganized query collection
🌲 Query Examples
Ready-to-use examples for common forest analysis tasks:
- Basic Tree Queries - Tree counts, species analysis, diameter distributions
- Biomass & Carbon - Above/below-ground biomass, carbon storage calculations
- Growth & Mortality - Annual growth rates, mortality analysis, GRM methodology
- Volume Analysis - Merchantable volume, board feet, cubic feet calculations
- Forest Area - Timberland area, forest type distributions, ownership analysis
📖 FIA Database Reference
Comprehensive documentation of all FIA database tables and fields:
- Database Overview - Schema, relationships, and key concepts
- Survey & Project Tables - Plot location and measurement metadata
- Tree Tables - Individual tree measurements and derived attributes
- Vegetation Tables - Understory and ground cover measurements
- Down Woody Material - Coarse and fine woody debris measurements
- Population Tables - Statistical estimation framework and strata
🎯 Core Analysis Capabilities
Tree-Level Analysis
# Species composition analysis
species_comp = fia.species_composition(evalid=412101, unit="trees_per_acre")
# Diameter distribution
diameter_dist = fia.diameter_distribution(
    evalid=412101,
    species_code=131,  # Loblolly pine
    diameter_classes=[5, 10, 15, 20, 25, 30]
)
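To make the `diameter_classes` argument concrete, the sketch below bins tree diameters into classes. It assumes the list holds class lower bounds in inches and that the last class is open-ended; how PyFIA itself interprets the list is not documented on this page, and `diameter_class` is a hypothetical helper.

```python
from bisect import bisect_right
from collections import Counter


def diameter_class(dia_inches, lower_bounds):
    """Return the lower bound of the class containing dia_inches.

    Returns None when the diameter is below the smallest class.
    lower_bounds must be sorted in ascending order.
    """
    idx = bisect_right(lower_bounds, dia_inches) - 1
    return lower_bounds[idx] if idx >= 0 else None


classes = [5, 10, 15, 20, 25, 30]
# Illustrative diameters in inches (made-up values, not FIA data).
diameters = [4.2, 5.0, 7.8, 12.1, 19.9, 20.0, 33.5]

counts = Counter(diameter_class(d, classes) for d in diameters)
```

With these made-up diameters, the 5-inch class receives two trees, the 10-, 15-, 20-, and 30-inch classes one each, and the 4.2-inch tree falls below the smallest class.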
Area-Level Analysis
# Forest area by ownership
ownership_area = fia.area_by_ownership(evalid=412100)
# Forest type analysis
forest_types = fia.forest_types(
    evalid=412100,
    group_level="forest_type_group"
)
Volume and Biomass
# Net cubic volume by species
volume = fia.volume_by_species(
    evalid=412101,
    volume_type="net_cubic_feet",
    minimum_diameter=5.0
)
# Carbon storage estimates
carbon = fia.carbon_storage(
    evalid=412101,
    components=["above_ground", "below_ground", "dead_wood"]
)
Growth, Removal, and Mortality (GRM)
# Annual mortality by species
mortality = fia.mortality_by_species(
    evalid=132303,  # Georgia GRM evaluation
    volume_type="cubic_feet"
)
# Harvest removals analysis
harvest = fia.harvest_removals(
    evalid=452303,  # South Carolina GRM evaluation
    grouping="species"
)
🔧 Advanced Features
EVALIDator Compatibility
PyFIA implements the exact statistical methodology used by the USDA Forest Service's EVALIDator web application:
# Exact EVALIDator translation
evaluator_query = fia.evaluator_query(
    query_type="tree_count",
    evalid=412101,
    filters={"status_code": 1, "condition_status": 1},
    exact_translation=True
)
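As a rough illustration of the arithmetic behind an EVALIDator-style tree count, the sketch below expands per-acre tree values to a population total: each live tree contributes `TPA_UNADJ` times a stratum adjustment factor (chosen by tree diameter) times the stratum expansion factor `EXPNS`. The field names mirror FIADB columns, but the classes, function names, and numbers are illustrative assumptions rather than PyFIA code, and condition filtering and variance estimation are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stratum:
    expns: float            # acres represented by each plot (POP_STRATUM.EXPNS)
    adj_factor_micr: float  # adjustment for microplot trees
    adj_factor_subp: float  # adjustment for subplot trees
    adj_factor_macr: float  # adjustment for macroplot trees


@dataclass
class Tree:
    tpa_unadj: float        # trees per acre, unadjusted (TREE.TPA_UNADJ)
    dia: Optional[float]    # diameter in inches (TREE.DIA)
    status_code: int        # 1 = live
    stratum: Stratum
    macro_breakpoint_dia: Optional[float] = None  # PLOT.MACRO_BREAKPOINT_DIA


def adjustment_factor(tree: Tree) -> float:
    """Pick the stratum adjustment factor that matches the tree's plot size."""
    s = tree.stratum
    if tree.dia is None:
        return s.adj_factor_subp
    if tree.dia < 5.0:
        return s.adj_factor_micr
    breakpoint_dia = tree.macro_breakpoint_dia
    if breakpoint_dia is None or tree.dia < breakpoint_dia:
        return s.adj_factor_subp
    return s.adj_factor_macr


def expanded_tree_count(trees) -> float:
    """Sum expanded counts over live trees only."""
    return sum(
        t.tpa_unadj * adjustment_factor(t) * t.stratum.expns
        for t in trees
        if t.status_code == 1
    )


# Made-up numbers chosen for easy arithmetic, not real FIA values.
stratum = Stratum(expns=1000.0, adj_factor_micr=1.0,
                  adj_factor_subp=1.25, adj_factor_macr=1.0)
trees = [
    Tree(tpa_unadj=75.0, dia=2.0, status_code=1, stratum=stratum),   # sapling
    Tree(tpa_unadj=6.0, dia=12.0, status_code=1, stratum=stratum),   # live tree
    Tree(tpa_unadj=6.0, dia=9.0, status_code=2, stratum=stratum),    # dead, excluded
]
total = expanded_tree_count(trees)
```

Here the sapling contributes 75 × 1.0 × 1000 = 75,000 trees and the 12-inch tree 6 × 1.25 × 1000 = 7,500, for a total of 82,500; the dead tree is filtered out.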
AI-Powered Query Assistant
# Natural language query interface
result = fia.ai_query(
    "What is the total volume of loblolly pine in South Carolina?"
)
# Query optimization suggestions
optimized = fia.optimize_query(my_query)
Custom Analysis
# Build custom queries with the query builder
custom_query = (
    fia.query_builder()
    .select_trees()
    .filter_by_species([131, 121])  # Loblolly and longleaf pine
    .filter_by_diameter(min_dia=5.0)
    .group_by("species_code")
    .aggregate("volume", "sum")
    .build()
)
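For readers unfamiliar with the fluent builder pattern, the sketch below shows one way a chain like the one above could assemble a SQL string. It is a hypothetical stand-in, not PyFIA's implementation: `TreeQueryBuilder`, its internals, and the mapping of friendly names to the FIADB `TREE` columns `SPCD`, `DIA`, and `VOLCFNET` are assumptions made for illustration.

```python
class TreeQueryBuilder:
    """Hypothetical fluent builder; each method returns self so calls chain."""

    # Illustrative mapping from friendly names to FIADB TREE columns.
    _COLUMNS = {"species_code": "SPCD", "diameter": "DIA", "volume": "VOLCFNET"}
    _FUNCTIONS = {"sum", "avg", "min", "max", "count"}

    def __init__(self):
        self._table = None
        self._where = []
        self._group = []
        self._aggregates = []

    def select_trees(self):
        self._table = "TREE"
        return self

    def filter_by_species(self, codes):
        ids = ", ".join(str(int(code)) for code in codes)
        self._where.append(f"SPCD IN ({ids})")
        return self

    def filter_by_diameter(self, min_dia):
        self._where.append(f"DIA >= {float(min_dia)}")
        return self

    def group_by(self, name):
        self._group.append(self._COLUMNS[name])
        return self

    def aggregate(self, name, func):
        func = func.lower()
        if func not in self._FUNCTIONS:
            raise ValueError(f"unsupported aggregate: {func}")
        column = self._COLUMNS[name]
        self._aggregates.append(f"{func.upper()}({column}) AS {name}_{func}")
        return self

    def build(self):
        if self._table is None:
            raise ValueError("call select_trees() before build()")
        select = ", ".join(self._group + self._aggregates) or "*"
        sql = f"SELECT {select} FROM {self._table}"
        if self._where:
            sql += " WHERE " + " AND ".join(self._where)
        if self._group:
            sql += " GROUP BY " + ", ".join(self._group)
        return sql


sql = (
    TreeQueryBuilder()
    .select_trees()
    .filter_by_species([131, 121])
    .filter_by_diameter(min_dia=5.0)
    .group_by("species_code")
    .aggregate("volume", "sum")
    .build()
)
```

Note that a raw `SUM(VOLCFNET)` over tree records is not a population estimate on its own; a real query would also apply the expansion and adjustment factors from the population tables.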
🏗️ Architecture
PyFIA is built on modern data engineering principles:
- DuckDB Engine: Column-oriented analytics database for fast aggregations
- Pandas Integration: Seamless integration with the Python data science ecosystem
- Modular Design: Pluggable components for different analysis types
- Type Safety: Full type hints for better development experience
- Async Support: Non-blocking operations for large datasets
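The "non-blocking operations" point can be illustrated with a generic standard-library pattern: running blocking work in worker threads so several queries overlap. PyFIA's own async API is not shown on this page, so `run_blocking_query` below is a hypothetical stand-in that simply sleeps, and the pattern requires Python 3.9 or later for `asyncio.to_thread`.

```python
import asyncio
import time


def run_blocking_query(name, seconds):
    """Stand-in for a long-running database query (hypothetical)."""
    time.sleep(seconds)
    return f"{name} done"


async def main():
    # Each blocking call runs in a worker thread, so the two waits overlap
    # instead of running back to back. gather() preserves argument order.
    return await asyncio.gather(
        asyncio.to_thread(run_blocking_query, "tree_count", 0.2),
        asyncio.to_thread(run_blocking_query, "biomass", 0.2),
    )


results = asyncio.run(main())
```

The results come back in the order the tasks were passed to `asyncio.gather`, regardless of which query finishes first.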
🤝 Contributing
We welcome contributions! Please see our contribution guidelines for details on:
- Setting up the development environment
- Running tests and validation
- Code style and documentation standards
- Submitting pull requests
📊 Performance Benchmarks
PyFIA is optimized for large-scale forest analysis:
| Operation | Dataset Size | PyFIA Time | Traditional Time | Speedup |
|---|---|---|---|---|
| Tree Count | 100M records | 0.8s | 45s | 56x |
| Volume Calculation | 50M records | 1.2s | 28s | 23x |
| Biomass Analysis | 75M records | 2.1s | 67s | 32x |
| Species Composition | 100M records | 1.5s | 52s | 35x |
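The Speedup column is the traditional time divided by the PyFIA time, rounded to the nearest whole number. The short check below recomputes it from the timings in the table; it only reproduces the arithmetic and says nothing about how the benchmarks were run.

```python
# (PyFIA seconds, traditional seconds) copied from the table above.
benchmarks = {
    "Tree Count": (0.8, 45.0),
    "Volume Calculation": (1.2, 28.0),
    "Biomass Analysis": (2.1, 67.0),
    "Species Composition": (1.5, 52.0),
}

speedups = {
    name: round(traditional / pyfia)
    for name, (pyfia, traditional) in benchmarks.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

The recomputed factors (56x, 23x, 32x, 35x) agree with the table.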
📜 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- USDA Forest Service for the comprehensive FIA database and methodology
- DuckDB Team for the high-performance analytical database engine
- Forest Inventory Community for feedback and validation
📞 Support
- Documentation: https://pyfia.readthedocs.io
- Issues: GitHub Issues
- Discussions: GitHub Discussions
PyFIA: Making forest inventory analysis fast, accurate, and accessible for the modern data science era.