pyFIA Architecture
What is pyFIA?
pyFIA is a Python library for analyzing USDA Forest Inventory and Analysis (FIA) data. It provides:
- Statistical estimation functions for forest metrics (area, volume, biomass, etc.)
- Two usage paths: a direct Python API or a natural-language AI interface
- High performance using DuckDB and Polars
- Proper FIA methodology with EVALID-based statistical validity
Two Ways to Use pyFIA
```mermaid
graph TB
    subgraph "User Entry Points"
        U1[Python Scripts/<br/>Notebooks]
        U2[Command Line]
    end
    subgraph "Usage Paths"
        Direct[Direct API Path<br/>Statistical Functions]
        AI[AI Agent Path<br/>Natural Language]
    end
    U1 --> Direct
    U2 --> Direct
    U2 --> AI
    Direct --> R1[Statistical Results<br/>DataFrames]
    AI --> R2[Formatted Results<br/>Tables & Explanations]
    style Direct fill:#2ecc71
    style AI fill:#9b59b6
```
Path 1: Direct API (Green Path)
- Import pyFIA functions directly
- Call estimation functions with parameters
- Get back Polars DataFrames with results
- Full control over analysis
Path 2: AI Agent (Purple Path)
- Ask questions in natural language
- Agent converts to appropriate queries
- Get formatted, explained results
- Interactive exploration
Core Architecture
```mermaid
graph TB
    %% Entry Points
    subgraph "Entry Layer"
        PY[Python API<br/>import pyfia]
        CLI1[pyfia CLI<br/>Direct Functions]
        CLI2[pyfia-ai CLI<br/>Natural Language]
    end
    %% Core Components
    subgraph "Core Layer"
        FIA[FIA Class<br/>Database Connection<br/>EVALID Management]
        DR[Data Reader<br/>DuckDB Interface]
    end
    %% Processing
    subgraph "Processing Layer"
        EST[Estimation Functions<br/>area, volume, biomass<br/>tpa, mortality, growth]
        FILT[Filters<br/>Domain, EVALID<br/>Grouping, Joins]
        UTILS[Utilities<br/>Statistical Calculations<br/>Stratification]
    end
    %% AI Components
    subgraph "AI Layer"
        AGENT[FIA Agent<br/>Query Understanding]
        TOOLS[Agent Tools<br/>SQL, Schema, Species]
        FORMAT[Result Formatter<br/>Rich Output]
    end
    %% Data
    subgraph "Data Layer"
        DB[(DuckDB<br/>FIA Database)]
    end
    %% Direct Path
    PY --> FIA
    CLI1 --> FIA
    FIA --> EST
    EST --> FILT
    EST --> UTILS
    FIA --> DR
    DR --> DB
    %% AI Path
    CLI2 --> AGENT
    AGENT --> TOOLS
    TOOLS --> DR
    AGENT --> FORMAT
    style FIA fill:#e74c3c
    style EST fill:#2ecc71
    style AGENT fill:#9b59b6
    style DB fill:#34495e
```
Data Flow
Direct API Flow
```mermaid
sequenceDiagram
    participant User
    participant pyFIA
    participant FIA Class
    participant Estimator
    participant Database
    User->>pyFIA: area(db, evalid=372301)
    pyFIA->>FIA Class: Get filtered data
    FIA Class->>Database: Query with EVALID
    Database-->>FIA Class: Plot/Condition data
    FIA Class-->>pyFIA: Filtered DataFrames
    pyFIA->>Estimator: Calculate estimates
    Estimator-->>pyFIA: Results with SE
    pyFIA-->>User: DataFrame with estimates
```
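The stages in this sequence (query by EVALID, receive filtered rows, calculate an estimate) can be traced with in-memory stand-ins. The sketch below is only an illustration under stated assumptions: `ConditionRow`, `ToyDatabase`, `toy_area`, and every value in the rows are hypothetical, the code does not call pyFIA, and it leaves out the standard error that the real estimator returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionRow:
    """Hypothetical stand-in for one plot/condition record."""
    plot_id: str
    evalid: int
    expns: float        # acres represented by the plot (made-up values)
    forest_prop: float  # share of the plot that is forest land


class ToyDatabase:
    """Plays the role of the Database participant in the diagram."""

    def __init__(self, rows):
        self._rows = list(rows)

    def query(self, evalid):
        # "Query with EVALID": return rows from one evaluation only.
        return [row for row in self._rows if row.evalid == evalid]


def toy_area(db, evalid):
    """Hypothetical estimator: expanded forest area for one evaluation."""
    rows = db.query(evalid)  # filtered data
    total = sum(row.expns * row.forest_prop for row in rows)  # estimate
    return {"EVALID": evalid, "AREA_ACRES": total, "N_PLOTS": len(rows)}


db = ToyDatabase([
    ConditionRow("p1", 372301, 6000.0, 1.0),
    ConditionRow("p2", 372301, 6000.0, 0.5),
    ConditionRow("p3", 372201, 5000.0, 1.0),  # belongs to another evaluation
])
result = toy_area(db, evalid=372301)
print(result)
```

Only the two rows tagged with the requested EVALID contribute to the total; the third row is ignored because it belongs to a different evaluation.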
AI Agent Flow
```mermaid
sequenceDiagram
    participant User
    participant Agent
    participant Tools
    participant Database
    participant Formatter
    User->>Agent: "How many oak trees in NC?"
    Agent->>Agent: Understand query
    Agent->>Tools: find_species_codes("oak")
    Tools-->>Agent: Oak species codes
    Agent->>Tools: execute_query(SQL)
    Tools->>Database: Run query
    Database-->>Tools: Raw results
    Tools-->>Agent: Query results
    Agent->>Formatter: Format with context
    Formatter-->>Agent: Rich formatted output
    Agent-->>User: Explained results
```
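The tool calls in this sequence can be sketched without an LLM or DuckDB. The example below is a rough illustration, not the agent's actual code: it swaps in Python's built-in `sqlite3` for DuckDB, the tool names `find_species_codes` and `execute_query` are taken from the diagram but their signatures here are assumptions, `build_toy_db` and `answer_oak_question` are hypothetical helpers, and the table rows are toy data.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE REF_SPECIES (SPCD INTEGER PRIMARY KEY, COMMON_NAME TEXT);
        CREATE TABLE TREE (SPCD INTEGER, STATECD INTEGER);
        INSERT INTO REF_SPECIES VALUES
            (802, 'white oak'), (833, 'northern red oak'), (316, 'red maple');
        INSERT INTO TREE VALUES (802, 37), (833, 37), (316, 37), (802, 45);
        """
    )
    return conn


def find_species_codes(conn, name):
    """Tool 1 (assumed signature): look up species codes by common name."""
    rows = conn.execute(
        "SELECT SPCD FROM REF_SPECIES WHERE COMMON_NAME LIKE ? ORDER BY SPCD",
        (f"%{name}%",),
    ).fetchall()
    return [row[0] for row in rows]


def execute_query(conn, sql, params=()):
    """Tool 2 (assumed signature): run a parameterized query."""
    return conn.execute(sql, params).fetchall()


def answer_oak_question(conn, state_code):
    """Hypothetical agent step: chain the two tools, then format the result."""
    codes = find_species_codes(conn, "oak")
    placeholders = ",".join("?" for _ in codes)
    sql = (
        "SELECT COUNT(*) FROM TREE "
        f"WHERE STATECD = ? AND SPCD IN ({placeholders})"
    )
    (count,) = execute_query(conn, sql, (state_code, *codes))[0]
    return f"Found {count} oak tree records for state {state_code} (species codes {codes})."


conn = build_toy_db()
print(answer_oak_question(conn, 37))
```

Counting raw tree records is not a population estimate. A real answer to "How many oak trees in NC?" would still need EVALID filtering and expansion factors, which this sketch omits.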
Key Components
Core Components
Component | Purpose | Key Functions |
---|---|---|
FIA Class | Main interface to database | clipFIA(), readFIA(), findEvalid() |
Data Reader | Database abstraction | Handles DuckDB connections and queries |
Settings | Configuration management | Database paths, default options |
Estimation Functions
Function | Calculates | Key Features |
---|---|---|
area() | Forest land area | By forest type, ownership, size class |
biomass() | Tree biomass | Above/below ground, carbon content |
volume() | Wood volume | Net/gross, merch/sound, board feet |
tpa() | Trees per acre | By species, size, status |
mortality() | Annual mortality | Trees, volume, biomass |
growth() | Annual growth | Net growth accounting for mortality |
Filter System
Filter Type | Purpose | Example |
---|---|---|
EVALID | Statistical validity | Only use data from one evaluation |
Domain | Tree/area filtering | "DIA >= 5", "OWNGRPCD == 10" |
Grouping | Result aggregation | By species, size class, ownership |
Classification | Tree categorization | Live/dead, growing stock |
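Domain expressions such as "DIA >= 5" can be read as a yes/no indicator per record. The sketch below shows one way to turn such a string into an indicator. It is an assumption-laden illustration, not pyFIA's filter implementation: `parse_domain` and `domain_indicator` are hypothetical helpers, only the simple `COLUMN OP NUMBER` form is handled, and the tree rows are made up.

```python
import operator

# Comparison operators supported by this toy parser.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_domain(expr):
    """Split 'COLUMN OP NUMBER' into (column, comparison function, value)."""
    parts = expr.split()
    if len(parts) != 3 or parts[1] not in _OPS:
        raise ValueError(f"unsupported domain expression: {expr!r}")
    column, op, raw_value = parts
    return column, _OPS[op], float(raw_value)


def domain_indicator(rows, expr):
    """Return 1 for rows inside the domain and 0 for rows outside it."""
    column, compare, value = parse_domain(expr)
    return [1 if compare(row[column], value) else 0 for row in rows]


trees = [
    {"DIA": 4.2, "TPA_UNADJ": 6.0},
    {"DIA": 5.0, "TPA_UNADJ": 6.0},
    {"DIA": 12.3, "TPA_UNADJ": 6.0},
]
indicator = domain_indicator(trees, "DIA >= 5")
tpa_in_domain = sum(t["TPA_UNADJ"] * i for t, i in zip(trees, indicator))
print(indicator, tpa_in_domain)
```

Multiplying by a 0/1 indicator, rather than dropping rows, keeps every sampled record in the calculation, which matters when variances are computed over the full sample.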
AI Components
Component | Purpose | Key Features |
---|---|---|
Agent | Natural language processing | LangGraph ReAct pattern |
Tools | Agent capabilities | SQL execution, schema lookup |
Formatter | Result presentation | Rich tables, statistics, explanations |
Domain Knowledge | FIA expertise | Species codes, terminology |
Design Principles
1. Statistical Validity First
- EVALID-based filtering ensures proper population estimates
- All estimators follow FIA statistical methodology
- Standard errors and confidence intervals included
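As a small worked example of how a standard error relates to a confidence interval, the sketch below uses a normal approximation. That approximation, the helper names `confidence_interval` and `se_percent`, and the numbers are assumptions made for illustration; the source does not state how pyFIA builds its intervals.

```python
from statistics import NormalDist


def confidence_interval(estimate, se, level=0.95):
    """Two-sided interval: estimate +/- z * SE (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * se
    return estimate - half_width, estimate + half_width


def se_percent(estimate, se):
    """Standard error as a percentage of the estimate."""
    return 100.0 * se / estimate if estimate else float("nan")


lower, upper = confidence_interval(1000.0, 50.0)
print(round(lower, 1), round(upper, 1), se_percent(1000.0, 50.0))
```

Reporting the standard error as a percentage of the estimate makes precision comparable across metrics with different units.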
2. Performance Optimized
- DuckDB for fast analytical queries
- Polars for efficient data manipulation
- Lazy evaluation where possible
3. Two Clear Paths
- Direct API for programmatic control
- AI Agent for exploration and learning
- No mixing of concerns between paths
4. Modular Design
- Estimation functions are independent
- Filters can be composed
- Easy to add new estimators
5. User Friendly
- Consistent function signatures
- Clear parameter names
- Rich documentation and examples
File Organization
```
src/pyfia/
├── core/        # Database connection, EVALID management
├── estimation/  # Statistical estimation functions
├── filters/     # Data filtering and processing
├── ai/          # AI agent components
├── cli/         # Command-line interfaces
├── database/    # Database utilities and schema
├── models/      # Data models (Pydantic)
└── locations/   # Geographic parsing utilities
```
Key Concepts
EVALID System
The heart of FIA's statistical design:
- Groups plots into valid populations
- Ensures proper expansion factors
- Links to specific time periods
- Required for all population estimates
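To make the link between an EVALID and a time period concrete, the sketch below splits an identifier such as 372301 into parts. It assumes the commonly described six-digit layout (two-digit state FIPS code, two-digit year, two-digit evaluation type), which should be checked against the FIADB documentation because exceptions may exist. `parse_evalid` and `require_single_evalid` are hypothetical helpers, not pyFIA functions.

```python
from typing import NamedTuple


class EvalidParts(NamedTuple):
    state_fips: int
    year_2digit: int
    eval_type: int


def parse_evalid(evalid):
    """Split an EVALID into (state, year, type), assuming the SSYYTT layout."""
    text = f"{int(evalid):06d}"  # restore a leading zero, e.g. state 01
    if len(text) != 6:
        raise ValueError(f"expected at most six digits, got {evalid!r}")
    return EvalidParts(int(text[:2]), int(text[2:4]), int(text[4:]))


def require_single_evalid(evalids):
    """Guard: refuse to estimate over rows drawn from several evaluations."""
    unique = set(evalids)
    if len(unique) != 1:
        raise ValueError(f"expected exactly one EVALID, found {sorted(unique)}")
    return unique.pop()


parts = parse_evalid(372301)
print(parts)
```

A guard like `require_single_evalid` reflects the rule that an estimate should draw on one evaluation only, since the same plot can belong to more than one evaluation.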
Stratification
FIA uses post-stratified estimation:
1. Plots assigned to strata
2. Strata have expansion factors
3. Estimates calculated by stratum
4. Combined for population totals
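The four steps above can be sketched with plain Python. This is a simplified illustration under stated assumptions: `post_stratified_total`, the stratum names, acreages, and plot values are all hypothetical, and the variance uses the textbook stratified-sampling formula, whereas FIA's published estimators include further terms.

```python
from collections import defaultdict
from math import sqrt
from statistics import variance


def post_stratified_total(plots, stratum_acres):
    """Estimate a population total and a simplified standard error.

    plots: iterable of (stratum_id, value_per_acre) pairs.
    stratum_acres: mapping of stratum_id to the stratum's area in acres.
    """
    # Step 1: assign plot values to their strata.
    by_stratum = defaultdict(list)
    for stratum, value in plots:
        by_stratum[stratum].append(value)

    total = 0.0
    var_total = 0.0
    for stratum, values in by_stratum.items():
        acres = stratum_acres[stratum]
        n = len(values)
        # Step 2: expansion factor = acres represented by each plot.
        expns = acres / n
        # Step 3: stratum-level estimate.
        total += expns * sum(values)
        if n > 1:
            # Textbook stratified variance term (a simplification).
            var_total += acres**2 * variance(values) / n
    # Step 4: stratum estimates were combined into one population total.
    return total, sqrt(var_total)


plots = [("forest", 100.0), ("forest", 140.0),
         ("nonforest", 0.0), ("nonforest", 20.0)]
stratum_acres = {"forest": 10000.0, "nonforest": 4000.0}
total, se = post_stratified_total(plots, stratum_acres)
print(total, round(se, 2))
```

In this toy data each forest plot represents 5,000 acres and each nonforest plot 2,000 acres, so the same per-acre value contributes differently depending on the stratum.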
Dual Interface Design
- Direct path: Maximum control, pure functions
- AI path: Natural language, guided exploration
- Clean separation prevents complexity
This architecture provides a solid foundation for forest inventory analysis while remaining accessible to both programmers and domain experts.