Development Guide¶
Technical reference for pyFIA development. For business context and product strategy, see CLAUDE.md.
Setup¶
# Install with uv in development mode
uv venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
uv pip install -e ".[dev]" # quotes stop shells such as zsh from globbing the brackets
# Set up pre-commit hooks
pre-commit install
Essential Commands¶
Testing¶
uv run pytest # Run all tests
uv run pytest tests/test_area.py # Run specific test file
uv run pytest --cov=pyfia --cov-report=html # With coverage
uv run pytest tests/test_property_based.py -v # Property-based tests
Code Quality¶
uv run ruff format # Format code
uv run ruff check --fix # Lint code
uv run mypy src/pyfia/ # Type checking
uv run pre-commit run --all-files # All hooks
Documentation¶
Architecture¶
Module Structure¶
pyfia/
├── core/                 # Database and reader functionality
│   ├── fia.py            # Main FIA database class
│   └── data_reader.py    # Efficient data loading
├── estimation/           # Statistical estimation (~2,000 lines)
│   ├── base.py           # BaseEstimator with Template Method pattern
│   └── estimators/       # Individual estimators (~300 lines each)
│       ├── area.py
│       ├── biomass.py
│       ├── growth.py
│       ├── mortality.py
│       ├── tpa.py
│       └── volume.py
├── filtering/            # Domain filtering and indicators
│   ├── core/parser.py    # Centralized domain expression parser
│   ├── tree/filters.py
│   ├── area/filters.py
│   └── indicators/       # Land type classification
└── constants/            # FIA constants and standard values
Core Components¶
FIA Database Class (pyfia.core.fia.FIA)
- Main entry point for database connections
- Supports DuckDB and SQLite backends
- Key methods: clip_by_evalid(), clip_by_state(), clip_most_recent()
Estimation Functions
- Simple API: area(), biomass(), volume(), tpa(), mortality(), growth()
- All support domain filtering, grouping, variance calculations
- BaseEstimator uses Template Method for consistent workflow
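The Template Method workflow mentioned for BaseEstimator can be sketched roughly as follows. This is a minimal illustration, not pyFIA's code: the class `SketchEstimator`, its step names (`apply_filters`, `plot_value`, `summarize`), and the list-of-dicts input are all assumptions chosen to keep the example self-contained.

```python
from abc import ABC, abstractmethod


class SketchEstimator(ABC):
    """Hypothetical base class; step names are illustrative, not pyFIA's API."""

    def estimate(self, rows: list[dict]) -> dict:
        # The template method fixes the order of the steps for every estimator.
        filtered = self.apply_filters(rows)
        values = [self.plot_value(row) for row in filtered]
        return self.summarize(values)

    def apply_filters(self, rows: list[dict]) -> list[dict]:
        # Default hook: keep every row. Subclasses may narrow the domain.
        return rows

    @abstractmethod
    def plot_value(self, row: dict) -> float:
        """Return the quantity this estimator measures for one row."""

    def summarize(self, values: list[float]) -> dict:
        # Shared final step, so all estimators return the same shape.
        return {"total": sum(values), "n": len(values)}


class LiveTreeCount(SketchEstimator):
    """Counts live trees (STATUSCD == 1), one per row."""

    def apply_filters(self, rows):
        return [row for row in rows if row["STATUSCD"] == 1]

    def plot_value(self, row):
        return 1.0


rows = [{"STATUSCD": 1}, {"STATUSCD": 2}, {"STATUSCD": 1}]
result = LiveTreeCount().estimate(rows)
```

Only `estimate()` is called from outside; a subclass overrides individual steps, so the sequence of filtering, valuing, and summarizing stays identical across estimators.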
Data Reader (pyfia.core.data_reader.FIADataReader)
- Efficient data loading with WHERE clause support
- Backend-specific optimizations
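To illustrate what WHERE clause support buys, the sketch below pushes the filter into the SQL query so that only matching rows leave the database. It uses the standard-library `sqlite3` module with an in-memory table; the `read_table` helper and the sample rows are hypothetical and do not reflect FIADataReader's actual signature.

```python
import sqlite3
from typing import Optional


def read_table(
    conn: sqlite3.Connection, table: str, where: Optional[str] = None
) -> list[tuple]:
    """Hypothetical reader: filter rows in the database, not in Python."""
    # Table and WHERE text are interpolated directly, so they must come from
    # trusted code, never from user input.
    query = f"SELECT * FROM {table}"
    if where:
        query += f" WHERE {where}"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TREE (CN INTEGER, STATECD INTEGER, DIA REAL)")
conn.executemany(
    "INSERT INTO TREE VALUES (?, ?, ?)",
    [(1, 37, 12.5), (2, 37, 4.1), (3, 13, 15.0)],
)

nc_trees = read_table(conn, "TREE", where="STATECD = 37")  # filtered in SQL
all_trees = read_table(conn, "TREE")                       # full table scan
conn.close()
```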
Dependencies¶
| Package | Purpose |
|---|---|
| Polars | Primary dataframe library |
| DuckDB | Database engine |
| Pydantic v2 | Settings management |
| Rich | Terminal output |
| ConnectorX | Fast database connectivity |
Code Patterns¶
Do¶
- Use Polars LazyFrame for memory efficiency
- Use Pydantic v2 for settings only (not data)
- Follow FIA naming conventions in public APIs
- Prefer functions over classes
- Use context managers for connections
Don't¶
- Create Strategy, Factory, Builder patterns without clear need
- Add abstraction layers for hypothetical flexibility
- Create deep directory nesting (max 3 levels)
- Use complex inheritance hierarchies
FIA Quick Reference¶
Full details in fia_technical_context.md
EVALID System¶
EVALIDs are 6-digit codes (SSYYTT: state code, two-digit inventory year, evaluation type) that identify statistically valid plot groupings.
from pyfia import FIA, volume

with FIA("data/nfi_south.duckdb") as db:
    db.clip_by_state(37)  # North Carolina
    db.clip_most_recent(eval_type="EXPVOL")  # Most recent volume evaluation
    results = volume(db)
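As a small illustration of the SSYYTT layout, the sketch below splits an EVALID into its three fields. It assumes the usual reading of the code (state code, two-digit year, evaluation type code); `split_evalid` and `EvalidParts` are hypothetical names, and `372301` is an illustrative value rather than one taken from a specific database.

```python
from typing import NamedTuple


class EvalidParts(NamedTuple):
    state: int      # SS: state code
    year: int       # YY: two-digit inventory year
    eval_code: int  # TT: evaluation type code


def split_evalid(evalid: int) -> EvalidParts:
    """Hypothetical helper that splits an SSYYTT code into its three fields."""
    text = f"{evalid:06d}"  # left-pad so single-digit state codes still parse
    if evalid < 0 or len(text) != 6:
        raise ValueError(f"expected a 6-digit EVALID, got {evalid!r}")
    return EvalidParts(int(text[:2]), int(text[2:4]), int(text[4:]))


parts = split_evalid(372301)  # illustrative value only
```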
Evaluation Types¶
- EXPALL: Area estimation → area()
- EXPVOL: Volume/biomass → volume(), biomass(), tpa()
- EXPMORT/EXPGROW: Mortality and growth
Critical Rules¶
- Never mix EVALIDs
- Match eval_type to estimation function
- Always filter before estimation
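The rule about matching eval_type to the estimation function can be expressed as a lookup table. The sketch below is hypothetical and not part of pyFIA; in particular, it assumes that "EXPMORT/EXPGROW" splits into EXPMORT for mortality() and EXPGROW for growth().

```python
# Hypothetical lookup based on the evaluation types listed above.
EVAL_TYPE_FOR = {
    "area": "EXPALL",
    "volume": "EXPVOL",
    "biomass": "EXPVOL",
    "tpa": "EXPVOL",
    "mortality": "EXPMORT",  # assumed split of "EXPMORT/EXPGROW"
    "growth": "EXPGROW",     # assumed split of "EXPMORT/EXPGROW"
}


def check_eval_type(estimator: str, eval_type: str) -> None:
    """Raise if eval_type is not the one expected for the estimator."""
    expected = EVAL_TYPE_FOR[estimator]
    if eval_type != expected:
        raise ValueError(f"{estimator}() expects {expected}, got {eval_type}")


check_eval_type("volume", "EXPVOL")  # passes silently
```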
Domain Filtering¶
volume(db, tree_domain="STATUSCD == 1") # Live trees
area(db, area_domain="SLOPE < 30") # Low slope areas
tpa(db, tree_domain="DIA >= 10.0") # Large trees
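To make the effect of a domain expression concrete, the sketch below applies a single `COLUMN OP NUMBER` condition to rows held as plain dictionaries. It is a toy stand-in, not pyFIA's domain parser: `filter_rows`, the supported operators, and the sample trees are assumptions, and real expressions may be richer than this handles.

```python
import operator
import re

# Comparison operators this sketch understands.
_OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}
_PATTERN = re.compile(r"^\s*(\w+)\s*(==|!=|>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def filter_rows(rows: list[dict], domain: str) -> list[dict]:
    """Keep rows that satisfy a single 'COLUMN OP NUMBER' expression."""
    match = _PATTERN.match(domain)
    if match is None:
        raise ValueError(f"unsupported domain expression: {domain!r}")
    column = match.group(1)
    compare = _OPS[match.group(2)]
    value = float(match.group(3))
    return [row for row in rows if compare(row[column], value)]


trees = [
    {"STATUSCD": 1, "DIA": 12.4},
    {"STATUSCD": 2, "DIA": 15.0},
    {"STATUSCD": 1, "DIA": 6.3},
]
live = filter_rows(trees, "STATUSCD == 1")  # live trees
large = filter_rows(trees, "DIA >= 10.0")   # large trees
```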
Variance Calculation¶
pyFIA implements the stratified domain total variance formula from Bechtold & Patterson (2005).
Key implementation details:
- All plots included: Include plots with zero values in variance calculation
- Per-acre SE: Calculated as SE_total / total_area
- Single-plot strata: Excluded (variance undefined with n=1)
The calculate_domain_total_variance() function in variance.py implements this formula and matches EVALIDator output within 1-3%.
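The implementation details above can be sketched in a few lines. One standard way to write the variance of a stratified total, ignoring any finite population correction, is V = Σ_h w_h² · s²_h · n_h, where w_h is the acres each plot in stratum h represents, s²_h is the sample variance of the plot values, and n_h is the number of plots. That form, the `domain_total_variance` helper, and the sample numbers are assumptions for illustration; this is not a transcription of calculate_domain_total_variance().

```python
import math
from statistics import variance


def domain_total_variance(strata: dict[str, tuple[float, list[float]]]) -> float:
    """Sum w_h**2 * s2_h * n_h over strata.

    `strata` maps a stratum id to (w_h, y_values), where w_h is the acres
    each plot represents and y_values holds one value per plot, zeros included.
    """
    total = 0.0
    for w_h, y_values in strata.values():
        n_h = len(y_values)
        if n_h < 2:
            continue  # single-plot stratum: sample variance undefined
        total += w_h**2 * variance(y_values) * n_h
    return total


strata = {
    "A": (100.0, [0.0, 2.0, 4.0]),  # sample variance = 4.0
    "B": (50.0, [1.0, 3.0]),        # sample variance = 2.0
    "C": (75.0, [5.0]),             # skipped: only one plot
}
var_total = domain_total_variance(strata)
se_total = math.sqrt(var_total)
total_area = 100.0 * 3 + 50.0 * 2 + 75.0 * 1  # acres represented by all plots
se_per_acre = se_total / total_area
```

Note that the zero-valued plot in stratum A stays in the list: dropping it would change both n_h and the sample variance.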
Testing Patterns¶
- Use real FIA data when possible (georgia.duckdb, nfi_south.duckdb)
- Mock databases must include complete table structures (including GRM tables)
- Property-based tests for statistical accuracy
- Validate against EVALIDator results
Documentation Standards¶
All public API functions use NumPy-style docstrings. The mortality() function is the reference implementation.
Required Sections¶
- Summary line
- Extended summary
- Parameters (with types and valid values)
- Returns (with column descriptions)
- See Also
- Notes
- Examples
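A skeleton showing the required sections in NumPy style might look like the following. The function `example_estimator`, its parameters, and its return value are hypothetical placeholders; mortality() remains the reference for real content.

```python
def example_estimator(db, tree_domain=None):
    """
    Estimate a hypothetical per-acre attribute.

    Extended summary: one or two paragraphs on what is estimated and how.

    Parameters
    ----------
    db : FIA
        Open database connection, already clipped to one evaluation.
    tree_domain : str, optional
        Filter expression such as ``"STATUSCD == 1"``. Default is no filter.

    Returns
    -------
    dict
        ``ESTIMATE`` (float, per-acre value) and ``SE`` (float, standard error).

    See Also
    --------
    volume : Volume estimation.

    Notes
    -----
    Document statistical assumptions and evaluation-type requirements here.

    Examples
    --------
    >>> example_estimator(None)
    {'ESTIMATE': 0.0, 'SE': 0.0}
    """
    # Placeholder body: the docstring layout is the point of this sketch.
    return {"ESTIMATE": 0.0, "SE": 0.0}
```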
Refactoring Guidelines¶
When to Simplify¶
- Deep nesting (>3 levels)
- Unnecessary patterns without clear benefit
- Pass-through layers
- Complex configs for simple parameters
How to Simplify¶
- Replace class hierarchies with functions
- Use direct parameters instead of config objects
- Flatten directory structures
- Remove abstraction layers that don't add value
Performance Notes¶
- DuckDB provides 10-100x faster queries than SQLite
- DuckDB storage: 5-6x compression ratio
- Polars LazyFrame enables memory-efficient streaming