Area Estimation

Estimate forest area by land type and various categories.

Overview

The area() function estimates forest area and reports a standard error (or, optionally, a variance) alongside each estimate.

import pyfia

db = pyfia.FIA("georgia.duckdb")
db.clip_by_state("GA")

# Total forest area
total = pyfia.area(db, land_type="forest")

# Area by forest type
by_type = pyfia.area(db, land_type="forest", grp_by="FORTYPGRPCD")

Function Reference

area

area(db: Union[str, FIA], grp_by: Optional[Union[str, List[str]]] = None, land_type: str = 'forest', area_domain: Optional[str] = None, plot_domain: Optional[str] = None, most_recent: bool = False, eval_type: Optional[str] = None, variance: bool = False, totals: bool = True) -> DataFrame

Estimate forest area from FIA data.

Calculates area estimates using FIA's design-based estimation methods with proper expansion factors and stratification. Automatically handles EVALID selection to prevent overcounting from multiple evaluations.

PARAMETER DESCRIPTION
db

Database connection or path to FIA database. Can be either a path string to a DuckDB/SQLite file or an existing FIA connection object.

TYPE: Union[str, FIA]

grp_by

Column name(s) to group results by. Can be any column from the PLOT and COND tables. Common grouping columns include:

Ownership and Management:

  • 'OWNGRPCD': Ownership group (10=National Forest, 20=Other Federal, 30=State/Local, 40=Private)
  • 'OWNCD': Detailed ownership code (see REF_RESEARCH_STATION)
  • 'ADFORCD': Administrative forest code
  • 'RESERVCD': Reserved status (0=Not reserved, 1=Reserved)

Forest Characteristics:

  • 'FORTYPCD': Forest type code (see REF_FOREST_TYPE)
  • 'STDSZCD': Stand size class (1=Large diameter, 2=Medium diameter, 3=Small diameter, 4=Seedling/sapling, 5=Nonstocked)
  • 'STDORGCD': Stand origin (0=Natural, 1=Planted)
  • 'STDAGE': Stand age in years

Site Characteristics:

  • 'SITECLCD': Site productivity class (1=225+ cu ft/ac/yr, 2=165-224, 3=120-164, 4=85-119, 5=50-84, 6=20-49, 7=0-19)
  • 'PHYSCLCD': Physiographic class code

Location:

  • 'STATECD': State FIPS code
  • 'UNITCD': FIA survey unit code
  • 'COUNTYCD': County code
  • 'INVYR': Inventory year

Disturbance and Treatment:

  • 'DSTRBCD1', 'DSTRBCD2', 'DSTRBCD3': Disturbance codes
  • 'TRTCD1', 'TRTCD2', 'TRTCD3': Treatment codes

For complete column descriptions, see USDA FIA Database User Guide.

TYPE: str or list of str DEFAULT: None

land_type

Land type to include in estimation:

  • 'forest': All forestland (COND_STATUS_CD = 1)
  • 'timber': Timberland only (unreserved, productive forestland)
  • 'all': All land types including non-forest

TYPE: {'forest', 'timber', 'all'} DEFAULT: 'forest'
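
As a rough illustration of how the three land_type options partition conditions, the sketch below classifies a few made-up COND rows in plain Python. The timberland rule used here (RESERVCD == 0 and SITECLCD between 1 and 6) is an assumption based on the usual FIA reading of "unreserved, productive forestland"; it is not taken from pyfia's source, and in_land_type is a hypothetical helper.

```python
def in_land_type(cond: dict, land_type: str) -> bool:
    """Return True if a COND row falls inside the requested land type."""
    is_forest = cond["COND_STATUS_CD"] == 1
    if land_type == "forest":
        return is_forest
    if land_type == "timber":
        # Assumed rule: forest, not reserved, and site class 1-6 (productive).
        return (
            is_forest
            and cond["RESERVCD"] == 0
            and cond["SITECLCD"] in {1, 2, 3, 4, 5, 6}
        )
    if land_type == "all":
        return True
    raise ValueError(f"unknown land_type: {land_type!r}")


# Hypothetical condition rows, for illustration only.
conds = [
    {"COND_STATUS_CD": 1, "RESERVCD": 0, "SITECLCD": 3},     # timberland
    {"COND_STATUS_CD": 1, "RESERVCD": 1, "SITECLCD": 3},     # reserved forest
    {"COND_STATUS_CD": 1, "RESERVCD": 0, "SITECLCD": 7},     # unproductive forest
    {"COND_STATUS_CD": 2, "RESERVCD": 0, "SITECLCD": None},  # non-forest
]

counts = {
    lt: sum(in_land_type(c, lt) for c in conds)
    for lt in ("forest", "timber", "all")
}
print(counts)
```

Every timberland condition is also counted as forest, so a 'timber' estimate should never exceed the corresponding 'forest' estimate.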

area_domain

SQL-like filter expression for COND-level attributes. Examples:

  • "STDAGE > 50": Stands older than 50 years
  • "FORTYPCD IN (161, 162)": Specific forest types
  • "OWNGRPCD == 10": National Forest lands only
  • "PHYSCLCD == 31 AND STDSZCD == 1": Hydric sites (physiographic class 31) in large-diameter stands

TYPE: str DEFAULT: None

plot_domain

SQL-like filter expression for PLOT-level attributes. This parameter enables filtering by plot location and attributes that are not available in the COND table. Examples:

Location filtering:

  • "COUNTYCD == 183": Wake County, NC (single county)
  • "COUNTYCD IN (183, 185, 187)": Multiple counties
  • "UNITCD == 1": Survey unit 1

Geographic filtering:

  • "LAT >= 35.0 AND LAT <= 36.0": Latitude range
  • "LON >= -80.0 AND LON <= -79.0": Longitude range
  • "ELEV > 2000": Elevation above 2000 feet

Temporal filtering:

  • "INVYR == 2019": Inventory year
  • "MEASYEAR >= 2015": Measured since 2015

Note: plot_domain filters apply to PLOT table columns only. For condition-level attributes (ownership, forest type, etc.), use area_domain instead.

TYPE: str DEFAULT: None
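
The split between plot_domain and area_domain can be pictured as two filters that read different tables. The sketch below is a conceptual stand-in in plain Python, not pyfia's implementation: the rows are made up, and the two predicate functions play the role of the filter strings. A design-based estimator would typically give out-of-domain rows a zero contribution rather than drop them; the sketch only shows which table each filter looks at.

```python
# Hypothetical PLOT rows (one per plot) and COND rows (one or more per plot).
plots = [
    {"CN": 1, "COUNTYCD": 183},
    {"CN": 2, "COUNTYCD": 185},
    {"CN": 3, "COUNTYCD": 1},
]
conds = [
    {"PLT_CN": 1, "CONDID": 1, "OWNGRPCD": 40},
    {"PLT_CN": 1, "CONDID": 2, "OWNGRPCD": 10},
    {"PLT_CN": 2, "CONDID": 1, "OWNGRPCD": 40},
    {"PLT_CN": 3, "CONDID": 1, "OWNGRPCD": 40},
]


def plot_filter(plot: dict) -> bool:
    """Stand-in for plot_domain="COUNTYCD IN (183, 185, 187)"."""
    return plot["COUNTYCD"] in (183, 185, 187)


def cond_filter(cond: dict) -> bool:
    """Stand-in for area_domain="OWNGRPCD == 40"."""
    return cond["OWNGRPCD"] == 40


# The plot-level filter decides which plots are in the domain ...
kept_plots = {p["CN"] for p in plots if plot_filter(p)}
# ... and the condition-level filter is applied to conditions on those plots.
kept_conds = [c for c in conds if c["PLT_CN"] in kept_plots and cond_filter(c)]
print([(c["PLT_CN"], c["CONDID"]) for c in kept_conds])
```

Plot 3 is excluded by county even though its condition is privately owned, and condition 2 on plot 1 is excluded by ownership even though the plot is in a selected county.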

most_recent

If True, automatically select the most recent evaluation for each state/region. Equivalent to calling db.clip_most_recent() first.

TYPE: bool DEFAULT: False

eval_type

Evaluation type to select if most_recent=True. Options: 'ALL', 'VOL', 'GROW', 'MORT', 'REMV', 'CHANGE', 'DWM', 'INV'. Default is 'ALL' for area estimation.

TYPE: str DEFAULT: None

variance

If True, return variance instead of standard error.

TYPE: bool DEFAULT: False
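
The two outputs carry the same information, since a standard error is the square root of the corresponding variance. A minimal sketch with a made-up value:

```python
import math

area_var = 2.25                # hypothetical AREA_VAR value
area_se = math.sqrt(area_var)  # the matching standard error
print(area_se)
```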

totals

If True, include total area estimates expanded to population level (AREA and its standard error). If False, only return the percentage-of-area columns.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
DataFrame

Area estimates with the following columns:

  • YEAR (int): Inventory year
  • [grouping columns] (varies): Any columns specified in the grp_by parameter
  • AREA_PCT (float): Percentage of total area
  • AREA_SE (float, if variance=False): Standard error of area percentage
  • AREA_VAR (float, if variance=True): Variance of area percentage
  • N_PLOTS (int): Number of plots in estimate
  • AREA (float, if totals=True): Total area in acres
  • AREA_TOTAL_SE (float, if totals=True and variance=False): Standard error of total area
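
One common use of the AREA and AREA_TOTAL_SE columns is an approximate confidence interval. The sketch below assumes a normal approximation (z = 1.96 for roughly 95% coverage) and uses made-up numbers; confidence_interval is a hypothetical helper, not part of pyfia.

```python
def confidence_interval(
    estimate: float, se: float, z: float = 1.96
) -> tuple[float, float]:
    """Return (lower, upper) bounds of estimate +/- z * se."""
    return estimate - z * se, estimate + z * se


area_acres = 1_000_000.0  # hypothetical AREA value
area_total_se = 50_000.0  # hypothetical AREA_TOTAL_SE value

low, high = confidence_interval(area_acres, area_total_se)
se_pct = 100 * area_total_se / area_acres  # SE as a percent of the estimate
print(f"{low:,.0f} to {high:,.0f} acres (SE {se_pct:.1f}%)")
```
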

See Also

  • pyfia.volume : Estimate tree volume
  • pyfia.biomass : Estimate tree biomass
  • pyfia.tpa : Estimate trees per acre
  • pyfia.constants.ForestTypes : Forest type code definitions
  • pyfia.constants.StateCodes : State FIPS code definitions

Notes

The area estimation follows USDA FIA's design-based estimation procedures as described in Bechtold & Patterson (2005). The basic formula is:

Area = Σ(CONDPROP_UNADJ × ADJ_FACTOR × EXPNS)

Where:

  • CONDPROP_UNADJ: Proportion of plot in the condition
  • ADJ_FACTOR: Adjustment factor based on PROP_BASIS
  • EXPNS: Expansion factor from stratification
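
The formula above can be worked through by hand. The sketch below applies it to three made-up conditions; the numbers are illustrative only and do not come from any FIA database.

```python
# Hypothetical conditions: proportion of plot, adjustment factor,
# and expansion factor (acres represented by the plot).
conditions = [
    {"CONDPROP_UNADJ": 1.00, "ADJ_FACTOR": 1.0, "EXPNS": 6000.0},
    {"CONDPROP_UNADJ": 0.50, "ADJ_FACTOR": 1.0, "EXPNS": 6000.0},
    {"CONDPROP_UNADJ": 0.25, "ADJ_FACTOR": 1.2, "EXPNS": 5000.0},
]

# Area = sum of CONDPROP_UNADJ * ADJ_FACTOR * EXPNS over conditions.
area_acres = sum(
    c["CONDPROP_UNADJ"] * c["ADJ_FACTOR"] * c["EXPNS"] for c in conditions
)
print(f"{area_acres:,.0f} acres")
```

The three conditions contribute 6,000, 3,000, and 1,500 acres respectively.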

EVALID Handling: If no EVALID is specified, the function automatically selects the most recent EXPALL evaluation to prevent overcounting from multiple evaluations. For explicit control, use db.clip_by_evalid() before calling area().
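
To see why unfiltered evaluations overcount, consider the same two plots appearing in two evaluations. The sketch below uses made-up EVALID values and expansion factors, and treats the largest EVALID as the most recent, which is a simplification for illustration rather than pyfia's selection logic.

```python
# Hypothetical plot-to-evaluation assignments: the same plots appear
# once per evaluation, each time with a full expansion factor.
assignments = [
    {"EVALID": 372101, "PLT_CN": 1, "EXPNS": 6000.0, "FOREST_PROP": 1.0},
    {"EVALID": 372101, "PLT_CN": 2, "EXPNS": 6000.0, "FOREST_PROP": 0.5},
    {"EVALID": 372201, "PLT_CN": 1, "EXPNS": 6000.0, "FOREST_PROP": 1.0},
    {"EVALID": 372201, "PLT_CN": 2, "EXPNS": 6000.0, "FOREST_PROP": 0.5},
]


def total_area(rows: list) -> float:
    """Sum expanded forest area over the given assignment rows."""
    return sum(r["EXPNS"] * r["FOREST_PROP"] for r in rows)


# Summing across every evaluation counts each plot twice.
all_evals = total_area(assignments)

# Restricting to a single evaluation counts each plot once.
latest = max(r["EVALID"] for r in assignments)
one_eval = total_area([r for r in assignments if r["EVALID"] == latest])

print(all_evals, one_eval)
```

With two overlapping evaluations the unfiltered total is exactly double the single-evaluation total, which is the kind of inflation the automatic EVALID selection is meant to prevent.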

Valid Grouping Columns: The function loads comprehensive sets of columns from the COND and PLOT tables, but not all of them are suitable for grouping. Continuous variables such as LAT, LON, and ELEV should not be used, because nearly every plot would form its own group. The function raises an error if a requested grouping column is not available in the loaded data.

NULL Value Handling: Some grouping columns may contain NULL values (e.g., PHYSCLCD ~18% NULL, DSTRBCD1 ~22% NULL). NULL values are handled safely by Polars and will appear as a separate group in results if present.
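
The effect on grouped results can be mimicked without Polars. The sketch below groups made-up rows in plain Python, with None standing in for NULL; it illustrates the described behavior and is not pyfia's code.

```python
from collections import defaultdict

# Hypothetical per-condition areas; None stands in for a NULL PHYSCLCD.
rows = [
    {"PHYSCLCD": 21, "AREA": 100.0},
    {"PHYSCLCD": None, "AREA": 40.0},
    {"PHYSCLCD": 21, "AREA": 60.0},
    {"PHYSCLCD": None, "AREA": 10.0},
]

area_by_class = defaultdict(float)
for row in rows:
    # Rows with a missing code are collected under their own key
    # instead of being dropped.
    area_by_class[row["PHYSCLCD"]] += row["AREA"]

print(dict(area_by_class))
```

When summing group totals to check against an ungrouped estimate, the NULL group has to be included or the totals will not match.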

Examples:

Basic forest area estimation:

>>> from pyfia import FIA, area
>>> with FIA("path/to/fia.duckdb") as db:
...     db.clip_by_state(37)  # North Carolina
...     results = area(db, land_type="forest")

Area by ownership group:

>>> results = area(db, grp_by="OWNGRPCD")
>>> # Results will show area for each ownership category

Timber area by forest type for stands over 50 years:

>>> results = area(
...     db,
...     grp_by="FORTYPCD",
...     land_type="timber",
...     area_domain="STDAGE > 50"
... )

Multiple grouping variables:

>>> results = area(
...     db,
...     grp_by=["STATECD", "OWNGRPCD", "STDSZCD"],
...     land_type="forest"
... )

Area by disturbance type:

>>> results = area(
...     db,
...     grp_by="DSTRBCD1",
...     area_domain="DSTRBCD1 > 0"  # Only disturbed areas
... )

Filter by county using plot_domain:

>>> results = area(
...     db,
...     plot_domain="COUNTYCD == 183",  # Wake County, NC
...     land_type="forest"
... )

Combine plot and area domain filters:

>>> results = area(
...     db,
...     plot_domain="COUNTYCD IN (183, 185, 187)",  # Multiple counties
...     area_domain="OWNGRPCD == 40",  # Private land only
...     grp_by="FORTYPCD"
... )

Geographic filtering with plot_domain:

>>> results = area(
...     db,
...     plot_domain="LAT >= 35.0 AND LAT <= 36.0 AND ELEV > 1000",
...     land_type="forest"
... )

Source code in src/pyfia/estimation/estimators/area.py

def area(
    db: Union[str, FIA],
    grp_by: Optional[Union[str, List[str]]] = None,
    land_type: str = "forest",
    area_domain: Optional[str] = None,
    plot_domain: Optional[str] = None,
    most_recent: bool = False,
    eval_type: Optional[str] = None,
    variance: bool = False,
    totals: bool = True,
) -> pl.DataFrame:
    """
    Estimate forest area from FIA data.

    Calculates area estimates using FIA's design-based estimation methods
    with proper expansion factors and stratification. Automatically handles
    EVALID selection to prevent overcounting from multiple evaluations.

    Parameters
    ----------
    db : Union[str, FIA]
        Database connection or path to FIA database. Can be either a path
        string to a DuckDB/SQLite file or an existing FIA connection object.
    grp_by : str or list of str, optional
        Column name(s) to group results by. Can be any column from the
        PLOT and COND tables. Common grouping columns include:

        **Ownership and Management:**
        - 'OWNGRPCD': Ownership group (10=National Forest, 20=Other Federal,
          30=State/Local, 40=Private)
        - 'OWNCD': Detailed ownership code (see REF_RESEARCH_STATION)
        - 'ADFORCD': Administrative forest code
        - 'RESERVCD': Reserved status (0=Not reserved, 1=Reserved)

        **Forest Characteristics:**
        - 'FORTYPCD': Forest type code (see REF_FOREST_TYPE)
        - 'STDSZCD': Stand size class (1=Large diameter, 2=Medium diameter,
          3=Small diameter, 4=Seedling/sapling, 5=Nonstocked)
        - 'STDORGCD': Stand origin (0=Natural, 1=Planted)
        - 'STDAGE': Stand age in years

        **Site Characteristics:**
        - 'SITECLCD': Site productivity class (1=225+ cu ft/ac/yr,
          2=165-224, 3=120-164, 4=85-119, 5=50-84, 6=20-49, 7=0-19)
        - 'PHYSCLCD': Physiographic class code

        **Location:**
        - 'STATECD': State FIPS code
        - 'UNITCD': FIA survey unit code
        - 'COUNTYCD': County code
        - 'INVYR': Inventory year

        **Disturbance and Treatment:**
        - 'DSTRBCD1', 'DSTRBCD2', 'DSTRBCD3': Disturbance codes
        - 'TRTCD1', 'TRTCD2', 'TRTCD3': Treatment codes

        For complete column descriptions, see USDA FIA Database User Guide.
    land_type : {'forest', 'timber', 'all'}, default 'forest'
        Land type to include in estimation:

        - 'forest': All forestland (COND_STATUS_CD = 1)
        - 'timber': Timberland only (unreserved, productive forestland)
        - 'all': All land types including non-forest
    area_domain : str, optional
        SQL-like filter expression for COND-level attributes. Examples:

        - "STDAGE > 50": Stands older than 50 years
        - "FORTYPCD IN (161, 162)": Specific forest types
        - "OWNGRPCD == 10": National Forest lands only
        - "PHYSCLCD == 31 AND STDSZCD == 1": Hydric sites (physiographic class 31) in large-diameter stands
    plot_domain : str, optional
        SQL-like filter expression for PLOT-level attributes. This parameter
        enables filtering by plot location and attributes that are not available
        in the COND table. Examples:

        **Location filtering:**
        - "COUNTYCD == 183": Wake County, NC (single county)
        - "COUNTYCD IN (183, 185, 187)": Multiple counties
        - "UNITCD == 1": Survey unit 1

        **Geographic filtering:**
        - "LAT >= 35.0 AND LAT <= 36.0": Latitude range
        - "LON >= -80.0 AND LON <= -79.0": Longitude range
        - "ELEV > 2000": Elevation above 2000 feet

        **Temporal filtering:**
        - "INVYR == 2019": Inventory year
        - "MEASYEAR >= 2015": Measured since 2015

        Note: plot_domain filters apply to PLOT table columns only. For
        condition-level attributes (ownership, forest type, etc.), use
        area_domain instead.
    most_recent : bool, default False
        If True, automatically select the most recent evaluation for each
        state/region. Equivalent to calling db.clip_most_recent() first.
    eval_type : str, optional
        Evaluation type to select if most_recent=True. Options:
        'ALL', 'VOL', 'GROW', 'MORT', 'REMV', 'CHANGE', 'DWM', 'INV'.
        Default is 'ALL' for area estimation.
    variance : bool, default False
        If True, return variance instead of standard error.
    totals : bool, default True
        If True, include total area estimates expanded to population level.
        If False, only return the percentage-of-area columns.

    Returns
    -------
    pl.DataFrame
        Area estimates with the following columns:

        - **YEAR** : int
            Inventory year
        - **[grouping columns]** : varies
            Any columns specified in grp_by parameter
        - **AREA_PCT** : float
            Percentage of total area
        - **AREA_SE** : float (if variance=False)
            Standard error of area percentage
        - **AREA_VAR** : float (if variance=True)
            Variance of area percentage
        - **N_PLOTS** : int
            Number of plots in estimate
        - **AREA** : float (if totals=True)
            Total area in acres
        - **AREA_TOTAL_SE** : float (if totals=True and variance=False)
            Standard error of total area

    See Also
    --------
    pyfia.volume : Estimate tree volume
    pyfia.biomass : Estimate tree biomass
    pyfia.tpa : Estimate trees per acre
    pyfia.constants.ForestTypes : Forest type code definitions
    pyfia.constants.StateCodes : State FIPS code definitions

    Notes
    -----
    The area estimation follows USDA FIA's design-based estimation procedures
    as described in Bechtold & Patterson (2005). The basic formula is:

    Area = Σ(CONDPROP_UNADJ × ADJ_FACTOR × EXPNS)

    Where:
    - CONDPROP_UNADJ: Proportion of plot in the condition
    - ADJ_FACTOR: Adjustment factor based on PROP_BASIS
    - EXPNS: Expansion factor from stratification

    **EVALID Handling:**
    If no EVALID is specified, the function automatically selects the most
    recent EXPALL evaluation to prevent overcounting from multiple evaluations.
    For explicit control, use db.clip_by_evalid() before calling area().

    **Valid Grouping Columns:**
    The function loads comprehensive sets of columns from COND and PLOT tables.
    Not all columns are suitable for grouping - continuous variables like
    LAT, LON, ELEV should not be used. The function will error if a requested
    grouping column is not available in the loaded data.

    **NULL Value Handling:**
    Some grouping columns may contain NULL values (e.g., PHYSCLCD ~18% NULL,
    DSTRBCD1 ~22% NULL). NULL values are handled safely by Polars and will
    appear as a separate group in results if present.

    Examples
    --------
    Basic forest area estimation:

    >>> from pyfia import FIA, area
    >>> with FIA("path/to/fia.duckdb") as db:
    ...     db.clip_by_state(37)  # North Carolina
    ...     results = area(db, land_type="forest")

    Area by ownership group:

    >>> results = area(db, grp_by="OWNGRPCD")
    >>> # Results will show area for each ownership category

    Timber area by forest type for stands over 50 years:

    >>> results = area(
    ...     db,
    ...     grp_by="FORTYPCD",
    ...     land_type="timber",
    ...     area_domain="STDAGE > 50"
    ... )

    Multiple grouping variables:

    >>> results = area(
    ...     db,
    ...     grp_by=["STATECD", "OWNGRPCD", "STDSZCD"],
    ...     land_type="forest"
    ... )

    Area by disturbance type:

    >>> results = area(
    ...     db,
    ...     grp_by="DSTRBCD1",
    ...     area_domain="DSTRBCD1 > 0"  # Only disturbed areas
    ... )

    Filter by county using plot_domain:

    >>> results = area(
    ...     db,
    ...     plot_domain="COUNTYCD == 183",  # Wake County, NC
    ...     land_type="forest"
    ... )

    Combine plot and area domain filters:

    >>> results = area(
    ...     db,
    ...     plot_domain="COUNTYCD IN (183, 185, 187)",  # Multiple counties
    ...     area_domain="OWNGRPCD == 40",  # Private land only
    ...     grp_by="FORTYPCD"
    ... )

    Geographic filtering with plot_domain:

    >>> results = area(
    ...     db,
    ...     plot_domain="LAT >= 35.0 AND LAT <= 36.0 AND ELEV > 1000",
    ...     land_type="forest"
    ... )
    """
    # Import validation functions
    from ...validation import (
        validate_boolean,
        validate_domain_expression,
        validate_grp_by,
        validate_land_type,
    )

    # Validate inputs
    land_type = validate_land_type(land_type)
    grp_by = validate_grp_by(grp_by)
    area_domain = validate_domain_expression(area_domain, "area_domain")
    plot_domain = validate_domain_expression(plot_domain, "plot_domain")
    variance = validate_boolean(variance, "variance")
    totals = validate_boolean(totals, "totals")
    most_recent = validate_boolean(most_recent, "most_recent")

    # Ensure db is a FIA instance
    if isinstance(db, str):
        db = FIA(db)
        owns_db = True
    else:
        owns_db = False

    # CRITICAL: If no EVALID is set, automatically select most recent EXPALL
    # This prevents massive overcounting from including all historical evaluations
    if db.evalid is None:
        import warnings

        warnings.warn(
            "No EVALID specified. Automatically selecting most recent EXPALL evaluations. "
            "For explicit control, use db.clip_most_recent() or db.clip_by_evalid() before calling area()."
        )
        db.clip_most_recent(
            eval_type="ALL"
        )  # Use "ALL" not "EXPALL" per line 159-160 in fia.py

        # If still no EVALID (no EXPALL evaluations), try without filtering but warn strongly
        if db.evalid is None:
            warnings.warn(
                "WARNING: No EXPALL evaluations found. Results may be incorrect due to "
                "inclusion of multiple overlapping evaluations. Consider using db.clip_by_evalid() "
                "to explicitly select appropriate EVALIDs."
            )

    # Create simple config dict
    config = {
        "grp_by": grp_by,
        "land_type": land_type,
        "area_domain": area_domain,
        "plot_domain": plot_domain,
        "most_recent": most_recent,
        "eval_type": eval_type,
        "variance": variance,
        "totals": totals,
    }

    try:
        # Create estimator and run
        estimator = AreaEstimator(db, config)
        return estimator.estimate()
    finally:
        # Clean up if we created the db
        if owns_db and hasattr(db, "close"):
            db.close()

Examples

Total Forest Area

result = pyfia.area(db, land_type="forest")
# Column names follow the Returns section above (AREA, AREA_TOTAL_SE)
print(f"Forest Area: {result['AREA'][0]:,.0f} acres")
print(f"SE: {result['AREA_TOTAL_SE'][0]:,.0f} acres")

Timberland Area

result = pyfia.area(db, land_type="timber")
print(f"Timberland: {result['AREA'][0]:,.0f} acres")

Area by Ownership

result = pyfia.area(db, land_type="forest", grp_by="OWNGRPCD")
print(result)

Area by Forest Type Group

result = pyfia.area(db, land_type="forest", grp_by="FORTYPGRPCD")
result = pyfia.join_forest_type_names(result, db)
print(result.sort("AREA", descending=True))