Disaggregate data

Generic disaggregation development. Steps are split into:

ETL
disagg (also an ETL op)

To be rebalanced with the `remind_coupling` package.

`add_possible_techs_to_paidoff(paidoff, tech_groups)`

Add possible PyPSA technologies to the paid-off capacities DataFrame. The paid-off capacities are grouped in case the REMIND-PyPSA tech mapping is not 1:1 but the network needs to add PyPSA techs. A constraint is added so that the paid-off capacity per group is not exceeded.

Parameters:

Name	Type	Description	Default
`paidoff`	`DataFrame`	DataFrame with paid-off capacities	required
`tech_groups`	`Series`	Mapping between PyPSA technologies and technology groups	required

Returns:

Type	Description
`DataFrame`	Paid-off techs with list of PyPSA technologies

Example:

    >> tech_groups
        PyPSA_tech, group
        coal CHP, coal
        coal, coal
    >> add_possible_techs_to_paidoff(paidoff, tech_groups)
    >> paidoff
        tech_group, paid_off_capacity, techs
        coal, 1000, ['coal CHP', 'coal']

Source code in workflow/scripts/remind_coupling/disaggregate_data.py

def add_possible_techs_to_paidoff(paidoff: pd.DataFrame, tech_groups: pd.Series) -> pd.DataFrame:
    """Add possible PyPSA technologies to the paid off capacities DataFrame.
    The paidoff capacities are grouped in case the REMIND-PyPSA tech mapping is not 1:1
    but the network needs to add PyPSA techs.
    A constraint is added so the paid off caps per group are not exceeded.

    Args:
        paidoff (pd.DataFrame): DataFrame with paid off capacities (modified in place)
        tech_groups (pd.Series): mapping between PyPSA techs and groups; must yield
            PyPSA_tech and group columns after reset_index()
    Returns:
        pd.DataFrame: paid off techs with list of PyPSA technologies
    Example:
        >> tech_groups
            PyPSA_tech, group
            coal CHP, coal
            coal, coal
        >> add_possible_techs_to_paidoff(paidoff, tech_groups)
        >> paidoff
            tech_group, paid_off_capacity, techs
            coal, 1000, ['coal CHP', 'coal']
    """
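    # Turn the mapping into a frame with PyPSA_tech and group columns
    df = tech_groups.reset_index()
    # Collect the unique PyPSA techs available in each group
    possibilities = df.groupby("group").PyPSA_tech.apply(lambda x: list(x.unique()))
    # Attach the candidate techs to each paid-off group (paidoff is modified in place)
    paidoff["techs"] = paidoff.tech_group.map(possibilities)
    return paidoff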
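
As a usage illustration, the sketch below runs the same grouping logic on toy data. The input shapes are assumptions inferred from the code: `tech_groups` is taken to be a Series named `group` with an index named `PyPSA_tech`, and the `gas CCGT` entry and capacity values are invented for the example. The function body is repeated only so that the snippet runs on its own.

```python
import pandas as pd


def add_possible_techs_to_paidoff(paidoff: pd.DataFrame, tech_groups: pd.Series) -> pd.DataFrame:
    # Same logic as the listing above, copied so the sketch is standalone.
    df = tech_groups.reset_index()
    possibilities = df.groupby("group").PyPSA_tech.apply(lambda x: list(x.unique()))
    paidoff["techs"] = paidoff.tech_group.map(possibilities)
    return paidoff


# Assumed shape: values are group names, index holds the PyPSA technology names.
tech_groups = pd.Series(
    ["coal", "coal", "gas"],
    index=pd.Index(["coal CHP", "coal", "gas CCGT"], name="PyPSA_tech"),
    name="group",
)

# Toy paid-off capacities, one row per technology group (values are made up).
paidoff = pd.DataFrame(
    {"tech_group": ["coal", "gas"], "paid_off_capacity": [1000.0, 500.0]}
)

result = add_possible_techs_to_paidoff(paidoff, tech_groups)
print(result)
# The "coal" row now carries ['coal CHP', 'coal'] in its techs column.
```

Note that the function writes the `techs` column into the DataFrame it receives, so `result` and `paidoff` are the same object. Pass `paidoff.copy()` if the original must stay untouched.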

`disagg_load_using_ref(data, reference_data, reference_year, sector_coupling_enabled=False)`

Spatially disaggregate the load using regional/nodal reference data.

Automatically chooses between single-sector (electric-only) and multi-sector disaggregation based on sector_coupling_enabled parameter.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	DataFrame containing the load data	required
`reference_data`	`DataFrame`	DataFrame containing the reference data	required
`reference_year`	`int | str`	Year to use for disaggregation	required
`sector_coupling_enabled`	`bool`	Whether to use multi-sector disaggregation	`False`

Returns:

Type	Description
`DataFrame`	Disaggregated load data (Region x Year) or (Province x Sector x Year)

Source code in workflow/scripts/remind_coupling/disaggregate_data.py

@register_etl("disagg_load_ref")
def disagg_load_using_ref(
    data: pd.DataFrame,
    reference_data: pd.DataFrame,
    reference_year: int | str,
    sector_coupling_enabled: bool = False,
) -> pd.DataFrame:
    """Spatially disaggregate the load using regional/nodal reference data.

    Automatically chooses between single-sector (electric-only) and multi-sector disaggregation
    based on sector_coupling_enabled parameter.

    Args:
        data (pd.DataFrame): DataFrame containing the load data
        reference_data (pd.DataFrame): DataFrame containing the reference data
        reference_year (int | str): Year to use for disaggregation
        sector_coupling_enabled (bool): Whether to use multi-sector disaggregation

    Returns:
        pd.DataFrame: Disaggregated load data (Region x Year) or (Province x Sector x Year)
    """

    if sector_coupling_enabled:
        logger.info("Sector coupling enabled - using multi-sector disaggregation")
        return _disagg_multisector_load(data, reference_data, reference_year)
    else:
        logger.info("Sector coupling disabled - using total electricity load disaggregation")
        return _disagg_total_load(data, reference_data, reference_year)
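
The helpers `_disagg_total_load` and `_disagg_multisector_load` are not shown in this section. As a rough illustration of what reference-based spatial disaggregation can look like, the sketch below splits yearly totals across regions in proportion to each region's share of a reference quantity. The function name `disagg_by_reference_share`, the input shapes, and the numbers are all hypothetical and are not taken from the package.

```python
import pandas as pd


def disagg_by_reference_share(yearly_total: pd.Series, reference: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: split yearly totals across regions by reference shares.

    yearly_total: total load per year (index = years)
    reference: reference quantity per region for one reference year (index = regions)
    """
    # Regional shares of the reference quantity; they sum to 1.
    shares = reference / reference.sum()
    # Outer product: rows are regions, columns are years.
    values = shares.to_numpy()[:, None] * yearly_total.to_numpy()[None, :]
    return pd.DataFrame(values, index=reference.index, columns=yearly_total.index)


# Toy inputs (invented for illustration).
yearly_total = pd.Series({2030: 100.0, 2040: 200.0})
reference = pd.Series({"A": 30.0, "B": 70.0})

regional = disagg_by_reference_share(yearly_total, reference)
print(regional)
```

With this approach each column sums back to the corresponding yearly total, so disaggregation preserves the aggregate load. A multi-sector variant would presumably apply a separate set of shares per sector, which matches the (Province x Sector x Year) shape mentioned in the docstring.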