Capacities ETL

Extract, Transform, Load (ETL) operations for REMIND (pre-investment) generation capacities.

The aim is to translate the REMIND pre-investment capacities into PyPSA brownfield capacities. PyPSA workflows already come with their own brownfield data (e.g. from powerplantmatching) assigned to nodes/clusters. These capacities need to be adjusted to the REMIND capacities.

Harmonisation of REMIND and PyPSA Capacities

If the REMIND capacities are smaller than the PyPSA brownfield capacities, the PyPSA capacities are scaled down by tech.

If the REMIND capacities are larger, the PyPSA brownfield capacities are kept and an additional paid-off component is added to the PyPSA model as a maximum (paid-off, i.e. free) capacity constraint. The constraint applies across the whole REMIND region, so PyPSA determines the optimal location of the REMIND-built capacity.
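
The two cases above can be illustrated with a minimal sketch. It is not the library's implementation: the `harmonize` helper and the capacity totals are hypothetical and only show the rule for a single tech group in one region and year.

```python
def harmonize(pypsa_mw: float, remind_mw: float) -> tuple[float, float]:
    """Return (brownfield capacity kept in PyPSA, paid-off capacity), both in MW."""
    # Case 1: REMIND is smaller, so the PyPSA brownfield total is capped at REMIND.
    brownfield = min(pypsa_mw, remind_mw)
    # Case 2: REMIND is larger, so the surplus becomes region-wide paid-off capacity.
    paid_off = max(remind_mw - pypsa_mw, 0.0)
    return brownfield, paid_off


# Hypothetical totals per tech group (MW)
pypsa_totals = {"hydro": 600.0, "wind": 140.0}
remind_totals = {"hydro": 300.0, "wind": 200.0}

result = {
    tech: harmonize(pypsa_totals[tech], remind_totals[tech]) for tech in pypsa_totals
}
```

In this sketch hydro is capped at the REMIND total with no paid-off capacity, while wind keeps its brownfield capacity and the remaining REMIND capacity is left for PyPSA to place.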

Workflow integration

The constraints and data are exported as files made available to the PyPSA workflow.

- use the ETL transformation convert_remind_capacities to prepare the data
- use build_tech_map to create tech_groups from the technoeconomic mapping.csv
- merge the PyPSA capacities data with the tech_groups
- do the same for the converted REMIND capacities data
- use the harmonize_capacities ETL to harmonize the capacities
- finally, use the calc_paidoff_capacity ETL to calculate the paid-off capacities
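
The merge of capacities with tech groups can be sketched as follows. This is an illustration with plain pandas rather than the package's ETL functions. The mapping layout (indexed by PyPSA tech, with a `group` column) follows how `scale_down_pypsa_caps` reads it, and the empty string for unmapped techs follows the check in `scale_down_capacities`. The tech names, nodes and values are hypothetical.

```python
import pandas as pd

# Hypothetical tech mapping: index is the PyPSA tech, "group" is the tech group
tech_groups = pd.DataFrame(
    {"group": ["hydro", "hydro", "wind"]},
    index=pd.Index(["ror", "reservoir", "onwind"], name="Tech"),
)

# Hypothetical PyPSA brownfield capacities per node (MW)
pypsa_caps = pd.DataFrame(
    {
        "Tech": ["ror", "reservoir", "onwind", "solar"],
        "node": ["node1", "node2", "node1", "node2"],
        "Capacity": [240.0, 360.0, 140.0, 50.0],
    }
)

# Attach the tech group; techs missing from the mapping get an empty string
pypsa_caps["tech_group"] = (
    pypsa_caps["Tech"].map(tech_groups["group"].to_dict()).fillna("")
)

# Totals per tech group, the level at which REMIND and PyPSA are compared
group_totals = pypsa_caps.groupby("tech_group")["Capacity"].sum()
```

Grouping before comparison is what allows several PyPSA techs to be matched against a single REMIND capacity.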

`calc_paidoff_capacity(remind_capacities, harmonized_pypsa_caps)`

Calculate the additional paid-off capacity available to PyPSA from REMIND investment decisions. The paid-off capacity is the difference between the REMIND capacities and the harmonized PyPSA capacities. It is available to PyPSA as a zero-capex tech.

Parameters:

Name	Type	Description	Default
`remind_capacities`	`DataFrame`	DataFrame with remind capacities in MW.	required
`harmonized_pypsa_caps`	`dict[str, DataFrame]`	Dictionary with harmonized pypsa capacities by year (capped to REMIND cap)	required

Returns: pd.DataFrame: DataFrame with the available paid off capacity by tech group.
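
As a worked illustration of the difference described above, the following sketch reproduces the calculation on toy data with plain pandas. It assumes the harmonized PyPSA capacities have already been summed per tech group and year; all values are hypothetical.

```python
import pandas as pd

# Hypothetical REMIND capacities (MW) by tech group and year
remind = pd.DataFrame(
    {
        "tech_group": ["hydro", "wind", "hydro", "wind"],
        "year": [2030, 2030, 2040, 2040],
        "capacity": [300.0, 200.0, 300.0, 400.0],
    }
)

# Hypothetical harmonized PyPSA capacities (MW), summed per tech group and year
pypsa = pd.DataFrame(
    {
        "tech_group": ["hydro", "wind", "hydro"],
        "year": [2030, 2030, 2040],
        "capacity": [300.0, 140.0, 250.0],
    }
)

merged = remind.merge(
    pypsa, how="left", on=["tech_group", "year"], suffixes=("_remind", "_pypsa")
)
# A group with no PyPSA entry has no brownfield capacity
merged["capacity_pypsa"] = merged["capacity_pypsa"].fillna(0.0)
merged["paid_off"] = merged["capacity_remind"] - merged["capacity_pypsa"]

# Harmonized capacities are capped at REMIND, so negative values signal an error
if (merged["paid_off"] < -1e-6).any():
    raise ValueError("Harmonized PyPSA capacities exceed the REMIND capacities")

paid_off = merged.set_index(["tech_group", "year"])["paid_off"]
```

Wind in 2040 has no PyPSA entry in this toy data, so the full REMIND capacity is treated as paid-off.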

Source code in src/rpycpl/capacities_etl.py

def calc_paidoff_capacity(
    remind_capacities: pd.DataFrame, harmonized_pypsa_caps: dict[str, pd.DataFrame]
) -> pd.DataFrame:
    """
    Calculate the additional paid off capacity available to pypsa from REMIND investment decisions.
    The paid off capacity is the difference between the REMIND capacities and the harmonized
    pypsa capacities. The paid off capacity is available to pypsa as a zero-capex tech.

    Args:
        remind_capacities (pd.DataFrame): DataFrame with remind capacities in MW.
        harmonized_pypsa_caps (dict[str, pd.DataFrame]): Dictionary with harmonized
            pypsa capacities by year (capped to REMIND cap)
    Returns:
        pd.DataFrame: DataFrame with the available paid off capacity by tech group.
    """

    # merge all years of harmonized capacities into a single DataFrame
    def grp(df, yr):
        return df.groupby("tech_group").apply(
            lambda x: pd.Series(
                {"capacity": x.Capacity.sum(), "year": yr, "techs": ",".join(x.Tech)}
            )
        )

    pypsa_caps = pd.concat(
        [grp(df, yr) for yr, df in harmonized_pypsa_caps.items() if not df.empty]
    )
    pypsa_caps.year = pypsa_caps.year.astype(int)
    remind_caps = remind_capacities.groupby(["tech_group", "year"]).capacity.sum().reset_index()
    merged = pd.merge(
        remind_caps,
        pypsa_caps,
        how="left",
        on=["year", "tech_group"],
        suffixes=("_remind", "_pypsa"),
    ).fillna(0)
    # TODO check for nans and raise warnings
    merged["paid_off"] = merged.capacity_remind - merged.capacity_pypsa
    if (merged.paid_off < -1e-6).any():
        raise ValueError(
            "Found negative Paid off capacities. This indicates that the harmonized PyPSA capacities "
            "exceed the REMIND capacities. Please check the harmonization step."
        )

    return (
        merged.groupby(["tech_group", "year"])
        .paid_off.sum()
        .clip(lower=0)
        .reset_index()
        .rename(columns={"paid_off": "Capacity"})
    )

`scale_down_capacities(to_scale, reference)`

Scale down the target (existing PyPSA) capacities so that they do not exceed the reference (REMIND) capacities by tech group. The target capacities can have a higher spatial resolution. This function can be used to harmonize the capacities between REMIND and PyPSA. Scaling is done by groups of techs, which allows n:1 mapping of REMIND to PyPSA techs.

Parameters:

Name	Type	Description	Default
`to_scale`	`DataFrame`	DataFrame with the target (pypsa) capacities for a single year.	required
`reference`	`DataFrame`	DataFrame with the ref (remind) capacities by tech group.	required

Returns: pd.DataFrame: DataFrame with capacities clipped to the reference for each tech group.

Example:

    remind_caps = pd.DataFrame({"technology": ["hydro", "wind"], "capacity": [300, 200]})
    data = {'hydro': {('Capacity', 'node1'): 240, ('Capacity', 'node2'): 360},
            'wind': {('Capacity', 'node1'): 20, ('Capacity', 'node2'): 120}}
    pypsa_caps = pd.DataFrame.from_dict(data, orient="index")  # powerplantmatching
    scaled_caps = scale_down_capacities(pypsa_caps, remind_caps,
                        tech_groupings={"hydro": "hydro", "wind": "wind"})
    >> {'hydro': {('Capacity', 'node1'): 120, ('Capacity', 'node2'): 180},  # scaled down
        'wind': {('Capacity', 'node1'): 20, ('Capacity', 'node2'): 120}}  # untouched
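
The documented behaviour (scale down per tech group, keep each node's share, never scale up) can be sketched in long format with plain pandas. This is an illustration under assumptions, not the function's code: the `node` column, the values and the explicit cap on the group total are choices made for this sketch.

```python
import pandas as pd

# Hypothetical PyPSA capacities per node (MW), already tagged with a tech group
to_scale = pd.DataFrame(
    {
        "node": ["node1", "node2", "node1", "node2"],
        "tech_group": ["hydro", "hydro", "wind", "wind"],
        "Capacity": [240.0, 360.0, 20.0, 120.0],
    }
)
# Hypothetical REMIND totals per tech group (MW)
reference = pd.Series({"hydro": 300.0, "wind": 200.0})

# Share of each row within its tech group
group_total = to_scale.groupby("tech_group")["Capacity"].transform("sum")
share = to_scale["Capacity"] / group_total

# Target total per group: the PyPSA total, capped at the REMIND total
target_total = group_total.clip(upper=to_scale["tech_group"].map(reference))

# Redistribute the target total according to the original shares
scaled = to_scale.assign(Capacity=share * target_total)
```

With these numbers hydro is halved on both nodes (600 MW reduced to 300 MW), while wind stays at its original 140 MW because the REMIND total is larger.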

Source code in src/rpycpl/capacities_etl.py

def scale_down_capacities(to_scale: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    """
    Scale down the target (existing pypsa) capacities to not exceed the reference (remind)
    capacities by tech group. The target capacities can have a higher spatial resolution.
    This function can be used to harmonize the capacities between REMIND and PyPSA.
    Scaling is done by groups of techs, which allows n:1 mapping of remind to pypsa techs.

    Args:
        to_scale (pd.DataFrame): DataFrame with the target (pypsa) capacities for a single year.
        reference (pd.DataFrame): DataFrame with the ref (remind) capacities by tech group.
    Returns:
        pd.DataFrame: DataFrame with capacities clipped to the reference for each tech group.
    Example:
        remind_caps = pd.DataFrame({"technology": ["hydro", "wind"], "capacity": [300, 200]})
        data = {'hydro': {('Capacity', 'node1'): 240, ('Capacity', 'node2'): 360},
                'wind': {('Capacity', 'node1'): 20, ('Capacity', 'node2'): 120}}
        pypsa_caps = pd.DataFrame.from_dict(data, orient="index")  # powerplantmatching
        scaled_caps = scale_down_capacities(pypsa_caps, remind_caps,
                            tech_groupings={"hydro": "hydro", "wind": "wind"})
        >> {'hydro': {('Capacity', 'node1'): 120, ('Capacity', 'node2'): 180},  # scaled down
            'wind': {('Capacity', 'node1'): 20, ('Capacity', 'node2'): 120}}  # untouched
    """
    if reference.year.nunique() > 1:
        raise ValueError("The reference capacities should be for a single year")
    # group the target & ref capacities by tech group
    group_totals_ref = reference.groupby(["tech_group"]).capacity.sum()
    to_scale.loc[:, "group_fraction"] = (
        to_scale.groupby("tech_group").Capacity.transform(lambda x: x / x.sum()).values
    )

    missing = to_scale.query("tech_group == ''")[["Fueltype", "Tech"]].drop_duplicates()
    if not missing.empty:
        logger.warning(
            "Some technologies are not assigned to a tech group. "
            f"Missing from tech groups: {missing}"
        )
        to_scale = to_scale.query("tech_group != ''")

    # set missing tech groups to zero in reference
    not_in_ref = set(to_scale.tech_group.unique()).difference(set(group_totals_ref.index))
    if not_in_ref:
        group_totals_ref = pd.concat([group_totals_ref, pd.Series(0, index=not_in_ref)])

    to_scale.rename(columns={"Capacity": "original_capacity"}, inplace=True)
    # perform the scaling (normalised target capacities * ref capacities)
    logger.info("applying scaling to capacities")
    to_scale.loc[:, "Capacity"] = to_scale.groupby("tech_group").group_fraction.transform(
        lambda x: x * group_totals_ref[x.name]
    )

    return to_scale

`scale_down_pypsa_caps(merged_caps, pypsa_caps, tech_groupings)`

Scale down the pypsa capacities to match the remind capacities by tech group. Does not scale up the pypsa capacities.

Scaling is done by groups of techs, which allows n:1 mapping of remind to pypsa techs.

Parameters:

Name	Type	Description	Default
`merged_caps`	`DataFrame`	DataFrame with the merged remind and pypsa capacities by tech group.	required
`pypsa_caps`	`DataFrame`	DataFrame with the pypsa capacities.	required
`tech_groupings`	`DataFrame`	DataFrame with the pypsa tech group names.	required
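
The scaling factor used here can be sketched on toy data as follows. The column names follow the source listing for this function, while the tech names and values are hypothetical. The sketch shows why the fraction is clipped at 1 and why undefined fractions are dropped.

```python
import pandas as pd

# Hypothetical merged REMIND and PyPSA totals per tech group (MW)
merged_caps = pd.DataFrame(
    {
        "tech_group": ["hydro", "wind", "nuclear"],
        "capacity_remind": [300.0, 200.0, 0.0],
        "capacity_pypsa": [600.0, 140.0, 0.0],
    }
)

# Ratio REMIND / PyPSA, capped at 1 so that capacities are never scaled up
fraction = (merged_caps["capacity_remind"] / merged_caps["capacity_pypsa"]).clip(upper=1)
# 0 / 0 gives NaN (no capacity on either side); such groups are dropped
scalings = merged_caps.assign(fraction=fraction).dropna(subset=["fraction"])

# Hypothetical PyPSA capacities per tech (MW), already tagged with a tech group
pypsa_caps = pd.DataFrame(
    {
        "Tech": ["ror", "reservoir", "onwind"],
        "tech_group": ["hydro", "hydro", "wind"],
        "Capacity": [240.0, 360.0, 140.0],
    }
)

scaled = pypsa_caps.merge(scalings[["tech_group", "fraction"]], how="left", on="tech_group")
scaled["Capacity"] = scaled["Capacity"] * scaled["fraction"]
```

Because the merge is a left join, a tech group without a scaling entry would end up with a NaN fraction and therefore a NaN capacity, so it is worth checking the result for missing values.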

Source code in src/rpycpl/capacities_etl.py

def scale_down_pypsa_caps(
    merged_caps: pd.DataFrame, pypsa_caps: pd.DataFrame, tech_groupings: pd.DataFrame
) -> pd.DataFrame:
    """
    Scale down the pypsa capacities to match the remind capacities by tech group.
    Does not scale up the pypsa capacities.

    Scaling is done by groups of techs, which allows n:1 mapping of remind to pypsa techs.

    Args:
        merged_caps (pd.DataFrame): DataFrame with the merged remind and pypsa capacities
             by tech group.
        pypsa_caps (pd.DataFrame): DataFrame with the pypsa capacities.
        tech_groupings (pd.DataFrame): DataFrame with the pypsa tech group names.
    """
    merged_caps["fraction"] = merged_caps.capacity_remind / merged_caps.capacity_pypsa

    scalings = merged_caps.copy()
    # do not touch cases where remind capacity is larger than pypsa capacity
    scalings["fraction"] = scalings["fraction"].clip(upper=1)
    scalings.dropna(subset=["fraction"], inplace=True)

    pypsa_caps["tech_group"] = pypsa_caps.Tech.map(tech_groupings.group.to_dict())
    pypsa_caps = pypsa_caps.merge(
        scalings[["tech_group", "fraction"]],
        how="left",
        on="tech_group",
        suffixes=("", "_scaling"),
    )
    pypsa_caps.Capacity = pypsa_caps.Capacity * pypsa_caps.fraction
    return pypsa_caps