
Capacities etl

Extract, Transform, Load (ETL) operations for REMIND (pre-investment) generation capacities

The aim is to translate the REMIND pre-investment capacities into PyPSA brownfield capacities. PyPSA workflows already come with their own brownfield data (e.g. from powerplantmatching), assigned to nodes/clusters. These capacities need to be adjusted to match the REMIND capacities.

Harmonisation of REMIND and PyPSA Capacities

If the REMIND capacities are smaller than the PyPSA brownfield capacities, the PyPSA capacities are scaled down by tech group.

If the REMIND capacities are larger, the PyPSA brownfield capacities are kept and an additional paid-off component is added to the PyPSA model as a maximum (paid-off, i.e. free) capacity constraint. The constraint applies across the whole REMIND region, so that PyPSA determines the optimal location of the REMIND-built capacity.
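The two cases above can be illustrated with a minimal sketch for a single tech group in one REMIND region. The function name `harmonise` and the numbers are hypothetical and not part of the package; the sketch only restates the rule in arithmetic form.

```python
def harmonise(remind_mw: float, pypsa_mw: float) -> tuple[float, float]:
    """Return (harmonised brownfield capacity, paid-off capacity) in MW."""
    # Brownfield capacity never exceeds what REMIND reports
    brownfield = min(pypsa_mw, remind_mw)
    # Any REMIND surplus becomes paid-off capacity, available region-wide
    paid_off = max(remind_mw - pypsa_mw, 0.0)
    return brownfield, paid_off


# Illustrative numbers only (MW)
print(harmonise(remind_mw=200.0, pypsa_mw=600.0))  # REMIND smaller: scale down
print(harmonise(remind_mw=300.0, pypsa_mw=140.0))  # REMIND larger: keep and add paid-off
```

In the first call the brownfield capacity is reduced to the REMIND value and nothing is paid off; in the second the brownfield capacity is kept and the difference is reported as paid-off capacity.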

Workflow integration

The constraints and data are exported as files made available to the PyPSA workflow.

- use the ETL transformation convert_remind_capacities to prepare the data
- use build_tech_map to create tech_groups from the technoeconomic mapping.csv
- merge the PyPSA capacities data with the tech_groups
- do the same for the converted REMIND capacities data
- use the harmonize_capacities ETL to harmonize the capacities
- finally, use the calc_paidoff_capacity ETL to calculate the paid-off capacities
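The merge step in the list above can be sketched as follows. The signatures of convert_remind_capacities, build_tech_map and harmonize_capacities are not shown on this page, so the sketch does not call them. Instead it uses a hand-written, hypothetical tech map with a `group` column and hypothetical capacity data to show how a `tech_group` column might be attached.

```python
import pandas as pd

# Hypothetical stand-in for a tech map: PyPSA tech -> tech group
tech_groups = pd.DataFrame(
    {"group": ["wind", "wind", "hydro"]},
    index=pd.Index(["onwind", "offwind", "ror"], name="Tech"),
)

# Hypothetical PyPSA brownfield capacities (MW) by node
pypsa_caps = pd.DataFrame(
    {
        "Tech": ["onwind", "offwind", "ror", "geothermal"],
        "node": ["node1", "node2", "node1", "node2"],
        "Capacity": [20.0, 120.0, 240.0, 5.0],
    }
)

# Attach the tech group; unmapped techs get an empty string so they are easy to spot
pypsa_caps["tech_group"] = pypsa_caps["Tech"].map(tech_groups["group"]).fillna("")
unmapped = pypsa_caps.loc[pypsa_caps["tech_group"] == "", "Tech"].tolist()
print(pypsa_caps)
print("Techs without a tech group:", unmapped)
```

Marking unmapped techs with an empty string mirrors the `tech_group == ''` check in the scale_down_capacities listing further down this page.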

calc_paidoff_capacity(remind_capacities, harmonized_pypsa_caps)

Calculate the additional paid-off capacity available to PyPSA from REMIND investment decisions. The paid-off capacity is the difference between the REMIND capacities and the harmonized PyPSA capacities. It is available to PyPSA as a zero-capex tech.

Parameters:

Name | Type | Description | Default
remind_capacities | DataFrame | DataFrame with REMIND capacities in MW. | required
harmonized_pypsa_caps | dict[str, DataFrame] | Dictionary with harmonized PyPSA capacities by year (capped to the REMIND capacity). | required

Returns: pd.DataFrame: DataFrame with the available paid off capacity by tech group.
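The difference described above can be illustrated with a small self-contained sketch. It does not import the package; the data are hypothetical, and the PyPSA capacities are assumed to be already summed per tech group and year.

```python
import pandas as pd

# Hypothetical REMIND capacities (MW) by tech group and year
remind = pd.DataFrame(
    {
        "tech_group": ["wind", "hydro", "solar"],
        "year": [2030, 2030, 2030],
        "capacity": [300.0, 200.0, 50.0],
    }
)
# Hypothetical harmonized PyPSA capacities (MW), already summed per tech group
pypsa = pd.DataFrame(
    {"tech_group": ["wind", "hydro"], "year": [2030, 2030], "capacity": [140.0, 200.0]}
)

merged = remind.merge(
    pypsa, how="left", on=["tech_group", "year"], suffixes=("_remind", "_pypsa")
)
# Tech groups absent from PyPSA count as zero existing capacity
merged["capacity_pypsa"] = merged["capacity_pypsa"].fillna(0.0)
merged["paid_off"] = (merged["capacity_remind"] - merged["capacity_pypsa"]).clip(lower=0)
print(merged[["tech_group", "year", "paid_off"]])
```

With these numbers, wind has 160 MW of paid-off capacity, hydro has none, and solar (which has no PyPSA counterpart here) is paid off in full.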

Source code in src/rpycpl/capacities_etl.py
def calc_paidoff_capacity(
    remind_capacities: pd.DataFrame, harmonized_pypsa_caps: dict[str, pd.DataFrame]
) -> pd.DataFrame:
    """
    Calculate the additional paid off capacity available to pypsa from REMIND investment decisions.
    The paid off capacity is the difference between the REMIND capacities and the harmonized
    pypsa capacities. The paid off capacity is available to pypsa as a zero-capex tech.

    Args:
        remind_capacities (pd.DataFrame): DataFrame with remind capacities in MW.
        harmonized_pypsa_caps (dict[str, pd.DataFrame]): Dictionary with harmonized
            pypsa capacities by year (capped to REMIND cap)
    Returns:
        pd.DataFrame: DataFrame with the available paid off capacity by tech group.
    """

    # merge all years of harmonized capacities into a single DataFrame
    def grp(df, yr):
        return df.groupby("tech_group").apply(
            lambda x: pd.Series(
                {"capacity": x.Capacity.sum(), "year": yr, "techs": ",".join(x.Tech)}
            )
        )

    pypsa_caps = pd.concat(
        [grp(df, yr) for yr, df in harmonized_pypsa_caps.items() if not df.empty]
    )
    pypsa_caps.year = pypsa_caps.year.astype(int)
    remind_caps = remind_capacities.groupby(["tech_group", "year"]).capacity.sum().reset_index()
    merged = pd.merge(
        remind_caps,
        pypsa_caps,
        how="left",
        on=["year", "tech_group"],
        suffixes=("_remind", "_pypsa"),
    ).fillna(0)
    # TODO check for nans and raise warnings
    merged["paid_off"] = merged.capacity_remind - merged.capacity_pypsa
    if (merged.paid_off < -1e-6).any():
        raise ValueError(
            "Found negative Paid off capacities. This indicates that the harmonized PyPSA capacities "
            "exceed the REMIND capacities. Please check the harmonization step."
        )

    return (
        merged.groupby(["tech_group", "year"])
        .paid_off.sum()
        .clip(lower=0)
        .reset_index()
        .rename(columns={"paid_off": "Capacity"})
    )

scale_down_capacities(to_scale, reference)

Scale down the target (existing PyPSA) capacities so that they do not exceed the reference (REMIND) capacities by tech group. The target capacities can have a higher spatial resolution. This function can be used to harmonize the capacities between REMIND and PyPSA. Scaling is done by groups of techs, which allows an n:1 mapping of REMIND to PyPSA techs.

Parameters:

Name | Type | Description | Default
to_scale | DataFrame | DataFrame with the target (PyPSA) capacities for a single year. | required
reference | DataFrame | DataFrame with the reference (REMIND) capacities by tech group. | required

Returns: pd.DataFrame: DataFrame with capacities clipped to the reference for each tech group.

Example:

    remind_caps = pd.DataFrame({"technology": ["hydro", "wind"], "capacity": [300, 200]})
    data = {'hydro': {('Capacity', 'node1'): 240, ('Capacity', 'node2'): 360},
            'wind': {('Capacity', 'node1'): 20, ('Capacity', 'node2'): 120}}
    pypsa_caps = pd.DataFrame.from_dict(data, orient="index")  # powerplantmatching
    scaled_caps = scale_down_capacities(pypsa_caps, remind_caps,
                        tech_groupings = {"hydro": "hydro", "wind": "wind"})
    >> {'hydro': {('Capacity', 'node1'): 120, ('Capacity', 'node2'): 180},  # scaled down
        'wind': {('Capacity', 'node1'): 20, ('Capacity', 'node2'): 120}}  # untouched
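The proportional scaling described above can be sketched in a self-contained way. This follows the documented scale-down-only behaviour with an explicit clip; it is not a copy of the function body, and the column name `Capacity_scaled`, the `node` column and the numbers are assumptions for illustration.

```python
import pandas as pd

# Hypothetical PyPSA capacities (MW) at node resolution, one year
pypsa_caps = pd.DataFrame(
    {
        "tech_group": ["hydro", "hydro", "wind", "wind"],
        "node": ["node1", "node2", "node1", "node2"],
        "Capacity": [240.0, 360.0, 20.0, 120.0],
    }
)
# Hypothetical REMIND totals (MW) per tech group
remind_totals = pd.Series({"hydro": 300.0, "wind": 200.0})

# Total PyPSA capacity of each row's tech group
group_totals = pypsa_caps.groupby("tech_group")["Capacity"].transform("sum")
# Reference total for each row's tech group
ref = pypsa_caps["tech_group"].map(remind_totals)
# Scale factor per row, capped at 1 so that nothing is scaled up
factor = (ref / group_totals).clip(upper=1.0)
pypsa_caps["Capacity_scaled"] = pypsa_caps["Capacity"] * factor
print(pypsa_caps)
```

Hydro (600 MW in total) is halved to match the 300 MW reference while keeping the split between nodes; wind (140 MW in total) stays below its 200 MW reference and is left untouched.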

Source code in src/rpycpl/capacities_etl.py
def scale_down_capacities(to_scale: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    """
    Scale down the target (existing pypsa) capacities to not exceed the reference (remind)
        capacities by tech group. The target capacities can have a higher spatial resolution.
        This function can be used to harmonize the capacities between REMIND and PyPSA.
        Scaling is done by groups of techs, which allows n:1 mapping of remind to pypsa techs.

    Args:
        to_scale (pd.DataFrame): DataFrame with the target (pypsa) capacities for a single year.
        reference (pd.DataFrame): DataFrame with the ref (remind) capacities by tech group.
    Returns:
        pd.DataFrame: DataFrame with capacities clipped to the reference for each tech group.
    Example:
        remind_caps = pd.DataFrame({"technology": ["hydro", "wind"], "capacity": [300, 200]})
        data = {'hydro': {('Capacity', 'node1'): 240, ('Capacity', 'node2'): 360},
                'wind': {('Capacity', 'node1'): 20, ('Capacity', 'node2'): 120}}
        pypsa_caps = pd.DataFrame.from_dict(data, orient="index") # powerplantmatching
        scaled_caps = scale_down_capacities(pypsa_caps, remind_caps,
                            tech_groupings = {"hydro": "hydro", "wind": "wind"})
        >> {'hydro': {('Capacity', 'node1'): 120, ('Capacity', 'node2'): 180}, # scaled down
                'wind': {('Capacity', 'node1'): 20, ('Capacity', 'node2'): 120}} # untouched
    """
    if reference.year.nunique() > 1:
        raise ValueError("The reference capacities should be for a single year")
    # group the target & ref capacities by tech group
    group_totals_ref = reference.groupby(["tech_group"]).capacity.sum()
    to_scale.loc[:, "group_fraction"] = (
        to_scale.groupby("tech_group").Capacity.transform(lambda x: x / x.sum()).values
    )

    missing = to_scale.query("tech_group == ''")[["Fueltype", "Tech"]].drop_duplicates()
    if not missing.empty:
        logger.warning(
            "Some technologies are not assigned to a tech group. "
            f"Missing from tech groups: {missing}"
        )
        to_scale = to_scale.query("tech_group != ''")

    # set missing tech groups to zero in reference
    not_in_ref = set(to_scale.tech_group.unique()).difference(set(group_totals_ref.index))
    if not_in_ref:
        group_totals_ref = pd.concat([group_totals_ref, pd.Series(0, index=not_in_ref)])

    to_scale.rename(columns={"Capacity": "original_capacity"}, inplace=True)
    # perform the scaling (normalised target capacities * ref capacities)
    logger.info("applying scaling to capacities")
    to_scale.loc[:, "Capacity"] = to_scale.groupby("tech_group").group_fraction.transform(
        lambda x: x * group_totals_ref[x.name]
    )

    return to_scale

scale_down_pypsa_caps(merged_caps, pypsa_caps, tech_groupings)

Scale down the pypsa capacities to match the remind capacities by tech group. Does not scale up the pypsa capacities.

Scaling is done by groups of techs, which allows n:1 mapping of remind to pypsa techs.

Parameters:

Name | Type | Description | Default
merged_caps | DataFrame | DataFrame with the merged REMIND and PyPSA capacities by tech group. | required
pypsa_caps | DataFrame | DataFrame with the PyPSA capacities. | required
tech_groupings | DataFrame | DataFrame with the PyPSA tech group names. | required
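As a rough illustration of the n:1 mapping and the "scale down only" rule, the following self-contained sketch computes a per-group fraction capped at 1 and applies it to individual techs. The tech names, the mapping dictionary and the numbers are hypothetical; the package function itself is not called.

```python
import pandas as pd

# Hypothetical merged REMIND/PyPSA totals (MW) per tech group
merged_caps = pd.DataFrame(
    {
        "tech_group": ["hydro", "wind"],
        "capacity_remind": [300.0, 200.0],
        "capacity_pypsa": [600.0, 140.0],
    }
)
# Fraction of PyPSA capacity to keep; capped at 1 so nothing is scaled up
merged_caps["fraction"] = (
    merged_caps["capacity_remind"] / merged_caps["capacity_pypsa"]
).clip(upper=1)

# Hypothetical n:1 mapping of PyPSA techs to tech groups
tech_to_group = {"ror": "hydro", "reservoir": "hydro", "onwind": "wind"}
pypsa_caps = pd.DataFrame(
    {"Tech": ["ror", "reservoir", "onwind"], "Capacity": [200.0, 400.0, 140.0]}
)
pypsa_caps["tech_group"] = pypsa_caps["Tech"].map(tech_to_group)

# Apply the group-level fraction to every tech in that group
scaled = pypsa_caps.merge(
    merged_caps[["tech_group", "fraction"]], how="left", on="tech_group"
)
scaled["Capacity"] = scaled["Capacity"] * scaled["fraction"]
print(scaled)
```

Both hydro techs share the same fraction (0.5), so their relative shares are preserved, while the wind tech keeps its original capacity because the REMIND total is larger.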
Source code in src/rpycpl/capacities_etl.py
def scale_down_pypsa_caps(
    merged_caps: pd.DataFrame, pypsa_caps: pd.DataFrame, tech_groupings: pd.DataFrame
) -> pd.DataFrame:
    """
    Scale down the pypsa capacities to match the remind capacities by tech group.
    Does not scale up the pypsa capacities.

    Scaling is done by groups of techs, which allows n:1 mapping of remind to pypsa techs.

    Args:
        merged_caps (pd.DataFrame): DataFrame with the merged remind and pypsa capacities
             by tech group.
        pypsa_caps (pd.DataFrame): DataFrame with the pypsa capacities.
        tech_groupings (pd.DataFrame): DataFrame with the pypsa tech group names.
    """
    merged_caps["fraction"] = merged_caps.capacity_remind / merged_caps.capacity_pypsa

    scalings = merged_caps.copy()
    # do not touch cases where remind capacity is larger than pypsa capacity
    scalings["fraction"] = scalings["fraction"].clip(upper=1)
    scalings.dropna(subset=["fraction"], inplace=True)

    pypsa_caps["tech_group"] = pypsa_caps.Tech.map(tech_groupings.group.to_dict())
    pypsa_caps = pypsa_caps.merge(
        scalings[["tech_group", "fraction"]],
        how="left",
        on="tech_group",
        suffixes=("", "_scaling"),
    )
    pypsa_caps.Capacity = pypsa_caps.Capacity * pypsa_caps.fraction
    return pypsa_caps