Generic ETL
Generic ETL development; to be rebalanced with the remind_coupling package.
The ETL operations are governed by the config file. The allowed fields are defined by the rpycpl.etl.Transformation class:
name: str
method: Optional[str]
frames: Dict[str, Any]
params: Dict[str, Any]
filters: Dict[str, Any]
kwargs: Dict[str, Any]
dependencies: Dict[str, Any]
The sequence of operations matters: dependencies refer to the outputs of previous steps, so a step must come after every step it depends on.
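The field list and the ordering rule can be illustrated with a minimal sketch. `StepConfig` and `check_order` are hypothetical stand-ins written for this example; they are not the actual rpycpl.etl.Transformation class. The sketch assumes that only `name` is mandatory and that each value in `dependencies` is the name of an earlier step, neither of which is confirmed by the text above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class StepConfig:
    """Hypothetical stand-in with the same field names as the config entries."""

    name: str
    method: Optional[str] = None
    # Assumption: all mapping fields default to empty dictionaries.
    frames: Dict[str, Any] = field(default_factory=dict)
    params: Dict[str, Any] = field(default_factory=dict)
    filters: Dict[str, Any] = field(default_factory=dict)
    kwargs: Dict[str, Any] = field(default_factory=dict)
    dependencies: Dict[str, Any] = field(default_factory=dict)


def check_order(steps: List[StepConfig]) -> List[Tuple[str, str]]:
    """Return (step, dependency) pairs whose dependency has not run yet."""
    seen = set()
    missing = []
    for step in steps:
        # Assumption: dependency values are names of earlier steps.
        for dep_name in step.dependencies.values():
            if dep_name not in seen:
                missing.append((step.name, dep_name))
        seen.add(step.name)
    return missing


# Illustrative config entries; step, method and symbol names are made up.
raw_steps = [
    {"name": "capacities", "method": "convert_capacities",
     "frames": {"caps": "some_symbol"}},
    {"name": "costs", "method": "build_costs",
     "dependencies": {"caps": "capacities"}},
]
steps = [StepConfig(**entry) for entry in raw_steps]
problems = check_order(steps)
```

With the entries in this order `problems` is empty; swapping the two entries would report that `costs` needs `capacities` before it exists.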
ETLRunner
Container class to execute ETL steps.
Source code in workflow/scripts/remind_coupling/generic_etl.py
run(step, frames, previous_outputs=None, **kwargs)
staticmethod
Run the ETL step using the provided frames and extra arguments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
step | Transformation | The ETL step to run. | required |
frames | dict | Dictionary of loaded frames. | required |
previous_outputs | dict | Dictionary of outputs from previous steps that can be used as inputs. | None |
**kwargs | | Additional arguments for the ETL method. | {} |
Returns:
Type | Description |
---|---|
DataFrame | The result of the ETL step. |
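The call signature above suggests a dispatcher that looks up the method named by the step and passes it the frames, the previous outputs and any extra arguments. The following is only a sketch of that idea under assumptions: `SketchRunner`, the `METHODS` registry and the `scale` method are hypothetical names, and the real ETLRunner may resolve methods differently.

```python
from types import SimpleNamespace

import pandas as pd


def scale(frames, previous_outputs, factor=1.0):
    """Hypothetical ETL method: multiply the 'value' column by a factor."""
    out = frames["data"].copy()
    out["value"] = out["value"] * factor
    return out


# Hypothetical registry mapping method names to callables.
METHODS = {"scale": scale}


class SketchRunner:
    """Illustrative container with the same call signature as ETLRunner.run."""

    @staticmethod
    def run(step, frames, previous_outputs=None, **kwargs):
        # Treat a missing previous_outputs as "no earlier results available".
        if previous_outputs is None:
            previous_outputs = {}
        method = METHODS[step.method]
        return method(frames, previous_outputs, **kwargs)


# SimpleNamespace stands in for a Transformation with name and method set.
step = SimpleNamespace(name="scaled", method="scale")
frames = {"data": pd.DataFrame({"value": [1.0, 2.0]})}
result = SketchRunner.run(step, frames, factor=10.0)
```

The method copies its input before modifying it, so the loaded frames stay available unchanged for later steps.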
RemindLoader
Load REMIND symbol tables from CSV files or from a GDX file.
auto_load(frames, filters=None)
Automatically load, merge, and filter frames in one step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frames | dict[str, str] | Dictionary mapping parameter names to REMIND symbol names. | required |
filters | dict[str, str] | Optional dictionary of filter expressions to apply to frames. | None |
Returns:
Type | Description |
---|---|
dict[str, DataFrame] | Dictionary of processed DataFrames ready for transformation. |
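The filtering part of this step can be sketched as follows. The syntax of the filter expressions is not specified here; this example assumes they are pandas `DataFrame.query` strings, and `apply_filters` is a hypothetical helper, not a method of RemindLoader.

```python
import pandas as pd


def apply_filters(frames, filters=None):
    """Apply a query string to each frame that has a filter entry."""
    filters = filters or {}
    return {
        # Assumption: each filter is a pandas query expression.
        name: df.query(filters[name]) if name in filters else df
        for name, df in frames.items()
    }


# Made-up frames standing in for already loaded REMIND symbols.
loaded = {
    "capex": pd.DataFrame({"region": ["DEU", "FRA"], "value": [1.0, 2.0]}),
    "eta": pd.DataFrame({"region": ["DEU"], "value": [0.4]}),
}
filtered = apply_filters(loaded, {"capex": "region == 'DEU'"})
```

Frames without a filter entry are passed through untouched, which matches `filters` being optional.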
load_frames_csv(frames)
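Load REMIND frames from CSV files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frames | dict | Mapping (param: remind_symbol_name) of the REMIND frames to read. | required |
Returns:
Type | Description |
---|---|
dict[str, DataFrame] | Dictionary (param: dataframe). |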
load_frames_gdx(frames, gdx_file)
Load frames from GDX file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frames | dict[str, str] | Dictionary mapping parameter names to REMIND symbol names. | required |
gdx_file | PathLike | Path to the GDX file. | required |
Returns:
Type | Description |
---|---|
dict[str, DataFrame] | Dictionary of loaded DataFrames. |
Raises:
Type | Description |
---|---|
NotImplementedError | GDX loading is not implemented yet. |
merge_split_frames(frames)
When several REMIND parameters are needed for one frame, group them by their base name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frames | dict | Dictionary with all dataframes. | required |
Example:
frames = {eta: 'pm_dataeta', eta_part2: 'pm_eta_conv'}
merge_split_frames(frames)
>> {eta: pd.concat([pm_dataeta, pm_eta_conv], axis=0).drop_duplicates()}
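A runnable sketch of this grouping is given below. It assumes that split frames are marked by a `_partN` suffix on the key, as in the `eta` / `eta_part2` example; the actual rule used to derive the base name is not stated here. `merge_by_base_name` is a hypothetical name for this illustration, and the frame contents are made up.

```python
import re

import pandas as pd


def merge_by_base_name(frames):
    """Concatenate frames whose keys share a base name and drop duplicates."""
    grouped = {}
    for key, df in frames.items():
        # Assumption: "eta_part2" belongs to base name "eta".
        base = re.sub(r"_part\d+$", "", key)
        grouped.setdefault(base, []).append(df)
    return {
        base: pd.concat(parts, axis=0).drop_duplicates()
        for base, parts in grouped.items()
    }


# Made-up data standing in for the two REMIND symbols of the example.
pm_dataeta = pd.DataFrame({"tech": ["coal", "gas"], "value": [0.4, 0.5]})
pm_eta_conv = pd.DataFrame({"tech": ["gas", "wind"], "value": [0.5, 1.0]})
capex = pd.DataFrame({"tech": ["coal"], "value": [1500.0]})

merged = merge_by_base_name(
    {"eta": pm_dataeta, "eta_part2": pm_eta_conv, "capex": capex}
)
```

The `gas` row appears in both inputs and is kept only once in `merged["eta"]`; `capex` has no split parts and is returned as a single-frame group.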