Generic ETL
Generic ETL development, to be rebalanced with the remind_coupling package.
The ETL operations are governed by the config file. Allowed fields are defined by the rpycpl.etl.Transformation class:

- name: str
- method: Optional[str]
- frames: Dict[str, Any]
- params: Dict[str, Any]
- filters: Dict[str, Any]
- kwargs: Dict[str, Any]
- dependencies: Dict[str, Any]

The sequence of operations matters: dependencies refer to the outputs of previous steps, so a step must come after every step it depends on.
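As a rough illustration of how config entries could map onto these fields, here is a minimal sketch. `StepConfig` is a hypothetical stand-in that only mirrors the field list above; it is not the real rpycpl.etl.Transformation class. The step names, method names, and the shape of the `dependencies` mapping (argument name to producing step name) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class StepConfig:
    """Hypothetical stand-in mirroring the documented fields (not the rpycpl class)."""

    name: str
    method: Optional[str] = None
    frames: Dict[str, Any] = field(default_factory=dict)
    params: Dict[str, Any] = field(default_factory=dict)
    filters: Dict[str, Any] = field(default_factory=dict)
    kwargs: Dict[str, Any] = field(default_factory=dict)
    dependencies: Dict[str, Any] = field(default_factory=dict)


# Illustrative config entries; step and method names are made up.
raw_steps = [
    {"name": "efficiency", "method": "convert_eta", "frames": {"eta": "pm_dataeta"}},
    {"name": "costs", "method": "convert_costs", "dependencies": {"eta": "efficiency"}},
]

steps = [StepConfig(**entry) for entry in raw_steps]


def check_order(ordered_steps: List[StepConfig]) -> List[Tuple[str, str]]:
    """Return (step, dependency) pairs where the dependency has not run yet."""
    seen = set()
    problems = []
    for step in ordered_steps:
        # Assumption: dependency values are the names of earlier steps.
        for dependency in step.dependencies.values():
            if dependency not in seen:
                problems.append((step.name, dependency))
        seen.add(step.name)
    return problems
```

With the entries in the order shown, `check_order(steps)` returns an empty list. Reversing the list reports that `costs` needs `efficiency` before it has been produced, which is the ordering constraint described above.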
ETLRunner
Container class to execute ETL steps.
Source code in workflow/scripts/remind_coupling/generic_etl.py
run(step, frames, previous_outputs=None, **kwargs)
staticmethod
Run the ETL step using the provided frames and extra arguments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| step | Transformation | The ETL step to run. | required |
| frames | dict | Dictionary of loaded frames. | required |
| previous_outputs | dict | Dictionary of outputs from previous steps that can be used as inputs. | None |
| **kwargs | | Additional arguments for the ETL method. | {} |

Returns:
| Type | Description |
|---|---|
| pd.DataFrame | The result of the ETL step. |
Source code in workflow/scripts/remind_coupling/generic_etl.py
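The source of `run` is not reproduced on this page, so the following is only a sketch of one way a step runner with this signature could dispatch to a method and pass on previous outputs. The `METHODS` registry, the `register` decorator, `run_step`, and the fallback from `method` to `name` are all assumptions for illustration. Plain lists stand in for DataFrames to keep the example free of dependencies.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Optional

# Hypothetical registry: method name -> callable implementing the ETL method.
METHODS: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable:
    """Register a function under a method name (illustrative helper)."""

    def decorator(func: Callable) -> Callable:
        METHODS[name] = func
        return func

    return decorator


def run_step(step, frames: dict, previous_outputs: Optional[dict] = None, **kwargs):
    """Look up the step's method and call it with frames and earlier outputs."""
    previous_outputs = previous_outputs or {}
    # Assumption: when no method is given, the step name doubles as the method name.
    method_name = step.method or step.name
    if method_name not in METHODS:
        raise ValueError(f"Unknown ETL method: {method_name!r}")
    # Call-time kwargs override the kwargs stored on the step.
    merged_kwargs = {**step.kwargs, **kwargs}
    return METHODS[method_name](frames, previous_outputs, **merged_kwargs)


@register("scale")
def scale(frames, previous_outputs, factor=1.0):
    return [value * factor for value in frames["values"]]


@register("add_previous")
def add_previous(frames, previous_outputs, key="scaled"):
    return [a + b for a, b in zip(frames["values"], previous_outputs[key])]


# Run two steps in order; the second reads the output of the first.
outputs = {}
first = SimpleNamespace(name="scaled", method="scale", kwargs={"factor": 2.0})
outputs[first.name] = run_step(first, {"values": [1.0, 2.0]})
second = SimpleNamespace(name="total", method="add_previous", kwargs={})
outputs[second.name] = run_step(second, {"values": [1.0, 2.0]}, previous_outputs=outputs)
```

Collecting results in a dict keyed by step name is one simple way to make earlier outputs available to later steps. The actual ETLRunner may organise this differently.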
RemindLoader
Load REMIND symbol tables from CSV or GDX files.
Source code in workflow/scripts/remind_coupling/generic_etl.py
auto_load(frames, filters=None)
Automatically load, merge, and filter frames in one step.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frames | dict[str, str] | Dictionary mapping parameter names to REMIND symbol names. | required |
| filters | dict[str, str] | Optional dictionary of filter expressions to apply to frames. | None |
Returns:
| Type | Description |
|---|---|
| dict[str, DataFrame] | Dictionary of processed DataFrames ready for transformation. |
Source code in workflow/scripts/remind_coupling/generic_etl.py
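The syntax of the filter expressions is not specified on this page. The sketch below assumes they are pandas `DataFrame.query` strings keyed by frame name, which is only one plausible reading. `apply_filters` is a hypothetical helper, not part of RemindLoader, and the column names are invented.

```python
from typing import Dict, Optional

import pandas as pd


def apply_filters(
    frames: Dict[str, pd.DataFrame], filters: Optional[Dict[str, str]] = None
) -> Dict[str, pd.DataFrame]:
    """Apply a query-string filter to each frame that has one; pass others through."""
    filters = filters or {}
    return {
        name: df.query(filters[name]) if name in filters else df
        for name, df in frames.items()
    }


# Invented example data; real REMIND symbols have their own columns.
frames = {
    "capacity": pd.DataFrame({"year": [2020, 2030, 2040], "value": [1.0, 2.0, 3.0]}),
    "eta": pd.DataFrame({"tech": ["pv"], "value": [0.2]}),
}
filtered = apply_filters(frames, {"capacity": "year >= 2030"})
```

Here only `capacity` is filtered, keeping the 2030 and 2040 rows. `eta` has no filter entry and is returned unchanged.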
load_frames_csv(frames)
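Load the requested REMIND frames from CSV files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frames | dict | Mapping of parameter name to the REMIND symbol name to read. | required |

Returns:
| Type | Description |
|---|---|
| dict[str, pd.DataFrame] | Dictionary mapping each parameter name to its DataFrame. |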
Source code in workflow/scripts/remind_coupling/generic_etl.py
merge_split_frames(frames)
When several REMIND parameters are needed for one input, group them by their base name and merge them into a single frame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frames | dict | Dictionary with all dataframes. | required |

Example:

    frames = {eta: 'pm_dataeta', eta_part2: 'pm_eta_conv'}
    merge_split_frames(frames)
    >> {eta: pd.concat([pm_dataeta, pm_eta_conv], axis=0).drop_duplicates()}
Source code in workflow/scripts/remind_coupling/generic_etl.py
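The grouping described in the example above can be sketched as follows. This is not the code from generic_etl.py. It assumes that split frames are marked by a `_partN` suffix, inferred from the `eta_part2` key in the example, and the function name and data are illustrative.

```python
import re
from typing import Dict, List

import pandas as pd

# Assumption: split frames carry a "_part<number>" suffix, as in "eta_part2".
PART_SUFFIX = re.compile(r"_part\d+$")


def merge_split_frames_sketch(frames: Dict[str, pd.DataFrame]) -> Dict[str, pd.DataFrame]:
    """Concatenate frames that share a base name and drop duplicate rows."""
    grouped: Dict[str, List[pd.DataFrame]] = {}
    for key, df in frames.items():
        base = PART_SUFFIX.sub("", key)  # "eta_part2" -> "eta"
        grouped.setdefault(base, []).append(df)
    return {
        base: pd.concat(parts, axis=0).drop_duplicates()
        for base, parts in grouped.items()
    }


# Invented data standing in for the loaded REMIND symbols.
pm_dataeta = pd.DataFrame({"tech": ["pv", "wind"], "value": [0.2, 0.4]})
pm_eta_conv = pd.DataFrame({"tech": ["wind", "coal"], "value": [0.4, 0.38]})

merged = merge_split_frames_sketch({"eta": pm_dataeta, "eta_part2": pm_eta_conv})
```

The result has a single `eta` entry. The `wind` row appears in both inputs and is kept once, because `drop_duplicates` compares row values rather than the index.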