New data loading #307

Open: cormacmlynch wants to merge 11 commits into main from new-data-loading-2

Conversation

@cormacmlynch
Collaborator

This pull request restructures how data is stored and read into the model. The main changes are in input_functions.py.

Excel-based master files and per-region, per-variable CSV files have been replaced by a single CSV file for each input variable. This makes data loading faster (no xlsx-to-CSV step and significantly fewer files to read) and gives a more machine-friendly data format that works better with version control. The Polars library is also used in place of Pandas for speed gains.

The new data loading is only implemented for the power, heat, transport, and freight models. Data files for just S0 are included for now. Scenarios work as they did before: a full set of variable CSVs is not required for a scenario, only the files that differ from S0 (e.g., MEWR in a scenario with regulation on coal).
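The scenario fallback described above could be sketched roughly as follows. This is only an illustration, not the code in input_functions.py: the function name `resolve_variable_file`, the directory layout `<inputs_dir>/<scenario>/<model>/<variable>.csv`, and the model folder name used in the usage note are all assumptions.

```python
from pathlib import Path


def resolve_variable_file(inputs_dir, scenario, model, variable, baseline="S0"):
    """Return the CSV path for `variable`, preferring the scenario's own file.

    Hypothetical layout: <inputs_dir>/<scenario>/<model>/<variable>.csv
    """
    inputs_dir = Path(inputs_dir)

    # A scenario only needs to ship the variables that differ from the baseline.
    scenario_file = inputs_dir / scenario / model / f"{variable}.csv"
    if scenario_file.is_file():
        return scenario_file

    # Otherwise fall back to the baseline (S0) copy of the variable.
    baseline_file = inputs_dir / baseline / model / f"{variable}.csv"
    if baseline_file.is_file():
        return baseline_file

    raise FileNotFoundError(
        f"No {variable}.csv for model {model} in scenario {scenario} or {baseline}"
    )
```

With this sketch, a call such as `resolve_variable_file("Inputs", "S1", "FTT-P", "MEWR")` would return the scenario's own MEWR file if it exists and the S0 file otherwise.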

First pass at implementing a new way to store and read in data files.

This method stores variables as CSV files (one for each variable). Users are expected to edit CSV files directly. CSVs are then read into the model, removing the xlsx->csv step. Polars is also used in place of Pandas for speed gains (not clear yet how much the gain in speed is due to this swap).

Also included is a script (convert_inputs_to_new_format.py) which can be used to convert old style files to the new format.

Model output has been tested for Tr, P, Fr, H. Shares results are the same as before.
no longer needed as part of new data loading
Removing some old text left over from prototyping
@Femkemilene
Contributor

@copilot: can you review all the .py files? No need to look at the file changes in the Inputs folder

@Femkemilene Femkemilene marked this pull request as ready for review May 14, 2026 09:12
Agent-Logs-Url: https://github.com/cpmodel/FTT_StandAlone/sessions/c7808ca3-18df-4bbb-b204-6c66dc5ff412

Co-authored-by: Femkemilene <26096675+Femkemilene@users.noreply.github.com>
Contributor

Copilot AI commented May 14, 2026

> @copilot: can you review all the .py files? No need to look at the file changes in the Inputs folder

Reviewed all changed .py files outside Inputs/ and applied two fixes in commit a0b0003: (1) removed sys.path mutation and restored package import in SourceCode/support/convert_masterfiles_to_csv.py, and (2) restored explicit FileNotFoundError in SourceCode/support/dimensions_functions.py when VariableListing.csv is missing. No UI changes were made (so no screenshot applies).
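For context, an explicit missing-file check of the kind mentioned in (2) might look like the sketch below. It is not the code in dimensions_functions.py: the function name `find_variable_listing` and the `utilities_dir` argument are hypothetical, and only the file name VariableListing.csv comes from this thread.

```python
from pathlib import Path


def find_variable_listing(utilities_dir):
    """Return the path to VariableListing.csv, or fail with a clear error."""
    listing = Path(utilities_dir) / "VariableListing.csv"
    if not listing.is_file():
        # Fail early with the resolved location, rather than letting a later
        # read raise a less specific error.
        raise FileNotFoundError(
            f"VariableListing.csv not found in {Path(utilities_dir).resolve()}"
        )
    return listing
```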

Copilot AI requested a review from Femkemilene May 14, 2026 09:18
@Femkemilene
Contributor

@copilot: can you explain what you did here and why? (1) removed sys.path mutation and restored package import in SourceCode/support/convert_masterfiles_to_csv.py

Contributor

Copilot AI commented May 14, 2026

> @copilot: can you explain what you did here and why? (1) removed sys.path mutation and restored package import in SourceCode/support/convert_masterfiles_to_csv.py

In a0b0003, I removed the runtime sys.path edits because they make imports depend on execution context and can accidentally shadow modules. I replaced from titles_functions import load_titles with from SourceCode.support.titles_functions import load_titles so import resolution is explicit, stable, and consistent with package-style imports used elsewhere in the repo. This reduces environment-specific import bugs without changing conversion logic.

@Femkemilene Femkemilene linked an issue May 18, 2026 that may be closed by this pull request
@Femkemilene
Contributor

@copilot: I've recently merged a PR into main. Can you merge main into this branch, so it's easier to assess that everything works together before I start reviewing the PR?

# Conflicts:
#	SourceCode/model_class.py
#	SourceCode/support/input_functions.py

Co-authored-by: Femkemilene <26096675+Femkemilene@users.noreply.github.com>
Contributor

Copilot AI commented May 26, 2026

> @copilot: I've recently merged a PR into main. Can you merge main into this branch, so it's easier to assess that everything works together before I start reviewing the PR?

Done — I merged main into this branch and pushed the merge commit 82ff0c0. There were conflicts in SourceCode/model_class.py and SourceCode/support/input_functions.py, and I resolved them by keeping this branch’s versions in those overlapping sections.

@Femkemilene
Contributor

Femkemilene commented May 26, 2026

@cormacmlynch: In the previous version, we had a lot of error messages for when things went wrong. Data loading was more finicky back then, so maybe this is less urgent now, but I was wondering whether getting rid of this functionality was a conscious choice. Otherwise, it's lovely. I've tidied up slightly and hopefully increased the stability of the GUI (it gave a weird warning that Claude fixed).

I think the types of data errors we need to check for are:

  1. Missing data (?)
  2. Incorrect dimensions (too many or too few rows or columns).

I think that should be fairly clean to check for. My first attempt with Claude led to low-quality vibe code, so I have not committed it. Would you have time to reintroduce these two checks?
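As a starting point, the two checks could be sketched along these lines. This is a dependency-free illustration using the standard csv module rather than Polars, and it makes assumptions that are not confirmed in this thread: that the first row of each variable CSV is a header, that the first column holds row labels, and that the expected dimensions are known from the model's titles. The name `check_variable_table` is hypothetical.

```python
import csv
import io


def check_variable_table(csv_text, expected_rows, expected_cols):
    """Return a list of problems found in one variable's CSV text.

    Assumes the first row is a header and the first column holds row labels,
    so only the remaining cells are treated as data.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]

    problems = []
    data = [row[1:] for row in rows[1:]]  # drop header row and label column

    # Check 2: incorrect dimensions (row count).
    if len(data) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(data)}")

    for i, row in enumerate(data, start=1):
        # Check 2: incorrect dimensions (column count per row).
        if len(row) != expected_cols:
            problems.append(
                f"row {i}: expected {expected_cols} columns, found {len(row)}"
            )
        # Check 1: missing data (empty cells).
        missing = [j for j, cell in enumerate(row, start=1) if cell.strip() == ""]
        if missing:
            problems.append(f"row {i}: missing values in columns {missing}")

    return problems
```

Returning a list of messages instead of raising on the first problem would let the loader report every issue in a file at once; whether that or an immediate exception suits the model better is a design choice left open here.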

Development

Successfully merging this pull request may close these issues.

Speed up import of input files

3 participants