Merged
README.md: 41 additions, 7 deletions
@@ -42,10 +42,10 @@ shift_excel_dates(
     patient_sheet="patients",
     patient_id_col="patient_id",
     sheet_configs=sheet_configs,
-    min_shift_days=-15, # Lower range
-    max_shift_days=15, # Uperr range
-    seed=42, # For reproducibility
-    date_format="YYYY-MM-DD", # Output date format
+    min_shift_days=-15, # Lower range
+    max_shift_days=15, # Upper range
+    seed=42, # For reproducibility
+    date_format="YYYY-MM-DD",
 )
 ```

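The `min_shift_days`/`max_shift_days` bounds and `seed` in the call above control how far each patient's dates move and whether a rerun produces the same offsets. Below is a minimal stdlib sketch of that idea, assuming one integer offset per patient drawn uniformly from the inclusive range and applied to every date of that patient. `draw_shifts` and `shift_date` are hypothetical helpers, not `generate_shift_mappings` or `apply_date_shifts`, and the library's actual drawing rule may differ.

```python
import random
from datetime import date, timedelta


def draw_shifts(patient_ids, min_shift_days=-15, max_shift_days=15, seed=None):
    """Hypothetical helper: map each patient ID to one integer day offset.

    Assumes offsets are drawn uniformly from the inclusive range
    [min_shift_days, max_shift_days]; the library's own rule may differ.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    # Sorting makes the mapping independent of input order for a given seed.
    return {pid: rng.randint(min_shift_days, max_shift_days) for pid in sorted(patient_ids)}


def shift_date(d, patient_id, shifts):
    """Move a date by that patient's offset; all of a patient's dates share it."""
    return d + timedelta(days=shifts[patient_id])


if __name__ == "__main__":
    shifts = draw_shifts(["P001", "P002", "P003"], seed=42)
    admitted = shift_date(date(2023, 3, 1), "P001", shifts)
    discharged = shift_date(date(2023, 3, 9), "P001", shifts)
    print(shifts)
    print((discharged - admitted).days)  # 8: intervals within a patient are unchanged
```

Because every date of a patient moves by the same offset in this sketch, intervals such as length of stay survive the shift while the absolute dates no longer match the source.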
@@ -76,6 +76,26 @@ shift_excel_dates(
 )
 ```

+### Preserving formatting with `shift_excel_dates_inplace`
+
+If your workbook has rich formatting (cell styles, column widths, conditional formatting, etc.), use `shift_excel_dates_inplace` instead. It copies the input file and modifies date cells directly via openpyxl, so all formatting is preserved exactly.
+
+```python
+from nuh_helper import shift_excel_dates_inplace
+
+shift_excel_dates_inplace(
+    input_file="input.xlsx",
+    output_file="output.xlsx",
+    patient_sheet="patients",
+    patient_id_col="patient_id",
+    sheet_configs=sheet_configs,
+    seed=42,
+    linking_table_output="shift_mappings.csv",
+)
+```
+
+The function accepts the same parameters as `shift_excel_dates` except `date_format`, which is not needed because the original cell format is preserved. External links and named ranges are removed from the output to avoid Excel repair dialogs.
+
 ### Key parameters (date shifting)

 - `input_file`: Path to input Excel file
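The in-place call above sets `linking_table_output`, and the key parameters also list `linking_table_path` for reusing an existing table. A rough sketch of that CSV round trip with the standard `csv` module follows. It assumes a two-column layout; the `patient_id`/`shift_days` headers and both helper names are hypothetical, since the actual file layout is not documented here.

```python
import csv
import os
import tempfile


def save_linking_table(shifts, path):
    """Write {patient_id: shift_days} as CSV (hypothetical column names)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["patient_id", "shift_days"])
        for patient_id, days in sorted(shifts.items()):
            writer.writerow([patient_id, days])


def load_linking_table(path):
    """Read the CSV back; IDs stay strings, offsets are converted to int."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["patient_id"]: int(row["shift_days"]) for row in csv.DictReader(f)}


if __name__ == "__main__":
    shifts = {"P001": -7, "P002": 12}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shift_mappings.csv")
        save_linking_table(shifts, path)
        print(load_linking_table(path) == shifts)  # True
```

CSV carries no types, so IDs come back as strings in this sketch. If the patient sheet stores numeric IDs, normalise both sides to `str` before looking up a shift; otherwise an integer ID such as 1001 will not match the string key "1001".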
@@ -85,17 +105,31 @@ shift_excel_dates(
 - `sheet_configs`: Dictionary mapping sheet names to configuration dicts with:
   - `patient_id_col`: Patient ID column name in that sheet
   - `date_columns`: List of date column names to shift
-  - `header_row`: (Optional) Zero-based row index for column names
+  - `header_row`: (Optional) Zero-based row index for the row that contains column names
+  - `skip_rows_after_header`: (Optional) List of zero-based row indices to exclude from data (e.g. a data-type row immediately below the header)
+- `patient_header_row`: (Optional) Zero-based header row for the patient sheet (default: 0). If the patient sheet is in `sheet_configs`, that sheet’s `header_row` is used instead.
+- `patient_skip_rows`: (Optional) Zero-based row indices to exclude from patient data (e.g. a data-type row). If the patient sheet is in `sheet_configs`, that sheet’s `skip_rows_after_header` is used instead.
 - `min_shift_days` / `max_shift_days`: Range of days to shift (default: -15 to 15)
 - `linking_table_path`: (Optional) Path to existing linking table CSV for reproducibility
 - `linking_table_output`: (Optional) Path to save the linking table CSV
 - `seed`: (Optional) Random seed for generating shifts
-- `date_format`: (Optional) Excel date format string (e.g., 'YYYY-MM-DD')
+- `date_format`: (Optional, `shift_excel_dates` only) Excel date format string (e.g., 'YYYY-MM-DD')

+### Excel layout (header row and merged cells)
+
+Sheets can have a non-standard layout: e.g. a merged title row, then a description row, then the actual column names, then a data-type row. Configure as follows:
+
+- Set `header_row` to the **zero-based index of the row that contains the column names** (the row whose names you reference in `patient_id_col` and `date_columns`).
+- Set `skip_rows_after_header` to the indices of any rows **below the header** that should not be treated as data (e.g. a data-type row).
+- **Merged cells**: The library reads the header row via openpyxl and resolves merged cells (value taken from the top-left of each merge), so column names are correct even when the sheet has merged cells. Merged ranges in the description area (rows above the header) are preserved when writing the output.
+
 ### Date shifting features

 - Shifts dates consistently across multiple Excel sheets
-- Preserves Excel structure (description rows)
+- `shift_excel_dates_inplace`: full formatting preservation (cell styles, column widths, conditional formatting, etc.)
+- Preserves Excel structure (description rows and merged cells in that area)
+- Correct header detection with merged cells (openpyxl-based resolution)
+- Optional skip of rows after the header (e.g. data-type row) via `skip_rows_after_header`
 - Supports flexible date parsing (handles various formats and placeholders like "Unknown")
 - Reproducible shifts via linking tables

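The header-row, skip-row and merged-cell rules from the Excel layout section can be illustrated on a plain list-of-rows grid, without Excel. In the sketch below, `header_names` and `read_table` are hypothetical helpers; merged ranges are assumed to be zero-based inclusive `(min_row, min_col, max_row, max_col)` tuples (openpyxl's own ranges are 1-based), and `skip_rows_after_header` is assumed to hold absolute sheet row indices. It shows the idea, not the library's implementation.

```python
def header_names(grid, header_row, merged_ranges=()):
    """Column names for `header_row`, taking merged values from the top-left cell.

    `merged_ranges` holds zero-based, inclusive (min_row, min_col, max_row, max_col)
    tuples, an assumed representation (openpyxl's own ranges are 1-based).
    """
    names = list(grid[header_row])  # copy, so the grid itself is not modified
    for r0, c0, r1, c1 in merged_ranges:
        if r0 <= header_row <= r1:  # this merge covers the header row
            for c in range(c0, c1 + 1):
                names[c] = grid[r0][c0]  # only the top-left cell holds the value
    return names


def read_table(grid, header_row=0, skip_rows_after_header=(), merged_ranges=()):
    """Return (columns, records); rows above the header are never treated as data.

    Assumes skip_rows_after_header holds absolute zero-based sheet row indices.
    """
    columns = header_names(grid, header_row, merged_ranges)
    skip = set(skip_rows_after_header)
    records = [
        dict(zip(columns, row))
        for i, row in enumerate(grid)
        if i > header_row and i not in skip
    ]
    return columns, records


if __name__ == "__main__":
    # Illustrative layout: title, description, header, data-type row, then data.
    grid = [
        ["Patient export", None, None],           # 0: title, merged across all columns
        ["Description", None, "discharge_date"],  # 1: third cell merged down into row 2
        ["patient_id", "admission_date", None],   # 2: header row
        ["text", "date", "date"],                 # 3: data-type row
        ["P001", "2023-03-01", "2023-03-09"],     # 4
        ["P002", "Unknown", "2023-04-02"],        # 5
    ]
    merged = [(0, 0, 0, 2), (1, 2, 2, 2)]
    columns, records = read_table(
        grid, header_row=2, skip_rows_after_header=[3], merged_ranges=merged
    )
    print(columns)  # ['patient_id', 'admission_date', 'discharge_date']
    print(len(records))  # 2
```

The sketch resolves merges only where they intersect the header row, which is what the column names depend on; merges elsewhere in the sheet are left untouched.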
nuh_helper/__init__.py: 2 additions, 0 deletions
@@ -15,11 +15,13 @@
     generate_shift_mappings,
     load_shift_mappings,
     shift_excel_dates,
+    shift_excel_dates_inplace,
 )
 from nuh_helper.profile import generate_scan_report # noqa: E402

 __all__ = [
     "shift_excel_dates",
+    "shift_excel_dates_inplace",
     "apply_date_shifts",
     "generate_shift_mappings",
     "load_shift_mappings",