
Commit cf6bf64

Eli authored and committed
Added strict_priority flag to ts_merge to tile series without interior nan filling.
1 parent 97853ed commit cf6bf64

3 files changed: +294 -177 lines changed
docsrc/notebooks/merge_splice.ipynb

Lines changed: 90 additions & 1 deletion
@@ -8,7 +8,7 @@
     "# Merging and Splicing Time Series\n",
     "This tutorial demonstrates the usage and difference between `ts_merge` and `ts_splice`, two methods for folding together time series into a combined data structure.\n",
     "\n",
-    "- **`ts_merge`** blends multiple time series together based on priority, filling missing values. It potentiallyu uses all the input series at all timestamps.\n",
+    "- **`ts_merge`** blends multiple time series together based on priority, optionally filling missing values in higher priority series with entries from lower priority. It potentially uses all the input series at all timestamps. See the [`strict_priority`](#ts_merge-strict-priority-option) option below for advanced control over nan-filling between priorities.\n",
     "- **`ts_splice`** stitches together time series in sequential time **blocks** without mixing values.\n",
     "\n",
     "We will describe the effect on regularly sampled series (which have the `freq` attribute) and on irregular. We will also explore the **`names`** argument, which controls how columns are selected or renamed in the merging/splicing process. There is a file-level command line tools for this as well in the `dms_datastore` package.\n",
@@ -1012,6 +1012,95 @@
     "\n",
     "This notebook provides a clear comparison to help you decide which method best suits your use case.\n"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# `ts_merge`: strict priority option\n",
+    "**New option**: `strict_priority` (default `False`) enforces that a higher‑priority series dominates between its `first_valid_index` and `last_valid_index`.\n",
+    "\n",
+    "**Semantics**\n",
+    "- Per **column**, define the dominance window as `[first_valid_index, last_valid_index]`.\n",
+    "- Within that window, lower‑priority series are **masked**, even if the higher‑priority value is `NaN`.\n",
+    "- Outside those windows, merging is unchanged and lower priority may contribute.\n",
+    "- With irregular inputs, timestamps that exist **only** in lower‑priority series **and** are fully masked inside a dominance window are dropped; timestamps from the top series' index are preserved even if all‑`NaN`.\n",
+    "\n",
+    "**`names` behavior** is unchanged.\n",
+    "### Example 1 — Series with interior `NaN`\n",
+    "\n",
+    "```python\n",
+    "import numpy as np, pandas as pd\n",
+    "from vtools.functions.merge import ts_merge\n",
+    "\n",
+    "idx1 = pd.date_range(\"2023-01-01\", periods=5, freq=\"D\")\n",
+    "idx2 = pd.date_range(\"2023-01-03\", periods=5, freq=\"D\")\n",
+    "s1 = pd.Series([1, 2, np.nan, 4, 5], index=idx1, name=\"A\")\n",
+    "s2 = pd.Series([10, 20, 30, np.nan, 50], index=idx2, name=\"A\")\n",
+    "\n",
+    "ts_merge((s1, s2)) # default\n",
+    "ts_merge((s1, s2), strict_priority=True)\n",
+    "```\n",
+    "### Example 2 — Two columns, per‑column dominance\n",
+    "\n",
+    "```python\n",
+    "idx1 = pd.date_range(\"2023-01-01\", periods=5, freq=\"D\")\n",
+    "idx2 = pd.date_range(\"2023-01-03\", periods=5, freq=\"D\")\n",
+    "df1 = pd.DataFrame({\"A\":[1., np.nan, 3., 4., 5.]}, index=idx1)\n",
+    "df1[\"B\"] = df1[\"A\"]\n",
+    "df1.loc[idx1[2], \"B\"] = np.nan # interior NaN in high‑priority B\n",
+    "df2 = pd.DataFrame({\"A\":[10., 20., np.nan, 40., 50.]}, index=idx2)\n",
+    "df2[\"B\"] = df2[\"A\"]\n",
+    "\n",
+    "ts_merge((df1, df2), strict_priority=True)[[\"A\",\"B\"]]\n",
+    "```\n",
+    "### Example 3 — Irregular inputs\n",
+    "\n",
+    "```python\n",
+    "idx1 = pd.to_datetime([\"2023-01-01\",\"2023-01-03\",\"2023-01-07\",\"2023-01-10\"])\n",
+    "idx2 = pd.to_datetime([\"2023-01-02\",\"2023-01-04\",\"2023-01-08\",\"2023-01-11\"])\n",
+    "s1 = pd.Series([1.,2.,3.,4.], index=idx1, name=\"A\")\n",
+    "s2 = pd.Series([10.,20.,30.,40.], index=idx2, name=\"A\")\n",
+    "\n",
+    "ts_merge((s1, s2), strict_priority=True)\n",
+    "```\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np, pandas as pd\n",
+    "from vtools.functions.merge import ts_merge\n",
+    "\n",
+    "# Example 1\n",
+    "idx1 = pd.date_range(\"2023-01-01\", periods=5, freq=\"D\")\n",
+    "idx2 = pd.date_range(\"2023-01-03\", periods=5, freq=\"D\")\n",
+    "s1 = pd.Series([1, 2, np.nan, 4, 5], index=idx1, name=\"A\")\n",
+    "s2 = pd.Series([10, 20, 30, np.nan, 50], index=idx2, name=\"A\")\n",
+    "print(\"Example 1 strict=False:\")\n",
+    "print(ts_merge((s1, s2)))\n",
+    "print(\"Example 1 strict=True:\")\n",
+    "print(ts_merge((s1, s2), strict_priority=True))\n",
+    "\n",
+    "# Example 2\n",
+    "df1 = pd.DataFrame({\"A\":[1., np.nan, 3., 4., 5.]}, index=idx1)\n",
+    "df1[\"B\"] = df1[\"A\"]; df1.loc[idx1[2], \"B\"] = np.nan\n",
+    "df2 = pd.DataFrame({\"A\":[10., 20., np.nan, 40., 50.]}, index=idx2)\n",
+    "df2[\"B\"] = df2[\"A\"]\n",
+    "print(\"\\nExample 2 strict=True:\")\n",
+    "print(ts_merge((df1, df2), strict_priority=True)[[\"A\",\"B\"]])\n",
+    "\n",
+    "# Example 3\n",
+    "idx1i = pd.to_datetime([\"2023-01-01\",\"2023-01-03\",\"2023-01-07\",\"2023-01-10\"])\n",
+    "idx2i = pd.to_datetime([\"2023-01-02\",\"2023-01-04\",\"2023-01-08\",\"2023-01-11\"])\n",
+    "s1i = pd.Series([1.,2.,3.,4.], index=idx1i, name=\"A\")\n",
+    "s2i = pd.Series([10.,20.,30.,40.], index=idx2i, name=\"A\")\n",
+    "print(\"\\nExample 3 strict=True:\")\n",
+    "print(ts_merge((s1i, s2i), strict_priority=True))\n"
+   ]
   }
  ],
  "metadata": {
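
The per-column dominance windows described in the new notebook cell can be inspected with plain pandas before calling `ts_merge`. The sketch below is illustrative only: `dominance_windows` and `interior_nans` are hypothetical helper names, not part of `vtools`, and the data is the `df1` frame from Example 2. It shows which `NaN` values lie inside a column's `[first_valid_index, last_valid_index]` window and would therefore stay `NaN` under `strict_priority=True`.

```python
import numpy as np
import pandas as pd


def dominance_windows(df: pd.DataFrame) -> dict:
    """Return {column: (first_valid_index, last_valid_index)} for each column.

    Hypothetical helper for illustration; not part of vtools.
    """
    return {
        col: (df[col].first_valid_index(), df[col].last_valid_index())
        for col in df.columns
    }


def interior_nans(df: pd.DataFrame) -> dict:
    """Return {column: timestamps that are NaN inside that column's window}."""
    out = {}
    for col, (start, end) in dominance_windows(df).items():
        if start is None:
            # A column with no valid data has no dominance window at all.
            out[col] = df.index[:0]
            continue
        inside = (df.index >= start) & (df.index <= end)
        out[col] = df.index[inside & df[col].isna().to_numpy()]
    return out


# df1 from Example 2: the higher-priority frame with interior NaNs
idx1 = pd.date_range("2023-01-01", periods=5, freq="D")
df1 = pd.DataFrame({"A": [1.0, np.nan, 3.0, 4.0, 5.0]}, index=idx1)
df1["B"] = df1["A"]
df1.loc[idx1[2], "B"] = np.nan

windows = dominance_windows(df1)
gaps = interior_nans(df1)
print(windows)
print(gaps)
```

Both columns share the window 2023-01-01 to 2023-01-05 here, but column `B` has one more interior gap than column `A`, which is why the windows are evaluated per column rather than per frame.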

tests/test_merge_splice.py

Lines changed: 39 additions & 0 deletions
@@ -301,3 +301,42 @@ def test_non_datetime_index(self):
         df2 = pd.DataFrame({"A": [4, 5, 6]}, index=[2, 3, 4])
         with pytest.raises(ValueError, match="All input series must have a DatetimeIndex."):
             ts_merge((df1, df2))
+
+
+# ----------------------------------------------------------------------
+# Additional tests for strict_priority behavior in ts_merge
+# ----------------------------------------------------------------------
+
+def test_ts_merge_strict_priority_series_window(sample_data):
+    s1, s2 = sample_data["series1"], sample_data["series2"]
+    # s1 dominates Jan1..Jan5; its NaN on Jan3 remains NaN; s2 cannot fill it.
+    result = ts_merge((s1, s2), strict_priority=True)
+    expected_index = s1.index.union(s2.index, sort=False).sort_values()
+    expected = pd.Series([1., 2., np.nan, 4., 5., np.nan, 50.], index=expected_index, name="A")
+    pd.testing.assert_series_equal(result, expected)
+
+def test_ts_merge_strict_priority_dataframe_per_column(sample_data):
+    df1, df2 = sample_data["df1"], sample_data["df2"]
+    # Create multi-column frames to exercise per-column dominance
+    df1m = pd.concat([df1, df1.rename(columns={"A": "B"})], axis=1)
+    df2m = pd.concat([sample_data["df2"], sample_data["df2"].rename(columns={"A": "B"})], axis=1)
+    # Insert an interior NaN in higher-priority B column to ensure NaN is not backfilled
+    df1m.loc[df1m.index[2], "B"] = np.nan
+    result = ts_merge((df1m, df2m), strict_priority=True)
+    expected_index = df1m.index.union(df2m.index, sort=False).sort_values()
+    exp = pd.DataFrame(index=expected_index, columns=["A", "B"], dtype=float)
+    # Column A: df1 covers first window fully; df2 only contributes after the window
+    exp["A"] = [1., np.nan, 3., 4., 5., 40., 50.]
+    # Column B: an interior NaN in df1's window must remain NaN
+    exp["B"] = [1., np.nan, np.nan, 4., 5., 40., 50.]
+    pd.testing.assert_frame_equal(result[["A", "B"]], exp)
+
+def test_ts_merge_strict_priority_irregular(irregular_sample_data):
+    s1 = irregular_sample_data["series1"]
+    s2 = irregular_sample_data["series2"]
+    # s1 window [first_valid, last_valid] excludes s2 within; s2 contributes only after.
+    result = ts_merge((s1, s2), strict_priority=True)
+    expected = pd.Series([1., 2., 3., 4., 40.],
+                         index=pd.to_datetime(["2023-01-01","2023-01-03","2023-01-07","2023-01-10","2023-01-11"]),
+                         name="A")
+    pd.testing.assert_series_equal(result, expected)
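
For readers without the `ts_merge` source at hand (it is not part of the hunks shown here), the two-series behavior these tests pin down can be approximated in plain pandas. This is a sketch under stated assumptions, not the `vtools` implementation: `strict_priority_sketch` is a hypothetical name, it handles exactly two `Series` with sorted `DatetimeIndex`es, and the inputs are taken from notebook Examples 1 and 3 on the assumption that the `sample_data` and `irregular_sample_data` fixtures hold the same values.

```python
import numpy as np
import pandas as pd


def strict_priority_sketch(high: pd.Series, low: pd.Series) -> pd.Series:
    """Two-series model of the dominance-window rule (illustrative only)."""
    start, end = high.first_valid_index(), high.last_valid_index()
    if start is None:
        # No valid data in the high-priority series: nothing to dominate.
        return low.copy()

    # Drop every low-priority timestamp that falls inside the window, so it
    # can neither fill an interior NaN nor add a new timestamp there.
    in_window = (low.index >= start) & (low.index <= end)
    low_outside = low[~in_window]

    # Keep all high-priority timestamps, plus low-priority ones outside the window.
    index = high.index.union(low_outside.index)
    merged = high.reindex(index)

    # Outside the window the usual priority fill applies.
    return merged.fillna(low_outside.reindex(index))


# Regular daily data (notebook Example 1)
s1 = pd.Series([1, 2, np.nan, 4, 5],
               index=pd.date_range("2023-01-01", periods=5, freq="D"), name="A")
s2 = pd.Series([10, 20, 30, np.nan, 50],
               index=pd.date_range("2023-01-03", periods=5, freq="D"), name="A")
regular = strict_priority_sketch(s1, s2)

# Irregular data (notebook Example 3)
s1i = pd.Series([1., 2., 3., 4.],
                index=pd.to_datetime(["2023-01-01", "2023-01-03", "2023-01-07", "2023-01-10"]),
                name="A")
s2i = pd.Series([10., 20., 30., 40.],
                index=pd.to_datetime(["2023-01-02", "2023-01-04", "2023-01-08", "2023-01-11"]),
                name="A")
irregular = strict_priority_sketch(s1i, s2i)

print(regular)
print(irregular)
```

In the regular case the gap on 2023-01-03 stays `NaN` even though `s2` has a value there, and in the irregular case the `s2` timestamps on 2023-01-02, 2023-01-04, and 2023-01-08 disappear from the result, which matches the expectations encoded in the tests.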
