|
8 | 8 | "# Merging and Splicing Time Series\n", |
9 | 9 | "This tutorial demonstrates the usage of, and the differences between, `ts_merge` and `ts_splice`, two methods for folding together time series into a combined data structure.\n", |
10 | 10 | "\n", |
11 | | - "- **`ts_merge`** blends multiple time series together based on priority, filling missing values. It potentiallyu uses all the input series at all timestamps.\n", |
| 11 | + "- **`ts_merge`** blends multiple time series together based on priority, optionally filling missing values in higher-priority series with entries from lower-priority series. It potentially uses all the input series at all timestamps. See the [`strict_priority`](#ts_merge-strict-priority-option) option below for advanced control over NaN-filling between priorities.\n", |
12 | 12 | "- **`ts_splice`** stitches together time series in sequential time **blocks** without mixing values.\n", |
13 | 13 | "\n", |
14 | 14 | "We will describe the effect on regularly sampled series (which have the `freq` attribute) and on irregular series. We will also explore the **`names`** argument, which controls how columns are selected or renamed in the merging/splicing process. File-level command line tools for this are also available in the `dms_datastore` package.\n", |
|
1012 | 1012 | "\n", |
1013 | 1013 | "This notebook provides a clear comparison to help you decide which method best suits your use case.\n" |
1014 | 1014 | ] |
| 1015 | + }, |
| 1016 | + { |
| 1017 | + "cell_type": "markdown", |
| 1018 | + "metadata": {}, |
| 1019 | + "source": [ |
| 1020 | + "# `ts_merge`: strict priority option\n", |
| 1021 | + "**New option**: `strict_priority` (default `False`) enforces that a higher‑priority series dominates between its `first_valid_index` and `last_valid_index`.\n", |
| 1022 | + "\n", |
| 1023 | + "**Semantics**\n", |
| 1024 | + "- Per **column**, define the dominance window as `[first_valid_index, last_valid_index]`.\n", |
| 1025 | + "- Within that window, lower‑priority series are **masked**, even if the higher‑priority value is `NaN`.\n", |
| 1026 | + "- Outside those windows, merging is unchanged and lower priority may contribute.\n", |
| 1027 | + "- With irregular inputs, timestamps that exist **only** in lower‑priority series **and** are fully masked inside a dominance window are dropped; timestamps from the top series' index are preserved even if all‑`NaN`.\n", |
| 1028 | + "\n", |
| 1029 | + "**`names` behavior** is unchanged.\n", |
| 1030 | + "### Example 1 — Series with interior `NaN`\n", |
| 1031 | + "\n", |
| 1032 | + "```python\n", |
| 1033 | + "import numpy as np, pandas as pd\n", |
| 1034 | + "from vtools.functions.merge import ts_merge\n", |
| 1035 | + "\n", |
| 1036 | + "idx1 = pd.date_range(\"2023-01-01\", periods=5, freq=\"D\")\n", |
| 1037 | + "idx2 = pd.date_range(\"2023-01-03\", periods=5, freq=\"D\")\n", |
| 1038 | + "s1 = pd.Series([1, 2, np.nan, 4, 5], index=idx1, name=\"A\")\n", |
| 1039 | + "s2 = pd.Series([10, 20, 30, np.nan, 50], index=idx2, name=\"A\")\n", |
| 1040 | + "\n", |
| 1041 | + "ts_merge((s1, s2)) # default\n", |
| 1042 | + "ts_merge((s1, s2), strict_priority=True)\n", |
| 1043 | + "```\n", |
| 1044 | + "### Example 2 — Two columns, per‑column dominance\n", |
| 1045 | + "\n", |
| 1046 | + "```python\n", |
| 1047 | + "idx1 = pd.date_range(\"2023-01-01\", periods=5, freq=\"D\")\n", |
| 1048 | + "idx2 = pd.date_range(\"2023-01-03\", periods=5, freq=\"D\")\n", |
| 1049 | + "df1 = pd.DataFrame({\"A\":[1., np.nan, 3., 4., 5.]}, index=idx1)\n", |
| 1050 | + "df1[\"B\"] = df1[\"A\"]\n", |
| 1051 | + "df1.loc[idx1[2], \"B\"] = np.nan # interior NaN in high‑priority B\n", |
| 1052 | + "df2 = pd.DataFrame({\"A\":[10., 20., np.nan, 40., 50.]}, index=idx2)\n", |
| 1053 | + "df2[\"B\"] = df2[\"A\"]\n", |
| 1054 | + "\n", |
| 1055 | + "ts_merge((df1, df2), strict_priority=True)[[\"A\",\"B\"]]\n", |
| 1056 | + "```\n", |
| 1057 | + "### Example 3 — Irregular inputs\n", |
| 1058 | + "\n", |
| 1059 | + "```python\n", |
| 1060 | + "idx1 = pd.to_datetime([\"2023-01-01\",\"2023-01-03\",\"2023-01-07\",\"2023-01-10\"])\n", |
| 1061 | + "idx2 = pd.to_datetime([\"2023-01-02\",\"2023-01-04\",\"2023-01-08\",\"2023-01-11\"])\n", |
| 1062 | + "s1 = pd.Series([1.,2.,3.,4.], index=idx1, name=\"A\")\n", |
| 1063 | + "s2 = pd.Series([10.,20.,30.,40.], index=idx2, name=\"A\")\n", |
| 1064 | + "\n", |
| 1065 | + "ts_merge((s1, s2), strict_priority=True)\n", |
| 1066 | + "```\n" |
| 1067 | + ] |
| 1068 | + }, |
| 1069 | + { |
| 1070 | + "cell_type": "code", |
| 1071 | + "execution_count": null, |
| 1072 | + "metadata": {}, |
| 1073 | + "outputs": [], |
| 1074 | + "source": [ |
| 1075 | + "import numpy as np, pandas as pd\n", |
| 1076 | + "from vtools.functions.merge import ts_merge\n", |
| 1077 | + "\n", |
| 1078 | + "# Example 1\n", |
| 1079 | + "idx1 = pd.date_range(\"2023-01-01\", periods=5, freq=\"D\")\n", |
| 1080 | + "idx2 = pd.date_range(\"2023-01-03\", periods=5, freq=\"D\")\n", |
| 1081 | + "s1 = pd.Series([1, 2, np.nan, 4, 5], index=idx1, name=\"A\")\n", |
| 1082 | + "s2 = pd.Series([10, 20, 30, np.nan, 50], index=idx2, name=\"A\")\n", |
| 1083 | + "print(\"Example 1 strict=False:\")\n", |
| 1084 | + "print(ts_merge((s1, s2)))\n", |
| 1085 | + "print(\"Example 1 strict=True:\")\n", |
| 1086 | + "print(ts_merge((s1, s2), strict_priority=True))\n", |
| 1087 | + "\n", |
| 1088 | + "# Example 2\n", |
| 1089 | + "df1 = pd.DataFrame({\"A\":[1., np.nan, 3., 4., 5.]}, index=idx1)\n", |
| 1090 | + "df1[\"B\"] = df1[\"A\"]; df1.loc[idx1[2], \"B\"] = np.nan\n", |
| 1091 | + "df2 = pd.DataFrame({\"A\":[10., 20., np.nan, 40., 50.]}, index=idx2)\n", |
| 1092 | + "df2[\"B\"] = df2[\"A\"]\n", |
| 1093 | + "print(\"\\nExample 2 strict=True:\")\n", |
| 1094 | + "print(ts_merge((df1, df2), strict_priority=True)[[\"A\",\"B\"]])\n", |
| 1095 | + "\n", |
| 1096 | + "# Example 3\n", |
| 1097 | + "idx1i = pd.to_datetime([\"2023-01-01\",\"2023-01-03\",\"2023-01-07\",\"2023-01-10\"])\n", |
| 1098 | + "idx2i = pd.to_datetime([\"2023-01-02\",\"2023-01-04\",\"2023-01-08\",\"2023-01-11\"])\n", |
| 1099 | + "s1i = pd.Series([1.,2.,3.,4.], index=idx1i, name=\"A\")\n", |
| 1100 | + "s2i = pd.Series([10.,20.,30.,40.], index=idx2i, name=\"A\")\n", |
| 1101 | + "print(\"\\nExample 3 strict=True:\")\n", |
| 1102 | + "print(ts_merge((s1i, s2i), strict_priority=True))\n" |
| 1103 | + ] |
1015 | 1104 | } |
1016 | 1105 | ], |
1017 | 1106 | "metadata": { |
|
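The dominance-window semantics described in the new markdown cell can be illustrated without `vtools`. The sketch below is a plain-pandas approximation for two single-column series only. The helper name `merge_two` is hypothetical, it is not the `ts_merge` implementation, and it does not reproduce the rule for dropping lower-priority timestamps on irregular inputs. It assumes the window is inclusive at both ends, as the `[first_valid_index, last_valid_index]` notation suggests.

```python
import numpy as np
import pandas as pd


def merge_two(high, low, strict_priority=False):
    """Hypothetical two-series priority merge (illustration only, not vtools)."""
    # Work on the union of both indexes so every timestamp is represented.
    index = high.index.union(low.index)
    hi = high.reindex(index)
    lo = low.reindex(index)
    if strict_priority:
        # Dominance window of the higher-priority series (assumed inclusive).
        first = high.first_valid_index()
        last = high.last_valid_index()
        if first is not None:
            in_window = (index >= first) & (index <= last)
            # Mask lower-priority values inside the window, even where
            # the higher-priority value is NaN.
            lo = lo.mask(in_window)
    # Fill remaining gaps in the higher-priority series from the lower one.
    return hi.combine_first(lo)


# Same inputs as Example 1 in the notebook cell.
idx1 = pd.date_range("2023-01-01", periods=5, freq="D")
idx2 = pd.date_range("2023-01-03", periods=5, freq="D")
s1 = pd.Series([1, 2, np.nan, 4, 5], index=idx1, name="A")
s2 = pd.Series([10, 20, 30, np.nan, 50], index=idx2, name="A")

default = merge_two(s1, s2)                       # 1, 2, 10, 4, 5, NaN, 50
strict = merge_two(s1, s2, strict_priority=True)  # 1, 2, NaN, 4, 5, NaN, 50
print(default)
print(strict)
```

In this sketch the interior `NaN` on 2023-01-03 is filled from `s2` by default but stays `NaN` under `strict_priority=True`, while 2023-01-07 lies outside the window of `s1` and is filled from `s2` in both cases.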