Commit 987fe18
feat!(ingest): Add host field for INSDC sequences during ingest (#6534)
resolves #6533
#6295
This PR modifies ingest so that host organism information is processed
uniformly for INSDC-ingested sequences and direct submissions. As a
result, the INSDC-specific behaviour in preprocessing is no longer needed.
It will be removed in a separate PR once this ingest change is known to
be correct and stable.
Concretely, ingest now collapses submissions with `hostTaxonId` and/or
`hostNameScientific` into a single `host` field (removing `hostTaxonId`
and `hostNameScientific`). The `hostNameCommon` field is also removed.
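
As a rough illustration of the collapse described above, here is a minimal Python sketch. It is not the ingest code from this PR: the function name `collapse_host_fields` and the constant `LEGACY_HOST_FIELDS` are hypothetical, and the precedence (taxon ID first, then scientific name, with empty strings treated as missing) is an assumption taken from the `COALESCE`/`NULLIF` logic in the SQL command in this description.

```python
from typing import Any

# Hypothetical constant: the three fields that no longer exist after the collapse.
LEGACY_HOST_FIELDS = ("hostTaxonId", "hostNameScientific", "hostNameCommon")


def collapse_host_fields(metadata: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `metadata` with the legacy host fields folded into `host`."""
    # Drop all three legacy fields, keep everything else untouched.
    collapsed = {k: v for k, v in metadata.items() if k not in LEGACY_HOST_FIELDS}

    # Assumed precedence: taxon ID first, then scientific name.
    # Empty strings and None count as "no value".
    for key in ("hostTaxonId", "hostNameScientific"):
        value = metadata.get(key)
        if value not in (None, ""):
            collapsed["host"] = str(value)
            break

    return collapsed
```

With this sketch, a record carrying only `hostNameCommon` ends up with no `host` key at all, which matches the `ELSE '{}'::jsonb` branch of the SQL command.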
## Breaking changes
Not strictly breaking, but when rolling out this PR, this DB surgery
needs to be run to make sure existing INSDC submissions fit the new
model:
<details>
<summary>An old command with a WHERE clause that was too strict
</summary>
The command to run:
```sql
UPDATE sequence_entries
SET unprocessed_data = jsonb_set(
    unprocessed_data,
    '{metadata}',
    ((unprocessed_data -> 'metadata') - 'hostTaxonId' - 'hostNameScientific' - 'hostNameCommon')
    || CASE
        WHEN COALESCE(
            NULLIF(unprocessed_data -> 'metadata' ->> 'hostTaxonId', ''),
            NULLIF(unprocessed_data -> 'metadata' ->> 'hostNameScientific', '')
        ) IS NOT NULL
        THEN jsonb_build_object('host', COALESCE(
            NULLIF(unprocessed_data -> 'metadata' ->> 'hostTaxonId', ''),
            NULLIF(unprocessed_data -> 'metadata' ->> 'hostNameScientific', '')))
        ELSE '{}'::jsonb
    END
)
WHERE unprocessed_data IS NOT NULL
  AND (unprocessed_data -> 'metadata' ? 'hostTaxonId'
       OR unprocessed_data -> 'metadata' ? 'hostNameScientific'
       OR unprocessed_data -> 'metadata' ? 'hostNameCommon')
  AND NOT (unprocessed_data -> 'metadata' ? 'host');
```
</details>
```sql
UPDATE sequence_entries
SET unprocessed_data = jsonb_set(
    unprocessed_data,
    '{metadata}',
    ((unprocessed_data -> 'metadata') - 'hostTaxonId' - 'hostNameScientific' - 'hostNameCommon')
    || CASE
        WHEN COALESCE(
            NULLIF(unprocessed_data -> 'metadata' ->> 'hostTaxonId', ''),
            NULLIF(unprocessed_data -> 'metadata' ->> 'hostNameScientific', '')
        ) IS NOT NULL
        THEN jsonb_build_object('host', COALESCE(
            NULLIF(unprocessed_data -> 'metadata' ->> 'hostTaxonId', ''),
            NULLIF(unprocessed_data -> 'metadata' ->> 'hostNameScientific', '')))
        ELSE '{}'::jsonb
    END
)
WHERE unprocessed_data IS NOT NULL
  AND (unprocessed_data -> 'metadata' ? 'hostTaxonId'
       OR unprocessed_data -> 'metadata' ? 'hostNameScientific'
       OR unprocessed_data -> 'metadata' ? 'hostNameCommon')
  AND NULLIF(unprocessed_data -> 'metadata' ->> 'host', '') IS NULL;
```
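
The two commands differ only in the last `WHERE` condition. The Python sketch below models both filters on a plain metadata dict, to show which rows the older command would have skipped. The function names are hypothetical, and the `unprocessed_data IS NOT NULL` check is left out for brevity.

```python
from typing import Any

LEGACY_HOST_FIELDS = ("hostTaxonId", "hostNameScientific", "hostNameCommon")


def has_legacy_field(metadata: dict[str, Any]) -> bool:
    # Mirrors the three `?` key-existence checks joined by OR.
    return any(key in metadata for key in LEGACY_HOST_FIELDS)


def selected_by_old_where(metadata: dict[str, Any]) -> bool:
    # Old filter: AND NOT (metadata ? 'host')
    # Any existing `host` key excludes the row, even if its value is empty.
    return has_legacy_field(metadata) and "host" not in metadata


def selected_by_new_where(metadata: dict[str, Any]) -> bool:
    # New filter: AND NULLIF(metadata ->> 'host', '') IS NULL
    # A missing, null, or empty-string `host` still selects the row.
    return has_legacy_field(metadata) and metadata.get("host") in (None, "")


# A row with an empty `host` is skipped by the old filter but updated by the new one.
row = {"hostTaxonId": "9606", "host": ""}
print(selected_by_old_where(row), selected_by_new_where(row))
```

Rows that already carry a non-empty `host` are left alone by both filters, so the update does not overwrite existing host values.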
### PR Checklist
- [x] All necessary documentation has been adapted.
- [x] The implemented feature is covered by appropriate, automated
tests.
- [x] Any manual testing that has been done is documented (i.e. what
exactly was tested?)
---------
Co-authored-by: anna-parker <50943381+anna-parker@users.noreply.github.com>
Co-authored-by: GitHub Action <action@github.com>

1 parent cfe4f07 commit 987fe18
5 files changed
Lines changed: 39 additions & 9 deletions
File tree
- ingest
- scripts
- tests/expected_output_cchf