feat(python-notebook-migration): add database tables for Notebook Migration tool#5055
feat(python-notebook-migration): add database tables for Notebook Migration tool#5055zyratlo wants to merge 7 commits into
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5055 +/- ##
============================================
- Coverage 47.13% 46.02% -1.11%
+ Complexity 2344 2342 -2
============================================
Files 1042 1046 +4
Lines 39989 40037 +48
Branches 4260 4262 +2
============================================
- Hits 18849 18429 -420
- Misses 20015 20491 +476
+ Partials 1125 1117 -8
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Please check this pr: #4401, for database changes |
I have addressed this issue |
| DROP TABLE IF EXISTS notebook CASCADE; | ||
| DROP TABLE IF EXISTS workflow_notebook_mapping CASCADE; |
There was a problem hiding this comment.
Re-run safety: the DROP TABLE IF EXISTS notebook CASCADE + DROP TABLE IF EXISTS workflow_notebook_mapping CASCADE at the top of the migration silently destroys data if this file is ever applied a second time (manual psql -f, a DB reset that forgets the Liquibase DATABASECHANGELOG, or a future edit that changes the MD5 and prompts a re-run). The other update files (21.sql, 22.sql, etc.) are pure ALTER / CREATE and don't carry this footgun. Could we drop the DROPs and use CREATE TABLE IF NOT EXISTS for the two new tables instead?
| CREATE TABLE IF NOT EXISTS notebook | ||
| ( | ||
| nid SERIAL NOT NULL PRIMARY KEY, | ||
| wid INT NOT NULL, | ||
| notebook JSONB NOT NULL, | ||
| FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE | ||
| ); |
There was a problem hiding this comment.
Cardinality question: nothing in the schema prevents inserting two notebook rows for the same wid, but the parent issue (#4301, demo #5 "when the user reopens a workflow that was generated from a notebook, it will also reopen the notebook") reads like a 1:1 relationship. If a workflow is supposed to have at most one notebook, would a UNIQUE (wid) on notebook (or making wid the PK in place of nid) be safer than relying on application code to enforce it?
| CREATE TABLE IF NOT EXISTS notebook | ||
| ( | ||
| nid SERIAL NOT NULL PRIMARY KEY, | ||
| wid INT NOT NULL, | ||
| notebook JSONB NOT NULL, | ||
| FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE | ||
| ); |
There was a problem hiding this comment.
The main read pattern for notebook seems to be "given a workflow, find its notebook" (e.g., reopening a workflow → load its notebook). Postgres doesn't auto-create an index on FK columns, so this lookup would currently sequential-scan the table. Worth a CREATE INDEX ON notebook(wid)? (If a UNIQUE(wid) is added per the previous comment, that already creates an index and this is moot.)
What changes were proposed in this PR?
This PR adds two new tables to the database,
notebookandworkflow_notebook_mapping. These tables will be used in the new Python Notebook Migration tool to store the user-uploaded notebook and the generated mapping between the notebook and workflow.Any related issues, documentation, discussions?
Closes #5054
The parent issue is #4301
Schema
How was this PR tested?
New tables were manually confirmed to be created successfully and usable.
Was this PR authored or co-authored using generative AI tooling?
No