Commit e334fb1

feliciaschulz, enryH, github-actions[bot], and Copilot authored
Drift correction (#58)
* Adding metabolomics filtering feature
* Metabolomics example data set
* Adding test for metabolomics filtering function
* Filter functions added.
* Updating and cleaning up filtering docstrings
* Filtering example api jupyter notebook added.
* Changed filtering function name.
* ✨ Feat: Add script and docs for QC-based LOESS drift correction module
* ✏️ Fix: prevent division by zero, edit documentation
* Adding metabolomics example data for drift correction function testing
* Editing example aradopsis metabolomics data (removing faulty row).
* 📝 Adding api example notebook for drift correction
* Adding jupytext file for drift correction api example.
* ✨ Start collecting dataframe schemas for input and output DataFrames (#29)
* 🚧 start collecting dataframe schemas with enrichment analysis result
* ✨ ensure valid numeric omics data functionality
* 🚧 add schema validation to examples - can be added to function call using `pa.check_types`
* 🐛 add missing dependency
* 🐛 bool and boolean with NANs are not the same for pandera.
* 🐛 fix tests (convert dtypes expect bool)
* 🚧 Add and align differential analysis schema - ANOVA descriptive statistis seems to have errors - should missing values for p-values be allowed?
* 🐛 leave strings as they are
* 🚧 build schema on the fly for numeric data - could be used to give specific feedback on user provided columns
* 🎨 add pandera to mapped types of other libraries
* 🚧 add Schemas to functions - anova has two return types depending on number of groups...
* 🚧 add two schemas to anova function Schemas should be unified eventually
* 📝 all commands for local execution of docs ... as integration test
* 🐛 remove testing shell script to find errors in action runs (for automation)
* ✨ construct types for exploratory analysis - create a separate PR to refactor exploratory analysis module
* 🎨 add Sebastians hint on BaseModel visualization
* 🎨 clean-up types - formatting was not done properly in docs - not used as a composite type as of now
* ✨ custom PR template for new module (#39)
* ✨ custom PR template for new module
* 🎨 highlight only folder name
* 📝 propose to use built-in virtual environment
* 📝 some more design hints
* 📝 any docstring type for functions is fine
* 🎨 remove intermediate heading
* 📝 add hint on using example data for api examples
* Update .github/workflows/PULL_REQUEST_TEMPLATE/module.md
---------
Co-authored-by: feliciaschulz <112621625+feliciaschulz@users.noreply.github.com>
* 🔧 Update CI (#45)
* 🔧 remove constraint on numpy as latest inmoose does nto require it anymore
* 🔧 align github actions to python package template ... as for other librarier
* 🎨 format toml with even better toml
* 🔧 isort and ruff configuration - turn ruff configuration on in a separate PR
* 🎨 format src folder
* 🎨 apply ruff check autofixes
* 🔥 remove tox artefacts
* 🚚 rename actions yaml file
* 🎨 isort tests
* 🚚 separate batch correction from normalization (#46)
* 🚚 separate batch correction from normalization - fix Move batch correction outside Normalisation 💄 Fixes #22
* 🐛 sync normalization nb and update test imports
* 🎨 format and remove unused import
* 🐛 sync nb
* 🐛 add batch correction to website index
* 🎨 link module api
* Documentation updates (#47)
* 🎨 move from rst to md
* 🚧 Close clean up conf.py Fixes #21 - remove duplication - sort code as in other packages - double check if additional entries wer needed
* 🎨 consistent scaling and naming (#48)
* 🎨 consistent scaling and naming - apply standardization before PCA (leads to minor change in axis) - talk about batch correction, not normalization
* 🎨 shorten titles
* 🎨 order more meaningfully
* 📝 add Contributing.md as reference to PR template (#49)
* 📝 add Contributing.md as reference to PR template
* 🐛 make it an url
* Metabolomics example data (#50)
* 🚧 process metabolomics example data
* 🎨 format
* ✨ tested data with ANOVA, finish processing of data
* Adding function for visualising example of the LOESS drift correction.
* Adding functions for alternative drift correction method: CPCA
* Adding CPCA drift correction to api examples and refining code.
* Adding example data for metabolomics
* Creating jupytext file for drift correction api example
* 🎨 format tutorials and code (black and isort)
* 🎨 apply ruff suggestions
* docs: update markdown reference
* 🎨 Updating index.md to add drift correction api example.
* 🎨 put drift correction plotting functions from src to api examples
* 🎨 implement logger in drift correction functions to replace print statements
* remove excel lock file
* adding synced jupytext drift_correction.py
* Changing import order for ruff checks
* docs: update markdown reference
* 🎨 removing unused imports
  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* 🎨 Typo in documentation
  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* 🎨 Typo in documentation
  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* 🎨 Typo in documentation
  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* 🎨 fixing error in documentation
  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* docs: update markdown reference
* 🎨 ensuring x_qc is sorted so CubicSpline works correctly
  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* 🎨 fixing linting errors from ruff
* 🎨 filtering issue: opposite functionality to what the documentation explained. fixed.
* docs: update markdown reference
* 🎨 implementing changes suggested by copilot: NaN value handling, removing temp name column before returning df, raising ValueError with wrong QC threshold
* 🎨 isort on multiple files to improve order of inputs.
* 🎨 format notebooks a bit more and add filtering to index - hide long source cells - few typos found by copilot - split long lines using rewrap tool
* 🎨 clear outputs and hide installation panel output
* 🎨 remove unnamed column on integer indices in display
* 🎨 Changing example data for loess-based drift correction api example. Minor other changes in documentation of notebook.
* 🎨 fixing black errors
* 🎨 improvements in visibility
* 🐛 set defaults for parameters - anyway not optional, so just set defaults
* docs: update markdown reference
* 🎨 saveguard string and switch to logging
* 🎨 format
* 🐛 remove outdated parameter from call
---------
Co-authored-by: Henry Webel <heweb@dtu.dk>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
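
The commit message mentions three related points about the drift correction module: it is QC-based, it guards against division by zero, and `x_qc` is sorted before interpolation. The module's code is not shown on this page, so the following is only a minimal sketch of how those points could fit together. The names `interpolate_qc_trend` and `correct_drift`, the `eps` threshold, and the rescaling to the QC median are all hypothetical, and plain linear interpolation stands in for the LOESS and CubicSpline fits named in the commit.

```python
import math
from bisect import bisect_right
from statistics import median


def interpolate_qc_trend(x_qc, y_qc, x_all):
    """Linearly interpolate QC intensities at every injection position."""
    # Sort QC points by injection order first: interpolation assumes
    # increasing x (the commit notes the same requirement for CubicSpline).
    pairs = sorted(zip(x_qc, y_qc))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    trend = []
    for x in x_all:
        if x <= xs[0]:
            trend.append(ys[0])  # hold the first QC value before the first QC
        elif x >= xs[-1]:
            trend.append(ys[-1])  # hold the last QC value after the last QC
        else:
            i = bisect_right(xs, x)  # guarantees xs[i - 1] <= x < xs[i]
            x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
            trend.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return trend


def correct_drift(intensities, injection_order, is_qc, eps=1e-12):
    """Divide each intensity by the QC trend and rescale to the QC median."""
    x_qc = [x for x, qc in zip(injection_order, is_qc) if qc]
    y_qc = [y for y, qc in zip(intensities, is_qc) if qc]
    if len(x_qc) < 2:
        raise ValueError("need at least two QC injections to estimate a trend")
    trend = interpolate_qc_trend(x_qc, y_qc, injection_order)
    reference = median(y_qc)
    # Guard against division by zero: a (near-)zero trend yields NaN.
    return [
        math.nan if abs(t) < eps else y / t * reference
        for y, t in zip(intensities, trend)
    ]
```

Called with QC injections listed out of order, for example `correct_drift([140.0, 100.0, 120.0, 55.0, 65.0], [5, 1, 3, 2, 4], [True, True, True, False, False])`, the sketch still fits the trend over injection order 1, 3, 5 because of the sort.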
1 parent 5d85430 commit e334fb1

25 files changed

Lines changed: 31523 additions & 21 deletions

docs/api_examples/batch_correction.ipynb

Lines changed: 20 additions & 20 deletions
@@ -2,7 +2,7 @@
 "cells": [
 {
 "cell_type": "markdown",
-"id": "27b8d53e",
+"id": "2b2c0d3a",
 "metadata": {
 "lines_to_next_cell": 2
 },
@@ -19,7 +19,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "dcc1505c",
+"id": "e452352c",
 "metadata": {
 "tags": [
 "hide-output"
@@ -33,7 +33,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "563697dc",
+"id": "c7399c16",
 "metadata": {
 "tags": [
 "hide-input"
@@ -107,7 +107,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "60ee225e",
+"id": "cd6853a6",
 "metadata": {},
 "source": [
 "\n",
@@ -117,7 +117,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "a32698b4",
+"id": "cf489def",
 "metadata": {
 "tags": [
 "parameters"
@@ -141,7 +141,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "9bd7687c",
+"id": "14907bce",
 "metadata": {},
 "source": [
 "## Data loading\n",
@@ -151,7 +151,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "ced30460",
+"id": "fde7c5dd",
 "metadata": {
 "tags": [
 "hide-input"
@@ -169,7 +169,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "55b42bc2",
+"id": "2113619e",
 "metadata": {},
 "source": [
 "Metadata here is of type integer. All floats are proteomics measurements."
@@ -178,7 +178,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "857b0522",
+"id": "afd00cfd",
 "metadata": {
 "tags": [
 "hide-input"
@@ -192,7 +192,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "23657991",
+"id": "29a05170",
 "metadata": {
 "tags": [
 "hide-input"
@@ -206,7 +206,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "8bef0df5",
+"id": "86a0d548",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -216,7 +216,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "713a7b35",
+"id": "7bf84ee6",
 "metadata": {},
 "source": [
 "## Before batch correction\n",
@@ -226,7 +226,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "986f7cab",
+"id": "0b0c854c",
 "metadata": {
 "tags": [
 "hide-input"
@@ -242,7 +242,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "2bf8ca74",
+"id": "426ee866",
 "metadata": {
 "lines_to_next_cell": 0
 },
@@ -258,7 +258,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "1cf6f58b",
+"id": "f9ba51bd",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -273,7 +273,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "ca46af9e",
+"id": "1aa67514",
 "metadata": {},
 "source": [
 "Plot PCA and UMAP after batch correction on standard normalized data"
@@ -282,7 +282,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "cdc4afa2",
+"id": "907ed6c7",
 "metadata": {
 "tags": [
 "hide-input"
@@ -296,7 +296,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "9fda9a50",
+"id": "a530c6ba",
 "metadata": {},
 "source": [
 "See change by substracting combat corrected data from original data.\n",
@@ -306,7 +306,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "e9cecdd8",
+"id": "88ab0d76",
 "metadata": {
 "tags": [
 "hide-input"
@@ -319,7 +319,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "d27e6f10",
+"id": "84ad3f0d",
 "metadata": {},
 "source": [
 "Done."
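
Each of the 20 hunks above swaps one cell `id` in `docs/api_examples/batch_correction.ipynb` and changes nothing else. When reviewing such a diff, it can help to confirm that programmatically. The helper below is a hypothetical sketch using only the standard library; `strip_cell_ids` and `same_apart_from_cell_ids` are illustrative names, not part of the repository.

```python
import json


def strip_cell_ids(notebook: dict) -> dict:
    """Return a shallow copy of a notebook dict without the per-cell "id" field."""
    cleaned = dict(notebook)
    cleaned["cells"] = [
        {key: value for key, value in cell.items() if key != "id"}
        for cell in notebook.get("cells", [])
    ]
    return cleaned


def same_apart_from_cell_ids(old_json: str, new_json: str) -> bool:
    """True if two notebook JSON strings are equal once cell ids are ignored."""
    old = json.loads(old_json)
    new = json.loads(new_json)
    return strip_cell_ids(old) == strip_cell_ids(new)


# Two tiny notebook stand-ins that differ only in the id of the first cell,
# using the ids from the first hunk of the diff.
old_nb = json.dumps(
    {"cells": [{"cell_type": "markdown", "id": "27b8d53e", "metadata": {}, "source": []}]}
)
new_nb = json.dumps(
    {"cells": [{"cell_type": "markdown", "id": "2b2c0d3a", "metadata": {}, "source": []}]}
)
print(same_apart_from_cell_ids(old_nb, new_nb))
```

In practice the two strings would come from reading the notebook at the parent commit and at this commit; how those files are obtained is left out of the sketch.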
