[FEA] tl.rank_genes_groups_logreg #411

@MPebworthEpana

Is your feature request related to a problem? Please describe.
Finding cluster markers currently requires either downsampling or loading the entire normalized matrix into GPU memory.
Downsampling isn't feasible at this point, since it would mean drastically undersampling the data.

We need the ability to identify cluster markers using Dask arrays.

Describe the solution you'd like
Allow the function to accept Dask arrays. Ideally, the regression could also include additional control variables: for example, identify cell cluster markers while controlling for batch. (Batch may already be controlled for in the embedding, but uneven cluster composition could still drive some batch effects.)
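To make the "control for batch" idea concrete, here is a minimal sketch of what I have in mind. It is plain NumPy with a toy gradient-descent fit, not the existing `rank_genes_groups_logreg` code path, and the helper names (`fit_logreg`, `markers_controlling_for_batch`) and the synthetic data are made up for illustration. The point is only that the batch indicator columns are appended to the design matrix and their coefficients are dropped when ranking genes.

```python
import numpy as np


def fit_logreg(X, y, lr=0.5, n_iter=2000):
    """Toy binary logistic regression via gradient descent (illustrative only)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        resid = p - y
        w -= lr * (X.T @ resid) / len(y)
        b -= lr * resid.mean()
    return w, b


def markers_controlling_for_batch(expr, clusters, batches):
    """One-vs-rest gene coefficients per cluster, with batch as a covariate.

    Hypothetical helper: batch is one-hot encoded (first level dropped to
    avoid collinearity with the intercept) and appended to the expression
    matrix. Only the gene coefficients are returned.
    """
    levels = np.unique(batches)
    batch_onehot = (batches[:, None] == levels[None, 1:]).astype(float)
    design = np.hstack([expr, batch_onehot])
    n_genes = expr.shape[1]
    scores = {}
    for c in np.unique(clusters):
        y = (clusters == c).astype(float)
        w, _ = fit_logreg(design, y)
        scores[c] = w[:n_genes]  # drop the batch coefficients
    return scores


# Synthetic example: gene 0 marks cluster 1, gene 1 only tracks batch,
# and batch is correlated with cluster (uneven cluster composition).
rng = np.random.default_rng(0)
n_cells = 400
clusters = rng.integers(0, 2, n_cells)
batches = np.where(rng.random(n_cells) < 0.8, clusters, 1 - clusters)
expr = rng.normal(size=(n_cells, 3))
expr[:, 0] += 2.0 * clusters
expr[:, 1] += 2.0 * batches

scores = markers_controlling_for_batch(expr, clusters, batches)
top_gene_cluster1 = int(np.argmax(scores[1]))
```

With batch in the model, the batch-driven gene should get a much smaller coefficient than the true cluster marker, which is the behavior I would like from the real function.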

Solution:
Logistic regression for Dask arrays is already implemented in dask-ml:
https://ml.dask.org/modules/generated/dask_ml.linear_model.LogisticRegression.html
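For context on why a chunked fit avoids holding the full matrix in memory: the gradient of the logistic loss is a sum over rows, so it can be accumulated one row block at a time. The sketch below shows only that property in plain NumPy. It is not how dask-ml implements its solvers, and the function names are hypothetical.

```python
import numpy as np


def logistic_gradient(X, y, w):
    """Gradient of the (unnormalized) logistic loss for one block of rows."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y)


def chunked_logistic_gradient(row_chunks, w):
    """Accumulate the gradient over an iterable of (X_chunk, y_chunk) pairs.

    Only one chunk needs to be in memory at a time.
    """
    total = np.zeros_like(w)
    for X_chunk, y_chunk in row_chunks:
        total += logistic_gradient(X_chunk, y_chunk, w)
    return total


# Synthetic data split into row chunks of 256 cells each.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.5).astype(float)
w = rng.normal(size=5)

chunks = [(X[i:i + 256], y[i:i + 256]) for i in range(0, len(y), 256)]
full_grad = logistic_gradient(X, y, w)
chunked_grad = chunked_logistic_gradient(chunks, w)
```

The chunked result matches the full-matrix gradient up to floating-point error, so in principle nothing is lost by never materializing the whole normalized matrix.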

Labels: enhancement (New feature or request)