Pre-submission query about a Python package I've created. #306

@bjthorpe

Submitting Author: Ben Thorpe @bjthorpe
Package Name: ml_toolkit
One-Line Description of Package: Python wrapper for managing Apptainer software containers, intended for installing various AI/ML models and dealing with the dependency hell they can cause.
Repository Link (if existing): https://github.com/bjthorpe/Bede_containers
Docs: https://bede-container-docs.readthedocs.io/en/latest/
EiC: TBD


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does:

I have written a Python package called ml_toolkit and I'm wondering whether you feel it is in scope for pyOpenSci. It's a Python wrapper for Apptainer. Originally this was part of an intervention aimed at improving access to AI for research, funded by the UK N8 CIR. The basic premise is that the myriad ML/AI models that exist all have their own, often incompatible, dependencies. Managing these can be difficult and frustrating, and often requires significant technical expertise. This places a significant burden on anyone trying to use AI and is a serious barrier to making use of AI in real-world research.

Thus I created ml_toolkit to help streamline this process, using a number of predefined Apptainer container definitions that have been set up to install the exact packages and dependencies required for each model. This allows us to present users with a simplified interface consisting of 4 basic commands: build $MODELNAME, start $MODELNAME, stop $MODELNAME, and run $MODELNAME.
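To give a rough idea of what such a wrapper involves, here is a minimal sketch of how the four commands could map onto Apptainer's own CLI. This is an illustration only, not the actual ml_toolkit code: the function name `apptainer_command` and the `definitions/` and `images/` directory layout are hypothetical, and only the standard `apptainer build`, `apptainer instance start`, `apptainer instance stop` and `apptainer run` subcommands are assumed.

```python
from pathlib import Path

# Hypothetical layout (an assumption, not ml_toolkit's real one):
# one definition file and one built image per model.
DEF_DIR = Path("definitions")
IMAGE_DIR = Path("images")


def apptainer_command(action: str, model: str, *extra: str) -> list[str]:
    """Translate a high-level action into an Apptainer argument list."""
    definition = (DEF_DIR / f"{model}.def").as_posix()
    image = (IMAGE_DIR / f"{model}.sif").as_posix()
    if action == "build":
        # Build an image from the model's definition file.
        return ["apptainer", "build", image, definition]
    if action == "start":
        # Start a named background instance from the built image.
        return ["apptainer", "instance", "start", image, model]
    if action == "stop":
        # Stop the named instance.
        return ["apptainer", "instance", "stop", model]
    if action == "run":
        # Run the container's runscript inside the running instance,
        # forwarding any extra arguments.
        return ["apptainer", "run", f"instance://{model}", *extra]
    raise ValueError(f"unknown action: {action!r}")
```

The function only builds the argument list, so the mapping can be inspected without Apptainer installed. Passing the result to `subprocess.run(..., check=True)` would execute it on a machine where Apptainer is available.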

It was primarily designed for use with the N8 supercomputer Bede. However, the package is generic and can, in theory, work on any Linux machine, since it relies only on Apptainer and Python. So we made the decision to release it as an open-source project licensed under GPL3, so that anyone can use and contribute to it moving forward. Currently there are a number of definition files for popular machine-learned atomic potential models used in materials modeling, my main research area. However, the software is intentionally generic and (hopefully) easy to extend. The ultimate aim is to expand its use and allow the research community to submit definitions for popular tools and models. For example, we already have working example implementations of llama.cpp and Ollama for use with LLMs.

So yeah, I don't know if this is really in scope, as it only uses Python as an interface, but I thought it was worth asking.

For more info, there is a simple website advertising the project, as well as the Read the Docs site.

Thanks

Ben Thorpe

Community Partnerships

We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:

Scope

  • Please indicate which category or categories this package falls under:

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific

  • Geospatial
  • Education

  • Explain how and why the package falls under these categories (briefly, 1-2 sentences). For community partnerships, check also their specific guidelines as documented in the links above. Please note any areas you are unsure of:

  • Workflow automation: we automate the install and setup process?

  • Scientific software wrappers: it's a wrapper for Apptainer, so I think this is fairly self-explanatory.

  • Who is the target audience and what are the scientific applications of this package?

Researchers who want to make use of AI/ML tools but don't know where to start. Also software engineers who have AI/ML tools and would like to make them reproducible and thus easier to access and use.

  • Are there other Python packages that accomplish similar things? If so, how does yours differ?

Not that I'm aware of.

  • Any other questions or issues we should be aware of:

P.S. Have feedback/comments about our review process? Leave a comment on our discourse forum
