Replace pybis with a smaller, typed openbis client tailored for masterdata ops #283

@JosePizarro3

Description

bam-masterdata currently depends directly on pybis for login, masterdata synchronization, inventory access, parser uploads, and some dataset operations (mainly, saving datasets in collections/objects). After reviewing pybis, I think we could replace the subset of operations that bam-masterdata uses for synchronization and minimal client connectivity, without fully replacing pybis.

The main problems with relying heavily on pybis stem from inconsistencies in its release ecosystem, but also from the following:

  • pybis is highly dynamic and difficult to reason about statically (e.g., when doing masterdata syncs)
  • transport, auth, request building, entity state, and business logic are tightly coupled (poor OOP practice)
  • we only use a small portion of the available functionality
  • testing would be much easier against a small internal client interface that can be mocked
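As an illustration of the testing point, here is a minimal sketch assuming a hypothetical `MasterdataReader` protocol: code that depends only on this small interface can be tested against an in-memory fake, with no live openBIS instance or patched pybis session. `MasterdataReader`, `FakeMasterdataReader`, and `missing_property_types` are made-up names, not existing bam-masterdata code.

```python
from __future__ import annotations

from typing import Protocol


class MasterdataReader(Protocol):
    """Hypothetical read-only interface the rest of bam-masterdata would depend on."""

    def list_property_type_codes(self) -> set[str]:
        """Return the codes of all property types defined in openBIS."""
        ...


class FakeMasterdataReader:
    """In-memory stand-in for tests; matches MasterdataReader structurally."""

    def __init__(self, codes: set[str]) -> None:
        self._codes = set(codes)

    def list_property_type_codes(self) -> set[str]:
        return set(self._codes)


def missing_property_types(reader: MasterdataReader, local_codes: set[str]) -> list[str]:
    """Hypothetical sync helper: local property types not yet present in openBIS."""
    return sorted(local_codes - reader.list_property_type_codes())


# In a test, no network access or pybis session is needed:
reader = FakeMasterdataReader({"$NAME", "SAMPLE_WEIGHT"})
print(missing_property_types(reader, {"$NAME", "SAMPLE_WEIGHT", "SAMPLE_COLOR"}))
# prints ['SAMPLE_COLOR']
```

Because `typing.Protocol` uses structural typing, the fake does not need to inherit from anything, and a static type checker can still verify that it matches the interface wherever it is passed as a `MasterdataReader`.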

This is therefore a super-issue to track the changes needed to gradually reduce our dependency on pybis by introducing a cleaner internal openBIS client built with Python and OOP/SOLID practices. In short:

Goals

  • Introduce a small internal openBIS integration layer with explicit responsibilities.
  • Reduce dynamic/magic behavior in the integration boundary.
  • Improve typing, testability, and long-term maintainability.
  • Allow incremental migration instead of a risky full rewrite.

Non-goals

  • Rebuilding the full pybis feature set.
  • Changing user-facing CLI behavior unless needed for the migration.
  • Refactoring unrelated metadata/domain code in the same effort.

ChatGPT suggests the following architecture design:

  • A transport/auth layer for JSON-RPC requests and session handling.
  • Focused service clients such as MasterdataClient, InventoryClient, and later DatasetClient.
  • Explicit models for the openBIS concepts used by this project.
  • A compatibility boundary so the rest of bam-masterdata depends on our interface, not on pybis.
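A minimal sketch of how the transport/auth layer, a focused service client, and an explicit model could fit together. It assumes the openBIS V3 JSON-RPC application-server endpoint at `/openbis/openbis/rmi-application-server-v3.json` and its `login`/`searchPropertyTypes` methods; the class names, the `PropertyType` fields, and the injectable `post` callable are illustrative assumptions, not an existing bam-masterdata API. The `@type` DTO names and the shape of the search result should be checked against the openBIS V3 API documentation.

```python
from __future__ import annotations

import itertools
import json
import urllib.request
from dataclasses import dataclass
from typing import Any, Callable

# Assumed V3 JSON-RPC endpoint of the openBIS application server.
AS_V3_PATH = "/openbis/openbis/rmi-application-server-v3.json"

# A transport function: (url, JSON-RPC body) -> decoded JSON reply.
PostFn = Callable[[str, dict], dict]


class OpenBISError(RuntimeError):
    """Raised on a JSON-RPC error reply or when no session is available."""


def http_post(url: str, body: dict) -> dict:
    """Default transport: POST the JSON body with urllib and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


class Transport:
    """Transport/auth layer: builds JSON-RPC requests and keeps the session token."""

    def __init__(self, base_url: str, post: PostFn = http_post) -> None:
        self._url = base_url.rstrip("/") + AS_V3_PATH
        self._post = post
        self._ids = itertools.count(1)
        self.token: str | None = None

    def call(self, method: str, *params: Any) -> Any:
        body = {"jsonrpc": "2.0", "id": str(next(self._ids)), "method": method, "params": list(params)}
        reply = self._post(self._url, body)
        error = reply.get("error")
        if error:
            message = error.get("message", error) if isinstance(error, dict) else error
            raise OpenBISError(f"{method} failed: {message}")
        return reply.get("result")

    def login(self, user: str, password: str) -> None:
        # Assumption: V3 `login` returns the session token, or null on bad credentials.
        token = self.call("login", user, password)
        if not token:
            raise OpenBISError("login failed: no session token returned")
        self.token = token

    def session_call(self, method: str, *params: Any) -> Any:
        """Call a method whose first parameter is the session token."""
        if self.token is None:
            raise OpenBISError("not logged in")
        return self.call(method, self.token, *params)


@dataclass(frozen=True)
class PropertyType:
    """Explicit model holding only the fields this project needs (assumed subset)."""

    code: str
    data_type: str
    label: str | None = None


class MasterdataClient:
    """Focused service client: typed masterdata reads built on top of Transport."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def get_property_types(self) -> list[PropertyType]:
        # Assumed: criteria without sub-criteria match all property types.
        criteria = {"@type": "as.dto.property.search.PropertyTypeSearchCriteria"}
        fetch_options = {"@type": "as.dto.property.fetchoptions.PropertyTypeFetchOptions"}
        result = self._transport.session_call("searchPropertyTypes", criteria, fetch_options) or {}
        return [
            PropertyType(code=obj["code"], data_type=obj["dataType"], label=obj.get("label"))
            for obj in result.get("objects", [])
        ]
```

Injecting `post` keeps HTTP out of the service clients, so tests can pass a stub, and object types, dataset types, collection types, and vocabularies could follow the same `MasterdataClient` pattern. Real V3 responses may also use `@id` back-references for repeated objects, which this sketch does not resolve.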

TODOs

  • Define replacement boundary and current pybis usage map
    Create a clear inventory of which pybis methods are actually used in bam-masterdata, group them by concern, and define the first supported internal interface.

  • Introduce minimal transport and authentication client
    Implement a small internal client for session management, JSON-RPC requests, error handling, and configuration loading, without yet changing the higher-level workflows.

  • Implement typed masterdata operations
    Add internal support for the masterdata flows currently used here: property types, object types, dataset types, collection types, vocabularies, and vocabulary terms.

  • Migrate read/export flows away from direct pybis usage
    Refactor the current openBIS read-side integration, such as entity fetching and masterdata export/import helpers, to use the new internal client.

  • Migrate parser inventory/ELN object creation workflows
    Refactor parser-related operations such as space/project/collection lookup, object creation/update, and parent-child relationship handling to use the new interface.

  • Decide dataset storage strategy and remove pybis dependency completely
    Evaluate whether dataset upload/download should also be migrated now, deferred, or isolated behind a separate adapter, then remove the remaining direct pybis dependency.
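For the first TODO, a rough pybis usage map could be generated with Python's standard `ast` module. This is a heuristic sketch: it flags modules that import pybis and counts every `obj.method(...)` call in them, so the output is a candidate list to review by hand rather than an exact inventory (it cannot tell whether a receiver is actually a pybis object). The function names are hypothetical.

```python
from __future__ import annotations

import ast
from collections import Counter
from pathlib import Path


def imports_pybis(tree: ast.AST) -> bool:
    """True if the module contains `import pybis...` or `from pybis... import ...`."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "pybis" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] == "pybis":
                return True
    return False


def candidate_calls(source: str) -> Counter[str]:
    """Count attribute-call names (e.g. `o.get_space(...)` -> 'get_space') in a pybis-importing module."""
    tree = ast.parse(source)
    if not imports_pybis(tree):
        return Counter()
    return Counter(
        node.func.attr
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
    )


def usage_map(package_dir: str) -> dict[str, Counter[str]]:
    """Map each .py file under package_dir to its candidate call counts."""
    result: dict[str, Counter[str]] = {}
    for path in sorted(Path(package_dir).rglob("*.py")):
        calls = candidate_calls(path.read_text(encoding="utf-8"))
        if calls:
            result[str(path)] = calls
    return result


example = """
from pybis import Openbis
o = Openbis("https://openbis.example.org")
o.login("user", "pw")
space = o.get_space("LAB")
space.save()
"""
# Counts login, get_space and save once each; the bare Openbis(...) call is not an attribute call.
print(candidate_calls(example))
```

Running `usage_map` on the bam-masterdata package and grouping the resulting names by concern (auth, masterdata, inventory, datasets) would give a first draft of the replacement boundary.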

Metadata

Labels: enhancement (Enhancements on the code base), refactor (Refactoring some existing API functionalities)