ARC Store — Design

Module Overview

ArcStore (arc_store/__init__.py) is an abstract base class that defines the Git persistence interface. Two implementations exist:

  • GitRepo (arc_store/git_repo.py): primary implementation. Clones the repository over SSH or HTTPS using GitPython (a wrapper around the Git CLI), writes the ISA file structure generated by arctrl, and pushes the result. Works with any Git-compatible server.
  • GitlabApi (arc_store/gitlab_api.py): deprecated. Used the GitLab REST API to write files directly without a local clone. Retained for backwards compatibility only; new deployments should use GitRepo.

The caller (ArcManager.sync_to_gitlab) is responsible for parsing the ARC from JSON, selecting the configured backend, and recording CouchDB events. The store itself only handles Git.

ArcManager.sync_to_gitlab(rdi, arc_json_string)
    ├─→ ARC.from_rocrate_json_string(arc_json_string)  ← arctrl parse
    └─→ ArcStore.create_or_update(arc_id, arc_obj)
            └─→ GitRepo  (or GitlabApi — deprecated)
                    ├─→ clone / pull
                    ├─→ write ISA files via arctrl WriteAsync
                    └─→ commit + push
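
The separation between caller and backend shown above can be sketched as a small abstract base class. This is only an illustration: the create_or_update name and its two arguments come from the diagram, while the type hints, the return type, the InMemoryStore backend, and the sync helper are assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import Any


class ArcStore(ABC):
    """Abstract persistence interface (sketch; the real signature may differ)."""

    @abstractmethod
    def create_or_update(self, arc_id: str, arc_obj: Any) -> None:
        """Persist the ARC identified by arc_id."""


class InMemoryStore(ArcStore):
    """Hypothetical backend, used here only to illustrate the contract."""

    def __init__(self) -> None:
        self.saved = {}

    def create_or_update(self, arc_id: str, arc_obj: Any) -> None:
        self.saved[arc_id] = arc_obj


def sync(store: ArcStore, arc_id: str, arc_obj: Any) -> None:
    # The caller depends only on the abstract interface, never on a backend.
    store.create_or_update(arc_id, arc_obj)
```

Because the caller only sees ArcStore, swapping GitRepo for another backend would not require changes on the caller side.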

GitRepo Implementation

GitRepo uses GitPython to manage a temporary local clone:

  1. Clone or pull the remote repository to a temp directory.
  2. Call arctrl.ARC.WriteAsync to write the ISA/ARC file structure.
  3. Stage all changes, commit, and push.
  4. Clean up the temp directory.
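
The four steps above can be sketched as a function that owns the temporary directory and delegates the Git work to callables. The function name and the three callable parameters are hypothetical; the sketch shows only the lifecycle (fresh directory, guaranteed cleanup), not the actual GitRepo code.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def sync_in_temp_clone(
    clone: Callable[[Path], None],
    write: Callable[[Path], None],
    commit_and_push: Callable[[Path], None],
) -> None:
    """Run clone -> write -> commit/push in a fresh directory, then delete it."""
    workdir = Path(tempfile.mkdtemp(prefix="arc-store-"))
    try:
        clone(workdir)            # step 1: clone or pull
        write(workdir)            # step 2: write the ISA/ARC files
        commit_and_push(workdir)  # step 3: stage, commit, push
    finally:
        # step 4: always clean up, even when one of the steps raised
        shutil.rmtree(workdir, ignore_errors=True)
```

With GitPython, the callables could plausibly wrap Repo.clone_from, repo.git.add(all=True) followed by repo.index.commit, and repo.remote().push, with the arctrl write call sitting in the second step.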

SSH and HTTPS authentication are both supported via RemoteGitProvider (arc_store/remote_git_provider.py), which injects credentials into the remote URL or SSH command.
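
As a rough illustration of what credential injection can look like, the sketch below embeds a username and token in an HTTPS remote URL and builds a GIT_SSH_COMMAND environment entry for SSH. Both function names and their parameters are assumptions; the source does not describe the internals of RemoteGitProvider.

```python
import shlex
from urllib.parse import quote, urlsplit, urlunsplit


def https_url_with_token(url: str, username: str, token: str) -> str:
    """Return the HTTPS remote URL with user:token embedded in the authority."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("expected an https:// remote URL")
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    # Percent-encode both parts so characters like '@' or '/' stay unambiguous.
    netloc = f"{quote(username, safe='')}:{quote(token, safe='')}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def ssh_environment(key_path: str) -> dict:
    """Return environment variables that make git use the given private key."""
    command = f"ssh -i {shlex.quote(key_path)} -o IdentitiesOnly=yes"
    return {"GIT_SSH_COMMAND": command}
```

Keeping both variants behind one provider means the code that clones and pushes only ever receives a ready-to-use URL or environment.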

Git errors are classified at push time:

  • is_transient_git_error(exc) → raise ArcStoreTransientError (network failures, HTTP 5xx responses)
  • is_soft_git_error(exc) → repo or branch not found; treated as permanent
  • All other GitCommandError → permanent
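
The classification above might be implemented by matching fragments of Git's error output. The sketch below is an assumption throughout: the marker strings, the ArcStorePermanentError name, and the classify_push_failure helper are invented for illustration, and plain exceptions stand in for GitPython's GitCommandError so the example runs without extra dependencies.

```python
class ArcStoreTransientError(Exception):
    """Raised for failures that are worth retrying."""


class ArcStorePermanentError(Exception):
    """Hypothetical name for non-retryable failures."""


# Assumed stderr fragments; the real helpers may match differently.
TRANSIENT_MARKERS = (
    "could not resolve host",
    "connection timed out",
    "returned error: 502",
    "returned error: 503",
    "returned error: 504",
)
SOFT_MARKERS = ("repository not found", "couldn't find remote ref")


def is_transient_git_error(exc: Exception) -> bool:
    text = str(exc).lower()
    return any(marker in text for marker in TRANSIENT_MARKERS)


def is_soft_git_error(exc: Exception) -> bool:
    text = str(exc).lower()
    return any(marker in text for marker in SOFT_MARKERS)


def classify_push_failure(exc: Exception) -> Exception:
    """Map a raw Git failure to the exception the store would raise."""
    if is_transient_git_error(exc):
        return ArcStoreTransientError(str(exc))
    # Soft errors (missing repo or branch) and everything else are permanent.
    return ArcStorePermanentError(str(exc))
```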

Key Decisions

  1. ArcStoreTransientError vs permanent errors: Callers (ArcManager) need to distinguish retryable failures from permanent ones to decide whether to schedule a Celery retry. The store raises ArcStoreTransientError for network and availability issues; the caller treats every other exception as permanent.

  2. GitRepo preferred over GitlabApi: The REST API approach required chunking file actions and was limited in commit size. A real Git clone-and-push is simpler, more reliable, and server-agnostic. GitlabApi is kept only to avoid breaking existing deployments and will be removed in a future release.

  3. Temporary local clone, not persistent workspace: Each sync operation clones into a fresh temp directory and deletes it afterwards. This avoids stale state left by concurrent workers or failed previous runs.

  4. RemoteGitProvider injects credentials: Credential injection is isolated in RemoteGitProvider so that GitRepo itself has no knowledge of authentication schemes. SSH and HTTPS credential formats differ; the provider abstracts that difference.
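
To make the first decision concrete, the caller-side handling could look roughly like the sketch below. The function and its three callable parameters are hypothetical, and ArcStoreTransientError is redefined locally so the example stands alone; in a Celery task, the retry callback would typically be replaced by raising self.retry(exc=exc) from a bound task.

```python
from typing import Callable


class ArcStoreTransientError(Exception):
    """Stand-in for the store's retryable error."""


def sync_with_retry_decision(
    create_or_update: Callable[[], None],
    schedule_retry: Callable[[Exception], None],
    record_failure: Callable[[Exception], None],
) -> str:
    """Call the store once and route the outcome by exception type."""
    try:
        create_or_update()
    except ArcStoreTransientError as exc:
        # Network or availability problem: worth trying again later.
        schedule_retry(exc)
        return "retry"
    except Exception as exc:
        # Anything else is treated as permanent by the caller.
        record_failure(exc)
        return "failed"
    return "ok"
```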