dvc.api.read: Fails to parse Windows-path for repo argument if script is not on Proj Dir #10127

@Eve-ning


Bug Report

❗Temporary Fix at the bottom ❗

Description

dvc.api.read fails to read files if the repo argument is a Windows path AND the script being executed is not in the project directory.

The following error is raised:

dvc.exceptions.PathMissingError: The path `<PARENT_DIR>/<FILE>` does not exist in the target repository `<PARENT_DIR>/<FILE>` neither as a DVC output nor as a Git-tracked file.

Reproduce

  1. I followed the tutorial: https://dvc.org/doc/start/data-management/data-versioning?tab=Windows-Cmd-
dvc init
dvc get https://github.com/iterative/dataset-registry \
          get-started/data.xml -o data/data.xml
dvc add data/data.xml
  2. Add Python script src/test.py

The script must be in a subfolder; it works fine if test.py is in the project root.

The directory layout on Windows should now look like this:

dvctest/
  .dvc/
  data/
    .gitignore 
    data.xml 
    data.xml.dvc
  src/
    test.py 
  venv/  
  .dvcignore

test.py

from pathlib import Path

import dvc.api

PROJ_DIR = Path(__file__).parents[1]  # This refers to the project dir
with dvc.api.open("data/data.xml",
             repo=PROJ_DIR.as_posix(),
             mode="rb") as f:
    print(f.read())

test.py (also causes this issue)

from pathlib import Path

import dvc.api

PROJ_DIR = Path(__file__).parents[1]  # This refers to the project dir
dvc.api.read("data/data.xml",
             repo=PROJ_DIR.as_posix(),
             mode="rb")

Both of them raise the error:

dvc.exceptions.PathMissingError: The path 'src/data.xml' does not exist in the target repository 'src/data.xml' neither as a DVC output nor as a Git-tracked file.
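
Since the temporary fix at the bottom involves a URL scheme prefix, it may be worth checking how a generic URL parser sees the two forms of the repo string. The sketch below uses only the standard library and the hypothetical path C:/work/dvctest; it does not claim that DVC parses the repo argument this way, it only shows that a bare drive-letter path can look like a URL with scheme "c".

```python
from urllib.parse import urlparse

# Hypothetical project location, in the form produced by Path.as_posix() on Windows.
bare_path = "C:/work/dvctest"
prefixed = "file://" + bare_path

# A generic URL parser treats the drive letter as a (lowercased) URL scheme.
bare_scheme = urlparse(bare_path).scheme
prefixed_scheme = urlparse(prefixed).scheme

print(bare_scheme)      # c
print(prefixed_scheme)  # file
```

If DVC (or a library it uses) distinguishes local paths from URLs in a similar way, this ambiguity could be one contributing factor, but that is an assumption and not confirmed here.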

Expected

It should work regardless of where I put the script, but currently it only works from the project root:

dvctest/
  .dvc/
  data/
    .gitignore 
    data.xml 
    data.xml.dvc
  test.py  <---- Putting it here works
  venv/  
  .dvcignore

In test.py, Path(__file__).parents[1] should now be Path(__file__).parents[0] or Path(__file__).parent.
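
To make the parents indexing concrete, here is a small sketch using PureWindowsPath so it behaves the same on any OS. The C:\work\dvctest location is a hypothetical stand-in for the real project directory.

```python
from pathlib import PureWindowsPath

# Hypothetical script locations for the two layouts described above.
nested_script = PureWindowsPath(r"C:\work\dvctest\src\test.py")
top_level_script = PureWindowsPath(r"C:\work\dvctest\test.py")

# parents[0] is the containing folder; parents[1] is one level further up.
proj_from_nested = nested_script.parents[1]
proj_from_top = top_level_script.parents[0]

# Both expressions point at the same project directory.
print(proj_from_nested == proj_from_top)
print(proj_from_top == top_level_script.parent)
```

So the repo argument is the same directory in both layouts; only the location of the script differs.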

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.30.3 (pip)
-------------------------
Platform: Python 3.11.6 on Windows-10-10.0.22621-SP0
Subprojects:
        dvc_data = 2.22.3
        dvc_objects = 1.3.0
        dvc_render = 0.6.0
        dvc_task = 0.3.0
        scmrepo = 1.5.0
Supports:
        http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3)
Config:
        Global: C:\Users\JOHN.CHANGRQ\AppData\Local\iterative\dvc
        System: C:\ProgramData\iterative\dvc
Cache types: hardlink
Cache directory: NTFS on C:\
Caches: local
Remotes: None
Workspace directory: NTFS on C:\
Repo: dvc, git
Repo.site_cache_dir: C:\ProgramData\iterative\dvc\Cache\repo\50d38bc72608938a5da29ea637ac44ee

❗ Temporary Fix

For the repo argument, prepend a file://.

dvc.api.read(
    path="path/to/file.txt",
    repo="file://C:/.../my-proj", # previously C:/.../my-proj
)

I'm not sure why this doesn't fix the minimal reproducible example above, but it worked for my project.
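
A minimal sketch of a helper that builds the prefixed repo string in the same "file://C:/..." form used above. The function name as_file_repo_url and the example path are hypothetical, and whether DVC accepts the result in every setup is not verified here.

```python
from pathlib import PureWindowsPath


def as_file_repo_url(project_dir: str) -> str:
    """Convert a Windows path to forward slashes and prepend file://."""
    return "file://" + PureWindowsPath(project_dir).as_posix()


# Hypothetical project directory.
repo_url = as_file_repo_url(r"C:\work\dvctest")
print(repo_url)  # file://C:/work/dvctest
```

The result could then be passed as repo=repo_url to dvc.api.read or dvc.api.open. Note that Path.as_uri() produces a three-slash form (file:///C:/...), which differs from the two-slash form reported to work here.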

Metadata

    Labels

    A: api (Related to the dvc.api)
    P: windows (Related to the Platform: Windows)
    bug (Did we break something?)
    p3-nice-to-have (It should be done this or next sprint)
