Bug Report
Description
When calling
dvc.api.get_url(
    path,
    repo,
    config={"core": {"no_scm": True}},
)
dvc still attempts to find a Git repository and raises an exception when it doesn't find it.
We have a monorepo with dvc repositories tracked by git on different paths. During interactive work and development, users interact with both dvc and source control management. In some cases, tests and applications must run in an isolated environment that contains no git information; the isolated environment still contains all the required dvc configuration and internal files.
In such cases, we would like our code to access dvc information programmatically through the API, e.g. using the dvc.api.get_url() function to get the S3 path to the remote file. Because the isolated environment no longer depends on git, while the .dvc/config file is kept and does not contain no_scm = True, we attempted to use the config parameter to request that no SCM be expected (by passing config={"core": {"no_scm": True}}).
However, even though config={"core": {"no_scm": True}} instructs dvc.api.get_url() to skip the SCM check, the call still fails with:
dvc.scm.SCMError: /tmp/test_repo is not a git repository
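For reference, what we expected from the config parameter is a per-call override layered on top of the values in .dvc/config. Below is a minimal sketch of that expectation, assuming a plain section-by-section merge. merge_config is a hypothetical helper written for this report, not DVC's code, and the dictionaries only mimic the configuration used in the reproduction.

```python
import copy


def merge_config(base, override):
    """Return base with override applied per section (inputs are not mutated).

    Hypothetical helper for illustration; this is not DVC's implementation.
    """
    merged = copy.deepcopy(base)
    for section, values in override.items():
        # Create the section if it is missing, then overlay the new keys.
        merged.setdefault(section, {}).update(values)
    return merged


# Assumed stand-in for the values stored in .dvc/config.
file_config = {"core": {"remote": "s3"}}

# The per-call override passed through the API.
api_override = {"core": {"no_scm": True}}

effective = merge_config(file_config, api_override)
```

Under this assumption, the effective core section would carry both remote and no_scm, so no SCM lookup should be needed for the call.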
Reproduce
1. cd into a folder under SCM, e.g. /path/to/test_repo
2. dvc init --subdir
3. dvc config core.remote s3
4. dvc remote add -d s3 "s3://fake-bucket/path"
5. touch test_file.txt
6. dvc add test_file.txt
7. Test with python:
import dvc.api
url = dvc.api.get_url("test_file.txt", "/path/to/test_repo/")
print(url)
prints s3://fake-bucket/path/files/md5/d4/1d8cd98f00b204e9800998ecf8427e
8. Move the repository to an untracked folder: mv /path/to/test_repo/ /tmp/
9. Test with python:
import dvc.api
url = dvc.api.get_url("test_file.txt", "/tmp/test_repo/")
print(url)
this raises SCMError: /tmp/test_repo is not a git repository
10. Try to avoid SCM check:
import dvc.api
url = dvc.api.get_url("test_file.txt", "/tmp/test_repo/", config={"core": {"no_scm": True}})
print(url)
this still raises SCMError: /tmp/test_repo is not a git repository
Expected
Step 10 in the reproduction above should skip the SCM check and return the corresponding file URL, as in step 7.
Step 9 in the reproduction above should likely still return the URL of the file, given that it doesn't require git to do so.
Diagnosis and possible fix
During the execution of dvc.api.get_url(), there is a call to Repo.open(), to which all provided parameters are passed, including config, along with two fixed parameters, subrepos=True and uninitialized=True.
Repo.open() then calls _get_remote_config(url), which internally calls Repo(url), and this last call tries to find the SCM.
The call to _get_remote_config(url) ignores the parameters that were passed to dvc.api.get_url(). Forwarding these parameters (e.g. calling _get_remote_config(url, *args, **kwargs)) appears to fix the problem (a fix is submitted in #10719).
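The pattern described in this diagnosis can be shown with a small self-contained model. This is only a sketch under stated assumptions: ToyRepo, ToySCMError, remote_config_buggy, remote_config_fixed, open_repo and try_open are hypothetical stand-ins written for this report, and none of it is DVC's actual code. It shows how a helper that drops the caller's options triggers the Git lookup even when no_scm is set, and how forwarding them avoids it.

```python
import os
import tempfile


class ToySCMError(Exception):
    """Stand-in for dvc.scm.SCMError (hypothetical, for illustration only)."""


class ToyRepo:
    """Hypothetical stand-in for dvc.repo.Repo; not DVC's implementation."""

    def __init__(self, root, config=None):
        self.root = root
        self.config = config or {}
        no_scm = self.config.get("core", {}).get("no_scm", False)
        # Only look for Git when the caller has not opted out.
        if not no_scm and not os.path.isdir(os.path.join(root, ".git")):
            raise ToySCMError(f"{root} is not a git repository")


def remote_config_buggy(url, **kwargs):
    # Models the reported behaviour: the caller's options are dropped here.
    ToyRepo(url)
    return {}


def remote_config_fixed(url, **kwargs):
    # Models the proposed fix: the caller's options are forwarded.
    ToyRepo(url, **kwargs)
    return {}


def open_repo(url, helper, **kwargs):
    """Toy open path: read the remote config first, then open the repo."""
    helper(url, **kwargs)
    return ToyRepo(url, **kwargs)


def try_open(helper):
    """Open a directory without .git, with no_scm set, and report the outcome."""
    with tempfile.TemporaryDirectory() as root:
        try:
            open_repo(root, helper, config={"core": {"no_scm": True}})
        except ToySCMError:
            return "SCMError"
        return "ok"


if __name__ == "__main__":
    print("buggy:", try_open(remote_config_buggy))
    print("fixed:", try_open(remote_config_fixed))
```

In this model, the only difference between the two helpers is whether the keyword arguments reach the inner constructor, which matches the change proposed above.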
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 3.59.1 (conda)
---------------------------
Platform: Python 3.10.13 on macOS-15.3.2-x86_64-i386-64bit
Subprojects:
dvc_data = 3.16.9
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.40.2
scmrepo = 3.3.10
Supports:
gdrive (pydrive2 = 1.15.3),
http (aiohttp = 3.11.13, aiohttp-retry = 2.8.3),
https (aiohttp = 3.11.13, aiohttp-retry = 2.8.3),
s3 (s3fs = 2025.3.0, boto3 = 1.37.1),
ssh (sshfs = 2023.4.1)
Additional Information (if any):