Commit 60d4de5

workspace import-dir: default-exclude .git, .databricks, node_modules (#5118)
## Summary

`databricks workspace import-dir` walks the source tree and copies every entry into the workspace verbatim; it has no awareness of `.gitignore` or default exclusions. This change adds a name-based skip for `.git`, `.databricks`, and `node_modules` directories during the walk. `.gitignore` and other dotfiles at the root remain copied.

If a user explicitly passes `.git` (or any of the others) as the source root, that root is still copied: the skip rule applies to entries encountered during recursion.

## Motivation: align `import-dir` with `sync`'s existing defaults

`databricks sync` already hard-codes skips for the same two directories that cause the most trouble:

- `libs/git/repository.go`: `// Always ignore root .git directory.` adds `.git` to the default ignore rules unconditionally.
- `libs/git/view.go` (`SetupDefaults`): `// Hard code .databricks ignore pattern so that we never sync it (irrespective of .gitignore patterns)`.

So `sync` and `import-dir` currently produce different workspace contents for the same source tree: `sync` skips `.git/` and `.databricks/`, `import-dir` copies them. This PR closes that gap for `import-dir` so the two commands behave consistently.

`node_modules` is the one entry that goes beyond what `sync` does by default. For any project with a typical `.gitignore`, `sync` would already skip it via gitignore rules; `import-dir` ignores `.gitignore` entirely, so adding it to the name-based skip list keeps the behavior aligned with what users get from `sync`.

## Why this matters in practice

`databricks workspace import-dir` is commonly reached for as the inverse of `databricks workspace export-dir`. Without these defaults, the imported tree carries:

1. The local repo's `.git/` directory, including its config and history.
2. The local `.databricks/` bundle cache, which can clobber state that bundle commands maintain remotely.
3. `node_modules/` for JS/TS projects: large, slow to upload, and recreated by the runtime install step anyway.

The canonical answer is `databricks sync`, which respects `.gitignore` and already excludes the first two by default. This PR is not a substitute for `sync`; it just brings `import-dir`'s defaults into line for users who reach for it anyway.

## Test plan

- [x] Unit tests covering: root `.git/` skipped, nested `.git/` skipped, `.databricks/` skipped, `node_modules/` skipped, `.gitignore` file kept, explicit `.git` root copied (escape hatch).
- [x] `go test ./cmd/workspace/workspace/`: pass
- [x] `golangci-lint run ./cmd/workspace/workspace/`: clean
- [ ] Existing integration `TestImportDir`: unchanged, no `.git` in its testdata so behavior is identical.

This pull request and its description were written by Isaac.

---------

Co-authored-by: simon <4305831+simonfaltum@users.noreply.github.com>
Co-authored-by: simon <simon.faltum@databricks.com>

1 parent 209fa87 commit 60d4de5

3 files changed

Lines changed: 186 additions & 0 deletions

NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@

 * Added `databricks aitools` command group for installing Databricks skills into your coding agents (Claude Code, Cursor, Codex CLI, OpenCode, GitHub Copilot, Antigravity). Skills are fetched from [github.com/databricks/databricks-agent-skills](https://github.com/databricks/databricks-agent-skills) and either symlinked into each agent's skills directory or copied into the current project. Use `databricks aitools install` to set up, `update` to pull newer versions, `list` to see what's available, and `uninstall` to remove them. Pick where they go with `--scope=project|global` (`--scope=both` is accepted on `update` and `list`).
 * `[__settings__].default_profile` is now consulted as a fallback by `databricks api`, `databricks auth token`, and bundle commands when neither `--profile` nor `DATABRICKS_CONFIG_PROFILE` is set. `databricks auth token` continues to give precedence to `DATABRICKS_HOST` over `default_profile`. For bundle commands, `default_profile` only applies when the bundle does not pin its own `workspace.host`.
+* `databricks workspace import-dir` now skips `.git`, `.databricks`, and `node_modules` directories during recursive imports. To import one of these directories deliberately, pass it as `SOURCE_PATH` ([#5118](https://github.com/databricks/cli/pull/5118)).
 * `databricks postgres create-role --help` now documents the `--json` body shape and rejects the common mistake of wrapping the body in `{"role": ...}` client-side with a hint pointing at the correct shape ([#5111](https://github.com/databricks/cli/pull/5111)).

 ### Bundles

cmd/workspace/workspace/import_dir.go

Lines changed: 29 additions & 0 deletions
@@ -23,6 +23,22 @@ type importDirOptions struct {
 	overwrite bool
 }

+// defaultSkipDirs are directory names skipped when walking the source tree.
+// The previous behavior copied these verbatim into the workspace, which:
+//   - leaks .git/config (often containing template-repo origin URLs and
+//     occasionally cached credentials) into deployed app source trees
+//   - copies the local bundle cache (.databricks) on top of any remote one
+//   - uploads node_modules/ for JS/TS apps, which is large and gets
+//     reinstalled in the runtime anyway
+//
+// Reported as DEPLOY-04 #2 in the EMEA Apps gaps doc; users have been
+// working around it by post-deploy scrubbing scripts.
+var defaultSkipDirs = map[string]struct{}{
+	".git":         {},
+	".databricks":  {},
+	"node_modules": {},
+}
+
 // The callback function imports the file specified at sourcePath. This function is
 // meant to be used in conjunction with fs.WalkDir
 //
@@ -48,6 +64,15 @@ func (opts importDirOptions) callback(ctx context.Context, workspaceFiler filer.
 			return err
 		}

+		// Skip default-excluded directories (e.g. .git, .databricks). The check
+		// excludes the explicit root so a user who passes ".git" as the source
+		// can still copy it deliberately.
+		if d.IsDir() && sourcePath != sourceDir {
+			if _, skip := defaultSkipDirs[d.Name()]; skip {
+				return fs.SkipDir
+			}
+		}
+
 		// localName is the name for the file in the local file system
 		localName, err := filepath.Rel(sourceDir, sourcePath)
 		if err != nil {
@@ -117,6 +142,10 @@ func newImportDir() *cobra.Command {
 	cmd.Long = `
 Import a directory recursively from the local file system to a Databricks workspace.
 Notebooks will have their extensions (one of .scala, .py, .sql, .ipynb, .r) stripped
+
+By default, .git, .databricks, and node_modules directories encountered during
+the recursive import are skipped. To import one of these directories deliberately,
+pass it as SOURCE_PATH.
 `

 	cmd.Annotations = make(map[string]string)
Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
+package workspace
+
+import (
+	"context"
+	"io"
+	"io/fs"
+	"os"
+	"path/filepath"
+	"slices"
+	"testing"
+
+	"github.com/databricks/cli/libs/cmdio"
+	"github.com/databricks/cli/libs/filer"
+	"github.com/databricks/cli/libs/flags"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+// recordingFiler captures Mkdir and Write calls so a test can assert which
+// paths were visited by the import-dir walker.
+type recordingFiler struct {
+	dirs  []string
+	files []string
+}
+
+func (r *recordingFiler) Mkdir(ctx context.Context, p string) error {
+	r.dirs = append(r.dirs, p)
+	return nil
+}
+
+func (r *recordingFiler) Write(ctx context.Context, p string, reader io.Reader, mode ...filer.WriteMode) error {
+	r.files = append(r.files, p)
+	return nil
+}
+
+func (r *recordingFiler) Read(ctx context.Context, p string) (io.ReadCloser, error) {
+	return nil, fs.ErrNotExist
+}
+
+func (r *recordingFiler) Delete(ctx context.Context, p string, mode ...filer.DeleteMode) error {
+	return nil
+}
+
+func (r *recordingFiler) ReadDir(ctx context.Context, p string) ([]fs.DirEntry, error) {
+	return nil, fs.ErrNotExist
+}
+
+func (r *recordingFiler) Stat(ctx context.Context, name string) (fs.FileInfo, error) {
+	return nil, fs.ErrNotExist
+}
+
+func writeFile(t *testing.T, root, rel, contents string) {
+	t.Helper()
+	full := filepath.Join(root, rel)
+	require.NoError(t, os.MkdirAll(filepath.Dir(full), 0o755))
+	require.NoError(t, os.WriteFile(full, []byte(contents), 0o644))
+}
+
+func runWalk(t *testing.T, sourceDir string) *recordingFiler {
+	t.Helper()
+	rec := &recordingFiler{}
+	ctx := cmdio.InContext(t.Context(),
+		cmdio.NewIO(t.Context(), flags.OutputText, nil, io.Discard, io.Discard, "", ""))
+	opts := importDirOptions{sourceDir: sourceDir, targetDir: "/Workspace/x", overwrite: true}
+	cb := opts.callback(ctx, rec)
+	require.NoError(t, filepath.WalkDir(sourceDir, cb))
+	return rec
+}
+
+func TestImportDirSkipsGitDirectory(t *testing.T) {
+	src := t.TempDir()
+	writeFile(t, src, "app.py", "print('hi')")
+	writeFile(t, src, ".git/config", "[remote]\n url = git@github.com:org/template.git")
+	writeFile(t, src, ".git/HEAD", "ref: refs/heads/main")
+	writeFile(t, src, ".git/objects/abc123", "binary")
+
+	rec := runWalk(t, src)
+
+	slices.Sort(rec.files)
+	assert.Equal(t, []string{"app.py"}, rec.files)
+	for _, d := range rec.dirs {
+		assert.NotContains(t, d, ".git", "no .git directory should be created in the workspace")
+	}
+}
+
+func TestImportDirSkipsNestedGitDirectory(t *testing.T) {
+	src := t.TempDir()
+	writeFile(t, src, "app.py", "print('hi')")
+	writeFile(t, src, "vendor/sub/.git/config", "[remote]\n url = ...")
+	writeFile(t, src, "vendor/sub/lib.py", "def f(): pass")
+
+	rec := runWalk(t, src)
+
+	slices.Sort(rec.files)
+	assert.Equal(t, []string{"app.py", filepath.ToSlash("vendor/sub/lib.py")}, rec.files)
+	for _, d := range rec.dirs {
+		assert.NotContains(t, d, ".git")
+	}
+}
+
+func TestImportDirSkipsDatabricksCacheDirectory(t *testing.T) {
+	src := t.TempDir()
+	writeFile(t, src, "databricks.yml", "bundle:\n name: x")
+	writeFile(t, src, ".databricks/bundle/state.json", "{}")
+
+	rec := runWalk(t, src)
+
+	slices.Sort(rec.files)
+	assert.Equal(t, []string{"databricks.yml"}, rec.files)
+	for _, d := range rec.dirs {
+		assert.NotContains(t, d, ".databricks")
+	}
+}
+
+func TestImportDirSkipsNodeModulesDirectory(t *testing.T) {
+	src := t.TempDir()
+	writeFile(t, src, "package.json", "{}")
+	writeFile(t, src, "app.js", "console.log('hi')")
+	writeFile(t, src, "node_modules/react/index.js", "module.exports = {}")
+	writeFile(t, src, "node_modules/.package-lock.json", "{}")
+
+	rec := runWalk(t, src)
+
+	slices.Sort(rec.files)
+	assert.Equal(t, []string{"app.js", "package.json"}, rec.files)
+	for _, d := range rec.dirs {
+		assert.NotContains(t, d, "node_modules")
+	}
+}
+
+func TestImportDirCopiesGitignoreFile(t *testing.T) {
+	src := t.TempDir()
+	writeFile(t, src, ".gitignore", "*.pyc\n")
+	writeFile(t, src, "app.py", "print('hi')")
+
+	rec := runWalk(t, src)
+
+	slices.Sort(rec.files)
+	assert.Equal(t, []string{".gitignore", "app.py"}, rec.files)
+}
+
+func TestImportDirAllowsExplicitGitRoot(t *testing.T) {
+	// If a user explicitly passes a .git directory as the source root, copy
+	// it: the skip rule applies to .git dirs encountered during the walk,
+	// not to a deliberately-named root.
+	src := t.TempDir()
+	gitRoot := filepath.Join(src, ".git")
+	require.NoError(t, os.MkdirAll(gitRoot, 0o755))
+	writeFile(t, gitRoot, "HEAD", "ref: refs/heads/main")
+	writeFile(t, gitRoot, "config", "[core]\n")
+
+	rec := runWalk(t, gitRoot)
+
+	slices.Sort(rec.files)
+	assert.Equal(t, []string{"HEAD", "config"}, rec.files)
+}
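
One detail of the directory assertions in these tests: `assert.NotContains` with a string haystack performs a substring check, so a directory such as `.github` would also match `.git`. None of the fixtures here contain such a directory, so the tests are unaffected. If a stricter check were ever wanted, one possible approach is to compare whole path elements. The sketch below is an illustration only; `hasPathComponent` is a hypothetical helper that is not part of this commit.

```go
package main

import (
	"fmt"
	"strings"
)

// hasPathComponent is a hypothetical helper: it reports whether the
// slash-separated path p contains name as a complete path element.
func hasPathComponent(p, name string) bool {
	for _, part := range strings.Split(p, "/") {
		if part == name {
			return true
		}
	}
	return false
}

func main() {
	// A substring check flags .github even though it is not a .git directory.
	fmt.Println(strings.Contains(".github/workflows", ".git")) // true

	// An element-wise check distinguishes the two.
	fmt.Println(hasPathComponent(".github/workflows", ".git")) // false
	fmt.Println(hasPathComponent("vendor/sub/.git", ".git"))   // true
}
```

This matches how the skip rule itself works: it compares `d.Name()` against the map keys, so `.github` is never skipped by the walker.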
