Skip to content

Commit 7ee64bb

Browse files
fix: clean up leftover gh-pages branch files on deploy (#204) (#207)
* fix: clean up leftover gh-pages branch files (dotfiles, submodules) gh-pages@6.3.0's "Removing files" step calls globby without `dot: true`, so dotfiles (.gitignore, .gitmodules, .github/) and submodule gitlinks from the gh-pages branch are not removed before our dist is copied on top. They then get re-committed and leak into the deploy. Upstream fix tschaub/gh-pages#612 (merged 2025-08-09) adds both `remove: '**/*'` and `dot: true`, but is unreleased as of v6.3.0. We can't reach the globby call from options alone. Workaround: register a `beforeAdd` hook that runs after gh-pages' broken remove + our file copy. The hook asks git what it still has indexed (`git ls-files -z`), diffs against the set of files in our dist, and `git rm`s the leftovers. `git rm` handles submodule gitlinks correctly, so the `build` gitlink from the reporter's scenario is also cleaned up. Skipped when the user opts into `add: true` (additive mode explicitly preserves existing files). Adds a spawn-level regression test in engine.gh-pages-behavior.spec.ts that mocks `git ls-files` to return leftover dotfiles + a submodule gitlink, runs engine.run() end-to-end through real gh-pages, and asserts `git rm` targets the leftovers while leaving dist files alone. Fixes #204 * test: add end-to-end real-git integration test for #204 cleanup Prior tests covered the cleanup hook at the spawn-mock level only — they verified that the right git commands get issued, but not that the resulting gh-pages commit is actually clean. Add a real-filesystem, real-git test that: 1. Creates a local bare repo. 2. Seeds a gh-pages branch containing the exact conditions from #204: dotfiles (.gitignore, .gitmodules), nested dot-directory contents (.github/workflows/deploy.yml), a submodule gitlink, and a stale non-dot file. 3. Runs engine.run() (no mocks of child_process, fs, or gh-pages). 4. Inspects `git ls-tree -r gh-pages` on the bare repo and asserts the final tree contains ONLY index.html. Also adds a baseline test that calls gh-pages.publish() directly (bypassing our hook) and asserts the dotfiles + submodule DO leak — i.e. the upstream bug is real on the installed gh-pages version. If a future gh-pages release fixes the bug, that baseline test will fail as a clear signal that our workaround can be removed. No production changes; the hook implementation and its PR #206-era callback wrapper are exercised end-to-end by the new tests.
1 parent ada6fa9 commit 7ee64bb

7 files changed

Lines changed: 367 additions & 6 deletions

src/engine/engine.gh-pages-behavior.spec.ts

Lines changed: 84 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -53,7 +53,7 @@ function createMockChildProcess(): Partial<ChildProcess> {
5353
* This is intentional - we want to know about any new git operations.
5454
*/
5555
const EXPECTED_GIT_COMMANDS = [
56-
'clone', 'clean', 'fetch', 'checkout', 'ls-remote', 'reset',
56+
'clone', 'clean', 'fetch', 'checkout', 'ls-remote', 'ls-files', 'reset',
5757
'rm', 'add', 'config', 'diff-index', 'commit', 'tag', 'push', 'update-ref'
5858
];
5959

@@ -1066,4 +1066,87 @@ describe('gh-pages v6.3.0 - behavioral snapshot', () => {
10661066
expect(spawnCalls[0]?.args[0]).toBe('clone');
10671067
});
10681068
});
1069+
1070+
/**
1071+
* End-to-end regression for issue #204: gh-pages@6.3.0's "Removing files" step
1072+
* skips dotfiles and submodule gitlinks. Our beforeAdd hook should catch them
1073+
* via `git ls-files` and `git rm` the leftovers while leaving dist files alone.
1074+
*/
1075+
describe('submodule / dotfile cleanup regression (issue #204)', () => {
1076+
const engine = require('./engine');
1077+
const { cleanupMonkeypatch } = require('./engine.prepare-options-helpers');
1078+
const { logging } = require('@angular-devkit/core');
1079+
1080+
beforeEach(() => {
1081+
cleanupMonkeypatch();
1082+
});
1083+
1084+
it('engine.run() should git rm leftover gh-pages branch files not present in dist', async () => {
1085+
const repo = 'https://github.com/owner/leftover-test.git';
1086+
currentTestContext.repo = repo;
1087+
1088+
const leftoverFiles = [
1089+
'.github/workflows/deploy.yml',
1090+
'.gitignore',
1091+
'.gitmodules',
1092+
'build' // submodule gitlink
1093+
];
1094+
1095+
// Reconfigure the default mockSpawn so `git ls-files` returns our leftover set.
1096+
// Other git commands keep the outer describe's default behavior.
1097+
mockSpawn.mockImplementation((cmd: string, args: string[] | undefined, opts: unknown) => {
1098+
const capturedArgs = args || [];
1099+
spawnCalls.push({ cmd, args: capturedArgs, options: opts });
1100+
1101+
if (cmd === 'git' && capturedArgs[0] && !EXPECTED_GIT_COMMANDS.includes(capturedArgs[0])) {
1102+
throw new Error(`Unexpected git command: ${capturedArgs[0]}. Add to whitelist if intentional.`);
1103+
}
1104+
1105+
const child = createMockChildProcess();
1106+
setImmediate(() => {
1107+
let output = '';
1108+
if (cmd === 'git' && capturedArgs[0] === 'ls-files') {
1109+
output = leftoverFiles.join('\0') + '\0';
1110+
} else if (cmd === 'git' && capturedArgs[0] === 'config' &&
1111+
capturedArgs[1] === '--get' && capturedArgs[2]?.startsWith('remote.')) {
1112+
output = repo;
1113+
} else if (cmd === 'git' && capturedArgs[0] === 'ls-remote') {
1114+
output = 'refs/heads/gh-pages';
1115+
} else if (cmd === 'git' && capturedArgs[0] === 'diff-index') {
1116+
child.emit!('close', 1);
1117+
return;
1118+
}
1119+
child.stdout!.emit('data', Buffer.from(output));
1120+
child.emit!('close', 0);
1121+
});
1122+
return child;
1123+
});
1124+
1125+
const options = {
1126+
repo,
1127+
branch: 'gh-pages',
1128+
dotfiles: true,
1129+
notfound: false,
1130+
nojekyll: false
1131+
};
1132+
1133+
await engine.run(basePath, options, new logging.NullLogger());
1134+
1135+
// Our hook should have issued `git ls-files -z`.
1136+
const lsFilesCall = spawnCalls.find(c => c.cmd === 'git' && c.args[0] === 'ls-files');
1137+
expect(lsFilesCall).toBeDefined();
1138+
expect(lsFilesCall!.args).toContain('-z');
1139+
1140+
// And a subsequent `git rm` that targets exactly the leftovers.
1141+
const rmCalls = spawnCalls.filter(c => c.cmd === 'git' && c.args[0] === 'rm');
1142+
const cleanupCall = rmCalls.find(c => leftoverFiles.every(f => c.args.includes(f)));
1143+
expect(cleanupCall).toBeDefined();
1144+
1145+
// Dist files (from the outer beforeAll: index.html, main.js, styles.css, .htaccess)
1146+
// must NOT be in the cleanup call — those are files we just copied.
1147+
for (const distFile of ['index.html', 'main.js', 'styles.css', '.htaccess']) {
1148+
expect(cleanupCall!.args).not.toContain(distFile);
1149+
}
1150+
});
1151+
});
10691152
});
Lines changed: 190 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,190 @@
1+
/**
2+
* End-to-end integration test for issue #204 — REAL git, REAL gh-pages, REAL filesystem.
3+
*
4+
* No mocks. This test proves the `beforeAdd` cleanup hook actually produces a clean
5+
* gh-pages commit when the branch previously contained dotfiles and submodule gitlinks.
6+
*
7+
* Flow:
8+
* 1. Create a local bare git repo (serves as "remote").
9+
* 2. Seed a gh-pages branch containing:
10+
* - .github/workflows/deploy.yml (dot-directory, nested)
11+
* - .gitignore (dotfile)
12+
* - .gitmodules (dotfile)
13+
* - a submodule gitlink `build` (mode 160000)
14+
* - stale.html (regular file gh-pages' own remove would catch)
15+
* 3. Run `engine.run()` against the bare repo with a dist containing only index.html.
16+
* 4. `git ls-tree -r gh-pages` on the bare repo → assert ONLY `index.html` landed.
17+
*
18+
* A companion test demonstrates the upstream bug itself by calling `gh-pages.publish()`
19+
* directly (bypassing our hook) and observing the leftovers leak into the commit. That
20+
* test will start failing once tschaub/gh-pages ships PR #612 in a release — which is
21+
* fine; it's the signal that our workaround can be removed.
22+
*/
23+
24+
import * as path from 'path';
25+
import * as fs from 'fs/promises';
26+
import * as os from 'os';
27+
import { execSync } from 'child_process';
28+
29+
import { logging } from '@angular-devkit/core';
30+
31+
import * as engine from './engine';
32+
import { cleanupMonkeypatch } from './engine.prepare-options-helpers';
33+
34+
// NO MOCKS — we want real gh-pages behavior end-to-end
35+
const ghPages = require('gh-pages');
36+
37+
interface GitOptions {
38+
readonly cwd: string;
39+
}
40+
41+
function git(args: string, options: GitOptions): string {
42+
return execSync(`git -C "${options.cwd}" ${args}`, { stdio: ['pipe', 'pipe', 'pipe'] })
43+
.toString()
44+
.trim();
45+
}
46+
47+
async function seedGhPagesBranch(workDir: string, bareRepoPath: string): Promise<void> {
48+
await fs.mkdir(workDir, { recursive: true });
49+
execSync(`git init "${workDir}"`, { stdio: 'pipe' });
50+
git('config user.email "seed@test.com"', { cwd: workDir });
51+
git('config user.name "Seed"', { cwd: workDir });
52+
git('checkout -b gh-pages', { cwd: workDir });
53+
54+
// Dotfiles and dot-directory (what gh-pages' broken remove step misses)
55+
await fs.mkdir(path.join(workDir, '.github', 'workflows'), { recursive: true });
56+
await fs.writeFile(path.join(workDir, '.github', 'workflows', 'deploy.yml'), 'name: deploy\n');
57+
await fs.writeFile(path.join(workDir, '.gitignore'), 'node_modules\n');
58+
await fs.writeFile(
59+
path.join(workDir, '.gitmodules'),
60+
'[submodule "build"]\n\tpath = build\n\turl = https://example.invalid/build.git\n'
61+
);
62+
// A regular non-dot file — gh-pages' own remove step WILL catch this one.
63+
await fs.writeFile(path.join(workDir, 'stale.html'), '<html>stale</html>\n');
64+
65+
git('add .github .gitignore .gitmodules stale.html', { cwd: workDir });
66+
git('commit -m "seed dotfiles and stale content"', { cwd: workDir });
67+
68+
// Add a submodule gitlink without needing a real subrepo.
69+
// update-index --cacheinfo creates the 160000 entry directly in the index.
70+
const seedSha = git('rev-parse HEAD', { cwd: workDir });
71+
git(`update-index --add --cacheinfo 160000,${seedSha},build`, { cwd: workDir });
72+
git('commit -m "add submodule gitlink"', { cwd: workDir });
73+
74+
git(`remote add origin "${bareRepoPath}"`, { cwd: workDir });
75+
git('push origin gh-pages', { cwd: workDir });
76+
}
77+
78+
describe('end-to-end cleanup regression (issue #204, real git)', () => {
79+
let tempDir: string;
80+
let distDir: string;
81+
let bareRepoPath: string;
82+
let originalEnv: NodeJS.ProcessEnv;
83+
84+
// Unique suffix so parallel-ish reruns never collide in /tmp.
85+
const testRunId = `${Date.now()}-${Math.random().toString(36).substring(7)}`;
86+
87+
beforeEach(async () => {
88+
cleanupMonkeypatch();
89+
90+
// Isolate from ambient CI envs and tokens so engine.prepareOptions doesn't
91+
// rewrite the repo URL or append CI metadata during tests.
92+
originalEnv = { ...process.env };
93+
delete process.env.TRAVIS;
94+
delete process.env.CIRCLECI;
95+
delete process.env.GITHUB_ACTIONS;
96+
delete process.env.GH_TOKEN;
97+
delete process.env.PERSONAL_TOKEN;
98+
delete process.env.GITHUB_TOKEN;
99+
100+
tempDir = await fs.mkdtemp(path.join(os.tmpdir(), `ghp-204-${testRunId}-`));
101+
102+
distDir = path.join(tempDir, 'dist');
103+
await fs.mkdir(distDir, { recursive: true });
104+
await fs.writeFile(path.join(distDir, 'index.html'), '<html>fresh dist</html>\n');
105+
106+
bareRepoPath = path.join(tempDir, 'bare.git');
107+
execSync(`git init --bare "${bareRepoPath}"`, { stdio: 'pipe' });
108+
109+
await seedGhPagesBranch(path.join(tempDir, 'seed'), bareRepoPath);
110+
111+
// Clear gh-pages' cache so each test starts from a clean clone.
112+
ghPages.clean();
113+
});
114+
115+
afterEach(async () => {
116+
ghPages.clean();
117+
cleanupMonkeypatch();
118+
process.env = originalEnv;
119+
await fs.rm(tempDir, { recursive: true, force: true });
120+
});
121+
122+
it('engine.run() produces a gh-pages commit containing ONLY dist files (no dotfiles, no submodule)', async () => {
123+
await engine.run(
124+
distDir,
125+
{
126+
repo: bareRepoPath,
127+
branch: 'gh-pages',
128+
dotfiles: true,
129+
// Keep assertions simple: don't let our own 404.html / .nojekyll machinery
130+
// add files to the expected set.
131+
notfound: false,
132+
nojekyll: false,
133+
name: 'Test',
134+
email: 'test@test.com',
135+
message: 'test deploy'
136+
},
137+
new logging.NullLogger()
138+
);
139+
140+
// Inspect what actually landed on gh-pages in the bare repo.
141+
const tree = git('ls-tree -r gh-pages --name-only', { cwd: bareRepoPath })
142+
.split('\n')
143+
.filter(Boolean)
144+
.sort();
145+
146+
expect(tree).toEqual(['index.html']);
147+
148+
// Explicit negative checks — makes regressions very readable on failure.
149+
expect(tree).not.toContain('.gitignore');
150+
expect(tree).not.toContain('.gitmodules');
151+
expect(tree).not.toContain('.github/workflows/deploy.yml');
152+
expect(tree).not.toContain('build');
153+
expect(tree).not.toContain('stale.html');
154+
}, 30_000);
155+
156+
it('baseline: without our hook, gh-pages alone leaks dotfiles and submodule gitlinks (demonstrates the upstream bug)', async () => {
157+
// Call gh-pages.publish() directly — no engine.run(), no beforeAdd hook.
158+
// This is exactly what angular-cli-ghpages v3 did before our fix.
159+
await new Promise<void>((resolve, reject) => {
160+
ghPages.publish(
161+
distDir,
162+
{
163+
repo: bareRepoPath,
164+
branch: 'gh-pages',
165+
dotfiles: true,
166+
user: { name: 'Test', email: 'test@test.com' },
167+
message: 'baseline: unhooked gh-pages publish'
168+
},
169+
(err: Error | null) => {
170+
if (err) reject(err);
171+
else resolve();
172+
}
173+
);
174+
});
175+
176+
const tree = git('ls-tree -r gh-pages --name-only', { cwd: bareRepoPath })
177+
.split('\n')
178+
.filter(Boolean);
179+
180+
// gh-pages' broken remove step missed all of these:
181+
expect(tree).toContain('.gitignore');
182+
expect(tree).toContain('.gitmodules');
183+
expect(tree).toContain('.github/workflows/deploy.yml');
184+
expect(tree).toContain('build');
185+
// Our dist file did land:
186+
expect(tree).toContain('index.html');
187+
// And the non-dot stale file DID get removed by gh-pages' (partial) remove step:
188+
expect(tree).not.toContain('stale.html');
189+
}, 30_000);
190+
});

src/engine/engine.prepare-options-helpers.ts

Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,8 @@
66
*/
77

88
import { logging } from '@angular-devkit/core';
9+
import * as fs from 'fs/promises';
10+
import * as path from 'path';
911
import * as util from 'util';
1012

1113
import { Schema } from '../deploy/schema';
@@ -225,6 +227,86 @@ export async function injectTokenIntoRepoUrl(options: PreparedOptions): Promise<
225227
}
226228
}
227229

230+
/**
231+
* Minimal subset of the gh-pages internal Git class that our cleanup hook relies on.
232+
* See src/node_modules/gh-pages/lib/git.js — `Git.prototype.exec`, `Git.prototype.rm`, `cwd`, `output`.
233+
*/
234+
export interface GhPagesGit {
235+
cwd: string;
236+
output: string;
237+
exec(...args: string[]): Promise<GhPagesGit>;
238+
rm(files: string[]): Promise<GhPagesGit>;
239+
}
240+
241+
/**
242+
* Recursively walk a directory and return the set of relative file paths,
243+
* using POSIX separators so the output matches what `git ls-files` prints.
244+
* Honors the same dotfile-inclusion semantics as gh-pages' `options.dotfiles`.
245+
*/
246+
export async function collectDistFiles(
247+
baseDir: string,
248+
includeDotfiles: boolean
249+
): Promise<Set<string>> {
250+
const files = new Set<string>();
251+
252+
async function walk(relDir: string): Promise<void> {
253+
const fullDir = relDir ? path.join(baseDir, relDir) : baseDir;
254+
const entries = await fs.readdir(fullDir, { withFileTypes: true });
255+
for (const entry of entries) {
256+
if (!includeDotfiles && entry.name.startsWith('.')) {
257+
continue;
258+
}
259+
const relPath = relDir ? `${relDir}/${entry.name}` : entry.name;
260+
if (entry.isDirectory()) {
261+
await walk(relPath);
262+
} else if (entry.isFile()) {
263+
files.add(relPath);
264+
}
265+
}
266+
}
267+
268+
await walk('');
269+
return files;
270+
}
271+
272+
/**
273+
* Create a `beforeAdd` hook that removes leftover files from the gh-pages branch
274+
* before gh-pages stages our dist for commit.
275+
*
276+
* Why this exists (issue #204):
277+
* gh-pages@6.3.0's "Removing files" step calls globby without `dot: true`, so
278+
* dotfiles (.gitignore, .gitmodules, .github/…) and submodule gitlinks from the
279+
* gh-pages branch are NOT removed before our dist is copied on top. They then
280+
* get re-committed and leak into the deploy.
281+
*
282+
* Fix: after gh-pages' broken remove + our file copy, ask git what it still has
283+
* indexed (`git ls-files -z`), diff against the set of files in our dist, and
284+
* `git rm` the leftovers. `git rm` correctly handles submodule gitlinks too.
285+
*
286+
* Upstream fix: tschaub/gh-pages#612 (merged 2025-08-09, unreleased as of
287+
* gh-pages@6.3.0). When a release containing that PR lands, this hook becomes
288+
* redundant and can be removed.
289+
*/
290+
export function createCleanupBeforeAddHook(
291+
distDir: string,
292+
dotfiles: boolean,
293+
logger: logging.LoggerApi
294+
): (git: GhPagesGit) => Promise<void> {
295+
return async (git) => {
296+
const distFiles = await collectDistFiles(distDir, dotfiles);
297+
await git.exec('ls-files', '-z');
298+
const tracked = (git.output || '').split('\0').filter(Boolean);
299+
const toRemove = tracked.filter((f) => !distFiles.has(f));
300+
if (toRemove.length === 0) {
301+
return;
302+
}
303+
logger.info(
304+
`Removing ${toRemove.length} leftover file(s) from gh-pages branch not in dist: ${toRemove.join(', ')}`
305+
);
306+
await git.rm(toRemove);
307+
};
308+
}
309+
228310
/**
229311
* Get the remote URL from the git repository
230312
*

0 commit comments

Comments
 (0)