Conversation
- Added next-sitemap 4.2.3
- next-sitemap.config.js: branches on stage/prod environment, excludes private routes, and defines the robots.txt Disallow policy (auth, webview, report, callvan, timetable/modify, clubs/new|edit, lost-item/edit|report, store/review, monitoring)
- package.json: appended next-sitemap to the `yarn build` chain (Yarn 4 does not run the `postbuild` lifecycle hook automatically)
- .gitignore: ignore `public/sitemap*.xml` and `public/robots.txt` (build artifacts)
- public/robots.txt: removed the existing static file (replaced by automatic generation at build time)
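As a rough sketch of how these pieces fit together — the environment variable name and URLs below are assumptions, not the repo's actual values:

```js
// next-sitemap.config.js — minimal sketch; env var name and URLs are hypothetical.
const isStage = process.env.DEPLOY_ENV === 'stage';

/** @type {import('next-sitemap').IConfig} */
module.exports = {
  // Pick the canonical origin per deploy environment
  siteUrl: isStage ? 'https://stage.example.com' : 'https://example.com',
  generateRobotsTxt: true, // also emits public/robots.txt at build time
};
```

Because Yarn 4 skips the `postbuild` lifecycle hook, the generation step has to be chained explicitly in package.json, e.g. `"build": "next build && next-sitemap"`.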
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration — Configuration used: Path: .coderabbit.yaml · Review profile: CHILL · Plan: Pro · Run ID:
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Walkthrough
Integrates next-sitemap into the Next.js build pipeline, selecting the siteUrl per environment and generating a sitemap and robots.txt that exclude PRIVATE_PATHS; the generated artifacts are added to .gitignore.

Changes
Sitemap and robots.txt generation integrated
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 min
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@next-sitemap.config.js`:
- Around line 43-65: The robotsTxtOptions.policies[0].disallow array is missing
specific patterns and non-trailing-slash variants so crawlers can still reach
private routes; update the disallow list used by robotsTxtOptions in
next-sitemap.config.js (and ensure PRIVATE_PATHS is honored) to add the wildcard
patterns '/store/review/edit/*' and '/monitoring/*' and include both trailing
and non-trailing variants for the listed prefixes (e.g., add '/auth' in addition
to '/auth/', '/webview' in addition to '/webview/', and '/report' in addition to
'/report/') so all exact and nested paths are disallowed by the policy.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f28896ab-d791-446d-8843-945f31dc1532
⛔ Files ignored due to path filters (6)
- `.pnp.cjs` is excluded by `!.pnp.cjs` and included by `**`
- `.yarn/cache/@corex-deepmerge-npm-4.0.43-390ed1175e-c9ac6163e9.zip` is excluded by `!**/.yarn/**`, `!**/*.zip`, `!.yarn/**` and included by `**`
- `.yarn/cache/@next-env-npm-13.5.11-7bdb393acd-2d34ec742e.zip` is excluded by `!**/.yarn/**`, `!**/*.zip`, `!.yarn/**` and included by `**`
- `.yarn/cache/next-sitemap-npm-4.2.3-463254c2cb-8e88c941b5.zip` is excluded by `!**/.yarn/**`, `!**/*.zip`, `!.yarn/**` and included by `**`
- `public/robots.txt` is excluded by `!public/**` and included by `**`
- `yarn.lock` is excluded by `!**/yarn.lock`, `!**/*.lock`, `!yarn.lock` and included by `**`
📒 Files selected for processing (3)
- `.gitignore`
- `next-sitemap.config.js`
- `package.json`
```js
exclude: ['/404', '/_error', '/_app', '/_document', ...PRIVATE_PATHS],
robotsTxtOptions: {
  policies: [
    {
      userAgent: '*',
      allow: '/',
      disallow: [
        '/auth/',
        '/webview/',
        '/report/',
        '/callvan/add',
        '/callvan/chat/',
        '/callvan/notifications',
        '/timetable/modify',
        '/clubs/new',
        '/clubs/edit/',
        '/clubs/recruitment/edit/',
        '/lost-item/edit/',
        '/lost-item/report/',
        '/lost-item/chat',
        '/store/review/',
        '/monitoring',
      ],
```
🧩 Analysis chain
🏁 Script executed:
```shell
find . -name "next-sitemap.config.js" -o -name "*sitemap*" -type f | head -20
```

Repository: BCSDLab/KOIN_WEB_RECODE
Length of output: 1409
🏁 Script executed:
```shell
fd -e js -e ts -e json | grep -i "sitemap\|next.config" | head -20
```

Repository: BCSDLab/KOIN_WEB_RECODE
Length of output: 90
🏁 Script executed:
```shell
git ls-files | grep -i sitemap
```

Repository: BCSDLab/KOIN_WEB_RECODE
Length of output: 151
🌐 Web query:
next-sitemap exclude robotsTxtOptions disallow behavior configuration
💡 Result:
In next-sitemap, the `exclude` configuration and `robotsTxtOptions` (specifically its `policies` with `disallow`) serve distinct purposes with no direct interaction between them. The `exclude` option is an array of relative paths (supporting wildcards like `/private/*`) that prevents specific paths from being included in the generated sitemap XML files (sitemap.xml, sitemap-*.xml). It does not affect robots.txt generation.[1][2][3]

Separately, `robotsTxtOptions.policies` defines crawl policies for the generated robots.txt file. Each policy has a `userAgent`, optional `allow` (string | string[]), and optional `disallow` (string | string[]) directives. These follow standard robots.txt syntax to instruct crawlers on allowed/disallowed paths. For example:

```js
robotsTxtOptions: {
  policies: [
    {
      userAgent: '*',
      allow: '/',
      disallow: ['/admin/*', '/private/'], // Crawlers should not crawl these paths
    },
    { userAgent: 'bad-bot', disallow: '/' },
  ],
}
```

This generates robots.txt entries like:

```
User-agent: *
Allow: /
Disallow: /admin/*
Disallow: /private/

# bad-bot
User-agent: bad-bot
Disallow: /
```

`robotsTxtOptions.additionalSitemaps` adds extra sitemap URLs to robots.txt (e.g., custom server-side sitemaps), but these may inadvertently appear in the main sitemap index despite `exclude`, due to known issues (e.g., #823, #481). Workarounds include custom `transform` functions or disabling index sitemaps.[4][5][3]

Disallowing paths in robots.txt does not exclude them from sitemaps; crawlers use sitemaps to discover URLs but respect robots.txt for crawling/indexing. Conversely, excluding from sitemaps hides paths from discovery but does not prevent crawling if they are linked elsewhere.[1][6]
Citations:
- 1: https://github.com/iamvishnusankar/next-sitemap/blob/master/packages/next-sitemap/src/interface.ts
- 2: https://www.npmjs.com/package/next-sitemap
- 3: https://github.com/iamvishnusankar/next-sitemap
- 4: iamvishnusankar/next-sitemap#823 — "not possible to exclude sitemap from 'main' sitemap while including it in robots.txt"
- 5: iamvishnusankar/next-sitemap#481 — "Adding a sitemap to the robotsTxtOptions.additionalSitemaps list adds it to the sitemap index"
- 6: https://stackoverflow.com/questions/79455802/how-to-set-up-next-sitemap-to-properly-generate-robots-txt-and-sitemap-xml-for-n
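Given that `exclude` and `robotsTxtOptions` never interact, one way to avoid the drift this review flags is to derive both from a single list. A minimal sketch, assuming a `PRIVATE_PATHS` constant shaped like the repo's (the globs and siteUrl below are illustrative):

```js
// next-sitemap.config.js — sketch of deriving robots.txt rules from the same
// list that feeds the sitemap exclude; PRIVATE_PATHS here is illustrative.
const PRIVATE_PATHS = ['/auth/*', '/webview/*', '/store/review/edit/*', '/monitoring/*'];

/** @type {import('next-sitemap').IConfig} */
module.exports = {
  siteUrl: 'https://example.com',
  generateRobotsTxt: true,
  exclude: ['/404', ...PRIVATE_PATHS],
  robotsTxtOptions: {
    policies: [
      {
        userAgent: '*',
        allow: '/',
        // '/auth/*' (sitemap glob) -> '/auth' (robots.txt prefix,
        // which blocks the exact path and everything nested under it)
        disallow: PRIVATE_PATHS.map((p) => p.replace(/\/\*$/, '')),
      },
    ],
  },
};
```

With this shape, adding a new private route to `PRIVATE_PATHS` updates the sitemap exclusion and the robots.txt policy in one place.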
🌐 Web query:
robots.txt trailing slash pattern matching /auth vs /auth/
💡 Result:
In robots.txt path patterns, rules are matched by "path prefix" semantics: a crawler checks whether the URL path starts with the pattern from an Allow/Disallow rule, and the "most specific" (longest matching) rule wins. That means `/auth` and `/auth/` are not equivalent patterns; they differ in what they match around the immediate next character after `auth`.

What matches with `Disallow: /auth`
- It matches any URL path that starts with `/auth`, including `/auth` itself and any path that continues with anything after `auth` (for example `/auth/anything` and `/authfoo`). This follows from prefix matching of the path value in Allow/Disallow rules [1].

What matches with `Disallow: /auth/`
- It matches only URL paths that start with `/auth/` (note the trailing slash), i.e., it matches `/auth/` and subpaths like `/auth/login`, but it does NOT match `/auth` (the version without the slash), because that path does not start with `/auth/`. This is consistent with the prefix-matching examples and the "longest specific match" rule in RFC 9309 [2].

Practical guidance
- If you intend to block a directory and its contents, the standard approach is to include the trailing slash in the directory path pattern (e.g., `Disallow: /auth/`) [3].
- If you instead use `Disallow: /auth`, you may accidentally block URLs you didn't mean to (e.g., `/auth.html`-like paths or other paths that merely begin with `/auth` but aren't under `/auth/`). This risk is explicitly noted for prefix-style matching behavior in explanations of robots.txt path matching [4].

Answer to the specific comparison
- `Disallow: /auth` blocks both `/auth` and `/auth/…` (and anything else with the prefix `/auth`).
- `Disallow: /auth/` blocks `/auth/…` but does not block `/auth` (no trailing slash).
Citations:
- 1: https://datatracker.ietf.org/doc/html/rfc9309
- 2: https://rfc-editor.org/rfc/rfc9309.txt
- 3: https://developers.google.com/crawling/docs/robots-txt/useful-robots-txt-rules
- 4: https://stackoverflow.com/questions/72105880/should-i-use-a-trailing-slash-when-disallowing-a-directory-in-robots-txt
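To make the prefix semantics above concrete, here is a tiny illustration. It is a deliberately simplified model of RFC 9309 matching — real crawlers also handle `*` wildcards, `$` anchors, and longest-match precedence:

```js
// Simplified model of robots.txt Disallow prefix matching (no wildcards).
const blockedBy = (rule, path) => path.startsWith(rule);

console.log(blockedBy('/auth', '/auth'));        // true  — exact path blocked
console.log(blockedBy('/auth', '/auth/login'));  // true
console.log(blockedBy('/auth', '/authfoo'));     // true  — over-blocking risk
console.log(blockedBy('/auth/', '/auth'));       // false — exact path NOT blocked
console.log(blockedBy('/auth/', '/auth/login')); // true
```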
🏁 Script executed:
```shell
cat -n ./next-sitemap.config.js
```

Repository: BCSDLab/KOIN_WEB_RECODE
Length of output: 2207
🏁 Script executed:
```shell
wc -l ./next-sitemap.config.js
```

Repository: BCSDLab/KOIN_WEB_RECODE
Length of output: 95
The robots.txt disallow configuration has gaps, so crawlers can still reach private routes.
The following paths are missing from the robots.txt disallow list (lines 49-65):
- `/store/review/edit/*` (present in PRIVATE_PATHS)
- `/monitoring/*` (present in PRIVATE_PATHS)

In addition, `/auth/`, `/webview/`, and `/report/` are configured only as trailing-slash rules, so the exact paths `/auth`, `/webview`, and `/report` are not blocked. Crawlers will be able to reach these paths.
🔧 Suggested fix
```diff
 disallow: [
-  '/auth/',
-  '/webview/',
-  '/report/',
+  '/auth',
+  '/webview',
+  '/report',
   '/callvan/add',
-  '/callvan/chat/',
+  '/callvan/chat',
   '/callvan/notifications',
   '/timetable/modify',
   '/clubs/new',
-  '/clubs/edit/',
+  '/clubs/edit',
-  '/clubs/recruitment/edit/',
+  '/clubs/recruitment/edit',
-  '/lost-item/edit/',
-  '/lost-item/report/',
+  '/lost-item/edit',
+  '/lost-item/report',
   '/lost-item/chat',
-  '/store/review/',
+  '/store/review',
+  '/store/review/edit',
+  '/monitoring',
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@next-sitemap.config.js` around lines 43-65: The
robotsTxtOptions.policies[0].disallow array is missing specific patterns and
non-trailing-slash variants so crawlers can still reach private routes; update
the disallow list used by robotsTxtOptions in next-sitemap.config.js (and ensure
PRIVATE_PATHS is honored) to add the wildcard patterns '/store/review/edit/*'
and '/monitoring/*' and include both trailing and non-trailing variants for the
listed prefixes (e.g., add '/auth' in addition to '/auth/', '/webview' in
addition to '/webview/', and '/report' in addition to '/report/') so all exact
and nested paths are disallowed by the policy.
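Beyond the one-off fix, the underlying risk is that PRIVATE_PATHS and the robots.txt disallow list drift apart again. A hypothetical guard script along these lines (the path lists here are illustrative, not the repo's actual exports) could fail the build when they diverge:

```js
// check-robots-sync.js — hypothetical guard; path lists are illustrative.
const stripGlob = (p) => p.replace(/\*$/, ''); // '/auth/*' -> '/auth/'

const PRIVATE_PATHS = ['/auth/*', '/webview/*', '/store/review/edit/*', '/monitoring/*'];
const DISALLOW = ['/auth', '/webview', '/store/review/edit', '/monitoring'];

// A private path is covered if some disallow rule is a prefix of it.
const uncovered = PRIVATE_PATHS.filter(
  (p) => !DISALLOW.some((rule) => stripGlob(p).startsWith(rule)),
);

if (uncovered.length > 0) {
  console.error('Private paths not covered by robots.txt disallow:', uncovered);
  process.exit(1);
}
console.log('All private paths are covered by robots.txt disallow rules.');
```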
What is this PR? 🔍
Changes 📝
Deleted
Disallow targets
ScreenShot 📷
N/A
Test CheckList ✅
Precaution
✔️ Please check if the PR fulfills these requirements
- Did you branch from the `develop` branch unconditionally (not `main`)?
- Does `yarn lint` pass?

Summary by CodeRabbit