[Common] Auto-generate sitemap.xml and harden robots.txt #1258
Open

dooohun wants to merge 3 commits into develop from feat/#1256/seo-sitemap
(Three binary files in the diff are not shown.)
New file (the next-sitemap configuration), +64 lines in the diff:

```js
/**
 * next-sitemap configuration.
 * Executed as a postbuild script after the build; generates
 * public/sitemap*.xml and public/robots.txt.
 * The stage/prod split is based on whether NEXT_PUBLIC_API_PATH
 * contains 'stage' (same rule as src/static/url.ts).
 */

const IS_STAGE = process.env.NEXT_PUBLIC_API_PATH?.includes('stage');
const SITE_URL = IS_STAGE ? 'https://stage.koreatech.in' : 'https://koreatech.in';

const PRIVATE_PATHS = [
  '/auth',
  '/auth/*',
  '/webview/*',
  '/report/*',
  '/callvan/add',
  '/callvan/chat/*',
  '/callvan/notifications',
  '/callvan/*/participants',
  '/callvan/*/report/*',
  '/timetable/modify',
  '/clubs/new',
  '/clubs/edit/*',
  '/clubs/recruitment/edit/*',
  '/clubs/*/event/edit/*',
  '/lost-item/edit/*',
  '/lost-item/report/*',
  '/lost-item/chat',
  '/store/review/*',
  '/store/review/edit/**',
];

/** @type {import('next-sitemap').IConfig} */
export default {
  siteUrl: SITE_URL,
  generateRobotsTxt: true,
  sitemapSize: 5000,
  changefreq: 'daily',
  priority: 0.7,
  exclude: ['/404', '/_error', '/_app', '/_document', ...PRIVATE_PATHS],
  robotsTxtOptions: {
    policies: [
      {
        userAgent: '*',
        allow: '/',
        disallow: [
          '/auth',
          '/webview',
          '/report',
          '/callvan/add',
          '/callvan/chat',
          '/callvan/notifications',
          '/timetable/modify',
          '/clubs/new',
          '/clubs/edit',
          '/clubs/recruitment/edit',
          '/lost-item/edit',
          '/lost-item/report',
          '/lost-item/chat',
          '/store/review',
        ],
      },
    ],
  },
};
```
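For context, next-sitemap is typically wired into the build through a postbuild script; a minimal package.json sketch (the project's actual scripts section is not part of this diff):

```json
{
  "scripts": {
    "build": "next build",
    "postbuild": "next-sitemap"
  }
}
```

Since the config uses `export default`, it would presumably live in a `next-sitemap.config.mjs` file or in a package with `"type": "module"`.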
(The diff also deletes one file and includes another changed file whose contents failed to render.)
🧩 Analysis chain

🏁 Scripts executed against BCSDLab/KOIN_WEB_RECODE (script bodies and outputs mostly not captured in this page), including:

```shell
git ls-files | grep -i sitemap
```
🌐 Web query: next-sitemap `exclude` vs `robotsTxtOptions` disallow behavior

💡 Result: In next-sitemap, the `exclude` option and `robotsTxtOptions` (specifically its `policies` with `disallow`) serve distinct purposes with no direct interaction. `exclude` is an array of relative paths (wildcards such as `/private/*` are supported) that keeps those paths out of the generated sitemap XML files (sitemap.xml, sitemap-*.xml); it does not affect robots.txt generation [1][2][3]. Separately, `robotsTxtOptions.policies` defines crawl policies for the generated robots.txt. Each policy has a `userAgent`, an optional `allow` (string | string[]), and an optional `disallow` (string | string[]), following standard robots.txt syntax. For example:

```js
robotsTxtOptions: {
  policies: [
    { userAgent: '*', allow: '/', disallow: ['/admin/*', '/private/'] },
    { userAgent: 'bad-bot', disallow: '/' },
  ],
},
```

generates robots.txt entries like:

```
User-agent: *
Allow: /
Disallow: /admin/*
Disallow: /private/

User-agent: bad-bot
Disallow: /
```

[3][2] `robotsTxtOptions.additionalSitemaps` adds extra sitemap URLs to robots.txt (e.g., custom server-side sitemaps), but these may inadvertently appear in the main sitemap index despite `exclude`, due to known issues (#823, #481); workarounds include a custom `transform` function or disabling index sitemaps [4][5][3]. Disallowing a path in robots.txt does not exclude it from the sitemap, and excluding a path from the sitemap hides it from discovery but does not prevent crawling if the URL is linked elsewhere [1][6].
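Given that the two lists do not interact, one hypothetical way to keep them from drifting apart (not part of this PR; `toDisallow` is a made-up helper) would be to derive the robots.txt entries from PRIVATE_PATHS:

```js
// Hypothetical helper: turn sitemap exclude patterns into robots.txt
// prefix rules by cutting each path at its first wildcard segment.
const toDisallow = (paths) => [
  ...new Set(paths.map((p) => p.replace(/\/?\*.*$/, '') || '/')),
];

// '/callvan/chat/*'        -> '/callvan/chat'
// '/store/review/edit/**'  -> '/store/review/edit'
// '/auth'                  -> '/auth' (unchanged)
const disallow = toDisallow(PRIVATE_PATHS);
```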
🌐 Web query: robots.txt trailing-slash pattern matching, `/auth` vs `/auth/`

💡 Result: robots.txt path rules use prefix matching: a crawler checks whether the URL path starts with the pattern from an Allow/Disallow rule, and the most specific (longest) matching rule wins. `/auth` and `/auth/` are therefore not equivalent patterns.

- `Disallow: /auth` matches any path that starts with `/auth`, including `/auth` itself and anything that continues after it (e.g. `/auth/anything`, and even `/authfoo`) [1].
- `Disallow: /auth/` matches only paths that start with `/auth/` (e.g. `/auth/` and `/auth/login`); it does NOT match `/auth` itself, because that path does not start with `/auth/`. This is consistent with prefix matching and the longest-specific-match rule in RFC 9309 [2].

Practical guidance: to block a directory and its contents, include the trailing slash (`Disallow: /auth/`) [3]. Using `Disallow: /auth` instead may accidentally block URLs you did not intend (e.g. `/auth.html`-like paths that merely begin with `/auth`) [4].
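To make the difference concrete, a minimal JavaScript sketch of the prefix rule (illustration only; real matchers also handle wildcards and longest-match precedence):

```js
// Prefix matching as described above: a rule applies when the URL path
// starts with the rule's pattern.
const isBlocked = (path, rule) => path.startsWith(rule);

console.log(isBlocked('/auth', '/auth'));        // true:  exact path blocked
console.log(isBlocked('/auth/login', '/auth'));  // true:  subpaths blocked
console.log(isBlocked('/authfoo', '/auth'));     // true:  unintended match
console.log(isBlocked('/auth', '/auth/'));       // false: exact path NOT blocked
console.log(isBlocked('/auth/login', '/auth/')); // true
```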
**robots.txt `disallow` entries are missing, so crawlers can reach private paths.**

The following paths are missing from the robots.txt `disallow` list (lines 49-65):

- `/store/review/edit/*` (present in PRIVATE_PATHS)
- `/monitoring/*` (present in PRIVATE_PATHS)

In addition, `/auth/`, `/webview/`, and `/report/` are configured only as trailing-slash rules, so the exact paths `/auth`, `/webview`, and `/report` are not blocked. Crawlers can still reach them.

🔧 Suggested fix