Add English profanity word data #231
Build started
Overview
Changes: consolidated update of the slang filter data
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 5 passed
Build succeeded
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/main/resources/data/slang.txt (1)
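1-2554: ⚡ Quick win
slang.txt has no duplicate entries; adding sections and sorting is recommended to improve maintainability
- By sort ... | uniq -d, there are 0 duplicate entries (the file has 2553 lines).
- However, there are no comments, sections, or sorting to separate languages and categories (Korean and English are mixed), which makes maintenance and validation difficult. Recommended: add per-language and per-category section comments, and sort (or group) entries within each section in Hangul or alphabetical order so that changes are easier to manage.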
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/resources/data/slang.txt` around lines 1 - 2554, The file slang.txt has no duplicate entries but mixes languages and categories, making maintenance and validation hard; update slang.txt (the top-level file) by adding clear commented sections (e.g., "# Korean — obscene", "# English — obscene", "# Sexual", "# Hate/Politics", etc.), move entries into appropriate language/category sections, and sort each section internally (Hangul-sorted for Korean sections, ASCII/alphabetical for English sections) while preserving the current deduplication behavior (you can still verify with sort | uniq -d). Ensure section headers are consistent and documented at the top so future contributors can add words into the correct section.
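The duplicate check and per-section sorting suggested above could be scripted instead of done by hand. The following is a minimal Java sketch, not the repository's code: the class `SlangFileTidy`, its methods, and the convention that a line starting with `# ` is a section header are all assumptions made for illustration. The existing loader would also have to skip such header lines before this layout is safe to adopt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; "# " section headers are an assumed convention, not part of the current file.
final class SlangFileTidy {

    /** Returns entries that occur more than once, ignoring blank lines and section headers. */
    static Set<String> duplicates(List<String> lines) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new TreeSet<>();
        for (String line : lines) {
            String entry = line.strip();
            if (entry.isEmpty() || entry.startsWith("# ")) {
                continue;
            }
            if (!seen.add(entry)) {
                dups.add(entry);
            }
        }
        return dups;
    }

    /** Sorts entries inside each section, keeps headers in place, and drops blank lines. */
    static List<String> sortWithinSections(List<String> lines) {
        List<String> out = new ArrayList<>();
        List<String> bucket = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("# ")) {
                flush(bucket, out);
                out.add(line);
            } else if (!line.isBlank()) {
                bucket.add(line.strip());
            }
        }
        flush(bucket, out);
        return out;
    }

    // Natural String order (UTF-16 code units); this only approximates dictionary order.
    private static void flush(List<String> bucket, List<String> out) {
        Collections.sort(bucket);
        out.addAll(bucket);
        bucket.clear();
    }
}
```

Sorting by plain string order keeps the sketch dependency-free; a `java.text.Collator` could be swapped in if locale-aware ordering matters.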
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/main/resources/data/slang.txt`:
- Around line 1-28: Slang matching is case-sensitive because
SlangFilter.containsSlang() compares chars directly; normalize both the loaded
patterns and the input to a single case (e.g., toLowerCase) so KMP runs on
same-case data: when reading slang.txt convert each entry to lowercase (or
normalize during pattern/load function) and ensure containsSlang() lowercases
the input string before running KMP; alternatively ensure the pattern loader
(the code that builds the pattern list used by containsSlang) canonicalizes
entries to lowercase so mixed-case entries like "Fucking" won't cause misses.
- Line 2448: The slang list contains overly common tokens (e.g., "Admin",
"test", "cs", ".jp", "Olympic", "IOS" in src/main/resources/data/slang.txt) that
cause false positives when wordService.containSlang(nickName) checks for
substring matches (used by /api/users/nickname/slang); fix by either removing or
making these entries specific (e.g., remove generic tokens or replace with
stricter variants) OR change containSlang to perform whole-word/word-boundary
matching (tokenize nickname or use regex word boundaries) so only true slang
hits are blocked; update the slang.txt entries and/or modify the
containSlang(nickName) implementation accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0deaf030-d5ea-41cc-bacb-3f2f3b525771
📒 Files selected for processing (1)
src/main/resources/data/slang.txt
bitch
dildo
dork
fetish
fuck
fucker
Fucking
fuckyou
hentai
motherfucker
oral
orgasms
penis
pennis
pervert
petish
phonesex
porn
porner
porno
sex
sibal
sipal
siutty
slut
slutty
suck
sucks
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: check coverage of case variants for English profanity
# Find English words that start with an uppercase letter
echo "=== English entries starting with an uppercase letter ==="
rg -n '^[A-Z][a-z]+$' src/main/resources/data/slang.txt | head -20
# Check whether other case variants of the same word exist
echo -e "\n=== 'fuck'-related variants ==="
rg -i -n '^fuck' src/main/resources/data/slang.txt
echo -e "\n=== Check for entries missing a lowercase/uppercase pair ==="
# Extract English entries from lines 1-100 and check for duplicates
rg -n '^[A-Za-z]' src/main/resources/data/slang.txt | head -50
Repository: JECT-Study/JECT2-4th-Server
Length of output: 958
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "class SlangFilter|SlangFilter" -S src || true
fd -t f "SlangFilter*" -a . || true
Repository: JECT-Study/JECT2-4th-Server
Length of output: 419
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "class SlangFilter|SlangFilter" -S . || true
fd -t f "SlangFilter*" . || true
Repository: JECT-Study/JECT2-4th-Server
Length of output: 408
🏁 Script executed:
#!/bin/bash
set -euo pipefail
ls -l src/main/java/com/ject/vs/util/SlangFilter.java src/main/java/com/ject/vs/user/port/WordService.java
wc -l src/main/java/com/ject/vs/util/SlangFilter.java src/main/java/com/ject/vs/user/port/WordService.java
echo "---- SlangFilter.java (top) ----"
nl -ba src/main/java/com/ject/vs/util/SlangFilter.java | head -200
echo "---- SlangFilter.java (tail) ----"
nl -ba src/main/java/com/ject/vs/util/SlangFilter.java | tail -120
echo "---- WordService.java (relevant) ----"
nl -ba src/main/java/com/ject/vs/user/port/WordService.java | sed -n '1,220p'
Repository: JECT-Study/JECT2-4th-Server
Length of output: 486
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- SlangFilter.java ----"
cat -n src/main/java/com/ject/vs/util/SlangFilter.java
echo "---- WordService.java (relevant section) ----"
cat -n src/main/java/com/ject/vs/user/port/WordService.java | sed -n '1,140p'
Repository: JECT-Study/JECT2-4th-Server
Length of output: 5075
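Mixed case can cause missed matches or filter bypass: English entries need normalization
SlangFilter.containsSlang() compares the input and the patterns char by char in KMP as-is (no normalization such as lowercasing), so matching is case-sensitive. For example, slang.txt contains fuck (lowercase) and Fucking (capitalized), so an input such as FUCKING matches neither entry and may slip through. Either normalize all English entries to a single rule (e.g., lowercase) at both the loading and matching stages, or add every case variant to the data.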
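One way to apply the normalization described above is to lowercase every entry once at load time and lowercase the input before the KMP scan. The sketch below is illustrative only: `CaseInsensitiveSlangMatcher` and its helpers are hypothetical names, and the actual SlangFilter may structure its loader and KMP routine differently. The example entries are neutral placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical class; not the repository's SlangFilter.
final class CaseInsensitiveSlangMatcher {
    private final List<String> patterns = new ArrayList<>();

    CaseInsensitiveSlangMatcher(List<String> rawEntries) {
        // Normalize once at load time so mixed-case entries behave like lowercase ones.
        for (String raw : rawEntries) {
            String entry = raw.strip().toLowerCase(Locale.ROOT);
            if (!entry.isEmpty()) {
                patterns.add(entry);
            }
        }
    }

    boolean containsSlang(String input) {
        // Lowercase the input too, so both sides of the comparison use the same case.
        String normalized = input.toLowerCase(Locale.ROOT);
        for (String pattern : patterns) {
            if (kmpContains(normalized, pattern)) {
                return true;
            }
        }
        return false;
    }

    /** KMP failure table: fail[i] is the longest proper prefix of p[0..i] that is also its suffix. */
    static int[] failureTable(String p) {
        int[] fail = new int[p.length()];
        int k = 0;
        for (int i = 1; i < p.length(); i++) {
            while (k > 0 && p.charAt(i) != p.charAt(k)) {
                k = fail[k - 1];
            }
            if (p.charAt(i) == p.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    /** Returns true if pattern p occurs anywhere in text (plain char comparison). */
    static boolean kmpContains(String text, String p) {
        if (p.isEmpty()) {
            return false;
        }
        int[] fail = failureTable(p);
        int k = 0;
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != p.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == p.charAt(k)) {
                k++;
            }
            if (k == p.length()) {
                return true;
            }
        }
        return false;
    }
}
```

`Locale.ROOT` avoids locale-specific case rules, and lowercasing leaves Hangul entries unchanged, so the Korean part of the list should be unaffected.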
a$$hole
as$hole
a$shole
Admin
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: check where common terms that could be filtered are used in the codebase
echo "=== 'test' usages ==="
rg -i '\btest\b' --type=java -C 1 | head -20
echo -e "\n=== 'admin' usages ==="
rg -i '\badmin\b' --type=java -C 1 | head -20
echo -e "\n=== 'ios' usages ==="
rg -i '\bios\b' --type=java -C 1 | head -10
Repository: JECT-Study/JECT2-4th-Server
Length of output: 3910
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/main/resources/data/slang.txt"
echo "=== File check ==="
ls -l "$FILE"
echo
echo "=== Lines 2438-2480 (main range to check) ==="
nl -ba "$FILE" | sed -n '2438,2480p'
Repository: JECT-Study/JECT2-4th-Server
Length of output: 272
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/main/resources/data/slang.txt"
echo "=== Lines 2438-2480 (content) ==="
sed -n '2438,2480p' "$FILE" | cat -n
echo
echo "=== Exact target lines ==="
for n in 2448 2469 2473 2475 2476 2477; do
echo "--- $n ---"
sed -n "${n}p" "$FILE"
done
Repository: JECT-Study/JECT2-4th-Server
Length of output: 817
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Find where slang.txt is referenced (loaded)
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/build/**' --glob '!**/dist/**' "slang\.txt" .
# 2) Find the surrounding logic to see whether slang entries are matched as-is (contains/containsIgnoreCase/regex)
# - search using guessed slang-related names
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/build/**' --glob '!**/dist/**' \
-S "slang|욕설|비속어|filter|profan|censor|badword" src/main/java
Repository: JECT-Study/JECT2-4th-Server
Length of output: 8315
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,200p' src/main/java/com/ject/vs/util/SlangFilter.java
echo "----"
sed -n '1,220p' src/main/java/com/ject/vs/user/port/WordService.java
Repository: JECT-Study/JECT2-4th-Server
Length of output: 4055
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,140p' src/main/java/com/ject/vs/user/port/UserService.java
echo "----"
sed -n '1,120p' src/main/java/com/ject/vs/user/adapter/web/UserController.java
Repository: JECT-Study/JECT2-4th-Server
Length of output: 9976
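The nickname slang filter includes overly generic tokens, creating a high risk of false positives
- src/main/resources/data/slang.txt: line 2448 Admin, 2469 test, 2473 cs, 2475 .jp, 2476 Olympic, 2477 IOS
- This slang filter uses KMP with no word boundaries, so it blocks a nickname whenever a slang token appears in it as a substring (the /api/users/nickname/slang route uses wordService.containSlang(nickName)). Very common strings such as test and cs are likely to reject nicknames unnecessarily.
Recommendation: remove or narrow the generic tokens
-Admin
-test
-cs
-.jp
-Olympic
-IOS
- Separately, whether the existing Korean game/institution/operator entries in the 2045-2480 range were actually deleted or replaced needs to be confirmed against a before/after diff of slang.txt (the snippet provided is not enough to judge whether anything is missing).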
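If generic tokens need to stay in the list, an alternative to deleting them is to match them only as whole tokens while real slang keeps substring matching. The sketch below assumes a two-tier split that does not exist in the current code: `BoundaryAwareSlangCheck`, `substringEntries`, and `wholeTokenEntries` are hypothetical names, and `String.contains` stands in for the KMP scan.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical two-tier check; the current filter has a single list and substring matching only.
final class BoundaryAwareSlangCheck {
    private final List<String> substringEntries = new ArrayList<>(); // blocked anywhere in the nickname
    private final Set<String> wholeTokenEntries = new HashSet<>();   // blocked only as a complete token

    BoundaryAwareSlangCheck(List<String> substringEntries, Set<String> wholeTokenEntries) {
        for (String e : substringEntries) {
            this.substringEntries.add(e.toLowerCase(Locale.ROOT));
        }
        for (String e : wholeTokenEntries) {
            this.wholeTokenEntries.add(e.toLowerCase(Locale.ROOT));
        }
    }

    boolean containSlang(String nickName) {
        String normalized = nickName.toLowerCase(Locale.ROOT);
        // Tier 1: real slang is still matched as a substring.
        for (String entry : substringEntries) {
            if (normalized.contains(entry)) {
                return true;
            }
        }
        // Tier 2: generic tokens must equal a whole token; tokens are runs of letters or digits.
        for (String token : normalized.split("[^\\p{L}\\p{N}]+")) {
            if (wholeTokenEntries.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

With this split, a nickname such as contest_winner would pass while test_user would still be blocked. Entries that contain separators, such as .jp, would need different handling than this tokenizer provides.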
f1062c4 to e4c818e
📌 Related issue
🔍 Work done
Added English profanity data to the Korean profanity data already in the slang.txt file
📝 Changes
Added content to the slang.txt file
💬 To reviewers
Summary by CodeRabbit
Release notes