Add English profanity word data #231

Merged
KII1ua merged 2 commits into develop from refactor/slang_word on Jun 7, 2026
Conversation

@KII1ua

@KII1ua KII1ua commented Jun 7, 2026

Member

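📌 Related issue

  • closes #

🔍 Work done

Added English profanity data to the existing Korean profanity data in slang.txt.

📝 Changes

Added entries to slang.txt.

💬 Notes for reviewers

Summary by CodeRabbit

Release notes

  • Chores
    • Updated the filter data list: 543 lines added, 130 lines removed
    • Reorganized list entries and added new ones
    • Replaced part of the existing entries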

@KII1ua KII1ua self-assigned this Jun 7, 2026
@KII1ua KII1ua added the feature label Jun 7, 2026
@KII1ua

KII1ua commented Jun 7, 2026

Member Author

Build started

@coderabbitai

coderabbitai Bot commented Jun 7, 2026


Review Change Stack

Warning

Review limit reached

@KII1ua, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 45 minutes and 20 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ff8e11bf-cfb2-4177-ad7f-571489cd31c6

📥 Commits

Reviewing files that changed from the base of the PR and between f1062c4 and e4c818e.

📒 Files selected for processing (1)
  • src/main/resources/data/slang.txt

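Overview

The slang filter data file src/main/resources/data/slang.txt has been expanded with profanity, sexually derogatory terms, obscenity-related terms, and their variants (including digit, spelling, and suffix variants). The top of the file is extended, new variants are added sporadically across the middle ranges, a large section (2045-2480) is replaced with English sexual/profanity variants, and crime- and drug-related terms are added toward the end of the file.

Changes

Consolidated slang filter data update

Layer / File(s) Description
Profanity terms expanded at the top of the file
src/main/resources/data/slang.txt
Many English profanity and sexually derogatory terms were added at the start of the file (1-29), and existing entries were rearranged.
Variant terms added sporadically in the middle ranges
src/main/resources/data/slang.txt
Across the 1512-2020 range, various profanity and derogatory variants were added, including digit variants (e.g., 십8, 씨8), spelling variants (e.g., 사kkasi), suffix variants (e.g., 젓같은?), and English variants (e.g., loli, SUCKSEX).
Large section replaced with English sexual/profanity variants
src/main/resources/data/slang.txt
The 2045-2480 range was replaced wholesale: the existing Korean game/organization/operator entries were removed and replaced in bulk with English sexual and profanity variants.
Crime/drug-related terms added and reorganized
src/main/resources/data/slang.txt
Crime- and drug-related terms such as n번방 (2502) and GHB (2548-2554) were added toward the end of the file, and the existing group of drug/gambling terms was expanded and reorganized.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🐰 The profanity list has grown!
The journey to stronger filtering goes on,
and users get to play in a cleaner place. ✨
Bad words get munched up like carrots, nom nom nom!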

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change in the PR. It directly conveys the core goal of adding English profanity word data to the slang data file.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/slang_word

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 7, 2026


Build succeeded
Ready to deploy!

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
src/main/resources/data/slang.txt (1)

1-2554: ⚡ Quick win

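No duplicate entries in slang.txt; sections and sorting are recommended instead to improve maintainability

  • sort ... | uniq -d reports 0 duplicate entries (the file has 2553 lines).
  • However, there are no comments, sections, or sorting to separate languages or categories (Korean and English are mixed), which makes maintenance and validation difficult. → Recommended: add per-language/per-category section comments, and sort (or group) entries within each section in Korean (ganada) or Latin alphabetical order so changes are easier to manage.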
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/resources/data/slang.txt` around lines 1 - 2554, The file slang.txt
has no duplicate entries but mixes languages and categories, making maintenance
and validation hard; update slang.txt (the top-level file) by adding clear
commented sections (e.g., "# Korean — obscene", "# English — obscene", "#
Sexual", "# Hate/Politics", etc.), move entries into appropriate
language/category sections, and sort each section internally (Hangul-sorted for
Korean sections, ASCII/alphabetical for English sections) while preserving the
current deduplication behavior (you can still verify with sort | uniq -d).
Ensure section headers are consistent and documented at the top so future
contributors can add words into the correct section.
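
A side effect of the sectioning suggestion above is that whatever code loads slang.txt would have to skip the new header lines, otherwise "# Korean" itself would become a filter entry. The sketch below is one possible way to do that. SlangListParser is a hypothetical class, not something in this repository, and treating every line that starts with # as a comment is an assumption (it would also swallow any real entry that begins with #). It collects duplicates along the way, similar in spirit to the sort | uniq -d check.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of the repository): parses slang.txt lines
// while skipping blank lines and "#" section comments.
final class SlangListParser {
    final List<String> entries = new ArrayList<>();     // unique entries, in file order
    final List<String> duplicates = new ArrayList<>();  // every repeated occurrence

    SlangListParser(List<String> lines) {
        Set<String> seen = new HashSet<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Assumption: a leading '#' marks a section header, never a real entry.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            if (seen.add(line)) {
                entries.add(line);
            } else {
                duplicates.add(line);
            }
        }
    }
}
```

In a setup like this, the lines could come from something like Files.readAllLines on the resource, and a unit test could assert that duplicates stays empty so that a duplicated entry fails the build rather than being found by hand.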
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0deaf030-d5ea-41cc-bacb-3f2f3b525771

📥 Commits

Reviewing files that changed from the base of the PR and between ed2687a and f1062c4.

📒 Files selected for processing (1)
  • src/main/resources/data/slang.txt

Comment on lines +1 to +28
bitch
dildo
dork
fetish
fuck
fucker
Fucking
fuckyou
hentai
motherfucker
oral
orgasms
penis
pennis
pervert
petish
phonesex
porn
porner
porno
sex
sibal
sipal
siutty
slut
slutty
suck
sucks


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
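# Description: check coverage of case variants for English profanity

# Find English words that start with an uppercase letter
echo "=== English entries starting with an uppercase letter ==="
rg -n '^[A-Z][a-z]+$' src/main/resources/data/slang.txt | head -20

# Check whether other case variants of the same word exist
echo -e "\n=== Variants of 'fuck' ==="
rg -i -n '^fuck' src/main/resources/data/slang.txt

echo -e "\n=== Check for entries missing a lowercase/uppercase pair ==="
# Extract English entries from lines 1-100 and check for duplicates
rg -n '^[A-Za-z]' src/main/resources/data/slang.txt | head -50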

Repository: JECT-Study/JECT2-4th-Server

Length of output: 958


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "class SlangFilter|SlangFilter" -S src || true
fd -t f "SlangFilter*" -a . || true

Repository: JECT-Study/JECT2-4th-Server

Length of output: 419


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "class SlangFilter|SlangFilter" -S . || true
fd -t f "SlangFilter*" . || true

Repository: JECT-Study/JECT2-4th-Server

Length of output: 408


🏁 Script executed:

#!/bin/bash
set -euo pipefail
ls -l src/main/java/com/ject/vs/util/SlangFilter.java src/main/java/com/ject/vs/user/port/WordService.java
wc -l src/main/java/com/ject/vs/util/SlangFilter.java src/main/java/com/ject/vs/user/port/WordService.java

echo "---- SlangFilter.java (top) ----"
nl -ba src/main/java/com/ject/vs/util/SlangFilter.java | head -200

echo "---- SlangFilter.java (tail) ----"
nl -ba src/main/java/com/ject/vs/util/SlangFilter.java | tail -120

echo "---- WordService.java (relevant) ----"
nl -ba src/main/java/com/ject/vs/user/port/WordService.java | sed -n '1,220p'

Repository: JECT-Study/JECT2-4th-Server

Length of output: 486


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- SlangFilter.java ----"
cat -n src/main/java/com/ject/vs/util/SlangFilter.java

echo "---- WordService.java (relevant section) ----"
cat -n src/main/java/com/ject/vs/user/port/WordService.java | sed -n '1,140p'

Repository: JECT-Study/JECT2-4th-Server

Length of output: 5075


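Mixed case can cause missed matches or filter bypass; English entries need normalization
SlangFilter.containsSlang() compares the input and the pattern char by char in KMP exactly as given (no normalization such as lowercasing), so matching is case-sensitive. For example, slang.txt contains fuck (lowercase) and Fucking (capitalized), but an input such as FUCK or Fuck matches neither entry and can slip through. Either normalize English entries and input to one rule (e.g., lowercase) at load and match time, or add every case variant to the data.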
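
The normalization described above can be sketched as follows. This is a minimal illustration, not the repository's SlangFilter: the class name CaseInsensitiveSlangMatcher and its constructor are hypothetical, and the KMP routine is a textbook version written for this example. The idea is to lowercase each entry once when the list is loaded and to lowercase the input once per call, so the char-by-char KMP comparison runs on same-case data.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch: case-insensitive substring filtering on top of plain KMP.
final class CaseInsensitiveSlangMatcher {
    private final List<String> patterns;

    CaseInsensitiveSlangMatcher(List<String> rawEntries) {
        // Canonicalize every entry once at load time (Locale.ROOT avoids
        // locale-specific surprises such as the Turkish dotless i).
        this.patterns = rawEntries.stream()
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> s.toLowerCase(Locale.ROOT))
                .distinct()
                .collect(Collectors.toList());
    }

    boolean containsSlang(String input) {
        String text = input.toLowerCase(Locale.ROOT);
        for (String pattern : patterns) {
            if (kmpContains(text, pattern)) {
                return true;
            }
        }
        return false;
    }

    // Standard KMP substring search: true if pattern occurs anywhere in text.
    static boolean kmpContains(String text, String pattern) {
        if (pattern.isEmpty()) {
            return true;
        }
        // fail[i] = length of the longest proper prefix of pattern[0..i]
        // that is also a suffix of it.
        int[] fail = new int[pattern.length()];
        for (int i = 1, k = 0; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        for (int i = 0, k = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return true;
            }
        }
        return false;
    }
}
```

Lowercasing leaves Hangul entries unchanged, so the Korean part of the list should behave as before under this approach. It also makes case-only variants in the data redundant, which is why the sketch calls distinct() after normalizing.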

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/resources/data/slang.txt` around lines 1 - 28, Slang matching is
case-sensitive because SlangFilter.containsSlang() compares chars directly;
normalize both the loaded patterns and the input to a single case (e.g.,
toLowerCase) so KMP runs on same-case data: when reading slang.txt convert each
entry to lowercase (or normalize during pattern/load function) and ensure
containsSlang() lowercases the input string before running KMP; alternatively
ensure the pattern loader (the code that builds the pattern list used by
containsSlang) canonicalizes entries to lowercase so mixed-case entries like
"Fucking" won't cause misses.

a$$hole
as$hole
a$shole
Admin


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: check for uses in the codebase of common terms that could be filtered

echo "=== Uses of 'test' ==="
rg -i '\btest\b' --type=java -C 1 | head -20

echo -e "\n=== Uses of 'admin' ==="
rg -i '\badmin\b' --type=java -C 1 | head -20

echo -e "\n=== Uses of 'ios' ==="
rg -i '\bios\b' --type=java -C 1 | head -10

Repository: JECT-Study/JECT2-4th-Server

Length of output: 3910


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/main/resources/data/slang.txt"

echo "=== File check ==="
ls -l "$FILE"

echo
echo "=== Lines 2438-2480 (main range to check) ==="
nl -ba "$FILE" | sed -n '2438,2480p'

Repository: JECT-Study/JECT2-4th-Server

Length of output: 272


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/main/resources/data/slang.txt"

echo "=== Lines 2438-2480 (content) ==="
sed -n '2438,2480p' "$FILE" | cat -n

echo
echo "=== Exact target lines ==="
for n in 2448 2469 2473 2475 2476 2477; do
  echo "--- $n ---"
  sed -n "${n}p" "$FILE"
done

Repository: JECT-Study/JECT2-4th-Server

Length of output: 817


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find where slang.txt is referenced (loaded)
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/build/**' --glob '!**/dist/**' "slang\.txt" .

# 2) Look at the surrounding logic to see whether slang entries are matched as-is (contains/containsIgnoreCase/regex)
#    - search using guessed slang-related names
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/build/**' --glob '!**/dist/**' \
  -S "slang|욕설|비속어|filter|profan|censor|badword" src/main/java

Repository: JECT-Study/JECT2-4th-Server

Length of output: 8315


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,200p' src/main/java/com/ject/vs/util/SlangFilter.java
echo "----"
sed -n '1,220p' src/main/java/com/ject/vs/user/port/WordService.java

Repository: JECT-Study/JECT2-4th-Server

Length of output: 4055


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,140p' src/main/java/com/ject/vs/user/port/UserService.java
echo "----"
sed -n '1,120p' src/main/java/com/ject/vs/user/adapter/web/UserController.java

Repository: JECT-Study/JECT2-4th-Server

Length of output: 9976


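The nickname slang filter includes overly generic tokens, so the risk of false positives is high

  • src/main/resources/data/slang.txt
    • 2448 Admin, 2469 test, 2473 cs, 2475 .jp, 2476 Olympic, 2477 IOS
  • This slang filter uses KMP with no word boundaries: it blocks a nickname whenever a slang token appears in it as a substring (the /api/users/nickname/slang path uses wordService.containSlang(nickName)).
  • Very common strings such as test/cs are likely to reject nicknames unnecessarily.
Recommended: remove the generic tokens or make them more specific
-Admin
-test
-cs
-.jp
-Olympic
-IOS
  • Separately, whether the existing Korean game/organization/operator entries in the 2045-2480 range were actually deleted or replaced needs to be confirmed from a before/after diff of slang.txt (this cannot be determined from the snippets provided here).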
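
As an illustration of the whole-word alternative mentioned in the prompt below, here is a rough sketch of token-based matching. TokenSlangMatcher is a hypothetical class and does not reflect how wordService.containSlang is implemented. It splits the nickname on anything that is not a Unicode letter or digit and blocks only when a whole token equals an entry.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: whole-token matching instead of substring matching.
final class TokenSlangMatcher {
    private final Set<String> entries = new HashSet<>();

    TokenSlangMatcher(List<String> rawEntries) {
        for (String entry : rawEntries) {
            String normalized = entry.trim().toLowerCase(Locale.ROOT);
            if (!normalized.isEmpty()) {
                entries.add(normalized);
            }
        }
    }

    boolean containsSlang(String nickname) {
        // Split on runs of characters that are neither letters nor digits
        // (Unicode-aware, so Hangul syllables stay inside tokens).
        String[] tokens = nickname.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (entries.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

With entries test and cs, a nickname like contest or physics would pass, while test_user would still be blocked. The trade-off is significant, though: Korean nicknames are usually written without separators, so whole-token matching would miss slang embedded in a longer word, and entries containing punctuation (such as .jp or a$$hole) could never match a token. A mixed approach, where substring matching is kept for most entries and token matching is applied only to a short list of generic Latin tokens, might be a more realistic option.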
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/resources/data/slang.txt` at line 2448, The slang list contains
overly common tokens (e.g., "Admin", "test", "cs", ".jp", "Olympic", "IOS" in
src/main/resources/data/slang.txt) that cause false positives when
wordService.containSlang(nickName) checks for substring matches (used by
/api/users/nickname/slang); fix by either removing or making these entries
specific (e.g., remove generic tokens or replace with stricter variants) OR
change containSlang to perform whole-word/word-boundary matching (tokenize
nickname or use regex word boundaries) so only true slang hits are blocked;
update the slang.txt entries and/or modify the containSlang(nickName)
implementation accordingly.

@KII1ua KII1ua force-pushed the refactor/slang_word branch from f1062c4 to e4c818e Compare June 7, 2026 10:13
@KII1ua KII1ua merged commit 27149a4 into develop Jun 7, 2026
1 check passed
@github-actions github-actions Bot deleted the refactor/slang_word branch June 7, 2026 10:14