Add English profanity word data by KII1ua · Pull Request #231 · JECT-Study/JECT2-4th-Server

KII1ua · 2026-06-07T09:58:37Z

📌 Related issue

closes #

🔍 Work done

Added English profanity data to the existing Korean profanity data in slang.txt

📝 Changes

Added entries to slang.txt

💬 To reviewers

Summary by CodeRabbit

Release notes

Chores
- Updated the filtering data list: 543 lines added, 130 lines removed
- Reorganized list entries and added new entries
- Replaced part of the existing entries

KII1ua · 2026-06-07T09:58:45Z

Build started

coderabbitai · 2026-06-07T09:58:48Z

Warning

Review limit reached

@KII1ua, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 45 minutes and 20 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ff8e11bf-cfb2-4177-ad7f-571489cd31c6

📥 Commits

Reviewing files that changed from the base of the PR and between f1062c4 and e4c818e.

📒 Files selected for processing (1)

src/main/resources/data/slang.txt

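Overview

The slang filter data file src/main/resources/data/slang.txt was expanded with profanity, sexually demeaning terms, obscenity-related terms, and their variants (including digit, spelling, and suffix variants). The top of the file was extended, new variants were added sporadically across the middle ranges, a large section (2045-2480) was replaced with English sexual/profanity variants, and crime- and drug-related terms were added toward the end of the file.

Changes

Consolidated slang filter data update

Layer / File(s)	Description
Expanded profanity terms at the top of the file `src/main/resources/data/slang.txt`	Many English profanity and sexually demeaning terms were added at the start of the file (1-29), and existing entries were rearranged.
Sporadic variant additions in the middle ranges `src/main/resources/data/slang.txt`	Across the 1512-2020 range, variants of profanity and demeaning terms were added in several forms: digit variants (e.g. `십8`, `씨8`), spelling variants (e.g. `사kkasi`), suffix variants (e.g. `젓같은?`), and English variants (e.g. `loli`, `SUCKSEX`).
Large section replaced with English sexual/profanity variants `src/main/resources/data/slang.txt`	The 2045-2480 range was replaced wholesale: the existing Korean game/institution/operator entries were removed and replaced in bulk with English sexual and profanity variants.
Crime/drug-related terms added and reorganized `src/main/resources/data/slang.txt`	Crime- and drug-related terms such as n번방 (2502) and GHB (2548-2554) were added toward the end of the file, and the existing group of drug/gambling terms was expanded and reorganized.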

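Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 The profanity list has grown!
The journey to stronger filtering goes on,
and users play in a cleaner place. ✨
Bad words get slurped up like carrots!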

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the PR's main change. It directly conveys the core goal of adding English profanity word data to the slang data file.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch refactor/slang_word

Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions · 2026-06-07T10:00:14Z

Build succeeded
Ready to deploy!

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

src/main/resources/data/slang.txt (1)
1-2554: ⚡ Quick win

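slang.txt has no duplicate entries; improving maintainability through sections and sorting is recommended instead

By sort ... | uniq -d, there are 0 duplicate entries (file has 2553 lines).

However, there are no comments, sections, or sorting to separate languages or categories (Korean and English are mixed), which makes maintenance and verification difficult. Recommended: add section comments per language and category, and sort (or group) the entries within each section in Hangul or alphabetical order so that changes are easier to manage.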
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/resources/data/slang.txt` around lines 1 - 2554, The file slang.txt
has no duplicate entries but mixes languages and categories, making maintenance
and validation hard; update slang.txt (the top-level file) by adding clear
commented sections (e.g., "# Korean — obscene", "# English — obscene", "#
Sexual", "# Hate/Politics", etc.), move entries into appropriate
language/category sections, and sort each section internally (Hangul-sorted for
Korean sections, ASCII/alphabetical for English sections) while preserving the
current deduplication behavior (you can still verify with sort | uniq -d).
Ensure section headers are consistent and documented at the top so future
contributors can add words into the correct section.
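
The section-comment suggestion above has one side effect worth spelling out: if every line of slang.txt is currently loaded as a pattern (an assumption, since the loader is not shown in this thread), then `#` header lines would themselves become patterns unless the loader skips them. The following is a minimal sketch of a parser that drops blank lines and `#` headers and reports duplicates, roughly what sort | uniq -d checks. The class and method names are hypothetical and are not taken from the repository.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not the project's actual loader. */
final class SlangListSketch {

    /** Keeps real entries; drops blank lines and '#' section headers. */
    static List<String> parseEntries(List<String> lines) {
        List<String> entries = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // assumption: '#' marks a section comment
            }
            entries.add(trimmed);
        }
        return entries;
    }

    /** Returns entries that occur more than once, in first-seen order. */
    static Set<String> duplicates(List<String> entries) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String entry : entries) {
            if (!seen.add(entry)) {
                dups.add(entry);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("# English", "fuck", "", "slut", "fuck");
        List<String> entries = parseEntries(lines);
        System.out.println(entries.size());      // number of real entries
        System.out.println(duplicates(entries)); // entries listed twice
    }
}
```

One trade-off of this convention is that an entry beginning with `#` could no longer be stored in the file, so a different marker may be preferable if such entries are needed.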

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/main/resources/data/slang.txt`:
- Around line 1-28: Slang matching is case-sensitive because
SlangFilter.containsSlang() compares chars directly; normalize both the loaded
patterns and the input to a single case (e.g., toLowerCase) so KMP runs on
same-case data: when reading slang.txt convert each entry to lowercase (or
normalize during pattern/load function) and ensure containsSlang() lowercases
the input string before running KMP; alternatively ensure the pattern loader
(the code that builds the pattern list used by containsSlang) canonicalizes
entries to lowercase so mixed-case entries like "Fucking" won't cause misses.
- Line 2448: The slang list contains overly common tokens (e.g., "Admin",
"test", "cs", ".jp", "Olympic", "IOS" in src/main/resources/data/slang.txt) that
cause false positives when wordService.containSlang(nickName) checks for
substring matches (used by /api/users/nickname/slang); fix by either removing or
making these entries specific (e.g., remove generic tokens or replace with
stricter variants) OR change containSlang to perform whole-word/word-boundary
matching (tokenize nickname or use regex word boundaries) so only true slang
hits are blocked; update the slang.txt entries and/or modify the
containSlang(nickName) implementation accordingly.

---

Nitpick comments:
In `@src/main/resources/data/slang.txt`:
- Around line 1-2554: The file slang.txt has no duplicate entries but mixes
languages and categories, making maintenance and validation hard; update
slang.txt (the top-level file) by adding clear commented sections (e.g., "#
Korean — obscene", "# English — obscene", "# Sexual", "# Hate/Politics", etc.),
move entries into appropriate language/category sections, and sort each section
internally (Hangul-sorted for Korean sections, ASCII/alphabetical for English
sections) while preserving the current deduplication behavior (you can still
verify with sort | uniq -d). Ensure section headers are consistent and
documented at the top so future contributors can add words into the correct
section.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0deaf030-d5ea-41cc-bacb-3f2f3b525771

📥 Commits

Reviewing files that changed from the base of the PR and between ed2687a and f1062c4.

📒 Files selected for processing (1)

src/main/resources/data/slang.txt

coderabbitai · 2026-06-07T10:04:18Z

+bitch
+dildo
+dork
+fetish
+fuck
+fucker
+Fucking
+fuckyou
+hentai
+motherfucker
+oral
+orgasms
+penis
+pennis
+pervert
+petish
+phonesex
+porn
+porner
+porno
+sex
+sibal
+sipal
+siutty
+slut
+slutty
+suck
+sucks


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: check coverage of case variants for English profanity

# Find English words that start with an uppercase letter
echo "=== English entries starting with an uppercase letter ==="
rg -n '^[A-Z][a-z]+$' src/main/resources/data/slang.txt | head -20

# Check whether other case variants of the same word exist
echo -e "\n=== 'fuck' variants ==="
rg -i -n '^fuck' src/main/resources/data/slang.txt

echo -e "\n=== Entries lacking a lowercase/uppercase pair ==="
# Extract English entries from lines 1-100 and check for duplicates
rg -n '^[A-Za-z]' src/main/resources/data/slang.txt | head -50

Repository: JECT-Study/JECT2-4th-Server

Length of output: 958

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "class SlangFilter|SlangFilter" -S src || true
fd -t f "SlangFilter*" -a . || true

Repository: JECT-Study/JECT2-4th-Server

Length of output: 419

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "class SlangFilter|SlangFilter" -S . || true
fd -t f "SlangFilter*" . || true

Repository: JECT-Study/JECT2-4th-Server

Length of output: 408

🏁 Script executed:

#!/bin/bash
set -euo pipefail
ls -l src/main/java/com/ject/vs/util/SlangFilter.java src/main/java/com/ject/vs/user/port/WordService.java
wc -l src/main/java/com/ject/vs/util/SlangFilter.java src/main/java/com/ject/vs/user/port/WordService.java
echo "---- SlangFilter.java (top) ----"
nl -ba src/main/java/com/ject/vs/util/SlangFilter.java | head -200
echo "---- SlangFilter.java (tail) ----"
nl -ba src/main/java/com/ject/vs/util/SlangFilter.java | tail -120
echo "---- WordService.java (relevant) ----"
nl -ba src/main/java/com/ject/vs/user/port/WordService.java | sed -n '1,220p'

Repository: JECT-Study/JECT2-4th-Server

Length of output: 486

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "---- SlangFilter.java ----"
cat -n src/main/java/com/ject/vs/util/SlangFilter.java
echo "---- WordService.java (relevant section) ----"
cat -n src/main/java/com/ject/vs/user/port/WordService.java | sed -n '1,140p'

Repository: JECT-Study/JECT2-4th-Server

Length of output: 5075

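Mixed case can cause filter misses or bypasses; English entries need normalization
SlangFilter.containsSlang() compares the input and the pattern char by char in KMP with no normalization such as lowercasing, so matching is case-sensitive. For example, slang.txt contains both fuck (lowercase) and Fucking (capitalized), yet a differently cased input such as FuCk can be missed unless that exact casing is also in the list. Either normalize English entries to a single rule (e.g. lowercase) at both the loading and matching stages, or add every case variant to the data.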

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/resources/data/slang.txt` around lines 1 - 28, Slang matching is case-sensitive because SlangFilter.containsSlang() compares chars directly; normalize both the loaded patterns and the input to a single case (e.g., toLowerCase) so KMP runs on same-case data: when reading slang.txt convert each entry to lowercase (or normalize during pattern/load function) and ensure containsSlang() lowercases the input string before running KMP; alternatively ensure the pattern loader (the code that builds the pattern list used by containsSlang) canonicalizes entries to lowercase so mixed-case entries like "Fucking" won't cause misses.
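
The normalization described above can be sketched as follows. This is an illustrative, self-contained version and not the project's SlangFilter: the class name, method names, and the choice of Locale.ROOT are assumptions. It lowercases both the input and each entry before running a standard KMP search, so the comparison is no longer case-sensitive. Hangul has no letter case, so Korean entries are unaffected by the lowercasing step.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch; not the project's actual SlangFilter. */
final class CaseInsensitiveSlangSketch {

    /** Lowercases with a fixed locale so results do not depend on the JVM default. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /** Builds the KMP failure table for the pattern. */
    static int[] failureTable(String pattern) {
        int[] fail = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    /** Returns true if pattern occurs in text; both must already be normalized. */
    static boolean kmpContains(String text, String pattern) {
        if (pattern.isEmpty()) {
            return false; // ignore blank entries instead of matching everything
        }
        int[] fail = failureTable(pattern);
        int k = 0;
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return true;
            }
        }
        return false;
    }

    /** Normalizes the input once, and each entry, before matching. */
    static boolean containsSlang(String input, List<String> entries) {
        String text = normalize(input);
        for (String entry : entries) {
            if (kmpContains(text, normalize(entry.trim()))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> entries = List.of("Fucking", "slut");
        System.out.println(containsSlang("xx_FUCKING_xx", entries));
        System.out.println(containsSlang("hello", entries));
    }
}
```

In a real loader it would likely be cheaper to normalize the entries once when slang.txt is read, rather than on every call as this sketch does for brevity.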

coderabbitai · 2026-06-07T10:04:18Z

+a$$hole
+as$hole
+a$shole
+Admin


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: check for uses of common terms in the codebase that could be filtered

echo "=== Uses of 'test' ==="
rg -i '\btest\b' --type=java -C 1 | head -20

echo -e "\n=== Uses of 'admin' ==="
rg -i '\badmin\b' --type=java -C 1 | head -20

echo -e "\n=== Uses of 'ios' ==="
rg -i '\bios\b' --type=java -C 1 | head -10

Repository: JECT-Study/JECT2-4th-Server

Length of output: 3910

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/main/resources/data/slang.txt"
echo "=== File check ==="
ls -l "$FILE"
echo
echo "=== Lines 2438-2480 (main range to check) ==="
nl -ba "$FILE" | sed -n '2438,2480p'

Repository: JECT-Study/JECT2-4th-Server

Length of output: 272

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/main/resources/data/slang.txt"
echo "=== Lines 2438-2480 (content) ==="
sed -n '2438,2480p' "$FILE" | cat -n
echo
echo "=== Exact target lines ==="
for n in 2448 2469 2473 2475 2476 2477; do
  echo "--- $n ---"
  sed -n "${n}p" "$FILE"
done

Repository: JECT-Study/JECT2-4th-Server

Length of output: 817

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find where slang.txt is referenced (loaded)
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/build/**' --glob '!**/dist/**' "slang\.txt" .

# 2) Find the surrounding logic to see whether slang entries are matched as-is (contains/containsIgnoreCase/regex)
#    - search by guessed slang-related names
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/build/**' --glob '!**/dist/**' \
  -S "slang|욕설|비속어|filter|profan|censor|badword" src/main/java

Repository: JECT-Study/JECT2-4th-Server

Length of output: 8315

🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '1,200p' src/main/java/com/ject/vs/util/SlangFilter.java
echo "----"
sed -n '1,220p' src/main/java/com/ject/vs/user/port/WordService.java

Repository: JECT-Study/JECT2-4th-Server

Length of output: 4055

🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '1,140p' src/main/java/com/ject/vs/user/port/UserService.java
echo "----"
sed -n '1,120p' src/main/java/com/ject/vs/user/adapter/web/UserController.java

Repository: JECT-Study/JECT2-4th-Server

Length of output: 9976

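The nickname slang filter contains overly generic tokens, creating a high risk of false positives

src/main/resources/data/slang.txt

2448 Admin, 2469 test, 2473 cs, 2475 .jp, 2476 Olympic, 2477 IOS

This slang filter uses KMP with no word boundaries, so a nickname is blocked whenever it contains a slang token as a substring (the /api/users/nickname/slang path uses wordService.containSlang(nickName)).

Very common strings such as test and cs are likely to reject nicknames unnecessarily.

Recommendation: remove the generic tokens or make them more specific

-Admin
-test
-cs
-.jp
-Olympic
-IOS

Separately, whether the existing Korean game/institution/operator entries in the 2045-2480 range were actually deleted or replaced needs to be confirmed from a before/after diff of slang.txt (this cannot be determined from the snippet provided here).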

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/resources/data/slang.txt` at line 2448, The slang list contains overly common tokens (e.g., "Admin", "test", "cs", ".jp", "Olympic", "IOS" in src/main/resources/data/slang.txt) that cause false positives when wordService.containSlang(nickName) checks for substring matches (used by /api/users/nickname/slang); fix by either removing or making these entries specific (e.g., remove generic tokens or replace with stricter variants) OR change containSlang to perform whole-word/word-boundary matching (tokenize nickname or use regex word boundaries) so only true slang hits are blocked; update the slang.txt entries and/or modify the containSlang(nickName) implementation accordingly.
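
To make the false-positive risk concrete, the sketch below contrasts substring matching with whole-token matching on a nickname. It is illustrative only: the class and method names are hypothetical, and String.contains stands in for the project's KMP search, since both answer the same "does this substring occur" question.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch; not the project's actual WordService. */
final class GenericTokenSketch {

    /** Substring match: blocks a nickname if any entry occurs anywhere inside it. */
    static boolean substringHit(String nickname, List<String> entries) {
        String lower = nickname.toLowerCase(Locale.ROOT);
        for (String entry : entries) {
            if (lower.contains(entry.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    /**
     * Whole-token match: splits the nickname on anything that is not a letter
     * or digit and blocks only if a full token equals an entry.
     * The entries are expected to be lowercase already.
     */
    static boolean wholeTokenHit(String nickname, Set<String> lowerEntries) {
        String lower = nickname.toLowerCase(Locale.ROOT);
        for (String token : lower.split("[^\\p{L}\\p{N}]+")) {
            if (lowerEntries.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> entries = List.of("test", "cs");
        // "contest" and "physics" contain "test" and "cs" as substrings.
        System.out.println(substringHit("contest", entries));
        System.out.println(substringHit("physics", entries));
        System.out.println(wholeTokenHit("contest", Set.of("test", "cs")));
        System.out.println(wholeTokenHit("test_user", Set.of("test", "cs")));
    }
}
```

Whole-token matching is not a drop-in fix. It would miss profanity embedded in a longer run of letters, and an entry that contains punctuation, such as .jp, could never equal a token under this split. Removing the generic entries from the data is the smaller change.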

KII1ua self-assigned this Jun 7, 2026

KII1ua added the feature label Jun 7, 2026

KII1ua requested review from Junhyukkkk and tlarbals824 as code owners June 7, 2026 09:58

github-actions Bot added the refactor label Jun 7, 2026

coderabbitai Bot reviewed Jun 7, 2026

KII1ua added 2 commits June 7, 2026 19:11

Add English slang words

12c7f26

Combine rebase

e4c818e

KII1ua force-pushed the refactor/slang_word branch from f1062c4 to e4c818e Compare June 7, 2026 10:13

KII1ua merged commit 27149a4 into develop Jun 7, 2026
1 check passed

github-actions Bot deleted the refactor/slang_word branch June 7, 2026 10:14
