Commit ecf4c00: Create README.md (1 parent: b48d7b1)

1 file changed: README.md (260 additions, 0 deletions)

# 🎰 GbDetector Documentation

## 📌 Overview

**GbDetector** is a text analysis module that identifies gambling-related content through pattern matching and text normalization. It is wrapped in an Immediately Invoked Function Expression (IIFE) to avoid polluting the global namespace, and it exposes a small API for text analysis.

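The IIFE encapsulation described above can be illustrated with a minimal sketch. This is not GbDetector's source: `ExampleDetector`, its keyword list, and the `normalize` helper are hypothetical names chosen only to show how private helpers stay hidden behind a small public API.

```javascript
// Hypothetical sketch of the IIFE module pattern; not GbDetector's actual source.
const ExampleDetector = (function () {
  // Private state: not reachable from outside the closure.
  const defaultKeywords = ["slot", "casino"];

  // Private helper: lowercases the input before matching.
  function normalize(text) {
    return String(text).toLowerCase();
  }

  // Public function: the only member exposed to callers.
  function detect(text = "") {
    const normalized = normalize(text);
    return defaultKeywords.some((keyword) => normalized.includes(keyword));
  }

  return { detect };
})();

console.log(ExampleDetector.detect("Best CASINO bonus")); // true
```

Only `detect` is returned from the closure, so `normalize` and `defaultKeywords` never become global names.
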

## 🎯 Purpose

This module detects potential gambling-related content in text, which makes it useful for content moderation, compliance monitoring, and filtering systems. It combines several detection mechanisms to catch both overt and obfuscated gambling-related text.


## 🌟 Key Features

* Text normalization and cleaning ✨
* Detection of obfuscated gambling terminology 🕵️‍♂️
* Pattern matching for gambling-related keywords 🎲
* Leet-speak and character substitution handling 🔤
* URL pattern detection 🌐
* Custom keyword list and blocklist support 📃
* Evasion technique detection 🧩
* Contextual indicator analysis 📖
* Contact info extraction 📞
* Multi-language support 🌍
* Fuzzy search detection 🔍


## 🧭 Architecture Flow Diagram

```
[INPUT TEXT] → [INITIAL PREPROCESSING]

[GARBAGE/REPETITION CHECK] → [URL PATTERN CHECK]

[TEXT NORMALIZATION] → [WORD RECONSTRUCTION]

[NUMBER MERGING] → [BLOCKLIST CHECK]

[PATTERN MATCHING] → [LEET-SPEAK CONVERSION]

[KEYWORD MATCHING] → [CONFIDENCE SCORING]

[OUTPUT RESULT]
```


## 🧠 Detection Algorithm

1. **Preprocessing**:

   * Convert newlines to spaces
   * Normalize whitespace
   * Add spaces around dots

2. **Initial Checks**:

   * Detect excessive non-alphanumeric ("garbage") characters
   * Identify abnormal repetition patterns
   * Detect suspicious URL patterns
   * Detect suspicious code or emoji sequences

3. **Evasion Technique Analysis**:

   * Identify evasion methods
   * Score based on detected techniques

4. **Contextual Indicator Detection**:

   * Analyze text context for gambling cues
   * Score based on the indicators found

5. **Contact Info Extraction**:

   * Identify and extract potential contact information

6. **Text Normalization**:

   * Remove diacritics and standardize characters
   * Reconstruct intentionally split words
   * Merge numbers with preceding words

7. **Language Pattern Detection**:

   * Apply language-specific pattern checks
   * Score based on matches

8. **Domain Matching**:

   * Check domains in the text against the blocklist
   * Consider domain format variations

9. **Multi-pass Pattern Matching**:

   * Apply standard and custom regex patterns
   * Match at varying strictness levels
   * Convert leet-speak, optionally ignoring a configurable number of trailing digits
   * Perform fuzzy matching for obfuscated terms

10. **Supporting Keyword Analysis**:

    * Detect supporting keywords after a pattern match
    * Add a bonus based on the keyword match count

11. **Content Length & Complexity Analysis**:

    * Compute metrics such as word count, character count, and average word length
    * Flag characteristics typical of spammy gambling content

12. **Final Evaluation**:

    * Normalize checkpoint scores by the sensitivity factor
    * Determine the confidence level: none, low, medium, or high
    * Generate a detailed analysis report if requested

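To make steps 1 and 6 more concrete, here is a minimal sketch of the preprocessing and diacritic removal they describe. The function names `preprocess` and `removeDiacritics` are hypothetical, and the order of the replacements is an assumption; the module's real implementation may differ.

```javascript
// Sketch of step 1 (assumed order): newlines to spaces, spaces around dots,
// then collapse runs of whitespace.
function preprocess(text) {
  return String(text)
    .replace(/[\r\n]+/g, " ")
    .replace(/\./g, " . ")
    .replace(/\s+/g, " ")
    .trim();
}

// Sketch of the first bullet of step 6: decompose accented characters (NFD)
// and drop the combining marks that carry the accents.
function removeDiacritics(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(preprocess("visit\nsite.com   now")); // "visit site . com now"
console.log(removeDiacritics("c\u00e1sin\u00f2")); // "casino"
```

Putting spaces around dots separates `site.com` into tokens, which presumably makes later keyword and domain checks simpler.
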

## 🧩 Core Functions

### `detect(text = "", options = {})`

Main function for detecting gambling patterns in text.

**Parameters:**

* `text` *(string)*: The text to analyze
* `options` *(object)*: Configuration options

  * `keywords`: Pattern terms to detect (e.g., site names)
  * `supportKeywords`: Supporting keyword list
  * `domains`: List of domains to detect
  * `allowlist`: Allowlisted terms
  * `sensitivityLevel`: Detection sensitivity (1–5, default 3)
  * `includeAnalysis`: Include detailed analysis in the results
  * `detectRepetition`, `detectUrlPatterns`, `detectEvasionTechniques`, `detectContextualIndicators`: Boolean toggles for specific detection types
  * `extractContactInfo`: Whether to extract contact info
  * `language`: Language selection (`'en'`, `'id'`, `'all'`)
  * `debug`: Show debug info

**Returns:**

An object with detection results:

* `isGambling` *(boolean)*: Whether gambling content was detected
* `confidence` *(string)*: `"none"`, `"low"`, `"medium"`, or `"high"`
* `checkpoint` *(number)*: Numerical detection score
* `details` *(string)*: Human-readable explanation
* `comment` *(string)*: Original analyzed text
* `analysis` *(object, optional)*: Detailed analysis info

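As an illustration of how a caller might consume the documented return shape, the sketch below maps a result to an action. Everything here is hypothetical: `sampleResult` is hand-written rather than real module output, and the `chooseAction` policy is just one possible choice.

```javascript
// Hand-written object shaped like the documented return value (not real output).
const sampleResult = {
  isGambling: true,
  confidence: "medium",
  checkpoint: 1.1,
  details: "example explanation",
  comment: "sl0t88 maxwin",
};

// Hypothetical policy: block on high confidence, send anything else flagged to review.
function chooseAction(result) {
  if (!result.isGambling) return "allow";
  return result.confidence === "high" ? "block" : "review";
}

console.log(chooseAction(sampleResult)); // "review"
```
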

### 🧹 Text Processing

* `cleanText(text)` – Normalize and clean text
* `cleanWeirdPatterns(text)` – Remove odd spacing and punctuation
* `reconstructSeparatedWords(text)` – Reconstruct deliberately split words
* `mergeTextWithTrailingNumbers(text)` – Merge trailing numbers into the preceding word

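The last two helpers can be sketched as follows. These are simplified stand-ins under stated assumptions, not the module's code: `joinSplitLetters` and `mergeTrailingNumbers` are hypothetical names, only ASCII letters are handled, and the set of separator characters (`.`, `*`, `_`, `-`, space) is a guess.

```javascript
// Rejoin runs of single letters split by one separator each.
// Punctuation separators are handled first so that spaces between
// already-split words ("Z.e.u.s g.a.c.o.r") still act as word boundaries.
function joinSplitLetters(text) {
  const runOf = (separator) =>
    new RegExp(`(?<![a-z0-9])[a-z](?:${separator}[a-z])+(?![a-z0-9])`, "gi");
  const lettersOnly = (run) => run.replace(/[^a-z]/gi, "");
  return text
    .replace(runOf("[.*_-]"), lettersOnly)
    .replace(runOf(" "), lettersOnly);
}

// Attach a standalone number to the word before it: "zeus 99" becomes "zeus99".
function mergeTrailingNumbers(text) {
  return text.replace(/([a-z]+)\s+(\d+)\b/gi, "$1$2");
}

console.log(joinSplitLetters("c a s i n o online")); // "casino online"
console.log(joinSplitLetters("Z.e.u.s g.a.c.o.r")); // "Zeus gacor"
console.log(mergeTrailingNumbers("zeus 99 trusted")); // "zeus99 trusted"
```

A single-letter word such as "a" in ordinary prose is left alone, because the pattern requires every letter in the run to stand alone.
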

### 🧠 Detection Logic

* `isMostlyAsciiGarbage(text, threshold = 0.45)` – Detect non-alphanumeric spam
* `hasAbnormalRepetition(text)` – Detect character/pattern repetition
* `hasSuspiciousUrlPatterns(text)` – Detect obfuscated URLs
* `hasSuspiciousCodeSequences(text)` – Detect suspicious symbols/emoji
* `convertCommentFixed(comment, ignoreLastDigits = 0)` – Convert symbols to characters
* `fuzzySearch(keywords, text)` – Perform fuzzy keyword match

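A rough sketch of the two less obvious ideas in this list, leet-speak conversion and fuzzy keyword matching, is shown below. The substitution table, the interpretation of `ignoreLastDigits` as "leave up to this many trailing digits of each word unconverted", and the use of Levenshtein distance are all assumptions; `convertLeet`, `editDistance`, and `fuzzyFind` are hypothetical names.

```javascript
// Assumed substitution table; the module's real mapping may differ.
const LEET_MAP = { "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s" };

// Convert leet characters word by word, leaving up to `ignoreLastDigits`
// trailing digits of each word untouched (e.g. the "88" in "sl0t88").
function convertLeet(text, ignoreLastDigits = 0) {
  return text
    .split(" ")
    .map((word) => {
      const trailingDigits = word.match(/\d*$/)[0].length;
      const keep = Math.min(ignoreLastDigits, trailingDigits);
      const head = word.slice(0, word.length - keep);
      const tail = word.slice(word.length - keep);
      return Array.from(head, (ch) => LEET_MAP[ch] ?? ch).join("") + tail;
    })
    .join(" ");
}

// Classic Levenshtein distance using two rolling rows.
function editDistance(a, b) {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost);
    }
    previous = current;
  }
  return previous[b.length];
}

// Return the keywords that lie within `maxDistance` edits of some word in the text.
function fuzzyFind(keywords, text, maxDistance = 1) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  return keywords.filter((keyword) =>
    words.some((word) => editDistance(word, keyword) <= maxDistance)
  );
}

console.log(convertLeet("J4ckp0t sl0t88", 2)); // "Jackpot slot88"
console.log(fuzzyFind(["casino", "poker"], "best cassino site")); // [ 'casino' ]
```
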
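
### 🔬 Advanced Analysis

* `analyzeEvasionTechniques(text)` – Identify evasion methods and score them
* `detectContextualGamblingIndicators(text)` – Analyze text context for gambling cues
* `extractContactInfos(text)` – Identify and extract potential contact information
* `detectLanguageSpecificPatterns(text, language)` – Apply language-specific pattern checks
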

### 📐 Pattern Matching

* `createPatternRegex(terms, loose = false)` – Create RegExp pattern
* `TinyPatternRegex(terms)` – Create focused RegExp pattern

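The README does not say what `loose` changes, so the following is only one plausible reading: strict mode matches whole words, while loose mode tolerates up to two non-alphanumeric characters between letters. `buildPatternRegex` and `escapeRegExp` are hypothetical names, not the module's functions.

```javascript
// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one case-insensitive RegExp that matches any of the given terms.
// Assumed semantics: strict = whole-word match, loose = separators allowed between letters.
function buildPatternRegex(terms, loose = false) {
  const alternatives = terms.map((term) => {
    const letters = Array.from(term, (ch) => escapeRegExp(ch));
    return loose ? letters.join("[^a-z0-9]{0,2}") : letters.join("");
  });
  const body = alternatives.join("|");
  return loose ? new RegExp(`(?:${body})`, "i") : new RegExp(`\\b(?:${body})\\b`, "i");
}

console.log(buildPatternRegex(["slot"]).test("play slot now")); // true
console.log(buildPatternRegex(["slot"]).test("s-l-o-t")); // false
console.log(buildPatternRegex(["slot"], true).test("s-l-o-t")); // true
```
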

## 🧮 Mathematical Formulas

### 🎯 Confidence Threshold Calculation

```js
// Clamp sensitivityLevel to the 1–5 range, then divide by 3
// so that the default level (3) yields a factor of 1
const sensitivityFactor = Math.max(1, Math.min(5, sensitivityLevel)) / 3;

// Each threshold scales with the factor but never drops below its floor
const confidenceThresholds = {
  low: Math.max(0.45, 0.5 * sensitivityFactor),
  medium: Math.max(0.9, 0.8 * sensitivityFactor),
  high: Math.max(1.2, 2.5 * sensitivityFactor)
};
```

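As a worked example, the sketch below wraps the formula in a function and evaluates it at the default level. The `confidenceFor` helper is an assumption: the README does not state how a score is compared with the thresholds, so "greater than or equal" is used here for illustration.

```javascript
// The documented threshold formula, wrapped in a function.
function confidenceThresholds(sensitivityLevel = 3) {
  const sensitivityFactor = Math.max(1, Math.min(5, sensitivityLevel)) / 3;
  return {
    low: Math.max(0.45, 0.5 * sensitivityFactor),
    medium: Math.max(0.9, 0.8 * sensitivityFactor),
    high: Math.max(1.2, 2.5 * sensitivityFactor),
  };
}

// Assumption: a score reaches a level when it is >= that level's threshold.
function confidenceFor(score, sensitivityLevel = 3) {
  const thresholds = confidenceThresholds(sensitivityLevel);
  if (score >= thresholds.high) return "high";
  if (score >= thresholds.medium) return "medium";
  if (score >= thresholds.low) return "low";
  return "none";
}

console.log(confidenceThresholds(3)); // { low: 0.5, medium: 0.9, high: 2.5 }
console.log(confidenceThresholds(1)); // { low: 0.45, medium: 0.9, high: 1.2 }
console.log(confidenceFor(1.0)); // "medium"
```

At level 1 the scaled values fall below the floors, so all three thresholds take their minimum values (0.45, 0.9, 1.2).
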

### 🧠 Keyword Bonus Calculation

```js
keywordBonus = Math.min(1.5, 0.03 * (keywordMatchCount / 2) + 0.7)
```

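A short sketch shows how this bonus grows. It starts at 0.7 with no supporting matches, rises by 0.015 per match, and is capped at 1.5, which the formula reaches from 54 matches onward. The function wrapper and the sample counts are illustrative only.

```javascript
// The documented keyword bonus formula, wrapped in a function.
function keywordBonus(keywordMatchCount) {
  return Math.min(1.5, 0.03 * (keywordMatchCount / 2) + 0.7);
}

// Print the bonus for a few sample match counts.
for (const count of [0, 10, 54]) {
  console.log(count, keywordBonus(count).toFixed(3));
}
```
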

### 🧹 Garbage Character Ratio

```js
garbageRatio = numberOfGarbageCharacters / totalTextLength
isGarbage = garbageRatio >= threshold // threshold defaults to 0.45
```

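The ratio check can be sketched as a small function. Which characters count as "garbage" is not specified in this README, so the sketch assumes anything that is not a Unicode letter, digit, or whitespace; `isMostlyGarbage` is a hypothetical name.

```javascript
// Assumption: "garbage" is any character that is not a letter, digit, or whitespace.
function isMostlyGarbage(text, threshold = 0.45) {
  const chars = Array.from(text);
  if (chars.length === 0) return false; // avoid dividing by zero on empty input
  const garbageCount = chars.filter((ch) => !/[\p{L}\p{N}\s]/u.test(ch)).length;
  return garbageCount / chars.length >= threshold;
}

console.log(isMostlyGarbage("$$$ ### !!!")); // true (9 of 11 characters)
console.log(isMostlyGarbage("normal sentence")); // false
```
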

## 💡 Usage Example

```javascript
const examples = [
  "sl0t88 maxwin guaranteed win!",
  "J4ckp0t Zeus99 trusted - sign up now!",
  "c a s i n o online with credit & e-wallet deposit",
  "best online gambling site slot gacor maxwin today",
  "Get rich quick with winning bets at my-gambling-site.com",
  "This is a normal sentence with no gambling content.",
  "Z.e.u.s g.a.c.o.r m.a.x.w.i.n",
  "j*u*d*i o*n*l*i*n*e biggest site"
];

const customConfig = {
  keywords: ["win", "maxwin", "deposit", "withdraw", "gacor"],
  supportKeywords: ["jp", "jackpot", "slot", "judi", "casino"],
  domains: ["scamsite.com", "badword"],
  allowlist: ["normal", "common"],
  sensitivityLevel: 3,
  includeAnalysis: true,
  detectRepetition: true,
  detectUrlPatterns: true,
  detectEvasionTechniques: true,
  detectContextualIndicators: true,
  extractContactInfo: true,
  language: 'all',
  debug: true
};

console.log("=== TESTING EXAMPLES ===");
examples.forEach((example, index) => {
  console.log(`\nExample ${index + 1}: "${example}"`);
  // Assumption: the module is loaded and exposed as `GbDetector` with a `detect` method
  const result = GbDetector.detect(example, customConfig);
  console.log(result);
});
```


## 🃏 Default Keywords

**GbDetector** ships with default keywords covering gambling-related terms in English and Indonesian:

**🔑 Primary Keywords**:

```javascript
[
  "slot", "casino", "jack", "zeus", "scatter", "toto", "judol", "jodol",
  "poker", "roulette", "betting", "gamble", "joker"
]
```

**🛠️ Supporting Keywords**:

* English gambling terms: `wdp`, `wd`, `win`, `happy`, `joyful`, `rich`, `trustworthy`, `lucky`, `trust`, etc.
* Indonesian gambling terms: `menang`, `senang`, `gacor`, `gembira`, `kaya`, `pasti dapat`, `bangga`, `panen`, etc.


## 📄 License

MIT License
© 2025 Ramsyan-Tungga


## ✅ Conclusion

The **GbDetector** module identifies gambling-related content in text, including obfuscated text. It combines several detection techniques with configurable options, aiming for high accuracy while limiting false positives. Its modular design is intended to make integration into content moderation or filtering systems straightforward.
