Commit 4cb12d4

Initial commit: uploading the balanced classification project
0 parents  commit 4cb12d4

19 files changed

Lines changed: 560 additions & 0 deletions

.github/workflows/ci.yml

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+name: CI
+
+on:
+  push:
+    branches: ["**"]
+  pull_request:
+    branches: ["**"]
+
+jobs:
+  tests:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r requirements.txt
+
+      - name: Run tests
+        run: pytest -q
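
The `Run tests` step calls `pytest -q`, and the README in this commit says the current tests only validate that the modules import cleanly. The test files are not shown in this diff, so the following is only a minimal sketch of what such a smoke check might look like. The file name `tests/test_imports.py`, the helpers `is_importable` and `find_import_failures`, and the module list are assumptions made for illustration.

```python
"""Hypothetical tests/test_imports.py helper (not part of this commit)."""
import importlib
import importlib.util

# Assumed module list, copied from the project structure in the README.
PROJECT_MODULES = [
    "src.balancing",
    "src.data_processing",
    "src.logistic_analysis",
]


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located and imported without error."""
    try:
        if importlib.util.find_spec(module_name) is None:
            return False
        importlib.import_module(module_name)
        return True
    except Exception:
        # A missing parent package or an error raised at import time
        # both count as "not importable" for a smoke check.
        return False


def find_import_failures(module_names):
    """Return the subset of module names that could not be imported."""
    return [name for name in module_names if not is_importable(name)]
```

Inside a pytest file, a test function would then assert that `find_import_failures(PROJECT_MODULES)` returns an empty list. A check like this catches missing dependencies and syntax errors, but it says nothing about model behavior.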

.gitignore

Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,60 @@
1+
# Python cache and compiled files
2+
__pycache__/
3+
*.py[cod]
4+
*$py.class
5+
*.pyd
6+
*.pyo
7+
8+
# Python runtime files
9+
.python-version
10+
11+
# Test, coverage and reports
12+
.coverage
13+
.coverage.*
14+
coverage.xml
15+
htmlcov/
16+
pytestdebug.log
17+
18+
# Logs
19+
*.log
20+
21+
# Virtual environments
22+
.venv/
23+
venv/
24+
env/
25+
ENV/
26+
27+
# Distribution / packaging
28+
build/
29+
dist/
30+
*.egg-info/
31+
.eggs/
32+
pip-wheel-metadata/
33+
34+
# Tool caches
35+
.pytest_cache/
36+
.mypy_cache/
37+
.ruff_cache/
38+
.ipynb_checkpoints/
39+
.tox/
40+
.nox/
41+
42+
# IDE/editor
43+
.vscode/
44+
.idea/
45+
*.code-workspace
46+
47+
# OS files
48+
.DS_Store
49+
Thumbs.db
50+
desktop.ini
51+
52+
# Generated outputs
53+
graficos/
54+
55+
# Local environment files
56+
.env
57+
.env.*
58+
59+
# Type checker / language server
60+
.pyright/

LICENSE

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
MIT License
2+
3+
Copyright (c) 2026 [fullname]
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

README.md

Lines changed: 101 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,101 @@
1+
# Projeto de Deteccao de Anomalias em Cartoes de Credito
2+
3+
Projeto desenvolvido por: Hélio Júnior
4+
Plataforma de Estudos: DIO - Digital Innovation One
5+
6+
Este projeto aplica tecnicas de Machine Learning para deteccao de fraude em transacoes de cartao de credito usando a base publica hospedada em:
7+
https://storage.googleapis.com/download.tensorflow.org/data/creditcard.csv
8+
9+
## Objetivo
10+
11+
Comparar modelos e abordagens para classificacao de transacoes fraudulentas, incluindo:
12+
- Regressao Logistica
13+
- Random Forest
14+
- Pipeline com threshold customizado
15+
- XGBoost
16+
- Busca de hiperparametros com GridSearchCV
17+
- Explicabilidade com SHAP (opcional)
18+
19+
## Estrutura do Projeto
20+
21+
- main.py: atalho para iniciar a aplicacao
22+
- run.ps1: script de execucao com um comando
23+
- src/main.py: orquestracao principal do fluxo
24+
- src/data_processing.py: carga e preparo dos dados
25+
- src/logistic_analysis.py: treino, metricas e graficos da Regressao Logistica
26+
- src/balancing.py: undersampling e SMOTE
27+
- src/random_forest_analysis.py: treino e avaliacao de Random Forest
28+
- src/pipeline_analysis.py: pipeline de padronizacao + Regressao Logistica e threshold customizado
29+
- src/xgboost_analysis.py: treino, avaliacao e importancia de features do XGBoost
30+
- src/hyperparameter_tuning.py: ajuste de hiperparametros com GridSearchCV
31+
- src/shap_analysis.py: explicabilidade com SHAP
32+
- .gitignore: arquivos e pastas ignorados pelo Git
33+
- requirements.txt: dependencias do projeto
34+
- analise.ipynb: notebook para analises adicionais
35+
36+
## Requisitos
37+
38+
- Python 3.10+
39+
- Ambiente virtual recomendado
40+
41+
## Instalacao
42+
43+
No PowerShell, dentro da pasta do projeto:
44+
45+
```powershell
46+
py -3.12 -m venv .venv
47+
.\.venv\Scripts\Activate.ps1
48+
python -m pip install --upgrade pip
49+
python -m pip install -r requirements.txt
50+
```
51+
52+
## Como Executar
53+
54+
```powershell
55+
python .\main.py
56+
```
57+
58+
Ou com um comando dedicado:
59+
60+
```powershell
61+
.\run.ps1
62+
```
63+
64+
## Testes
65+
66+
Executar testes locais:
67+
68+
```powershell
69+
pytest -q
70+
```
71+
72+
Os testes atuais validam importacao e integridade basica dos modulos.
73+
74+
## CI
75+
76+
Foi adicionado pipeline de CI em .github/workflows/ci.yml para:
77+
- instalar dependencias
78+
- executar pytest automaticamente em push e pull request
79+
80+
## Saidas Geradas
81+
82+
Durante a execucao, o projeto:
83+
- Exibe relatorios de classificacao no terminal para cada modelo
84+
- Exibe AUC e Average Precision para Regressao Logistica
85+
- Gera imagens em graficos/: roc_curve.png, precision_recall_curve.png, xgboost_feature_importance.png e shap_bar.png (se SHAP estiver instalado e funcional)
86+
87+
## Observacoes
88+
89+
- Se o pacote SHAP nao estiver instalado, a etapa de explicabilidade e ignorada automaticamente.
90+
- O backend grafico esta configurado para modo nao interativo (Agg), ideal para execucao em terminal.
91+
92+
## Melhorias Futuras
93+
94+
- Salvar metricas em arquivos CSV/JSON
95+
- Adicionar comparacao visual entre modelos
96+
- Inserir validacao cruzada para todos os modelos
97+
- Criar testes unitarios de comportamento para cada modulo
98+
99+
## Licenca
100+
101+
Projeto para fins educacionais.
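
The README lists a pipeline with a custom threshold, but `src/pipeline_analysis.py` is not included in this diff. As a rough illustration of the idea only, the sketch below turns predicted fraud probabilities into labels at a chosen cutoff and reports precision and recall. The function names, the sample scores, and the 0.3 cutoff are assumptions and are not taken from the project.

```python
def apply_threshold(probabilities, threshold=0.5):
    """Label a transaction as fraud (1) when its score reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels and scores, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.35, 0.20, 0.40, 0.80, 0.60, 0.30]

for cutoff in (0.5, 0.3):
    preds = apply_threshold(scores, cutoff)
    precision, recall = precision_recall(y_true, preds)
    print(f"threshold={cutoff}: precision={precision:.2f}, recall={recall:.2f}")
# threshold=0.5: precision=0.50, recall=0.33
# threshold=0.3: precision=0.60, recall=1.00
```

In this toy data, lowering the cutoff recovers every fraud case, but one more legitimate transaction is flagged. On real, heavily imbalanced data the cost in precision is usually larger. With a scikit-learn classifier, the scores would typically come from `predict_proba(X)[:, 1]`.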

analise.ipynb

Whitespace-only changes.

main.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
#Projeto desenvolvido por: [Hélio Júnior]
2+
#Plataforma de Estudos: [DIO - Digital Innovation One]
3+
4+
from src.main import main
5+
6+
7+
if __name__ == "__main__":
8+
main()

requirements.txt

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
pandas
2+
matplotlib
3+
scikit-learn
4+
imbalanced-learn
5+
xgboost
6+
shap
7+
pytest

run.ps1

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
$pythonExe = ".\.venv\Scripts\python.exe"
2+
3+
if (-not (Test-Path $pythonExe)) {
4+
Write-Host "Ambiente .venv nao encontrado. Crie com: py -3.12 -m venv .venv"
5+
exit 1
6+
}
7+
8+
& $pythonExe -m src.main

src/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
#Projeto desenvolvido por: [Hélio Júnior]
2+
#Plataforma de Estudos: [DIO - Digital Innovation One]

src/balancing.py

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
#Projeto desenvolvido por: [Hélio Júnior]
2+
#Plataforma de Estudos: [DIO - Digital Innovation One]
3+
4+
import pandas as pd
5+
from imblearn.over_sampling import SMOTE
6+
7+
8+
def create_undersampled_dataframe(df, target_col: str = "Class", random_state: int = 42) -> pd.DataFrame:
9+
fraudes = df[df[target_col] == 1]
10+
normais = df[df[target_col] == 0].sample(n=len(fraudes), random_state=random_state)
11+
return pd.concat([fraudes, normais])
12+
13+
14+
def apply_smote(X, y):
15+
smote = SMOTE()
16+
return smote.fit_resample(X, y)
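
`apply_smote` delegates to imbalanced-learn, so the diff does not show how the synthetic fraud samples are produced. The following is a simplified, standard-library-only sketch of the interpolation idea behind SMOTE, not the library's implementation. The function name `smote_like_oversample`, the choice of `k=2`, and the toy points are assumptions for illustration; imbalanced-learn's `SMOTE` defaults to `k_neighbors=5` and differs in other details.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=42):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority points to the base point, excluding the base itself.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Pick a random position on the segment from base to neighbor.
        lam = rng.random()
        synthetic.append(tuple(b + lam * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority-class points (two features each), for illustration only.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
new_points = smote_like_oversample(minority, n_new=6)
print(len(minority) + len(new_points))  # 10 minority samples after oversampling
```

Each synthetic point lies on a line segment between two real minority samples, so oversampling fills in the region the fraud class already occupies rather than duplicating rows. Resampling is normally applied to the training split only, so that synthetic points do not leak into evaluation data.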
