
Commit a4512fd

Merge branch 'main' into experiment2

2 parents: de0124f + b140c38

File tree

12 files changed: +431164 −10 lines

.github/workflows/fly-deploy.yml

Lines changed: 17 additions & 9 deletions
```diff
@@ -1,18 +1,26 @@
-# See https://fly.io/docs/app-guides/continuous-deployment-with-github-actions/
+name: Deploy to Fly.io
 
-name: Fly Deploy
 on:
   push:
     branches:
-      - main
+      - main # Ensure this matches your deployment branch
+
 jobs:
   deploy:
-    name: Deploy app
+    name: Deploy app to Fly.io
     runs-on: ubuntu-latest
-    concurrency: deploy-group # optional: ensure only one action runs at a time
+    concurrency: deploy-group # Ensures only one action runs at a time
+
     steps:
-      - uses: actions/checkout@v4
-      - uses: superfly/flyctl-actions/setup-flyctl@master
-      - run: flyctl deploy --remote-only
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Install Flyctl
+        uses: superfly/flyctl-actions/setup-flyctl@master
+
+      - name: Deploy to Fly.io
         env:
-          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
+          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }} # Ensure it uses the updated token
+        run: |
+          flyctl auth whoami # Verifies Fly.io authentication before deploying
+          flyctl deploy --remote-only
```
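The rewritten workflow pins its deploy trigger to pushes on `main` (the added comment warns to keep this in sync with the deployment branch). As a rough illustration of checking that pin, here is a sketch using a stand-in workflow fragment and a naive line scan rather than a real YAML parser; the fragment and the `trigger_branches` helper are hypothetical, not part of this commit:

```python
WORKFLOW_FRAGMENT = """\
on:
  push:
    branches:
      - main
"""

def trigger_branches(workflow_text):
    """Naive scan (not a YAML parser): collect `- <branch>` entries
    listed under a `branches:` key. Illustrative only."""
    branches, in_branches = [], False
    for line in workflow_text.splitlines():
        stripped = line.strip()
        if stripped == "branches:":
            in_branches = True
        elif in_branches and stripped.startswith("- "):
            # Drop any trailing `# comment` after the branch name
            branches.append(stripped[2:].split("#")[0].strip())
        elif in_branches and stripped:
            in_branches = False
    return branches

print(trigger_branches(WORKFLOW_FRAGMENT))  # ['main']
```

A real check would parse the workflow with a YAML library; the substring scan above only serves to show what "ensure this matches your deployment branch" is guarding against.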

.gitignore

Lines changed: 4 additions & 1 deletion
```diff
@@ -70,5 +70,8 @@ build/
 *.log
 gemini_api.key.txt
 
+# nova
+.nova
 
-.vercel
+# vercel
+.vercel
```

experiment-1/Boston_Crime_Cleaned_v2.csv

Lines changed: 215207 additions & 0 deletions
Large diffs are not rendered by default.

experiment-1/Dockerfile

Lines changed: 23 additions & 0 deletions
```dockerfile
# ✅ Use the official Python slim image
FROM python:3.11-slim

# ✅ Set working directory inside the container
WORKDIR /app/ReThink_AI_Chatbot

# ✅ Copy everything from the current directory to the container
COPY . .

# ✅ Ensure the database folder exists
RUN mkdir -p /app/ReThink_AI_Chatbot/db

# ✅ Explicitly copy the CSV file
COPY db/Boston_Crime_Cleaned_v2.csv /app/ReThink_AI_Chatbot/db/Boston_Crime_Cleaned_v2.csv

# ✅ Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# ✅ Expose the correct port (Fly.io expects 8080)
EXPOSE 8080

# ✅ Start the Flask app
CMD ["gunicorn", "-b", "0.0.0.0:8080", "app:app"]
```

experiment-1/Procfile

Lines changed: 1 addition & 0 deletions
```
web: gunicorn app:app
```

experiment-1/README.md

Lines changed: 50 additions & 0 deletions
````markdown
# RethinkAI

# 🔥 LLM Chatbot using Flask & Gemini API

This project is a **Flask-based chatbot** that interacts with the **Google Gemini API** to provide intelligent responses. It processes user questions dynamically using a dataset stored in memory and allows users to provide feedback on responses.

## 📌 Features

- **Flask-based API** to interact with the Google Gemini LLM.
- **Gemini API Integration** for generating responses.
- **Logging System** to track user queries and responses.
- **Feedback System** (👍 / 👎) to collect user feedback.

---

## 🚀 Getting Started

### **1️⃣ Clone the Repository**

```sh
git clone https://github.com/yourusername/llm-flask-chatbot.git
cd llm-flask-chatbot
```

### **2️⃣ Create & Activate a Virtual Environment**

```sh
python3 -m venv venv
source venv/bin/activate  # On Mac/Linux
venv\Scripts\activate     # On Windows
```

### **3️⃣ Install Dependencies**

```sh
pip install -r requirements.txt
```

### **4️⃣ Set Up Environment Variables**

- Create a `.env` file in the project root:

```sh
nano .env
```

- Add the following:

```sh
GEMINI_API_KEY=your_google_gemini_api_key
```
````
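The `.env` step above relies on `python-dotenv`'s `load_dotenv()` to export `GEMINI_API_KEY` into the process environment. As a stdlib-only sketch of what that loading step does (the `parse_env_file` helper is hypothetical, not part of the project, and skips the library's extra features such as quoting and interpolation):

```python
import os
import tempfile

def parse_env_file(path):
    """Hypothetical stand-in for python-dotenv's load_dotenv():
    parse KEY=VALUE lines, ignoring blanks and # comments."""
    env = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

# Demo with a throwaway file standing in for the project's .env:
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("# sample\nGEMINI_API_KEY=dummy_key_for_testing\n")
    env_path = fh.name

env = parse_env_file(env_path)
os.environ.setdefault("GEMINI_API_KEY", env["GEMINI_API_KEY"])
os.remove(env_path)
```

In the actual app, `load_dotenv()` plus `os.getenv("GEMINI_API_KEY")` accomplish the same thing, as shown in `experiment-1/app.py` below.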

experiment-1/app.py

Lines changed: 186 additions & 0 deletions
```python
# raw dataset is being sent to the Gemini model directly.
from flask import Flask, request, jsonify, render_template
import os
import pandas as pd
import google.generativeai as genai
import asyncio
from datetime import datetime
import uuid
from dotenv import load_dotenv

APP_VERSION = "0.01"

# ✅ Load environment variables
load_dotenv()
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
genai.configure(api_key=GEMINI_API_KEY)

app = Flask(__name__, template_folder="templates")

LOG_FILE = "llm_query_log.csv"

# ✅ Load CSV function
def load_csv(file_path, max_rows=1000):
    """Loads a CSV file and limits rows to fit within Gemini's token limit."""
    try:
        df = pd.read_csv(file_path)
        df = df.head(max_rows)
        print(f"\n✅ Loaded {len(df)} rows (limited to avoid quota issues).")
        return df
    except Exception as e:
        print(f"❌ Error loading CSV: {e}")
        return None

# ✅ Determine dataset path (works for local & server)
if os.path.exists("/app/ReThink_AI_Chatbot/db/Boston_Crime_Cleaned_v2.csv"):  # ✅ Adjust for Fly.io
    file_path = "/app/ReThink_AI_Chatbot/db/Boston_Crime_Cleaned_v2.csv"
else:
    file_path = os.path.join(os.getcwd(), "db", "Boston_Crime_Cleaned_v2.csv")  # ✅ Local fallback

df = load_csv(file_path, max_rows=1000)  # ✅ Load dataset globally

# ✅ Generate dataset prompt function
def generate_initial_prompt(df):
    """Converts the dataset into a format suitable for Gemini input."""
    dataset_text = df.to_string(index=False)

    dataset_prompt = f"""
You are a data analysis assistant. Below is a dataset containing {df.shape[0]} rows and {df.shape[1]} columns.

**Dataset Columns:** {', '.join(df.columns)}

**Dataset Preview:**
{dataset_text}

This dataset will be used for answering multiple questions.
When asked a question related to the dataset, just explain your findings and don't give the code to be used on the dataset.
Please answer based on this dataset.
"""
    return dataset_prompt

# ✅ Send prompt to Gemini
async def get_gemini_response(prompt):
    """Sends the prompt to Google Gemini and returns the response."""
    try:
        model = genai.GenerativeModel("gemini-1.5-pro")
        loop = asyncio.get_event_loop()
        response = await loop.run_in_executor(None, lambda: model.generate_content(prompt))

        print(f"\n✅ Gemini Response: {response.text}")  # ✅ Log the response!
        return response.text
    except Exception as e:
        print(f"❌ Error generating response: {e}")  # ✅ Log the error!
        return f"❌ Error generating response: {e}"

# ✅ Log query function
def log_query(question, answer):
    """Logs the question, answer, timestamp, and assigns a unique query ID."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    query_id = str(uuid.uuid4())[:8]  # Unique short ID for each question

    log_entry = pd.DataFrame([{
        "Query_ID": query_id,
        "Timestamp": timestamp,
        "Question": question,
        "Answer": answer,
        "Feedback": ""
    }])

    if not os.path.exists(LOG_FILE):
        log_entry.to_csv(LOG_FILE, index=False)
    else:
        log_entry.to_csv(LOG_FILE, mode='a', header=False, index=False)

    return query_id

# ✅ Flask Routes
@app.route("/")
def home():
    """Serves the chatbot frontend with version info."""
    return render_template("index.html", version=APP_VERSION)

@app.route("/ask", methods=["POST"])
def ask():
    """Handles user questions and sends them to the LLM."""
    try:
        data = request.get_json()
        user_question = data.get("question", "")

        if not user_question:
            return jsonify({"error": "No question provided"}), 400

        print(f"🔄 Processing user question: {user_question}")

        # ✅ Use dataset in the prompt
        dataset_prompt = generate_initial_prompt(df)
        full_prompt = f"{dataset_prompt}\n\nUser question: {user_question}"

        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        response = loop.run_until_complete(get_gemini_response(full_prompt))

        if "❌ Error" in response:
            print(f"❌ ERROR from Gemini API: {response}")
            return jsonify({"error": response}), 500

        query_id = log_query(user_question, response)
        return jsonify({"answer": response, "query_id": query_id})

    except Exception as e:
        print(f"❌ Exception in /ask: {e}")
        return jsonify({"error": f"Internal server error: {e}"}), 500

@app.route("/crime-stats", methods=["GET"])
def crime_stats():
    """Dynamically calculates key crime insights."""
    if df is None:
        return jsonify({"error": "Dataset not loaded"}), 500

    most_common_crime = df["Crime"].mode()[0] if "Crime" in df.columns else "N/A"
    peak_hour = df["Hour"].mode()[0] if "Hour" in df.columns else "N/A"
    most_affected_area = df["Neighborhood"].mode()[0] if "Neighborhood" in df.columns else "N/A"

    return jsonify({
        "most_common_crime": most_common_crime,
        "peak_hour": f"{peak_hour}:00 - {peak_hour+1}:00" if isinstance(peak_hour, int) else "N/A",
        "most_affected_area": most_affected_area
    })

@app.route("/feedback", methods=["POST"])
def feedback():
    """Stores user feedback in the log file using Query_ID."""
    data = request.get_json()
    query_id = data.get("query_id", "")
    feedback = data.get("feedback", "")

    if not query_id or not feedback:
        return jsonify({"error": "Invalid feedback"}), 400

    if not os.path.exists(LOG_FILE):
        return jsonify({"error": "Log file not found"}), 500

    try:
        log_df = pd.read_csv(LOG_FILE, dtype=str, encoding="utf-8", on_bad_lines="skip")
    except pd.errors.ParserError:
        return jsonify({"error": "CSV file corrupted, please check formatting"}), 500

    if "Query_ID" not in log_df.columns or query_id not in log_df["Query_ID"].values:
        return jsonify({"error": "Query ID not found"}), 400

    log_df.loc[log_df["Query_ID"] == query_id, "Feedback"] = feedback
    log_df.to_csv(LOG_FILE, index=False)
    return jsonify({"success": "Feedback recorded"})

# ✅ Run Flask App (Supports Local & Server Deployment)
def start_app():
    """Starts the Flask application (for local & server)."""
    print("\n🚀 Server is running on 0.0.0.0:8080")  # ✅ Fixed!
    app.run(host="0.0.0.0", port=8080)

if __name__ == "__main__":
    start_app()
```
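The feedback loop in `app.py` works in two steps: `log_query` appends a row keyed by a short UUID, and the `/feedback` route later rewrites the whole CSV with that row's Feedback cell filled in. A self-contained stdlib sketch of the same scheme (using the `csv` module instead of pandas; the file name and helper names here are illustrative, not the app's):

```python
import csv
import os
import uuid
from datetime import datetime

DEMO_LOG = "demo_llm_query_log.csv"  # throwaway file, not the app's log
FIELDS = ["Query_ID", "Timestamp", "Question", "Answer", "Feedback"]

def log_query(question, answer):
    """Append a row, writing the header on first use
    (mirrors app.py's pandas-based log_query)."""
    query_id = str(uuid.uuid4())[:8]  # short unique ID, as in the app
    new_file = not os.path.exists(DEMO_LOG)
    with open(DEMO_LOG, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "Query_ID": query_id,
            "Timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "Question": question,
            "Answer": answer,
            "Feedback": "",
        })
    return query_id

def record_feedback(query_id, feedback):
    """Rewrite the log with the matching row's Feedback column filled in,
    returning whether the Query_ID was found."""
    with open(DEMO_LOG, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    found = False
    for row in rows:
        if row["Query_ID"] == query_id:
            row["Feedback"] = feedback
            found = True
    with open(DEMO_LOG, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return found

# Round-trip: log a query, attach feedback, read it back, clean up.
qid = log_query("Which crime is most common?", "Larceny, per the sample.")
ok = record_feedback(qid, "👍")
with open(DEMO_LOG, newline="", encoding="utf-8") as fh:
    fb = next(csv.DictReader(fh))["Feedback"]
os.remove(DEMO_LOG)
```

Rewriting the entire file per feedback event is fine at this scale but is not concurrency-safe; under multiple gunicorn workers, a shared store such as SQLite would avoid lost updates.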
