Commit 401d4b6

authored
Merge pull request #2016 from oracle-devrel/sent-categ-update
Major update
2 parents dac5cba + f02c146 commit 401d4b6

12 files changed

Lines changed: 432 additions & 617 deletions

ai/generative-ai-service/sentiment-categorization/README.md

Lines changed: 4 additions & 2 deletions
@@ -3,8 +3,10 @@
33
The Customer Message Analyzer is a tool designed to analyze customer messages through unsupervised categorization, sentiment analysis, and summary reporting. It helps businesses understand customer feedback without requiring extensive manual labeling or analysis.
44

55

6-
Reviewed: 01.04.2025
7-
6+
Reviewed: 19.09.2025
7+
8+
<img width="2542" height="1202" alt="image" src="https://github.com/user-attachments/assets/bdb7dbb0-78ec-4896-bb93-927bf75c31d9" />
9+
810
# When to use this asset?
911

1012
Customer service teams, product managers, and marketing professionals would use this asset when they need to quickly understand large volumes of customer feedback, identify trends, and make data-driven decisions to improve products or services.
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
[client]
2+
showSidebarNavigation = false
3+
toolbarMode = "minimal"
4+
5+
[server]
6+
headless = true
Lines changed: 51 additions & 70 deletions
@@ -1,67 +1,33 @@
11
# Customer-Agent Conversation Analysis and Categorization Demo
2-
This demo showcases an AI-powered solution for analyzing batches of customer messages, categorizing them into hierarchical levels, extracting sentiment scores, and generating structured reports.
32

4-
## Key Features
5-
* **Hierarchical Categorization**: Automatically categorizes messages into three levels of hierarchy:
6-
+ Primary Category: High-level categorization
7-
+ Secondary Category: Mid-level categorization, building upon primary categories
8-
+ Tertiary Category: Low-level categorization, providing increased specificity and detail
9-
* **Sentiment Analysis**: Extracts sentiment scores for each message, ranging from very negative (1) to very positive (10)
10-
* **Structured Reporting**: Generates a comprehensive report analyzing the batch of messages, including:
11-
+ Category distribution across all three levels
12-
+ Sentiment score distribution
13-
+ Summaries of key findings and insights
14-
15-
## Data Requirements
16-
* Customer messages should be stored in a CSV file(s) within a folder named `data`.
17-
* Each CSV file should contain a column with the message text.
18-
19-
## Python Version
20-
This project requires **Python 3.13** or later. You can check your current Python version by running:
21-
```
22-
python --version
23-
```
24-
or
25-
```
26-
python3 --version
27-
```
3+
This demo showcases an AI-powered solution for analyzing batches of customer messages, categorizing them into hierarchical levels, extracting sentiment scores, and generating structured reports. The latest version adds a professional, corporate UI theme, CSV upload/validation in the sidebar, and step-aware progress feedback during processing.
284

29-
## Getting Started
30-
To run the demo, follow these steps:
31-
1. Clone the repository using `git clone`.
32-
2. *(Optional but recommended)* Create and activate a Python virtual environment:
33-
- On Windows:
34-
```
35-
python -m venv venv
36-
venv\Scripts\activate
37-
```
38-
- On macOS/Linux:
39-
```
40-
python3 -m venv venv
41-
source venv/bin/activate
42-
```
43-
3. Place your CSV files containing customer messages in the `data` folder. Ensure each includes a column with the message text.
44-
4. Install dependencies using `pip install -r requirements.txt`.
45-
5. Run the application using `streamlit run app.py`.
5+
## Key Features
6+
- Hierarchical Categorization
7+
- Primary Category: High-level categorization
8+
- Secondary Category: Mid-level categorization, building upon primary categories
9+
- Tertiary Category: Low-level categorization, providing increased specificity and detail
10+
- Sentiment Analysis
11+
- Extracts sentiment scores for each message, from very negative (1) to very positive (10)
12+
- Structured Reporting
13+
- Category distribution across all three levels
14+
- Sentiment score distribution
15+
- Summaries of key findings and insights
16+
- CSV Upload and Validation
17+
- Upload CSV in the sidebar; validates required columns ID and Message before running
18+
- Displays a preview in the sidebar and a full interactive table in the main area
19+
- Execution Progress and Status
20+
- Step-aware progress bar with status text showing the currently running stage and total steps
4621

47-
## Example Use Cases
48-
* Analyze customer feedback from surveys, reviews, or social media platforms to identify trends and patterns.
49-
* Inform product development and customer support strategies by understanding customer sentiment and preferences.
50-
* Optimize marketing campaigns by targeting specific customer segments based on their interests and concerns.
5122

5223
## Technical Details
53-
* The solution leverages Oracle Cloud Infrastructure (OCI) GenAI, a suite of AI services designed to simplify AI adoption.
54-
* Specifically, this demo utilizes the Cohere R+ model, a state-of-the-art language model optimized for natural language processing tasks.
55-
* All aspects of the demo, including:
56-
+ Hierarchical categorization
57-
+ Sentiment analysis
58-
+ Structured report generation are powered by GenAI, ensuring accurate and efficient analysis of customer messages.
59-
24+
- Built on Oracle Cloud Infrastructure (OCI) GenAI services
25+
- End-to-end flow powered by GenAI for:
26+
- Hierarchical categorization
27+
- Sentiment analysis
28+
- Structured report generation
6029

6130
## Project Structure
62-
63-
The repository is organized as follows:
64-
6531
```plaintext
6632
│ app.py # Main Streamlit application entry point
6733
│ README.md # Project documentation
@@ -70,31 +36,46 @@ The repository is organized as follows:
7036
├───backend
7137
│ │ feedback_agent.py # Logic for feedback processing agents
7238
│ │ feedback_wrapper.py # Wrappers and interfaces for feedback functionalities
73-
│ │ message_handler.py # Utilities for handling and preprocessing messages
7439
│ │
7540
│ ├───data
7641
│ │ complaints_messages.csv # Example dataset of customer messages
7742
│ │
7843
│ └───utils
7944
│ config.py # Configuration and setup for the project
80-
│ llm_config.py # Model- and LLM-related configuration
8145
│ prompts.py # Prompt templates for language models
8246
83-
└───pages
84-
SentimentByCat.py # Additional Streamlit page for sentiment by category
47+
└───data
48+
complaints_messages.csv # Example dataset of customer messages
8549
```
86-
## Output
87-
The demo will display an interactive dashboard with the generated report, providing valuable insights into customer messages, including:
88-
* Category distribution across all three levels
89-
* Sentiment score distribution
90-
* Summaries of key findings and insights
9150

92-
## Contributing
93-
We welcome contributions to improve and expand the capabilities of this demo. Please fork the repository and submit a pull request with your changes.
51+
## Getting Started
52+
1. Clone the repository using `git clone`.
53+
2. (Optional) Create and activate a Python virtual environment:
54+
- Windows:
55+
- python -m venv venv
56+
- venv\Scripts\activate
57+
- macOS/Linux:
58+
- python3 -m venv venv
59+
- source venv/bin/activate
60+
3. Place your CSV files in the `data` folder. Ensure each includes the required columns `ID` and `Message`.
61+
4. Install dependencies with `pip install -r requirements.txt`.
62+
5. Run the application with `streamlit run app.py`.
63+
64+
## Data Requirements
65+
- Input format: a CSV file with two columns, `ID` and `Message`
66+
- The app validates:
67+
- File extension is CSV
68+
- Both required columns are present
69+
- The full dataset is visualized in the main view after successful validation.
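
The two validation checks listed above can be sketched with the standard library alone. This is a minimal illustration, not the app's own code (which reads the upload with pandas); the helper names `missing_columns` and `is_csv_filename` are hypothetical.

```python
import csv
import io

# Required column names as stated in the README.
REQUIRED_COLUMNS = {"ID", "Message"}


def missing_columns(csv_text: str) -> list[str]:
    """Return the required columns absent from the CSV header, sorted by name."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return sorted(REQUIRED_COLUMNS - present)


def is_csv_filename(filename: str) -> bool:
    """Check the file extension, ignoring case (hypothetical helper)."""
    return filename.lower().endswith(".csv")
```

A file passes when `is_csv_filename(name)` is true and `missing_columns(text)` returns an empty list. Extra columns are tolerated here, which matches a subset-style check.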
70+
71+
## Output
72+
The dashboard displays an interactive report with:
73+
- Category distribution across all three levels
74+
- Sentiment score distribution
75+
- Summaries of key findings and insights
76+
- Step-by-step execution status and overall progress of the analysis run
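
As a rough illustration of the sentiment figures in the report, the sketch below aggregates per-message records in plain Python. The record fields `topic` and `sentiment_score` mirror the columns the app's chart uses, but the exact record shape and both function names are assumptions, not the backend's implementation.

```python
from collections import Counter, defaultdict


def score_distribution(records):
    """Count how many messages received each sentiment score, ordered by score."""
    counts = Counter(r["sentiment_score"] for r in records)
    return dict(sorted(counts.items()))


def average_by_topic(records):
    """Average the sentiment score of the messages within each topic."""
    scores = defaultdict(list)
    for r in records:
        scores[r["topic"]].append(r["sentiment_score"])
    return {topic: sum(vals) / len(vals) for topic, vals in scores.items()}
```

Both helpers take a list of dictionaries, so they can be applied to model output before it is loaded into a DataFrame for plotting.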
9477

9578
## License
9679
Copyright (c) 2025 Oracle and/or its affiliates.
97-
9880
Licensed under the Universal Permissive License (UPL), Version 1.0.
99-
100-
See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
81+
See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
Lines changed: 208 additions & 8 deletions
@@ -1,16 +1,216 @@
1+
import json
2+
import pandas as pd
13
import streamlit as st
4+
import plotly.express as px
5+
from backend.feedback_wrapper import FeedbackAgentWrapper
26

3-
st.set_page_config(
4-
page_title="Hello",
5-
page_icon="👋",
7+
st.set_page_config(page_title="Feedback Dashboard", page_icon="📊", layout="wide")
8+
9+
def load_styles():
10+
try:
11+
with open("styles.css", "r", encoding="utf-8") as f:
12+
st.markdown(f"<style>{f.read()}</style>", unsafe_allow_html=True)
13+
except Exception:
14+
st.markdown(
15+
"<style>.main-header{border-bottom:3px solid #C74634;padding:1rem 0}</style>",
16+
unsafe_allow_html=True,
17+
)
18+
19+
load_styles()
20+
21+
st.markdown(
22+
"""
23+
<div class="main-header">
24+
<div class="header-content">
25+
<h1>Customer Feedback Dashboard</h1>
26+
<p>Analyze customer sentiment, insights, and trends across categories</p>
27+
</div>
28+
</div>
29+
""",
30+
unsafe_allow_html=True,
31+
)
32+
33+
st.sidebar.markdown('<div class="stMarkdown"><h3>Data Input</h3></div>', unsafe_allow_html=True)
34+
uploaded_file = st.sidebar.file_uploader("Upload CSV (required columns: ID, Message)", type=["csv"])
35+
36+
data_list = None
37+
df_uploaded = None
38+
valid_file = False
39+
40+
if uploaded_file is not None:
41+
try:
42+
df_uploaded = pd.read_csv(uploaded_file)
43+
required_columns = {"ID", "Message"}
44+
if not required_columns.issubset(set(df_uploaded.columns)):
45+
st.sidebar.error(f"CSV must include columns: {', '.join(sorted(required_columns))}")
46+
else:
47+
valid_file = True
48+
st.sidebar.success("File uploaded and validated successfully.")
49+
st.sidebar.markdown("</div>", unsafe_allow_html=True)
50+
51+
data_list = df_uploaded.values.tolist()
52+
except Exception as e:
53+
st.sidebar.error(f"An error occurred while processing the file: {e}")
54+
55+
st.sidebar.markdown('<div class="stMarkdown"><h4>Run</h4></div>', unsafe_allow_html=True)
56+
if "flow_completed" not in st.session_state:
57+
st.session_state.flow_completed = True
58+
59+
start_button = st.sidebar.button(
60+
"Start",
61+
disabled=not (st.session_state.flow_completed and valid_file),
662
)
763

8-
st.write("# Welcome to Streamlit! 👋")
64+
st.markdown('<div class="comparison-container">', unsafe_allow_html=True)
65+
66+
if df_uploaded is not None and valid_file:
67+
st.markdown("### Uploaded Data")
68+
dataset_exp = st.expander(uploaded_file.name, expanded=True)
69+
dataset_exp.dataframe(df_uploaded, height=200, use_container_width=True)
70+
71+
def display_category(data):
72+
if not isinstance(data, dict) or "categories" not in data:
73+
st.warning("No category data found in the report.")
74+
return
75+
76+
st.markdown('<div class="metrics-grid">', unsafe_allow_html=True)
77+
st.markdown("</div>", unsafe_allow_html=True)
78+
79+
for category in data.get("categories", []):
80+
with st.container():
81+
st.markdown(
82+
f'<div class="response-card finetuned"><div class="response-header">'
83+
f'<div class="model-name">{category.get("category_level_1", "Unknown")}</div>'
84+
f'</div><div class="response-content">',
85+
unsafe_allow_html=True,
86+
)
87+
st.write(category.get("summary", ""))
988

10-
st.sidebar.success("Select a demo above.")
89+
col1, col2, col3 = st.columns(3)
90+
with col1:
91+
avg = category.get("average_sentiment_score", None)
92+
if avg is not None:
93+
st.metric("Avg Sentiment Score", avg, delta=None)
94+
st.progress(min(max(avg / 10, 0.0), 1.0))
95+
with col2:
96+
high = category.get("highest_sentiment_message", {})
97+
st.success(f"Highest Sentiment: {high.get('sentiment_score', 'N/A')}")
98+
st.write(f"“{high.get('summary', '')}”")
99+
with col3:
100+
low = category.get("lowest_sentiment_message", {})
101+
st.error(f"Lowest Sentiment: {low.get('sentiment_score', 'N/A')}")
102+
st.write(f"“{low.get('summary', '')}”")
11103

104+
st.markdown("#### Key Insights")
105+
for insight in category.get("key_insights", []):
106+
st.info(f"• {insight}")
107+
108+
st.markdown("#### Subcategories Breakdown")
109+
for subcategory in category.get("subcategories", []):
110+
with st.expander(
111+
f"{subcategory.get('category_level_2','(Unknown)')} "
112+
f"(Avg: {subcategory.get('average_sentiment_score','N/A')})"
113+
):
114+
st.write(subcategory.get("summary", ""))
115+
st.markdown("</div></div>", unsafe_allow_html=True)
116+
117+
def display_sentiment(df: pd.DataFrame):
118+
if df.empty:
119+
st.warning("No sentiment data to display.")
120+
return
121+
fig = px.bar(
122+
df,
123+
x="id",
124+
y="sentiment_score",
125+
color="sentiment_score",
126+
text="topic",
127+
labels={"id": "Id", "sentiment_score": "Sentiment Score (1-10)"},
128+
title="Sentiment Scores per Feedback Category",
129+
)
130+
fig.update_layout(
131+
margin=dict(l=10, r=10, t=50, b=10),
132+
legend_title_text="Score",
133+
)
134+
fig.update_traces(textposition="inside")
135+
st.plotly_chart(fig, use_container_width=True)
136+
137+
if start_button and st.session_state.flow_completed and valid_file:
138+
st.session_state.flow_completed = False
139+
140+
agent = FeedbackAgentWrapper(data_list)
141+
steps, edges = agent.get_nodes_edges()
142+
steps = steps[1:-1]
143+
outputs = []
144+
current_step = steps[0] if steps else "summarize"
145+
146+
status_placeholder = st.empty()
147+
progress_bar = st.progress(0)
148+
total_steps = len(steps) if steps else 1
149+
step_counter = 0
150+
151+
while current_step != "FINALIZED":
152+
status_placeholder.markdown(
153+
f'<div class="response-meta">Running step: <strong>{current_step}</strong> '
154+
f'({min(step_counter + 1, total_steps)}/{total_steps})</div>',
155+
unsafe_allow_html=True,
156+
)
157+
next_step, output = agent.run_step_by_step()
158+
if not output:
159+
current_step = "FINALIZED"
160+
else:
161+
outputs.append(output)
162+
current_step = next_step
163+
step_counter += 1
164+
progress_bar.progress(min(step_counter / max(total_steps, 1), 1.0))
165+
166+
progress_bar.progress(1.0)
167+
status_placeholder.markdown(
168+
f'<div class="response-meta">Completed {step_counter} of {total_steps} steps.</div>',
169+
unsafe_allow_html=True,
170+
)
171+
172+
def find_report(objs):
173+
for o in objs:
174+
for v in o.values():
175+
if isinstance(v, dict) and "reports" in v:
176+
return v["reports"]
177+
return None
178+
179+
report_list = find_report(outputs) or []
180+
if report_list:
181+
try:
182+
categories = json.loads(report_list[0])
183+
st.markdown("### Report")
184+
display_category(categories)
185+
except json.JSONDecodeError:
186+
st.error("Report is not valid JSON.")
187+
188+
def find_summaries(objs):
189+
for o in objs:
190+
if "summarize" in o and "messages_info" in o["summarize"]:
191+
return o["summarize"]["messages_info"]
192+
return []
193+
194+
summaries = find_summaries(outputs)
195+
try:
196+
df = pd.DataFrame([s if isinstance(s, dict) else s.dict() for s in summaries])
197+
except Exception:
198+
df = pd.DataFrame()
199+
200+
if not df.empty:
201+
st.markdown("### Sentiment Overview")
202+
display_sentiment(df)
203+
204+
st.session_state.flow_completed = True
205+
206+
# Footer
12207
st.markdown(
13208
"""
14-
This is a demo!
15-
"""
16-
)
209+
<div class="oracle-footer">
210+
© Oracle Corporation | Technology Engineering | OCI Generative AI Services
211+
</div>
212+
""",
213+
unsafe_allow_html=True,
214+
)
215+
216+
st.markdown("</div>", unsafe_allow_html=True)
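
The step loop in `app.py` can be tried without Streamlit or the OCI backend. The sketch below is an assumption-laden stand-in: `FakeAgent` is a hypothetical replacement for `FeedbackAgentWrapper` that only imitates the `(next_step, output)` return shape of `run_step_by_step`, and `progress_fraction` reproduces the clamping used for the progress bar.

```python
def progress_fraction(completed_steps: int, total_steps: int) -> float:
    """Clamp progress to at most 1.0 and avoid division by zero."""
    return min(completed_steps / max(total_steps, 1), 1.0)


class FakeAgent:
    """Hypothetical stand-in for FeedbackAgentWrapper; replays fixed steps."""

    def __init__(self, steps):
        self._pending = list(steps)

    def run_step_by_step(self):
        # Mimic the assumed contract: an empty output means nothing is left.
        if not self._pending:
            return "FINALIZED", None
        step = self._pending.pop(0)
        next_step = self._pending[0] if self._pending else "FINALIZED"
        return next_step, {step: {"done": True}}


def run_all(agent, total_steps):
    """Run every step, collecting outputs and the progress after each one."""
    outputs, fractions = [], []
    completed = 0
    while True:
        next_step, output = agent.run_step_by_step()
        if not output:
            break
        outputs.append(output)
        completed += 1
        fractions.append(progress_fraction(completed, total_steps))
        if next_step == "FINALIZED":
            break
    return outputs, fractions
```

Calling `run_all(FakeAgent(["summarize", "categorize", "report"]), 3)` walks three steps and ends with a progress value of 1.0. The step names other than `summarize` are illustrative only.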
