Commit c1538c5

feat(meetings): add Milestone 3 meeting minutes
1 parent e6386f1 commit c1538c5

1 file changed

Lines changed: 155 additions & 0 deletions

@@ -0,0 +1,155 @@
<!-- markdownlint-disable MD024 -->
<!-- Disabled MD024 (Multiple headings with the same content) rule
because repeated headings (Summary, Action Items) are intentionally used
across multiple sections for structural clarity.
-->
# Milestone 3 Meeting Minutes

## Meeting 14

**Date:** November 7, 2025 (Friday, 3:00 PM EST)
**Attendees:** Amro, Aseel, Caesar, Safia, Banu

### Summary

- **Amro** presented the latest progress on his work integrating **Mistral 7B**
  with **RAG**. The model showed improved accuracy and more contextually aligned
  responses. Although latency increased from 64 to 152, this was considered
  acceptable given the model’s size and the team’s limited compute.
- **Caesar** tested **CodeCarbon** and the **Emissions Tracker** and reported
  inconsistent results. Amro suggested trying an **offline version of
  CodeCarbon**.
- **Safia** continued experiments with **SLM (TinyLlama) + RAG** and will begin
  testing the **unified test prompts** next.
- Based on **Evan’s feedback**, the team decided to **evaluate outputs rather
  than models**. **Python evaluation libraries** produced weak results, so the
  team will use a **hybrid evaluation** that combines AI-based and human-based
  assessment.
- The previously discussed **Google Form** idea received positive feedback from
  Evan and will serve as the **human-based evaluation tool**.
- The **initial Google Form structure** was outlined:
  - An intro section with a short project description.
  - Subsequent sections will show **model-generated texts** (open-source and
    commercial) in **random order**.
  - For each text, participants will **guess the source** and rate clarity,
    relevance, and accuracy on a **1–5 scale**.
- Per Evan’s suggestion, the **target audience** should be **diverse and
  multicultural**, so the form will be shared in the **cohort group**.

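The form structure outlined above (texts shown in random order, a source guess, and 1–5 ratings for clarity, relevance, and accuracy) implies a small amount of bookkeeping. The sketch below is illustrative only: the names `Output`, `build_form_order`, and `summarize`, the `"open-source"`/`"commercial"` labels, and the response format are all hypothetical, the real form lives in Google Forms, and the minutes state that the final analysis will be done manually. It shows a reproducible shuffle that keeps sources hidden and a per-source summary of guess accuracy and mean ratings.

```python
import random
from dataclasses import dataclass
from statistics import mean

# Rating criteria from the minutes; each is scored on a 1-5 scale.
CRITERIA = ("clarity", "relevance", "accuracy")


@dataclass(frozen=True)
class Output:
    output_id: str  # hypothetical anonymous label shown on the form, e.g. "A"
    source: str     # true source ("open-source" or "commercial"), hidden from participants
    text: str


def build_form_order(outputs, seed=0):
    """Return the outputs in a reproducible random order for the form."""
    order = list(outputs)  # copy so the caller's list is not shuffled
    random.Random(seed).shuffle(order)
    return order


def summarize(outputs, responses):
    """Aggregate responses per true source.

    Each response is a dict such as
    {"output_id": "A", "guess": "commercial", "clarity": 4, "relevance": 5, "accuracy": 3}.
    Returns {source: {"n": ..., "guess_accuracy": ..., <criterion>: mean rating}}.
    """
    by_id = {o.output_id: o for o in outputs}
    grouped = {}
    for r in responses:
        for c in CRITERIA:
            if not 1 <= r[c] <= 5:
                raise ValueError(f"{c} rating {r[c]} is outside the 1-5 scale")
        grouped.setdefault(by_id[r["output_id"]].source, []).append(r)

    summary = {}
    for source, rs in grouped.items():
        row = {
            "n": len(rs),
            # Fraction of participants who correctly guessed the source.
            "guess_accuracy": sum(r["guess"] == source for r in rs) / len(rs),
        }
        for c in CRITERIA:
            row[c] = mean(r[c] for r in rs)
        summary[source] = row
    return summary


if __name__ == "__main__":
    # Toy data for illustration only; not real survey responses.
    outputs = [Output("A", "open-source", "..."), Output("B", "commercial", "...")]
    responses = [
        {"output_id": "A", "guess": "open-source", "clarity": 4, "relevance": 5, "accuracy": 3},
        {"output_id": "A", "guess": "commercial", "clarity": 2, "relevance": 3, "accuracy": 3},
        {"output_id": "B", "guess": "commercial", "clarity": 5, "relevance": 4, "accuracy": 4},
    ]
    for o in build_form_order(outputs, seed=42):
        print(o.output_id, o.text)
    print(summarize(outputs, responses))
```

A fixed seed keeps the shuffled order identical across form revisions, which makes it easier to map anonymous labels back to sources during analysis.
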
### Action Items

- The team aims to **meet again tomorrow**, depending on availability.
- The **recursive model approach** will be **re-evaluated**.
- **README files** will be prepared for all selected models in the repository.
- Work on the **Google Form** will begin, targeting a **November 15**
  publication date.

### Future Work

- Once published, the form will remain open for **two weeks** for response
  collection and analysis.
- In the **first week of December (final week of ELO2)**, the data will be
  analyzed manually and used to write the **research article**.

---

## Meeting 15

**Date:** November 9, 2025 (Sunday, 2:30 PM EST)
**Attendees:** Amro, Aseel, Banu, Caesar, Reem, Safia

### Summary

- Due to scheduling conflicts, the meeting planned for Saturday was held on
  Sunday instead.
- A brief recap of Meeting 14 was provided.
- The team reviewed the general work plan and discussed the three models under
  testing.
- Caesar reported that the CodeCarbon issue was resolved using Amro’s earlier
  suggestion (the offline version of CodeCarbon). His distilled model initially
  failed on creative tasks.
- Amro recommended refining the prompts. After the guidance prompts,
  temperature, and other parameters were adjusted live during the meeting,
  Caesar’s model improved and produced two correct creative outputs out of
  three.
- Amro shared that Mistral 7B also improved on creative tasks after adopting
  the new guidance prompts.
- Reem presented her proposal to adapt the recursive reasoning method into a
  **recursive editing** approach. The process involves:
  - One model generating a draft.
  - A feedback-provider model reviewing the draft.
  - A refinement cycle that combines the draft and the feedback.
  - Iteration until quality is high enough or an iteration limit is reached.
  - Evaluation criteria that vary by task.
- The current model lineup:
  - **Caesar:** LaMini (distilled open-source) + RAG
  - **Amro:** Mistral 7B + RAG
  - **Safia:** TinyLlama (SLM) + RAG
- The team agreed to apply the recursive editing framework to Safia’s model as
  the **fourth setup**.
- Human evaluation plans were discussed, focusing on participant-based
  assessment of accuracy and quality. Final testing will be completed this
  week.

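The recursive editing process Reem proposed (draft, feedback, refinement, repeat until quality is high enough or a limit is reached) can be sketched as a loop. This is a hedged illustration rather than the team’s notebook code: the role callables, the numeric `score` function, `threshold`, and `max_iters` are hypothetical stand-ins, and because evaluation criteria vary by task, `score` would be task-specific in practice.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical role signatures: in practice each callable would wrap a model call.
Generate = Callable[[str], str]          # task -> first draft
Critique = Callable[[str, str], str]     # (task, draft) -> feedback
Refine = Callable[[str, str, str], str]  # (task, draft, feedback) -> revised draft
Score = Callable[[str, str], float]      # (task, draft) -> quality in [0, 1]


@dataclass
class EditResult:
    text: str
    score: float
    iterations: int
    history: List[str] = field(default_factory=list)


def recursive_edit(task: str, generate: Generate, critique: Critique,
                   refine: Refine, score: Score,
                   threshold: float = 0.8, max_iters: int = 3) -> EditResult:
    """Draft, critique, refine; repeat until score >= threshold or max_iters cycles."""
    draft = generate(task)
    history = [draft]
    quality = score(task, draft)
    iterations = 0
    while quality < threshold and iterations < max_iters:
        feedback = critique(task, draft)       # feedback-provider model
        draft = refine(task, draft, feedback)  # refinement combines draft + feedback
        history.append(draft)
        quality = score(task, draft)           # task-specific criterion
        iterations += 1
    return EditResult(draft, quality, iterations, history)


if __name__ == "__main__":
    # Toy stand-ins instead of real models: each refinement appends "+".
    result = recursive_edit(
        "write a haiku",
        generate=lambda task: "draft",
        critique=lambda task, d: "add detail",
        refine=lambda task, d, fb: d + "+",
        score=lambda task, d: min(1.0, d.count("+") / 2),
    )
    print(result.text, result.score, result.iterations)  # prints: draft++ 1.0 2
```

Keeping `max_iters` small bounds both latency and the prompt growth of repeated cycles, and the `history` list makes it easy to compare drafts across iterations.
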
### Action Items

- Reem, Aseel, and Banu will implement **recursive editing** on the small model
  (TinyLlama + RAG + Recursive Editing).
- Caesar will apply recursive editing to the distilled model.
- Amro will test the method on Mistral.
- All members will prepare outputs and documentation before **Friday,
  November 14**.
- Friday’s meeting will review results and prepare the **Google Form**, which
  will be finalized on **Saturday, November 15**.

---

## Meeting 16

**Date:** November 14, 2025 (Friday, 2:30 PM EST)
**Attendees:** Amro, Aseel, Banu, Caesar, Reem, Safia

### Summary

- After asynchronous discussions, the team determined that **TinyLlama is not
  suitable** for recursive editing. The team then switched to **Microsoft’s
  Phi-3**, but it had been removed from Hugging Face, so new SLMs were selected:
  **Reem → Qwen (Alibaba)**, **Aseel → Google Gemma**.
- **Caesar** reported **token limitation issues** during recursive editing.
- **Reem** shared updates on her recursive-cycle experiments and on issues
  related to **context window tuning** and **token allocation**.
- **Amro** proposed using **specialized small models** for focused tasks
  (critique and refinement) instead of relying on large models: the user could
  handle retrieval and initial generation, while a small model refines the
  text.
- Technical discussions covered context window tuning, max-token adjustments,
  and restoring capacity by splitting loops across notebook cells.
- The team reaffirmed that **GPUs significantly outperform CPUs** for running
  these models.
- The team began discussing the **Google Form methodology** and decided to use
  deliberately **vague model descriptions** to avoid bias.

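The token-allocation issues above come down to one constraint: the prompt tokens plus the tokens to be generated (`max_new_tokens`) must fit in the model’s context window, and in recursive editing the prompt grows each cycle because it carries the draft and the feedback. The sketch below shows one way to respect that budget. It is an illustration under stated assumptions, not the team’s implementation: the whitespace "tokenizer", the prompt layout, and the choice to truncate the draft (rather than the feedback) are all hypothetical.

```python
def count_tokens(text: str) -> int:
    """Stand-in tokenizer (assumption): whitespace split. Real models use
    subword tokenizers, which usually report more tokens for the same text."""
    return len(text.split())


def fit_refine_prompt(task: str, draft: str, feedback: str,
                      context_window: int, max_new_tokens: int) -> str:
    """Build a refinement prompt that leaves room for `max_new_tokens`.

    Task and feedback are kept whole; if the draft does not fit, only its
    leading words are kept (one possible policy among several).
    """
    header = f"Task: {task}\nFeedback: {feedback}\nDraft:"
    # Tokens left for the draft after reserving space for the header and output.
    budget = context_window - max_new_tokens - count_tokens(header)
    if budget <= 0:
        raise ValueError("task and feedback alone exceed the token budget; "
                         "lower max_new_tokens or shorten the feedback")
    words = draft.split()
    return header + " " + " ".join(words[:budget])


if __name__ == "__main__":
    prompt = fit_refine_prompt("Summarize", "one two three four five six seven eight",
                               "Be concise", context_window=20, max_new_tokens=8)
    print(count_tokens(prompt))  # prints 12, leaving exactly 8 tokens for generation
```

In a recursive loop, a helper like this would run before each refinement call, with the same `max_new_tokens` passed to the generation step so the two numbers stay consistent.
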
### Action Items

- The team will **meet again tomorrow at 2:30 PM EST** to finalize remaining
  work.
- **Safia and Banu** will develop the **Google Form** after the meeting and send
  it to Evan by Monday.
- **Reem and Aseel** will continue working on their models.
- All members must finalize **implementations and documentation**.

---

## Meeting 17

**Date:** November 15, 2025 (Saturday, 2:30 PM EST)
**Attendees:** Amro, Aseel, Caesar, Reem, Banu

### Summary

- **Aseel** confirmed that she had pushed her model and related work earlier
  today.
- **Reem** reported that she is finalizing outputs for upload.
- **Amro** improved his model documentation for clarity and organization.
- The team discussed the **structure of the Google Form**, focusing on whether
  to include only **open-source model outputs** or also **commercial outputs**.
  The group will seek **Evan’s guidance** before finalizing the structure.

### Action Items

- **Create the first Google Form draft tomorrow** and prepare to present it on
  Monday.
- **Publish the final form on Monday** after incorporating Evan’s feedback.
- **Finalize all model documentation** and begin reorganizing the repository.
