Meeting 14
Date: November 7, 2025 (Friday, 3:00 PM EST)
Attendees: Amro, Aseel, Caesar, Safia, Banu
Summary
- Amro presented the latest progress on his work with Mistral 7B integrated with RAG. The model demonstrated significantly improved accuracy and more contextually appropriate responses. Although latency increased (64 → 152), this was deemed acceptable given the model’s large size and the team’s limited computational resources.
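The RAG wiring around a generator such as Mistral 7B can be sketched as follows. This is a minimal illustration, not the team's implementation: the keyword-overlap retriever is a toy stand-in for a real embedding-based retriever, and the prompt template is an assumption.

```python
def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(
        corpus,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(question: str, corpus: list[str]) -> str:
    """Pack the top-k retrieved passages into the prompt sent to the model."""
    context = "\n".join(retrieve(question, corpus))
    return (
        "Use the context to answer.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting prompt string would then be passed to the generator; grounding the model in retrieved context is what drives the accuracy gains noted above.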
- Caesar tested CodeCarbon and its Emissions Tracker on his model and briefed the group on his findings: the results were not always consistent. Amro suggested trying CodeCarbon’s offline mode to address the issue.
- Safia continued her experiments with SLM (TinyLlama) + RAG and noted that she would begin testing the unified test prompts next.
- Based on Evan’s feedback, the team decided to evaluate outputs rather than models directly. It was noted by Amro that existing Python-based evaluation libraries yielded unsatisfactory results. Consequently, the group agreed to adopt a hybrid evaluation approach, combining AI-based and human-based methods.
- The previously discussed Google Form idea received positive feedback from Evan, who considered it a strong and valuable experimental component of the project. The form will also serve as a human-based evaluation tool.
- The initial structure of the Google Form was outlined:
- The first section will include a brief project introduction and general information.
- Subsequent sections will present texts generated by both open-source and commercial models, displayed in random order.
- Participants will be asked to guess the source of each text and rate it on several criteria, such as clarity, relevance, and accuracy, using a 1–5 scale.
- Following Evan’s suggestion, the target audience should be diverse and multicultural, so the form will be shared within the cohort group once finalized.
Action Items
- The team aims to meet again tomorrow, subject to everyone’s availability.
- The recursive model approach will be re-evaluated.
- README files will be prepared for all selected models in the repository, providing clear information about each model’s architecture, functionality, and key characteristics.
- Work on the Google Form will begin, with a target publication date of November 15.
Future Work
- Once published, the form will remain open for two weeks, during which responses will be collected and analyzed.
- In the first week of December (the final week of ELO2), the collected data will be manually analyzed, and the team will write the research article based on the findings.
Meeting 15
Date: November 9, 2025 (Sunday, 2:30 PM EST)
Attendees: Amro, Aseel, Banu, Caesar, Reem, Safia
Summary
- Due to members’ schedule constraints, the meeting originally planned for Saturday was held today instead.
- A brief recap of Meeting 14 was provided.
- The team reviewed the overall working plan and discussed the status of the three models currently under testing.
- Caesar reported that the CodeCarbon issue was resolved after applying Amro’s earlier suggestion. However, his distilled model was unable to generate meaningful outputs for the creative thinking task at first.
- Amro suggested that Caesar adjust and refine the prompts used with the model. After modifying the guidance prompt, temperature, and a few other parameters during the meeting, Caesar’s model showed improved generation results, producing two correct responses out of three creative questions in real time.
- Amro shared that his Mistral 7B model, which was already performing well overall, started producing strong results for the creative generation task after implementing the new guidance prompts.
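The kind of in-meeting fix described above amounts to tightening the guidance prompt and the sampling parameters. A sketch, assuming a Hugging Face-style generation API; the prompt wording and parameter values here are illustrative assumptions, not the ones the team used:

```python
# Hypothetical guidance (system) prompt for the creative-thinking task.
GUIDANCE_PROMPT = (
    "You are a creative writing assistant. Answer the question directly, "
    "in 3-5 sentences, without repeating the question."
)

# Illustrative sampling parameters of the kind adjusted during the meeting.
GENERATION_PARAMS = {
    "temperature": 0.8,     # higher -> more varied wording for creative tasks
    "top_p": 0.9,           # nucleus sampling keeps the output coherent
    "max_new_tokens": 256,  # enough room for a full creative answer
    "do_sample": True,      # required for temperature/top_p to take effect
}

def build_prompt(question: str) -> str:
    """Prepend the guidance prompt to the user question."""
    return f"{GUIDANCE_PROMPT}\n\nQuestion: {question}\nAnswer:"
```

With a Hugging Face model, `GENERATION_PARAMS` would be passed as keyword arguments to `model.generate(...)`.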
- Reem presented her proposal to modify the existing recursive reasoning method into a recursive editing approach. She summarized possible directions for its integration, explaining that the process would involve:
- One model generating an initial draft.
- A feedback provider model offering feedback on the draft.
- A refinement cycle combining the initial output and feedback to produce an improved version.
- The cycle repeating iteratively until a high-quality output is achieved or a set number of iterations is reached.
- Evaluation criteria varying depending on the specific task and its metrics.
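The cycle described above can be sketched as a simple loop. The callables `generate`, `give_feedback`, `refine`, and `is_high_quality` are hypothetical stand-ins for the drafting model, the feedback-provider model, the refinement step, and the task-specific evaluation criteria:

```python
def recursive_edit(prompt, generate, give_feedback, refine,
                   is_high_quality, max_iters=3):
    """Iteratively refine a draft until it passes evaluation or iterations run out."""
    draft = generate(prompt)              # one model produces the initial draft
    for _ in range(max_iters):
        if is_high_quality(draft):        # stop early once the draft passes
            break
        feedback = give_feedback(draft)   # feedback-provider model critiques it
        draft = refine(draft, feedback)   # combine draft + feedback into a new version
    return draft
```

The same loop works for any of the model setups; only the four callables change per task.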
- The group reviewed the current model lineup:
- Caesar: LaMini (open-source distilled model) + RAG
- Amro: Mistral 7B (open-source) + RAG
- Safia: TinyLlama (open-source SLM) + RAG
- The team agreed to apply the recursive editing framework to Safia’s model as the fourth setup for further testing and evaluation.
- The team also discussed the approach for human evaluation, focusing on assessing the accuracy and quality of outputs based on participants’ opinions. It was agreed to complete final testing during the week.
Action Items
- Reem, Aseel, and Banu will implement the recursive editing framework on the small language model (TinyLlama + RAG + Recursive Editing).
- Caesar will apply the same recursive editing methodology to the distilled model.
- Amro will test the approach on the Mistral model setup.
- All members will prepare model outputs and documentation by the next meeting on Friday, November 14.
- The agenda for Friday’s meeting will focus on reviewing results and preparing the Google Form, which will be finalized on Saturday, November 15, to collect structured human feedback on model outputs.
Meeting 16
Date: November 14, 2025 (Friday, 2:30 PM EST)
Attendees: Amro, Aseel, Banu, Caesar, Reem, Safia
Summary
- Following the last meeting, asynchronous work and discussion via WhatsApp determined that the TinyLlama model was not suitable for recursive editing. The team decided to switch to Microsoft's Phi-3 model, but shortly afterward the model was removed from Hugging Face and became unavailable. The search for suitable replacements led to two new SLMs being confirmed: Reem is working on Alibaba's Qwen model, while Aseel is implementing Google's Gemma model.
- Caesar reported facing token limitation challenges when implementing recursive editing on his model.
- Reem presented updates on her recursive cycle experimentation and discussed the technical problems she encountered, particularly regarding context window tuning and token allocation across generation, feedback, and refinement stages.
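One way to frame the token-allocation problem: the model's context window must be divided between the current draft, the feedback text, and headroom for the refined output, after reserving space for the fixed instructions. A sketch with illustrative fractions; the split ratios are assumptions to tune per model, not values from the experiments:

```python
def split_token_budget(context_window: int, overhead_tokens: int,
                       draft_frac: float = 0.45, feedback_frac: float = 0.2):
    """Split the usable context among draft, feedback, and refined output.

    `overhead_tokens` covers the fixed guidance/instruction prompt;
    the remainder after draft and feedback goes to the refinement stage,
    so no tokens are lost to rounding.
    """
    usable = context_window - overhead_tokens
    if usable <= 0:
        raise ValueError("overhead exceeds the context window")
    draft = int(usable * draft_frac)
    feedback = int(usable * feedback_frac)
    refine = usable - draft - feedback  # remainder for the refined output
    return {"draft": draft, "feedback": feedback, "refine": refine}
```

Making the refinement share the remainder (rather than a third fraction) guarantees the three stages never exceed the window.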
- Amro proposed a strategic approach: using different specialized small models for specific tasks (such as text criticism and refinement) rather than relying on large generalized models, in line with current AI development trends. He suggested offloading the retrieval and initial generation tasks to users (allowing them to copy and paste from documents), then using a small model specifically to criticize and refine the generated text, similar to writing an email and asking ChatGPT to review and fix it. This approach acknowledges that small models lack the capacity for full answer generation but could excel at focused refinement tasks.
- Additional technical discussions covered context window tuning, max token adjustments, and restoring model capacity by separating loops into different cells.
- The group reaffirmed that GPUs remain significantly more effective than CPUs for model execution and testing.
- Discussion of the Google Form methodology began: the team decided to use vague model descriptions in the evaluation form to prevent participant bias during assessment.
Action Items
- The team will reconvene tomorrow at 2:30 PM EST to finalize the work and clarify remaining tasks, specifically regarding the Google Form.
- Safia and Banu will work on the Google Form after tomorrow’s meeting. It must be finalized and sent to Evan for review by Monday, before posting in public groups.
- Reem and Aseel will continue working on their respective models.
- All members to finalize model implementations and documentation.
Meeting 17
Date: November 15, 2025 (Saturday, 2:30 PM EST)
Attendees: Amro, Aseel, Caesar, Reem, Banu
Summary
- Aseel confirmed that she pushed her model and related work to the repository earlier in the day.
- Reem reported that she is in the process of finalizing her model outputs and preparing them for upload.
- Amro noted that he made improvements to the documentation for his model, refining clarity and organization.
- The team discussed the structure of the Google Form. Key points included:
- Whether all texts in the form should come exclusively from the open-source models the team has been working on,
- or whether the form should also include commercial model outputs, presented in a mixed format.
- The group agreed to seek Evan’s guidance to determine the most appropriate design. The final form structure will be shaped according to his recommendation.
Action Items
- Create the first Google Form draft tomorrow and prepare to present it on Monday.
- Finalize and publish the form on Monday, after incorporating Evan’s feedback.
- Finalize the model documentation in the shortest time possible and initiate the overall repository restructuring.