<!-- markdownlint-disable MD024 -->
<!-- Disabled MD024 (Multiple headings with the same content) rule
because repeated headings (Summary, Action Items) are intentionally used
across multiple sections for structural clarity.
-->
# Milestone 3 Meeting Minutes

## Meeting 14

**Date:** November 7, 2025 (Friday, 3:00 PM EST)
**Attendees:** Amro, Aseel, Caesar, Safia, Banu

### Summary

- **Amro** presented the latest progress on integrating **Mistral 7B** with
  **RAG**. The model showed improved accuracy and more contextually aligned
  responses. Latency increased (from 64 to 152), but the increase was
  considered acceptable given the model’s size and the team’s limited compute.
- **Caesar** tested **CodeCarbon** and the **Emissions Tracker** and reported
  inconsistent results. Amro suggested trying an **offline version of
  CodeCarbon**.
- **Safia** continued experiments with **SLM (TinyLlama) + RAG** and will begin
  testing the **unified test prompts** next.
- Based on **Evan’s feedback**, the team decided to **evaluate outputs rather
  than models**. **Python evaluation libraries** produced weak results, so the
  team will use a **hybrid evaluation**: AI-based plus human-based.
- The previously discussed **Google Form** idea received positive feedback from
  Evan and will serve as a **human-based evaluation tool**.
- The **initial Google Form structure** was outlined:
  - An intro section with a short project description.
  - Subsequent sections will show **model-generated texts** (open-source and
    commercial) in **random order**.
  - Participants will **guess the source** of each text and rate its clarity,
    relevance, and accuracy on a **1–5 scale**.
  - Per Evan’s suggestion, the **target audience** should be **diverse and
    multicultural**, so the form will be shared in the **cohort group**.
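
The random ordering and the 1–5 ratings could be handled with a short script. This is a minimal sketch and not the team's tooling. It assumes the form responses have already been exported into Python dicts. The field names (`source`, `guess`, `clarity`, `relevance`, `accuracy`) and the helper names `shuffled_outputs` and `summarize` are hypothetical.

```python
import random
from statistics import mean

# Rating criteria from the planned form; each is scored on a 1-5 scale.
CRITERIA = ("clarity", "relevance", "accuracy")


def shuffled_outputs(outputs, seed):
    """Return a copy of the outputs in a reproducible random order.

    A fixed seed lets the team regenerate the exact order used in the form.
    """
    order = list(outputs)
    random.Random(seed).shuffle(order)
    return order


def summarize(responses):
    """Average each criterion per true source and measure how often
    participants guessed the source correctly.

    Each response is a hypothetical dict such as:
    {"source": "open-source", "guess": "commercial",
     "clarity": 4, "relevance": 5, "accuracy": 3}
    """
    by_source = {}
    for r in responses:
        for c in CRITERIA:
            if not 1 <= r[c] <= 5:
                raise ValueError(f"{c} score out of range: {r[c]}")
        by_source.setdefault(r["source"], []).append(r)

    summary = {}
    for source, rows in by_source.items():
        stats = {c: mean(row[c] for row in rows) for c in CRITERIA}
        stats["guess_accuracy"] = mean(
            1.0 if row["guess"] == row["source"] else 0.0 for row in rows
        )
        summary[source] = stats
    return summary


if __name__ == "__main__":
    sample = [
        {"source": "open-source", "guess": "commercial",
         "clarity": 4, "relevance": 5, "accuracy": 3},
        {"source": "open-source", "guess": "open-source",
         "clarity": 2, "relevance": 3, "accuracy": 3},
        {"source": "commercial", "guess": "commercial",
         "clarity": 5, "relevance": 4, "accuracy": 4},
    ]
    print(summarize(sample)["open-source"])
    # {'clarity': 3, 'relevance': 4, 'accuracy': 3, 'guess_accuracy': 0.5}
```

Seeding the shuffle keeps the "random order" reproducible, so the order shown to participants can be recovered during the manual analysis.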

### Action Items

- The team aims to **meet again tomorrow**, depending on availability.
- The **recursive model approach** will be **re-evaluated**.
- **README files** will be prepared for all selected models in the repository.
- Work on the **Google Form** will begin, targeting a **November 15**
  publication date.

### Future Work

- Once published, the form will remain open for **two weeks** for response
  collection and analysis.
- In the **first week of December (final week of ELO2)**, the data will be
  analyzed manually and used to write the **research article**.

---

## Meeting 15

**Date:** November 9, 2025 (Sunday, 2:30 PM EST)
**Attendees:** Amro, Aseel, Banu, Caesar, Reem, Safia

### Summary

- Due to scheduling conflicts, the meeting planned for Saturday was held today.
- A brief recap of Meeting 14 was provided.
- The team reviewed the general work plan and discussed the three models under
  testing.
- Caesar reported that the CodeCarbon issue was resolved using Amro’s earlier
  suggestion. However, his distilled model initially failed on creative tasks.
- Amro recommended refining the prompts. Guidance prompts, temperature, and
  other parameters were adjusted live during the meeting. After these changes,
  Caesar’s model improved and produced correct outputs for two of the three
  creative tasks.
- Amro shared that Mistral 7B also improved on creative tasks after adopting
  the new guidance prompts.
- Reem presented her proposal to turn the recursive reasoning method into a
  **recursive editing** approach. The process involves:
  - One model generating a draft.
  - A feedback-provider model reviewing it.
  - A refinement cycle that combines the draft and the feedback.
  - Iteration until quality is sufficient or an iteration limit is reached.
  - Evaluation criteria that vary by task.
- The current model lineup:
  - **Caesar:** LaMini (distilled open-source) + RAG
  - **Amro:** Mistral 7B + RAG
  - **Safia:** TinyLlama (SLM) + RAG
- The team agreed to apply the recursive editing framework to Safia’s model as
  the **fourth setup**.
- Human evaluation plans were discussed, focusing on participant-based
  assessment of accuracy and quality. Final testing will be completed this week.
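
The recursive editing loop in Reem's proposal can be outlined in code. This is a hypothetical sketch, not her implementation. `generate`, `critique`, and `refine` stand in for calls to the drafting and feedback models. The numeric score, `threshold`, and `max_rounds` are assumed ways to express "quality is sufficient" and "an iteration limit is reached". Real evaluation criteria would vary by task.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures for the model calls (not a real library API):
#   generate(task) -> draft
#   critique(task, draft) -> (feedback, score), where a higher score is better
#   refine(task, draft, feedback) -> improved draft
Generate = Callable[[str], str]
Critique = Callable[[str, str], Tuple[str, float]]
Refine = Callable[[str, str, str], str]


def recursive_edit(
    task: str,
    generate: Generate,
    critique: Critique,
    refine: Refine,
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[str, List[float]]:
    """Draft once, then alternate feedback and refinement.

    Stops when a draft's score reaches `threshold` or after `max_rounds`
    refinements. Every draft, including the last one, is scored, and the
    score history is returned alongside the final draft.
    """
    draft = generate(task)
    scores: List[float] = []
    for round_no in range(max_rounds + 1):
        feedback, score = critique(task, draft)
        scores.append(score)
        if score >= threshold or round_no == max_rounds:
            break
        draft = refine(task, draft, feedback)
    return draft, scores


if __name__ == "__main__":
    # Toy stand-ins: each refinement appends "+", and the critic scores
    # a draft by how many refinements it has received.
    final, history = recursive_edit(
        "write a haiku",
        generate=lambda task: "v0",
        critique=lambda task, d: ("add detail", 0.25 * d.count("+")),
        refine=lambda task, d, fb: d + "+",
        threshold=0.5,
    )
    print(final, history)  # v0++ [0.0, 0.25, 0.5]
```

Keeping the model calls as plain callables means the same loop could wrap any of the four setups. Only the stand-in functions would need to change.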

### Action Items

- Reem, Aseel, and Banu will implement **recursive editing** on the small model
  (TinyLlama + RAG + Recursive Editing).
- Caesar will apply recursive editing to the distilled model.
- Amro will test the method on Mistral.
- All members will prepare outputs and documentation before **Friday,
  November 14**.
- Friday’s meeting will review results and prepare the **Google Form**, which
  will be finalized on **Saturday, November 15**.

---

## Meeting 16

**Date:** November 14, 2025 (Friday, 2:30 PM EST)
**Attendees:** Amro, Aseel, Banu, Caesar, Reem, Safia

### Summary

- After asynchronous discussions, the team determined that **TinyLlama is not
  suitable** for recursive editing. The team switched to **Microsoft’s Phi-3**,
  but it had been removed from Hugging Face, so new SLMs were selected:
  **Reem → Qwen (Alibaba)**, **Aseel → Google Gemma**.
- **Caesar** reported **token limitation issues** during recursive editing.
- **Reem** shared updates on her recursive cycle experiments and on issues
  related to **context window tuning** and **token allocation**.
- **Amro** proposed using **specialized small models** for focused tasks
  (critique and refinement) instead of relying on large models. The user could
  handle retrieval and initial generation, while a small model refines the
  text.
- Technical discussions covered context window tuning, max-token adjustments,
  and restoring capacity by splitting loops across notebook cells.
- The team reaffirmed that **GPUs significantly outperform CPUs** for these
  workloads.
- The team began discussing the **Google Form methodology** and decided to use
  **vague model descriptions** to avoid bias.
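
The token-allocation issues above come down to a budget. For a typical decoder-only model, the prompt tokens plus the tokens reserved for generation (the max-token setting) must fit inside the context window. The sketch below keeps as many retrieved chunks as fit after reserving room for the answer. It rests on several assumptions. Chunks are assumed to be ranked best-first. Tokens used by the prompt template itself are ignored. `fit_chunks` and `count_tokens` are hypothetical names, and a real setup would replace `count_tokens` with the model's own tokenizer. In recursive editing the prompt grows each cycle, so the same check would have to be repeated every round.

```python
from typing import Callable, List, Tuple


def fit_chunks(
    question: str,
    chunks: List[str],
    count_tokens: Callable[[str], int],
    context_window: int,
    max_new_tokens: int,
) -> Tuple[List[str], int]:
    """Keep the longest prefix of ranked chunks that fits the token budget.

    Budget = context window - tokens reserved for generation - question.
    Returns the kept chunks and the number of tokens left unused.
    """
    budget = context_window - max_new_tokens - count_tokens(question)
    if budget < 0:
        raise ValueError("question plus generation reserve exceeds the context window")
    kept: List[str] = []
    for chunk in chunks:
        cost = count_tokens(chunk)
        if cost > budget:
            break  # stop at the first chunk that does not fit, preserving rank order
        kept.append(chunk)
        budget -= cost
    return kept, budget


if __name__ == "__main__":
    # Crude whitespace "tokenizer", for illustration only.
    words = lambda text: len(text.split())
    kept, left = fit_chunks(
        "what is rag",
        ["a b c d", "e f g", "h i j k l"],
        count_tokens=words,
        context_window=20,
        max_new_tokens=10,
    )
    print(kept, left)  # ['a b c d', 'e f g'] 0
```

Lowering `max_new_tokens` frees room for more context, and raising it shrinks that room. This is the trade-off behind the max-token adjustments discussed in the meeting.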

### Action Items

- The team will **meet again tomorrow at 2:30 PM EST** to finalize remaining
  work.
- **Safia and Banu** will develop the **Google Form** after the meeting and send
  it to Evan by Monday.
- **Reem and Aseel** will continue working on their models.
- All members must finalize **implementations and documentation**.

---

## Meeting 17

**Date:** November 15, 2025 (Saturday, 2:30 PM EST)
**Attendees:** Amro, Aseel, Caesar, Reem, Banu

### Summary

- **Aseel** confirmed she pushed her model and related work earlier today.
- **Reem** reported that she is finalizing outputs for upload.
- **Amro** improved his model documentation for clarity and organization.
- The team discussed the **structure of the Google Form**, focusing on whether
  to include only **open-source model outputs** or also **commercial outputs**.
  The group will seek **Evan’s guidance** before finalizing the structure.

### Action Items

- **Create the first Google Form draft tomorrow** and prepare to present it on
  Monday.
- **Publish the final form on Monday** after incorporating Evan’s feedback.
- **Finalize all model documentation** and begin reorganizing the repository.