|
| 1 | +<!-- markdownlint-disable MD024 MD013 --> |
| 2 | +<!-- Disabled MD024 (Multiple headings with the same content) rule |
| 3 | +because repeated headings (Summary, Action Items) are |
| 4 | +intentionally used across multiple sections for structural clarity. |
| 5 | +Disabled MD013 (Line length) rule because mathematical formulas |
| 6 | +and technical content require longer lines for readability. --> |
| 7 | + |
| 8 | +# Milestone 2 Meeting Minutes |
| 9 | + |
| 10 | +## **Meeting 9** |
| 11 | + |
| 12 | +**Date:** October 16, 2025 (Thursday, 2:00 PM EST) |
| 13 | + |
| 14 | +**Attendees:** Amro, Aseel, Caesar, Safia |
| 15 | + |
| 16 | +### **Summary** |
| 17 | + |
| 18 | +- The team decided to change the project approach due to limited access to environmental data (energy, carbon, and water consumption) for commercial AI models such as GPT, Claude, and Gemini. |
| 19 | +- Since large-scale testing requires computational resources beyond the team’s capacity, the new plan focuses on evaluating open-source models using laptop hardware. |
| 20 | +- Results will be compared with published environmental and performance data of commercial models to highlight how open-source AI can provide sustainable and accessible alternatives. |
| 21 | + |
| 22 | +### **Action Items** |
| 23 | + |
| 24 | +1. **Research and calculate environmental cost metrics:** |
| 25 | + - **Energy Consumption:** |
| 26 | + |
| 27 | + $$E_{total} = (P_{GPU} \times U_{GPU} + P_{CPU} \times U_{CPU} + P_{others}) \times t$$ |
| 28 | + |
| 29 | + - **Facility Overhead:** |
| 30 | + |
| 31 | + $$E_{facility} = E_{total} \times \mathrm{PUE}$$ |
| 32 | + |
| 33 | + - **Carbon Footprint:** |
| 34 | + |
| 35 | + $$C_{emissions} = E_{facility} \times \mathrm{CI}$$ |
| 36 | + |
| 37 | + - **Water Footprint:** |
| 38 | + |
| 39 | + $$W_{consumed} = E_{facility} \times \mathrm{WUE}$$ |
| 40 | + |
| 41 | +2. Determine which model sizes laptop hardware can handle (small, medium, and large, up to 3B parameters). |
| 42 | +3. Apply FLOPs-based linear scaling and empirical interpolation to improve result accuracy. |
| 43 | +4. Add all work presented in the previous meeting (model selection, evaluation methodology, environmental metrics) to the **domain study section** of the repository. |
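
The four environmental cost formulas in action item 1 chain together, so they can be sketched as one small Python function. This is only an illustrative sketch: the function and field names are hypothetical, and the units (watts, hours, kWh, kg CO2e per kWh, litres per kWh) are assumptions, since the minutes do not specify them.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    e_total_kwh: float     # E_total: device-level energy
    e_facility_kwh: float  # E_facility: energy including facility overhead
    carbon_kg: float       # C_emissions
    water_l: float         # W_consumed


def estimate_footprint(p_gpu_w, u_gpu, p_cpu_w, u_cpu, p_others_w,
                       hours, pue, ci_kg_per_kwh, wue_l_per_kwh):
    """Apply the four formulas in order (units are assumed, see lead-in)."""
    # E_total = (P_GPU * U_GPU + P_CPU * U_CPU + P_others) * t
    # Power is assumed to be in watts and t in hours, so divide by 1000 for kWh.
    e_total = (p_gpu_w * u_gpu + p_cpu_w * u_cpu + p_others_w) * hours / 1000.0
    # E_facility = E_total * PUE
    e_facility = e_total * pue
    # C_emissions = E_facility * CI
    carbon = e_facility * ci_kg_per_kwh
    # W_consumed = E_facility * WUE
    water = e_facility * wue_l_per_kwh
    return Footprint(e_total, e_facility, carbon, water)


# Made-up example values for a laptop run (not measurements from the project).
example = estimate_footprint(
    p_gpu_w=30, u_gpu=0.5, p_cpu_w=15, u_cpu=0.4, p_others_w=5,
    hours=2, pue=1.0, ci_kg_per_kwh=0.4, wue_l_per_kwh=1.8,
)
```

For a laptop, a PUE of 1.0 (no facility overhead) seems a reasonable starting assumption. The CI and WUE values depend on the local grid and would need to come from a published source.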
| 44 | + |
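The minutes do not define the FLOPs-based linear scaling and empirical interpolation named in action item 3, so the sketch below shows one plausible reading rather than the team's method. It assumes energy grows in proportion to inference FLOPs, and it uses the common rule of thumb of about 2 FLOPs per parameter per token for a dense transformer forward pass. All function names are hypothetical.

```python
def inference_flops(n_params, n_tokens):
    """Rough forward-pass FLOPs: ~2 FLOPs per parameter per token (assumption)."""
    return 2.0 * n_params * n_tokens


def scale_energy_by_flops(measured_kwh, measured_flops, target_flops):
    """Linear scaling: assume energy is proportional to FLOPs."""
    return measured_kwh * (target_flops / measured_flops)


def interpolate_energy(points, target_params):
    """Piecewise-linear interpolation over measured (n_params, kwh) pairs."""
    pts = sorted(points)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= target_params <= x1:
            weight = (target_params - x0) / (x1 - x0)
            return y0 + weight * (y1 - y0)
    raise ValueError("target_params is outside the measured range")


# Made-up numbers: extrapolate a 1B-parameter measurement to a 3B model,
# then interpolate between two measured sizes.
scaled = scale_energy_by_flops(
    measured_kwh=0.001,
    measured_flops=inference_flops(1e9, 100),
    target_flops=inference_flops(3e9, 100),
)
between = interpolate_energy([(1e9, 0.001), (3e9, 0.003)], 2e9)
```

Linear scaling ignores memory bandwidth, batching, and hardware utilization effects. Interpolating between measured sizes is likely more trustworthy than extrapolating beyond them.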
| 45 | +--- |
| 46 | + |
| 47 | +## **Meeting 10** |
| 48 | + |
| 49 | +**Date:** October 19, 2025 (Saturday, 12:00 PM EST) |
| 50 | + |
| 51 | +**Attendees:** Amro, Aseel, Caesar, Banu, Reem |
| 52 | + |
| 53 | +### Summary |
| 54 | + |
| 55 | +- The group discussed options for testing and running AI models. |
| 56 | +- Ideas included running quantized models locally (with some accuracy loss) and using Google Colab for limited runs. |
| 57 | +- Another idea was to use the Hugging Face API for accuracy and RAG testing, though this approach does not allow measuring environmental costs. |
| 58 | +- The team also explored Recursive Reasoning Models as efficient and environmentally friendly alternatives, though task variety for testing remains limited. |
| 59 | + |
| 60 | +### Action Items |
| 61 | + |
| 62 | +1. Watch the video about recursive models and explore whether a small-scale recursive model can be built. |
| 63 | +2. If possible, compare its accuracy and environmental impact with a distilled model (e.g., **DistilGPT**). |
| 64 | +3. If not feasible, return to comparing **basic**, **RAG**, **distilled**, and **commercial models**. |
| 65 | + |
| 66 | +--- |
| 67 | + |
| 68 | +## **Meeting 11** |
| 69 | + |
| 70 | +**Date:** October 22, 2025 (Wednesday, 12:00 PM EST) |
| 71 | + |
| 72 | +**Attendees:** Amro, Aseel, Caesar, Reem, Safia, Banu |
| 73 | + |
| 74 | +### Summary |
| 75 | + |
| 76 | +- Following office hour feedback from Evan, the team decided to focus on **small language models (SLMs)** due to their efficiency. |
| 77 | +- The group agreed to compare open-source SLMs with distilled commercial models. |
| 78 | +- It was decided to apply **RAG techniques** (via the **Ragas Python library**) to quantized, SLM, and recursive models to narrow the gap with commercial systems. |
| 79 | +- Because of the project’s evolving direction, the final deliverable will shift from a **dashboard** to a **research paper or article**. |
| 80 | +- The team also plans to create a **Google Form** later to assess public and expert awareness of the topic. |
| 81 | + |
| 82 | +### Action Items |
| 83 | + |
| 84 | +- **Reem:** Test DistilBERT on Hugging Face |
| 85 | +- **Aseel:** Research commercial models |
| 86 | +- **Amro:** Test the RAG method |
| 87 | +- **Caesar:** Combine Distilled + RAG models |
| 88 | +- **Safia:** Combine SLM + RAG models |
| 89 | +- **Banu:** Develop a unified test prompt (e.g., a poem or short text) |
| 90 | +- **All:** Prepare the GitHub repository |
| 91 | + |
| 92 | +### **Future Tasks** |
| 93 | + |
| 94 | +- Create and distribute an awareness form |
| 95 | +- Develop a communication strategy |
| 96 | +- Publish the research article |
| 97 | + |
| 98 | +--- |
| 99 | + |
| 100 | +## **Meeting 12** |
| 101 | + |
| 102 | +**Date:** October 27, 2025 (Monday, 1:00 PM EST) |
| 103 | + |
| 104 | +**Attendees:** Amro, Aseel, Caesar, Reem, Safia, Banu |
| 105 | + |
| 106 | +### Summary |
| 107 | + |
| 108 | +- Team members presented updates on their assigned tasks from the previous meeting. |
| 109 | +- **Reem** shared findings on **DistilBERT**, concluding that the model performed poorly for the project’s needs. |
| 110 | +- **Caesar** presented a **DistilBERT + RAG demo**, confirming similar inefficiencies; both suggested that RAG could still be valuable if paired with a more capable distilled model. |
| 111 | +- **Amro** demonstrated his **RAG implementation**, discussed constraints, and noted ongoing refinements. |
| 112 | +- **Safia** showcased her **SLM + RAG demo** and shared documentation. |
| 113 | +- **Aseel** and **Banu** gave updates on **commercial model research** and **test prompt development**, respectively. |
| 114 | +- The team discussed next research directions: |
| 115 | + - Experiment with **recursive models** |
| 116 | + - Search for a more efficient **distilled model** |
| 117 | + - Possibly abandon commercial model comparisons in favor of evaluating specific approaches or model-task pairings |
| 118 | + |
| 119 | +### Action Items |
| 120 | + |
| 121 | +1. All members continue their respective research and experiments. |
| 122 | +2. Push all updates and outputs to the **GitHub repository** before the **ELO2 Midpoint Breakout Room Session** on **Wednesday, October 29**. |
| 123 | +3. Identify a better distilled model for testing. |
| 124 | +4. Evaluate test prompts on **SLM + RAG models**. |
| 125 | +5. Hold a follow-up meeting on **Thursday** to review progress and next steps. |
| 126 | + |
| 127 | +--- |
| 128 | + |
| 129 | +## **Meeting 13** |
| 130 | + |
| 131 | +**Date:** October 31, 2025 (Friday, 12:00 PM EST) |
| 132 | + |
| 133 | +**Attendees:** Amro, Aseel, Banu, Caesar |
| 134 | + |
| 135 | +### Summary |
| 136 | + |
| 137 | +- The originally planned follow-up meeting was postponed due to scheduling conflicts. |
| 138 | +- **Amro** presented his **RAG demo** using **Banu’s test prompts**. The model answered most questions correctly but added unnecessary details, struggled with harder questions, and produced some hallucinations. |
| 139 | +- **Caesar** discovered a new, improved distilled model (**MBZUAI/LaMini-Flan-T5-248M**), applied **RAG**, and shared a demo. It performed well on most test prompts except the hard ones. |
| 140 | +- The team outlined a **two-week roadmap** focused on **coding and technical tasks**, followed by **repository organization**. |
| 141 | + |
| 142 | +### Action Items |
| 143 | + |
| 144 | +- Prioritize coding tasks now; clean and organize the repository later. |
| 145 | +- **Amro:** Continue refining RAG implementation. |
| 146 | +- **Caesar:** Test the **CodeCarbon** library on the new model. |
| 147 | +- **Banu:** Add a **generative paragraph task** to test prompts and create **three new prompts** for it (for use in the upcoming Google Form). |
| 148 | +- **Aseel:** Prepare a draft for the **main README**. |
| 149 | +- Team to explore **recursive models** in the coming days. |
| 150 | +- Use **Slack** actively for communication and finalize the next meeting date later. |