Merge pull request #15 from MIT-Emerging-Talent/meeting_minutes

doctorbanu · web-flow · commit d2ae9288ed56 · 2025-11-01T11:44:23.000-04:00
Meeting minutes: Adding milestone 1&2 meeting notes + Meeting notes ReadMe
diff --git a/meeting_minutes/README.md b/meeting_minutes/README.md
@@ -0,0 +1,30 @@
+<!-- markdownlint-disable MD013 -->
+
+# 🗓️ Meeting Minutes – Environmental Impact of AI Models
+
+This directory documents the weekly progress and decision-making process for the research project on **the environmental and performance trade-offs between large proprietary and small open-source AI models**.
+
+Each meeting entry outlines team discussions, feedback, experimental progress, and assigned tasks across project milestones.
+
+## 🧭 Milestone 1 – Scoping & Research Question Refinement
+
+**Timeline:** September 27 – October 14, 2025
+
+The first milestone focused on refining the research direction and defining a clear, measurable problem within **Green AI**. After exploring various AI-related topics, the team finalized the project title — **“Green AI Benchmarking of Foundation Models”** — and the research question:
+
+> Can open-source LLMs match the accuracy of commercial models while reducing environmental impact?
+>
+
+Key progress included reviewing literature on energy, carbon, and water use in AI systems, selecting benchmark tasks (**reasoning** and **summarization**), and identifying evaluation metrics for **accuracy** and **environmental footprint**. The team also chose comparison models (**GPT-4** and **Mistral-7B**), created shared documentation, and distributed responsibilities among members.
+
+By the end of Milestone 1, the project established its scope, research framework, and collaborative infrastructure, setting the stage for **Milestone 2**, focused on tool setup and metric calibration.
+
+## ⚙️ Milestone 2 – Tool Setup & Experiment Planning
+
+**Timeline:** October 15 – Ongoing
+
+With the research framework and scope finalized in Milestone 1, **Milestone 2** focuses on preparing the experimental environment and defining how sustainability metrics will be measured. This phase involves setting up tools such as **CodeCarbon**, **CarbonTracker**, and **Eco2AI** to monitor energy and carbon usage, and exploring **Water Usage Effectiveness (WUE)** datasets from major cloud providers like AWS, Microsoft, and Google.
+
+The team also plans to configure testing environments for small open-source models (e.g., **Mistral**, **LLaMA-2**) using **Hugging Face Transformers**, **PyTorch**, and GPU-enabled platforms such as **Colab**. Another core deliverable is the **experimental design document**, which will outline the metrics (energy, carbon, water, and accuracy), workflows, and methodology diagrams guiding the model evaluation process.
+
+This milestone sets the foundation for **Milestone 3**, where real model experiments and energy tracking will begin.
diff --git a/meeting_minutes/milestone1_meetings.md b/meeting_minutes/milestone1_meetings.md
@@ -0,0 +1,180 @@
+<!-- markdownlint-disable MD024 -->
+<!-- Disabled MD024 (Multiple headings with the same content) rule
+because repeated headings (Summary, Action Items) are
+intentionally used across multiple sections for structural clarity. -->
+# Milestone 1 Meeting Minutes
+
+## Meeting 1  
+
+**Date:** September 27, 2025 (Saturday, 10:00 AM EST)  
+**Attendees:** Amro, Aseel, Reem, Caesar, Banu  
+
+### Summary  
+
+- Group members met and introduced themselves.  
+- Project topic suggestions were presented:  
+  - *AI Jobs vs Real Jobs* (continuation of CDSP)  
+  - *Reddit Mental Health Text Analysis*  
+  - *Machine Learning for Climate–Environmental Data*  
+
+### Action Items  
+
+- Conduct a **domain search** on the proposed topics.  
+- Bring **alternative project ideas** to the next meeting.  
+- Create a [**Google Doc**](https://docs.google.com/document/d/1dk0j0GUoDWqBHmLArcS2xoW5ct5nOjdlCeX3P-yhhOw/edit?tab=t.0)
+  to facilitate asynchronous collaboration.  
+
+---
+
+## Meeting 2  
+
+**Date:** September 29, 2025 (Monday, 12:00 PM EST)  
+**Attendees:** Amro, Aseel, Reem, Caesar, Banu  
+
+### Summary  
+
+- Members presented new project ideas and ELO2 process plans:  
+  - *Mental Health of University Students in Sudan*  
+  - *Probabilistic Dental Triage System with Synthetic Data Generation for
+    Resource-Limited Settings*  
+  - *Project: Green AI Benchmarking of Foundation Models*  
+  - *Green AI — Energy & Water Efficiency in Machine Learning*  
+- Previously proposed topics were dropped due to various constraints.  
+- The new ideas were discussed, but no final consensus was reached.  
+
+### Action Items  
+
+- All members will research the newly proposed topics before the next meeting.  
+- The group will reach a **final decision** on the project topic at the next
+  session.  
+
+---
+
+## Meeting 3  
+
+**Date:** September 30, 2025 (Tuesday, 1:30 PM EST)  
+**Attendees:** Amro, Aseel, Reem, Caesar, Banu  
+
+### Summary  
+
+- The topics discussed in the previous meeting were revisited.  
+- After evaluating the group’s collective knowledge, experience, and skills,
+  the team decided that **“Project: Green AI Benchmarking of Foundation
+  Models”** was the most suitable topic for the ELO2 project.  
+
+### Action Items  
+
+- Conduct **domain research** on the selected project topic.  
+
+---
+
+## Meeting 4  
+
+**Date:** October 5, 2025 (Sunday, 12:00 PM EST)  
+**Attendees:** Amro, Aseel, Reem, Caesar, Banu, Safia  
+
+### Summary  
+
+- Safia officially joined the project team.  
+- Amro presented a [**two-month (ELO2 deadline) project plan**](https://docs.google.com/document/d/19OCqflqeRLHzdPs9URrRWPzIdh3g1uw9TgX7-d_SXp8/edit?tab=t.0#heading=h.qd58vuomlp42).
+- The team discussed **how to kick off the project**, including **milestones,
+  constraints, and deliverables**.  
+- During domain research, Reem found a [**recently published study**](https://mitemergingtalent.slack.com/files/U082U854W8Y/F09JUBJQ9C2/2505.09598v4.pdf)
+  with striking methodological similarities to the group’s topic and shared it
+  with the team.  
+
+### Action Items  
+
+- Seek **Evan’s feedback** on how to proceed with the project in light of the
+  new findings.  
+
+---
+
+## Meeting 5  
+
+**Date:** October 7, 2025 (Tuesday, 11:00 AM EST)  
+**Attendees:** Amro, Aseel, Reem, Caesar, Banu, Safia  
+
+### Summary  
+
+- Based on Evan’s feedback, the group decided to **extend the topic-finalization
+  phase** by approximately two weeks and to focus on adjusting the project subject.
+- Members proposed ways to **refine and make the project more original**, such
+  as:  
+  - Comparing *Big AI vs Small AI* models  
+  - Evaluating *Accuracy vs Eco-Friendliness*  
+
+### Action Items  
+
+- Conduct **in-depth research** to refine and strengthen the project’s
+  originality.  
+- Review the **sources cited in the research paper** previously shared by Reem.
+
+---
+
+## Meeting 6  
+
+**Date:** October 9, 2025 (Thursday, 10:30 AM EST)  
+**Attendees:** Amro, Aseel, Reem, Caesar, Banu, Safia  
+
+### Summary  
+
+- The group held a **brainstorming session** to further develop and differentiate
+  the project topic.  
+- Amro drafted a [**preliminary project plan**](https://mitemergingtalent.slack.com/files/U082U854W8Y/F09KJCKUEUB/approach_.pdf)
+  based on the discussion.  
+
+### Action Items  
+
+- Agreed to hold another meeting the following day to finalize the details.  
+- A GitHub repository will be created for the project.  
+
+---
+
+## Meeting 7  
+
+**Date:** October 10, 2025 (Friday, 12:00 PM EST)  
+**Attendees:** Amro, Reem, Caesar, Banu  
+
+### Summary  
+
+- A **new and original research question** was finalized:  
+  *“To what extent can open-source LLMs achieve comparable accuracy to
+  corporate (commercial) models while significantly reducing environmental
+  footprint?”*  
+- A new [**Google Doc**](https://docs.google.com/document/d/1BAoWHe8D3c_QAEFugS1CNEUqU5jugBwg1dFJE6-LVQo/edit?tab=t.0)
+  was created to share useful resources and references for the project.  
+
+### Action Items  
+
+- All members to gain **basic knowledge about RAG and distilled models**.  
+- **Banu and Aseel:** Select which models to use.  
+- **Caesar and Safia:** Define how to measure **accuracy metrics**.  
+- **Amro and Reem:** Define how to measure **environmental cost metrics**.  
+
+---
+
+## Meeting 8  
+
+**Date:** October 14, 2025 (Tuesday, 1:30 PM EST)  
+**Attendees:** Amro, Aseel, Caesar, Banu, Safia  
+
+### Summary  
+
+- Members presented progress on their assigned tasks from the previous meeting.
+- **Aseel & Banu:** Selected *GPT-4* (commercial) and *Mistral-7B* (open-source)
+    models. Evaluation will focus on *reasoning* and *summarization* using *MMLU*
+    and *Math* datasets. Detailed documentation can be found in the
+    [Model Evaluation Report](https://docs.google.com/document/d/1oOYIdLDumoZyYqgsQuBXDlXr1yZfo1sJNNanmIEGD8I/edit?tab=t.0).
+- **Caesar & Safia:** Suggested using the *LightEval* library with a customized
+    dataset. Caesar demonstrated how to split the *GSM8K* dataset into a
+    500-example subset. Detailed documentation can be found in the
+    [Accuracy Notes](https://docs.google.com/document/d/19L4vX-67O-fNNSmY9S8QaHUULZwgzwmKVoJGdzSsUWo/edit?tab=t.0).
+- **Amro & Reem:** Presented environmental metrics and detailed evaluation
+    methods for environmental factors.  
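The 500-example GSM8K subset mentioned above can be sketched as a seeded random draw. This is a minimal illustration, not the code Caesar demonstrated: the helper name `sample_subset`, the seed value, and the stand-in records are all assumptions, and the real rows would come from the dataset itself.

```python
import random


def sample_subset(examples, k=500, seed=42):
    """Draw k examples without replacement; the same seed gives the same subset."""
    if k > len(examples):
        raise ValueError(f"requested {k} examples but only {len(examples)} available")
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return rng.sample(examples, k)


# Stand-in records (hypothetical). GSM8K rows carry "question" and "answer"
# fields, and its test split has 1,319 problems.
toy_rows = [{"question": f"q{i}", "answer": str(i)} for i in range(1319)]
subset = sample_subset(toy_rows)

# With the Hugging Face `datasets` library, a comparable draw would be
# (assumption, not confirmed by the minutes):
#   load_dataset("gsm8k", "main", split="test").shuffle(seed=42).select(range(500))
```

Fixing the seed keeps the subset identical across team members' machines, so accuracy numbers stay comparable between runs.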
+
+### Action Items  
+
+- Review all presented work by **October 16th**.  
+- Meet again on **October 16th** to **discuss task allocation** for the second
+  milestone.  
diff --git a/meeting_minutes/milestone2_meetings.md b/meeting_minutes/milestone2_meetings.md
@@ -0,0 +1,150 @@
+<!-- markdownlint-disable MD024 MD013 -->
+<!-- Disabled MD024 (Multiple headings with the same content) rule
+because repeated headings (Summary, Action Items) are
+intentionally used across multiple sections for structural clarity.
+Disabled MD013 (Line length) rule because mathematical formulas
+and technical content require longer lines for readability. -->
+
+# Milestone 2 Meeting Minutes
+
+## **Meeting 9**
+
+**Date:** October 16, 2025 (Thursday, 2:00 PM EST)
+
+**Attendees:** Amro, Aseel, Caesar, Safia
+
+### **Summary**
+
+- The team decided to change the project approach due to limited access to environmental data (energy, carbon, and water consumption) for commercial AI models such as GPT, Claude, and Gemini.
+- Since large-scale testing requires computational resources beyond the team’s capacity, the new plan focuses on evaluating open-source models using laptop hardware.
+- Results will be compared with published environmental and performance data of commercial models to highlight how open-source AI can provide sustainable and accessible alternatives.
+
+### **Action Items**
+
+1. **Research and calculate environmental cost metrics:**
+    - **Energy Consumption:**
+
+        $$E_{\text{total}} = (P_{\text{GPU}} \times U_{\text{GPU}} + P_{\text{CPU}} \times U_{\text{CPU}} + P_{\text{others}}) \times t$$
+
+    - **Facility Overhead:**
+
+        $$E_{\text{facility}} = E_{\text{total}} \times \text{PUE}$$
+
+    - **Carbon Footprint:**
+
+        $$C_{\text{emissions}} = E_{\text{facility}} \times \text{CI}$$
+
+    - **Water Footprint:**
+
+        $$W_{\text{consumed}} = E_{\text{facility}} \times \text{WUE}$$
+
+2. Determine which model sizes laptop hardware can handle (small, medium, and large, up to 3B parameters).
+3. Apply FLOPs-based linear scaling and empirical interpolation to improve result accuracy.
+4. Add all presented work from the previous meeting (model selection, evaluation methodology, environmental metrics) to the **domain study section** of the repository.
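The four environmental cost formulas in item 1 chain together, and a short sketch makes the order of operations concrete. It assumes that P is power draw in watts, U is a utilization fraction between 0 and 1, t is runtime in hours, PUE is power usage effectiveness, CI is grid carbon intensity in kg CO2e per kWh, and WUE is water usage effectiveness in litres per kWh. The minutes do not state units, so these choices, the function name, and the sample inputs are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    energy_kwh: float    # E_total: energy drawn by the device
    facility_kwh: float  # E_facility: energy including facility overhead
    carbon_kg: float     # C_emissions
    water_l: float       # W_consumed


def estimate_footprint(p_gpu_w, u_gpu, p_cpu_w, u_cpu, p_others_w, hours,
                       pue, ci_kg_per_kwh, wue_l_per_kwh):
    """Apply the four formulas in sequence (units are assumptions, see above)."""
    # E_total = (P_GPU * U_GPU + P_CPU * U_CPU + P_others) * t, converted from Wh to kWh
    energy_kwh = (p_gpu_w * u_gpu + p_cpu_w * u_cpu + p_others_w) * hours / 1000.0
    # E_facility = E_total * PUE
    facility_kwh = energy_kwh * pue
    # C_emissions = E_facility * CI
    carbon_kg = facility_kwh * ci_kg_per_kwh
    # W_consumed = E_facility * WUE
    water_l = facility_kwh * wue_l_per_kwh
    return Footprint(energy_kwh, facility_kwh, carbon_kg, water_l)


# Hypothetical inputs for a two-hour run; none of these values come from the project.
example = estimate_footprint(
    p_gpu_w=100, u_gpu=0.5, p_cpu_w=40, u_cpu=0.5, p_others_w=10,
    hours=2, pue=1.2, ci_kg_per_kwh=0.4, wue_l_per_kwh=1.8,
)
print(example)
```

Because carbon and water both scale from the facility energy, an error in the power or utilization inputs propagates to every downstream figure, which is one reason to calibrate those inputs first.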
+
+---
+
+## **Meeting 10**
+
+**Date:** October 19, 2025 (Saturday, 12:00 PM EST)
+
+**Attendees:** Amro, Aseel, Caesar, Banu, Reem
+
+### Summary
+
+- The group discussed options for testing and running AI models.
+- Ideas included running quantized models locally (with some accuracy loss) and using Google Colab for limited runs.
+- Another idea was to use the Hugging Face API for accuracy and RAG testing, though this approach does not allow measuring environmental costs.
+- The team also explored Recursive Reasoning Models as efficient and environmentally friendly alternatives, though task variety for testing remains limited.
+
+### Action Items
+
+1. Watch the video about recursive models and explore whether a small-scale recursive model can be built.
+2. If possible, compare its accuracy and environmental impact with a distilled model (e.g., **DistilGPT**).
+3. If not feasible, return to comparing **basic**, **RAG**, **distilled**, and **commercial models**.
+
+---
+
+## **Meeting 11**
+
+**Date:** October 22, 2025 (Wednesday, 12:00 PM EST)
+
+**Attendees:** Amro, Aseel, Caesar, Reem, Safia, Banu
+
+### Summary
+
+- Following office-hours feedback from Evan, the team decided to focus on **small language models (SLMs)** due to their efficiency.
+- The group agreed to compare open-source SLMs with distilled commercial models.
+- The team decided to apply **RAG techniques** (via the **Ragas Python library**) to quantized models, SLMs, and recursive models to narrow the gap with commercial systems.
+- Because of the project’s evolving direction, the final deliverable will shift from a **dashboard** to a **research paper or article**.
+- The team also plans to create a **Google Form** later to assess public and expert awareness of the topic.
+
+### Action Items
+
+- **Reem:** Test DistilBERT on Hugging Face
+- **Aseel:** Research commercial models
+- **Amro:** Test the RAG method
+- **Caesar:** Combine Distilled + RAG models
+- **Safia:** Combine SLM + RAG models
+- **Banu:** Develop a unified test prompt (e.g., a poem or short text)
+- **All:** Prepare the GitHub repository
+
+### **Future Tasks**
+
+- Create and distribute an awareness form
+- Develop a communication strategy
+- Publish the research article
+
+---
+
+## **Meeting 12**
+
+**Date:** October 27, 2025 (Monday, 1:00 PM EST)
+
+**Attendees:** Amro, Aseel, Caesar, Reem, Safia, Banu
+
+### Summary
+
+- Team members presented updates on their assigned tasks from the previous meeting.
+- **Reem** shared findings on **DistilBERT**, concluding that the model performed poorly for the project’s needs.
+- **Caesar** presented a **DistilBERT + RAG demo**, confirming similar inefficiencies; both suggested that RAG could still be valuable if paired with a more capable distilled model.
+- **Amro** demonstrated his **RAG implementation**, discussed constraints, and noted ongoing refinements.
+- **Safia** showcased her **SLM + RAG demo** and shared documentation.
+- **Aseel** and **Banu** gave updates on **commercial model research** and **test prompt development**, respectively.
+- The team discussed next research directions:
+  - Experiment with **recursive models**
+  - Search for a more efficient **distilled model**
+  - Possibly abandon commercial model comparisons in favor of evaluating specific approaches or model-task pairings
+
+### Action Items
+
+1. All members continue their respective research and experiments.
+2. Push all updates and outputs to the **GitHub repository** before the **ELO2 Midpoint Breakout Room Session** on **Wednesday, October 29**.
+3. Identify a better distilled model for testing.
+4. Evaluate test prompts on **SLM + RAG models**.
+5. Hold a follow-up meeting on **Thursday** to review progress and next steps.
+
+---
+
+## **Meeting 13**
+
+**Date:** October 31, 2025 (Friday, 12:00 PM EST)
+
+**Attendees:** Amro, Aseel, Banu, Caesar
+
+### Summary
+
+- The originally planned follow-up meeting was postponed due to scheduling conflicts.
+- **Amro** presented his **RAG demo** using **Banu’s test prompts** — the model answered most questions correctly but added unnecessary details and struggled with harder ones. Some hallucinations were observed.
+- **Caesar** discovered a new, improved distilled model (**MBZUAI/LaMini-Flan-T5-248M**), applied **RAG**, and shared a demo. It performed well on most test prompts except the hard ones.
+- The team outlined a **two-week roadmap** focused on **coding and technical tasks**, followed by **repository organization**.
+
+### Action Items
+
+- Prioritize coding tasks now; clean and organize the repository later.
+- **Amro:** Continue refining RAG implementation.
+- **Caesar:** Test the **CodeCarbon** library on the new model.
+- **Banu:** Add a **generative paragraph task** to test prompts and create **three new prompts** for it (for use in the upcoming Google Form).
+- **Aseel:** Prepare a draft for the **main README**.
+- Team to explore **recursive models** in the coming days.
+- Use **Slack** actively for communication and finalize the next meeting date later.