diff --git a/meeting_minutes/README.md b/meeting_minutes/README.md
new file mode 100644
index 0000000..6f5ec89
--- /dev/null
+++ b/meeting_minutes/README.md
@@ -0,0 +1,30 @@
+
+
+# 🗓️ Meeting Minutes – Environmental Impact of AI Models
+
+This directory documents the weekly progress and decision-making process for the research project on **the environmental and performance trade-offs between large proprietary and small open-source AI models**.
+
+Each meeting entry outlines team discussions, feedback, experimental progress, and assigned tasks across project milestones.
+
+## 🧭 Milestone 1 – Scoping & Research Question Refinement
+
+**Timeline:** September 27 – October 14, 2025
+
+The first milestone focused on refining the research direction and defining a clear, measurable problem within **Green AI**. After exploring various AI-related topics, the team finalized the project title, **“Green AI Benchmarking of Foundation Models”**, and the research question:
+
+> Can open-source LLMs match the accuracy of commercial models while reducing environmental impact?
+>
+
+Key progress included reviewing literature on energy, carbon, and water use in AI systems, selecting benchmark tasks (**reasoning** and **summarization**), and identifying evaluation metrics for **accuracy** and **environmental footprint**. The team also chose comparison models (**GPT-4** and **Mistral-7B**), created shared documentation, and distributed responsibilities among members.
+
+By the end of Milestone 1, the project established its scope, research framework, and collaborative infrastructure, setting the stage for **Milestone 2**, focused on tool setup and metric calibration.
+
+## ⚙️ Milestone 2 – Tool Setup & Experiment Planning
+
+**Timeline:** October 15 – Ongoing
+
+With the research framework and scope finalized in Milestone 1, **Milestone 2** focuses on preparing the experimental environment and defining how sustainability metrics will be measured.
+This phase involves setting up tools such as **CodeCarbon**, **CarbonTracker**, and **Eco2AI** to monitor energy and carbon usage, and exploring **Water Usage Effectiveness (WUE)** datasets from major cloud providers like AWS, Microsoft, and Google.
+
+The team also plans to configure testing environments for small open-source models (e.g., **Mistral**, **LLaMA-2**) using **Hugging Face Transformers**, **PyTorch**, and GPU-enabled platforms such as **Colab**. Another core deliverable is the **experimental design document**, which will outline the metrics (energy, carbon, water, and accuracy), workflows, and methodology diagrams guiding the model evaluation process.
+
+This milestone sets the foundation for **Milestone 3**, where real model experiments and energy tracking will begin.
diff --git a/meeting_minutes/milestone1_meetings.md b/meeting_minutes/milestone1_meetings.md
new file mode 100644
index 0000000..90189b8
--- /dev/null
+++ b/meeting_minutes/milestone1_meetings.md
@@ -0,0 +1,180 @@
+
+
+# Milestone 1 Meeting Minutes
+
+## Meeting 1
+
+**Date:** September 27, 2025 (Saturday, 10:00 AM EST)
+**Attendees:** Amro, Aseel, Reem, Caesar, Banu
+
+### Summary
+
+- Group members met and introduced themselves.
+- Project topic suggestions were presented:
+  - *AI Jobs vs Real Jobs* (continuation of CDSP)
+  - *Reddit Mental Health Text Analysis*
+  - *Machine Learning for Climate–Environmental Data*
+
+### Action Items
+
+- Conduct a **domain search** on the proposed topics.
+- Bring **alternative project ideas** to the next meeting.
+- Create a [**Google Doc**](https://docs.google.com/document/d/1dk0j0GUoDWqBHmLArcS2xoW5ct5nOjdlCeX3P-yhhOw/edit?tab=t.0)
+  to facilitate asynchronous collaboration.
+
+---
+
+## Meeting 2
+
+**Date:** September 29, 2025 (Monday, 12:00 PM EST)
+**Attendees:** Amro, Aseel, Reem, Caesar, Banu
+
+### Summary
+
+- Members presented new project ideas and ELO2 process plans:
+  - *Mental Health of University Students in Sudan*
+  - *Probabilistic Dental Triage System with Synthetic Data Generation for
+    Resource-Limited Settings*
+  - *Project: Green AI Benchmarking of Foundation Models*
+  - *Green AI — Energy & Water Efficiency in Machine Learning*
+- Previously proposed topics were dropped due to various constraints.
+- The new ideas were discussed, but no final consensus was reached.
+
+### Action Items
+
+- All members will research the newly proposed topics before the next meeting.
+- The group will reach a **final decision** on the project topic at the next
+  session.
+
+---
+
+## Meeting 3
+
+**Date:** September 30, 2025 (Tuesday, 1:30 PM EST)
+**Attendees:** Amro, Aseel, Reem, Caesar, Banu
+
+### Summary
+
+- The topics discussed in the previous meeting were revisited.
+- After evaluating the group’s collective knowledge, experience, and skills,
+  the team decided that **“Project: Green AI Benchmarking of Foundation
+  Models”** was the most suitable topic for the ELO2 project.
+
+### Action Items
+
+- Conduct **domain research** on the selected project topic.
+
+---
+
+## Meeting 4
+
+**Date:** October 5, 2025 (Sunday, 12:00 PM EST)
+**Attendees:** Amro, Aseel, Reem, Caesar, Banu, Safia
+
+### Summary
+
+- Safia officially joined the project team.
+- Amro presented a [**two-month (ELO2 deadline) project plan**](https://docs.google.com/document/d/19OCqflqeRLHzdPs9URrRWPzIdh3g1uw9TgX7-d_SXp8/edit?tab=t.0#heading=h.qd58vuomlp42).
+- The team discussed **how to kick off the project**, including **milestones,
+  constraints, and deliverables**.
+- During domain research, Reem found a [**recently published study**](https://mitemergingtalent.slack.com/files/U082U854W8Y/F09JUBJQ9C2/2505.09598v4.pdf)
+  with striking methodological similarities to the group’s topic and shared it
+  with the team.
+
+### Action Items
+
+- Seek **Evan’s feedback** on how to proceed with the project in light of the
+  new findings.
+
+---
+
+## Meeting 5
+
+**Date:** October 7, 2025 (Tuesday, 11:00 AM EST)
+**Attendees:** Amro, Aseel, Reem, Caesar, Banu, Safia
+
+### Summary
+
+- Based on Evan’s feedback, the group decided to **extend the topic-finalization
+  phase** by approximately two weeks and to adjust the project’s focus.
+- Members proposed ways to **refine the project and make it more original**,
+  such as:
+  - Comparing *Big AI vs Small AI* models
+  - Evaluating *Accuracy vs Eco-Friendliness*
+
+### Action Items
+
+- Conduct **in-depth research** to refine and strengthen the project’s
+  originality.
+- Review the **sources cited in the research paper** previously shared by Reem.
+
+---
+
+## Meeting 6
+
+**Date:** October 9, 2025 (Thursday, 10:30 AM EST)
+**Attendees:** Amro, Aseel, Reem, Caesar, Banu, Safia
+
+### Summary
+
+- The group held a **brainstorming session** to further develop and differentiate
+  the project topic.
+- Amro drafted a [**preliminary project plan**](https://mitemergingtalent.slack.com/files/U082U854W8Y/F09KJCKUEUB/approach_.pdf)
+  based on the discussion.
+
+### Action Items
+
+- The team agreed to hold another meeting the following day to finalize the
+  details.
+- A GitHub repository will be created for the project.
+
+---
+
+## Meeting 7
+
+**Date:** October 10, 2025 (Friday, 12:00 PM EST)
+**Attendees:** Amro, Reem, Caesar, Banu
+
+### Summary
+
+- A **new and original research question** was finalized:
+  *“To what extent can open-source LLMs achieve comparable accuracy to
+  corporate (commercial) models while significantly reducing environmental
+  footprint?”*
+- A new [**Google Doc**](https://docs.google.com/document/d/1BAoWHe8D3c_QAEFugS1CNEUqU5jugBwg1dFJE6-LVQo/edit?tab=t.0)
+  was created to share useful resources and references for the project.
+
+### Action Items
+
+- All members will gain **basic knowledge of RAG and distilled models**.
+- **Banu and Aseel:** Select which models to use.
+- **Caesar and Safia:** Define how to measure **accuracy metrics**.
+- **Amro and Reem:** Define how to measure **environmental cost metrics**.
+
+---
+
+## Meeting 8
+
+**Date:** October 14, 2025 (Tuesday, 1:30 PM EST)
+**Attendees:** Amro, Aseel, Caesar, Banu, Safia
+
+### Summary
+
+- Members presented progress on their assigned tasks from the previous meeting.
+- **Aseel & Banu:** Selected *GPT-4* (commercial) and *Mistral-7B* (open-source)
+  models. Evaluation will focus on *reasoning* and *summarization* using *MMLU*
+  and *Math* datasets. Detailed documentation can be found in the
+  [Model Evaluation Report](https://docs.google.com/document/d/1oOYIdLDumoZyYqgsQuBXDlXr1yZfo1sJNNanmIEGD8I/edit?tab=t.0).
+- **Caesar & Safia:** Suggested using the *LightEval* library with a customized
+  dataset. Caesar demonstrated how to draw a 500-example subset from the
+  *GSM8K* dataset. Detailed documentation can be found in the
+  [Accuracy Notes](https://docs.google.com/document/d/19L4vX-67O-fNNSmY9S8QaHUULZwgzwmKVoJGdzSsUWo/edit?tab=t.0).
+- **Amro & Reem:** Presented environmental metrics and detailed evaluation
+  methods for environmental factors.
+
+### Action Items
+
+- Review all presented work by **October 16th**.
+- Meet again on **October 16th** to **discuss task allocation** for the second
+  milestone.
diff --git a/meeting_minutes/milestone2_meetings.md b/meeting_minutes/milestone2_meetings.md
new file mode 100644
index 0000000..ea0761c
--- /dev/null
+++ b/meeting_minutes/milestone2_meetings.md
@@ -0,0 +1,150 @@
+
+
+
+# Milestone 2 Meeting Minutes
+
+## **Meeting 9**
+
+**Date:** October 16, 2025 (Thursday, 2:00 PM EST)
+
+**Attendees:** Amro, Aseel, Caesar, Safia
+
+### **Summary**
+
+- The team decided to change the project approach due to limited access to environmental data (energy, carbon, and water consumption) for commercial AI models such as GPT, Claude, and Gemini.
+- Since large-scale testing requires computational resources beyond the team’s capacity, the new plan focuses on evaluating open-source models using laptop hardware.
+- Results will be compared with published environmental and performance data of commercial models to highlight how open-source AI can provide sustainable and accessible alternatives.
+
+### **Action Items**
+
+1. **Research and calculate environmental cost metrics:**
+   - **Energy Consumption:**
+
+     $$E_{\text{total}} = \left(P_{\text{GPU}} \times U_{\text{GPU}} + P_{\text{CPU}} \times U_{\text{CPU}} + P_{\text{others}}\right) \times t$$
+
+   - **Facility Overhead:**
+
+     $$E_{\text{facility}} = E_{\text{total}} \times \mathrm{PUE}$$
+
+   - **Carbon Footprint:**
+
+     $$C_{\text{emissions}} = E_{\text{facility}} \times \mathrm{CI}$$
+
+   - **Water Footprint:**
+
+     $$W_{\text{consumed}} = E_{\text{facility}} \times \mathrm{WUE}$$
+
+2. Determine which model sizes laptop hardware can handle (small, medium, and large, up to 3B parameters).
+3. Apply FLOPs-based linear scaling and empirical interpolation to improve result accuracy.
+4. Add all work presented in the previous meeting (model selection, evaluation methodology, environmental metrics) to the **domain study section** of the repository.
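The four formulas in action item 1 form a chain. Facility energy scales the measured device energy by PUE, and both carbon and water are then linear in facility energy. The minimal Python sketch below rests on assumptions that do not appear in the minutes: power in watts, runtime in hours, CI in kgCO₂e/kWh, and WUE in L/kWh. `flops_scaled_energy` is a hypothetical reading of item 3's "FLOPs-based linear scaling". It uses the common rule of thumb that inference costs about 2 FLOPs per parameter per token, and it assumes the same hardware efficiency for both models. All input values are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class FootprintInputs:
    """Inputs for one run; the units are assumptions made for this sketch."""
    p_gpu_w: float        # GPU power draw P_GPU (W)
    u_gpu: float          # GPU utilization U_GPU in [0, 1]
    p_cpu_w: float        # CPU power draw P_CPU (W)
    u_cpu: float          # CPU utilization U_CPU in [0, 1]
    p_others_w: float     # other components P_others, e.g. RAM and storage (W)
    hours: float          # runtime t (h)
    pue: float            # Power Usage Effectiveness (>= 1)
    ci_kg_per_kwh: float  # grid carbon intensity CI (kgCO2e/kWh)
    wue_l_per_kwh: float  # Water Usage Effectiveness WUE (L/kWh)


def footprint(x: FootprintInputs) -> dict:
    # E_total = (P_GPU*U_GPU + P_CPU*U_CPU + P_others) * t, converted from Wh to kWh
    e_total = (x.p_gpu_w * x.u_gpu + x.p_cpu_w * x.u_cpu + x.p_others_w) * x.hours / 1000
    e_facility = e_total * x.pue  # E_facility = E_total * PUE
    return {
        "e_total_kwh": e_total,
        "e_facility_kwh": e_facility,
        "carbon_kg": e_facility * x.ci_kg_per_kwh,  # C_emissions = E_facility * CI
        "water_l": e_facility * x.wue_l_per_kwh,    # W_consumed = E_facility * WUE
    }


def approx_inference_flops(params: float, tokens: float) -> float:
    # Rule of thumb: one forward pass costs about 2 FLOPs per parameter per token
    return 2.0 * params * tokens


def flops_scaled_energy(e_ref_kwh: float, params_ref: float,
                        params_target: float, tokens: float) -> float:
    """Hypothetical linear extrapolation: energy is assumed proportional to FLOPs."""
    ratio = approx_inference_flops(params_target, tokens) / approx_inference_flops(params_ref, tokens)
    return e_ref_kwh * ratio


if __name__ == "__main__":
    # Placeholder inputs, not measurements
    run = FootprintInputs(200, 0.5, 40, 0.5, 30, 2, 1.2, 0.4, 1.8)
    print(footprint(run))
    print(flops_scaled_energy(0.3, 1e9, 3e9, tokens=1000))
```

In this sketch, tripling the parameter count at a fixed token count triples the estimated energy. The linear assumption is exactly what item 3's "empirical interpolation" would need to check against real measurements.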
+
+---
+
+## **Meeting 10**
+
+**Date:** October 19, 2025 (Saturday, 12:00 PM EST)
+
+**Attendees:** Amro, Aseel, Caesar, Banu, Reem
+
+### Summary
+
+- The group discussed options for testing and running AI models.
+- Ideas included running quantized models locally (with some accuracy loss) and using Google Colab for limited runs.
+- Another idea was to use the Hugging Face API for accuracy and RAG testing, though this approach does not allow measuring environmental costs.
+- The team also explored recursive reasoning models as efficient and environmentally friendly alternatives, though the range of tasks available for testing them remains limited.
+
+### Action Items
+
+1. Watch the video about recursive models and explore whether a small-scale recursive model can be built.
+2. If possible, compare its accuracy and environmental impact with a distilled model (e.g., **DistilGPT**).
+3. If not feasible, return to comparing **basic**, **RAG**, **distilled**, and **commercial models**.
+
+---
+
+## **Meeting 11**
+
+**Date:** October 22, 2025 (Wednesday, 12:00 PM EST)
+
+**Attendees:** Amro, Aseel, Caesar, Reem, Safia, Banu
+
+### Summary
+
+- Following feedback from Evan during office hours, the team decided to focus on **small language models (SLMs)** due to their efficiency.
+- The group agreed to compare open-source SLMs with distilled commercial models.
+- It was decided to apply **RAG techniques** (via the **Ragas Python library**) to quantized, SLM, and recursive models to narrow the gap with commercial systems.
+- Because of the project’s evolving direction, the final deliverable will shift from a **dashboard** to a **research paper or article**.
+- The team also plans to create a **Google Form** later to assess public and expert awareness of the topic.
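Two questions from the minutes depend largely on weight memory, which is parameters times bytes per parameter: which models "fit" on a laptop (Meeting 9, up to 3B), and what quantization buys (Meeting 10). The rough sketch below assumes that weights dominate memory use. It ignores activations, the KV cache, and runtime overhead, which add more in practice. The 16 GB budget and the 20% headroom are hypothetical values chosen for illustration.

```python
# Approximate storage per parameter for common precisions
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def fits(params: float, dtype: str, budget_gb: float, headroom: float = 0.2) -> bool:
    """True if the weights fit after reserving a fraction of the budget
    for activations, KV cache, and the OS (the 20% default is an assumption)."""
    return weight_memory_gb(params, dtype) <= budget_gb * (1 - headroom)


if __name__ == "__main__":
    for name, params in [("1B", 1e9), ("3B", 3e9), ("7B", 7e9)]:
        sizes = {d: round(weight_memory_gb(params, d), 1) for d in BYTES_PER_PARAM}
        print(name, sizes, "fp16 fits in 16 GB:", fits(params, "fp16", 16))
```

Under these assumptions, a 3B model in fp16 needs about 6 GB for its weights. A 7B model such as Mistral-7B needs about 14 GB in fp16 but roughly 3.5 GB at 4-bit, which is why quantization came up for local runs.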
+
+### Action Items
+
+- **Reem:** Test DistilBERT on Hugging Face
+- **Aseel:** Research commercial models
+- **Amro:** Test the RAG method
+- **Caesar:** Combine Distilled + RAG models
+- **Safia:** Combine SLM + RAG models
+- **Banu:** Develop a unified test prompt (e.g., a poem or short text)
+- **All:** Prepare the GitHub repository
+
+### **Future Tasks**
+
+- Create and distribute an awareness form
+- Develop a communication strategy
+- Publish the research article
+
+---
+
+## **Meeting 12**
+
+**Date:** October 27, 2025 (Monday, 1:00 PM EST)
+
+**Attendees:** Amro, Aseel, Caesar, Reem, Safia, Banu
+
+### Summary
+
+- Team members presented updates on their assigned tasks from the previous meeting.
+- **Reem** shared findings on **DistilBERT**, concluding that the model performed poorly for the project’s needs.
+- **Caesar** presented a **DistilBERT + RAG demo**, confirming similar shortcomings; both suggested that RAG could still be valuable if paired with a more capable distilled model.
+- **Amro** demonstrated his **RAG implementation**, discussed constraints, and noted ongoing refinements.
+- **Safia** showcased her **SLM + RAG demo** and shared documentation.
+- **Aseel** and **Banu** gave updates on **commercial model research** and **test prompt development**, respectively.
+- The team discussed next research directions:
+  - Experiment with **recursive models**
+  - Search for a more efficient **distilled model**
+  - Possibly abandon commercial model comparisons in favor of evaluating specific approaches or model-task pairings
+
+### Action Items
+
+1. All members continue their respective research and experiments.
+2. Push all updates and outputs to the **GitHub repository** before the **ELO2 Midpoint Breakout Room Session** on **Wednesday, October 29**.
+3. Identify a better distilled model for testing.
+4. Evaluate test prompts on **SLM + RAG models**.
+5. Hold a follow-up meeting on **Thursday** to review progress and next steps.
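Several demos in Meetings 11 and 12 pair a small model with retrieval-augmented generation (RAG). The retrieve-then-prompt step at the core of RAG can be sketched with the Python standard library alone. This is a toy bag-of-words cosine retriever for illustration only. It is not the team's implementation or the Ragas API, and the prompt template in `build_prompt` is hypothetical. A real pipeline would typically use an embedding model and a vector index instead.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs only
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    # Prepend the retrieved passages so the model answers from them
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    corpus = [
        "PUE is the ratio of total facility energy to IT equipment energy.",
        "WUE measures liters of water used per kWh of IT energy.",
        "GSM8K contains grade-school math word problems.",
    ]
    print(build_prompt("What does WUE measure?", corpus, k=1))
```

Grounding the answer in retrieved text is the mechanism the team hoped would narrow the gap between small and commercial models. It also explains the Meeting 13 observation about hallucinations: the model can still ignore or embellish the supplied context.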
+
+---
+
+## **Meeting 13**
+
+**Date:** October 31, 2025 (Friday, 12:00 PM EST)
+
+**Attendees:** Amro, Aseel, Banu, Caesar
+
+### Summary
+
+- The originally planned follow-up meeting was postponed due to scheduling conflicts.
+- **Amro** presented his **RAG demo** using **Banu’s test prompts**. The model answered most questions correctly but added unnecessary details and struggled with harder ones; some hallucinations were observed.
+- **Caesar** discovered a new, improved distilled model (**MBZUAI/LaMini-Flan-T5-248M**), applied **RAG**, and shared a demo. It performed well on most test prompts except the hard ones.
+- The team outlined a **two-week roadmap** focused on **coding and technical tasks**, followed by **repository organization**.
+
+### Action Items
+
+- Prioritize coding tasks now; clean and organize the repository later.
+- **Amro:** Continue refining the RAG implementation.
+- **Caesar:** Test the **CodeCarbon** library on the new model.
+- **Banu:** Add a **generative paragraph task** to the test prompts and create **three new prompts** for it (for use in the upcoming Google Form).
+- **Aseel:** Prepare a draft of the **main README**.
+- The team will explore **recursive models** in the coming days.
+- Use **Slack** actively for communication and finalize the next meeting date later.
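Meetings 8 and 13 both involve scoring model answers against references: the 500-example GSM8K subset and Banu's test prompts. The minimal exact-match sketch below rests on several assumptions. References are assumed to be GSM8K-style solutions whose final answer follows a `####` marker. The last number in a model's output is taken as its prediction. Both of these heuristics, and the `sample_subset` helper, are illustrative and are not LightEval's API. The fixed seed keeps the subset reproducible across teammates, and its value is arbitrary.

```python
import random
import re
from typing import Iterable, List, Optional

# Integers or decimals, allowing thousands separators such as "1,200"
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def reference_answer(solution: str) -> str:
    """GSM8K reference solutions end with a line like '#### 72'."""
    return solution.split("####")[-1].strip()


def predicted_answer(model_output: str) -> Optional[str]:
    """Assumed heuristic: the last number in the output is the model's answer."""
    matches = NUMBER.findall(model_output)
    return matches[-1] if matches else None


def to_number(text: Optional[str]) -> Optional[float]:
    # Strip thousands separators and parse; None if absent or unparsable
    if text is None:
        return None
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def exact_match_accuracy(outputs: List[str], solutions: List[str]) -> float:
    if len(outputs) != len(solutions):
        raise ValueError("outputs and solutions must have the same length")
    if not solutions:
        return 0.0
    correct = 0
    for out, sol in zip(outputs, solutions):
        pred = to_number(predicted_answer(out))
        if pred is not None and pred == to_number(reference_answer(sol)):
            correct += 1
    return correct / len(solutions)


def sample_subset(items: Iterable, n: int = 500, seed: int = 42) -> list:
    """Reproducible random subset of up to n items (hypothetical helper)."""
    pool = list(items)
    return random.Random(seed).sample(pool, min(n, len(pool)))
```

Exact match on the final number is strict. Verbose but correct answers, like those seen in Amro's demo, still score as correct as long as the right number comes last. Answers that end with an extra number are penalized, so a stricter output format in the prompt may help.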