30 changes: 30 additions & 0 deletions meeting_minutes/README.md
@@ -0,0 +1,30 @@
<!-- markdownlint-disable MD013 -->

# 🗓️ Meeting Minutes – Environmental Impact of AI Models

This directory documents the weekly progress and decision-making process for the research project on **the environmental and performance trade-offs between large proprietary and small open-source AI models**.

Each meeting entry outlines team discussions, feedback, experimental progress, and assigned tasks across project milestones.

## 🧭 Milestone 1 – Scoping & Research Question Refinement

**Timeline:** September 27 – October 14, 2025

The first milestone focused on refining the research direction and defining a clear, measurable problem within **Green AI**. After exploring various AI-related topics, the team finalized the project title — **“Green AI Benchmarking of Foundation Models”** — and the research question:

> Can open-source LLMs match the accuracy of commercial models while reducing environmental impact?
>

Key progress included reviewing literature on energy, carbon, and water use in AI systems, selecting benchmark tasks (**reasoning** and **summarization**), and identifying evaluation metrics for **accuracy** and **environmental footprint**. The team also chose comparison models (**GPT-4** and **Mistral-7B**), created shared documentation, and distributed responsibilities among members.

By the end of Milestone 1, the project established its scope, research framework, and collaborative infrastructure, setting the stage for **Milestone 2**, focused on tool setup and metric calibration.

## ⚙️ Milestone 2 – Tool Setup & Experiment Planning

**Timeline:** October 15 – Ongoing

With the research framework and scope finalized in Milestone 1, **Milestone 2** focuses on preparing the experimental environment and defining how sustainability metrics will be measured. This phase involves setting up tools such as **CodeCarbon**, **CarbonTracker**, and **Eco2AI** to monitor energy and carbon usage, and exploring **Water Usage Effectiveness (WUE)** datasets from major cloud providers like AWS, Microsoft, and Google.

The team also plans to configure testing environments for small open-source models (e.g., **Mistral**, **LLaMA-2**) using **Hugging Face Transformers**, **PyTorch**, and GPU-enabled platforms such as **Colab**. Another core deliverable is the **experimental design document**, which will outline the metrics (energy, carbon, water, and accuracy), workflows, and methodology diagrams guiding the model evaluation process.

This milestone sets the foundation for **Milestone 3**, where real model experiments and energy tracking will begin.
180 changes: 180 additions & 0 deletions meeting_minutes/milestone1_meetings.md
@@ -0,0 +1,180 @@
<!-- markdownlint-disable MD024 -->
<!-- Disabled MD024 (Multiple headings with the same content) rule
because repeated headings (Summary, Action Items) are
intentionally used across multiple sections for structural clarity. -->
# Milestone 1 Meeting Minutes

## Meeting 1

**Date:** September 27, 2025 (Saturday, 10:00 AM EST)
**Attendees:** Amro, Aseel, Reem, Caesar, Banu

### Summary

- Group members met and introduced themselves.
- Project topic suggestions were presented:
  - *AI Jobs vs Real Jobs* (continuation of CDSP)
  - *Reddit Mental Health Text Analysis*
  - *Machine Learning for Climate–Environmental Data*

### Action Items

- Conduct a **domain search** on the proposed topics.
- Bring **alternative project ideas** to the next meeting.
- Create a [**Google Doc**](https://docs.google.com/document/d/1dk0j0GUoDWqBHmLArcS2xoW5ct5nOjdlCeX3P-yhhOw/edit?tab=t.0)
to facilitate asynchronous collaboration.

---

## Meeting 2

**Date:** September 29, 2025 (Monday, 12:00 PM EST)
**Attendees:** Amro, Aseel, Reem, Caesar, Banu

### Summary

- Members presented new project ideas and ELO2 process plans:
  - *Mental Health of University Students in Sudan*
  - *Probabilistic Dental Triage System with Synthetic Data Generation for
    Resource-Limited Settings*
  - *Project: Green AI Benchmarking of Foundation Models*
  - *Green AI — Energy & Water Efficiency in Machine Learning*
- Previously proposed topics were dropped due to various constraints.
- The new ideas were discussed, but no final consensus was reached.

### Action Items

- All members will research the newly proposed topics before the next meeting.
- The group will reach a **final decision** on the project topic at the next
session.

---

## Meeting 3

**Date:** September 30, 2025 (Tuesday, 1:30 PM EST)
**Attendees:** Amro, Aseel, Reem, Caesar, Banu

### Summary

- The topics discussed in the previous meeting were revisited.
- After evaluating the group’s collective knowledge, experience, and skills,
the team decided that **“Project: Green AI Benchmarking of Foundation
Models”** was the most suitable topic for the ELO2 project.

### Action Items

- Conduct **domain research** on the selected project topic.

---

## Meeting 4

**Date:** October 5, 2025 (Sunday, 12:00 PM EST)
**Attendees:** Amro, Aseel, Reem, Caesar, Banu, Safia

### Summary

- Safia officially joined the project team.
- Amro presented a [**two-month (ELO2 deadline) project plan**](https://docs.google.com/document/d/19OCqflqeRLHzdPs9URrRWPzIdh3g1uw9TgX7-d_SXp8/edit?tab=t.0#heading=h.qd58vuomlp42).
- The team discussed **how to kick off the project**, including **milestones,
constraints, and deliverables**.
- During domain research, Reem found a [**recently published study**](https://mitemergingtalent.slack.com/files/U082U854W8Y/F09JUBJQ9C2/2505.09598v4.pdf)
with striking methodological similarities to the group’s topic and shared it
with the team.

### Action Items

- Seek **Evan’s feedback** on how to proceed with the project in light of the
new findings.

---

## Meeting 5

**Date:** October 7, 2025 (Tuesday, 11:00 AM EST)
**Attendees:** Amro, Aseel, Reem, Caesar, Banu, Safia

### Summary

- Based on Evan’s feedback, the group decided to **extend the topic-finalization
phase** by approximately two weeks and to focus on adjusting the project subject.
- Members proposed ways to **refine and make the project more original**, such
as:
  - Comparing *Big AI vs Small AI* models
  - Evaluating *Accuracy vs Eco-Friendliness*

### Action Items

- Conduct **in-depth research** to refine and strengthen the project’s
originality.
- Review the **sources cited in the research paper** previously shared by Reem.

---

## Meeting 6

**Date:** October 9, 2025 (Thursday, 10:30 AM EST)
**Attendees:** Amro, Aseel, Reem, Caesar, Banu, Safia

### Summary

- The group held a **brainstorming session** to further develop and differentiate
the project topic.
- Amro drafted a [**preliminary project plan**](https://mitemergingtalent.slack.com/files/U082U854W8Y/F09KJCKUEUB/approach_.pdf)
based on the discussion.

### Action Items

- Agreed to hold another meeting the following day to finalize the details.
- A GitHub repository will be created for the project.

---

## Meeting 7

**Date:** October 10, 2025 (Friday, 12:00 PM EST)
**Attendees:** Amro, Reem, Caesar, Banu

### Summary

- A **new and original research question** was finalized:
*“To what extent can open-source LLMs achieve comparable accuracy to
corporate (commercial) models while significantly reducing environmental
footprint?”*
- A new [**Google Doc**](https://docs.google.com/document/d/1BAoWHe8D3c_QAEFugS1CNEUqU5jugBwg1dFJE6-LVQo/edit?tab=t.0)
was created to share useful resources and references for the project.

### Action Items

- All members to gain **basic knowledge about RAG and distilled models**.
- **Banu and Aseel:** Select which models to use.
- **Caesar and Safia:** Define how to measure **accuracy metrics**.
- **Amro and Reem:** Define how to measure **environmental cost metrics**.

---

## Meeting 8

**Date:** October 14, 2025 (Tuesday, 1:30 PM EST)
**Attendees:** Amro, Aseel, Caesar, Banu, Safia

### Summary

- Members presented progress on their assigned tasks from the previous meeting.
- **Aseel & Banu:** Selected *GPT-4* (commercial) and *Mistral-7B* (open-source)
models. Evaluation will focus on *reasoning* and *summarization* using *MMLU*
and *Math* datasets. Detailed documentation can be found in the
[Model Evaluation Report](https://docs.google.com/document/d/1oOYIdLDumoZyYqgsQuBXDlXr1yZfo1sJNNanmIEGD8I/edit?tab=t.0).
- **Caesar & Safia:** Suggested using the *LightEval* library with a customized
dataset. Caesar demonstrated how to split the *GSM8K* dataset into a
500-example subset. Detailed documentation can be found in the
[Accuracy Notes](https://docs.google.com/document/d/19L4vX-67O-fNNSmY9S8QaHUULZwgzwmKVoJGdzSsUWo/edit?tab=t.0).
- **Amro & Reem:** Presented environmental metrics and detailed evaluation
methods for environmental factors.
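
The 500-example subset mentioned above can be drawn reproducibly with a fixed random seed. The following is a minimal sketch under stated assumptions, not the code Caesar demonstrated: the `make_subset` helper, the seed value, and the stand-in records are hypothetical, and a real run would load the actual GSM8K question/answer pairs instead.

```python
import random


def make_subset(examples, size=500, seed=42):
    """Return a reproducible random subset without modifying the input."""
    if size > len(examples):
        raise ValueError("subset size exceeds dataset size")
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    # Sort the sampled indices so the subset keeps the original dataset order.
    indices = sorted(rng.sample(range(len(examples)), size))
    return [examples[i] for i in indices]


# Stand-in records (hypothetical); the count of 1000 is a placeholder size.
dataset = [{"id": i, "question": f"q{i}", "answer": f"a{i}"} for i in range(1000)]
subset = make_subset(dataset)
print(len(subset))
```

Fixing the seed means every team member evaluates models on the same 500 examples, which keeps accuracy figures comparable across runs.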

### Action Items

- Review all presented work by **October 16th**.
- Meet again on **October 16th** to **discuss task allocation** for the second
milestone.
150 changes: 150 additions & 0 deletions meeting_minutes/milestone2_meetings.md
@@ -0,0 +1,150 @@
<!-- markdownlint-disable MD024 MD013 -->
<!-- Disabled MD024 (Multiple headings with the same content) rule
because repeated headings (Summary, Action Items) are
intentionally used across multiple sections for structural clarity.
Disabled MD013 (Line length) rule because mathematical formulas
and technical content require longer lines for readability. -->

# Milestone 2 Meeting Minutes

## **Meeting 9**

**Date:** October 16, 2025 (Thursday, 2:00 PM EST)

**Attendees:** Amro, Aseel, Caesar, Safia

### **Summary**

- The team decided to change the project approach due to limited access to environmental data (energy, carbon, and water consumption) for commercial AI models such as GPT, Claude, and Gemini.
- Since large-scale testing requires computational resources beyond the team’s capacity, the new plan focuses on evaluating open-source models using laptop hardware.
- Results will be compared with published environmental and performance data of commercial models to highlight how open-source AI can provide sustainable and accessible alternatives.

### **Action Items**

1. **Research and calculate environmental cost metrics:**
   - **Energy Consumption:**

     $$E_{\text{total}} = (P_{\text{GPU}} \times U_{\text{GPU}} + P_{\text{CPU}} \times U_{\text{CPU}} + P_{\text{others}}) \times t$$

   - **Facility Overhead:**

     $$E_{\text{facility}} = E_{\text{total}} \times \text{PUE}$$

   - **Carbon Footprint:**

     $$C_{\text{emissions}} = E_{\text{facility}} \times \text{CI}$$

   - **Water Footprint:**

     $$W_{\text{consumed}} = E_{\text{facility}} \times \text{WUE}$$


2. Determine what model sizes laptop hardware can handle (small, medium, and large, up to 3B parameters).
3. Apply FLOPs-based linear scaling and empirical interpolation to improve result accuracy.
4. Add all presented work from the previous meeting (model selection, evaluation methodology, environmental metrics) to the **domain study section** of the repository.
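
The four environmental cost formulas in action item 1 chain together: device energy feeds facility energy, which in turn feeds the carbon and water estimates. The sketch below is one possible way to compute them and rests on assumptions the minutes do not state: power values are taken in watts, utilization as a fraction between 0 and 1, time in hours, carbon intensity (CI) in kg CO2e per kWh, and WUE in liters per kWh. The function name and every input number are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    energy_kwh: float    # E_total: device-level energy
    facility_kwh: float  # E_facility: energy including facility overhead
    carbon_kg: float     # C_emissions
    water_l: float       # W_consumed


def estimate_footprint(p_gpu_w, u_gpu, p_cpu_w, u_cpu, p_others_w,
                       hours, pue, ci_kg_per_kwh, wue_l_per_kwh):
    """Apply the four formulas in order (assumed units: watts, hours, kWh)."""
    # E_total = (P_GPU * U_GPU + P_CPU * U_CPU + P_others) * t, converted Wh -> kWh
    energy_kwh = (p_gpu_w * u_gpu + p_cpu_w * u_cpu + p_others_w) * hours / 1000.0
    # E_facility = E_total * PUE
    facility_kwh = energy_kwh * pue
    return Footprint(
        energy_kwh=energy_kwh,
        facility_kwh=facility_kwh,
        carbon_kg=facility_kwh * ci_kg_per_kwh,  # C_emissions = E_facility * CI
        water_l=facility_kwh * wue_l_per_kwh,    # W_consumed = E_facility * WUE
    )


# Hypothetical laptop run; all values below are illustrative placeholders.
fp = estimate_footprint(
    p_gpu_w=60.0, u_gpu=0.5,
    p_cpu_w=40.0, u_cpu=0.5,
    p_others_w=50.0,
    hours=2.0,
    pue=1.0,  # assumes no data-center overhead for a laptop
    ci_kg_per_kwh=0.5,
    wue_l_per_kwh=2.0,
)
print(fp)
```

With these placeholder inputs the average draw is 100 W over 2 hours, so the device energy is 0.2 kWh, and the carbon and water figures follow by multiplication.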

---

## **Meeting 10**

**Date:** October 19, 2025 (Saturday, 12:00 PM EST)

**Attendees:** Amro, Aseel, Caesar, Banu, Reem

### Summary

- The group discussed options for testing and running AI models.
- Ideas included running quantized models locally (with some accuracy loss) and using Google Colab for limited runs.
- Another idea was to use the Hugging Face API for accuracy and RAG testing, though this approach does not allow measuring environmental costs.
- The team also explored Recursive Reasoning Models as efficient and environmentally friendly alternatives, though task variety for testing remains limited.

### Action Items

1. Watch the video about recursive models and explore whether a small-scale recursive model can be built.
2. If possible, compare its accuracy and environmental impact with a distilled model (e.g., **DistilGPT**).
3. If not feasible, return to comparing **basic**, **RAG**, **distilled**, and **commercial models**.

---

## **Meeting 11**

**Date:** October 22, 2025 (Wednesday, 12:00 PM EST)

**Attendees:** Amro, Aseel, Caesar, Reem, Safia, Banu

### Summary

- Following office hour feedback from Evan, the team decided to focus on **small language models (SLMs)** due to their efficiency.
- The group agreed to compare open-source SLMs with distilled commercial models.
- It was decided to apply **RAG techniques** (via the **Ragas Python library**) to quantized, SLM, and recursive models to narrow the gap with commercial systems.
- Because of the project’s evolving direction, the final deliverable will shift from a **dashboard** to a **research paper or article**.
- The team also plans to create a **Google Form** later to assess public and expert awareness of the topic.
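
For readers new to the RAG technique referenced above, the core idea is to retrieve the passages most relevant to a question and place them in the prompt before the model answers. The following is a toy sketch using only the Python standard library and simple term-frequency cosine similarity; it is not the team's implementation and does not use the Ragas library. All function names and passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(passages,
                    key=lambda p: cosine(q, Counter(tokenize(p))),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, passages, k=2):
    """Assemble a prompt that puts the retrieved context before the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")


# Toy knowledge base (illustrative passages only).
passages = [
    "Power usage effectiveness is the ratio of facility energy to IT equipment energy.",
    "Carbon intensity measures emissions per kilowatt-hour of electricity.",
    "Distilled models are trained to imitate a larger teacher model.",
]
prompt = build_prompt("What does carbon intensity measure?", passages, k=1)
print(prompt)
```

The resulting prompt string would then be passed to whichever small, quantized, or distilled model is under test; a production setup would typically swap the term-frequency scoring for dense embeddings.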

### Action Items

- **Reem:** Test DistilBERT on Hugging Face
- **Aseel:** Research commercial models
- **Amro:** Test the RAG method
- **Caesar:** Combine Distilled + RAG models
- **Safia:** Combine SLM + RAG models
- **Banu:** Develop a unified test prompt (e.g., a poem or short text)
- **All:** Prepare the GitHub repository

### **Future Tasks**

- Create and distribute an awareness form
- Develop a communication strategy
- Publish the research article

---

## **Meeting 12**

**Date:** October 27, 2025 (Monday, 1:00 PM EST)

**Attendees:** Amro, Aseel, Caesar, Reem, Safia, Banu

### Summary

- Team members presented updates on their assigned tasks from the previous meeting.
- **Reem** shared findings on **DistilBERT**, concluding that the model performed poorly for the project’s needs.
- **Caesar** presented a **DistilBERT + RAG demo**, confirming similar inefficiencies; both suggested that RAG could still be valuable if paired with a more capable distilled model.
- **Amro** demonstrated his **RAG implementation**, discussed constraints, and noted ongoing refinements.
- **Safia** showcased her **SLM + RAG demo** and shared documentation.
- **Aseel** and **Banu** updated on **commercial model research** and **test prompt development** respectively.
- The team discussed next research directions:
  - Experiment with **recursive models**
  - Search for a more efficient **distilled model**
  - Possibly abandon commercial model comparisons in favor of evaluating specific approaches or model-task pairings

### Action Items

1. All members continue their respective research and experiments.
2. Push all updates and outputs to the **GitHub repository** before the **ELO2 Midpoint Breakout Room Session** on **Wednesday, October 29**.
3. Identify a better distilled model for testing.
4. Evaluate test prompts on **SLM + RAG models**.
5. Hold a follow-up meeting on **Thursday** to review progress and next steps.

---

## **Meeting 13**

**Date:** October 31, 2025 (Friday, 12:00 PM EST)

**Attendees:** Amro, Aseel, Banu, Caesar

### Summary

- The originally planned follow-up meeting was postponed due to scheduling conflicts.
- **Amro** presented his **RAG demo** using **Banu’s test prompts** — the model answered most questions correctly but added unnecessary details and struggled with harder ones. Some hallucinations were observed.
- **Caesar** discovered a new, improved distilled model (**MBZUAI/LaMini-Flan-T5-248M**), applied **RAG**, and shared a demo. It performed well on most test prompts except the hard ones.
- The team outlined a **two-week roadmap** focused on **coding and technical tasks**, followed by **repository organization**.

### Action Items

- Prioritize coding tasks now; clean and organize the repository later.
- **Amro:** Continue refining RAG implementation.
- **Caesar:** Test the **CodeCarbon** library on the new model.
- **Banu:** Add a **generative paragraph task** to test prompts and create **three new prompts** for it (for use in the upcoming Google Form).
- **Aseel:** Prepare a draft for the **main README**.
- Team to explore **recursive models** in the coming days.
- Use **Slack** actively for communication and finalize the next meeting date later.