feat(meetings): add Milestone 3 and Milestone 4 sections to meeting-minutes folder README

doctorbanu · doctorbanu · commit 1ba740e79881 · 2025-11-21T00:37:52.000-05:00
diff --git a/meeting_minutes/README.md b/meeting_minutes/README.md
@@ -1,4 +1,5 @@
 <!-- markdownlint-disable MD013 -->
+<!-- Disabled MD013 (Line length) for better readability -->
 
 # 🗓️ Meeting Minutes – Environmental Impact of AI Models
 
@@ -21,10 +22,30 @@ By the end of Milestone 1, the project established its scope, research framework
 
 ## ⚙️ Milestone 2 – Tool Setup & Experiment Planning
 
-**Timeline:** October 15 – Ongoing
+**Timeline:** October 15 – November 6, 2025
 
 With the research framework and scope finalized in Milestone 1, **Milestone 2** focuses on preparing the experimental environment and defining how sustainability metrics will be measured. This phase involves setting up tools such as **CodeCarbon**, **CarbonTracker**, and **Eco2AI** to monitor energy and carbon usage, and exploring **Water Usage Effectiveness (WUE)** datasets from major cloud providers like AWS, Microsoft, and Google.
 
 The team also plans to configure testing environments for small open-source models (e.g., **Mistral**, **LLaMA-2**) using **Hugging Face Transformers**, **PyTorch**, and GPU-enabled platforms such as **Colab**. Another core deliverable is the **experimental design document**, which will outline the metrics (energy, carbon, water, and accuracy), workflows, and methodology diagrams guiding the model evaluation process.
 
-This milestone sets the foundation for **Milestone 3**, where real model experiments and energy tracking will begin.
+By the end of Milestone 2, the team completed the technical setup, finalized the measurement pipeline, and validated that all tracking tools operate consistently across model types. This ensures a smooth transition into Milestone 3, where full experiments will be executed.
+
+## 📊 Milestone 3 – Model Benchmarking & Data Collection
+
+**Timeline:** November 7 – November 18, 2025
+
+Milestone 3 marks the beginning of the full experimental phase. Using the measurement pipeline and tooling established in Milestone 2, the team runs benchmark tasks on both proprietary and open-source models to collect data on **accuracy** and **environmental impact**. This includes tracking **energy consumption and carbon emissions** for each tested model under consistent test conditions.
+
+During this phase, the team also validates accuracy results on selected reasoning and summarization tasks, investigates irregular outputs, and updates evaluation scripts when needed. Additional observations such as **inference time**, **token throughput**, and **hardware utilization** are recorded to support later analysis.
+
+By the end of Milestone 3, the project has produced a complete experimental dataset covering sustainability metrics and accuracy scores for all evaluated models, providing a strong foundation for **Milestone 4**, which focuses on human evaluation and qualitative assessment.
+
+## 🧪 Milestone 4 – Human Evaluation & Survey Analysis
+
+**Timeline:** November 19 – Ongoing
+
+Milestone 4 centers on incorporating **human judgment** into the benchmarking process. The team prepares a Google Form survey designed to compare model outputs side by side. Participants evaluate **clarity**, **coherence**, **informativeness**, **factuality**, and **overall preference**.
+
+Once responses are collected, the team analyzes the results by aggregating scores, assessing agreement among reviewers, and comparing human preferences with automated accuracy metrics from earlier milestones. This helps identify where quantitative and qualitative assessments align or diverge.
+
+By the end of Milestone 4, the project integrates the human evaluation results into the broader dataset, enabling a more nuanced understanding of model performance and preparing the groundwork for **Milestone 5**.
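
Note on the Milestone 4 analysis step: the README describes aggregating survey scores and assessing agreement among reviewers. A minimal sketch of what that could look like is shown here. It is not the team's actual script; the row layout, the 1-5 scale, the reviewer and model names, and both helper functions are hypothetical, and it assumes every reviewer voted on the same prompts in the same order.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Hypothetical survey rows: (reviewer, model, criterion, score on an assumed 1-5 scale).
responses = [
    ("r1", "model_a", "clarity", 4),
    ("r2", "model_a", "clarity", 5),
    ("r1", "model_b", "clarity", 3),
    ("r2", "model_b", "clarity", 3),
]

# Hypothetical overall-preference votes: reviewer -> preferred model for each prompt.
preferences = {
    "r1": ["model_a", "model_b", "model_a"],
    "r2": ["model_a", "model_a", "model_a"],
}


def aggregate_scores(rows):
    """Return the mean score for each (model, criterion) pair."""
    buckets = defaultdict(list)
    for _reviewer, model, criterion, score in rows:
        buckets[(model, criterion)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def pairwise_agreement(votes):
    """Return, for each reviewer pair, the fraction of prompts with the same vote."""
    result = {}
    for a, b in combinations(sorted(votes), 2):
        matches = sum(x == y for x, y in zip(votes[a], votes[b]))
        result[(a, b)] = matches / len(votes[a])
    return result


mean_scores = aggregate_scores(responses)
agreement = pairwise_agreement(preferences)
```

Raw percent agreement is used here only because it is the simplest measure; it does not correct for agreement that would occur by chance, so a chance-corrected statistic such as Cohen's kappa may be a better fit for the real survey data.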