# 🌱 ELO2 – GREEN AI

***Comparing Commercial and Open-Source Language Models for Sustainable AI***

This repository presents the **ELO2 – GREEN AI Project**, developed within the **MIT Emerging Talent – AI & ML Program (2025)**. The work investigates the technical performance, sustainability traits, and human-perceived quality of **open-source language models** compared to commercial systems.

---

## 🔍 Project Overview

### Research Question

**To what extent can open-source LLMs provide competitive output quality while operating at significantly lower environmental cost?**

![image](readme_images/trade-off.png)

### Motivation

Large commercial LLMs deliver strong performance but demand substantial compute and energy. This project examines whether **small, accessible, and environmentally efficient open-source models**—especially when enhanced with retrieval and refinement pipelines—can offer practical alternatives for everyday tasks.

---

## 🧪 Methods

![image](readme_images/project-timeline.png)

### 1. Model Families

The study evaluates several open-source model groups:

- **Quantized Model:** Mistral-7B (GGUF)
- **Distilled Model:** LaMini-Flan-T5-248M
- **Small Models:** Qwen, Gemma
- **Enhanced Pipelines (applied to all model families):**
  - **RAG (Retrieval-Augmented Generation)**
  - **Recursive Editing**, which includes AI-based critique and iterative refinement

These configurations serve as the optimized open-source setups used in the comparison against commercial models.

### 2. Tasks & Dataset

Evaluation tasks include:

- summarization
- factual reasoning
- paraphrasing
- short creative writing
- instruction following
- question answering

A targeted excerpt from the **Apollo-11 mission transcripts** served as the central reference text for all evaluation tasks. All prompts were constructed directly from this shared material. Using a single, consistent source ensured that every model was tested under identical informational conditions, allowing a clear and fair comparison of output quality and relevance.

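The single-source prompt setup can be sketched in a few lines. This is our illustration, not the project's actual code: the excerpt text and template names below are hypothetical stand-ins.

```python
# Illustrative sketch: every task prompt is instantiated from the same
# shared reference excerpt, so all models see identical input conditions.
APOLLO_EXCERPT = (
    "Houston, Tranquility Base here. The Eagle has landed."
)  # hypothetical stand-in for the actual Apollo-11 transcript excerpt

TASK_TEMPLATES = {
    "summarization": "Summarize the following transcript excerpt:\n\n{text}",
    "factual_reasoning": "Using only the excerpt below, who reports the landing?\n\n{text}",
    "paraphrasing": "Paraphrase the following excerpt in plain language:\n\n{text}",
}

def build_prompts(excerpt: str) -> dict:
    """Instantiate every task template with the same reference text."""
    return {task: tpl.format(text=excerpt) for task, tpl in TASK_TEMPLATES.items()}

prompts = build_prompts(APOLLO_EXCERPT)
```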
### 3. RAG Pipeline

Retrieval-Augmented Generation (RAG) was applied to multiple model families. The pipeline includes:

- document indexing
- dense similarity retrieval
- context injection through prompt augmentation
- answer synthesis using guidance prompts

RAG improved factual grounding in nearly all models.

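The four stages above can be sketched as follows. This is a minimal illustration, not the project's pipeline: toy bag-of-words vectors stand in for a real dense encoder, and the documents are made up.

```python
# Minimal RAG sketch: index -> dense retrieval -> context injection.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'dense' embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Document indexing: embed each transcript chunk once, up front.
docs = [
    "The Eagle has landed at Tranquility Base.",
    "Fuel levels were nominal during descent.",
]
index = [(d, embed(d)) for d in docs]

def retrieve(query: str, k: int = 1) -> list:
    """2. Dense similarity retrieval: top-k chunks by cosine score."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

def augment(query: str) -> str:
    """3. Context injection: prepend retrieved chunks to the prompt."""
    context = "\n".join(retrieve(query))
    # 4. Answer synthesis would pass this augmented prompt to the model.
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = augment("Where did the Eagle land?")
```

A production variant would swap `embed` for a sentence-embedding model and the list scan for a vector index, but the data flow stays the same.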
### 4. Recursive Editing Framework

A lightweight iterative refinement procedure was implemented:

1. **Draft Generation:**
   The primary model produces an initial output.

2. **AI-Based Critique:**
   A secondary SLM evaluates clarity, accuracy, faithfulness, and relevance.

3. **Refinement Step:**
   A revision prompt integrates the critique and generates an improved text.

4. **Stopping Condition:**
   The cycle ends after a fixed number of iterations or when the critique stabilizes.

This approach allowed weaker SLMs to yield higher-quality results without relying on large models.

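The four-step loop can be sketched as below. The stub functions are our hypothetical stand-ins for the real model calls, not the project's implementation.

```python
# Sketch of the draft -> critique -> refine loop with a fixed iteration cap.
def generate_draft(prompt: str) -> str:
    return f"DRAFT: {prompt}"  # stand-in for the primary model

def critique(text: str) -> str:
    # Stand-in for the secondary SLM judging clarity, accuracy,
    # faithfulness, and relevance. Returns "" when it has no complaints.
    return "tighten wording" if "DRAFT" in text else ""

def refine(text: str, feedback: str) -> str:
    # Stand-in for the revision prompt that folds the critique back in.
    return text.replace("DRAFT", "REVISED") + f" [{feedback}]"

def recursive_edit(prompt: str, max_iters: int = 3) -> str:
    text = generate_draft(prompt)        # 1. draft generation
    for _ in range(max_iters):           # 4. fixed iteration cap
        feedback = critique(text)        # 2. AI-based critique
        if not feedback:                 # 4. critique has stabilized
            break
        text = refine(text, feedback)    # 3. refinement step
    return text

result = recursive_edit("Summarize the landing.")
```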
### 5. Environmental Measurement

Environmental footprint data was captured with **CodeCarbon**, recording:

- CPU/GPU energy usage
- carbon emissions
- PUE-adjusted overhead

These measurements enabled comparison with published metrics for commercial LLMs.

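The relationship between the three quantities above is simple arithmetic. This back-of-the-envelope sketch uses made-up numbers and is not CodeCarbon's code; the tracker computes these values from measured hardware power draw and the local grid mix.

```python
# Illustrative arithmetic only: how measured energy, datacenter overhead
# (PUE), and grid carbon intensity combine into an emissions estimate.
def emissions_kg(energy_kwh: float, pue: float, carbon_intensity: float) -> float:
    """PUE-adjusted emissions: device energy x overhead x grid intensity.

    carbon_intensity is in kg CO2e per kWh.
    """
    return energy_kwh * pue * carbon_intensity

cpu_gpu_kwh = 0.050   # measured CPU/GPU energy for one run (assumed value)
pue = 1.2             # power usage effectiveness (assumed overhead factor)
grid = 0.4            # kg CO2e per kWh (assumed grid mix)

footprint = emissions_kg(cpu_gpu_kwh, pue, grid)  # 0.024 kg CO2e
```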
### 6. Human Evaluation (Single-Blind)

A structured Google Form experiment collected:

- **source identification** (commercial vs. open-source)
- **quality ratings** on accuracy, faithfulness, relevance, and clarity (1–5 scale)

Outputs were randomized and anonymized to avoid bias. This provided a perception-based counterpart to the technical evaluation.

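The randomize-and-anonymize step can be sketched as follows. This is our illustration with hypothetical sample data, not the project's form-preparation script.

```python
# Sketch: shuffle output order and replace model names with neutral
# labels, keeping a private key for un-blinding at analysis time.
import random

outputs = [
    {"model": "Mistral-7B", "text": "Summary A ..."},
    {"model": "commercial", "text": "Summary B ..."},
]

def anonymize(items, seed=42):
    """Return (blinded list, label->model key). Raters only see labels."""
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    shuffled = items[:]
    rng.shuffle(shuffled)
    blinded, key = [], {}
    for i, item in enumerate(shuffled):
        label = f"Output {i + 1}"
        key[label] = item["model"]
        blinded.append({"label": label, "text": item["text"]})
    return blinded, key

blinded, key = anonymize(outputs)
```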
### 7. Analysing the Results

....

### 8. Publishing an Article

....

---

## 📊 Key Findings

- FINDING1.....
- FINDING2.....
- FINDING3.....
- FINDING4.....

---

## 🔮 Future Work

- Evaluate additional open-source model families across diverse tasks
- Test optimized pipelines in specialized domains (medical, legal, technical writing)
- Track carbon footprint across the full lifecycle (training to deployment)
- Conduct ablation studies isolating RAG vs. recursive editing contributions

---

## 📢 Communication Strategy

The research findings will be shared through formats designed for different audiences and purposes:

### For Researchers

A comprehensive research article will document the complete experimental design, statistical analysis, and implications.

🔗 **[View Article](link1)**

### For Practitioners & Educators

An executive presentation provides a visual overview of the research question, methodology, and key findings without requiring deep technical background.

🔗 **[View Presentation](link2)**

### For the Community

A public evaluation study invites participation in assessing AI-generated texts. This crowdsourced data forms a critical component of the research.

🔗 **[Participate in Study](link3)**

### For Reproducibility

All materials (dataset, prompts, model outputs, evaluation scripts, and carbon tracking logs) are publicly available in this repository.

🔗 **[Browse Repository](https://github.com/banuozyilmaz2-jpg/ELO2-GREEN-AI)**

---

## 👥 Contributors

- [Amro Mohamed](https://github.com/Elshikh-Amro)
- [Aseel Omer](https://github.com/AseelOmer)
- [Banu Ozyilmaz](https://github.com/doctorbanu)
- [Caesar Ghazi](https://github.com/CaesarGhazi)
- [Reem Osama](https://github.com/reunicorn1)
- [Safia Gibril Nouman](https://github.com/Safi222)

---

## 🙏 Acknowledgments

Special thanks to the **MIT Emerging Talent Program** for their guidance and feedback throughout the project.
