We built an interactive retrieval-augmented generation (RAG) pipeline using the open-weight model Gemma 2-2b by Google and applied it to a standardized text and prompt set derived from the Apollo 11 lunar landing.
The goal:
- evaluate summarisation, reasoning, retrieval, paraphrasing, and creative tasks in a controlled, reproducible way — logging both answer quality and local sustainability metrics (energy/carbon emissions) via CodeCarbon.
Model ID: google/gemma-2-2b-it (Hugging Face)
Key attributes:
- Open-weight decoder-only model trained by Google
- Supports text-generation and conversational usage
- Suitable for research, summarisation, reasoning, and retrieval tasks
- Lightweight enough for deployment on modest compute resources
Model link: https://huggingface.co/google/gemma-2-2b-it
-
Created a source document (
source.txt) using ~1,400 words of selected Wikipedia excerpts on Apollo 11. -
Defined a set of 21 standardised prompts spanning five categories: summarisation, reasoning, RAG (fact retrieval), paraphrasing, and creative generation.
-
Built a document retrieval component using sentence-transformers to chunk the document and select top-k relevant chunks per query.
-
Developed an interactive notebook workflow that:
- Accepts a question at runtime
- Runs RAG → Draft → Critic → Refiner cycles using Gemma
- Tracks local CPU/GPU energy usage and CO₂ emissions with CodeCarbon
- Logs each question, answer, timestamp, and emissions to a single append-only log file
-
Logged runtime latency and emissions per query for performance and sustainability insights.
-
Clone the repository:
git clone <YOUR_REPO_URL> cd your_repo_folder
-
Place your
source.txtinto./data/. -
Add your Hugging Face API key in the config cell.
-
Run the notebook setup cells, then use the interactive prompt cell to ask questions.
- Reproducibility — fixed source text and prompt set allow consistent evaluation across models.
- Efficiency vs. Accuracy — emissions are logged alongside outputs to explore trade-offs between model performance and energy cost.
- Accessibility — uses an open model and standard Python tools, making research on small language models feasible even on laptops.