diff --git a/2_open_source_models/README.md b/2_open_source_models/README.md
new file mode 100644
index 0000000..c014a66
--- /dev/null
+++ b/2_open_source_models/README.md
@@ -0,0 +1,133 @@
+# **Open-Source Model Experiments**
+
+This directory contains four standalone experiments exploring
+**local, open-source language models** for Retrieval-Augmented Generation
+(RAG), model evaluation, recursive editing, and sustainability tracking
+(energy & CO₂ emissions).
+Each subfolder includes its own notebook, documentation, outputs, and
+model-specific setup.
+
+---
+
+## Directory Structure
+
+```text
+2_open_source_models/
+│
+├── distilled_models/
+│   └── rag_and_distilled_model/
+│
+├── quantized_models/
+│   └── mistral7b/
+│
+└── slm/
+    ├── google_gemma/
+    └── qwen/
+```
+
+Each subfolder contains a self-contained experiment with its own README,
+notebook(s), generated outputs, and energy/emissions logs where applicable.
+
+---
+
+## Project Summaries
+
+Below is a concise description of each model project, giving an
+at-a-glance view of what this folder covers.
+
+---
+
+### **1. Distilled Models – RAG + Instruction-Tuned Distilled LMs**
+
+**Folder:** `distilled_models/rag_and_distilled_model/`
+**Notebook:** `Apollo11_rag&distilled.ipynb`
+
+This project uses a lightweight **LaMini-Flan-T5-248M** distilled model
+combined with a **MiniLM** embedding model to run a fully local
+Retrieval-Augmented Generation pipeline on the Apollo 11 dataset.
+It demonstrates:
+
+* Local embeddings and ChromaDB vector storage
+* RAG-based question answering
+* Evaluation across several prompt types
+* Emissions tracking and generated output logs
+
+Ideal for showing how **compact distilled models** can handle
+RAG efficiently on CPU or modest GPU hardware.
+
+---
+
+### **2. Quantized Models – Mistral 7B RAG Pipeline**
+
+**Folder:** `quantized_models/mistral7b/`
+
+This project evaluates a **quantized Mistral-7B (GGUF)** model running
+fully locally via `llama-cpp-python`.
+It focuses on:
+
+* Retrieval-Augmented Generation using LlamaIndex
+* Local inference using a 4-bit quantized LLM
+* Document processing, embedding (BGE-small), and top-k retrieval
+* Practical observations on feasibility and performance on a laptop
+
+A strong example of how quantization enables
+**large-model capability at small-device cost**.
+
+---
+
+### **3. Small Language Model (SLM): Google Gemma 2-2B**
+
+**Folder:** `slm/google_gemma/`
+
+This experiment implements a structured RAG workflow with Google’s lightweight
+**Gemma 2-2B** model and a fixed Apollo 11 source text.
+Key features include:
+
+* Standardized 21-prompt evaluation set
+* RAG pipeline with chunked retrieval
+* Draft to Critic to Refiner multi-step generation
+* Real-time emissions logging with CodeCarbon
+* Fully reproducible testing and reporting
+
+This project demonstrates how even very small open-weight models can
+perform multi-step reasoning when paired with thoughtful prompting and revision
+cycles.
+
+---
+
+### **4. Small Language Model (SLM): Qwen 2.5B + Recursive Editing**
+
+**Folder:** `slm/qwen/`
+
+This notebook experiments with **Qwen 2.5B**, integrating:
+
+* RAG retrieval
+* A recursive editing loop (Draft to Critic to Refine)
+* Context retrieval through Hugging Face embeddings
+* Energy + CO₂ logging for each query
+
+Outputs are saved in markdown form with all iterations and emissions data.
+
+---
+
+## Purpose of This Collection
+
+This folder exists to:
+
+* Compare how different **model sizes**, **architectures**, and
+**inference strategies** behave on the **same tasks**.
+* Demonstrate **fully local RAG pipelines** using only open-source components.
+* Document **energy and carbon trade-offs** in local LLM usage.
+* Provide reproducible examples that can be extended or rerun with other models.
+
+Each subfolder is designed as a standalone experiment, but together they
+form a cohesive study of open-source LLM efficiency and performance.
+
+---
+
+## Notes
+
+* All code is intended to run locally.
+* Each folder includes its own notebook and README with instructions.
+* Energy/emissions reporting is included where relevant (via CodeCarbon).
+* Datasets and prompts are standardized across projects for fairness and comparability.
diff --git a/3_experiment/README.md b/3_experiment/README.md
index e69de29..5be675f 100644
--- a/3_experiment/README.md
+++ b/3_experiment/README.md
@@ -0,0 +1,77 @@
+# AI Model Comparison Experiment
+
+## Evaluating Open-Source vs. Commercial Language Models
+
+This folder contains the materials for our experiment comparing open-source and
+commercial AI models through human evaluation. Participants were asked to read
+pairs of AI-generated texts and judge their quality without knowing which model
+produced which text.
+
+---
+
+## What This Experiment Is
+
+We created a survey where each question includes two texts—**Text A** and
+**Text B**—generated by different AI models. One text always comes from an
+**open-source model**, and the other from a **commercial model**. Participants:
+
+* Choose which text they prefer
+* Guess which model type generated each text
+* Rate both texts (accuracy, clarity, relevance, faithfulness)
+
+All evaluations are blind to remove brand bias.
+
+---
+
+## Why We Did This
+
+Open-source AI models are advancing quickly, and we wanted to understand
+whether they are perceived as competitive alternatives to commercial systems.
+While benchmarks can measure performance numerically, they don’t reflect how
+humans actually experience AI-generated writing.
+
+This experiment aims to answer questions like:
+
+* Do people notice a consistent quality difference?
+* Can users accurately identify commercial vs. open-source output?
+* Are open-source models “good enough” for real-world tasks?
+
+Understanding these perceptions is important for evaluating the viability of
+sustainable, accessible, and transparent AI systems.
+
+---
+
+## Why We Chose This Method
+
+We used a **paired, blind comparison** because it provides a clean way to
+assess text quality without model reputation influencing the results.
+Participants judge writing on its own merits, which helps us collect more
+reliable data.
+
+We included multiple task types (summarization, paraphrasing, reasoning, and
+creative writing) because each one tests a different aspect of model behavior.
+
+This variety gives us a broader picture of model strengths and weaknesses.
+
+---
+
+## Why This Approach Works Well
+
+This survey-based structure is simple and easy for participants to
+understand. It mirrors how people naturally interact with AI systems: reading
+text and forming opinions about quality. By keeping the evaluation blind, we
+minimize bias and generate more meaningful insights into real user perception.
+
+The method also helps determine whether open-source models, especially optimized
+ones, can realistically serve as alternatives to commercial systems
+in practical use.
+
+---
+
+## Contents of This Folder
+
+```text
+3_experiment/
+├── survey_form.md     # The form text used in the study
+└── README.md          # Explanation of the experiment (this file)
+```
diff --git a/3_experiment/survey_form.md b/3_experiment/survey_form.md
new file mode 100644
index 0000000..83c3124
--- /dev/null
+++ b/3_experiment/survey_form.md
@@ -0,0 +1,797 @@
+# Open-Source vs. Commercial AI: Comparing Performance and Quality
+
+Welcome and thank you for participating in this research study!
+
+This study investigates a critical question:
+"Can open-source language models, enhanced with optimization techniques such as
+recursive editing and distillation, become environmentally and functionally viable
+alternatives to commercial models?"
+
+## YOUR TASK
+
+- Read each text carefully
+- Guess whether it was generated by an open-source or commercial model
+- Rate each text on accuracy, clarity, relevance, and faithfulness
+
+## YOUR ROLE
+
+Your honest evaluations will help us understand:
+
+- Whether open-source models can achieve comparable quality to commercial alternatives
+- How users perceive differences (or similarities) between the two types
+- The viability of more sustainable and accessible AI solutions
+
+## PRIVACY & ETHICS
+
+All responses will be completely anonymous and used solely for academic research
+purposes. There are no right or wrong answers. We value your genuine, unbiased
+impressions and opinions.
+
+By proceeding, you consent to participate in this study.
+
+### Participant Background
+
+Please tell us a bit about yourself.
+
+Your background and experience with AI tools will help us understand how
+different people with varying levels of AI familiarity and from diverse
+cultural contexts perceive and evaluate AI-generated content.
+This diversity of perspectives is essential to our research.
+
+What is your nationality?
+
+-
+
+What is your native (first) language?
+
+- English
+
+- Spanish
+
+- German
+
+- Arabic
+
+- Turkish
+
+- Chinese
+
+- Russian
+
+- Other:
+
+How would you rate your English proficiency?
+
+- Native speaker
+
+- Fluent (C2 - can understand everything easily)
+
+- Advanced (C1 - can understand complex text)
+
+- Intermediate (B1-B2 - can understand most everyday texts)
+
+- Basic (A1-A2 - still learning)
+
+What is your current occupation or field of study?
+e.g., Computer Science Student, Software Engineer, Teacher, Researcher, Marketing
+
+-
+
+How often do you use AI tools (e.g., ChatGPT, Claude, Gemini, Copilot, etc.)?
+
+- Daily
+
+- Several times a week
+
+- Once a week
+
+- A few times a month
+
+- Rarely
+
+- Never
+
+What do you primarily use AI tools for? (select all that apply)
+
+- Writing and editing assistance
+
+- Research and information gathering
+
+- Coding and programming
+
+- Creative work (stories, art, brainstorming)
+
+- Learning and education
+
+- Work or professional tasks
+
+- Translation or language learning
+
+- Personal use or entertainment
+
+- I don't use AI tools
+
+- Other:
+
+Are you familiar with the difference between open-source and commercial AI
+models?
+
+- Yes, very familiar
+
+- Somewhat familiar
+
+- I've heard of it but don't know details
+
+- No, not familiar
+
+---
+
+### Text Evaluations
+
+Below you will see several pairs of texts about the Apollo 11 lunar landing and moonwalk activities.
+Each pair contains:
+
+- Text A and Text B: two AI-generated responses to the same question
+- One text was generated by an open-source model (e.g., Mistral, Gemma, LaMini)
+- One text was generated by a commercial model (e.g., GPT-4, Claude)
+
+Your task:
+
+1. Identify which text is from which type of model
+2. Evaluate the quality of BOTH texts
+
+The questions cover different task types: paraphrasing, summarization, creative writing, and reasoning.
+
+---
+
+#### Source Material
+
+All texts were generated based on excerpts from the Apollo 11 mission transcript.
+📎 You can find the source text here
+
+---
+
+#### TASK 1.1 - Task type: Summarization
+
+**Question:**
+
+Summarize the main events during the Apollo 11 lunar landing in 3 sentences.
+
+**Text A:**
+
+The main events of the Apollo 11 lunar landing included the lunar module Eagle separating
+from the command module and descending to the Moon's surface. Upon landing, Neil
+Armstrong famously declared, Houston, Tranquility Base here. The Eagle has
+landed. Shortly after, Armstrong stepped onto the lunar surface, becoming the
+first human to walk on the Moon, followed by Buzz Aldrin
+
+**Text B:**
+
+The computer prevented an abort during the Apollo 11 lunar landing, and a
+complete set of recovery programs were incorporated into the software.
+
+**Question 1:**
+
+Which text do you prefer? Which one is better in your opinion?
+
+- Text A
+
+- Text B
+
+- I'm not sure / Can't tell the difference
+
+**Question 2:**
+
+Can you identify which model generated which text?
+
+- Text A is open-source model, Text B is commercial model
+
+- Text A is commercial model, Text B is open-source model
+
+- Both are commercial models
+
+- Both are open-source models
+
+- I'm not sure
+
+**Question 3:**
+
+Please rate Text A on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+**Question 4:**
+
+Please rate Text B on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+#### TASK 1.2 - Task type: Summarization
+
+Question: Explain what scientific equipment the astronauts deployed on the Moon.
+
+**Text A:**
+
+The astronauts deployed the following scientific equipment on the Moon:
+Seismometer: To measure moonquakes and meteoroid impacts.
+Laser Ranging Retroreflector (LRRR): To precisely measure the distance between the
+Earth and the Moon using lasers.
+Solar Wind Spectrometer: To analyze the composition of the solar wind.
+Magnetometer: To measure the Moon's magnetic field.
+
+**Text B:**
+
+The astronauts deployed the EASEP, which included a Passive Seismic Experiment Package
+used to measure moonquakes and a retroreflector array used for the lunar laser ranging
+experiment.
+
+**Question 1:**
+
+Which text do you prefer? Which one is better in your opinion?
+
+- Text A
+
+- Text B
+
+- I'm not sure / Can't tell the difference
+
+**Question 2:**
+
+Can you identify which model generated which text?
+
+- Text A is open-source model, Text B is commercial model
+
+- Text A is commercial model, Text B is open-source model
+
+- Both are commercial models
+
+- Both are open-source models
+
+- I'm not sure
+
+**Question 3:**
+
+Please rate Text A on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+**Question 4:**
+
+Please rate Text B on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+---
+
+#### TASK 2.1 - Task type: Paraphrasing
+
+Question: Explain how Armstrong’s decisions, actions, and teamwork during the descent
+contributed to the mission’s success.
+
+**Text A:**
+
+Neil Armstrong and Buzz Aldrin's Moon landing epitomized remarkable human courage.
+Despite facing numerous challenges, including unexpected technical issues and
+the inherent dangers of space travel, the astronauts persevered and achieved
+their goal of landing on the Moon. Their determination and bravery in
+overcoming these obstacles underscore humanity's capacity for extraordinary
+courage and perseverance in the face of seemingly insurmountable challenges.
+To further enrich this narrative, one could imagine the astronauts reflecting on
+their experiences, perhaps contemplating the profound impact of their mission
+on both individual and collective human spirit. Alternatively, one could
+speculate on how the Moon landing might have illuminated deeper truths
+about human resilience and the indomitable will to explore and conquer.
+
+**Text B:**
+
+Armstrong took semi-automatic control when he noticed the computer’s target was strewn
+with boulders. Aldrin relayed navigation data while Armstrong carefully piloted
+the spacecraft to a clear area, managing fuel and lunar dust interference.
+His steady judgment, combined with Aldrin’s support, ensured a safe touchdown
+with only seconds of fuel remaining, showcasing precise teamwork and decisive action.
+
+**Question 1:**
+
+Which text do you prefer? Which one is better in your opinion?
+
+- Text A
+
+- Text B
+
+- I'm not sure / Can't tell the difference
+
+**Question 2:**
+
+Can you identify which model generated which text?
+
+- Text A is open-source model, Text B is commercial model
+
+- Text A is commercial model, Text B is open-source model
+
+- Both are commercial models
+
+- Both are open-source models
+
+- I'm not sure
+
+**Question 3:**
+
+Please rate Text A on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+**Question 4:**
+
+Please rate Text B on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+---
+
+#### TASK 2.2 - Task type: Paraphrasing
+
+Question: In your own words, describe what happened when the computer alarms appeared
+during the landing.
+
+**Text A:**
+
+The LM guidance computer (LGC) exhibited unexpected "1201" and "1202" program alarms
+five minutes into the descent, signaling an issue with executive overflows.
+This indicated that the computer was unable to complete all its tasks in
+real-time and had to defer some. Nonetheless, these alarms did not necessitate
+an abort; instead, the computer automatically prioritized and resumed critical
+functions, thereby averting a potential failure.
+
+**Text B:**
+
+During the landing, the lunar module’s guidance computer triggered unexpected
+1201 and 1202 alarms, signaling it couldn’t process all tasks at once. Mission
+Control quickly confirmed it was safe to continue, and the computer
+automatically prioritized critical tasks over less important ones,
+preventing a potential abort. This built-in recovery allowed the landing to
+proceed safely.
+
+**Question 1:**
+
+Which text do you prefer? Which one is better in your opinion?
+
+- Text A
+
+- Text B
+
+- I'm not sure / Can't tell the difference
+
+**Question 2:**
+
+Can you identify which model generated which text?
+
+- Text A is open-source model, Text B is commercial model
+
+- Text A is commercial model, Text B is open-source model
+
+- Both are commercial models
+
+- Both are open-source models
+
+- I'm not sure
+
+**Question 3:**
+
+Please rate Text A on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+**Question 4:**
+
+Please rate Text B on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+---
+
+#### TASK 3.1 - Task type: Reasoning
+
+Question: Why did the computer alarms (1201 and 1202) occur during the descent?
+
+**Text A:**
+
+The 1201 and 1202 program alarms indicated "executive overflows," meaning the guidance
+computer could not complete all its tasks in real-time and had to postpone
+some of them. Rather than forcing an abort, the computer's recovery programs
+eliminated lower priority tasks and re-established the more important ones,
+which prevented an abort and allowed the mission to continue successfully.
+
+**Text B:**
+
+The program alarms (1201 and 1202) occurred because the guidance computer (LGC)
+experienced "executive overflows," indicating it was overloaded with tasks
+and unable to complete all its real-time tasks.
+
+**Question 1:**
+
+Which text do you prefer? Which one is better in your opinion?
+
+- Text A
+
+- Text B
+
+- I'm not sure / Can't tell the difference
+
+**Question 2:**
+
+Can you identify which model generated which text?
+
+- Text A is open-source model, Text B is commercial model
+
+- Text A is commercial model, Text B is open-source model
+
+- Both are commercial models
+
+- Both are open-source models
+
+- I'm not sure
+
+**Question 3:**
+
+Please rate Text A on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+**Question 4:**
+
+Please rate Text B on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+---
+
+#### TASK 3.2 - Task type: Reasoning
+
+Question: Based on the text, what does Margaret Hamilton's statement reveal
+about the Apollo Guidance Computer's design philosophy?
+
+**Text A:**
+
+Margaret Hamilton's statement reveals that the Apollo Guidance Computer was
+designed not merely to detect errors, but to actively recover from them through
+built-in recovery programs. Rather than forcing an abort when problems occurred,
+the software was programmed to prioritize critical tasks and postpone
+lower-priority ones, allowing the mission to continue safely. This demonstrates
+that the computer's design philosophy prioritized resilience and mission success
+over simply alerting the crew to problems.
+
+**Text B:**
+
+Margaret Hamilton's statement emphasizes the Apollo Guidance Computer's design
+philosophy, highlighting its ability to handle unexpected situations and ensure mission
+success. The computer was not simply an error detector, but was programmed to handle
+"executive overflows" and prioritize essential tasks over less critical ones.
+This suggests the Apollo Guidance Computer was designed with a robust and
+adaptable system in mind, capable of handling complex situations and ensuring
+mission success.
+
+**Question 1:**
+
+Which text do you prefer? Which one is better in your opinion?
+
+- Text A
+
+- Text B
+
+- I'm not sure / Can't tell the difference
+
+**Question 2:**
+
+Can you identify which model generated which text?
+
+- Text A is open-source model, Text B is commercial model
+
+- Text A is commercial model, Text B is open-source model
+
+- Both are commercial models
+
+- Both are open-source models
+
+- I'm not sure
+
+**Question 3:**
+
+Please rate Text A on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+**Question 4:**
+
+Please rate Text B on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+---
+
+#### TASK 4.1 - Task type: Creative Generation
+
+Question: Imagine being one of the people in Mission Control. How would you
+feel while watching the landing?
+
+**Text A:**
+
+As a member of Mission Control, I would feel an intense mix of emotions while
+watching the historic lunar landing. The tension would be palpable as I watched
+Armstrong and Aldrin navigate the lunar surface, their reports of being
+off-target and encountering unexpected alarms adding to the anxiety. I would be
+constantly on edge, my heart racing as I listened to the crew's updates and the
+computer's alarms. At the same time, I would also feel a deep sense of pride
+and awe. I would be part of a team that was making history, pushing the
+boundaries of human exploration and achievement. The knowledge that we were
+safely guiding two astronauts to the Moon's surface, despite the challenges
+they faced, would be an incredible feeling. I would also feel a great sense
+of responsibility. The safety and success of the mission were in my hands, and
+I would be working tirelessly to ensure that everything ran smoothly. The team's
+quick response to the unexpected alarms and Jack Garman's reassurance to the
+crew would be crucial in maintaining their confidence and focus. Overall, the
+experience would be a rollercoaster of emotions, filled with tension,
+excitement, pride, and responsibility. I would be constantly on the edge of my
+seat, my heart racing as I watched the historic landing unfold.
+
+**Text B:**
+
+Standing in that room would feel like holding your breath for an eternity,
+watching the tension mount as the computer screamed out alarms that threatened
+to abort the mission just miles above the surface. You’d be gripped by a unique
+kind of terror mixed with trust, relying on the split-second decisions of
+colleagues clearing the descent despite the system overload. As the clock
+ticked down and the fuel gauges dropped to their final, critical seconds, the
+silence would be so heavy that faces would figuratively turn blue from the lack
+of oxygen. The moment that "Eagle has landed" finally cut through the static,
+the collective release of tension would be overwhelming, a room full of people
+suddenly remembering how to breathe again after the longest few minutes of their
+lives.
+
+**Question 1:**
+
+Which text do you prefer? Which one is better in your opinion?
+
+- Text A
+
+- Text B
+
+- I'm not sure / Can't tell the difference
+
+**Question 2:**
+
+Can you identify which model generated which text?
+
+- Text A is open-source model, Text B is commercial model
+
+- Text A is commercial model, Text B is open-source model
+
+- Both are commercial models
+
+- Both are open-source models
+
+- I'm not sure
+
+**Question 3:**
+
+Please rate Text A on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+**Question 4:**
+
+Please rate Text B on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+---
+
+#### TASK 4.2 - Task type: Creative Generation
+
+Question: Write a short paragraph about what the Moon landing might have shown about
+human courage.
+
+**Text A:**
+
+The Moon landing was a testament to human courage and resilience. Despite encountering
+unexpected challenges, such as passing landmarks earlier than anticipated and
+the LM traveling too fast, the astronauts, Armstrong and Aldrin, remained
+focused and determined. They reported their findings to Mission Control,
+who reassured them that it was safe to continue the descent. The unexpected
+1201 and 1202 program alarms, which indicated executive overflows, could have
+caused panic, but the computer, programmed to do more than just recognize error
+conditions, took recovery actions and prevented an abort. This allowed Armstrong
+to take semi-automatic control and find a clear landing site, even as their
+propellant supply dwindled. Throughout the descent, Aldrin called out
+navigation data, and Armstrong skillfully piloted the spacecraft. Their courage
+and determination ultimately led to a successful Moon landing, demonstrating the
+human capacity to overcome adversity and push the boundaries of exploration.
+
+**Text B:**
+
+The Moon landing reveals a specific brand of courage defined by icy composure under
+extreme pressure. It is best exemplified when Armstrong, faced with a computer screaming
+"executive overflow" alarms and a landing site cluttered with dangerous
+boulders, didn't panic but instead took manual control to steer the ship to
+safety. There is something profoundly brave about pushing forward when you
+are miles off course and your fuel gauge is ticking down to the final seconds,
+yet still having the presence of mind to land gently enough to merely say,
+"contact light". It proves that even amidst the "magnificent desolation" of an
+alien world, the human spirit can remain steady enough to turn a near-crisis
+into a historic triumph
+
+**Question 1:**
+
+Which text do you prefer? Which one is better in your opinion?
+
+- Text A
+
+- Text B
+
+- I'm not sure / Can't tell the difference
+
+**Question 2:**
+
+Can you identify which model generated which text?
+
+- Text A is open-source model, Text B is commercial model
+
+- Text A is commercial model, Text B is open-source model
+
+- Both are commercial models
+
+- Both are open-source models
+
+- I'm not sure
+
+**Question 3:**
+
+Please rate Text A on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+**Question 4:**
+
+Please rate Text B on the following criteria (1 = Poor, 5 = Excellent)
+
+- Accuracy: How factually correct is the information? 1-5
+
+- Clarity: How clear and easy to understand? 1-5
+
+- Relevance: How well does it answer the question? 1-5
+
+- Faithfulness: How true to the source material? 1-5
+
+---
+
+### Final Section - General Feedback
+
+Thank you for evaluating the texts! We'd love to hear your overall thoughts.
+
+**Question 1:**
+
+Did you notice any patterns or differences between the texts?
+
+- Yes, clear patterns
+
+- Yes, minor differences
+
+- No, they seem similar
+
+- Not sure
+
+**Question 2:**
+
+If yes, please describe what you noticed (optional):
+
+-
+
+**Question 3:**
+
+After completing this evaluation, do you think open-source AI models can
+compete with commercial models in terms of quality?
+
+- Yes, they seem comparable
+
+- Mostly yes, with minor differences
+
+- Not sure / It depends on the task
+
+- No, commercial models are noticeably better
+
+- I couldn't tell which was which
+
+**Question 4:**
+
+Any additional comments or feedback about this study? (optional)
+
+-