# AI Model Comparison Experiment

## Evaluating Open-Source vs. Commercial Language Models

This folder contains the materials for our experiment comparing open-source and
commercial AI models through human evaluation. Participants were asked to read
pairs of AI-generated texts and judge their quality without knowing which model
produced which text.

---

## What This Experiment Is

We created a survey where each question includes two texts, **Text A** and
**Text B**, generated by different AI models. One text always comes from an
**open-source model**, and the other from a **commercial model**. Participants:

* Choose which text they prefer
* Guess which model type generated each text
* Rate both texts (accuracy, clarity, relevance, faithfulness)

All evaluations are blind to remove brand bias.
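
As a minimal sketch of how such paired, blind judgments could be aggregated, the snippet below computes a preference rate and an identification-accuracy rate. The record format and field names (`preferred`, `guess_a`, `source_a`, `source_b`) are illustrative assumptions, not the study's actual data schema:

```python
# Hypothetical response records: which text was preferred, what the
# participant guessed Text A's model type to be, and which model type
# actually produced each text. Field names are assumed for illustration.
responses = [
    {"preferred": "A", "guess_a": "open", "source_a": "open", "source_b": "commercial"},
    {"preferred": "B", "guess_a": "commercial", "source_a": "open", "source_b": "commercial"},
    {"preferred": "A", "guess_a": "open", "source_a": "commercial", "source_b": "open"},
]

def open_source_preference_rate(rows):
    """Fraction of judgments in which the open-source text was preferred."""
    preferred_open = sum(
        1 for r in rows
        if r["source_a" if r["preferred"] == "A" else "source_b"] == "open"
    )
    return preferred_open / len(rows)

def identification_accuracy(rows):
    """Fraction of judgments in which Text A's model type was guessed correctly."""
    correct = sum(1 for r in rows if r["guess_a"] == r["source_a"])
    return correct / len(rows)
```

With the three sample records above, both rates come out to 1/3: the open-source text was preferred once, and Text A's source was guessed correctly once.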

---

## Why We Did This

Open-source AI models are advancing quickly, and we wanted to understand
whether they are perceived as competitive alternatives to commercial systems.
While benchmarks can measure performance numerically, they don’t reflect how
humans actually experience AI-generated writing.

This experiment aims to answer questions like:

* Do people notice a consistent quality difference?
* Can users accurately identify commercial vs. open-source output?
* Are open-source models “good enough” for real-world tasks?

Understanding these perceptions is important for evaluating the viability of
sustainable, accessible, and transparent AI systems.

---

## Why We Chose This Method

We used a **paired, blind comparison** because it provides a clean way to
assess text quality without model reputation influencing the results.
Participants judge writing on its own merits, which helps us collect more
reliable data.

We included four task types (summarization, paraphrasing, reasoning, and
creative writing) because each tests a different aspect of model behavior.
This variety gives us a broader picture of model strengths and weaknesses.

---

## Why This Approach Works Well

This survey-based structure is simple and easy for participants to
understand. It mirrors how people naturally interact with AI systems: reading
text and forming opinions about quality. By keeping the evaluation blind, we
minimize bias and generate more meaningful insights into real user perception.

The method also helps determine whether open-source models, especially optimized
ones, can realistically serve as alternatives to commercial systems in
practical use.

---

## Contents of This Folder

```text
3_experiment/
├── survey_form.md   # The form text used in the study
└── README.md        # Explanation of the experiment (this file)
```