Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 50 additions & 0 deletions 4_findings/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
# Findings Folder

A cohesive overview of the study “Open-Source vs. Commercial AI: Comparing Performance
and Quality,” including the finalized report, raw responses, and key links.

## Overview

- Experiment dates: November 21–December 3, 2025
- Sample size: 42 responses
- Design: Blinded, side-by-side comparison of open-source vs. commercial AI models
across eight Apollo‑11–themed tasks (summarization, paraphrasing, reasoning,
creative writing)

## Contents

- `README.md`
Folder guide with overview, quick findings, and links
- `findings_report.md`
Final findings report (Markdown) with all charts embedded
- `responses.csv`
Raw Google Forms responses (CSV exported from the live Google Sheet)

## Quick findings

- Summarization and reasoning: commercial models generally rate higher, but margins
are modest.
- Paraphrasing and creative writing: preferences are more balanced; model identity
is often hard to distinguish.
- Uncertainty matters: “not sure / can’t tell” responses are informative and
suggest convergence in perceived quality under blind conditions.
- Identification difficulty: many participants struggled to correctly label outputs
as open-source or commercial, reinforcing that style and quality can overlap
depending on task and prompt.

## Data source (Google Sheets)

- Primary data source (cleaned headers, all responses):
[Google Sheets](https://docs.google.com/spreadsheets/d/1Bm4geFzEUw9qNFrUuG4MAFSxkr_JFJY8HObnniywYFY/edit?usp=sharing)

## Report (PDF, external)

- Shareable PDF version of the findings report:
[Findings Report — PDF](https://drive.google.com/file/d/1GsjNYVLDgeXjm95937c2C-epp8M82Ytc/view?usp=sharing)

## Notes

- Charts are embedded directly in `findings_report.md`.
- `responses.csv` mirrors the Google Sheet at the time of export.
- The study focuses on output-level evaluation under blind conditions; uncertainty
and task dependence are treated as meaningful signals in interpreting quality.
Loading
Loading