mrlnlms/obsidian-qualia-coding

Qualia Coding

Mixed-methods qualitative data analysis, inside Obsidian. Code text, PDFs, images, spreadsheets, audio, video, and Parquet — one codebook, 20 built-in analytics, REFI-QDA round-trip, your vault stays your data.

For researchers, UX professionals, students, and anyone who'd rather not pay $600/year for a desktop CAQDAS that locks data inside a proprietary container.

Evaluating the project in depth? Start with docs/PROJECT-OVERVIEW.md, a tour organized in 3 layers (product, method, implementation) with a prescribed reading order for external reviews.

Why Qualia Coding

  • 6 formats, one codebook. Highlight a markdown paragraph, draw a region on a PDF, mark a 12-second video clip, and tag a CSV row — all with the same code. Cross-format analytics work out of the box.
  • 20 analytics views built-in. Frequency, co-occurrence, MCA, MDS 2D/3D, dendrogram, lag sequential, polar coordinates, chi-square, decision trees, relations network, word clouds. No competitor offers this natively.
  • Mixed methods, first-class. Case Variables (typed properties per file), magnitude coding (nominal / ordinal / continuous), and code groups (flat N:N tags) are designed in, not bolted on.
  • Vault = your data. Markers live in the same Obsidian vault as your sources. No proprietary container. Sync with iCloud, Git, Syncthing, whatever. Switch tools without "exporting your project."
  • REFI-QDA round-trip. Export QDPX to NVivo / ATLAS.ti / MAXQDA / Dedoose, and import their projects back into Qualia. Round-trip is verified, not just one-way export.
  • Parquet support, lazy mode. The only CAQDAS that opens columnar data files — and opens them at scale. 297 MB Parquet loads instantly via DuckDB-Wasm + OPFS streaming; filter, sort, and batch-code via SQL without loading the file into RAM. Export "enriched Parquet" with your codes joined as columns for downstream pipelines.
  • Free and open source. MIT licensed. No subscription, no seat license, no AI paywall.

Annotation engines

| Format | What you can do |
| --- | --- |
| Markdown | Highlight text spans in the editor, MAXQDA-style margin panel, drag-resize handles |
| PDF | Highlight text selections (cross-page), draw rectangles, polygons, freehand shapes |
| Image | Draw rectangular and polygonal regions on PNG, JPG, SVG, WebP, GIF |
| CSV / Parquet | Code individual cell text or entire rows in a spreadsheet grid (Parquet is read-only) |
| Audio | Time-bounded regions on a waveform (MP3, WAV, OGG, FLAC, M4A) |
| Video | Same as audio, with synchronized video playback (MP4, WebM) |

Toggle coding mode on/off per file (PDF / Image / Audio / Video) — read your sources without coding overhead, then enable when you want to annotate.

Codebook

Hierarchical when you want structure, flat tags when you don't.

  • Parent/child codes — unlimited nesting, drag-and-drop. Parent codes aggregate child counts (Braun & Clarke style)
  • Code Groups — flat N:N layer for cross-cutting dimensions (e.g. "Affective", "RQ1", "Wave 2"). One code can belong to many groups. Filters Analytics
  • Virtual folders — purely cosmetic organization, no analytical effect
  • Visibility toggles — hide a code globally, or only inside a specific file (without deleting)
  • Merge codes — combine N codes with audit trail
  • Smart Codes — saved queries that match markers dynamically. Predicates combine codes, case variables, magnitude, folders, groups, engine type, relations, marker text (substring search, optionally case-sensitive), and nested Smart Codes via AND/OR/NOT. First-class in Code Explorer, Analytics, and QDPX export
  • Virtual scrolling — scales to thousands of codes
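
The Smart Codes bullet above describes predicates combined with AND/OR/NOT and nested inside one another. A minimal sketch of that idea follows. The `Marker` shape and every helper name here are hypothetical illustrations, not the plugin's actual data model or evaluator.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Marker:
    # Hypothetical marker shape, for illustration only.
    codes: Set[str]
    engine: str
    text: str = ""


Predicate = Callable[[Marker], bool]


def has_code(name: str) -> Predicate:
    return lambda m: name in m.codes


def engine_is(engine: str) -> Predicate:
    return lambda m: m.engine == engine


def text_contains(needle: str, case_sensitive: bool = False) -> Predicate:
    def check(m: Marker) -> bool:
        if case_sensitive:
            return needle in m.text
        return needle.lower() in m.text.lower()
    return check


def all_of(*preds: Predicate) -> Predicate:   # AND
    return lambda m: all(p(m) for p in preds)


def any_of(*preds: Predicate) -> Predicate:   # OR
    return lambda m: any(p(m) for p in preds)


def negate(pred: Predicate) -> Predicate:     # NOT
    return lambda m: not pred(m)


def evaluate(markers: List[Marker], pred: Predicate) -> List[Marker]:
    """Return the markers matched by a saved query."""
    return [m for m in markers if pred(m)]
```

Because every combinator returns another predicate, a saved query can be reused inside a larger one, for example `all_of(login_pain, negate(engine_is("pdf")))`, which is one way to model nested Smart Codes.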

Mixed methods

  • Case Variables — typed per-file properties (age, gender, experimental condition). Auto-detected types (number / date / checkbox / text). Side panel + popover. Filters Analytics. Full QDPX round-trip
  • Magnitude coding — intensity / direction / evaluation per segment (nominal, ordinal, continuous). Closed picker prevents typos
  • Code relations — theoretical assertions between codes ("Frustration causes Abandonment") plus segment-anchored interpretations. Free-form labels with autocomplete
  • Memos — free-text notes on any marker
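
Case Variables are described above as having auto-detected types (number / date / checkbox / text). The sketch below shows one plausible way to infer such a type from raw string values. The detection rules (ISO dates, true/false for checkboxes, empty values ignored) are assumptions made for illustration; the plugin's real rules are not documented here.

```python
from datetime import date
from typing import List


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_iso_date(value: str) -> bool:
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def detect_type(values: List[str]) -> str:
    """Guess 'number', 'date', 'checkbox', or 'text' for one variable."""
    cleaned = [v.strip() for v in values if v.strip()]
    if not cleaned:
        return "text"  # nothing to infer from, fall back to free text
    if all(v.lower() in ("true", "false") for v in cleaned):
        return "checkbox"
    if all(_is_number(v) for v in cleaned):
        return "number"
    if all(_is_iso_date(v) for v in cleaned):
        return "date"
    return "text"
```

A single non-conforming value (such as "n/a" in an age column) makes the whole variable fall back to text in this sketch, which keeps the inference conservative.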

Multi-coder & Inter-coder reliability (ICR)

First-class support for teams coding the same corpus — and for individual researchers benchmarking their own consistency or comparing human vs. LLM coding.

  • Active coder picker in the status bar — every marker created stamps codedBy. Switch between humans (Default / Carla / Joana / …) without leaving the editor
  • Compare Coders view — matrix / table / heatmap of Cohen κ, Fleiss κ, Krippendorff α, α-binary and cu-α. Covers all 8 annotation engines (markdown, PDF text, PDF shape, image, CSV row, CSV segment, audio, video) via a parametric κ engine over per-engine overlap geometry. Web Worker keeps the UI fluid on large corpora
  • Drill-down: Spatial / Cards / Workflow — see lanes per coder over the source, contested regions with type (different codes / boundary disagreement / only 1 coder marked), and a 4-column reconciliation queue (Open / In discussion / Resolved / Divergence accepted) with full audit trail and revert
  • Reconciliation actions — Adopt code X · Split into new code · Accept divergence. Decisions become "consensus" markers (consensus:default coder) with κ pre/post available in modal
  • Saved Comparisons — recurring scopes (e.g., "Wave 2 - Carla vs Joana") persisted with ribbon + contextual shortcut on any code (View κ for this code between coders)
  • Cross-vault transport. The ICR: Export my contribution command writes a JSON payload of one coder's markers + codebook fingerprint; the ICR: Open import command ingests payloads from collaborators with cross-vault remap (matches sources by SHA-256, not just path), codebook divergence detection, manual file remap, source skip / trust-local overrides, dedup on re-apply, and a side-by-side / by-code chip for cherry-picking before merge
  • Provenance audit. sourceHashAtCoding is stamped at marker creation; reconciliation events are tracked in the audit log (reconciliation_opened / _decided / _reverted) and visible in the existing Codebook Timeline
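
The cross-vault remap above matches sources by SHA-256 rather than by path. A minimal sketch of that matching step is shown below. The payload shape (a mapping from remote path to hex digest) and the function names are hypothetical, not the plugin's actual JSON format.

```python
import hashlib
from typing import Dict, List, Tuple


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def remap_sources(
    payload_sources: Dict[str, str],   # remote path -> SHA-256 hex digest
    local_files: Dict[str, bytes],     # local path -> file content
) -> Tuple[Dict[str, str], List[str]]:
    """Match a collaborator's sources to local files by content hash.

    Returns (remote path -> local path, remote paths with no local match).
    """
    by_hash: Dict[str, str] = {}
    for path, content in local_files.items():
        # Keep the first local file seen for each digest.
        by_hash.setdefault(sha256_hex(content), path)

    mapping: Dict[str, str] = {}
    unmatched: List[str] = []
    for remote_path, digest in payload_sources.items():
        local_path = by_hash.get(digest)
        if local_path is None:
            unmatched.append(remote_path)
        else:
            mapping[remote_path] = local_path
    return mapping, unmatched
```

Files that were renamed or moved between vaults still match because only their content is compared; the unmatched list is where a manual file remap or a source skip would come in.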

The Tabular CSV zip carries the coder on every segment, plus a standalone coders.csv table and an Inter-coder reliability section in the README with R (irr::kappa2) and Python (sklearn.metrics.cohen_kappa_score) snippets, so pipelines that prefer external stats tooling work out of the box.
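
For readers who want to see what Cohen's κ measures before reaching for a library, here is a dependency-free sketch of the standard formula κ = (p_o − p_e) / (1 − p_e). It assumes two coders have each assigned exactly one label to the same ordered list of units; how the plugin derives those units from overlapping markers is not covered here, and this is not the plugin's own implementation.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders labelling the same units."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of units")

    n = len(labels_a)
    # Observed agreement: share of units where both coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)
```

With coder A = x, x, y, y and coder B = x, y, y, y, observed agreement is 0.75 and chance agreement is 0.5, so κ = 0.5.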

Analytics — 20 views

| Category | Views |
| --- | --- |
| Descriptive | Dashboard · Frequency · Co-occurrence matrix |
| Visual | Force-directed graph · Word cloud · Relations network |
| Exploratory | Document-code matrix · Source comparison · Code overlap |
| Multivariate | MCA biplot · MDS scatter (2D / 3D) · Dendrogram |
| Sequential | Evolution over time · Lag sequential · Polar coordinates |
| Inferential | Chi-square · Decision tree |
| Retrieval | Full-text search · Text statistics |

All views accept the same filters: sources, codes, minimum frequency, groups, case variables. Export any view as CSV.

Smart Codes appear alongside regular codes in Frequency, Co-occurrence, Evolution, Lag Sequential, Polar, Code × Metadata, and the Memo View — saved queries become first-class analytic objects.

The Relations network view shows code-level relations (solid edges) and segment-level relations (dashed edges) with thickness proportional to frequency.
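
As a rough illustration of what a co-occurrence matrix like the one listed among the Descriptive views counts, the sketch below tallies how often each pair of codes is applied to the same segment. Treating "co-occurrence" as "both codes on the same segment" is an assumption; the plugin may also count overlapping or nearby markers.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple


def cooccurrence(segments: Iterable[Set[str]]) -> Counter:
    """Count unordered code pairs that appear together on a segment.

    Each element of `segments` is the set of code names applied to one segment.
    Keys are (code_a, code_b) tuples in alphabetical order.
    """
    pairs: Counter = Counter()
    for codes in segments:
        for pair in combinations(sorted(codes), 2):
            pairs[pair] += 1
    return pairs
```

Sorting each segment's codes before pairing keeps (A, B) and (B, A) in the same cell, so the result can be rendered as a symmetric matrix.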

Research Board

A freeform canvas for synthesis:

  • Sticky notes · code cards (with live statistics) · excerpt nodes pulled from any marker · KPI cards · arrow connections · freehand drawing · cluster frames
  • Export to SVG (vector — for papers and slides) or PNG (retina — for web and decks)

Interoperability

  • Export QDPX — full project: sources + segments + memos + case variables + groups. Compatible with ATLAS.ti, NVivo, MAXQDA, Dedoose, and any REFI-QDA tool
  • Export QDC — codebook only (hierarchy, colors, descriptions)
  • Import QDPX / QDC — bring projects from other QDA tools. Source files extracted to the vault, codes and segments mapped to Qualia engines
  • Tabular CSV zip — relational flat files (segments, code_applications, codes, case_variables, relations, groups, smart_codes, coders) with an embedded README and R/tidyverse + Python/pandas snippets — including a full Cohen κ recipe. For when you want to run stats outside the plugin
  • Enriched Parquet — export the active Parquet (or CSV) with your codes and comments joined as <col>__codes_frow / <col>__codes_seg / <col>__comment columns. Single-file output, ready for pandas / Polars / DuckDB pipelines
  • Per-view CSV — every Analytics view exports its own table
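
The Enriched Parquet export above adds columns named `<col>__codes_frow`, `<col>__codes_seg`, and `<col>__comment`. In a downstream pipeline it can help to group those back by their base column. The helper below is a hypothetical sketch that works on a plain list of column names (for instance the column list of a DataFrame loaded from the exported file); it relies only on the suffixes named above and assumes nothing about the cell contents.

```python
from typing import Dict, Iterable

# Suffixes taken from the export description above.
SUFFIXES = ("__codes_frow", "__codes_seg", "__comment")


def group_enriched_columns(columns: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map each base column to the enrichment columns found for it."""
    groups: Dict[str, Dict[str, str]] = {}
    for name in columns:
        for suffix in SUFFIXES:
            # Require a non-empty base name in front of the suffix.
            if name.endswith(suffix) and len(name) > len(suffix):
                base = name[: -len(suffix)]
                kind = suffix[2:]  # drop the leading double underscore
                groups.setdefault(base, {})[kind] = name
                break
    return groups
```

Columns without one of the three suffixes (the original data columns) are simply left out of the result.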

Under the hood

A few technical choices worth knowing about:

  • 100% local. No telemetry, no cloud calls, no API keys required. Works fully offline. Your data never leaves the vault
  • REFI-QDA 1.0 spec compliant. QDPX export uses an xmlns:qualia extension namespace to preserve Qualia-specific metadata (custom colors, group descriptions) without breaking other tools' parsers
  • CodeMirror 6 native. Markdown highlights are real CodeMirror decorations, not DOM overlays. The margin panel is a custom ViewPlugin with column-resolved label layout — same UX as MAXQDA's
  • Per-engine viewers, no shared state. PDF (pdf.js), Image (Fabric.js), Audio/Video (WaveSurfer.js), CSV/Parquet (AG Grid Community + hyparquet WASM eager / DuckDB-Wasm + OPFS lazy above 50 MB). Each engine is self-contained — adding a format doesn't touch the others
  • Incremental analytics cache. Dirty flags per engine; analytics modes recompute only the affected slice. Stays fast on large projects
  • 3,600+ unit tests (Vitest + jsdom) covering pure helpers, engine models, registry CRUD, REFI-QDA round-trip, tabular export, Smart Codes evaluator/cache, lazy Parquet pipeline, the ICR κ engine + reconciliation + transport, source size providers (PDF + CSV segment + media), and analytics consolidators
  • Two inline Web Workers keep UI fluid on heavy compute: kappa.worker (5 ICR coefficients off-main-thread for any cohort size) and cluster.worker (hierarchicalCluster for cooccurrence sort, overlap sort, dendrogram and files-dendrogram views — no UI freeze on 100+ codes)
  • TypeScript strict end-to-end, with ambient types for Obsidian internals where needed
  • No build-time secrets, no runtime servers. The entire plugin is the three files in your .obsidian/plugins/qualia-coding/ folder
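
The incremental analytics cache bullet above mentions per-engine dirty flags with recomputation limited to the affected slice. The sketch below illustrates that pattern in its simplest form. The class and method names are hypothetical and the real cache is written in TypeScript, so read this as a picture of the idea rather than the plugin's code.

```python
from typing import Callable, Dict, Iterable, Set


class AnalyticsCache:
    """Recompute per-engine results only for engines marked dirty."""

    def __init__(self, engines: Iterable[str], compute: Callable[[str], object]):
        self._compute = compute            # compute(engine) -> that engine's slice
        self._results: Dict[str, object] = {}
        self._dirty: Set[str] = set(engines)  # everything is stale at start

    def mark_dirty(self, engine: str) -> None:
        """Call when a marker in this engine is added, edited, or removed."""
        self._dirty.add(engine)

    def get(self) -> Dict[str, object]:
        """Return all slices, recomputing only the stale ones."""
        for engine in sorted(self._dirty):
            self._results[engine] = self._compute(engine)
        self._dirty.clear()
        return dict(self._results)
```

Editing a PDF marker would call `mark_dirty("pdf")`; the next `get()` then recomputes the PDF slice and reuses the cached results for every other engine.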

Installation

Qualia Coding is pre-alpha (0.x) — distributed via BRAT for testing with selected researchers. Submission to the Obsidian Community Plugins directory is planned.

Via BRAT (recommended while pre-alpha)

  1. Install BRAT from Community Plugins
  2. BRAT settings → Add Beta Plugin → enter mrlnlms/obsidian-qualia-coding
  3. Enable Qualia Coding in Community Plugins

Manual

  1. Download main.js, manifest.json, styles.css from the latest release
  2. Place inside your-vault/.obsidian/plugins/qualia-coding/
  3. Enable in Settings → Community plugins

From Community Plugins (after approval)

Once approved, Settings → Community plugins → Browse → Qualia Coding → Install → Enable.

Desktop only. Requires Obsidian 1.5.0+.

Usage

Coding text — Select text in any Markdown file. The coding menu appears — type a code name or pick an existing one. Toggle codes on/off, add a memo, set magnitude, declare relations.

Coding other formats — Open a PDF, image, CSV, audio, or video file. The plugin opens it in the coding view (toggleable per file). Select regions and assign codes the same way.

Quick Code: Cmd+Shift+C opens a fuzzy search modal to apply codes without a mouse.

Codebook Panel — Sidebar. Drag codes to create hierarchies, right-click for rename / merge / delete / move-to-folder. Toggle merge mode in the toolbar.

Code Explorer — Sidebar tree of every code across every file. Search, filter, click to open detail.

Case Variables — Open from sidebar or command palette. Add typed properties per file. Filter Analytics by these.

Analytics — Command palette → Open Analytics. Pick from 20+ modes. All filters apply globally.

Research Board — Command palette → Open Research Board. Drag excerpts from sidebar onto the canvas.

Export / Import — Command palette: Export project (QDPX), Export codebook (QDC), Export codes as tabular data, Import project (QDPX), Import codebook (QDC). Also accessible from the Analytics toolbar.

Settings

| Setting | Description |
| --- | --- |
| Default color | Initial highlight color for new codes |
| Marker opacity | Transparency of highlights (0–1) |
| Show handles on hover | Display drag handles on marker edges |
| Show menu on selection | Auto-show coding menu on text selection |
| Show menu on right-click | Show coding menu on right-click |
| Show ribbon button | Display the Qualia Coding icon in the ribbon |
| Show magnitude in popover | Display magnitude picker in the coding popover |
| Show relations in popover | Display relations section in the coding popover |
| Open toggle in a new tab | Open coding view in a new tab when toggling on |
| Auto-open coding view | Per-engine toggle: enable PDF / Image / Audio / Video coding view by default |
| Parquet size warning (MB) | Show a banner before loading large Parquet files |
| CSV size warning (MB) | Show a banner before loading large CSV files |

In development

  • LLM-assisted coding + ICR Layer 2 (Bayesian Hierarchical Annotation Model): the next practical front, paired by design. 40 tools and 5 patterns surveyed (docs/_study/llm-coding/); the methodology framework is set out in docs/ICR-MULTIMODAL-METHODOLOGY.md. The framing: heterogeneity of modality (cross-engine aggregation) and heterogeneity of coder (humans vs LLMs) are the same structural problem, namely facets in measurement design. The plugin positions itself as a rigorous evaluation bench for LLMs as coders in multimodal QDA, a category that doesn't exist on the market. A dedicated brainstorm precedes the first spec.

See docs/ROADMAP.md for the full feature roadmap.

Documentation

| Doc | Question it answers |
| --- | --- |
| docs/ARCHITECTURE.md | Why is it built this way? |
| docs/ROADMAP.md | What's planned next? (includes market research gaps) |
| docs/TECHNICAL-PATTERNS.md | How do I fix this weird bug? |
| docs/DEVELOPMENT.md | How do I contribute, port, or test? |
| docs/BACKLOG.md | What technical debt is open? |

License

MIT

About

A local-first CAQDAS for Obsidian: mixed-methods analysis with multi-modal qualitative coding, case variables, built-in analytics, and REFI-QDA interop (ATLAS.ti / NVivo / MAXQDA).
