
docs: define sensitivity and protection method #151

Open
asteier2026 wants to merge 1 commit into main from asteier2026/docs/sensitivity

@asteier2026
Contributor

Changes include:

  • Documentation for how sensitivity and protection method are assigned.

@asteier2026 asteier2026 requested a review from a team as a code owner May 11, 2026 16:02
@greptile-apps
Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR adds a "Key concepts" section to docs/concepts/rewrite.md that defines how Sensitivity and Protection method are assigned during the rewrite pipeline.

  • Sensitivity is introduced as a measure of intrinsic re-identification risk for an entity, feeding the downstream leakage scoring system — but the definition omits the discrete levels (high, medium, low) that the rest of the document already references in the leakage mass formula and output columns table.
  • Protection method covers five transformation strategies (replace, generalize, `suppress_inference`, leave as-is, remove), presented as a dense single paragraph that could be harder to scan than a bulleted list.

Confidence Score: 4/5

Documentation-only change with no functional impact; safe to merge after addressing the missing sensitivity level details.

The added section is accurate and coherent, but the Sensitivity definition doesn't name the discrete levels (high/medium/low) that already appear in the leakage mass formula and output columns table — a reader following the document top-to-bottom will encounter those terms without having been introduced to them.

docs/concepts/rewrite.md — specifically the Sensitivity definition and the dense Protection method paragraph.

Important Files Changed

| Filename | Overview |
|---|---|
| docs/concepts/rewrite.md | Adds a "Key concepts" section defining Sensitivity and Protection method; Sensitivity definition is missing the discrete levels (high/medium/low) used elsewhere in the document, and the Protection method paragraph is dense enough to benefit from a list format. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Entity detected] --> B{Sensitivity assigned\nhigh / medium / low}
    B --> C{Requires protection?}
    C -- No --> D[Left as-is]
    C -- Yes --> E{Entity type}
    E -- Direct identifier --> F[Replace with synthetic alternative]
    E -- Quasi-identifier --> G[Generalize to broader form\ne.g. date → quarter, city → region]
    E -- Latent entity --> H[suppress_inference:\nrewrite surrounding text]
    E -- Cannot preserve meaning --> I[Remove outright]
    F & G & H & I --> J[Leakage scoring system]
    B --> J
```
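The decision path in the flowchart can be sketched in Python as below. This is a minimal, hypothetical illustration: the `EntityType` enum, the `choose_protection_method` helper, and every method label except `suppress_inference` are names invented for this sketch, not identifiers from the project. It encodes only the per-entity defaults; the holistic, document-level adjustment the docs describe is not modeled. Sensitivity is deliberately not an input, since it feeds leakage scoring rather than the protection decision.

```python
from enum import Enum


class EntityType(Enum):
    # Hypothetical categories mirroring the flowchart's "Entity type" branch.
    DIRECT_IDENTIFIER = "direct_identifier"
    QUASI_IDENTIFIER = "quasi_identifier"
    LATENT = "latent"


def choose_protection_method(
    entity_type: EntityType,
    requires_protection: bool,
    can_preserve_meaning: bool = True,
) -> str:
    """Return a default protection method label (hypothetical helper).

    Only "suppress_inference" is a name taken from the docs; the other
    labels are illustrative. Sensitivity is intentionally not a parameter:
    it feeds leakage scoring, not this decision.
    """
    if not requires_protection:
        return "leave_as_is"
    if not can_preserve_meaning:
        # Neither replacement nor generalization keeps the meaning
        # without retaining the identifying detail.
        return "remove"
    if entity_type is EntityType.DIRECT_IDENTIFIER:
        return "replace"  # plausible synthetic alternative
    if entity_type is EntityType.QUASI_IDENTIFIER:
        return "generalize"  # e.g. date -> quarter, city -> region
    return "suppress_inference"  # latent entity: rewrite the surrounding cues
```

For example, `choose_protection_method(EntityType.QUASI_IDENTIFIER, requires_protection=True)` returns `"generalize"`. Checking `can_preserve_meaning` before the type-based defaults lets outright removal override them, which matches the flowchart treating "Cannot preserve meaning" as its own exit.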

Reviews (1): Last reviewed commit: "feature: add docs for sensitivity and pr..."

Comment thread docs/concepts/rewrite.md

## Key concepts

**Sensitivity** measures the intrinsic re-identification damage an entity causes if it appears in the output — independently of what else is retained. It is not the protection decision; it feeds the downstream leakage scoring system.

P2 Sensitivity definition omits discrete levels used throughout the document

The Sensitivity definition describes the concept in general terms but doesn't mention the concrete levels (high, medium, low) that the rest of the document already relies on. The leakage mass formula further down explicitly references `high=1.0`, `medium=0.6`, `low=0.3`, and the Output columns table references `any_high_leaked`. A reader who meets those terms after reading only this definition will not know which discrete values exist or how they are distinguished.
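A minimal sketch of how the three levels might map onto the weights cited above. The formula itself is not reproduced in this thread, so treating leakage mass as a plain sum of per-entity weights is an assumption, as are the helper names `leakage_mass` and `any_high_leaked` (the latter only mirrors the output column's name).

```python
# Weights cited in the review thread for the leakage mass formula.
SENSITIVITY_WEIGHTS = {"high": 1.0, "medium": 0.6, "low": 0.3}


def leakage_mass(leaked_levels: list[str]) -> float:
    """Sum the weights of leaked entities (hypothetical helper).

    Assumption: the documented formula is a plain sum; the real one may
    normalize or combine terms differently.
    """
    return sum((SENSITIVITY_WEIGHTS[level] for level in leaked_levels), 0.0)


def any_high_leaked(leaked_levels: list[str]) -> bool:
    """Hypothetical helper mirroring the any_high_leaked output column."""
    return "high" in leaked_levels
```

Naming the levels in the Sensitivity definition would let a reader connect each weight in this kind of calculation back to a defined term.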

Comment thread docs/concepts/rewrite.md

**Sensitivity** measures the intrinsic re-identification damage an entity causes if it appears in the output — independently of what else is retained. It is not the protection decision; it feeds the downstream leakage scoring system.

**Protection method** describes how a sensitive entity is transformed. The choice reflects a holistic view of the document: what other entities are being protected, and how, shapes what each individual entity needs. The general defaults are: direct identifiers are replaced with plausible synthetic alternatives, quasi-identifiers are generalized to a broader form (e.g., an exact date becomes a quarter, a city becomes a region), and latent entities receive `suppress_inference`, meaning the surrounding text is rewritten to remove the cues that enable the inference rather than replacing a stated value. Entities that do not require protection are left as-is. Occasionally an entity is removed outright when neither replacement nor generalization can preserve meaning without retaining the identifying detail.

P2 The "Protection method" paragraph packs five distinct transformation rules into a few long sentences. Splitting the default behaviors into a bulleted list makes each rule scannable and easier to compare at a glance.

Suggested change
**Protection method** describes how a sensitive entity is transformed. The choice reflects a holistic view of the document: what other entities are being protected, and how, shapes what each individual entity needs. The general defaults are:
- **Direct identifiers** are replaced with plausible synthetic alternatives.
- **Quasi-identifiers** are generalized to a broader form (e.g., an exact date becomes a quarter, a city becomes a region).
- **Latent entities** receive `suppress_inference`: the surrounding text is rewritten to remove the cues that enable the inference rather than replacing a stated value.
- **Low-risk entities** that do not require protection are left as-is.
- Occasionally an entity is **removed outright** when neither replacement nor generalization can preserve meaning without retaining the identifying detail.
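The quasi-identifier default can be made concrete with a small generalization sketch. The functions and the `CITY_TO_REGION` table below are hypothetical; a real pipeline would need an authoritative geographic source, and the fallback label is an arbitrary placeholder.

```python
from datetime import date

# Hypothetical lookup table; a real pipeline needs its own geographic source.
CITY_TO_REGION = {"Seattle": "Pacific Northwest", "Tacoma": "Pacific Northwest"}


def generalize_date(value: date) -> str:
    """Coarsen an exact date to its calendar quarter, e.g. 2024-05-17 -> '2024-Q2'."""
    quarter = (value.month - 1) // 3 + 1
    return f"{value.year}-Q{quarter}"


def generalize_city(city: str) -> str:
    """Map a city to a broader region, falling back to a placeholder label."""
    return CITY_TO_REGION.get(city, "[region]")
```

Both functions discard precision rather than substituting a new value, which is what separates generalization from the synthetic replacement used for direct identifiers.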

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
