docs: define sensitivity and protection method by asteier2026 · Pull Request #151 · NVIDIA-NeMo/Anonymizer

asteier2026 · 2026-05-11T16:02:21Z

Changes include:

Documentation for how sensitivity and protection method are assigned.

greptile-apps · 2026-05-11T16:04:32Z

Greptile Summary

This PR adds a "Key concepts" section to docs/concepts/rewrite.md that defines how Sensitivity and Protection method are assigned during the rewrite pipeline.

Sensitivity is introduced as a measure of intrinsic re-identification risk for an entity, feeding the downstream leakage scoring system — but the definition omits the discrete levels (high, medium, low) that the rest of the document already references in the leakage mass formula and output columns table.
Protection method covers five transformation strategies (replace, generalize, `suppress_inference`, leave as-is, remove), presented as a dense single paragraph that could be harder to scan than a bulleted list.

Confidence Score: 4/5

Documentation-only change with no functional impact; safe to merge after addressing the missing sensitivity level details.

The added section is accurate and coherent, but the Sensitivity definition doesn't name the discrete levels (high/medium/low) that already appear in the leakage mass formula and output columns table — a reader following the document top-to-bottom will encounter those terms without having been introduced to them.

docs/concepts/rewrite.md — specifically the Sensitivity definition and the dense Protection method paragraph.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/concepts/rewrite.md | Adds a "Key concepts" section defining Sensitivity and Protection method; the Sensitivity definition is missing the discrete levels (high/medium/low) used elsewhere in the document, and the Protection method paragraph is dense enough to benefit from a list format. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Entity detected] --> B{Sensitivity assigned\nhigh / medium / low}
    B --> C{Requires protection?}
    C -- No --> D[Left as-is]
    C -- Yes --> E{Entity type}
    E -- Direct identifier --> F[Replace with synthetic alternative]
    E -- Quasi-identifier --> G[Generalize to broader form\ne.g. date → quarter, city → region]
    E -- Latent entity --> H[suppress_inference:\nrewrite surrounding text]
    E -- Cannot preserve meaning --> I[Remove outright]
    F & G & H & I --> J[Leakage scoring system]
    B --> J

Reviews (1): Last reviewed commit: "feature: add docs for sensitivity and pr..."

greptile-apps · 2026-05-11T16:04:36Z


+## Key concepts
+
+**Sensitivity** measures the intrinsic re-identification damage an entity causes if it appears in the output — independently of what else is retained. It is not the protection decision; it feeds the downstream leakage scoring system.


Sensitivity definition omits discrete levels used throughout the document

The Sensitivity definition describes the concept in general terms but doesn't mention the concrete levels (high, medium, low) that the rest of the document already relies on. The leakage mass formula further down explicitly references high=1.0, medium=0.6, low=0.3, and the Output columns table references any_high_leaked. A reader encountering those terms after only having read this definition will not know what discrete values exist or how they are distinguished.
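To make the gap concrete, here is a minimal sketch of how the discrete levels could feed a score. Only the three weights and the `any_high_leaked` column name come from the comment above; the function names and the plain summation are assumptions for illustration and are not the formula documented in `rewrite.md`.

```python
# Weights quoted in the review comment; everything else here is illustrative.
SENSITIVITY_WEIGHTS = {"high": 1.0, "medium": 0.6, "low": 0.3}


def leakage_mass(leaked_levels):
    """Sum the weights of leaked entities.

    ASSUMPTION: plain summation stands in for the real leakage mass
    formula, which is not reproduced in this review.
    """
    # An unknown level raises KeyError, which surfaces typos early.
    return sum(SENSITIVITY_WEIGHTS[level] for level in leaked_levels)


def any_high_leaked(leaked_levels):
    """Hypothetical helper mirroring the `any_high_leaked` output column."""
    return any(level == "high" for level in leaked_levels)
```

A reader who has seen the levels defined up front can follow a call such as `leakage_mass(["high", "low"])` without guessing what the labels mean.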

greptile-apps · 2026-05-11T16:04:37Z

+
+**Sensitivity** measures the intrinsic re-identification damage an entity causes if it appears in the output — independently of what else is retained. It is not the protection decision; it feeds the downstream leakage scoring system.
+
+**Protection method** describes how a sensitive entity is transformed. The choice reflects a holistic view of the document — what other entities are being protected and how shapes what each individual entity needs. The general defaults are: direct identifiers are replaced with plausible synthetic alternatives, quasi-identifiers are generalized to a broader form (e.g., an exact date becomes a quarter, a city becomes a region), and latent entities receive `suppress_inference`, meaning the surrounding text is rewritten to remove the cues that enable the inference rather than replacing a stated value. Entities that do not require protection are left as-is. Occasionally an entity is removed outright when neither replacement nor generalization can preserve meaning without retaining the identifying detail.


The "Protection method" paragraph bundles five distinct transformation rules into a single long run-on sentence. Splitting the default behaviors into a bullet list makes each rule scannable and easier to compare at a glance.

Suggested change

**Protection method** describes how a sensitive entity is transformed. The choice reflects a holistic view of the document — what other entities are being protected and how shapes what each individual entity needs. The general defaults are:

- **Direct identifiers** are replaced with plausible synthetic alternatives.

- **Quasi-identifiers** are generalized to a broader form (e.g., an exact date becomes a quarter, a city becomes a region).

- **Latent entities** receive `suppress_inference`: the surrounding text is rewritten to remove the cues that enable the inference rather than replacing a stated value.

- **Low-risk entities** that do not require protection are left as-is.

- Occasionally an entity is **removed outright** when neither replacement nor generalization can preserve meaning without retaining the identifying detail.
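The defaults in the list above can be read as a small lookup. The sketch below is hypothetical: the enum names, the `default_protection` function, and its two boolean parameters are invented for illustration and are not part of the Anonymizer API. It also ignores the holistic, document-level view that the paragraph says shapes the real choice.

```python
from enum import Enum


class EntityKind(Enum):
    # Hypothetical labels for the three entity classes named in the docs.
    DIRECT_IDENTIFIER = "direct_identifier"
    QUASI_IDENTIFIER = "quasi_identifier"
    LATENT = "latent"


class ProtectionMethod(Enum):
    REPLACE = "replace"
    GENERALIZE = "generalize"
    SUPPRESS_INFERENCE = "suppress_inference"
    LEAVE_AS_IS = "leave_as_is"
    REMOVE = "remove"


# General defaults per entity class, as described in the list above.
_DEFAULTS = {
    EntityKind.DIRECT_IDENTIFIER: ProtectionMethod.REPLACE,
    EntityKind.QUASI_IDENTIFIER: ProtectionMethod.GENERALIZE,
    EntityKind.LATENT: ProtectionMethod.SUPPRESS_INFERENCE,
}


def default_protection(kind, requires_protection=True, can_transform_safely=True):
    """Return the default method for one entity (illustrative only)."""
    if not requires_protection:
        return ProtectionMethod.LEAVE_AS_IS
    if not can_transform_safely:
        # Neither replacement nor generalization preserves meaning
        # without retaining the identifying detail.
        return ProtectionMethod.REMOVE
    return _DEFAULTS[kind]
```

For example, `default_protection(EntityKind.QUASI_IDENTIFIER)` yields `ProtectionMethod.GENERALIZE`, matching the date-to-quarter case in the list.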

feature: add docs for sensitivity and protection method

861e645

asteier2026 requested a review from a team as a code owner May 11, 2026 16:02

greptile-apps Bot reviewed May 11, 2026

asteier2026 wants to merge 1 commit into main from asteier2026/docs/sensitivity


