Cosmian KMS exposes a set of data anonymization methods through a dedicated REST endpoint. These methods are stateless, require no cryptographic key, and are available under /tokenize/{method} on any KMS instance built with the non-fips feature.
All endpoints accept a JSON body and return a JSON object with a single result field. Errors come back as HTTP 422 with { "code": 422, "message": "..." }.
The ckms tokenize CLI command wraps each of these endpoints if you prefer working from the command line.
[TOC]
POST /tokenize/hash
Produces an irreversible, base64-encoded hash of a string. Three algorithms are available: SHA2 (SHA-256), SHA3 (SHA3-256), and Argon2. A salt can be passed as a base64-encoded byte string; for Argon2 it is required.
{
"data": "test sha2",
"method": "SHA2"
}Response:
{ "result": "Px0txVYqBePXWF5K4xFn0Pa2mhnYA/jfsLtpIF70vJ8=" }Use Argon2 when the input has low entropy (passwords, short identifiers) and you need resistance to brute-force lookup. Use SHA2 or SHA3 for high-entropy values or when throughput matters.
POST /tokenize/noise
Adds random noise to a numeric or date value. Three distributions are supported: Gaussian, Laplace, and Uniform. You can parameterize the distribution either by mean and standard deviation, or by lower and upper bounds.
The data_type field controls what kind of input to expect: float, integer, or date. Dates must be in RFC 3339 format (2023-04-07T12:34:56Z); the timezone is preserved in the output.
{
"data": 42000.0,
"data_type": "float",
"method": "Laplace",
"mean": 0.0,
"std_dev": 500.0
}{
"data": "2023-04-07T12:34:56+02:00",
"data_type": "date",
"method": "Uniform",
"min_bound": -3600.0,
"max_bound": 3600.0
}Laplace noise is the standard choice for differential privacy budgets. Uniform noise is appropriate when you need a hard bound on the perturbation. Gaussian is a reasonable default for datasets where outliers are acceptable.
POST /tokenize/word-mask
Replaces every occurrence of a listed word with XXXX, case-insensitively, matching only on word boundaries.
{
"data": "Confidential: contains -secret- documents",
"words": ["confidential", "secret"]
}{ "result": "XXXX: contains -XXXX- documents" }Non-word characters surrounding a target word (dashes, punctuation, spaces) are preserved.
POST /tokenize/word-tokenize
Replaces listed words with random 16-byte hex tokens. All occurrences of the same word receive the same token within a single request, so joins across a single document remain consistent. Across separate requests, the tokens are different.
{
"data": "confidential meeting with confidential sources",
"words": ["confidential"]
}The output might be:
{ "result": "3A9F1C8B2E7D4A60 meeting with 3A9F1C8B2E7D4A60 sources" }POST /tokenize/word-pattern-mask
Replaces all substrings matching a regular expression with a fixed replacement string. The pattern is limited to 1024 characters.
{
"data": "Call me at +33 6 12 34 56 78 or +1-555-123-4567",
"pattern": "\\+[\\d\\s\\-]+",
"replace": "[PHONE]"
}{ "result": "Call me at [PHONE] or [PHONE]" }POST /tokenize/aggregate-number
Rounds a number to the nearest power of ten. power_of_ten: 2 rounds to the nearest hundred; power_of_ten: -1 keeps one decimal place.
{
"data": 1234567,
"data_type": "integer",
"power_of_ten": 3
}{ "result": "1235000" }data_type is either float or integer.
POST /tokenize/aggregate-date
Truncates a date to the specified precision. The time_unit field accepts Second, Minute, Hour, Day, Month, or Year. The timezone offset is preserved.
{
"data": "2023-04-07T12:34:56+02:00",
"time_unit": "Month"
}{ "result": "2023-04-01T00:00:00+02:00" }POST /tokenize/scale-number
Applies a z-score normalization followed by a linear transformation. Given the mean and standard deviation of the original distribution, each value is standardized to zero mean and unit variance, then multiplied by scale and shifted by translate.
{
"data": 75.0,
"data_type": "float",
"mean": 50.0,
"std_deviation": 15.0,
"scale": 10.0,
"translate": 100.0
}{ "result": 116.6666666666667 }data_type is either float or integer. For integers the result is rounded to the nearest integer.
The methods above are one-way or statistically approximate transformations. They do not require a key and cannot be reversed to recover the original value.
For reversible, key-based field-level encryption that preserves the format of the original value (digit strings stay digit strings, alphanumeric stays alphanumeric), see Format-Preserving Encryption (FPE FF1). FPE is accessed through the KMIP Encrypt and Decrypt operations, not through the /tokenize endpoint.
| Property | Anonymization (/tokenize) |
FPE (KMIP) |
|---|---|---|
| Reversible | No | Yes |
| Requires a key | No | Yes (256-bit AES) |
| Format-preserving | Partial (noise, rounding) | Exact |
| FIPS mode | No | No |
| Use case | Analytics, data sharing | Tokenized storage, round-trip |