Merged
41 changes: 39 additions & 2 deletions content/blog/database-audit-logging.md
@@ -1,7 +1,7 @@
---
title: 'Database Audit Logging Best Practices for Compliance'
author: Adela
updated_at: 2026/03/19 09:00:00
updated_at: 2026/04/02 09:00:00
feature_image: /content/blog/database-audit-logging/banner.webp
tags: Explanation
description: 'How to set up database audit logging for SOC 2, HIPAA, and ISO 27001 compliance across PostgreSQL, MySQL, SQL Server, and Oracle.'
@@ -150,6 +150,35 @@ SQL is routed through a centralized gateway or workflow before executing.
*For example:*
A workflow platform like **Bytebase** produces complete, contextual audit logs because all SQL flows through a single, identity-aware pipeline.

## How Bytebase Handles Audit Logging

[Bytebase](https://docs.bytebase.com/security/audit-log/) takes the proxy/workflow approach: SQL executed through Bytebase's SQL Editor or change workflows — DDL, DML, and SELECT — is logged before reaching the database. Because Bytebase manages user identity, every audit record is tied to a real person, not a shared `admin` account. Direct database connections that bypass Bytebase are not captured in these logs.

### What gets logged

Bytebase records:

- **SQL execution** — every query that flows through the system, including the full SQL text, target database, and execution result
- **Schema changes** — issue creation, approval decisions, rollout status
- **Data access** — data queries and exports, with the requesting user's identity
- **Authentication** — login, logout, SSO token exchange
- **Permission changes** — role grants, project membership updates, policy modifications
- **System configuration** — instance connection changes, environment settings, workspace policies

Each entry includes the user's email, IP address, timestamp, operation duration, affected resource, and request/response payloads. Sensitive fields (passwords, certificates, SSH keys) are automatically redacted.
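As an illustration of the redaction step, here is a minimal sketch; the field names and mask string are assumptions for illustration, not Bytebase's actual schema:

```python
import json

# Illustrative list of sensitive field names; the real redaction
# rules are defined by the product, not by this sketch.
SENSITIVE_KEYS = {"password", "certificate", "ssh_key"}

def redact(payload: dict) -> dict:
    """Return a copy of the payload with sensitive values masked."""
    clean = {}
    for key, value in payload.items():
        if key in SENSITIVE_KEYS:
            clean[key] = "******"
        elif isinstance(value, dict):
            # Recurse into nested request/response objects.
            clean[key] = redact(value)
        else:
            clean[key] = value
    return clean

entry = {
    "user": "alice@example.com",
    "request": {"statement": "SELECT 1", "password": "s3cret"},
}
print(json.dumps(redact(entry)))
```

Redacting before the entry is persisted keeps secrets out of the log store entirely, rather than relying on downstream filtering.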

### Export and integration

Three ways to get audit data out:

1. **GUI** — filter by user, action type, resource, and date range in Settings → Audit Log
2. **API** — query `/v1/auditLogs:search` (workspace-level) or `/v1/projects/{project}/auditLogs:search` (project-level). Returns structured JSON ready for any SIEM. See the [API audit log tutorial](https://docs.bytebase.com/tutorials/api-audit-log) for examples.
3. **Log streaming** — enable audit log export to stdout in Settings → General → Audit Log Export. Add the `--enable-json-logging` flag to output structured JSON, which a Datadog/Splunk/Grafana agent can ingest directly

### Availability

Audit logging is available on [Pro and Enterprise plans](https://www.bytebase.com/pricing/). The Pro plan covers most audit needs; Enterprise adds custom approval workflows and advanced access control that generate additional audit events.

## Recommended Best Practices

Regardless of database engine or auditing method, strong audit practices share the same foundations:
@@ -185,4 +214,12 @@ SOC 2, ISO 27001, HIPAA, PCI DSS, and GDPR all require some form of database aud

**How do I export database audit logs to Datadog or Splunk?**

Most engines write audit logs to files or system tables. For PostgreSQL, configure `pgaudit` to write to `csvlog` and use a Datadog or Splunk agent to ingest the files. For MySQL, enable the audit plugin and point the log file at your SIEM collector. For SQL Server, parse the `.sqlaudit` files with `fn_get_audit_file()` and forward via a log shipper. Bytebase provides a built-in [audit log API](/docs/security/audit-log/) that exports structured JSON, ready for any SIEM.
Most engines write audit logs to files or system tables. For PostgreSQL, configure `pgaudit` to write to `csvlog` and use a Datadog or Splunk agent to ingest the files. For MySQL, enable the audit plugin and point the log file at your SIEM collector. For SQL Server, parse the `.sqlaudit` files with `fn_get_audit_file()` and forward via a log shipper. Bytebase provides a built-in [audit log API](https://docs.bytebase.com/security/audit-log/) that exports structured JSON, ready for any SIEM.

**How does Bytebase handle database audit logging?**

All SQL executed through Bytebase — via the SQL Editor or change workflows — is automatically logged with the real user's identity, full SQL text, target database, timestamp, and execution result. Direct database connections that bypass Bytebase are not captured. Logs can be queried via the GUI, exported via API (`/v1/auditLogs:search`), or streamed as JSON to any SIEM. Available on Pro and Enterprise plans.

**Do I still need engine-native auditing if I use Bytebase?**

It depends on your compliance scope. Bytebase captures all SQL that flows through its gateway — schema changes, data queries, exports, and admin actions. If you also have direct database connections that bypass Bytebase (e.g., emergency SSH access or application service accounts), you should keep engine-native auditing enabled for those paths. Many teams use Bytebase as the primary audit trail and engine-native logs as a secondary safety net.
117 changes: 73 additions & 44 deletions content/blog/what-is-dynamic-data-masking.md
@@ -1,7 +1,7 @@
---
title: 'What is Dynamic Data Masking (DDM)'
author: Tianzhou
updated_at: 2024/09/02 09:00
updated_at: 2026/04/02 09:00
feature_image: /content/blog/what-is-dynamic-data-masking/cover.webp
tags: Explanation
featured: true
@@ -14,50 +14,28 @@ Dynamic Data Masking (DDM) protects sensitive data in real-time by dynamically a

DDM contrasts with Static Data Masking (SDM). While SDM involves creating a permanently altered, non-reversible copy of the original data, DDM modifies the data on-the-fly as it is accessed in real-time. This dynamic approach ensures that sensitive data remains protected during query execution without changing the underlying data at rest.

## Use Case
## When to Use Dynamic Data Masking vs Static Data Masking

The primary use case for SDM is to create safe, sanitized versions of production data for use in non-production environments.
On the other hand, DDM is primarily used in production environments to control and limit access to sensitive data dynamically, based on user roles, permissions, or other contextual factors. This allows organizations to protect sensitive information without needing to alter the underlying data, making it a powerful tool for maintaining security and compliance in real-time data access scenarios.
Static Data Masking (SDM) creates sanitized copies of production data for dev/test environments. DDM is different — it masks data in real-time in production, controlling what each user sees based on their role and permissions. The underlying data stays untouched.

## DDM Complexity
| | Static Data Masking | Dynamic Data Masking |
|---|---|---|
| **Environment** | Non-production (dev, test, staging) | Production |
| **Data altered?** | Yes — permanent copy | No — masked on-the-fly |
| **Use case** | Safe test data | Role-based access control |

The complexity of DDM arises primarily from its dynamic nature, where the system must make real-time decisions about how and when to mask data based on various runtime contexts. These contexts include:
## What Makes Dynamic Data Masking Hard

### User Context
DDM has to make real-time decisions about what each user sees. The complexity comes from the number of variables involved:

- **Role-Based Access**: Different users or roles may have varying levels of access to data. DDM must dynamically adjust the visibility of data based on the user’s identity, ensuring that only authorized users can see sensitive information in its unmasked form.
- **User role and identity** — a DBA sees unmasked data, an analyst sees partial masks, a contractor sees full masks. The same query returns different results depending on who runs it.
- **Temporary access** — an on-call engineer needs unmasked access to debug a production incident, then the access should expire.
- **Column-level granularity** — an `email` column might need partial masking while a `phone` column needs full masking, even in the same table.
- **Multiple databases and environments** — masking rules in production differ from staging. If you run MySQL, PostgreSQL, and Oracle, each has different (or no) native DDM support.
- **Masking algorithm choice** — partial masking keeps data useful for debugging (`john@****`), but full masking or hashing is needed for compliance. Picking the wrong algorithm makes the data either too exposed or too useless.
- **Performance** — masking happens on every query at runtime. A poorly implemented DDM layer adds latency to every SELECT.

- **User Location and Device**: In some scenarios, data access might be influenced by the user's location (e.g., within or outside a corporate network) or the device being used. DDM must be capable of factoring in these variables dynamically.

### Temporal Context

- **Temporary Access**: User may require temporary access to solve emergencies.

- **Date and Time Sensitivity**: Certain data might only be considered sensitive during specific time periods, requiring DDM to adapt its behavior accordingly.

### Target Database Column

- **Column-Specific Masking**: Different columns in a database might require different masking techniques or rules. DDM must dynamically apply the appropriate masking algorithm based on the specific column being accessed.

- **Complex Data Types**: Handling complex data types, such as JSON or XML within columns, adds additional layers of complexity as DDM must parse and selectively mask content within these structures.

### Application Context

- **Environment-Specific Masking**: The masking rules may need to vary depending on the environment in which the application is running (e.g., dev, test, UAT, prod). DDM must recognize the environment and apply the appropriate level of masking.

- **Business Project or Use Case**: Different business projects or use cases might have unique data access requirements.

### Masking Algorithm

- **Algorithm Selection**: DDM must dynamically choose the most suitable masking algorithm based on the context, ensuring that the data remains useful while still protecting sensitive information. Algorithms might include techniques like partial masking, randomization, or tokenization.

- **Algorithm Complexity and Performance**: The choice of masking algorithm has a direct impact on performance. DDM needs to balance the security provided by the algorithm with the need to minimize performance overhead, ensuring that query execution times remain acceptable.

### Performance

Given the dynamic nature of DDM, one of the critical challenges is minimizing the performance overhead associated with real-time masking. This involves optimizing the masking logic to ensure that it is both efficient and scalable, particularly in high-traffic environments.

## Database Support
## Which Databases Support Dynamic Data Masking

| Databases | Supported |
| ---------- | --------------------------------------------------------------------------------------------------- |
@@ -82,11 +60,62 @@ CREATE OR REPLACE MASKING POLICY email_mask AS (val string) RETURNS string ->
ALTER TABLE IF EXISTS user_info MODIFY COLUMN email SET MASKING POLICY email_mask;
```

Database engine only provides the data masking primitives. Holistically configuring the masking policy for
an entire organization is still a big challenge.
Database engines only provide masking primitives. Holistically configuring masking policies for an entire organization — across multiple databases, environments, and user roles — is still a big challenge. For database-specific guides, see [Data Masking for MySQL](/blog/mysql-data-masking/) and [Data Masking for PostgreSQL](/blog/postgres-data-masking/). For Snowflake specifically, see [Snowflake Dynamic Data Masking and Alternatives](/blog/snowflake-dynamic-data-masking-and-alternatives/).

## How Bytebase Handles Dynamic Data Masking

[Bytebase](https://docs.bytebase.com/security/data-masking/overview/) implements DDM at the application layer rather than relying on database-native features. All queries through Bytebase's SQL Editor are masked in real-time based on policies you define. This is particularly valuable for MySQL and PostgreSQL, which have no native DDM support.

### Supported databases

Bytebase DDM works with MySQL, PostgreSQL, Oracle, TiDB, and others — the same masking policies apply across all of them, regardless of whether the engine has native DDM.

### How masking is configured

Bytebase uses a three-level policy system:

1. **Global masking rules** — workspace admins apply batch masking to columns matching a name pattern (e.g., all columns named `ssn` or `email` across every database)
2. **Column-level masking** — project owners set masking on specific table columns
3. **Masking exemptions** — grant specific users access to unmasked data when needed

Precedence: exemptions > global rules > column masking.

Policies are organized around **semantic types** — you classify columns (e.g., "PII-email", "PII-phone") and attach a masking algorithm to the type. Changing one semantic type updates masking for all columns tagged with it.

### Masking algorithms

Five built-in algorithms:

| Algorithm | Example | Use case |
|-----------|---------|----------|
| Full mask | `123456789` → `*` | Completely hide the value |
| Range mask | `john@example.com` → `john@****` | Preserve prefix for usability |
| Inner mask | `123456` → `12**56` | Show edges, hide middle |
| Outer mask | `123456` → `**34**` | Show middle, hide edges |
| MD5 mask | `value` → `2063c1608d6e0baf80249c42e2be5804` | Irreversible hash for analytics |
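The table above can be sketched in a few lines of Python; the cut points and default parameters are illustrative, and Bytebase's actual implementations may differ:

```python
import hashlib

def full_mask(value: str) -> str:
    return "*"  # hide everything

def range_mask(value: str, keep: int = 5) -> str:
    return value[:keep] + "****"  # keep a prefix for usability

def inner_mask(value: str, edge: int = 2) -> str:
    # Show the edges, hide the middle.
    return value[:edge] + "*" * (len(value) - 2 * edge) + value[-edge:]

def outer_mask(value: str, edge: int = 2) -> str:
    # Show the middle, hide the edges.
    return "*" * edge + value[edge:-edge] + "*" * edge

def md5_mask(value: str) -> str:
    # Irreversible: equal inputs hash alike, so joins still work.
    return hashlib.md5(value.encode()).hexdigest()

print(inner_mask("123456"))  # 12**56
print(outer_mask("123456"))  # **34**
print(range_mask("john@example.com"))  # john@****
```

The MD5 variant is what makes masked data usable for analytics: distinct values stay distinct while the originals stay hidden.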

### Infrastructure as code

Masking policies can be managed via [Bytebase's Terraform provider](https://docs.bytebase.com/tutorials/manage-data-masking-with-terraform/) — define semantic types, global rules, and column masking in HCL and apply across environments.

### Availability

Dynamic Data Masking is available on the [Enterprise plan](https://www.bytebase.com/pricing/). DDM is one part of Bytebase's broader [database access control](/blog/database-access-control-best-practices/) capabilities, which also include role-based access, [just-in-time access](/blog/just-in-time-database-access/), and [audit logging](/blog/database-audit-logging/).

## FAQ

**What is Dynamic Data Masking?**

Dynamic Data Masking (DDM) protects sensitive data by altering query results in real-time based on user roles and policies, without changing the data at rest. Unlike static data masking, which creates a permanent sanitized copy, DDM applies masking on-the-fly during query execution.

**Which databases support Dynamic Data Masking natively?**

Oracle, SQL Server, BigQuery, and Snowflake have built-in DDM features. MySQL and PostgreSQL do not support DDM natively. Bytebase provides application-layer DDM for MySQL, PostgreSQL, Oracle, TiDB, and others, using the same policies across all engines.

**How does Bytebase implement DDM for MySQL and PostgreSQL?**

<HintBlock type="info">

Bytebase provides an UI interface as well as API to [configure Dynamic Data Masking](https://docs.bytebase.com/security/data-masking/overview/). In particular, Bytebase supports MySQL and PostgreSQL.

</HintBlock>
Bytebase applies masking at the application layer when queries run through its SQL Editor. No database extensions, views, or plugins are required. You define masking policies centrally in Bytebase, and they apply consistently across all connected databases.

**What is the difference between Dynamic Data Masking and Static Data Masking?**

Static Data Masking (SDM) creates a permanent, altered copy of production data for use in non-production environments. Dynamic Data Masking (DDM) modifies data on-the-fly as it is queried, without changing the underlying data. SDM is for dev/test environments; DDM is for production access control.