MongoDB Setup

Database Structure

  • Database: userdb
  • Collection: users
  • Records: 10 users with mock PII data

PII Data Included

Each user document contains:

  • First Name & Last Name
  • Email Address
  • Phone Number
  • Social Security Number (SSN)
  • Date of Birth
  • Full Address (street, city, state, zip, country)
  • Timestamps (createdAt, updatedAt)
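For orientation, one document in the users collection might look roughly like the sketch below. The names email, phone, dateOfBirth, address.city, createdAt, and updatedAt appear elsewhere in this README; firstName, lastName, ssn, and the remaining address sub-fields are assumed names, and every value is made up.

```javascript
// Hypothetical shape of one document in userdb.users.
// firstName, lastName, ssn, and the address sub-fields other than city
// are assumed names; check setup_database.js for the real ones.
const sampleUser = {
  firstName: "Jane",
  lastName: "Doe",
  email: "jane.doe@example.com",
  phone: "+1-555-0100",
  ssn: "000-00-0000",
  dateOfBirth: new Date("1985-06-15"),
  address: {
    street: "123 Example St",
    city: "New York",
    state: "NY",
    zip: "10001",
    country: "USA",
  },
  createdAt: new Date(),
  updatedAt: new Date(),
};
```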

Available Scripts

  1. setup_database.js - Creates the database with mock PII data
  2. anonymize_data.js - Simple anonymization (User 1, User 2, etc.)
  3. anonymize_with_hash.js - Hash-based anonymization (non-reversible)
  4. restore_original_data.js - Restores original mock PII data

Setup Instructions

1. Create Database with Mock PII Data

# Connect to MongoDB and run the setup script
mongosh < setup_database.js

Or run it directly in mongosh:

# Connect to MongoDB
mongosh

# Then, at the mongosh prompt, load and execute the script
load("setup_database.js")

2. Anonymize PII Data

Option A: Simple Anonymization

# Anonymizes to User 1, User 2, etc.
mongosh < anonymize_data.js

Option B: Hash-Based Anonymization

# Uses one-way hash functions for consistent anonymization
mongosh < anonymize_with_hash.js

3. Restore Original Data (if needed)

# Restores the original mock PII data
mongosh < restore_original_data.js

Verify the Setup

// Connect to the database
use userdb;

// Count documents
db.users.countDocuments();


## Anonymization Details

### Simple Anonymization (anonymize_data.js)
- Names → "User [ID]"
- Email → "user[ID]@anonymized.local"
- Phone → "+1-XXX-XXX-[ID]"
- SSN → "XXX-XX-[ID]"
- Date of Birth → Year only (YYYY-01-01)
- Address → Redacted values

### Hash-Based Anonymization (anonymize_with_hash.js)
- Uses one-way hash functions
- Creates consistent but non-reversible values
- Original data cannot be recovered
- Maintains referential consistency
- Stores hash of original email for tracking
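
A minimal sketch of the idea, assuming SHA-256 via Node's built-in `crypto` module (which mongosh can also `require`). The helper names, the output formats, and the `originalEmailHash` field are hypothetical; anonymize_with_hash.js may use a different hash function and layout.

```javascript
const crypto = require("crypto");

// One-way SHA-256 digest of a value, returned as a hex string.
function sha256Hex(value) {
  return crypto.createHash("sha256").update(String(value)).digest("hex");
}

// Hypothetical helper: replaces PII fields with values derived from hashes.
// The same input always yields the same output, so records stay linkable.
function anonymizeWithHash(user) {
  const emailHash = sha256Hex(String(user.email).toLowerCase());
  const tag = emailHash.slice(0, 8);
  return {
    ...user,
    firstName: `User_${tag}`,
    lastName: `Anon_${sha256Hex(user.lastName).slice(0, 8)}`,
    email: `${tag}@anonymized.local`,
    phone: `hash:${sha256Hex(user.phone).slice(0, 12)}`,
    ssn: `hash:${sha256Hex(user.ssn).slice(0, 12)}`,
    originalEmailHash: emailHash, // assumed field name for the tracking hash
    updatedAt: new Date(),
  };
}
```

One design caveat: an unkeyed hash of a value drawn from a small space (an SSN or phone number) can be guessed by enumerating candidates, so a keyed hash such as `crypto.createHmac("sha256", secret)` is a common hardening when the data is not mock data.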

### Comparison

| Method | Pros | Cons | Use Case |
|--------|------|------|----------|
| Simple | Easy to understand, predictable | Pattern visible | Testing, demos |
| Hash-Based | More realistic, non-reversible | Less predictable | Production-like testing |

// View all users
db.users.find().pretty();

// Find a specific user by email
db.users.findOne({ email: "john.smith@email.com" });

// View indexes
db.users.getIndexes();


## Sample Queries

```javascript
// Find users by city
db.users.find({ "address.city": "New York" });

// Find users born on or after January 1, 1990
db.users.find({ dateOfBirth: { $gte: new Date("1990-01-01") } });

// Update a user's phone number
db.users.updateOne(
  { email: "john.smith@email.com" },
  { $set: { phone: "+1-555-9999", updatedAt: new Date() } }
);

// Delete a user
db.users.deleteOne({ email: "john.smith@email.com" });
```