# On-Prem Data Verifier

A validation tool, written in Go, that verifies Looker On-Premise backup artifacts before they are migrated to Looker
Cloud (hosted).

## Overview

Migrations are sensitive operations. This tool ensures that the data provided by the customer is:

1. **Complete:** All required encrypted and decrypted artifacts are present.

2. **Intact:** Every file matches the MD5 checksum generated at the source.

3. **Secure:** The `.enc` files are encrypted for the correct **Looker Public Key**, resolved dynamically from the
   customer's LUID (Looker Unique ID).

4. **Valid:** The SQL dump is structurally sound, uses Extended Inserts (for restore performance), uses the correct
   charset (`utf8mb4`), and comes from a supported Looker version.

5. **Recoverable:** The Customer Master Key (CMK), used to decrypt database contents, has a valid format.

## Prerequisites

The tool relies on the system's GPG installation to verify encryption keys.

* **Go** 1.20+ (to build)
* **GnuPG (gpg)** installed and on the system `PATH` (e.g., `apt install gpg`).
* **Imported Key:** Before running the tool, you must import the customer-specific migration GPG public key (the key
  whose user ID is `looker-devops+migration-{LUID}@google.com`) into your local keyring.
  ```bash
  # Example (usually done as part of the customer script instructions)
  echo "PUBLIC_KEY_BLOCK" | gpg --import
  # Confirm the key is present (assumes $LUID holds the customer's LUID)
  gpg --list-keys "looker-devops+migration-${LUID}@google.com"
  ```

## Build

```bash
go build -o onprem-verifier main.go
```

## Usage

The tool operates on a **Workspace Directory**: a dedicated local directory where you consolidate all the backup files
for a single customer migration. This directory must contain both the *encrypted bundle* sent by the customer and the
*decrypted artifacts* you extracted from that bundle. Think of it as a staging area for all the files the verifier
needs to check.

### Command

```bash
./onprem-verifier \
  --backupDir /path/to/migration_workspace \
  --customerName "lookersre-instance-1" \
  --luid "7b973058-f7e0-49f7-9262-54e1987659bc"
```

| Flag             | Shorthand | Required | Description                                                                |
|:-----------------|:----------|:---------|:---------------------------------------------------------------------------|
| `--backupDir`    | `-b`      | **Yes**  | Path to the workspace directory containing ALL exported backup files.      |
| `--customerName` | `-c`      | **Yes**  | Customer name (e.g., `lookersre-instance-1`). Used as the filename prefix. |
| `--luid`         | `-l`      | **Yes**  | Looker Unique ID (e.g., `7b97...`). Used to resolve the GPG public key.    |

## Workspace Requirements

The tool enforces a strict naming convention. The directory provided via `--backupDir` must contain **exactly** the
following 7 files (where `${customer}` is the value of `--customerName`):

1. **Encrypted Bundle (Source)**

   * `${customer}_looker_db_backup.sql.gz.enc`
   * `${customer}_looker_fs_backup.tar.gz.enc`
   * `${customer}_looker_cmk_key.enc`
   * `${customer}_backup.md5`

2. **Decrypted Artifacts (Target)**

   * `${customer}_looker_db_backup.sql.gz`
   * `${customer}_looker_fs_backup.tar.gz`
   * `${customer}_looker_cmk_key`

## Validation Pipeline

The tool runs the checks in the following order. If any step fails, the tool stops without running the remaining
steps.

### 1. Workspace Structure

* **Action:** Checks that all 7 required files exist in `--backupDir`.
* **Goal:** Fail fast if data is missing or named incorrectly.

### 2. Integrity Check

* **Action:** Parses `${customer}_backup.md5` and verifies the hash of every file listed.
* **Goal:** Ensure no file corruption occurred during transfer.

### 3. Security Check (Dynamic GPG)

* **Action:**
  1. Constructs the expected migration email: `looker-devops+migration-{LUID}@google.com`.
  2. Queries your local GPG keyring (`gpg --list-keys`) to find **ALL** Key IDs (primary and subkeys) associated with
     that email.
  3. Inspects the headers of the `.enc` files (`gpg --list-packets`) to ensure they are encrypted for one of those
     valid Key IDs.
* **Goal:** Prevent importing data encrypted with the wrong key.

### 4. Database Validation

* **Action:** Streams the `.sql.gz` file, decompressing it on the fly in a single pass, so the dump is analyzed without
  materializing the uncompressed SQL.
* **Checks:**
  * **Looker Version:** Must be in the list of supported versions.
  * **Charset:** Must be `utf8mb4` (a Looker requirement).
  * **Extended Inserts:** Ensures `INSERT` statements are batched into multi-row statements (critical for restore
    performance).
  * **Critical Tables:** Verifies that the `user`, `dashboard`, and `db_connection` tables exist.

### 5. CMK Validation

* **Action:** Reads `${customer}_looker_cmk_key`.
* **Check:** Validates that the key is either 32 bytes (raw) or 44 bytes (the Base64 encoding of a 32-byte key).

### 6. Filesystem Analysis

* **Action:** Analyzes the size and metadata of the `${customer}_looker_fs_backup.tar.gz` archive.

## Output

### Console (STDOUT)

Colored, step-by-step logs showing progress and the result of each validation check.

```text
=== Looker On-Prem Verification Pipeline ===

>> [1/6] Checking Workspace Structure
   Directory: /workspace
   [OK] Workspace structure verified

>> [2/6] Verifying MD5 Checksums
   [OK] All files match their checksums

>> [3/6] Resolving Security Keys
   [OK] Found Valid Key IDs: [000ECF...]
   [OK] sm-restore_looker_db_backup.sql.gz.enc is encrypted correctly
   ...

>> [4/6] Analyzing Database: sm-restore_looker_db_backup.sql.gz
   [OK] Version 25.18.33 is supported
   [OK] Database Charset: utf8mb4
   [OK] Extended Inserts detected
   [OK] Critical tables verified: [user dashboard db_connection]

[SUCCESS] VERIFICATION COMPLETE
Customer: sm-restore
LUID: d5c8...
Duration: 6.54s
```

### Report File

A JSON report is written to `metadata.json` in the current directory (or at a specified path).

```json
{
  "customer_name": "lookersre-instance-1",
  "instance_id": "7b973058-f7e0-49f7-9262-54e1987659bc",
  "generated_at": "2026-01-16T15:30:00Z",
  "fs_total_size_bytes": 5368709120,
  "db_total_size_bytes": 1073741824,
  "table_count": 142,
  "cmk_status": "Valid",
  "cmk_encoding": "Base64",
  "looker_version": "25.18.33",
  "duration_in_seconds": 6.54
}
```