Fix: Better anonymization#49
Conversation
Signed-off-by: Squidly271 <unraidsquid@gmail.com>
WalkthroughThe diagnostics configuration for the Tailscale plugin was updated to enhance data redaction by adding new regex filters for sensitive information. The log collection process was refined to target specific log files, and the output of a diagnostic command was limited to its first line. Certain configuration files were excluded from collection. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant DiagnosticsCollector
participant System
User->>DiagnosticsCollector: Initiate diagnostics collection
DiagnosticsCollector->>System: Run "tailscale lock status" (capture first line)
DiagnosticsCollector->>System: Collect specified log files
DiagnosticsCollector->>System: Exclude SSH/nginx configs
DiagnosticsCollector->>DiagnosticsCollector: Apply regex redaction filters
DiagnosticsCollector-->>User: Provide redacted diagnostics data
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/usr/local/emhttp/plugins/tailscale/diagnostics.json (1)
5-11: Regex filters expanded for enhanced anonymization
The new filters cover magicDNS domains, email addresses, MAC addresses, and multiple IPv6 formats. These additions significantly improve redaction coverage—but please verify each pattern against edge cases (e.g., uncommon IPv6 shorthand, bracketed addresses, mixed‐case hex, varied delimiters).
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/usr/local/emhttp/plugins/tailscale/diagnostics.json(3 hunks)
🔇 Additional comments (2)
src/usr/local/emhttp/plugins/tailscale/diagnostics.json (2)
28-28: Limittailscale lock statusoutput to first line
Piping throughhead -1correctly restricts the output. Confirm thatheadis present in all supported environments or consider using a built‐in shell fallback (sed '1q') if portability is a concern.
76-78: Restrict log file collection to explicit Tailscale logs
Explicitly listingtailscale.log,tailscale-utils.log, andtailscale-error.logavoids inadvertently including un‐anonymized rotated files. Please verify these filenames exist on target systems and consider whether any rotated variants (e.g.,.1,.gz) should also be captured and masked.
@coderabbitai - Already did except for some very sharp edge cases |
|
Tip For best results, initiate chat on the files or code changes. @Squidly271 Great work on implementing the enhanced anonymization! I appreciate your transparency about the edge cases that might still remain. The regex filters you've added for email addresses, MAC addresses, and IPv6 addresses in the diagnostics configuration look comprehensive. If you'd like me to help identify any potential edge cases that might still need coverage, or if you want to discuss specific patterns that are proving challenging to anonymize, feel free to let me know. The current implementation appears to handle the common cases well, and it's perfectly reasonable to address edge cases incrementally as they're discovered in practice. |
| "/etc/ssh/sshd_config", | ||
| "/etc/nginx/conf.d/servers.conf", | ||
| "/etc/samba/smb-names.conf", | ||
| "/var/log/tailscale*" |
There was a problem hiding this comment.
Did you check to see if the .gz files were anonymized? plugin-diagnostics is smarter than I think you gave it credit for :)
| "file": "tailscale-debug-prefs.txt" | ||
| }, | ||
| { | ||
| "command": "tailscale lock status", |
There was a problem hiding this comment.
tailscale lock status doesn't contain any sensitive information (public keys, but nothing more than that), and this removes valuable information (specifically, nodes that are currently locked out).
If the public keys are really a concern, then it would be better to apply a command-specific filter to substitute just the ID part of the keys.
|
Apologies with regards to the .gz files. You are correct. At the time of creating the regex I was informed that they were not being touched (my installs didn't have any rotated logs). Will revert that. But, just noticed that these regex's are over anonymizing and a change which I had already tested somehow didn't make it into this (timestamps are getting anonymized in this version). Re lock status, while we both realize that a public key is no problem, general users would likely not understand the difference between public / private and would question simply seeing something like that in, hence the removal. But, our installs used for testing didn't show any of the additional information -> only the status and the key so we never realized it was there. I'll revamp. Thanks! |
Anonymization email, IPv6, and MAC addresses
Do not include non-anonymized rotated log files
Summary by CodeRabbit