feat: add "Include defanged" arg to Extract IP addresses #2395
Open
HarelKatz wants to merge 7 commits into
Adds an "Include defanged" boolean arg (default false) and refactors the IPv4/IPv6/removeLocal regex assembly to use dotSep/colonSep variables. With the arg off the variables collapse to the original literal separators, so the regex string is byte-identical to the pre-change op. Behaviour unchanged; no new tests yet.
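
The separator-variable idea described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `buildIPv4Pattern`, `extractIPv4`, `OCTET`, and `LEGACY_IPV4` are hypothetical names, and the octet pattern is deliberately simplified compared with the real operation. Only the `dotSep` name comes from the description.

```javascript
// Hypothetical sketch: a simplified IPv4 pattern assembled from a separator variable.
// The real operation's regex is more permissive (e.g. octal octets); this is illustrative only.
const OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";

// The "pre-change" pattern written out as a literal, for comparison.
const LEGACY_IPV4 =
    "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(?:\\.(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}";

function buildIPv4Pattern(includeDefanged) {
    // With the arg off, dotSep collapses to the plain escaped dot,
    // so the assembled string equals the legacy literal.
    const dotSep = includeDefanged ? "(?:\\.|\\[\\.\\])" : "\\.";
    return OCTET + "(?:" + dotSep + OCTET + "){3}";
}

function extractIPv4(input, includeDefanged) {
    const re = new RegExp(buildIPv4Pattern(includeDefanged), "g");
    // Matches are returned verbatim, brackets included.
    return input.match(re) || [];
}

console.log(buildIPv4Pattern(false) === LEGACY_IPV4); // true
console.log(extractIPv4("plain 10.0.0.1 defanged 8[.]8[.]8[.]8", true));
```

Because each separator is matched independently, a partially defanged address such as `192.168[.]1.1` is accepted by the same pattern without any special casing.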
Description
Adds an opt-in `Include defanged` boolean argument (default `false`) to the `Extract IP addresses` operation. When enabled, the op also extracts IPv4 addresses defanged with `[.]` and IPv6 addresses defanged with `[:]`, including partial defangs (e.g. `192.168[.]1.1` or `1[.]2.3[.]4`).
The default behaviour is unchanged: when the new arg is off, the same regex strings are used as before, byte-for-byte.
Examples (with the new arg on):
- `192[.]168[.]1[.]1` → matched as-is
- `192.168[.]1.1` → matched as-is (partial defang)
- `2001[:]db8[:][:]1` → matched as-is
- `0123[.]0177[.]0234[.]0377` → matched
- `plain 10.0.0.1 defanged 8[.]8[.]8[.]8` → both extracted
- `removeLocal` filters defanged local addresses (`10[.]0[.]0[.]1`, partials too)
Matches are returned verbatim (no auto-refanging) so the op stays composable: chain `Fang IP Addresses` afterwards if clean output is needed.
Existing Issue
No issue specifically requests this feature. Related: #1721 (open, pre-existing IPv6 regex bug) — see "Known limitation" below.
Screenshots
N/A — no visual changes; the op gets one extra boolean in the args panel.
AI disclosure
I used Claude Code (Anthropic's CLI) to assist with implementation: researching defanging notations in prior art (iocextract, the in-progress `draft-grimminck-safe-ioc-sharing` IETF draft, etc.), drafting the design spec, writing the tests, and verifying behaviour end-to-end via the dev server in a browser. I have reviewed all code being submitted and can answer questions about it.
Test Coverage
20 new tests in `tests/operations/tests/ExtractIPAddresses.mjs` covering: backwards-compat (defang off ignores defanged input), fully-defanged IPv4/IPv6/IPv6-shorthand, partial defangs (decimal + octal), mixed plain+defanged input, `removeLocal` filtering full and partial defanged RFC1918 addresses, `unique` keeping plain and defanged as distinct entries, `sort` interaction (plain numeric first, defanged lexical after), IPv6 URI-literal handling, empty input, trailing-digit trap, `displayTotal` count, IPv4-toggle-off, IPv6-toggle-off, and pure-ASCII-no-IPs.
All 11 pre-existing Extract IP tests still pass unchanged.
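
To make the `removeLocal` and `unique` cases above concrete, here is a rough sketch of the behaviour those tests describe. It is an assumption-laden illustration rather than the operation's code: `DOT`, `LOCAL_IPV4`, and `filterAndDedupe` are hypothetical names, and only the RFC 1918 ranges are covered.

```javascript
// Hypothetical sketch; the operation's real removeLocal logic may differ.
const DOT = "(?:\\.|\\[\\.\\])"; // a plain or a defanged dot

// RFC 1918 private prefixes: 10/8, 192.168/16, 172.16/12.
const LOCAL_IPV4 = new RegExp(
    "^(?:10" + DOT +
    "|192" + DOT + "168" + DOT +
    "|172" + DOT + "(?:1[6-9]|2\\d|3[01])" + DOT + ")"
);

function filterAndDedupe(matches) {
    // Drop local addresses whether their separators are plain, defanged, or mixed.
    const kept = matches.filter(ip => !LOCAL_IPV4.test(ip));
    // Deduplicate on the verbatim string, so "8.8.8.8" and "8[.]8[.]8[.]8"
    // are kept as distinct entries.
    return [...new Set(kept)];
}

console.log(filterAndDedupe([
    "10[.]0[.]0[.]1", "192.168[.]1.1", "8.8.8.8", "8[.]8[.]8[.]8", "8.8.8.8"
]));
// → [ '8.8.8.8', '8[.]8[.]8[.]8' ]
```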
`npm run lint`, `npm test` (1952/1952), and `npm run testnodeconsumer` all pass locally. Behaviour was also verified end-to-end in the dev server via 11 browser scenarios.
Scope note
This PR intentionally only supports `[.]` and `[:]` (CyberChef's own Defang IP output). Other notations like `(.)`, `\.`, etc. that other tools (e.g. iocextract) produce are out of scope: a narrower scope means lower false-positive risk. They are easy to add later as additive options if requested.
Known limitation (pre-existing, see #1721)
The IPv6 regex contains a `(?!.*::.+::)` negative lookahead that spans the entire input via `.*`. If the input contains two `::`-shortened IPv6 addresses, only one matches. This is the bug already tracked in #1721 (introduced by #1661) and affects plain inputs today, e.g. `2001:db8::1 and 2001:db8::2` currently returns only `2001:db8::2`. With this PR, the trigger surface grows slightly because defanged `[:][:]` also counts toward the two-`::` count. I did not address this here because fixing the IPv6 regex is out of scope for an additive feature change, but I am happy to do a follow-up PR targeting #1721 if useful.
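
The lookahead behaviour can be reproduced in isolation with a small sketch. `SHORTENED` is a simplified stand-in for a `::`-shortened address and is not the real CyberChef IPv6 regex; `extractShortened` is a hypothetical helper. Only the `(?!.*::.+::)` lookahead is taken from the description above.

```javascript
// Simplified stand-in for a "::"-shortened IPv6 address; NOT the real CyberChef regex.
const SHORTENED = "(?:[0-9a-f]{1,4}:){1,7}:[0-9a-f]{1,4}";

// Same lookahead shape as in the op: ".*" scans from the match start to the end
// of the input, so a later address's "::" is counted as well.
const withLookahead = new RegExp("(?!.*::.+::)" + SHORTENED, "gi");

// The same pattern without the lookahead, shown only for comparison (not a proposed fix).
const withoutLookahead = new RegExp(SHORTENED, "gi");

function extractShortened(re, input) {
    return input.match(re) || [];
}

const input = "2001:db8::1 and 2001:db8::2";
console.log(extractShortened(withLookahead, input));    // → [ '2001:db8::2' ]
console.log(extractShortened(withoutLookahead, input)); // → [ '2001:db8::1', '2001:db8::2' ]
```

At every start position inside the first address, the lookahead still sees two `::` sequences ahead and rejects the match. Only after the scan has passed the first `::` does the lookahead succeed, which is why the second address alone is returned.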