Commit 0498ae2

committed
updates
1 parent 7427ddd commit 0498ae2

8 files changed

Lines changed: 92 additions & 36 deletions

File tree

docs/checks/base64_check.md

Lines changed: 35 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,43 @@
11
# Base64 Statements
22

3-
Python Code Audit checks for obfuscated text, particularly content encoded with `base64`:
3+
The Python Code Audit tool detects obfuscated content, particularly code that uses `base64` (and related encodings) for encoding or decoding data.
44

5-
* `base64` Encoding / Decoding.
5+
It specifically checks for the following calls:
66

7+
* `base64.b64decode`
8+
* `base64.b64encode`
9+
* `base64.b85encode`
10+
* `base64.z85decode`
711

812
## Rationale
913

14+
Obfuscation using Base64 is a **long-standing and simple technique** commonly employed to conceal malicious code in Python projects. It enables attackers to hide payloads that would otherwise be easily identified.
1015

11-
Obfuscation is a long-standing and straightforward technique often used to conceal malicious code within Python projects. This technique allows attackers to easily hide malware within Python programs.
16+
The use of obfuscated content is uncommon in well-structured, legitimate Python code and is therefore considered a strong indicator of potential security risks.
1217

13-
The presence of obfuscated content is atypical in well-structured, non-malicious Python code and is a significant indicator of potential security risks.
18+
It is strongly recommended that any code containing Base64 encoding/decoding be carefully reviewed before deployment to production. **Python Code Audit** performs this check automatically.
1419

20+
**Key red flags include:**
21+
* `base64.b64decode` followed immediately by `exec()` or `eval()`
22+
* Long Base64 strings embedded in Python scripts
23+
* Constructs such as `exec(base64.b64decode(...))` from untrusted sources
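The first and third red flags can be approximated with Python's standard `ast` module. The sketch below is illustrative only and is not the Python Code Audit implementation; the helper names `find_exec_of_decoded` and `_is_base64_decode`, and the `DECODERS` set, are hypothetical. It assumes the module is referenced under its plain name `base64`.

```python
import ast

# Hypothetical set of decoder names to look for (an assumption for this sketch).
DECODERS = {"b64decode", "b85decode", "z85decode", "a85decode", "b32decode", "b16decode"}


def _is_base64_decode(node: ast.AST) -> bool:
    """True if node is a call of the form base64.<decoder>(...)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "base64"
        and node.func.attr in DECODERS
    )


def find_exec_of_decoded(source: str) -> list[int]:
    """Return line numbers where exec()/eval() wraps a base64 decode call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in {"exec", "eval"}
            # Search the whole argument subtree, so wrappers such as
            # zlib.decompress(...) or .decode() do not hide the decode call.
            and any(
                _is_base64_decode(inner)
                for arg in node.args
                for inner in ast.walk(arg)
            )
        ):
            hits.append(node.lineno)
    return sorted(hits)
```

Because the source is only parsed and never executed, a check of this kind is safe to run on untrusted code. It does not follow aliases or `from base64 import ...` forms.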
1524

16-
It’s recommended to review any code deployed to production using `base64` encoding. **Python Code Audit** does this automatically.
25+
## Common Malware Patterns
26+
27+
Base64 encoding patterns are frequently found in Python-based malware and droppers:
28+
29+
| Pattern | Code Snippet | Why It Is Detected | Implemented |
30+
|----------------------|---------------------------------------------------|--------------------------------------------------|-------------|
31+
| Standard b64 + exec | `exec(base64.b64decode(long_string))` | Extremely common obfuscation technique ||
32+
| Compressed | `exec(zlib.decompress(base64.b64decode(...)))` | Suggests larger hidden payload and evasion ||
33+
| Multi-layer | `base64.b64decode(base64.b64decode(...))` | Attempts to bypass simple pattern matching ||
34+
| Bytes decode | `exec(base64.b64decode(data).decode())` | Hides intent by decoding to string ||
35+
| Using aliases | `b64 = base64.b64decode; exec(b64(payload))` | Evasion of basic static analysis ||
36+
| Z85 / b85 | `base64.b85decode(...)` or `base64.z85decode(...)` | Non-standard encodings often indicate stealth ||
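The "Using aliases" row can be illustrated with a small `ast` sketch that first records simple assignments such as `b64 = base64.b64decode` and then reports calls made through the alias. This is a minimal sketch under stated assumptions, not the tool's actual logic: the function name `find_aliased_base64_calls` is hypothetical, scopes are ignored, and `from base64 import ... as ...` is not handled.

```python
import ast


def find_aliased_base64_calls(source: str) -> dict[str, list[int]]:
    """Map 'base64.<name>' to the lines where it is called through an alias."""
    tree = ast.parse(source)

    # Pass 1: collect aliases of the form `name = base64.<attr>`.
    aliases: dict[str, str] = {}
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Attribute)
            and isinstance(node.value.value, ast.Name)
            and node.value.value.id == "base64"
        ):
            aliases[node.targets[0].id] = f"base64.{node.value.attr}"

    # Pass 2: report calls whose callee is one of the recorded aliases.
    calls: dict[str, list[int]] = {}
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in aliases
        ):
            calls.setdefault(aliases[node.func.id], []).append(node.lineno)
    return calls
```

For the snippet in the table, the call `b64(payload)` would be attributed to `base64.b64decode` even though that name never appears at the call site.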
37+
38+
## Security Considerations
39+
40+
Base64 encoding does not provide confidentiality: it is a reversible transformation that requires no key. As noted in RFC 4648 (Section 12), care must also be taken when implementing base encoding and decoding to avoid introducing vulnerabilities.
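A short round trip shows why encoding is not confidentiality. The value of `secret` below is made up for illustration; anyone who sees `encoded` can recover it without a key.

```python
import base64

# Hypothetical sensitive value, used only for this illustration.
secret = b"api_key=12345"

# Encoding changes the representation, not the secrecy.
encoded = base64.b64encode(secret)

# Decoding needs no key or password.
recovered = base64.b64decode(encoded)
```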
1741

1842
Security considerations section from RFC 4648 (section 12):
1943

@@ -55,8 +79,10 @@ Security Considerations
5579
distribution.
5680
```
5781

58-
## More information
5982

60-
* https://docs.python.org/3/library/base64.html#base64-security
61-
* https://datatracker.ietf.org/doc/html/rfc4648.html#page-14
62-
* [Base64 Malleability in Practice](https://eprint.iacr.org/2022/361.pdf)
83+
## References
84+
85+
* [Python Documentation – base64](https://docs.python.org/3/library/base64.html)
86+
* [RFC 4648 – Security Considerations](https://datatracker.ietf.org/doc/html/rfc4648#section-12)
87+
* [Base64 Malleability in Practice](https://eprint.iacr.org/2022/361.pdf)
88+

docs/features.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
11
# Features
22

3+
**Python Code Audit** is a modern security-focused source code analysis tool for Python, built on a **zero-trust** mindset. It identifies security risks, hidden behaviours, and trust boundaries without ever executing the code. This makes it safe to use on both your own projects and third-party code.
34

4-
**Python Code Audit** is a modern Python **security** source code analysis tool built on a *zero-trust* mindset. It focuses on identifying security risks, hidden behaviors, and trust boundaries in Python code—without executing it.
55

66
:::{admonition} Key Features of Python Code Audit
77
:class: tip

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ name = "codeaudit"
77
dynamic = ["version"] # This tells Hatch that version is dynamically determined
88
description = 'A modern Python security source code analyzer (SAST) based on distrust.'
99
readme = "README.md"
10-
dependencies = ["fire==0.7.1","pandas==3.0.2","altair==6.0.0"]
10+
dependencies = ["fire==0.7.1","pandas>=2.3","altair==6.0.0"]
1111
requires-python = ">=3.11"
1212
license = "GPL-3.0-or-later"
1313
keywords = ["SAST", "Python SAST", "SAST API", "Complexity Checker"]

src/codeaudit/__about__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
11
# SPDX-FileCopyrightText: 2025-present Maikel Mardjan
22
#
33
# SPDX-License-Identifier: GPL-3.0-or-later
4-
__version__ = "1.6.5"
4+
__version__ = "1.6.6a0"

src/codeaudit/data/sastchecks.csv

Lines changed: 4 additions & 1 deletion
@@ -52,7 +52,10 @@ Subprocess Usage,subprocess.check_output,Medium,Requires careful input validatio
5252
Subprocess Usage,subprocess.getstatusoutput,Medium,Requires careful input validation to prevent command injection vulnerabilities.
5353
Subprocess Usage,subprocess.getoutput,Medium,Requires careful input validation to prevent command injection vulnerabilities.
5454
Tarfile Extraction,tarfile.TarFile,High,Vulnerable to path traversal attacks if used with untrusted archives.
55-
Base64 Encoding ,base64,Low,"Base64 encoding is not for security. It only visually hides data and provides no confidentiality. Often used to obfuscate malware in code."
55+
Base64 Decoding ,base64.b64decode,Medium,"Base64 encoding/decoding is not for security. It only visually hides data and provides no confidentiality. Often used to obfuscate malware in code."
56+
Base64 Decoding ,base64.z85decode,Medium,Base64 encoding/decoding is not for security. It only visually hides data and provides no confidentiality. Often used to obfuscate malware in code.
57+
Base64 Encoding ,base64.b85encode,Low,Base64 encoding/decoding is not for security. It only visually hides data and provides no confidentiality. Often used to obfuscate malware in code.
58+
Base64 Encoding ,base64.b64encode,Low,Base64 encoding/decoding is not for security. It only visually hides data and provides no confidentiality. Often used to obfuscate malware in code.
5659
XML-RPC Client,xmlrpc.client,High,Vulnerable to denial-of-service via decompression bombs.
5760
XML-RPC Server,xmlrpc.server.SimpleXMLRPCServer,High,Vulnerable to denial-of-service via decompression bombs.
5861
Cryptographically Unsafe Randomness,random.random,Low,The pseudo-random generators in this module are not suitable for security purposes.

tests/test_base64.py

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
# SPDX-FileCopyrightText: 2025-present Maikel Mardjan(https://nocomplexity.com/) and all contributors!
2+
# SPDX-License-Identifier: GPL-3.0-or-later
3+
4+
import pytest
5+
from pathlib import Path
6+
7+
from codeaudit.security_checks import perform_validations
8+
9+
10+
def test_base64_use():
11+
current_file_directory = Path(__file__).parent
12+
13+
# base64.py is in the "validationfiles" subfolder:
14+
validation_file_path = current_file_directory / "validationfiles" / "base64.py"
15+
16+
result = perform_validations(validation_file_path)
17+
18+
# actual_data = find_constructs(source, constructs)
19+
actual_data = result["result"]
20+
21+
# This is the expected dictionary
22+
expected_data = {
23+
"base64.b64encode": [9],
24+
"base64.b64decode": [12, 15],
25+
"base64.z85decode": [13],
26+
"exec": [16],
27+
"base64.b85encode": [13],
28+
}
29+
30+
# Assert that the actual data matches the expected data
31+
assert actual_data == expected_data

tests/test_standardlibconstructs.py

Lines changed: 0 additions & 20 deletions
@@ -42,26 +42,6 @@ def test_os_interfaces():
4242
assert actual_data == expected_data
4343

4444

45-
def test_base64encoding():
46-
current_file_directory = Path(__file__).parent
47-
48-
# validation1.py is in a subfolder:
49-
validation_file_path = current_file_directory / "validationfiles" / "base64.py"
50-
51-
source = read_in_source_file(validation_file_path)
52-
53-
constructs = {"base64"}
54-
actual_data = find_constructs(source, constructs)
55-
56-
# This is the expected dictionary
57-
expected_data = {
58-
"base64": [2, 3],
59-
}
60-
61-
# Assert that the actual data matches the expected data
62-
assert actual_data == expected_data
63-
64-
6545
def test_httpserver_usage():
6646
current_file_directory = Path(__file__).parent
6747

tests/validationfiles/base64.py

Lines changed: 19 additions & 3 deletions
@@ -1,4 +1,20 @@
1+
# SPDX-FileCopyrightText: 2026-present Maikel Mardjan(https://nocomplexity.com/) and all contributors!
2+
# SPDX-License-Identifier: GPL-3.0-or-later
3+
14
import base64
2-
encoded = base64.b64encode(b'data to be encoded')
3-
data = base64.b64decode(encoded)
4-
print(data)
5+
6+
payload = b"import os; os.system('malicious command')"
7+
8+
# Encoding (attacker side)
9+
encoded_b64 = base64.b64encode(payload)
10+
11+
# Decoding patterns (common in malware)
12+
decoded1 = base64.b64decode(encoded_b64) # Most common
13+
decoded2 = base64.z85decode(base64.b85encode(payload)) # Less common
14+
15+
b64 = base64.b64decode
16+
exec(b64(encoded_b64)) # alias use should be detected! (decodes the Base64 text, not the raw payload)
17+
18+
19+
print("b64 encoded :", encoded_b64)
20+
print("b64 decoded :", decoded1)
