Skip to content

feat: implement session-based tracking for malicious behavior#430

Open
Jazzcort wants to merge 1 commit into
rhel-lightspeed:mainfrom
Jazzcort:hardening-security-for-malicious-attemps
Open

feat: implement session-based tracking for malicious behavior#430
Jazzcort wants to merge 1 commit into
rhel-lightspeed:mainfrom
Jazzcort:hardening-security-for-malicious-attemps

Conversation

@Jazzcort
Copy link
Copy Markdown
Contributor

Adds BehaviorRecord and BehaviorRecordManager to monitor gatekeeper verdicts per session. If a session accumulates 10 total or 3 consecutive malicious actions, it is permanently flagged as malicious. Once flagged, all subsequent scripts executed in that session will be forced to require user confirmation, regardless of the script's default requirements.

@github-actions
Copy link
Copy Markdown

For team members: test commit 5439742 in internal GitLab

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 84.90566% with 8 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/linux_mcp_server/tools/run_script.py 84.90% 4 Missing and 4 partials ⚠️
Flag Coverage Δ
unittests 97.10% <84.90%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/linux_mcp_server/tools/run_script.py 93.72% <84.90%> (-2.60%) ⬇️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@owtaylor owtaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not sure this is going to be effective:

  • A malicious model can easily evade a "consecutive malicious scripts" checks - also a malicious script for readonly that modifies the system or installs things from the internet is currently often tagged as MODIFIES_SYSTEM or POLICY, so a malicious
  • 10 tries is a lot, by then the game might be up.

Maybe:

  • Any MALICIOUS script triggers confirmation of the next run_script (but not the whole session)

  • For the mcp-apps case, we display a boxed warning in that confirmation:

    Suspicious activity detected - your chat client may be under attack. Please examine previous tool calls in detail, and if you have any doubts, do not approve this command and terminate this chat session.

There's not much we can do about the non-mcp-apps case.

Comment on lines +163 to +164
A session is flagged as malicious if it accumulates 10+ total malicious actions
or MAX_CONSECUTIVE_MALICIOUS_ACTIONS consecutive malicious actions.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little odd to have one a constant and the other not

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! I'll have a MAX_TOTAL_MALICIOUS_ACTIONS for that. What would be the suitable count for it? 4 maybe? 🤔

"""

def __init__(self):
self._recent_record: deque[GatekeeperStatus] = deque(maxlen=MAX_CONSECUTIVE_MALICIOUS_ACTIONS)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deque seems unnecessary - can you just have _consecutive_malicious_action_acounts and reset it to zero if you get some other status.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! It can be simplified 😁

@Jazzcort Jazzcort force-pushed the hardening-security-for-malicious-attemps branch from 5439742 to 0951ca0 Compare May 21, 2026 19:28
@github-actions
Copy link
Copy Markdown

For team members: test commit 0951ca0 in internal GitLab

@Jazzcort Jazzcort marked this pull request as ready for review May 21, 2026 19:47
@Jazzcort Jazzcort requested a review from a team as a code owner May 21, 2026 19:47

@property
def is_previous_action_malicious(self) -> bool:
return self._previous_action_status == GatekeeperStatus.MALICIOUS
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should not be cleared except by an approval - that is:

validate_script => MALICIOUS - sets flag
validate_script => OK => flag stays set
execute_script => flag is cleared

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense! Will change!


@property
def is_security_compromised(self) -> bool:
return self._is_security_compromised
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the model is security compromised, then we have no business allowing the user to execute anything.

What if we unified the two flags into "malicious_activity_warning" . which gets set transiently on a single MALICIOUS and permanently passed the thresholds.

This differs from your setup in that we'd display the scary warning for a single malicious activity detection, but I think that's probably appropriate. An attacker only has to get through once.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great! I'm always bad at naming these. Will change!

@Jazzcort Jazzcort force-pushed the hardening-security-for-malicious-attemps branch from 0951ca0 to cf62e35 Compare May 25, 2026 15:33
@github-actions
Copy link
Copy Markdown

For team members: test commit cf62e35 in internal GitLab

Add BehaviorRecord and BehaviorRecordManager to track gatekeeper
verdicts per session. A single MALICIOUS verdict sets a temporary
warning that forces human confirmation on the next script; the warning
clears once the human approves. If a session accumulates 4 total or 3
consecutive malicious verdicts, it is permanently flagged and all
subsequent scripts require confirmation.

Integrate behavior tracking into validate_script, run_script,
run_script_interactive, run_script_with_confirmation, and
execute_script. Emit malicious_activity_warning through the
RunScriptInteractiveResult for the mcp-app UI.

Add documentation for session-based behavior tracking in
guarded-command-execution.md.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@Jazzcort Jazzcort force-pushed the hardening-security-for-malicious-attemps branch from cf62e35 to bf4a203 Compare May 26, 2026 13:23
@github-actions
Copy link
Copy Markdown

For team members: test commit bf4a203 in internal GitLab

@Jazzcort Jazzcort requested a review from owtaylor May 26, 2026 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants