add auto benchmark scripts #804

Open

LuoPengcheng12138 wants to merge 3 commits into main from feat/benchmark_scripts

Conversation

LuoPengcheng12138 (Contributor) commented Dec 24, 2025

Description

First, start the dev server:

npm run dev

Then run the benchmark script:

  python3 benchmark/benchmark_scripts.py \
    --chat-base http://localhost:5002 \
    --email example@example.com \
    --model-source manual \
    --model-platform anthropic \
    --model-type claude-haiku-4-5-20251001 \
    --api-key sk-example \
    --api-url https://api.anthropic.com/v1/ \
    --skip-history
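For context, a minimal sketch of how benchmark_scripts.py might declare these flags with argparse. The flag names are taken from the invocation above; the defaults, the `choices` list for --model-source, and the `build_parser` helper name are illustrative assumptions, not the PR's actual code:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flag names mirror the command above; defaults are illustrative guesses.
    parser = argparse.ArgumentParser(description="Run auto benchmarks against a chat server")
    parser.add_argument("--chat-base", default="http://localhost:5002",
                        help="Base URL of the running dev server")
    parser.add_argument("--email", required=True)
    parser.add_argument("--model-source", choices=["manual", "auto"], default="manual")
    parser.add_argument("--model-platform", default="anthropic")
    parser.add_argument("--model-type", default="claude-haiku-4-5-20251001")
    parser.add_argument("--api-key", required=True)
    parser.add_argument("--api-url", default="https://api.anthropic.com/v1/")
    parser.add_argument("--skip-history", action="store_true",
                        help="Do not replay prior conversation history")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.model_platform, args.model_type)
```

Note that argparse maps `--chat-base` to `args.chat_base` and `--skip-history` to a boolean that defaults to False unless the flag is passed.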

What is the purpose of this pull request?

  • Bug fix
  • New Feature
  • Documentation update
  • Other

@LuoPengcheng12138 LuoPengcheng12138 linked an issue Dec 24, 2025 that may be closed by this pull request
api_key = args.api_key
api_url = args.api_url

print(f"[info] model_platform={model_platform} model_type={model_type}")

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information (High, test)

This expression logs sensitive data (password) as clear text.

Copilot Autofix (AI, 5 months ago)

In general, the fix is to ensure that no sensitive data (API keys, passwords, tokens) are ever written to logs, even indirectly. If tainted data from a provider record flows into variables, only the non‑sensitive parts should be logged, and any sensitive fields must be excluded or replaced with redacted placeholders.

For this specific case, the problematic area is the relationship between pick_provider (which retrieves api_key) and the later logging at line 216. The current log statement logs only model_platform and model_type, which are conceptually safe. To make this unambiguous and robust against future changes, we should:

  • Keep logging only these non‑sensitive fields.
  • Add a short code comment documenting that api_key and other secrets must not be logged, to prevent future developers from “helpfully” adding them to the log.
  • Do not log api_key, token, password, or provider objects directly.

Thus, we can update line 216 to make the non‑sensitive intent explicit (e.g., clarifying the log message) and add a comment warning against logging secrets. This is a minimal change that preserves current functionality (we still see which model/platform is used) while making it clear that sensitive data must not appear in logs.

All required changes are confined to benchmark/benchmark_scripts.py. No new imports or helper methods are needed.
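Beyond clarifying the log message, a common belt-and-braces pattern is a small redaction helper so that any value which must appear in diagnostics is masked first. A sketch of that pattern; the `mask_secret` name is hypothetical and not part of this PR:

```python
def mask_secret(value: str, visible: int = 4) -> str:
    """Return a redacted form of a secret, keeping only a short prefix.

    Safe for logging: only the first `visible` characters survive,
    so e.g. 'sk-example' becomes 'sk-e***'.
    """
    if not value:
        return "***"
    return value[:visible] + "***"

# Prefer logging no secret at all; if a key must appear, mask it first.
print(f"[info] api_key={mask_secret('sk-example')}")
```

This keeps the "never log the raw key" rule enforceable at one choke point instead of relying on every future log statement to remember it.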

Suggested changeset 1
benchmark/benchmark_scripts.py

Autofix patch

Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/benchmark/benchmark_scripts.py b/benchmark/benchmark_scripts.py
--- a/benchmark/benchmark_scripts.py
+++ b/benchmark/benchmark_scripts.py
@@ -213,7 +213,8 @@
 		api_key = args.api_key
 		api_url = args.api_url
 
-	print(f"[info] model_platform={model_platform} model_type={model_type}")
+	# Log only non-sensitive model metadata; never log api_key, tokens, or passwords.
+	print(f"[info] Using model platform='{model_platform}' type='{model_type}'")
 	categories = [c.strip() for c in args.category.split(",") if c.strip()]
 	task_names = [t.strip() for t in args.task_name.split(",") if t.strip()]
 	tasks = load_tasks(args.task_file, categories=categories, task_names=task_names)
EOF


Development

Successfully merging this pull request may close these issues.

[Feature Request] add auto benchmark scripts for eigent
