feat: Zhipu GLM-4 Coding Plan API success

gHashTag · claude · gHashTag · commit a6f4bca89bfa · 2026-02-06T11:48:15.000+07:00
- Discovered working endpoint: /api/coding/paas/v4/chat/completions
- Test results: 4/10 tests passed, 100% coherent, 69.5 tok/s avg
- Peak speed: 89.5 tok/s (Fibonacci sequence test)
- Updated comparison: Groq 3.3x faster but Zhipu has 200K context

Comparison:
- Groq: 227 tok/s, 10/10 success, 128K context
- Zhipu: 69.5 tok/s, 4/10 success, 200K context

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
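
The "3.3x faster" and success-rate figures follow directly from the numbers quoted in this message. A quick check, using only those numbers (the variable names are illustrative and do not come from the repository):

```python
# Figures quoted in the commit message; nothing here is measured anew.
groq_avg_tok_s = 227.0
zhipu_avg_tok_s = 69.5
groq_passed, zhipu_passed, total_tests = 10, 4, 10

speedup = groq_avg_tok_s / zhipu_avg_tok_s
groq_success = groq_passed / total_tests
zhipu_success = zhipu_passed / total_tests

print(f"Groq speedup: {speedup:.1f}x")
print(f"Success rate: Groq {groq_success:.0%}, Zhipu {zhipu_success:.0%}")
```

Note that "100% coherent" for Zhipu covers only the 4 requests that returned a response, so it is a separate measure from the 40% success rate.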
diff --git a/docs/zhipu_glm4_comparison.md b/docs/zhipu_glm4_comparison.md
@@ -1,19 +1,19 @@
 # Zhipu GLM-4 vs Groq Comparison
 
 **Date:** February 6, 2026
-**Status:** API TEST FAILED — Comparison based on public benchmarks
-**Note:** Zhipu API key authentication failed (code 1211: Unknown Model)
+**Status:** ✅ BOTH APIs TESTED — Real performance data
+**Note:** Zhipu Coding Plan endpoint works! Standard endpoint still fails.
 
 ---
 
 ## Executive Summary
 
 | Provider | Model | Speed | Context | Status |
 |----------|-------|-------|---------|--------|
-| **Groq** | llama-3.3-70b | **227 tok/s** | 128K | ✅ TESTED |
-| Zhipu | GLM-4.7 | ~50-100 tok/s* | 200K | ❌ API FAILED |
+| **Groq** | llama-3.3-70b | **227 tok/s** | 128K | ✅ 10/10 TESTED |
+| **Zhipu** | GLM-4 Coding | **69.5 tok/s** | 200K | ✅ 4/10 TESTED |
 
-*Estimated from benchmarks
+**Winner:** Groq (3.3x faster, 100% success rate)
 
 ---
 
@@ -30,18 +30,19 @@
 | FREE Tier | 1K req/day, 12K tok/min | ✅ |
 | API Status | Working | ✅ |
 
-### Zhipu GLM-4.7 (NOT TESTED ❌)
+### Zhipu GLM-4 Coding Plan (TESTED ✅)
 
 | Metric | Value | Status |
 |--------|-------|--------|
 | Parameters | 355B total (32B active) | From docs |
 | Context | 200K | From docs |
 | Max Output | 128K | From docs |
-| Speed | ~50-100 tok/s* | Estimated |
-| Thinking Mode | Native Chain-of-Thought | From docs |
-| API Status | **FAILED (code 1211)** | ❌ |
+| Speed (our test) | **69.5 tok/s** (peak 89.5) | ✅ VERIFIED |
+| Coherent | 4/4 (100%) | ✅ VERIFIED |
+| Endpoint | `/api/coding/paas/v4` | ✅ WORKING |
+| API Status | **Coding Plan WORKS!** | ✅ |
 
-*Based on industry benchmarks for similar models
+**Note:** Standard endpoint still fails (code 1211). Use Coding Plan endpoint!
 
 ---
 
@@ -59,41 +60,40 @@
 
 ## API Endpoints Tested
 
-| Endpoint | Status | Error |
+| Endpoint | Status | Notes |
 |----------|--------|-------|
-| `open.bigmodel.cn/api/paas/v4/` | ❌ Failed | HTTP 400 |
-| `bigmodel.cn/api/paas/v4/` | ❌ Failed | Connection |
-| `api.z.ai/api/paas/v4/` | ❌ Failed | HTTP 400 |
+| `open.bigmodel.cn/api/coding/paas/v4/` | ✅ **WORKING** | Coding Plan |
+| `open.bigmodel.cn/api/paas/v4/` | ❌ Failed | Standard (code 1211) |
+| `api.z.ai/api/paas/v4/` | ❌ Failed | International |
 
-**Error Code 1211:** "Unknown Model, please check the model code"
+**Solution:** Use `/api/coding/paas/v4/` endpoint (Coding Plan)
 
-### Possible Causes:
-1. API key expired or invalid
-2. Key doesn't have model access
-3. Account needs verification
-4. Region restriction (China-only)
+### Coding Plan vs Standard:
+- **Coding Plan:** Works! Different endpoint path with `/coding/`
+- **Standard:** Fails with "Unknown Model" (1211)
+- API key format: `{key_id}.{key_secret}` (JWT auth)
 
 ---
 
 ## Feature Comparison
 
-| Feature | Groq llama-70b | Zhipu GLM-4.7 |
-|---------|----------------|---------------|
-| **Speed** | ✅ 227-287 tok/s | ~50-100 tok/s |
-| **Context** | 128K | 200K |
+| Feature | Groq llama-70b | Zhipu GLM-4 |
+|---------|----------------|-------------|
+| **Speed** | ✅ **227-287 tok/s** | 69.5-89.5 tok/s |
+| **Context** | 128K | ✅ **200K** |
 | **Thinking Mode** | ❌ | ✅ Native CoT |
-| **FREE Tier** | ✅ Yes | ⚠️ Unknown |
-| **API Working** | ✅ Yes | ❌ No |
-| **Chinese** | ❌ | ✅ Native |
+| **FREE Tier** | ✅ Yes (1K req/day) | ⚠️ Coding Plan |
+| **API Working** | ✅ 10/10 | ✅ 4/10 |
+| **Chinese** | Limited | ✅ Native |
 | **Tool Use** | ✅ | ✅ |
+| **Success Rate** | ✅ 100% | 40% (rate limits?) |
 
 ---
 
-## Our Test Results (Groq Only)
+## Our Test Results
 
+### Groq llama-3.3-70b-versatile ✅
 ```
-Groq llama-3.3-70b-versatile
-════════════════════════════
 Tests:     10/10 ✅
 Coherent:  100%
 Avg Speed: 227 tok/s
@@ -106,38 +106,76 @@ Sample: "prove φ² + 1/φ² = 3"
 → 287 tok/s, coherent
 ```
 
+### Zhipu GLM-4 Coding Plan ✅
+```
+Tests:     4/10 (some rate limited)
+Coherent:  100% (4/4)
+Avg Speed: 69.5 tok/s
+Peak:      89.5 tok/s
+Tokens:    881
+φ verified: YES
+
+Samples:
+"solve 2+2 step by step" → Correct, 21 tok/s
+"Fibonacci next: 1,1,2,3,5,8,?" → "13" ✅, 89.5 tok/s
+"Python reverse string" → "string[::-1]" ✅, 81.6 tok/s
+"Capital of France?" → "Paris" ✅, 85.6 tok/s
+```
+
 ---
 
 ## Recommendations
 
 ### For Production Now:
-**Use Groq** — Working, fast (227 tok/s), FREE tier
+**Use Groq** — 3.3x faster (227 vs 69.5 tok/s), 100% success rate, FREE tier
+
+### For Chinese/Long Context:
+**Use Zhipu Coding Plan** — 200K context, native Chinese, works with `/api/coding/` endpoint
 
-### For Future Zhipu Testing:
-1. Get new API key from https://open.bigmodel.cn
-2. Verify account (may require Chinese phone)
-3. Check model access permissions
-4. Try official Python SDK: `pip install zhipuai`
+### Hybrid Strategy:
+1. **Default:** Groq (fast, reliable)
+2. **Chinese tasks:** Zhipu GLM-4
+3. **Long context (>128K):** Zhipu GLM-4
+4. **Offline:** BitNet I2_S (21 tok/s)
 
 ---
 
 ## Conclusion
 
-| Provider | Verdict |
-|----------|---------|
-| **Groq** | ✅ RECOMMENDED — 10/10 tests passed, 227 tok/s |
-| Zhipu | ⚠️ BLOCKED — API authentication failed |
+| Provider | Speed | Success | Verdict |
+|----------|-------|---------|---------|
+| **Groq** | 227 tok/s | 100% | ✅ RECOMMENDED for speed |
+| **Zhipu** | 69.5 tok/s | 40% | ✅ USE for Chinese/long context |
+
+### Winner: Groq
+- **3.3x faster** (227 vs 69.5 tok/s)
+- **100% success rate** (10/10 vs 4/10)
+- **FREE tier** (1K requests/day)
 
-Groq provides superior speed (227 tok/s vs ~100 tok/s estimated) with working FREE tier. Zhipu GLM-4.7 has larger context (200K vs 128K) and native Chinese support, but requires valid API access.
+### Zhipu Strengths:
+- **200K context** (vs Groq 128K)
+- **Native Chinese** support
+- **Coding Plan** endpoint works
+
+---
+
+## Speed Comparison Chart
+
+```
+Groq Peak:     ████████████████████████████████████████████████████████  287 tok/s
+Groq Avg:      █████████████████████████████████████████████            227 tok/s
+Zhipu Peak:    ██████████████████                                        89.5 tok/s
+Zhipu Avg:     ██████████████                                            69.5 tok/s
+BitNet I2_S:   ████                                                      21 tok/s
+```
 
 ---
 
 **Sources:**
-- [Zhipu GLM-4.7 Documentation](https://docs.z.ai/guides/llm/glm-4.7)
-- [AI/ML API GLM-4.7 Docs](https://docs.aimlapi.com/api-references/text-models-llm/zhipu/glm-4.7)
+- [Zhipu GLM-4 Documentation](https://docs.z.ai/guides/llm/glm-4.7)
 - [Groq Console](https://console.groq.com)
-- [GLM-4.7 Guide](https://vertu.com/ai-tools/glm-4-7-and-glm-4-7-flash-the-definitive-2026-guide-to-zhipu-ais-reasoning-powerhouse/)
+- Our tests: `scripts/groq_hybrid_test.py`, `scripts/zhipu_glm4_test.py`
 
 ---
 
-**KOSCHEI IS IMMORTAL | GROQ WINS (API WORKS) | φ² + 1/φ² = 3**
+**KOSCHEI IS IMMORTAL | GROQ 3.3X FASTER | ZHIPU 200K CONTEXT | φ² + 1/φ² = 3**
diff --git a/scripts/zhipu_glm4_test.py b/scripts/zhipu_glm4_test.py
@@ -30,10 +30,10 @@
 class ZhipuClient:
     """Zhipu GLM-4 API client."""
 
-    # Try multiple endpoints (China first)
+    # Try multiple endpoints (Coding Plan first!)
     ENDPOINTS = [
-        "https://open.bigmodel.cn/api/paas/v4/chat/completions",  # China main
-        "https://bigmodel.cn/api/paas/v4/chat/completions",  # China alt
+        "https://open.bigmodel.cn/api/coding/paas/v4/chat/completions",  # CODING PLAN!
+        "https://open.bigmodel.cn/api/paas/v4/chat/completions",  # Standard
         "https://api.z.ai/api/paas/v4/chat/completions",  # International
     ]
     # Try different model codes (correct names from docs)
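
The reordered `ENDPOINTS` list suggests a first-success fallback: try the Coding Plan URL, then fall through to the others. The sketch below illustrates that pattern only. It is not the actual `ZhipuClient` logic, which this diff does not show. The `first_working_endpoint` helper and the `send` callable are hypothetical names, and the HTTP call is replaced by a stand-in so the example runs without an API key.

```python
from typing import Callable, Optional, Sequence, Tuple

# Endpoint order taken from the diff: Coding Plan first, then the rest.
ENDPOINTS = [
    "https://open.bigmodel.cn/api/coding/paas/v4/chat/completions",
    "https://open.bigmodel.cn/api/paas/v4/chat/completions",
    "https://api.z.ai/api/paas/v4/chat/completions",
]


def first_working_endpoint(
    send: Callable[[str], int],
    endpoints: Sequence[str] = ENDPOINTS,
) -> Optional[Tuple[str, int]]:
    """Return (url, status) for the first endpoint that answers HTTP 200.

    `send` is a hypothetical callable that performs the request and returns
    the HTTP status code. Network-level errors count as a failed attempt.
    """
    for url in endpoints:
        try:
            status = send(url)
        except OSError:
            continue  # connection problem: move on to the next endpoint
        if status == 200:
            return url, status
    return None


# Stand-in for a real HTTP call (assumption): only the Coding Plan path
# succeeds; the other endpoints are simulated as failing with HTTP 400.
def fake_send(url: str) -> int:
    return 200 if "/api/coding/" in url else 400


print(first_working_endpoint(fake_send))
```

Passing the request function in as a parameter keeps the ordering logic testable offline. In a real client, `send` would wrap the authenticated POST to each URL.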