Commit 54a5a36

lmeyerov and claude authored
docs(gfql): add GFQL specification documentation (#698)
* docs(gfql): add specification documentation

  - Add complete GFQL specification documentation
    - language.md: Core language specification with grammar and operations
    - wire_protocol.md: JSON serialization format for client-server communication
    - cypher_mapping.md: Cypher to GFQL translation with Python and wire protocol
    - python_embedding.md: Python-specific implementation details
    - index.md: Specification overview and navigation
  - Update main gfql/index.rst to include Developer Resources section with spec link
  - Add ai_code_notes/gfql/README.md with GFQL quick reference for AI assistants

  This establishes the documentation foundation for GFQL specifications, supporting both human developers and AI-assisted code generation.

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* docs(gfql): use headers for Core Concepts to enable TOC navigation

  - Convert Core Concepts from numbered list to headers (h4)
  - This allows each concept to appear in the table of contents
  - Makes it easier to navigate directly to specific concepts like Graph Model, Chains, Operations, etc.

* docs(gfql): remove incomplete JSON Schema from wire protocol spec

  The JSON Schema was incomplete (missing FilterDict and predicate definitions) and not used by the actual implementation. Removing it to avoid confusion.

* docs(gfql): remove Protocol Extensions section from wire protocol spec

  Remove speculative content about future extensions to keep documentation focused on current implementation.

* docs(gfql): remove incorrect Error Handling section from wire protocol

  The documented error response format does not match the implementation. The actual implementation uses HTTP status codes for remote errors and Python exceptions for local validation, not structured JSON error objects.

* docs(gfql): fix missing closing backticks in cypher_mapping.md

  Add missing closing triple backticks for JSON code block before Pattern Translations section to fix HTML rendering.

* docs(gfql): fix code block formatting in cypher_mapping.md

  Ensure all JSON code blocks have proper closing backticks to prevent markdown rendering issues.

* docs: Add GFQL specification documentation to changelog

  - Added entry for PR #698 in Dev section
  - Listed key documentation improvements and fixes

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
1 parent 3aef009 commit 54a5a36

8 files changed

Lines changed: 1420 additions & 0 deletions

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -9,6 +9,13 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm

 ### Docs
 * Update copyright year from 2024 to 2025 in documentation and LICENSE.txt
+* GFQL: Add comprehensive specification documentation (#698)
+  * Core language specification with formal grammar, operations, predicates, and type system
+  * Cypher to GFQL translation guide with Python and wire protocol examples
+  * Python embedding guide with pandas/cuDF integration details
+  * Wire protocol JSON format for client-server communication
+  * Fix terminology: clarify g._node (node ID column) vs g._nodes (DataFrame)
+  * Emphasize GFQL's declarative nature for graph-to-graph transformations

 ## [0.39.1 - 2025-07-07]


ai_code_notes/gfql/README.md

Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@
# GFQL AI Assistant Guide

Guide for AI assistants working with GFQL (Graph Frame Query Language) in PyGraphistry.

## 🎯 Quick Reference

### Essential GFQL Operations
```python
# Imports assumed for the snippets in this guide; `g` is an existing PyGraphistry graph
from graphistry import n, e, e_forward, e_reverse, e_undirected, gt

# Node matching
n()                              # All nodes
n({"type": "person"})            # Filter by property
n({"age": gt(30)})               # With predicate
n(name="result")                 # Named results

# Edge traversal
e_forward()                      # Forward direction
e_reverse()                      # Reverse direction
e()                              # Both directions (same as e_undirected())
e_forward(hops=2)                # Multi-hop
e_forward(to_fixed_point=True)   # All reachable

# Chaining
g.chain([n(), e_forward(), n()]) # Pattern matching
```
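
To make the node/edge/node pattern concrete, the following plain-Python sketch mimics what a three-step chain conceptually selects. It is only a mental model, not PyGraphistry's implementation: `matches`, `match_chain`, and the sample data are hypothetical, and real GFQL operates on dataframes and returns a graph rather than a list of pairs.

```python
# Conceptual model only: mimics g.chain([n(src_filter), e_forward(), n(dst_filter)])
# on plain Python data. `match_chain` is a hypothetical helper, not a GFQL API.

def matches(row, filter_dict):
    """True when every key/value pair in filter_dict equals the row's value."""
    return all(row.get(key) == value for key, value in filter_dict.items())

def match_chain(nodes, edges, src_filter, dst_filter):
    """Return the (src, dst) edges whose endpoints satisfy both node filters."""
    by_id = {node["id"]: node for node in nodes}
    return [
        (src, dst)
        for src, dst in edges
        if matches(by_id[src], src_filter) and matches(by_id[dst], dst_filter)
    ]

nodes = [
    {"id": "a", "type": "person"},
    {"id": "b", "type": "person"},
    {"id": "c", "type": "company"},
]
edges = [("a", "b"), ("a", "c"), ("c", "b")]

person_to_company = match_chain(nodes, edges, {"type": "person"}, {"type": "company"})
print(person_to_company)  # [('a', 'c')]
```

An empty filter matches every node, which is why `n()` with no arguments selects all nodes.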

### Key Predicates
- Comparison: `gt()`, `lt()`, `ge()`, `le()`, `eq()`, `ne()`
- Membership: `is_in([...])`
- Range: `between(lower, upper)`
- String: `contains()`, `startswith()`, `endswith()`
- Null: `is_null()`, `not_null()`
- Temporal: `is_month_start()`, `is_year_end()`, etc.
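
As a rough illustration of how a filter dictionary mixes literal values with predicates, the sketch below uses hypothetical stand-ins that borrow the names `gt`, `is_in`, and `between`. These closures are assumptions made for illustration; the real GFQL predicates are objects evaluated column-wise on dataframes, and the inclusive bounds of `between` here are a choice of this sketch, not a statement about GFQL.

```python
# Hypothetical stand-ins named after the predicates listed above; the real
# GFQL predicates are objects evaluated column-wise on dataframes.

def gt(threshold):
    return lambda value: value > threshold

def is_in(options):
    allowed = set(options)
    return lambda value: value in allowed

def between(lower, upper):
    # Inclusive on both ends in this sketch; check the spec for GFQL's bounds.
    return lambda value: lower <= value <= upper

def row_matches(row, filter_dict):
    """Apply callables as predicates and everything else as an equality test."""
    for key, expected in filter_dict.items():
        value = row.get(key)
        if callable(expected):
            if value is None or not expected(value):
                return False
        elif value != expected:
            return False
    return True

people = [
    {"name": "Alice", "age": 34, "city": "NYC"},
    {"name": "Bob", "age": 28, "city": "SF"},
    {"name": "Cara", "age": 41, "city": "LA"},
]

selected = [
    person["name"]
    for person in people
    if row_matches(person, {"age": gt(30), "city": is_in(["NYC", "SF"])})
]
print(selected)  # ['Alice']
```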

### Performance Tips
- Filter early in the chain
- Prefer explicit hop counts (`hops=N`) over `to_fixed_point=True`
- Prefer `filter_dict` over `query` strings
- Use the appropriate engine: `pandas` (CPU) or `cudf` (GPU)
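
The hop-count tip can be illustrated with a generic breadth-first search: a bounded search stops after a fixed depth, while a fixed-point search keeps expanding until nothing new is found. This is a plain-Python sketch under the assumption that cost grows with the number of nodes visited; `reachable` is a hypothetical helper, and the sketch does not claim to reproduce which intermediate nodes GFQL keeps for `hops=N`.

```python
from collections import deque

def reachable(edges, start, max_hops=None):
    """Nodes reachable from `start` along forward edges.

    max_hops=None keeps expanding until no new node appears (a fixed point).
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if max_hops is not None and depth == max_hops:
            continue  # hop budget spent on this path
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen - {start}

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(sorted(reachable(edges, "a", max_hops=2)))  # ['b', 'c']
print(sorted(reachable(edges, "a")))              # ['b', 'c', 'd', 'e']
```

On a long or densely connected graph the unbounded search may touch most of the graph, which is the cost the tip is warning about.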

## 📋 When to Use GFQL

### Use GFQL When
- Performing graph traversals or path queries
- Finding patterns in connected data
- Running efficient multi-hop operations
- Working with node/edge dataframes

### Use Pandas/Aggregations When
- Sorting results (`sort_values()`)
- Limiting results (`head()`, `tail()`)
- Aggregating results (`groupby()`, `count()`)
- Applying complex transformations
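
A minimal sketch of that post-processing step, assuming the query result exposes its node table as a pandas DataFrame (the changelog above refers to `g._nodes` as the DataFrame). The `nodes_df` table and its columns are invented for illustration.

```python
import pandas as pd

# Stand-in for the node table of a GFQL result (for example, `result._nodes`).
nodes_df = pd.DataFrame({
    "id": ["c1", "c2", "c3", "c4"],
    "type": ["customer", "customer", "customer", "vendor"],
    "spend": [120.0, 340.0, 75.0, 900.0],
})

# Sorting and limiting: top two customers by spend.
top_customers = (
    nodes_df[nodes_df["type"] == "customer"]
    .sort_values("spend", ascending=False)
    .head(2)
)
print(top_customers["id"].tolist())  # ['c2', 'c1']

# Aggregating: node count per type.
counts = nodes_df.groupby("type")["id"].count()
print(counts.to_dict())
```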

## 🚀 Common Patterns

### User 360 Query
```python
import pandas as pd

# Customer's recent activity
g.chain([
    n({"customer_id": "C123"}),
    e_forward({
        "type": is_in(["purchase", "view", "support"]),
        "timestamp": gt(pd.Timestamp.now() - pd.Timedelta(days=30))
    })
])
```

### Cyber Security Pattern
```python
# Lateral movement detection
g.chain([
    n({"status": "compromised"}),
    e_forward({"type": "login", "success": True}, hops=3),
    n({"criticality": "high"}, name="at_risk")
])
```

### Business Intelligence
```python
# Cross-sell opportunities
g.chain([
    n({"product_id": "P123"}),
    e_reverse({"type": "purchased"}),
    n({"type": "customer"}),
    e_forward({"type": "purchased"}),
    n({"product_id": ne("P123")}, name="also_bought")
])
```

## 🔧 Code Style Guidelines

### Preferred Style
```python
# ✅ Good - Clean, code-golfed chains
g.chain([n({"type": "user"}), e_forward({"active": True}), n()])

# ❌ Avoid - Overly verbose form of the same query
result = g.chain([
    n(filter_dict={"type": "user"}),
    e_forward(edge_match={"active": True}, hops=1),
    n(filter_dict={})
])
```

### Naming Conventions
- Use descriptive names for `name` parameters
- Keep filter keys consistent with dataframe columns
- Use snake_case for all identifiers
## 🐛 Common Errors and Fixes
111+
112+
### Schema Errors
113+
```python
114+
# ❌ Wrong - Column doesn't exist
115+
n({"username": "Alice"})
116+
117+
# ✅ Fix - Use correct column name
118+
n({"name": "Alice"})
119+
```
120+
121+
### Type Errors
122+
```python
123+
# ❌ Wrong - String predicate on number
124+
n({"age": contains("30")})
125+
126+
# ✅ Fix - Use numeric predicate
127+
n({"age": gt(30)})
128+
```
129+
130+
### Temporal Errors
131+
```python
132+
# ❌ Wrong - Raw string for datetime
133+
n({"created": gt("2024-01-01")})
134+
135+
# ✅ Fix - Use proper datetime
136+
n({"created": gt(pd.Timestamp("2024-01-01"))})
137+
```
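
The temporal fix above can be motivated with the standard library alone. The sketch below is an analogy, not a description of how GFQL or pandas compare values (pandas may coerce strings in some comparisons); it only shows why passing typed datetime values is the safer habit, and how a "recent" cutoff is computed. The helper `is_recent` is hypothetical.

```python
from datetime import datetime, timedelta

created = datetime(2024, 3, 15)

# Comparing a datetime with a raw string fails in plain Python.
try:
    created > "2024-01-01"
    comparison_failed = False
except TypeError:
    comparison_failed = True
print(comparison_failed)  # True

# Parse the string first, then compare like with like.
cutoff = datetime.fromisoformat("2024-01-01")
print(created > cutoff)  # True

# "Recent" windows: subtract a timedelta from a reference time.
def is_recent(timestamp, now, days):
    return timestamp > now - timedelta(days=days)

now = datetime(2024, 3, 20)
print(is_recent(created, now, days=7))  # True
print(is_recent(created, now, days=3))  # False
```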

## 📝 Natural Language to GFQL

### Translation Patterns
- "recent" → `gt(pd.Timestamp.now() - pd.Timedelta(days=N))`
- "between X and Y" → `between(X, Y)`
- "any of" → `is_in([...])`
- "connected to" → `e()` or `e_undirected()`
- "from X to Y" → X with `e_forward()` to Y
- "within N hops" → `hops=N`

### Example Translations

**NL**: "Find all employees who report to managers in NYC"
```python
g.chain([
    n({"type": "employee"}),
    e_forward({"type": "reports_to"}),
    n({"type": "manager", "office": "NYC"})
])
```

**NL**: "Show me high-value customers from last week"
```python
g.chain([
    n({"customer_tier": "high_value"}),
    e_forward({
        "type": "purchase",
        "date": gt(pd.Timestamp.now() - pd.Timedelta(days=7))
    })
])
```

## 🔄 Cypher to GFQL

### Basic Mappings
| Cypher | GFQL |
|--------|------|
| `(n)` | `n()` |
| `(n:Label)` | `n({"type": "Label"})` |
| `-[r]->` | `e_forward()` |
| `<-[r]-` | `e_reverse()` |
| `-[r*2]->` | `e_forward(hops=2)` |
| `WHERE n.prop = val` | `n({"prop": val})` |

### Unsupported in GFQL
- `OPTIONAL MATCH` - Handle nulls in post-processing
- `WITH` clauses - Use intermediate chains
- `ORDER BY/LIMIT` - Use pandas after
- `CREATE/DELETE` - GFQL is read-only
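
The simplest rows of the mapping table can be treated as a lookup. The sketch below is a toy under that assumption: `CYPHER_TO_GFQL` and `translate` are hypothetical names, the function works on pre-split pattern fragments rather than parsing Cypher, and anything outside the lookup is rejected instead of guessed.

```python
# Hypothetical lookup built from a few rows of the table above; not a Cypher parser.
CYPHER_TO_GFQL = {
    "(n)": "n()",
    "-[r]->": "e_forward()",
    "<-[r]-": "e_reverse()",
}

def translate(fragments):
    """Turn a list of Cypher pattern fragments into GFQL chain source text."""
    steps = []
    for fragment in fragments:
        if fragment not in CYPHER_TO_GFQL:
            raise ValueError(f"no GFQL mapping for {fragment!r}")
        steps.append(CYPHER_TO_GFQL[fragment])
    return "g.chain([" + ", ".join(steps) + "])"

print(translate(["(n)", "-[r]->", "(n)"]))  # g.chain([n(), e_forward(), n()])
```

Raising on unknown fragments mirrors the list of unsupported clauses: those are handled outside the chain rather than translated.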

## 🧪 Validation Checklist

Before generating GFQL:
1. ✓ Check column names exist in schema
2. ✓ Verify predicate types match column types
3. ✓ Ensure temporal values use proper types
4. ✓ Validate operation names (n, e_forward, etc.)
5. ✓ Check chain structure is valid
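
Parts of this checklist can be automated before a query is emitted. The following is a minimal sketch, assuming the schema is available as a mapping from column name to Python type and that filters hold literal values only (predicates would need their own type rules). `validate_step` and `VALID_OPS` are hypothetical names, not part of PyGraphistry.

```python
VALID_OPS = {"n", "e", "e_forward", "e_reverse", "e_undirected"}

def validate_step(op_name, filter_dict, schema):
    """Return a list of problems for one chain step; empty means it passed.

    schema maps column name -> Python type (a simplified stand-in for dtypes).
    """
    problems = []
    if op_name not in VALID_OPS:
        problems.append(f"unknown operation: {op_name}")
    for column, value in filter_dict.items():
        if column not in schema:
            problems.append(f"unknown column: {column}")
        elif not isinstance(value, schema[column]):
            problems.append(
                f"{column}: expected {schema[column].__name__}, got {type(value).__name__}"
            )
    return problems

schema = {"name": str, "age": int}
print(validate_step("n", {"name": "Alice"}, schema))      # []
print(validate_step("n", {"username": "Alice"}, schema))  # ['unknown column: username']
print(validate_step("n", {"age": "30"}, schema))          # ['age: expected int, got str']
print(validate_step("node", {}, schema))                  # ['unknown operation: node']
```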

## 📚 Additional Resources

- Full specifications in: `AI_PROGRESS/gfql_llm_specs/`
  - `gfql_language_spec.md` - Complete language specification
  - `gfql_wire_protocol_spec.md` - JSON wire format
  - `cypher_to_gfql_mapping_spec.md` - Cypher translation

## 🎯 Key Takeaways

1. **GFQL is functional**: Chain operations, don't mutate
2. **Filter early**: Put selective conditions first
3. **Think patterns**: Focus on graph patterns, not procedures
4. **Post-process**: Use pandas for sorting/aggregating
5. **Code golf**: Keep queries concise and elegant

docs/source/gfql/index.rst

Lines changed: 7 additions & 0 deletions
@@ -10,6 +10,7 @@ See also:

 .. toctree::
    :maxdepth: 1
+   :caption: User Guide

    about
    overview
@@ -21,3 +22,9 @@ See also:
    predicates/quick
    datetime_filtering
    wire_protocol_examples
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Developer Resources
+
+   spec/index
