Format: .md files with structured sections
Core: Document IS the graph
# Graph Title
Description (optional)
## Node: Title (ID: uuid)
Description (optional)
### Metadata
```json
{"uuid": "id", "title": "Title", "pos": [x,y], "size": [w,h]}
import module
from typing import Tuple
class HelperClass:
def process(self, data): return data
def helper_function(x): return x * 2
@node_entry
def function_name(param: type) -> return_type:
helper = HelperClass()
result = helper_function(param)
return result# Execution context: parent (QWidget), layout (QVBoxLayout), widgets (dict)
from PySide6.QtWidgets import QLabel, QLineEdit
layout.addWidget(QLabel('Text:', parent))
widgets['input'] = QLineEdit(parent)
layout.addWidget(widgets['input'])def get_values(widgets): return {}
def set_values(widgets, outputs): pass
def set_initial_state(widgets, state): pass{"requirements": ["package>=1.0"], "python": ">=3.8"}[{"uuid": "id", "name": "Name", "member_node_uuids": ["id1"]}][{"start_node_uuid": "id1", "start_pin_name": "output_1",
"end_node_uuid": "id2", "end_pin_name": "param_name"}]@node_entry decorator:
- REQUIRED on exactly one function per Logic block
- Entry point: Only decorated function called during execution
- Pin generation: Function signature parsed to create node pins automatically
- Runtime behavior: No-op decorator, returns function unchanged
- Parameters → input pins (names become pin names, type hints determine colors)
- Default values supported for optional parameters
- Return type → output pins
Pin generation:
param: str→ input pin "param" (type: str)param: str = "default"→ optional input pin with default-> str→ output pin "output_1"-> Tuple[str, int]→ pins "output_1", "output_2"-> None→ no output pins
Execution pins: Always present
exec_in(input),exec_out(output)
Pin colors:
- Execution pins: Fixed colors (exec_in: dark gray #A0A0A0, exec_out: light gray #E0E0E0)
- Data pins: Generated from type string using consistent hashing in HSV color space
- Same type always produces same color across all nodes (bright, distinguishable colors)
- Color generation: type string → hash → HSV values → RGB color
- Ensures visual consistency for type matching across the entire graph
Basic types: str, int, float, bool
Container types: list, dict, tuple, set
Generic types: List[str], List[Dict], List[Any], Dict[str, int], Dict[str, Any], Tuple[str, int], Tuple[float, ...]
Optional types: Optional[str], Optional[int], Union[str, int], Union[float, None]
Special types: Any (accepts any data), None (no data)
Complex nested: List[Dict[str, Any]], Dict[str, List[int]]
ML types: torch.Tensor, np.ndarray, pd.DataFrame (native object passing)
Metadata:
uuid: unique string identifiertitle: display name
Optional metadata:
pos: [x, y] positionsize: [width, height]colors: {"title": "#hex", "body": "#hex"}gui_state: widget values dictis_reroute: boolean (for reroute nodes)
Standard: ## Node: Human Title (ID: unique-id)
Sections: ## Dependencies, ## Groups, ## Connections
Widget storage: All interactive widgets MUST be in widgets dict
Data flow merging:
- GUI values from get_values() merged with connected pin values
- Connected pin values take precedence over GUI values for same parameter
- GUI values provide defaults or additional inputs not available through pins
- @node_entry function receives merged inputs
- Return values distributed to both output pins and set_values() for GUI display
State persistence: gui_state in metadata ↔ set_initial_state()
Data: "output_N" to parameter name
Execution: "exec_out" to "exec_in"
Required fields: uuid, name, member_node_uuids Optional fields: description, position, size, padding, is_expanded, colors
{
"uuid": "group-id",
"name": "Display Name",
"member_node_uuids": ["node1", "node2"],
"description": "Group description",
"position": {"x": 0, "y": 0},
"size": {"width": 200, "height": 150},
"padding": 20,
"is_expanded": true,
"colors": {
"background": {"r": 45, "g": 45, "b": 55, "a": 120},
"border": {"r": 100, "g": 150, "b": 200, "a": 180},
"title_bg": {"r": 60, "g": 60, "b": 70, "a": 200},
"title_text": {"r": 220, "g": 220, "b": 220, "a": 255},
"selection": {"r": 255, "g": 165, "b": 0, "a": 100}
}
}Required fields: requirements (array of pip-style package specs) Optional fields: optional, python, system, notes
{
"requirements": ["torch>=1.9.0", "numpy>=1.21.0"],
"optional": ["cuda-toolkit>=11.0"],
"python": ">=3.8",
"system": ["CUDA>=11.0"],
"notes": "Additional info"
}Package formats: package>=1.0, package==1.2.3, package~=1.0
Purpose: Connection waypoints for visual organization Characteristics:
"is_reroute": truein metadata- No Logic/GUI components needed
- Single input/output pins
- Small circular appearance
Batch Mode (Default):
- One-shot execution of entire graph in dependency order
- All nodes execute once per run, no state persistence
- Fresh interpreter state for each execution
- GUI buttons in nodes are inactive
- Suitable for data processing pipelines
Live Mode (Interactive):
- Event-driven execution triggered by GUI interactions
- Partial execution paths based on user events
- Maintains persistent state between runs
- GUI event handlers active in nodes
- ML objects (tensors, DataFrames) persist across executions
- Immediate feedback, no startup delays
Runtime Behavior Differences:
- Mode controlled at runtime, not stored in file
- Same graph can run in either mode without modification
- Live mode enables button clicks and widget interactions
- Batch mode optimized for throughput, Live mode for responsiveness
Single Process: All nodes execute in shared Python interpreter Native Objects: Direct references, zero-copy data transfer, no serialization overhead
ML Framework Integration:
- PyTorch: GPU tensors with device preservation, automatic CUDA cleanup, grad support
- NumPy: Direct ndarray references, dtype/shape preservation, memory views, broadcasting
- Pandas: DataFrame/Series objects, index preservation, method chaining, large dataset efficiency
- TensorFlow: tf.Tensor and tf.Variable support, session management, eager execution
- JAX: jax.numpy arrays, JIT compilation, device arrays, functional transformations
Zero-Copy Mechanisms:
- Object references stored in shared object_store dictionary
- Same object instance shared across all node references
- GPU tensors manipulated directly without device transfers
- Memory-mapped files for >RAM datasets
Auto-imports: numpy as np, pandas as pd, torch, tensorflow as tf, jax, jax.numpy as jnp GPU Memory Management: Automatic CUDA cache clearing, tensor cleanup, device synchronization
File Structure:
- Exactly one h1 (graph title)
- Node headers must follow:
## Node: Title (ID: uuid) - Required sections: Connections (must be present)
- Optional sections: Dependencies, Groups
Node Requirements:
- Each node needs unique uuid
- Required components: Metadata, Logic
- One @node_entry function per Logic block
- Logic blocks can contain imports, classes, helper functions, and module-level code
- Only @node_entry function is called as entry point
- Valid JSON in all metadata/groups/connections/dependencies blocks
- Node UUIDs must be valid identifiers
GUI Rules (when present):
- GUI Definition must create valid PySide6 widgets
- All interactive widgets MUST be stored in
widgetsdict - get_values() must return dict
- set_values() and set_initial_state() should handle missing keys gracefully
- Widget names in get_values() must match GUI Definition keys
Groups Rules (when present):
- Group UUIDs must be unique across all groups (not just unique within groups array)
- member_node_uuids must reference existing node UUIDs (validated against nodes array)
- Colors must use RGBA format: {"r": 0-255, "g": 0-255, "b": 0-255, "a": 0-255}
- Groups support transparency and visual layering (alpha channel)
- Groups maintain undo/redo history for property changes
- member_node_uuids determines group membership (nodes move when group moves)
Connection Rules:
- start_node_uuid and end_node_uuid must reference existing node UUIDs
- Pin names must exactly match: function parameters for inputs, "output_N" for outputs
- Execution connections: "exec_out" (source) to "exec_in" (destination)
- Data connections: "output_1", "output_2", etc. to parameter names from @node_entry function
- Connection validation: pin names parsed from actual function signatures
- Invalid connections: mismatched types, non-existent pins, circular exec dependencies
Basic Node:
## Node: Process Data (ID: processor)
### Metadata
```json
{"uuid": "processor", "title": "Process Data", "pos": [100, 100], "size": [200, 150]}@node_entry
def process(input_text: str) -> str:
return input_text.upper()GUI Node:
## Node: Input Form (ID: form)
### Metadata
```json
{"uuid": "form", "title": "Input Form", "pos": [0, 0], "size": [250, 200],
"gui_state": {"text": "default", "count": 5}}@node_entry
def get_input(text: str, count: int) -> str:
return text * countfrom PySide6.QtWidgets import QLabel, QLineEdit, QSpinBox
layout.addWidget(QLabel('Text:', parent))
widgets['text'] = QLineEdit(parent)
layout.addWidget(widgets['text'])
widgets['count'] = QSpinBox(parent)
layout.addWidget(widgets['count'])def get_values(widgets):
return {'text': widgets['text'].text(), 'count': widgets['count'].value()}
def set_values(widgets, outputs):
pass
def set_initial_state(widgets, state):
widgets['text'].setText(state.get('text', ''))
widgets['count'].setValue(state.get('count', 1))Connection Examples:
[
{"start_node_uuid": "input", "start_pin_name": "exec_out",
"end_node_uuid": "processor", "end_pin_name": "exec_in"},
{"start_node_uuid": "input", "start_pin_name": "output_1",
"end_node_uuid": "processor", "end_pin_name": "input_text"}
]Tokenization: Use markdown-it-py, parse tokens (not HTML) State machine: Track current node/component Section detection: h1=title, h2=node/section, h3=component Data extraction: fence blocks by language tag Pin generation: Parse @node_entry function signature
Environment Errors:
- Virtual environment not found or Python executable missing
- Package import failures or dependency conflicts
Execution Errors:
- Syntax errors in node code, runtime exceptions, type mismatches
- Missing required inputs or invalid function signatures
Flow Control Errors:
- No entry point nodes found (no nodes without incoming exec connections)
- Infinite loops detected (execution count limit exceeded)
- Circular dependencies in execution graph
Memory Management Errors:
- Out of memory conditions with large objects (>RAM datasets)
- GPU memory exhaustion (CUDA tensors), uncleaned GPU cache
- Memory leaks from uncleaned object references
Error Format: ERROR in node 'Name': description
Limits: Max execution count prevents infinite loops, timeout protection, memory monitoring
Directory Structure:
PyFlowGraph/
├── venv/ # Main application environment
└── venvs/ # Project-specific environments
├── project1/ # Environment for project1 graph
├── project2/ # Environment for project2 graph
├── default/ # Shared default environment
└── ...
Isolation: Each graph can have its own Python environment with isolated packages
Dependency Management: Per-graph package versions prevent conflicts between graphs
Execution Context: All nodes run in single persistent Python interpreter
Package Availability: Virtual environment packages automatically available in shared namespace
Environment Selection: Configurable through application's environment manager
Benefits: Zero-copy object passing + isolated dependencies + no startup delays
Bidirectional: Lossless conversion between .md ↔ .json formats Use cases: .md for human editing, .json for programmatic processing Equivalence: Both formats represent identical graph information
Quantitative Benchmarks:
- PyTorch 100MB tensor: 5000x faster (0.1ms vs 500ms serialization)
- NumPy 50MB array: 4000x faster (0.05ms vs 200ms list conversion)
- Pandas 10MB DataFrame: 7500x faster (0.02ms vs 150ms dict conversion)
- TensorFlow 100MB tensor: 4000x faster (0.1ms vs 400ms serialization)
Memory Efficiency:
- Zero-copy between nodes (same object instance shared)
- Memory usage scales linearly with unique objects, not references
- Direct memory access for >RAM datasets via memory-mapped files
- Automatic cleanup when references cleared
GPU Performance:
- Direct CUDA tensor manipulation without device transfers
- GPU memory automatically freed for CUDA tensors
- torch.cuda.empty_cache() and synchronize() called automatically
Scalability: O(1) object passing time regardless of data size