title: Anomaly Detection Report
generated: 2026-03-11
model_version: v3.3.2
dataset: AxonFramework-5.0.2
authors: JohT/code-graph-analysis-pipeline

📊 Anomaly Detection Report

1. Executive Overview

This report analyzes structural and dependency anomalies across multiple abstraction levels of the codebase. The goal is to detect potential software quality, design, and architecture issues using graph-based features, anomaly detection (Isolation Forest), and SHAP explainability.

📚 Table of Contents

  1. Executive Overview
  2. Deep Dives by Abstraction Level
  3. Plot Interpretation Guide
  4. Taxonomy of Anomaly Archetypes
  5. Recommendations
  6. Appendix

1.1 Anomalies in total

| Analyzed Units | Anomalies | Authorities | Bottlenecks | Bridges | Hubs | Outliers |
| --- | --- | --- | --- | --- | --- | --- |
| 1343 | 66 | 24 | 22 | 16 | 12 | 9 |

1.2 Overview of Analyzed Structures

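| Abstraction Level | Units | Anomalies | Authorities | Bottlenecks | Bridges | Hubs | Outliers |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Type,Java,Class | 777 | 32 | 7 | 2 | 8 | 2 | 6 |
| Type,Java,Interface | 253 | 21 | 2 | 6 | 0 | 7 | 1 |
| Package,Java | 146 | 6 | 10 | 10 | 6 | 0 | 1 |
| Type,Java,Record | 56 | 3 | 1 | 2 | 1 | 0 | 0 |
| Type,Java,Class,Throwable | 42 | 2 | 0 | 0 | 1 | 0 | 1 |
| Type,Java,Annotation | 41 | 2 | 0 | 0 | 0 | 0 | 0 |
| Type,Java,Enum | 17 | 0 | 0 | 0 | 0 | 1 | 0 |
| Artifact,Jar,Archive,Zip,Java | 11 | 0 | 4 | 2 | 0 | 2 | 0 |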
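
The totals in section 1.1 can be cross-checked against the per-level rows above. The following is a minimal sketch of that check; the counts are copied from the table, while the names `LEVELS`, `REPORTED_TOTALS`, and `column_totals` are illustrative and not part of the pipeline.

```python
# Per-level counts copied from the "Overview of Analyzed Structures" table.
# Column order: units, anomalies, authorities, bottlenecks, bridges, hubs, outliers
LEVELS = {
    "Type,Java,Class": (777, 32, 7, 2, 8, 2, 6),
    "Type,Java,Interface": (253, 21, 2, 6, 0, 7, 1),
    "Package,Java": (146, 6, 10, 10, 6, 0, 1),
    "Type,Java,Record": (56, 3, 1, 2, 1, 0, 0),
    "Type,Java,Class,Throwable": (42, 2, 0, 0, 1, 0, 1),
    "Type,Java,Annotation": (41, 2, 0, 0, 0, 0, 0),
    "Type,Java,Enum": (17, 0, 0, 0, 0, 1, 0),
    "Artifact,Jar,Archive,Zip,Java": (11, 0, 4, 2, 0, 2, 0),
}

# Totals as stated in section 1.1 "Anomalies in total".
REPORTED_TOTALS = (1343, 66, 24, 22, 16, 12, 9)


def column_totals(levels):
    """Sum every column across all abstraction levels."""
    return tuple(sum(column) for column in zip(*levels.values()))


totals = column_totals(LEVELS)
print("totals match:", totals == REPORTED_TOTALS)
```

A mismatch here would point to a level that was counted twice or left out of the summary.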

1.3 Overview Charts

Treemap Charts

JavaTreemap1AverageAnomalyScorePerDirectory

JavaTreemap2ArchetypesOverviewPerDirectory

JavaTreemap3ArchetypeAuthorityPerDirectory

JavaTreemap4ArchetypeBottleneckPerDirectory

JavaTreemap5ArchetypeBridgePerDirectory

JavaTreemap6ArchetypeHubPerDirectory

JavaTreemap7ArchetypeOutlierPerDirectory


2. Deep Dives by Abstraction Level

Each abstraction level includes anomaly statistics, SHAP feature importance, archetype distribution, and example anomalies.

2.1 Java Artifact

Anomaly Results

Total anomalies
| Anomalies | Authorities | Bottlenecks | Bridges | Hubs | Outliers |
| --- | --- | --- | --- | --- | --- |
| 0 | 4 | 2 | 0 | 2 | 0 |
Top global contributing features (via SHAP)

⚠️ No anomaly detection and SHAP data available for this level (model skipped or insufficient samples).

Archetype Distribution

| Archetype | Count | Max. Score | Model Status | Examples |
| --- | --- | --- | --- | --- |
| Authority | 4 | null | Undetermined | /axon-common-5.0.2.jar, /axon-spring-boot-autoconfigure-5.0.2.jar, /axon-metrics-micrometer-5.0.2.jar |
| Bottleneck | 2 | null | Undetermined | /axon-conversion-5.0.2.jar, /axon-eventsourcing-5.0.2.jar |
| Hub | 2 | null | Undetermined | /axon-common-5.0.2.jar, /axon-messaging-5.0.2.jar |

Top anomalies with their local contributing features (via SHAP)

⚠️ No anomaly detection and SHAP data available for this level (model skipped or insufficient samples).

Visualizations

See the Plot Interpretation Guide for details on how to read the plots.

⚠️ No anomaly detection and SHAP data available for this level (model skipped or insufficient samples).

Graph Visualizations

TopHub Graph Visualizations

TopHub 1

TopHub 2


TopBottleneck Graph Visualizations

TopBottleneck 1

TopBottleneck 2


TopAuthority Graph Visualizations

TopAuthority 1

TopAuthority 2

TopAuthority 3

TopAuthority 4

--

2.2 Java Package

Anomaly Results

Total anomalies
| Anomalies | Authorities | Bottlenecks | Bridges | Hubs | Outliers |
| --- | --- | --- | --- | --- | --- |
| 6 | 10 | 10 | 6 | 0 | 1 |
Top global contributing features (via SHAP)
| Feature | Mean absolute SHAP value |
| --- | --- |
| Node embeddings aggregated | 0.021162 |
| pageRank | 0.020607 |
| articleRank | 0.018014 |
| pageToArticleRankDifference | 0.017204 |
| incomingDependencies | 0.012914 |
| localClusteringCoefficient | 0.007655 |
| degree | 0.005704 |
| betweenness | 0.005452 |
| nodeEmbeddingPCA_17 | 0.004519 |
| nodeEmbeddingPCA_13 | 0.002340 |
| nodeEmbeddingPCA_12 | 0.001808 |
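
A global ranking like the one above is typically obtained by averaging the absolute per-unit SHAP values for each feature. The sketch below illustrates that aggregation with plain Python; the input rows are hypothetical numbers, not values from this report, and `mean_abs_shap` is an illustrative helper rather than a pipeline function.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean of their absolute SHAP values across all rows."""
    sums = [0.0] * len(feature_names)
    for row in shap_rows:
        for index, value in enumerate(row):
            sums[index] += abs(value)
    ranking = [(name, total / len(shap_rows)) for name, total in zip(feature_names, sums)]
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking


# Hypothetical per-unit SHAP values (one row per code unit, one column per feature).
features = ["pageRank", "articleRank", "degree"]
shap_rows = [
    [-0.20, -0.10, 0.01],
    [0.10, -0.05, -0.02],
    [-0.30, 0.15, 0.00],
]

ranking = mean_abs_shap(shap_rows, features)
for name, importance in ranking:
    print(f"{name}: {importance:.4f}")
```

Because the absolute value is taken before averaging, the ranking reflects how strongly a feature moves the score, not in which direction.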

Archetype Distribution

| Archetype | Count | Max. Score | Model Status | Examples |
| --- | --- | --- | --- | --- |
| | 6 | 0.026 | Anomalous | org.axonframework.common.annotation, org.axonframework.common.configuration, org.axonframework.messaging.core |
| Authority | 1 | 0.0063 | Anomalous | org.axonframework.common |
| Bottleneck | 3 | 0.0144 | Anomalous | org.axonframework.messaging.core, org.axonframework.messaging.core.unitofwork, org.axonframework.messaging.core.annotation |
| Bridge | 6 | 0.026 | Anomalous | org.axonframework.common.annotation, org.axonframework.common.configuration, org.axonframework.messaging.core |
| | 110 | -0.0006 | Typical | org.axonframework.messaging.eventhandling, org.axonframework.common.io, org.axonframework.common.util |
| Authority | 9 | -0.005 | Typical | org.axonframework.common.io, org.axonframework.common.function, org.axonframework.extension.springboot.autoconfig |
| Bottleneck | 7 | -0.0213 | Typical | org.axonframework.axonserver.connector, org.axonframework.axonserver.connector.event, org.axonframework.messaging.core.conversion |
| Outlier | 1 | -0.0592 | Typical | org.axonframework.messaging.core.unitofwork.transaction |

Top anomalies with their local contributing features (via SHAP)

| Name | Contained in | Anomaly Score | Archetypes | Top Feature 1 | Top Feature 1 SHAP | Top Feature 2 | Top Feature 2 SHAP | Top Feature 3 | Top Feature 3 SHAP | Model Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| org.axonframework.common.annotation | axon-common-5.0.2 | 0.026 | Bridge | pageRank | -0.1697 | articleRank | -0.1647 | pageToArticleRankDifference | -0.1553 | Anomalous |
| org.axonframework.common.configuration | axon-common-5.0.2 | 0.0252 | Bridge | pageRank | -0.1746 | articleRank | -0.1502 | pageToArticleRankDifference | -0.1496 | Anomalous |
| org.axonframework.messaging.core | axon-messaging-5.0.2 | 0.0144 | Bottleneck, Bridge | pageRank | -0.1873 | articleRank | -0.1755 | pageToArticleRankDifference | -0.1501 | Anomalous |
| org.axonframework.messaging.core.unitofwork | axon-messaging-5.0.2 | 0.0111 | Bottleneck, Bridge | pageRank | -0.1732 | articleRank | -0.1716 | pageToArticleRankDifference | -0.1424 | Anomalous |
| org.axonframework.common | axon-common-5.0.2 | 0.0063 | Authority, Bridge | pageRank | -0.2001 | articleRank | -0.186 | pageToArticleRankDifference | -0.1751 | Anomalous |
| org.axonframework.messaging.core.annotation | axon-messaging-5.0.2 | 0.0018 | Bottleneck, Bridge | pageRank | -0.1212 | betweenness | -0.1182 | articleRank | -0.0996 | Anomalous |
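
The "Top Feature 1 to 3" columns can be read as the three features with the largest absolute SHAP value for a single unit. A minimal sketch of that selection follows. The first three values are taken from the `org.axonframework.common` row above; the `degree` and `betweenness` values are hypothetical fillers added only to show that smaller contributions are dropped, and `top_local_features` is an illustrative name.

```python
def top_local_features(shap_by_feature, k=3):
    """Return the k features with the largest absolute SHAP contribution."""
    ordered = sorted(shap_by_feature.items(), key=lambda item: abs(item[1]), reverse=True)
    return ordered[:k]


local_shap = {
    "pageRank": -0.2001,                     # from the report row
    "articleRank": -0.186,                   # from the report row
    "pageToArticleRankDifference": -0.1751,  # from the report row
    "degree": 0.02,                          # hypothetical
    "betweenness": -0.05,                    # hypothetical
}

top_three = top_local_features(local_shap)
for name, value in top_three:
    print(f"{name}: {value}")
```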

Visualizations

See the Plot Interpretation Guide for details on how to read the plots.

Anomalies

Anomalies

Global feature importance SHAP summary plots

Anomaly feature importance explained (global)

Feature dependence plots for top important features

Anomaly feature dependence explained (global)


Local SHAP Force Plots – Top 6 Anomalies

Top 1 anomaly - local feature importance

Top 2 anomaly - local feature importance

Top 3 anomaly - local feature importance

Top 4 anomaly - local feature importance

Top 5 anomaly - local feature importance

Top 6 anomaly - local feature importance


Cluster Diagnostics

Cluster Overall


Cluster Membership Strength

Cluster probabilities


Cluster Noise and Bridge Analysis

Cluster Noise: Highly central and popular

Cluster Noise: Poorly integrated bridges

Cluster Noise: Role inverted bridges


Feature Distributions

Betweenness Centrality Distribution

Clustering coefficient distribution

PageRank minus ArticleRank distribution


Feature Relationships

Clustering coefficient versus PageRank


Graph Visualizations

TopBottleneck Graph Visualizations

TopBottleneck 1

TopBottleneck 2

TopBottleneck 3

TopBottleneck 4

TopBottleneck 5


TopAuthority Graph Visualizations

TopAuthority 1

TopAuthority 2

TopAuthority 3

TopAuthority 4

TopAuthority 5


TopBridge Graph Visualizations

TopBridge 1

TopBridge 2

TopBridge 3

TopBridge 4

TopBridge 5


TopOutlier Graph Visualizations

TopOutlier 1

--

2.3 Java Type

Anomaly Results

Total anomalies
| Anomalies | Authorities | Bottlenecks | Bridges | Hubs | Outliers |
| --- | --- | --- | --- | --- | --- |
| 60 | 10 | 10 | 10 | 10 | 8 |
Top global contributing features (via SHAP)
| Feature | Mean absolute SHAP value |
| --- | --- |
| Node embeddings aggregated | 0.050229 |
| pageToArticleRankDifference | 0.018039 |
| articleRank | 0.013770 |
| pageRank | 0.013687 |
| nodeEmbeddingPCA_26 | 0.005107 |
| degree | 0.005066 |
| incomingDependencies | 0.004857 |
| nodeEmbeddingPCA_29 | 0.004183 |
| nodeEmbeddingPCA_11 | 0.003009 |
| betweenness | 0.002593 |
| nodeEmbeddingPCA_32 | 0.002527 |

Archetype Distribution

| Archetype | Count | Max. Score | Model Status | Examples |
| --- | --- | --- | --- | --- |
| | 60 | 0.1081 | Anomalous | org.axonframework.messaging.core.unitofwork.ProcessingContext, org.axonframework.messaging.core.Message, org.axonframework.common.infra.DescribableComponent |
| Authority | 8 | 0.0816 | Anomalous | org.axonframework.common.infra.DescribableComponent, org.axonframework.common.TypeReference, org.axonframework.common.infra.ComponentDescriptor |
| Bottleneck | 8 | 0.1081 | Anomalous | org.axonframework.messaging.core.unitofwork.ProcessingContext, org.axonframework.messaging.core.Message, org.axonframework.messaging.core.MessageStream |
| Bridge | 10 | 0.0169 | Anomalous | org.axonframework.extension.springboot.autoconfig.ObjectMapperAutoConfiguration$JacksonConfiguredCondition$EventsJacksonCondition, org.axonframework.extension.springboot.autoconfig.ObjectMapperAutoConfiguration$JacksonConfiguredCondition$MessagesJacksonCondition, org.axonframework.extension.springboot.autoconfig.ObjectMapperAutoConfiguration$JacksonConfiguredCondition$GeneralJacksonCondition |
| Hub | 8 | 0.1081 | Anomalous | org.axonframework.messaging.core.unitofwork.ProcessingContext, org.axonframework.messaging.core.Message, org.axonframework.common.infra.DescribableComponent |
| Outlier | 2 | 0.0543 | Anomalous | org.axonframework.conversion.Converter, org.axonframework.common.Assert |
| | 1126 | -0.0001 | Typical | org.axonframework.extension.springboot.autoconfig.AxonTimeoutAutoConfiguration$AxonTimeoutConfigurerModule, org.axonframework.messaging.core.timeout.HandlerTimeoutHandlerEnhancerDefinition, org.axonframework.extension.springboot.autoconfig.AvroSchemaStoreAutoConfiguration$AvroConfiguredCondition$EventsAvroCondition |
| Authority | 2 | -0.0106 | Typical | org.axonframework.messaging.core.Metadata$MetadataCollector, org.axonframework.common.StringUtils |
| Bottleneck | 2 | -0.0114 | Typical | org.axonframework.modelling.entity.annotation.AnnotatedEntityMetamodel, org.axonframework.messaging.core.unitofwork.ProcessingLifecycle |
| Hub | 2 | -0.0339 | Typical | org.axonframework.common.BuilderUtils, org.axonframework.axonserver.connector.ErrorCode |
| Outlier | 6 | -0.0271 | Typical | org.axonframework.messaging.core.timeout.AxonTaskJanitor, org.axonframework.messaging.core.conversion.DelegatingMessageConverter, org.axonframework.messaging.eventhandling.conversion.DelegatingEventConverter |

Top anomalies with their local contributing features (via SHAP)

| Name | Contained in | Anomaly Score | Archetypes | Top Feature 1 | Top Feature 1 SHAP | Top Feature 2 | Top Feature 2 SHAP | Top Feature 3 | Top Feature 3 SHAP | Model Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| org.axonframework.messaging.core.unitofwork.ProcessingContext | axon-messaging-5.0.2 | 0.1081 | Hub, Bottleneck | articleRank | -0.2565 | pageRank | -0.1884 | pageToArticleRankDifference | -0.1766 | Anomalous |
| org.axonframework.messaging.core.Message | axon-messaging-5.0.2 | 0.1063 | Bottleneck, Hub | articleRank | -0.2539 | pageRank | -0.1876 | pageToArticleRankDifference | -0.18 | Anomalous |
| org.axonframework.common.infra.DescribableComponent | axon-common-5.0.2 | 0.0816 | Authority, Hub | articleRank | -0.2654 | pageRank | -0.1926 | pageToArticleRankDifference | -0.1816 | Anomalous |
| org.axonframework.common.TypeReference | axon-common-5.0.2 | 0.0805 | Authority | articleRank | -0.279 | pageRank | -0.2149 | pageToArticleRankDifference | -0.2126 | Anomalous |
| org.axonframework.messaging.core.MessageStream | axon-messaging-5.0.2 | 0.0768 | Bottleneck, Hub | articleRank | -0.252 | pageRank | -0.1793 | pageToArticleRankDifference | -0.1724 | Anomalous |
| org.axonframework.messaging.eventhandling.EventMessage | axon-messaging-5.0.2 | 0.076 | Hub, Bottleneck | articleRank | -0.2689 | pageRank | -0.1708 | pageToArticleRankDifference | -0.1702 | Anomalous |
| org.axonframework.messaging.core.Context$ResourceKey | axon-messaging-5.0.2 | 0.0571 | | articleRank | -0.2903 | pageRank | -0.2157 | pageToArticleRankDifference | -0.199 | Anomalous |
| org.axonframework.conversion.Converter | axon-conversion-5.0.2 | 0.0543 | Outlier | articleRank | -0.3219 | pageRank | -0.2333 | pageToArticleRankDifference | -0.2195 | Anomalous |
| org.axonframework.common.infra.ComponentDescriptor | axon-common-5.0.2 | 0.0541 | Authority | articleRank | -0.2814 | pageRank | -0.196 | pageToArticleRankDifference | -0.1905 | Anomalous |
| org.axonframework.messaging.core.QualifiedName | axon-messaging-5.0.2 | 0.0518 | Bottleneck | articleRank | -0.2509 | pageRank | -0.1697 | pageToArticleRankDifference | -0.1595 | Anomalous |
| org.axonframework.common.annotation.Internal | axon-common-5.0.2 | 0.0506 | | articleRank | -0.2835 | pageRank | -0.1881 | pageToArticleRankDifference | -0.184 | Anomalous |
| org.axonframework.messaging.core.Metadata | axon-messaging-5.0.2 | 0.0438 | Authority | articleRank | -0.3055 | pageRank | -0.2092 | pageToArticleRankDifference | -0.2025 | Anomalous |
| org.axonframework.messaging.eventstreaming.EventCriteria | axon-messaging-5.0.2 | 0.0438 | | articleRank | -0.3045 | pageToArticleRankDifference | -0.1787 | pageRank | -0.1727 | Anomalous |
| org.axonframework.extension.springboot.autoconfig.ObjectMapperAutoConfiguration$JacksonConfiguredCondition | axon-spring-boot-autoconfigure-5.0.2 | 0.0414 | | pageToArticleRankDifference | -0.1894 | pageRank | -0.1611 | nodeEmbeddingPCA_29 | -0.104 | Anomalous |
| org.axonframework.common.Assert | axon-common-5.0.2 | 0.0402 | Outlier, Hub | articleRank | -0.3171 | pageRank | -0.2226 | pageToArticleRankDifference | -0.2137 | Anomalous |
| org.axonframework.messaging.core.MessageType | axon-messaging-5.0.2 | 0.0363 | Bottleneck | articleRank | -0.2629 | degree | -0.134 | incomingDependencies | -0.12 | Anomalous |
| org.axonframework.extension.springboot.DistributedCommandBusProperties$JGroupsProperties$Gossip | axon-spring-boot-autoconfigure-5.0.2 | 0.0299 | | pageToArticleRankDifference | -0.2274 | nodeEmbeddingPCA_29 | -0.1483 | nodeEmbeddingPCA_33 | -0.0735 | Anomalous |
| org.axonframework.extension.springboot.autoconfig.ConverterAutoConfiguration | axon-spring-boot-autoconfigure-5.0.2 | 0.0258 | | nodeEmbeddingPCA_19 | -0.1975 | nodeEmbeddingPCA_23 | -0.1012 | nodeEmbeddingPCA_28 | -0.0588 | Anomalous |
| org.axonframework.eventsourcing.eventstore.Position | axon-eventsourcing-5.0.2 | 0.0245 | | pageToArticleRankDifference | -0.2846 | pageRank | -0.28 | nodeEmbeddingPCA_30 | -0.0186 | Anomalous |
| org.axonframework.common.ReflectionUtils | axon-common-5.0.2 | 0.0245 | Bottleneck | pageToArticleRankDifference | -0.2035 | betweenness | -0.1466 | pageRank | -0.0933 | Anomalous |

Visualizations

See the Plot Interpretation Guide for details on how to read the plots.

Anomalies

Anomalies

Global feature importance SHAP summary plots

Anomaly feature importance explained (global)

Feature dependence plots for top important features

Anomaly feature dependence explained (global)


Local SHAP Force Plots – Top 6 Anomalies

Top 1 anomaly - local feature importance

Top 2 anomaly - local feature importance

Top 3 anomaly - local feature importance

Top 4 anomaly - local feature importance

Top 5 anomaly - local feature importance

Top 6 anomaly - local feature importance


Cluster Diagnostics

Clusters largest average radius

Clusters largest max radius

Clusters largest size


Cluster Membership Strength

Cluster probabilities


Cluster Noise and Bridge Analysis

Cluster Noise: Highly central and popular

Cluster Noise: Poorly integrated bridges

Cluster Noise: Role inverted bridges


Feature Distributions

Betweenness Centrality Distribution

Clustering coefficient distribution

PageRank minus ArticleRank distribution


Feature Relationships

Clustering coefficient versus PageRank


Graph Visualizations

TopHub Graph Visualizations

TopHub 1

TopHub 2

TopHub 3

TopHub 4

TopHub 5


TopBottleneck Graph Visualizations

TopBottleneck 1

TopBottleneck 2

TopBottleneck 3

TopBottleneck 4

TopBottleneck 5


TopAuthority Graph Visualizations

TopAuthority 1

TopAuthority 2

TopAuthority 3

TopAuthority 4

TopAuthority 5


TopBridge Graph Visualizations

TopBridge 1

TopBridge 2

TopBridge 3

TopBridge 4

TopBridge 5


TopOutlier Graph Visualizations

TopOutlier 1

TopOutlier 2

TopOutlier 3

TopOutlier 4

TopOutlier 5

--

3. Plot Interpretation Guide

Purpose: Understand each plot type’s diagnostic value.
Applies to: All abstraction levels.

| Plot Type | Best For | Adds | Why It Matters |
| --- | --- | --- | --- |
| Anomalies Plot | Seeing distribution of anomalies in clusters | Context of clusters & outliers | Reveals isolation or cluster-based anomalies |
| SHAP Summary | Global feature importance | Feature impact direction | Shows what drives anomalies overall |
| Local SHAP Force | Explaining a single anomaly | Feature contribution breakdown | Useful for debugging individual outliers |
| Dependence Plot | Understanding feature influence | Interaction visualization | Reveals nonlinear feature effects |
| Cluster Metrics | Cluster characteristics | Radius, cohesion, noise | Identifies weakly defined or noisy clusters |

Purpose: Provide a direct mapping between all plots and their analytical meaning.
Scope: Applies to plots for Java Type, Java Package, and similar abstraction levels.
Format: Each entry includes Best for, Adds, and Why, matching the in-report descriptions.


📘 Main Plots

| Plot | Description | Best For | Adds | Why |
| --- | --- | --- | --- | --- |
| Anomalies | 2D visualization of all code units showing clusters and anomalies. | Understanding the overall distribution of anomalies in relation to clusters. | Context of clusters and outliers. | Reveals whether anomalies are isolated or cluster-based, guiding investigation. |
| Global Feature Importance (SHAP Summary) | Mean absolute SHAP values ranking global feature impact. | Global understanding of which features drive anomalies. | Direction of impact (color shows feature value). | Explains which metrics consistently influence anomaly detection. |
| Feature Dependence (Top Important Features) | Shows how specific feature values affect anomaly score; colored by interacting feature. | Understanding how one feature affects anomaly scores. | Color shows feature interaction or threshold effect. | Helps identify nonlinear relationships and feature interactions. |

📙 Local Explanation Plots

| Plot | Description | Best For | Adds | Why |
| --- | --- | --- | --- | --- |
| Local SHAP Force Plots (Top Anomalies 1–6) | Visualizes per-feature contributions to each anomaly’s score relative to baseline. | Explaining why a specific data point is anomalous. | Visual breakdown of how each feature contributes to anomaly score. | Enables debugging of individual anomalies through transparent explanation. |

📗 Cluster-Level Diagnostic Plots

| Plot | Description | Best For | Adds | Why |
| --- | --- | --- | --- | --- |
| Clusters – Overall | Shows all clusters since they all fit into one plot. | Gaining a holistic view of cluster characteristics in the dataset. | An overall summary of how all clusters are distributed and their key metrics. | Understanding the general structure and properties of clusters can help identify patterns and potential anomalies in the data. |
| Clusters – Largest Average Radius | Ranks clusters by mean distance of members from their centroid. | Getting an overview of clusters that are more dispersed. | Identifies clusters with internal variability. | Large average radius suggests less cohesion and potential outliers. |
| Clusters – Largest Max Radius | Shows clusters with the farthest outlying member. | Identifying clusters that have members farthest from cluster center. | Highlights clusters containing extreme outliers. | Indicates clusters that may contain hidden anomalies. |
| Clusters – Largest Size | Displays cluster membership counts. | Understanding which clusters contain the most code units. | Provides sense of frequency of code structures. | Large clusters may represent common design patterns; small clusters are specialized. |
| Cluster Probabilities | Distribution of HDBSCAN membership probabilities. | Detecting code units that don’t strongly belong to any cluster. | Measures how well-defined clusters are. | Highlights noisy or weakly defined clusters. |

📒 Cluster Noise & Bridge Diagnostics

| Plot | Description | Best For | Adds | Why |
| --- | --- | --- | --- | --- |
| Cluster Noise – Highly Central and Popular | Central nodes that don’t fit any cluster. | Detecting code units that are highly connected but anomalous. | Reveals influential but misfit nodes. | Such nodes may be key but unstable integration points. |
| Cluster Noise – Poorly Integrated Bridges | Nodes connecting clusters but weakly integrated. | Detecting code units that bridge modules unusually. | Identifies cross-cutting or leaking dependencies. | May reveal architectural boundary violations. |
| Cluster Noise – Role Inverted Bridges | Bridges with reversed structural roles compared to expected topology. | Detecting code units connecting clusters in unexpected ways. | Highlights anomalous coupling roles. | Indicates architectural inversion or misuse of interfaces. |

📙 Feature Distribution & Relationship Plots

| Plot | Description | Best For | Adds | Why |
| --- | --- | --- | --- | --- |
| Betweenness Centrality Distribution | Histogram of betweenness values. | Identifying code units that act as structural bridges. | Insight into flow of dependency control. | Detects potential bottlenecks or single points of failure. |
| Clustering Coefficient Distribution | Histogram of local clustering coefficients. | Identifying modularity and local cohesion. | Insight into how tightly code units cluster. | Reveals how cohesive or isolated different regions of the graph are. |
| PageRank – ArticleRank Difference Distribution | Distribution of PageRank - ArticleRank. | Identifying influential nodes beyond local connectivity. | Shows imbalance between influence and popularity. | Highlights components with disproportionate architectural impact. |
| Clustering Coefficient vs PageRank | Scatterplot comparing local clustering to global influence. | Identifying relationships between cohesion and centrality. | Visualizes trade-offs between modularity and reach. | Helps spot code units that are both locally and globally critical. |
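
To make the PageRank minus ArticleRank difference concrete, here is a minimal power-iteration sketch. It assumes one common formulation in which ArticleRank differs from PageRank only by adding the graph's average out-degree to the denominator, which dampens contributions from nodes with few outgoing dependencies. Whether the report's values were computed exactly this way is an assumption; the function `rank_scores`, the damping factor, and the toy graph are illustrative.

```python
def rank_scores(edges, damping=0.85, iterations=50, article_rank=False):
    """Unnormalized PageRank (or ArticleRank) via power iteration.

    edges: list of (source, target) pairs meaning "source depends on target".
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_degree = {node: 0 for node in nodes}
    incoming = {node: [] for node in nodes}
    for source, target in edges:
        out_degree[source] += 1
        incoming[target].append(source)

    # ArticleRank (as assumed here) adds the average out-degree to each denominator.
    extra = sum(out_degree.values()) / len(nodes) if article_rank else 0.0

    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        scores = {
            node: (1 - damping)
            + damping * sum(scores[s] / (out_degree[s] + extra) for s in incoming[node])
            for node in nodes
        }
    return scores


# Toy dependency graph: a, b, c depend on util; util depends on core.
edges = [("a", "util"), ("b", "util"), ("c", "util"), ("util", "core")]
page = rank_scores(edges)
article = rank_scores(edges, article_rank=True)
difference = {node: page[node] - article[node] for node in page}
for node in sorted(difference, key=difference.get, reverse=True):
    print(f"{node}: {difference[node]:.4f}")
```

In this toy graph the difference is largest for `core`, which is reached only through a node with a single outgoing dependency, and zero for the leaf units that nothing depends on.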

📕 Graph Visualizations (Archetype-Level Network Views)

| Plot | Description | Best For | Adds | Why |
| --- | --- | --- | --- | --- |
| Top Hub Graph Visualization | Displays the most connected node (e.g., #1 Hub) at the center, surrounded by its direct dependencies. Incoming nodes show who is dependent on the hub. | Understanding highly connected code units or components that serve as central integrators. | Highlights nodes that act as major dependency aggregators. | Helps detect over-centralized modules or potential architectural bottlenecks. |
| Top Bottleneck Graph Visualization | Shows the node with the highest betweenness centrality (e.g., #1 Bottleneck) and its local neighborhood. | Identifying code units that control information or dependency flow. | Emphasizes nodes that mediate critical paths between modules. | Reveals single points of failure or routing constraints in dependency flow. |
| Top Authority Graph Visualization | Centers the most authoritative node (e.g., #1 Authority) with incoming and outgoing links from dependent nodes with high PageRank and emphasized PageRank to ArticleRank difference. | Detecting key knowledge or functionality providers. | Highlights components with high centrality. | Indicates structural or semantic “sources of truth” in the system. |
| Top Bridge Graph Visualization | Displays a node acting as a structural bridge between clusters (e.g., #1 Bridge) and its cross-cluster connections based on node embeddings encoding the Graph structure. | Understanding cross-cutting dependencies between modules. | Reveals links connecting distinct architectural domains. | Useful for spotting boundary leaks or undesired coupling between subsystems. |
| Top Outlier Graph Visualization | Centers an unusual or isolated node (e.g., #1 Outlier) that can hardly be assigned to a cluster and visualizes its sparse or unexpected dependency patterns. | Identifying structurally or behaviorally anomalous nodes. | Highlights nodes with rare or unexpected connection patterns. | Helps pinpoint code units that deviate from established dependency norms. |

Note:

  • In all Graph Visualizations, the central node represents the selected Top Archetype (e.g., Top 1 Hub).
  • Darker nodes indicate incoming dependencies, while brighter nodes indicate outgoing dependencies.
  • Emphasized nodes (thicker borders or larger size) mark particularly influential or anomalous dependencies, depending on the archetype.
  • These visualizations are most effective for interpreting local dependency topology and role significance of key components.

📔 Summary Categories

| Category | Included Plots | Typical Usage |
| --- | --- | --- |
| Main Diagnostic | Anomalies, Global SHAP, Feature Dependence | High-level anomaly review |
| Local Explanation | Local SHAP Force Plots | Case-by-case anomaly debugging |
| Cluster Diagnostics | Cluster Radius / Size / Probability | Assess cluster cohesion and outliers |
| Cluster Noise Analysis | Cluster Noise (3 types) | Identify special structural anomalies |
| Feature Distributions | Betweenness, Clustering, Rank Difference | Assess feature-based structure patterns |
| Feature Relationships | Clustering vs PageRank | Evaluate global vs local influence balance |
| Archetype Graphs | Top Hub / Bottleneck / Authority / Bridge / Outlier | Visualizing key dependency roles and structural importance |

💡 Reading Guidance

  • Color Conventions:
    Red = anomalous, Green = typical, Light grey = noise, Pale colors = clusters.
  • Scales:
    SHAP values are normalized (mean absolute); graph metrics standardized by z-score.
  • How to Use:
    1. Start with Main Diagnostic plots to identify anomalies and drivers.
    2. Use Local SHAP for detailed case analysis.
    3. Check Cluster Diagnostics and Noise Plots to verify grouping quality.
    4. Use Feature Distributions to contextualize metrics.
    5. Cross-reference Feature Relationships for architectural interpretation.
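
The z-score standardization mentioned under Scales can be sketched as follows. This is only an illustration: it uses the population standard deviation, which is an assumption since the report does not say which variant is applied, and the helper name `z_scores` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant metric carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


# Hypothetical degree values for eight code units.
degrees = [2, 4, 4, 4, 5, 5, 7, 9]
standardized = z_scores(degrees)
print(standardized)
```

After standardization, metrics with very different ranges (for example degree and betweenness) become comparable on one axis.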

📄 Structured Form (YAML Summary)

The same plot mapping in machine-readable form:

plots:
  main:
    - name: Anomalies
      purpose: Distribution of anomalies and clusters
    - name: Global Feature Importance (SHAP)
      purpose: Global feature ranking
    - name: Feature Dependence
      purpose: Feature–score relationship
  local:
    - name: Local SHAP Force Plots
      purpose: Local explanations for top anomalies
  cluster:
    - name: Clusters Largest Average Radius
      purpose: Identify dispersed clusters
    - name: Clusters Largest Max Radius
      purpose: Identify extreme outlier clusters
    - name: Clusters Largest Size
      purpose: Identify dominant cluster types
    - name: Cluster Probabilities
      purpose: Assess cluster definition strength
  cluster_noise:
    - name: Cluster Noise – Highly Central and Popular
      purpose: Central anomalies without cluster fit
    - name: Cluster Noise – Poorly Integrated Bridges
      purpose: Weakly integrated bridges
    - name: Cluster Noise – Role Inverted Bridges
      purpose: Inverted bridge roles
  feature_distributions:
    - name: Betweenness Centrality Distribution
      purpose: Bridge and bottleneck detection
    - name: Clustering Coefficient Distribution
      purpose: Cohesion and modularity measurement
    - name: PageRank – ArticleRank Difference Distribution
      purpose: Influence vs popularity analysis
  feature_relationships:
    - name: Clustering Coefficient vs PageRank
      purpose: Local vs global influence comparison

4. Taxonomy of Anomaly Archetypes

| Archetype | Feature Profile | Architectural Risk |
| --- | --- | --- |
| Hub | High degree, low clustering coefficient | Central dependency; fragile hotspot |
| Bottleneck | High betweenness, low redundancy | Single point of failure; slows evolution |
| Outlier | High cluster distance, small cluster size | Misfit or irregular dependency pattern |
| Authority | High PageRank, low ArticleRank | Over-relied utility; low local stability |
| Bridge | Cross-cluster connection | Risky coupling; weak modular boundaries |
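
The Hub profile combines degree with the local clustering coefficient, that is, the share of a node's neighbor pairs that are themselves connected. The sketch below computes it on an undirected view of a small, hypothetical dependency graph; treating dependencies as undirected is a simplifying assumption, and `local_clustering_coefficient` is an illustrative helper.

```python
from itertools import combinations


def local_clustering_coefficient(adjacency, node):
    """Fraction of neighbor pairs of `node` that are directly connected (undirected)."""
    neighbors = adjacency[node] - {node}
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked_pairs = sum(1 for u, v in combinations(sorted(neighbors), 2) if v in adjacency[u])
    return 2.0 * linked_pairs / (k * (k - 1))


# Hypothetical graph: "hub" touches four units, of which only a and b know each other.
adjacency = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub"},
    "d": {"hub"},
}

for name in ("hub", "a", "c"):
    degree = len(adjacency[name])
    coefficient = local_clustering_coefficient(adjacency, name)
    print(f"{name}: degree={degree}, clustering={coefficient:.3f}")
```

Here `hub` shows the Hub signature from the table: the highest degree in the graph together with a low clustering coefficient.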

Structured form (for LLM parsing):

archetypes:
  - name: Hub
    profile: High degree, low clustering coefficient
    risk: Central dependency, fragile hotspot
  - name: Bottleneck
    profile: High betweenness, low redundancy
    risk: Single point of failure
  - name: Outlier
    profile: High cluster distance, small cluster size
    risk: Misfit component
  - name: Authority
    profile: High PageRank, low ArticleRank
    risk: Over-relied utility
  - name: Bridge
    profile: Cross-cluster connector
    risk: Risky coupling
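
The profiles above can be turned into simple rules for illustration. The following sketch is not the pipeline's actual classification logic: the thresholds `high` and `low`, the metric key names, and the boolean `connects_clusters` flag are all hypothetical, and the "low redundancy" part of the Bottleneck profile is omitted because the report does not define how it is measured.

```python
def classify_archetypes(metrics, high=1.5, low=-0.5):
    """Assign archetype labels from z-scored metrics using hypothetical thresholds."""
    labels = []
    if metrics["degree"] >= high and metrics["clustering_coefficient"] <= low:
        labels.append("Hub")
    if metrics["betweenness"] >= high:
        labels.append("Bottleneck")
    if metrics["cluster_distance"] >= high and metrics["cluster_size"] <= low:
        labels.append("Outlier")
    if metrics["page_rank"] >= high and metrics["article_rank"] <= low:
        labels.append("Authority")
    if metrics["connects_clusters"]:
        labels.append("Bridge")
    return labels


# Hypothetical z-scored metrics for one code unit.
unit = {
    "degree": 2.3,
    "clustering_coefficient": -0.9,
    "betweenness": 1.8,
    "cluster_distance": 0.2,
    "cluster_size": 0.4,
    "page_rank": 1.0,
    "article_rank": 0.9,
    "connects_clusters": False,
}

labels = classify_archetypes(unit)
print(labels)
```

A unit can carry several labels at once, which matches the multi-archetype entries in the anomaly tables of section 2.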

5. Recommendations

  • Refactor hubs: Decompose large or over-connected utilities.
  • Mitigate bottlenecks: Introduce redundancy or alternative communication paths.
  • Investigate outliers: Determine if anomalies are justified exceptions.
  • Raise cohesion: Increase local clustering by improving modular boundaries.
  • Stabilize authorities: Encapsulate frequently used but fragile components.
  • Validate bridges: Confirm cross-cluster connectors are intentional and safe.

6. Appendix

6.1 Methodology Overview

  1. Build dependency graph (types, packages, artifacts).
  2. Compute graph metrics: degree, PageRank, betweenness, clustering coefficient, etc.
  3. Generate embeddings via Fast Random Projection.
  4. Reduce embeddings with PCA (retain 90% variance).
  5. Train Isolation Forest for anomaly detection.
  6. Explain results using SHAP (via Random Forest proxy).
  7. Cluster anomalies via HDBSCAN, tuned with Leiden reference communities (AMI score).
  8. Optimize the hyperparameters of both the Isolation Forest and the Random Forest proxy, using the F1 score as the tuning objective.
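
To illustrate step 5, here is a minimal, dependency-free sketch of the Isolation Forest idea: points that random splits isolate after few steps get a score close to 1. It is a teaching sketch under stated assumptions, not the pipeline's implementation; the function names, the default parameters, and the toy data are illustrative, and the resulting scores lie in (0, 1], which is a different scale than the anomaly scores reported in section 2.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    low = min(row[feature] for row in rows)
    high = max(row[feature] for row in rows)
    if low == high:  # cannot split on a constant feature
        return {"size": len(rows)}
    split = rng.uniform(low, high)
    left = [row for row in rows if row[feature] < split]
    right = [row for row in rows if row[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, row, depth=0):
    """Number of splits needed to reach the leaf of `row`, plus a leaf-size correction."""
    if "feature" not in tree:
        return depth + average_path_length(tree["size"])
    branch = "left" if row[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)


def isolation_scores(rows, n_trees=100, sample_size=256, seed=42):
    """Score every row; values near 1 mean the row is easy to isolate (anomalous)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(max(sample_size, 2)))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng) for _ in range(n_trees)]
    norm = average_path_length(sample_size) or 1.0
    return [2.0 ** (-sum(path_length(t, row) for t in trees) / (n_trees * norm)) for row in rows]


# Toy data: a tight 5 x 4 grid of "typical" units plus one far-away unit.
rows = [(0.1 * (i % 5), 0.1 * (i // 5)) for i in range(20)] + [(10.0, 10.0)]
scores = isolation_scores(rows)
most_anomalous = max(range(len(rows)), key=scores.__getitem__)
print(rows[most_anomalous], round(scores[most_anomalous], 3))
```

Steps 6 to 8 (the Random Forest proxy, SHAP, clustering, and tuning) are not covered by this sketch.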

6.2 Feature Set

  • Degree (in/out)
  • PageRank
  • ArticleRank
  • Page-to-Article Rank Difference
  • Betweenness Centrality
  • Local Clustering Coefficient
  • Cluster Outlier Score (1.0 - cluster probability)
  • Cluster Radius (avg, max)
  • Cluster Size
  • Node Embedding (PCA 20–35 dims)
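
The cluster-based features in this list follow directly from their definitions: radius as the distance of members from the cluster centroid, and outlier score as 1.0 minus the cluster membership probability. The sketch below illustrates them for a single cluster; the coordinates and probabilities are hypothetical, Euclidean distance is an assumption, and `cluster_stats` is an illustrative helper rather than a pipeline function.

```python
import math


def cluster_stats(points, probabilities):
    """Size, average radius, max radius, and per-member outlier scores of one cluster."""
    dimensions = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dimensions)]
    distances = [math.dist(p, centroid) for p in points]
    return {
        "size": len(points),
        "avg_radius": sum(distances) / len(distances),
        "max_radius": max(distances),
        "outlier_scores": [1.0 - probability for probability in probabilities],
    }


# Hypothetical 2D positions and membership probabilities of four cluster members.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
probabilities = [1.0, 0.9, 0.5, 0.25]

stats = cluster_stats(points, probabilities)
print(stats)
```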

6.3 Architecture Diagram

Anomaly Detection Architecture