|
37 | 37 | "| `1.0 - HDBSCAN membership probability` | Cluster Confidence | How confidently HDBSCAN clustered this node, 1-x inverted | High score = likely anomaly |\n", |
38 | 38 | "| `Average Cluster Radius` | Cluster Context | How tight or spread out the cluster is | Highly spread clusters may be less meaningful |\n", |
39 | 39 | "| `Abstractness` (Robert C. Martin) | Design / OO Metric | Ratio of abstract types (interfaces, abstract classes) to total types | Indicates architectural intent; supports Dependency Inversion Principle and stability balance |\n", |
40 | | - "| `Relative Strong Component Size (vs WCC Median)` | Structural / Graph Topology | Size of the node’s strongly connected component normalized by the median SCC size within its weakly connected component | Highlights unusually large cyclic dependency groups relative to local context; high values often indicate architectural tangles or stability issues |\n" |
| 40 | + "| `Relative Strong Component Size (vs WCC Median)` | Structural / Graph Topology | Size of the node’s strongly connected component normalized by the median SCC size within its weakly connected component | Highlights unusually large cyclic dependency groups relative to local context; high values often indicate architectural tangles or stability issues |\n", |
| 41 | + "| `Max Topological Distance from Source (SCC DAG)` | Structural / Graph Topology | Longest path from any source SCC to the node’s SCC in the condensed DAG | Approximates architectural depth or layering; high values indicate deeply nested components and potential rigidity or change amplification |" |
41 | 42 | ] |
42 | 43 | }, |
43 | 44 | { |
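The new `Max Topological Distance from Source (SCC DAG)` feature is the longest path from any source component to a node's component in the condensed graph, where each strongly connected component is collapsed into one node so the result is acyclic. The following is a minimal stdlib sketch of that idea and not the notebook's actual computation, which reads a precomputed `topologicalSortMaxDistanceFromSource` property. It assumes the condensation has already been built, that an edge `A -> B` means "A depends on B", and that distance counts edges. The function name and the sample component names are hypothetical. It runs Kahn's topological sort and keeps the maximum distance seen along the way.

```python
from collections import deque


def max_distance_from_source(edges: dict[str, list[str]]) -> dict[str, int]:
    """Longest path (counted in edges) from any source node to each node of a DAG.

    `edges` maps each (condensed) component to the components it points to.
    Every node must appear as a key, even with an empty list.
    Raises ValueError if the graph still contains a cycle.
    """
    # Count incoming edges so we know which nodes are sources.
    in_degree = {node: 0 for node in edges}
    for targets in edges.values():
        for target in targets:
            in_degree[target] += 1

    # Sources (no incoming edges) start at distance 0.
    distance = {node: 0 for node in edges}
    queue = deque(node for node, degree in in_degree.items() if degree == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for target in edges[node]:
            # Keep the longest distance seen so far for the target.
            distance[target] = max(distance[target], distance[node] + 1)
            in_degree[target] -= 1
            if in_degree[target] == 0:
                queue.append(target)

    if visited != len(edges):
        raise ValueError("graph has a cycle; condense SCCs first")
    return distance


# Hypothetical condensed dependency graph (edge A -> B means "A depends on B").
dag = {
    "ui": ["service"],
    "cli": ["service", "util"],
    "service": ["repository", "util"],
    "repository": ["util"],
    "util": [],
}
print(max_distance_from_source(dag))
# → {'ui': 0, 'cli': 0, 'service': 1, 'repository': 2, 'util': 3}
```

Because the longest path is used, `util` gets depth 3 even though `cli` reaches it directly. It sits below every layer that depends on it, which is the "deeply nested" signal the table row describes.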
|
236 | 237 | " ,codeUnit.clusteringHDBSCANLabel AS clusterLabel\n", |
237 | 238 | " ,codeUnit.clusteringHDBSCANMedoid AS clusterMedoid\n", |
238 | 239 | " ,coalesce(stronglyConnectedComponent.size / weaklyConnectedComponent.stronglyConnectedComponentSizePercentile50, 1.0) AS stronglyConnectedComponentSizeRatio\n", |
| 240 | + " ,coalesce(stronglyConnectedComponent.topologicalSortMaxDistanceFromSource, 0) AS topologicalComponentLayer\n", |
239 | 241 | " ,codeUnit.embeddingsFastRandomProjectionTunedForClusteringVisualizationX AS embeddingVisualizationX\n", |
240 | 242 | " ,codeUnit.embeddingsFastRandomProjectionTunedForClusteringVisualizationY AS embeddingVisualizationY\n", |
241 | 243 | " \"\"\"\n", |
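The query derives `stronglyConnectedComponentSizeRatio` by dividing a node's SCC size by the median SCC size of its weakly connected component. It falls back to `1.0` through `coalesce` when either value is missing. The following is a minimal plain-Python sketch of the same calculation. It is not the notebook's code: it assumes the median is taken over distinct SCCs within each WCC, with each SCC counted once, and that a node without an SCC should count as "typical size". The function and parameter names are hypothetical. Python's `/` always performs true division. Cypher's `/` truncates when both operands are integers, so the query assumes that the percentile property is stored as a float.

```python
from collections import defaultdict
from statistics import median


def scc_size_ratio(node_scc, scc_size, scc_wcc):
    """Ratio of each node's SCC size to the median SCC size in its WCC.

    node_scc: node -> SCC id, or None if the node has no SCC assigned
    scc_size: SCC id -> number of member nodes
    scc_wcc:  SCC id -> id of the weakly connected component containing it
    """
    # Collect the size of every SCC, grouped by the WCC it belongs to.
    sizes_per_wcc = defaultdict(list)
    for scc, wcc in scc_wcc.items():
        sizes_per_wcc[wcc].append(scc_size[scc])
    median_per_wcc = {wcc: median(sizes) for wcc, sizes in sizes_per_wcc.items()}

    ratios = {}
    for node, scc in node_scc.items():
        if scc is None:
            # Mirrors coalesce(..., 1.0): missing data counts as "typical size".
            ratios[node] = 1.0
        else:
            ratios[node] = scc_size[scc] / median_per_wcc[scc_wcc[scc]]
    return ratios


# Hypothetical data: WCC w1 has SCC sizes [1, 1, 4], WCC w2 has [2, 6].
scc_size = {"s1": 1, "s2": 1, "s3": 4, "s4": 2, "s5": 6}
scc_wcc = {"s1": "w1", "s2": "w1", "s3": "w1", "s4": "w2", "s5": "w2"}
node_scc = {"A": "s1", "B": "s3", "C": "s4", "D": "s5", "E": None}
print(scc_size_ratio(node_scc, scc_size, scc_wcc))
# → {'A': 1.0, 'B': 4.0, 'C': 0.5, 'D': 1.5, 'E': 1.0}
```

Node `B` stands out with a ratio of 4.0 because its cycle is four times larger than the typical SCC in its own component. That local normalization is what the table row describes as "relative to local context".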
|
676 | 678 | " study.enqueue_trial({'isolation_max_samples': 0.42726366840740576, 'isolation_n_estimators': 141, 'proxy_n_estimators': 190, 'proxy_max_depth': 5})\n", |
677 | 679 | " study.enqueue_trial({'isolation_max_samples': 0.40638732079782663, 'isolation_n_estimators': 108, 'proxy_n_estimators': 191, 'proxy_max_depth': 9})\n", |
678 | 680 | " \n", |
| 681 | + " study.enqueue_trial({'isolation_max_samples': 0.10105966483207725, 'isolation_n_estimators': 271, 'proxy_n_estimators': 237, 'proxy_max_depth': 9})\n", |
679 | 682 | " study.enqueue_trial({'isolation_max_samples': 0.10010443935999927, 'isolation_n_estimators': 350, 'proxy_n_estimators': 344, 'proxy_max_depth': 8})\n", |
680 | 683 | " study.enqueue_trial({'isolation_max_samples': 0.10015063610944819, 'isolation_n_estimators': 329, 'proxy_n_estimators': 314, 'proxy_max_depth': 8})\n", |
681 | 684 | "\n", |
|
2067 | 2070 | "metadata": {}, |
2068 | 2071 | "outputs": [], |
2069 | 2072 | "source": [ |
2070 | | - "# TODO delete when finished tweaking\n", |
2071 | | - "top10=get_top_anomalies(java_type_anomaly_detection_features, top_n=25).reset_index(drop=True)\n", |
2072 | | - "print(top10.to_csv(index=False, columns=['shortCodeUnitName', 'anomalyScore']))" |
| 2073 | + "# For debugging purposes\n", |
| 2074 | + "# top10=get_top_anomalies(java_type_anomaly_detection_features, top_n=25).reset_index(drop=True)\n", |
| 2075 | + "# print(top10.to_csv(index=False, columns=['shortCodeUnitName', 'anomalyScore']))" |
2073 | 2076 | ] |
2074 | 2077 | }, |
2075 | 2078 | { |
|