Commit 1e4fff3

📝 add metrics documents
1 parent 25b456b commit 1e4fff3

2 files changed

Lines changed: 111 additions & 0 deletions

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -186,6 +186,7 @@ or GitHub repository:
     ontologizer/ontology_hosting
     ontologizer/new_ontologies
     ontologizer/metadata
+    ontologizer/metrics

  .. toctree::
     :maxdepth: 1
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
Metrics
=================

.. sidebar:: Metric Space:

    There is a dedicated Hugging Face space for `OntoLearner Benchmark Metrics <https://huggingface.co/spaces/SciKnowOrg/OntoLearner-Benchmark-Metrics>`_ with analysis and live plots.

The ``Analyzer`` class in OntoLearner provides a unified interface for computing **ontology metrics**, which fall into two main categories: **Topology Metrics**, which capture the structural characteristics of the ontology graph, and **Dataset Metrics**, which assess the quality and distribution of the extracted learning datasets. A **complexity score** can also be derived from these metrics to summarize the overall richness and complexity of the ontology.

Topology Metrics
----------------
Topology metrics describe the structure and organization of an ontology. The ``Analyzer`` computes the following key metrics:

- **Total nodes** (``total_nodes``): Total number of nodes in the ontology graph.
- **Total edges** (``total_edges``): Total number of edges representing relations between nodes.
- **Root nodes** (``num_root_nodes``): Nodes with no incoming edges, representing top-level concepts.
- **Leaf nodes** (``num_leaf_nodes``): Nodes with no outgoing edges, representing bottom-level concepts.
- **Classes** (``num_classes``): Number of distinct ontology classes.
- **Properties** (``num_properties``): Number of distinct properties (object or datatype properties).
- **Individuals** (``num_individuals``): Number of instances associated with classes.
- **Depth metrics**:

  - ``max_depth``: Maximum hierarchical depth in the ontology.
  - ``min_depth``: Minimum hierarchical depth.
  - ``avg_depth``: Average hierarchical depth across all nodes.
  - ``depth_variance``: Variance of depth distribution.

- **Breadth metrics**:

  - ``max_breadth``: Maximum number of nodes at any single hierarchy level.
  - ``min_breadth``: Minimum number of nodes at any hierarchy level.
  - ``avg_breadth``: Average number of nodes per hierarchy level.
  - ``breadth_variance``: Variance of breadth distribution.

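To make the definitions above concrete, here is a minimal sketch that computes several of these metrics for a toy hierarchy in plain Python. It does not use OntoLearner. The ``(parent, child)`` edge list, the ``topology_sketch`` function, and the choice of "shortest distance from a root" as a node's depth are all assumptions made for illustration; the ``Analyzer`` may define depth and breadth differently.

```python
from collections import deque
from statistics import mean, pvariance


def topology_sketch(edges):
    """Compute a few structural metrics for a small directed acyclic graph.

    `edges` is a list of (parent, child) pairs. This toy representation is an
    assumption for illustration, not OntoLearner's internal graph format.
    """
    nodes = {node for edge in edges for node in edge}
    children = {node: [] for node in nodes}
    has_parent = set()
    for parent, child in edges:
        children[parent].append(child)
        has_parent.add(child)

    roots = sorted(nodes - has_parent)                      # no incoming edges
    leaves = sorted(n for n in nodes if not children[n])    # no outgoing edges

    # Breadth-first search from the roots: each node's depth is taken to be
    # its shortest distance from any root (roots have depth 0).
    depth = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)

    # Breadth of a hierarchy level = number of nodes at that depth.
    breadth = {}
    for d in depth.values():
        breadth[d] = breadth.get(d, 0) + 1

    depths = list(depth.values())
    breadths = list(breadth.values())
    return {
        "total_nodes": len(nodes),
        "total_edges": len(edges),
        "num_root_nodes": len(roots),
        "num_leaf_nodes": len(leaves),
        "max_depth": max(depths),
        "avg_depth": mean(depths),
        "depth_variance": pvariance(depths),
        "max_breadth": max(breadths),
        "min_breadth": min(breadths),
        "avg_breadth": mean(breadths),
    }


# Hypothetical toy hierarchy, loosely inspired by a wine ontology.
toy_edges = [
    ("Thing", "Wine"), ("Thing", "Region"),
    ("Wine", "RedWine"), ("Wine", "WhiteWine"),
    ("RedWine", "Merlot"),
]
metrics = topology_sketch(toy_edges)
print(metrics)
```

In this toy graph ``Thing`` is the only root, and ``Region``, ``WhiteWine``, and ``Merlot`` are leaves, which matches the "no incoming edges" and "no outgoing edges" definitions given in the list.
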
Dataset Metrics
---------------

Dataset metrics evaluate the characteristics of machine-learning datasets extracted from the ontology. These metrics include:

- **Number of term-type mappings** (``num_term_types``): Number of terms associated with types.
- **Number of taxonomic (is-a) relations** (``num_taxonomic_relations``): Count of hierarchical relations.
- **Number of non-taxonomic relations** (``num_non_taxonomic_relations``): Count of semantic associations not in the hierarchy.
- **Average terms per type** (``avg_terms``): Measures dataset balance across classes.


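As an illustration of how these counts relate to the underlying data, the sketch below derives them from small hand-written lists. The tuple layouts, the ``dataset_sketch`` function, and the sample terms are hypothetical, and reading ``avg_terms`` as "distinct terms divided by distinct types" is an assumption; the ``Analyzer`` may compute it differently.

```python
from collections import defaultdict


def dataset_sketch(term_types, taxonomic, non_taxonomic):
    """Count dataset items from plain Python lists.

    term_types:    list of (term, type) pairs
    taxonomic:     list of (child, parent) is-a pairs
    non_taxonomic: list of (head, relation, tail) triples
    These layouts are assumptions for illustration only.
    """
    # Group the distinct terms under each type to measure balance.
    terms_by_type = defaultdict(set)
    for term, type_name in term_types:
        terms_by_type[type_name].add(term)

    if terms_by_type:
        total_terms = sum(len(terms) for terms in terms_by_type.values())
        avg_terms = total_terms / len(terms_by_type)
    else:
        avg_terms = 0.0  # avoid dividing by zero for an empty dataset

    return {
        "num_term_types": len(term_types),
        "num_taxonomic_relations": len(taxonomic),
        "num_non_taxonomic_relations": len(non_taxonomic),
        "avg_terms": avg_terms,
    }


# Hypothetical sample data.
sample = dataset_sketch(
    term_types=[("Merlot", "RedWine"), ("Chianti", "RedWine"), ("Riesling", "WhiteWine")],
    taxonomic=[("RedWine", "Wine"), ("WhiteWine", "Wine")],
    non_taxonomic=[("Wine", "madeFrom", "Grape")],
)
print(sample)
```

Here two of the three terms fall under one type, so the average of 1.5 terms per type hides a mild imbalance; a per-type breakdown would show it directly.
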
Complexity Score
----------------

The **complexity score** combines topology and dataset metrics into a single normalized score in ``[0, 1]``. First, metrics are **log-normalized** and weighted by category. The category weights sum to 1:

.. list-table::
   :header-rows: 1
   :widths: 25 50 25

   * - Metric Category
     - Example Metrics
     - Weight
   * - Graph structure
     - ``total_nodes``, ``total_edges``, ``num_root_nodes``, ``num_leaf_nodes``
     - 0.30
   * - Knowledge coverage
     - ``num_classes``, ``num_properties``, ``num_individuals``
     - 0.25
   * - Hierarchy
     - ``max_depth``, ``min_depth``, ``avg_depth``, ``depth_variance``
     - 0.10
   * - Breadth
     - ``max_breadth``, ``min_breadth``, ``avg_breadth``, ``breadth_variance``
     - 0.20
   * - Dataset (LLMs4OL)
     - ``num_term_types``, ``num_taxonomic_relations``, ``num_non_taxonomic_relations``, ``avg_terms``
     - 0.15

Next, the weighted sum of the metrics is passed through a **logistic function**, which maps it to the final complexity score.


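The two steps described above (log-normalize and weight, then apply a logistic function) can be sketched as follows. The weights and metric groupings come from the table; everything else is an assumption, since the exact formula is not given here: ``log(1 + x)`` as the log-normalization, averaging within each category, and the ``midpoint`` and ``steepness`` parameters of the logistic curve are hypothetical choices, and ``complexity_sketch`` is not part of OntoLearner.

```python
import math

# Category weights and member metrics, copied from the table above.
CATEGORIES = {
    "graph_structure": (0.30, ["total_nodes", "total_edges",
                               "num_root_nodes", "num_leaf_nodes"]),
    "knowledge_coverage": (0.25, ["num_classes", "num_properties",
                                  "num_individuals"]),
    "hierarchy": (0.10, ["max_depth", "min_depth",
                         "avg_depth", "depth_variance"]),
    "breadth": (0.20, ["max_breadth", "min_breadth",
                       "avg_breadth", "breadth_variance"]),
    "dataset": (0.15, ["num_term_types", "num_taxonomic_relations",
                       "num_non_taxonomic_relations", "avg_terms"]),
}


def complexity_sketch(metrics, midpoint=2.0, steepness=1.0):
    """Illustrative complexity score; not OntoLearner's actual formula.

    `midpoint` and `steepness` are hypothetical logistic parameters.
    Metrics missing from `metrics` are treated as 0.
    """
    weighted_sum = 0.0
    for weight, names in CATEGORIES.values():
        # log(1 + x) compresses large counts and maps 0 to 0.
        log_values = [math.log1p(max(float(metrics.get(name, 0.0)), 0.0))
                      for name in names]
        weighted_sum += weight * sum(log_values) / len(log_values)

    # The logistic function squashes the weighted sum into the unit interval.
    return 1.0 / (1.0 + math.exp(-steepness * (weighted_sum - midpoint)))


# Hypothetical metric values for a small and a much larger ontology.
small = {"total_nodes": 10, "total_edges": 12, "num_classes": 8,
         "max_depth": 3, "max_breadth": 4, "num_term_types": 5}
large = {name: value * 100 for name, value in small.items()}

small_score = complexity_sketch(small)
large_score = complexity_sketch(large)
print(small_score, large_score)
```

Because both the logarithm and the logistic function are monotonically increasing, scaling every metric up can only raise the score, while the log step keeps a single very large count from dominating the sum.
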
Example Usage
-------------

Here is a simple example demonstrating how to compute metrics and complexity for an ontology:

.. code-block:: python

    from ontolearner.tools import Analyzer
    from ontolearner.ontology import Wine

    # Step 1: Load ontology
    ontology = Wine()
    ontology.build_graph()

    # Step 2: Create the analyzer
    analyzer = Analyzer()

    # Step 3: Compute topology and dataset metrics
    topology_metrics = analyzer.compute_topology_metrics(ontology)
    dataset_metrics = analyzer.compute_dataset_metrics(ontology)

    # Step 4: Compute overall complexity score
    complexity_score = analyzer.compute_complexity_score(
        topology_metrics=topology_metrics,
        dataset_metrics=dataset_metrics
    )

    # Step 5: Display results
    print("Topology Metrics:", topology_metrics)
    print("Dataset Metrics:", dataset_metrics)
    print("Ontology Complexity Score:", complexity_score)


This workflow allows ontology engineers and researchers to **quantify structural quality, dataset richness, and overall complexity**, providing actionable insights for ontology evaluation, benchmarking, and improvement.
