Centrality Metrics
Measure node importance using PageRank, betweenness, Katz, HITS, and other centrality algorithms.
Centrality metrics quantify the importance of nodes in a graph. In Lean dependency graphs, they reveal which theorems, lemmas, and definitions are most critical to the codebase.
Start with PageRank for a quick overview of important nodes, then use betweenness centrality to find critical "bridge" lemmas.
PageRank
PageRank measures the importance of a node based on the structure of incoming links. A node is important if other important nodes depend on it.
Formula
Where:
- is the damping factor (probability of following a link vs random jump)
- is the total number of nodes
- is the set of nodes pointing to (predecessors)
- is the set of nodes points to (successors)
Parameters
| Parameter | Default | Description |
|---|---|---|
alpha | 0.85 | Damping factor |
max_iter | 100 | Maximum iterations for convergence |
tol | 1e-6 | Convergence tolerance |
Interpretation in Lean
- High PageRank: Fundamental theorems that many other proofs depend on (e.g., basic lemmas in
Mathlib.Algebra) - Low PageRank: Specialized results at the "leaves" of the dependency tree
Betweenness Centrality
Betweenness measures how often a node lies on the shortest path between other nodes. High betweenness nodes are "bridges" connecting different parts of the graph.
Formula
Where:
- is the number of shortest paths from to
- is the number of those paths passing through
Parameters
| Parameter | Default | Description |
|---|---|---|
normalized | true | Normalize by for directed graphs |
k | null | Sample size for approximation (auto-set for large graphs) |
Optimization
For graphs with more than 5,000 nodes, Astrolabe automatically uses sampling to speed up computation. Results are approximate but sufficient for identifying key nodes.
For large graphs, sampling uses min(1000, n/5) random source nodes.
Interpretation in Lean
- High betweenness: "Gateway" lemmas that connect different mathematical domains
- Low betweenness: Nodes in densely connected local clusters
Katz Centrality
Katz centrality measures influence based on the total number of walks from a node, not just shortest paths. It's better suited for DAGs than PageRank.
Formula
Where:
- is the attenuation factor (should be )
- is the base centrality for all nodes
- is the adjacency matrix
Parameters
| Parameter | Default | Description |
|---|---|---|
alpha | 0.1 | Attenuation factor |
beta | 1.0 | Base centrality |
max_iter | 1000 | Maximum iterations |
normalized | true | Normalize scores |
Katz vs PageRank
| Aspect | Katz | PageRank |
|---|---|---|
| Link weight | All walks count | Only direct links |
| DAG handling | Native support | May need adaptation |
| Computation | Can diverge if too high | More stable |
HITS (Hub/Authority)
HITS identifies two types of important nodes: hubs that point to many authorities, and authorities pointed to by many hubs.
Formulas
Authority score (iterative):
Hub score (iterative):
Parameters
| Parameter | Default | Description |
|---|---|---|
max_iter | 100 | Maximum power method iterations |
tol | 1e-8 | Convergence tolerance |
normalized | true | Normalize scores |
Interpretation in Lean
| Role | Description | Example |
|---|---|---|
| Authority | Fundamental theorems pointed to by many proofs | Core type class instances |
| Hub | Comprehensive proofs that cite many sources | Major theorems combining many lemmas |
Closeness Centrality
Closeness measures how close a node is to all other nodes. High closeness nodes can reach the rest of the graph quickly.
Formula
Where is the shortest path distance from to .
Interpretation in Lean
- High closeness: Central definitions that are within short dependency chains of most theorems
- Low closeness: Peripheral or highly specialized results
Eigenvector Centrality
A node is important if it's connected to other important nodes. This is the principal eigenvector of the adjacency matrix.
Formula
Where is the largest eigenvalue of the adjacency matrix.
Parameters
| Parameter | Default | Description |
|---|---|---|
max_iter | 1000 | Maximum power method iterations |
Fallback
If the power method fails to converge (common in sparse DAGs), Astrolabe falls back to PageRank.
Comparison Summary
| Metric | Best For | Complexity | Lean Interpretation |
|---|---|---|---|
| PageRank | General importance | O(E) per iteration | Foundational theorems |
| Betweenness | Bridge detection | O(VE) | Gateway lemmas |
| Katz | DAG influence | O(E) per iteration | Total downstream impact |
| HITS | Role differentiation | O(E) per iteration | Hubs vs authorities |
| Closeness | Accessibility | O(V(V+E)) | Central definitions |
| Eigenvector | Network effects | O(E) per iteration | Recursive importance |
API Endpoints
GET /api/project/analysis/pagerank
GET /api/project/analysis/betweenness
GET /api/project/analysis/katz
GET /api/project/analysis/closeness
GET /api/project/analysis/eigenvector
Each returns a dictionary mapping node IDs to centrality scores, plus the top-k highest-scoring nodes.