Topological Data Analysis
Analyze graph topology using Betti numbers, persistent homology, and the Mapper algorithm.
Topological Data Analysis (TDA) reveals the "shape" of data using tools from algebraic topology. In dependency graphs, TDA identifies cycles, holes, and the overall connectivity structure.
Betti Numbers
Betti numbers count the topological features at each dimension.
Definition
| Betti Number | Counts | Graph Interpretation |
|---|---|---|
| β₀ | Connected components | Separate subgraphs |
| β₁ | 1-dimensional holes | Independent cycles |
| β₂ | 2-dimensional voids | (Not applicable to graphs) |
Formulas
For a graph treated as a 1-dimensional simplicial complex:
This follows from Euler's formula for the Euler characteristic:
Cyclomatic Complexity
Measures the number of independent paths through the graph. Higher values indicate more complex control flow.
Interpretation in Lean
| Metric | High Value | Low Value |
|---|---|---|
| β₀ | Disconnected modules | Single connected codebase |
| β₁ | Circular dependencies | Clean DAG structure |
| Cyclomatic | Complex proof dependencies | Linear proof chains |
Persistent Homology
Persistent homology tracks how topological features appear and disappear as we "grow" the graph through a filtration.
Filtration
A filtration is a nested sequence of subgraphs:
Astrolabe supports three filtration types:
| Type | Node Value | Interpretation |
|---|---|---|
| degree | 1 - deg(v)/max_deg | High-degree nodes appear last |
| centrality | 1 - PR(v)/max_PR | Important nodes appear last |
| distance | Shortest path from hub | Grow outward from central node |
Birth-Death Diagram
Each topological feature has:
- Birth time: When the feature appears
- Death time: When the feature merges or fills in
- Persistence: death - birth
Features with high persistence are "real" structure; low persistence are noise.
Parameters
| Parameter | Default | Description |
|---|---|---|
| filtration | "degree" | Filtration method |
| max_dimension | 1 | Maximum homology dimension |
Output
{
"persistence_diagrams": {
"0": [{"birth": 0.0, "death": 0.3, "persistence": 0.3}],
"1": [{"birth": 0.5, "death": 0.8, "persistence": 0.3}]
},
"betti_curve": [{"threshold": 0.1, "beta_0": 5, "beta_1": 2}],
"summary": {"total_features": 42, "long_lived_features": 7}
}
Persistence Entropy
Persistence entropy measures the diversity of topological feature lifespans.
Formula
Where p_i = persistence_i / Σ persistence_j
Interpretation
| Entropy | Meaning |
|---|---|
| High | Many features with similar persistence (diverse topology) |
| Low | Few dominant features (simple topology) |
Persistence Landscape
Persistence landscapes provide a stable vectorization of persistence diagrams for machine learning.
Construction
For each persistence pair (b, d) and parameter t:
This creates "tent" functions over each feature.
Parameters
| Parameter | Default | Description |
|---|---|---|
| dimension | 1 | Homology dimension |
| num_landscapes | 5 | Number of landscape functions |
| resolution | 100 | Sample points |
Use Case
Landscapes can be used as feature vectors for:
- Comparing graph structures
- Clustering similar projects
- Anomaly detection
Mapper Algorithm
The Mapper algorithm creates a simplified topological skeleton of the graph by combining filtration, covering, and clustering.
Algorithm
- Filter: Compute a function f: V → ℝ (degree, PageRank, closeness, or depth)
- Normalize: Scale values to [0, 1]
- Cover: Create overlapping intervals
- Cluster: Within each interval, find connected components
- Connect: Link clusters that share original nodes
Parameters
| Parameter | Default | Description |
|---|---|---|
| filter_func | "degree" | Filter function |
| num_intervals | 10 | Number of covering intervals |
| overlap | 0.3 | Fraction overlap (0-0.5) |
| clustering | "components" | Within-interval clustering |
Filter Functions
| Function | Description | Best For |
|---|---|---|
| degree | Node degree | Connectivity structure |
| pagerank | PageRank score | Importance hierarchy |
| closeness | Closeness centrality | Core-periphery structure |
| depth | Topological depth | Layered dependencies |
Output
The Mapper graph is a simplified representation:
{
"mapper_nodes": [
{"id": 0, "interval": 2, "members": ["node1", "node2"], "size": 2}
],
"mapper_edges": [
{"source": 0, "target": 1, "weight": 1}
],
"summary": {
"num_mapper_nodes": 15,
"num_mapper_edges": 20,
"mapper_beta_1": 3
}
}
Interpretation in Lean
| Mapper Feature | Meaning |
|---|---|
| Long chain | Linear dependency progression |
| Branch point | Theorem used by multiple proof paths |
| Loop | Circular or mutual dependencies |
| Isolated node | Self-contained module |
Application Example
Consider analyzing a Lean project's dependency structure:
- Betti numbers: β₀ = 1 (connected), β₁ = 5 (has cycles)
- Persistent homology: Long-lived H₀ features reveal major components
- Mapper with PageRank: Shows hierarchy from core theorems to applications
API Endpoints
GET /api/project/analysis/topology # Betti numbers and Euler characteristic
GET /api/project/analysis/mapper # Mapper algorithm
GET /api/project/analysis/geometry # Includes persistent homology (if GUDHI available)