Topological Data Analysis

Analyze graph topology using Betti numbers, persistent homology, and the Mapper algorithm.

Topological Data Analysis (TDA) reveals the "shape" of data using tools from algebraic topology. In dependency graphs, TDA identifies cycles, holes, and the overall connectivity structure.

Betti Numbers

Betti numbers count the topological features at each dimension.

Definition

Betti Number	Counts	Graph Interpretation
β₀	Connected components	Separate subgraphs
β₁	1-dimensional holes	Independent cycles
β₂	2-dimensional voids	(Not applicable to graphs)

Formulas

For a graph treated as a 1-dimensional simplicial complex:

$\beta_0 = \text{number of connected components}$

$\beta_1 = |E| - |V| + \beta_0$

This follows from Euler's formula for the Euler characteristic:

$\chi = |V| - |E| = \beta_0 - \beta_1$

Cyclomatic Complexity

$M = |E| - |V| + 2\beta_0$

Measures the number of independent paths through the graph. Higher values indicate more complex control flow.

Interpretation in Lean

Metric	High Value	Low Value
β₀	Disconnected modules	Single connected codebase
β₁	Circular dependencies	Clean DAG structure
Cyclomatic	Complex proof dependencies	Linear proof chains

Persistent Homology

Persistent homology tracks how topological features appear and disappear as we "grow" the graph through a filtration.

Filtration

A filtration is a nested sequence of subgraphs:

$G_0 \subseteq G_1 \subseteq \cdots \subseteq G_n = G$

Astrolabe supports three filtration types:

Type	Node Value	Interpretation
degree	1 - deg(v)/max_deg	High-degree nodes appear last
centrality	1 - PR(v)/max_PR	Important nodes appear last
distance	Shortest path from hub	Grow outward from central node

Birth-Death Diagram

Each topological feature has:

Birth time: When the feature appears
Death time: When the feature merges or fills in
Persistence: death - birth

Features with high persistence are "real" structure; low persistence are noise.

Parameters

Parameter	Default	Description
filtration	"degree"	Filtration method
max_dimension	1	Maximum homology dimension

Output

{
  "persistence_diagrams": {
    "0": [{"birth": 0.0, "death": 0.3, "persistence": 0.3}],
    "1": [{"birth": 0.5, "death": 0.8, "persistence": 0.3}]
  },
  "betti_curve": [{"threshold": 0.1, "beta_0": 5, "beta_1": 2}],
  "summary": {"total_features": 42, "long_lived_features": 7}
}

Persistence Entropy

Persistence entropy measures the diversity of topological feature lifespans.

Formula

$H = -\sum_i p_i \log(p_i)$

Where p_i = persistence_i / Σ persistence_j

Interpretation

Entropy	Meaning
High	Many features with similar persistence (diverse topology)
Low	Few dominant features (simple topology)

Persistence Landscape

Persistence landscapes provide a stable vectorization of persistence diagrams for machine learning.

Construction

For each persistence pair (b, d) and parameter t:

$\Lambda(t) = \max(0, \min(t - b, d - t))$

This creates "tent" functions over each feature.

Parameters

Parameter	Default	Description
dimension	1	Homology dimension
num_landscapes	5	Number of landscape functions
resolution	100	Sample points

Use Case

Landscapes can be used as feature vectors for:

Comparing graph structures
Clustering similar projects
Anomaly detection

Mapper Algorithm

The Mapper algorithm creates a simplified topological skeleton of the graph by combining filtration, covering, and clustering.

Algorithm

Filter: Compute a function f: V → ℝ (degree, PageRank, closeness, or depth)
Normalize: Scale values to [0, 1]
Cover: Create overlapping intervals
Cluster: Within each interval, find connected components
Connect: Link clusters that share original nodes

Parameters

Parameter	Default	Description
filter_func	"degree"	Filter function
num_intervals	10	Number of covering intervals
overlap	0.3	Fraction overlap (0-0.5)
clustering	"components"	Within-interval clustering

Filter Functions

Function	Description	Best For
degree	Node degree	Connectivity structure
pagerank	PageRank score	Importance hierarchy
closeness	Closeness centrality	Core-periphery structure
depth	Topological depth	Layered dependencies

Output

The Mapper graph is a simplified representation:

{
  "mapper_nodes": [
    {"id": 0, "interval": 2, "members": ["node1", "node2"], "size": 2}
  ],
  "mapper_edges": [
    {"source": 0, "target": 1, "weight": 1}
  ],
  "summary": {
    "num_mapper_nodes": 15,
    "num_mapper_edges": 20,
    "mapper_beta_1": 3
  }
}

Interpretation in Lean

Mapper Feature	Meaning
Long chain	Linear dependency progression
Branch point	Theorem used by multiple proof paths
Loop	Circular or mutual dependencies
Isolated node	Self-contained module

Application Example

Consider analyzing a Lean project's dependency structure:

Betti numbers: β₀ = 1 (connected), β₁ = 5 (has cycles)
Persistent homology: Long-lived H₀ features reveal major components
Mapper with PageRank: Shows hierarchy from core theorems to applications

API Endpoints

GET /api/project/analysis/topology        # Betti numbers and Euler characteristic
GET /api/project/analysis/mapper          # Mapper algorithm
GET /api/project/analysis/geometry        # Includes persistent homology (if GUDHI available)