Topological Data Analysis

Analyze graph topology using Betti numbers, persistent homology, and the Mapper algorithm.

Topological Data Analysis (TDA) reveals the "shape" of data using tools from algebraic topology. In dependency graphs, TDA identifies cycles, holes, and the overall connectivity structure.

Betti Numbers

Betti numbers count the topological features at each dimension.

Definition

Betti NumberCountsGraph Interpretation
β₀Connected componentsSeparate subgraphs
β₁1-dimensional holesIndependent cycles
β₂2-dimensional voids(Not applicable to graphs)

Formulas

For a graph treated as a 1-dimensional simplicial complex:

β0=number of connected components\beta_0 = \text{number of connected components}

β1=EV+β0\beta_1 = |E| - |V| + \beta_0

This follows from Euler's formula for the Euler characteristic:

χ=VE=β0β1\chi = |V| - |E| = \beta_0 - \beta_1

Cyclomatic Complexity

M=EV+2β0M = |E| - |V| + 2\beta_0

Measures the number of independent paths through the graph. Higher values indicate more complex control flow.

Interpretation in Lean

MetricHigh ValueLow Value
β₀Disconnected modulesSingle connected codebase
β₁Circular dependenciesClean DAG structure
CyclomaticComplex proof dependenciesLinear proof chains

Persistent Homology

Persistent homology tracks how topological features appear and disappear as we "grow" the graph through a filtration.

Filtration

A filtration is a nested sequence of subgraphs:

G0G1Gn=GG_0 \subseteq G_1 \subseteq \cdots \subseteq G_n = G

Astrolabe supports three filtration types:

TypeNode ValueInterpretation
degree1 - deg(v)/max_degHigh-degree nodes appear last
centrality1 - PR(v)/max_PRImportant nodes appear last
distanceShortest path from hubGrow outward from central node

Birth-Death Diagram

Each topological feature has:

  • Birth time: When the feature appears
  • Death time: When the feature merges or fills in
  • Persistence: death - birth

Features with high persistence are "real" structure; low persistence are noise.

Parameters

ParameterDefaultDescription
filtration"degree"Filtration method
max_dimension1Maximum homology dimension

Output

{
  "persistence_diagrams": {
    "0": [{"birth": 0.0, "death": 0.3, "persistence": 0.3}],
    "1": [{"birth": 0.5, "death": 0.8, "persistence": 0.3}]
  },
  "betti_curve": [{"threshold": 0.1, "beta_0": 5, "beta_1": 2}],
  "summary": {"total_features": 42, "long_lived_features": 7}
}

Persistence Entropy

Persistence entropy measures the diversity of topological feature lifespans.

Formula

H=ipilog(pi)H = -\sum_i p_i \log(p_i)

Where p_i = persistence_i / Σ persistence_j

Interpretation

EntropyMeaning
HighMany features with similar persistence (diverse topology)
LowFew dominant features (simple topology)

Persistence Landscape

Persistence landscapes provide a stable vectorization of persistence diagrams for machine learning.

Construction

For each persistence pair (b, d) and parameter t:

Λ(t)=max(0,min(tb,dt))\Lambda(t) = \max(0, \min(t - b, d - t))

This creates "tent" functions over each feature.

Parameters

ParameterDefaultDescription
dimension1Homology dimension
num_landscapes5Number of landscape functions
resolution100Sample points

Use Case

Landscapes can be used as feature vectors for:

  • Comparing graph structures
  • Clustering similar projects
  • Anomaly detection

Mapper Algorithm

The Mapper algorithm creates a simplified topological skeleton of the graph by combining filtration, covering, and clustering.

Algorithm

  1. Filter: Compute a function f: V → ℝ (degree, PageRank, closeness, or depth)
  2. Normalize: Scale values to [0, 1]
  3. Cover: Create overlapping intervals
  4. Cluster: Within each interval, find connected components
  5. Connect: Link clusters that share original nodes

Parameters

ParameterDefaultDescription
filter_func"degree"Filter function
num_intervals10Number of covering intervals
overlap0.3Fraction overlap (0-0.5)
clustering"components"Within-interval clustering

Filter Functions

FunctionDescriptionBest For
degreeNode degreeConnectivity structure
pagerankPageRank scoreImportance hierarchy
closenessCloseness centralityCore-periphery structure
depthTopological depthLayered dependencies

Output

The Mapper graph is a simplified representation:

{
  "mapper_nodes": [
    {"id": 0, "interval": 2, "members": ["node1", "node2"], "size": 2}
  ],
  "mapper_edges": [
    {"source": 0, "target": 1, "weight": 1}
  ],
  "summary": {
    "num_mapper_nodes": 15,
    "num_mapper_edges": 20,
    "mapper_beta_1": 3
  }
}

Interpretation in Lean

Mapper FeatureMeaning
Long chainLinear dependency progression
Branch pointTheorem used by multiple proof paths
LoopCircular or mutual dependencies
Isolated nodeSelf-contained module

Application Example

Consider analyzing a Lean project's dependency structure:

  1. Betti numbers: β₀ = 1 (connected), β₁ = 5 (has cycles)
  2. Persistent homology: Long-lived H₀ features reveal major components
  3. Mapper with PageRank: Shows hierarchy from core theorems to applications

API Endpoints

GET /api/project/analysis/topology        # Betti numbers and Euler characteristic
GET /api/project/analysis/mapper          # Mapper algorithm
GET /api/project/analysis/geometry        # Includes persistent homology (if GUDHI available)