UCM Case Study: Actionable Impact Analysis on marimo
What UCM Does (30-second version)
UCM scans your Python/TS/Rust codebase, builds a dependency graph, and answers one question: “I changed X — what else might break?”
It outputs:
- Impact Report — every entity affected by your change, with confidence scores and the exact dependency path
- Test Intent — prioritized list of what to test, what risks exist, and what coverage gaps you have
- Ambiguity Flags — where the graph has low-confidence or stale edges (things the tool isn’t sure about)
Getting Started
Install and build
git clone https://github.com/paritoshk/ucm-core
cd ucm-core
cargo build --release
# Binary at: target/release/ucm
Scan your codebase
# TypeScript (default)
ucm scan ./my-project --language typescript
# Python — specify the package root for absolute import resolution
ucm scan ./marimo --language python --package-root marimo
# Rust
ucm scan ./my-crate --language rust
# Large repos (>500 entities) need --no-limit
ucm scan ./marimo --language python --package-root marimo --no-limit
Run impact analysis before your PR
# "I changed the execute_cell method in executor.py — what might break?"
ucm impact _runtime/executor.py "Executor.execute_cell" \
--path ./marimo --language python --package-root marimo --no-limit --json
Get test recommendations
# "What should I test before merging?"
ucm intent _runtime/executor.py "Executor.execute_cell" \
--path ./marimo --language python --package-root marimo --no-limit --json
Export full graph for custom analysis
ucm graph ./marimo --language python --package-root marimo --no-limit --export json > graph.json
How to Read the JSON Outputs
Impact Report (ucm impact --json)
{
"changes": [{ "entity_id": "...", "name": "Executor.execute_cell", "file_path": "_runtime/executor.py" }],
"direct_impacts": [...],
"indirect_impacts": [...],
"not_impacted": [...],
"ambiguities": [...],
"stats": { "total_entities": 8117, "directly_impacted": 1, "indirectly_impacted": 2 }
}
Each impact entry looks like:
{
"name": "hash.py",
"confidence": 0.95,
"tier": "High",
"depth": 1,
"path": ["_ast/visitor.py#ScopedVisitor", "_save/hash.py#module"],
"reason": "imports via ScopedVisitor",
"explanation_chain": {
"summary": "hash.py is impacted by this change",
"steps": [{
"step": 1,
"evidence": "Graph traversal found dependency path: ScopedVisitor -> hash.py",
"inference": "hash.py is transitively dependent via 1 hop",
"confidence": 0.95
}]
}
}
What to do with this:
- `direct_impacts`: these MUST be tested. They directly depend on what you changed.
- `indirect_impacts`: test these if confidence > 0.7. The `path` field shows you the exact dependency chain so you can trace WHY.
- `not_impacted`: UCM says these are safe to skip. The `reason` field explains why (no graph path, or confidence below threshold).
- `depth`: how many hops away. Depth 1 = direct consumer. Depth 4+ = likely safe to skip.
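The triage rules above can be sketched in a few lines of Python. This is a minimal sketch, not part of UCM: `load_report`, `triage_impacts` and the sample data are hypothetical, and it relies only on the `direct_impacts`, `indirect_impacts`, `name`, `confidence` and `depth` fields shown in the report example.

```python
import json


def load_report(path):
    """Read a saved `ucm impact --json` report (the file name is up to you)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def triage_impacts(report, min_indirect_confidence=0.7):
    """Split a report into must-test entries and worth-testing indirect entries."""
    must_test = list(report.get("direct_impacts", []))
    should_test = [
        entry
        for entry in report.get("indirect_impacts", [])
        if entry.get("confidence", 0.0) > min_indirect_confidence
    ]
    # Highest confidence first; shallower depth breaks ties.
    should_test.sort(key=lambda e: (-e["confidence"], e.get("depth", 0)))
    return must_test, should_test


# Hand-written sample in the documented shape, not real UCM output.
sample = {
    "direct_impacts": [{"name": "Executor", "confidence": 0.99, "depth": 1}],
    "indirect_impacts": [
        {"name": "DefaultExecutor", "confidence": 0.89, "depth": 2},
        {"name": "StrictExecutor", "confidence": 0.55, "depth": 3},
    ],
}

must_test, should_test = triage_impacts(sample)
print("must test:", [e["name"] for e in must_test])
print("should test:", [e["name"] for e in should_test])
```

In real use, replace `sample` with `load_report("impact.json")`, where `impact.json` is wherever you redirected the command's output.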
Test Intent Report (ucm intent --json)
{
"high_confidence": [
{
"description": "Verify Executor still functions correctly after change",
"rationale": "contains via Executor.execute_cell",
"confidence": 0.99,
"related_entity": "Executor"
}
],
"risks": [
{
"severity": "High",
"description": "Executor directly depends on changed code — regression risk",
"mitigation": "Run existing tests for Executor and verify expected behavior"
},
{
"severity": "Medium",
"description": "DefaultExecutor is indirectly affected via 2-hop chain with 89% confidence",
"mitigation": "Integration test covering the path: Executor.execute_cell -> Executor -> DefaultExecutor"
}
],
"coverage_gaps": [
{
"entity": "DefaultExecutor",
"description": "DefaultExecutor is impacted but has no linked test coverage in the graph",
"recommendation": "Add test coverage for DefaultExecutor focusing on the changed behavior"
}
],
"decided_not_to_test": [
{ "entity": "marimo_path.py", "reason": "No graph path exists to changed entities", "confidence_of_safety": 0.9 }
]
}
What to do with this:
- `high_confidence`: your PR checklist. Write/run these tests before merging.
- `risks`: include these in your PR description. The `mitigation` field is the action item.
- `coverage_gaps`: these are the entities UCM flagged as impacted but having NO test. This is where you add new tests.
- `decided_not_to_test`: UCM’s reasoning for what it says is safe to skip. Review these if your PR is high-risk.
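One way to turn the intent report into PR text is sketched below. `render_pr_checklist` is a hypothetical helper, not something UCM ships; it assumes only the field names shown in the example report above.

```python
def render_pr_checklist(intent):
    """Format a test intent report as plain-text lines for a PR description."""
    lines = ["Tests to run before merging:"]
    for item in intent.get("high_confidence", []):
        lines.append(
            f"- [ ] {item['description']} (confidence {item['confidence']:.2f})"
        )
    lines.append("Risks:")
    for risk in intent.get("risks", []):
        lines.append(
            f"- {risk['severity']}: {risk['description']}. "
            f"Mitigation: {risk['mitigation']}"
        )
    lines.append("Coverage gaps:")
    for gap in intent.get("coverage_gaps", []):
        lines.append(f"- {gap['entity']}: {gap['recommendation']}")
    return "\n".join(lines)


# Hand-written sample copied from the report shape above, trimmed for brevity.
sample_intent = {
    "high_confidence": [
        {
            "description": "Verify Executor still functions correctly after change",
            "confidence": 0.99,
        }
    ],
    "risks": [
        {
            "severity": "High",
            "description": "Executor directly depends on changed code",
            "mitigation": "Run existing tests for Executor",
        }
    ],
    "coverage_gaps": [
        {
            "entity": "DefaultExecutor",
            "recommendation": "Add test coverage for DefaultExecutor",
        }
    ],
}

print(render_pr_checklist(sample_intent))
```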
How the Confidence Math Works
UCM uses three mathematical models from the research literature:
1. Noisy-OR Fusion (Google Knowledge Vault, KDD 2014)
When multiple evidence sources confirm a dependency:
P(edge exists) = 1 - product(1 - P(source_i))
Example: Static analysis says 92% confident, test coverage confirms at 75%.
Result: 1 - (1-0.92)(1-0.75) = 1 - 0.08*0.25 = 0.98 (98% confident).
Sources that agree compound confidence. This is why edges confirmed by both code analysis and test coverage are very high confidence.
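The fusion formula is short enough to check by hand. Here is a Python sketch of the same arithmetic; it illustrates the formula and is not the Rust code in ucm-core/confidence.

```python
import math


def noisy_or(probabilities):
    """P(edge exists) = 1 - product(1 - P(source_i))."""
    remaining_doubt = math.prod(1.0 - p for p in probabilities)
    return 1.0 - remaining_doubt


# Static analysis at 0.92 plus test coverage at 0.75, as in the example above.
fused = noisy_or([0.92, 0.75])
print(round(fused, 2))  # 0.98
```

Adding a source can only raise the fused value: a third source at 0.5 would halve the remaining doubt again.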
2. Temporal Decay (TempValid, ACL 2024)
Confidence isn’t permanent. Each edge’s confidence decays with the time elapsed since it was last verified:
confidence(t) = base_confidence * exp(-lambda * days_since_verified)
Decay rates by edge type:
- Import statements: `lambda=0.001`, very slow (imports rarely become invalid)
- Call graph edges: `lambda=0.005`, slow
- Test coverage: `lambda=0.01`, moderate (tests go stale)
- API traffic: `lambda=0.1`, fast (traffic patterns change daily)
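To get a feel for these rates, the decay formula can be evaluated directly. In the sketch below the dictionary keys are hypothetical labels chosen for this example; only the lambda values come from the list above.

```python
import math

# Lambda values from the list above; the key names are illustrative.
DECAY_RATES = {
    "import": 0.001,
    "call": 0.005,
    "test_coverage": 0.01,
    "api_traffic": 0.1,
}


def decayed_confidence(base_confidence, edge_type, days_since_verified):
    """confidence(t) = base_confidence * exp(-lambda * days_since_verified)."""
    rate = DECAY_RATES[edge_type]
    return base_confidence * math.exp(-rate * days_since_verified)


def half_life_days(edge_type):
    """Days until an edge of this type loses half of its confidence."""
    return math.log(2) / DECAY_RATES[edge_type]


for edge_type in DECAY_RATES:
    after_30_days = decayed_confidence(0.95, edge_type, 30)
    print(edge_type, round(after_30_days, 3), round(half_life_days(edge_type), 1))
```

The half-life is ln(2)/lambda, so an import edge takes roughly 693 days to lose half its confidence, while an API traffic edge takes about 7 days.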
3. Chain Confidence (BFS propagation)
When A -> B -> C, the confidence that changing A impacts C:
P(A impacts C) = P(A->B) * P(B->C)
Each hop multiplicatively reduces confidence. A 4-hop chain at 0.95 per edge = 0.95^4 = 0.81. This is why depth matters — deeper impacts are less certain.
For multiple paths (A->B->C and A->D->C), UCM uses Noisy-OR over the path confidences.
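Both rules combine as in the Python sketch below. The per-edge values in the two-path example are made up for illustration, and the function names are only labels for this sketch.

```python
import math


def chain_confidence(edge_confidences):
    """Multiply per-edge confidences along a single path."""
    return math.prod(edge_confidences)


def multi_path_confidence(paths):
    """Noisy-OR over the chain confidence of each alternative path."""
    path_scores = [chain_confidence(path) for path in paths]
    return 1.0 - math.prod(1.0 - score for score in path_scores)


# Four hops at 0.95 per edge, as in the text above.
print(round(chain_confidence([0.95] * 4), 2))  # 0.81

# A->B->C at (0.9, 0.9) and A->D->C at (0.8, 0.7): illustrative values only.
combined = multi_path_confidence([[0.9, 0.9], [0.8, 0.7]])
print(round(combined, 4))
```

Here the individual paths score 0.81 and 0.56, and the combined value (about 0.92) is higher than either, because a second independent route makes the impact more likely.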
What the Tiers Mean
- High (>=0.85) — definitely impacted, definitely test this
- Medium (0.60-0.84) — probably impacted, test if time permits
- Low (<0.60) — might be impacted, low priority
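The tier boundaries translate into a small lookup. This sketch assumes anything from 0.60 up to, but not including, 0.85 counts as Medium, since the ranges above do not say where a value such as 0.845 falls.

```python
def tier_for(confidence):
    """Map a confidence score to the tier names used in the impact report."""
    if confidence >= 0.85:
        return "High"
    if confidence >= 0.60:
        return "Medium"
    return "Low"


print(tier_for(0.95))       # High
print(tier_for(0.95 ** 4))  # Medium: a four-hop chain at 0.95 per edge
print(tier_for(0.40))       # Low
```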
Which UCM Modules Do What
| Crate | Purpose | Key Function |
|---|---|---|
| `ucm-ingest/code_parser` | Scans source files, extracts entities (functions, classes, modules) and edges (imports, contains, extends) | `parse_source_code_full()` |
| `ucm-core/graph` | Stores the dependency graph (petgraph), resolves entity lookups, computes reverse dependencies | `UcmGraph`, `reverse_deps()` |
| `ucm-core/confidence` | Noisy-OR fusion, temporal decay, chain confidence math | `noisy_or()`, `temporal_decay()`, `chain_confidence()` |
| `ucm-core/edge` | Edge model with confidence scoring, evidence tracking, decay rates | `UcmEdge`, `decayed_confidence()` |
| `ucm-reason/impact` | Reverse BFS from changed entities, classifies direct/indirect/not-impacted | `analyze_impact()`, `impact_bfs()` |
| `ucm-reason/intent` | Converts impact report into test recommendations, risks, coverage gaps | `generate_test_intent()` |
| `ucm-reason/ambiguity` | Flags low-confidence edges, stale data, conflicting sources | `detect_ambiguities()` |
| `ucm-reason/explanation` | Builds traceable reasoning chains for every conclusion | `ExplanationChain` |
| `ucm-cli` | CLI interface, file walking, package detection | `ucm scan/graph/impact/intent` |
Data Flow
Source files -> code_parser -> UcmEvents -> GraphProjection -> UcmGraph
|
analyze_impact() <- changed entity IDs
|
ImpactReport
|
generate_test_intent()
|
TestIntent (JSON)
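The impact step in this flow is described as a reverse BFS from the changed entities. Below is a Python sketch of that idea, not the Rust implementation: `reverse_impact_bfs` is a hypothetical name, the edge confidences are illustrative, and keeping only the best-scoring path per entity is a simplifying assumption (the text above says UCM fuses multiple paths with Noisy-OR).

```python
from collections import defaultdict, deque


def reverse_impact_bfs(edges, changed):
    """Walk from changed entities to everything that depends on them.

    edges: iterable of (dependent, dependency, confidence) triples.
    Returns (direct, indirect): dicts of entity -> (confidence, depth).
    """
    changed = set(changed)
    dependents = defaultdict(list)
    for dependent, dependency, confidence in edges:
        dependents[dependency].append((dependent, confidence))

    best = {}  # entity -> (chain confidence, depth) of its best path so far
    queue = deque((entity, 1.0, 0) for entity in changed)
    while queue:
        entity, confidence, depth = queue.popleft()
        for dependent, edge_confidence in dependents[entity]:
            if dependent in changed:
                continue
            candidate = confidence * edge_confidence
            # Only revisit an entity when a strictly better path turns up,
            # which also keeps cycles from looping forever.
            if candidate > best.get(dependent, (0.0, 0))[0]:
                best[dependent] = (candidate, depth + 1)
                queue.append((dependent, candidate, depth + 1))

    direct = {name: hit for name, hit in best.items() if hit[1] == 1}
    indirect = {name: hit for name, hit in best.items() if hit[1] > 1}
    return direct, indirect


# Executor.execute_cell -> Executor -> DefaultExecutor, with made-up weights.
edges = [
    ("Executor", "Executor.execute_cell", 0.99),
    ("DefaultExecutor", "Executor", 0.90),
]
direct, indirect = reverse_impact_bfs(edges, ["Executor.execute_cell"])
print(direct)
print(indirect)
```

With these weights `Executor` comes out as a depth-1 impact at 0.99 and `DefaultExecutor` as a depth-2 impact at about 0.89.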
marimo Validation Results
What We Scanned
| Metric | Value |
|---|---|
| Python files | 1,108 |
| Lines of code | 170,240 |
| Entities discovered | 8,117 (5,832 functions, 1,177 classes, 1,108 modules) |
| Total edges | 14,582 |
| Import edges | 1,295 |
| Contains edges | 3,402 |
| Extends edges | 156 |
| DependsOn edges | 9,729 |
| Largest connected component | 6,386 nodes (78% of graph) |
Critical Finding: Absolute Imports
marimo has 2,470 absolute imports and zero relative imports. Before our parser fix, UCM produced zero cross-module edges. After: 1,295 import edges and 78% of the graph connected.
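To illustrate why the package root matters, here is a Python sketch of mapping an absolute import onto a file path relative to that root. It is not UCM's parser (which is Rust and regex-based); `resolve_absolute_import` and its lookup rules are assumptions made for this example.

```python
from pathlib import PurePosixPath


def resolve_absolute_import(module, package_root, known_files):
    """Map 'marimo._runtime.executor' to '_runtime/executor.py' if it exists.

    known_files holds paths relative to the package root. Returns None for
    modules outside the package (stdlib, third-party) or unknown files.
    """
    parts = module.split(".")
    if parts[0] != package_root:
        return None
    relative = parts[1:]
    if not relative:
        candidates = ["__init__.py"]
    else:
        base = PurePosixPath(*relative)
        # A dotted name can be a module file or a package directory.
        candidates = [f"{base}.py", f"{base}/__init__.py"]
    for candidate in candidates:
        if candidate in known_files:
            return candidate
    return None


files = {"__init__.py", "_runtime/executor.py", "_ast/__init__.py", "_ast/visitor.py"}
print(resolve_absolute_import("marimo._runtime.executor", "marimo", files))
print(resolve_absolute_import("marimo._ast", "marimo", files))
print(resolve_absolute_import("numpy.linalg", "marimo", files))
```

Without the package root, the leading `marimo.` segment has nothing to match against, so an import like this cannot be tied to a file in the scan.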
5 Impact Scenarios
| Scenario | What Changed | Direct | Indirect | Interpretation |
|---|---|---|---|---|
| A | `Executor.execute_cell` (runtime method) | 1 | 2 | Contained within class hierarchy, low blast radius |
| A’ | `_runtime/runtime.py` (module-level) | 118 | 64 | Widest blast radius in codebase, 182 total impacts |
| B | `ScopedVisitor` (AST class) | 1 | 93 | Wide cascade through cell compilation pipeline |
| C | `DirectedGraph` (dependency tracking) | 2 | 24 | Moderate cascade through execution scheduler |
| D | `slider` (UI plugin) | 0 | 0 | Leaf node, correctly identified as isolated |
| E | `flatten` (utility) | 1 | 10 | Cross-cutting but shallow |
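What this shows: UCM differentiates blast radius across these scenarios. A module-level change to the core runtime cascades widely (182 impacts for _runtime/runtime.py), while a change to a UI plugin stays isolated (0 impacts for slider). This matches architectural intuition and manual code review.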
Test Intent for Scenario A
UCM recommended 4 high-priority test scenarios, identified 3 risks, and flagged 3 coverage gaps — specifically that Executor, DefaultExecutor, and StrictExecutor are impacted but have no linked test coverage in the graph.
Actionable output: Before merging a PR that touches execute_cell, a developer should run tests covering those 3 classes and verify the executor chain still works end-to-end.
What a Developer Should Do Before a PR
- Run `ucm impact` and `ucm intent` on the files/symbols you changed
- Check `direct_impacts` (impact report): these are your must-test list
- Check `risks` (intent report): paste the high-severity ones into your PR description
- Check `coverage_gaps` (intent report): if UCM says an impacted entity has no tests, that’s your signal to add one
- Check `decided_not_to_test` (intent report): verify UCM’s reasoning makes sense for your change
Example PR workflow
# After making changes to _ast/visitor.py
ucm intent _ast/visitor.py ScopedVisitor \
--path . --language python --package-root marimo --no-limit
# Output tells you:
# MUST TEST: hash.py, cell_manager.py, app.py (direct dependencies)
# RISKS: 93 indirect impacts through AST pipeline
# COVERAGE GAPS: SerialRefs, BasePersistenceLoader have no tests
# SAFE TO SKIP: UI plugins, tutorials, smoke tests (no graph path)
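If you want to script this workflow, a thin Python wrapper around the CLI is one option. The sketch below uses only the flags shown in this document; it assumes `ucm` is on your PATH and that `--json` writes the report to stdout, neither of which is guaranteed here. `ucm_command` and `run_ucm` are hypothetical helper names.

```python
import json
import subprocess


def ucm_command(subcommand, file_path, symbol, repo_path=".", package_root="marimo"):
    """Build the argument list for `ucm impact` or `ucm intent`."""
    return [
        "ucm", subcommand, file_path, symbol,
        "--path", repo_path,
        "--language", "python",
        "--package-root", package_root,
        "--no-limit",
        "--json",
    ]


def run_ucm(subcommand, file_path, symbol, **options):
    """Run the command and parse its stdout as JSON (assumed output channel)."""
    completed = subprocess.run(
        ucm_command(subcommand, file_path, symbol, **options),
        capture_output=True,
        text=True,
        check=True,  # raise if ucm exits with a non-zero status
    )
    return json.loads(completed.stdout)


# Printing the command is safe even when ucm is not installed.
print(" ".join(ucm_command("intent", "_ast/visitor.py", "ScopedVisitor")))
```

Calling `run_ucm("intent", "_ast/visitor.py", "ScopedVisitor")` from the marimo checkout would then return the intent report as a dictionary.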
Known Limitations
| Limitation | Impact | Workaround |
|---|---|---|
| `__init__.py` re-exports not resolved | ~48% of absolute imports are not resolved to an edge | Use `--package-root`; the file is still tracked as a module |
| No call-site detection | Only import/contains/extends edges, no Calls edges | Impact analysis uses module-level granularity for unresolved calls |
| Regex-based parsing (not tree-sitter) | May miss complex syntax (decorators, comprehensions) | Covers >95% of standard def/class/import patterns |
| No server mode | Must re-scan for each command | Graph persistence is on the roadmap |
| Confidence starts fresh each scan | No persistence of historical confidence | Edges from static analysis have very slow decay (lambda=0.001) |
Reproducing This Case Study
# 1. Clone marimo
git clone --depth 1 https://github.com/marimo-team/marimo ~/marimo
# 2. Build UCM (clone https://github.com/paritoshk/ucm-core first if you have not already)
cd ucm-core && cargo build --release
# 3. Run the full pipeline
cd case-study/marimo
./run.sh ~/marimo ../../target/release/ucm
# 4. Analyze graph
python3 analyze_graph.py results/graph.json
# 5. Find contribution targets (high-impact, untested code)
python3 contribution_analysis.py