Komment helps teams capture and retain crucial insights about their codebases.
More specifically, a Komment wiki lays out the what, the why and the how for every asset in a repository, explaining not only its internal implementation and structure, but also its purpose and how it fits into the larger project.
The latter is no easy feat. It requires indexing the codebase in a manner that gives an LLM relevant context for any arbitrarily chosen node.
Read on to learn how Adaptive Context Cruising, our proprietary approach to mapping complex codebases, helps extract non-obvious insights from large projects and deliver context-rich wikis to end users.

Why context matters
A naive documentation system would loop through every file in a repository, submit its content to a language model, and request some version of “document this for me, pretty please”.
Recent advancements in LLM technology, particularly in reasoning models, have made this approach surprisingly effective. But this works well only for self-contained files or boilerplate code, like simple CRUD modules.
In real-world projects, files rarely exist in isolation. They interact with many other portions of the codebase, and a vanilla LLM lacks any meaningful visibility into what these referenced entities actually are.
This inevitably leads to vague, incomplete, or entirely unhelpful explanations about the project. For example, suppose a Python file imports several functions from another module via the following call:

```python
from utils import process_data, validate_input
```

If the model doesn’t know what `process_data` or `validate_input` do, it can only guess — or worse, make something up.
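A minimal sketch of that naive approach (assuming a hypothetical `llm` client with a `complete` method) makes the limitation concrete: each file is documented in isolation, so imports like `process_data` remain opaque to the model.

```python
from pathlib import Path

def document_naively(repo_root: str, llm) -> dict[str, str]:
    """Naive approach: document every file in isolation, with no
    visibility into the entities it imports or references."""
    docs = {}
    for path in Path(repo_root).rglob("*.py"):
        source = path.read_text()
        # The model sees only this one file's content -- imported
        # helpers like process_data remain opaque to it.
        prompt = f"Document this file:\n\n{source}"
        docs[str(path)] = llm.complete(prompt)  # hypothetical LLM client
    return docs
```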
To ensure our wikis provide genuinely useful insights, we need a mechanism to supply only the right context to the LLM without overwhelming its context window.
Our experiments with context extraction
Over the past year, we’ve explored multiple methods to provide the LLM with the right documentation context.
Static analysis was an obvious first choice, as it allowed us to map function dependencies, class hierarchies, and file imports. However, it struggled with dynamic languages and fell short when handling implicit references across the repository.
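For illustration, the kind of import mapping we leaned on can be sketched with Python’s standard `ast` module; the comments note exactly where dynamic references slip through this net.

```python
import ast
from pathlib import Path

def map_imports(repo_root: str) -> dict[str, set[str]]:
    """Statically map each Python file to the modules it imports.
    Dynamic patterns (importlib, __getattr__, string-based plugin
    registries) are invisible to this kind of analysis."""
    deps: dict[str, set[str]] = {}
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text())
        except SyntaxError:
            continue  # skip files that don't parse
        imports = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                imports.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imports.add(node.module)
        deps[str(path)] = imports
    return deps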
We also experimented with retrieval-augmented generation (RAG), precomputing a searchable index of function definitions, class documentation, and architectural metadata. While promising, this method often led to unnecessary retrievals, overwhelming the LLM with redundant or irrelevant context.
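A simplified sketch of such a RAG pipeline (with a hypothetical `embed` function standing in for the embedding model) shows the core failure mode: fixed top-k retrieval returns k snippets whether or not they are all relevant.

```python
import numpy as np

def build_index(snippets: list[str], embed) -> np.ndarray:
    """Precompute embeddings for function definitions, docstrings, etc."""
    return np.stack([embed(s) for s in snippets])

def retrieve(query: str, snippets: list[str], index: np.ndarray,
             embed, k: int = 10) -> list[str]:
    """Fixed top-k retrieval by cosine similarity: always returns k
    snippets, relevant or not, which is how redundant context ends up
    in the prompt."""
    q = embed(query)
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    top = np.argsort(sims)[::-1][:k]
    return [snippets[i] for i in top]
```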
We even tried handcrafted heuristics to create framework-specific rules that detect critical dependencies. However, these quickly became brittle and failed to generalize across diverse codebases.
Ultimately, none of these approaches struck the right balance between precision, efficiency, and scalability.
That is, until now.
Adaptive Context Cruising
Our efforts led us to a fundamental realization: rather than dictating what context should be retrieved, we needed a system where the LLM could proactively decide what information was relevant.
We also quickly realized that giving an LLM unrestricted retrieval capabilities led to inefficiencies. When left unchecked, the LLM either pulled in too much irrelevant data or failed to recognize critical dependencies.
The LLM needed autonomy to request relevant information but also required structured guidance and constraints for efficiency.
So we designed an iterative, self-refining mechanism where the agent dynamically determines its own context requirements, but within the limits of pre-determined guardrails and continuous course correction.
Through a combination of guided workflows, structured function calling, and recursive context aggregation, our system is able to navigate a repository and construct a chain-of-thought analysis that adapts to specific documentation tasks at hand.
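To illustrate the structured function calling involved (written in the OpenAI function-calling style purely as an example, not our exact schema), the two exploration commands described in the next section might be exposed to the model like this:

```python
# Illustrative tool definitions in the OpenAI function-calling style;
# the schema Komment actually uses is not shown here.
EXPLORATION_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "ls",
            "description": "List the assets within a folder of the repository.",
            "parameters": {
                "type": "object",
                "properties": {
                    "directory": {"type": "string",
                                  "description": "Path relative to the repo root."}
                },
                "required": ["directory"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "cat",
            "description": "Read the content of a file in the repository.",
            "parameters": {
                "type": "object",
                "properties": {
                    "file": {"type": "string",
                             "description": "Path relative to the repo root."}
                },
                "required": ["file"],
            },
        },
    },
]
```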

How we build adaptive context
Here’s an overview of how our context mapper works (a condensed code sketch follows the list):
- The system starts with a given file’s content, embedded in a predefined Jinja template that we have empirically found to work well for the language and specific task at hand.
- When the system encounters an unmapped entity — like an imported module — it can issue commands to explore the repository.
- We allow the system access to two commands (for now):
  - `ls <directory>` to list the assets within a folder
  - `cat <file>` to read the content of a file
- Our architecture allows extending this to any system command that aids context building, such as `grep`, `tail`, or `stat`.
- We give our algorithm a budget of N queries per asset, which forces it to be selective.
- Once the agent gathers enough context or exhausts its queries, it prunes unnecessary information that adds little value to the programmatic context.
- The system uses this captured context to generate meaningful documentation for the file it started with in Step 1.
- Repeat this across the repository until all assets are mapped and the codebase is exhaustively explored.
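Pulling these steps together, here is a condensed sketch of the loop. The `llm.chat` client, `run_command` executor, and prompt template are hypothetical stand-ins (it reuses the `EXPLORATION_TOOLS` definitions sketched earlier); the production engine adds guardrails and pruning passes not shown here.

```python
from jinja2 import Template

# Hypothetical prompt template; the real templates are tuned per language and task.
PROMPT = Template(
    "You are documenting {{ path }}.\n"
    "Use ls/cat to resolve unknown entities (budget: {{ budget }} queries).\n\n"
    "{{ content }}"
)

def build_docs(path: str, content: str, llm, run_command, budget: int = 8) -> str:
    """Let the model request repository context iteratively, within a fixed
    query budget, then generate documentation from whatever it gathered."""
    messages = [{"role": "user",
                 "content": PROMPT.render(path=path, content=content, budget=budget)}]
    for _ in range(budget):
        reply = llm.chat(messages, tools=EXPLORATION_TOOLS)  # hypothetical client
        if not reply.tool_calls:  # the model decided it has enough context
            break
        for call in reply.tool_calls:  # e.g. ls("src/") or cat("src/utils.py")
            output = run_command(call.name, call.arguments)  # sandboxed executor
            messages.append({"role": "tool", "content": output})
    messages.append({"role": "user",
                     "content": "Prune irrelevant context, then document the file."})
    return llm.chat(messages).content
```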
With adaptive context in place, our system no longer blindly retrieves files based on fixed rules. Instead, it autonomously determines and requests only what’s actually needed. This makes the documentation engine efficient, targeted, and scalable.

Just like a car’s cruise control requires periodic driver intervention to stay on track, our system keeps the agent guided within well-defined constraints, preventing over-exploration while maximizing relevance.
We then use this abstraction to build out rich wikis with best-in-class insights that meaningfully help our users work with unfamiliar codebases.
Uncovering a codebase’s hidden topography
Beyond significantly enhancing the value of insights offered by our wikis, this technique unlocked an unexpected benefit — a new way to visualize a codebase, one that exposes hidden structural relationships and interdependencies within a repository.
By logging relationships for each relevant node, the system builds a structured blueprint of the repository’s adaptive context; a short graph-building sketch follows the list below.
- Each node in the graph represents a repository asset, and a directed edge indicates that the agent pulled in one file (the source) while attempting to document another (the sink).
- Larger nodes have more inbound edges, which tends to signal higher implementation complexity.
- The redder a node, the more crucial it is to understanding the codebase: red nodes have more connected sinks, with several neighboring files pulling them in to build their context.
- Clusters emerge naturally around distinct functionalities in the terrain.
- Tests often form self-contained clusters that are weakly linked to the central structure. Build scripts and docs also tend to group together.

- Being a force-directed visualization, the closer a file sits to the center of the graph, the more clusters it influences.
  - For example, crates/uv/Cargo.toml is referenced by more than 50 files in the `uv` project.
- Superstructures begin to appear for very large production-grade repos, like the `transformers` library shown below.
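For the curious, a graph like this can be assembled from the mapper’s logs in a few lines. The sketch below uses networkx purely as an illustration, with in-degree driving node size and out-degree driving redness, per the bullets above.

```python
import networkx as nx

def build_context_graph(edges: list[tuple[str, str]]) -> nx.DiGraph:
    """edges: (source, sink) pairs from the mapper's logs, where `source`
    is the file pulled in while the agent documented `sink`."""
    graph = nx.DiGraph()
    graph.add_edges_from(edges)
    return graph

def node_metrics(graph: nx.DiGraph) -> dict[str, tuple[int, int]]:
    """Per node: (size, redness). In-degree ~ context the file needed
    (implementation complexity); out-degree ~ how often other files
    pulled it in (cruciality to understanding the codebase)."""
    return {n: (graph.in_degree(n), graph.out_degree(n)) for n in graph.nodes}
```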
This topographical visualization offers developers a new perspective on their codebases, clearly revealing critical dependencies and areas for improvement.
Patterns in these graphs reveal deeper insights into code quality and maintainability, and we’ve been hard at work integrating them into our documentation engine. Expect to see sharper practical recommendations included in your Komment wikis soon 🚀

Practical recommendations for your codebase
Adaptive context is now foundational to how Komment uncovers valuable insights about a project and builds highly contextual wikis. Newer releases of Komment will also incorporate suggestions on how users can meaningfully improve a given codebase.
For example, a large red node on the context graph (i.e., a complex file with high relevance in the repo) is a strong candidate for refactoring into smaller, modular components to enhance maintainability.
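Building on the graph sketch above, flagging those candidates is straightforward; the thresholds here are illustrative, not Komment’s actual heuristics.

```python
def refactor_candidates(graph, min_in: int = 20, min_out: int = 20) -> list[str]:
    """Flag 'large red nodes': files that are both complex (high in-degree)
    and widely depended upon (high out-degree). Thresholds are illustrative."""
    return [n for n in graph.nodes
            if graph.in_degree(n) >= min_in and graph.out_degree(n) >= min_out]
```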
The separation of a test cluster from the central structure reveals the robustness of the testing suite far better than conventional code coverage metrics. Adaptive context shows how well tests align with the logic they’re meant to validate, rather than just checking whether certain code was executed.
And small clusters that are completely detached from the central graph often indicate deprecated assets that can be removed to keep the codebase clean and maintainable. Similar recommendations could apply to security and scalability, highlighting areas for improvement in code structure and resilience.
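As one hedged sketch of how such detached clusters could be detected (again using networkx as an illustration, with an arbitrary size cutoff): small weakly connected components separate from the central structure are candidates for cleanup.

```python
import networkx as nx

def detached_clusters(graph: nx.DiGraph, max_size: int = 5) -> list[set[str]]:
    """Small weakly connected components, disconnected from the central
    structure, often correspond to deprecated assets."""
    components = sorted(nx.weakly_connected_components(graph), key=len, reverse=True)
    # Skip the largest component (the central structure); keep small stragglers.
    return [c for c in components[1:] if len(c) <= max_size]
```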
Most immediately, though, we plan to open-source the library as a standalone tool (stay tuned for a release!) and allow developers to play around with these adaptive context graphs.
How would access to such actionable insights affect your team’s workflow? Tell us what you think!
🔍 Browse our public wikis to see how teams use Komment to gain valuable insights about their codebases. Or build a wiki for your favorite project today — get started here!