MarkdownIndexer: vendorable C# markdown-to-tree parser inspired by PageIndex

A quick note about a tool I built this week: MarkdownIndexer, a C# markdown indexing library that you vendor by copying two .cs files into your project. The vendored files have zero NuGet dependencies and use only the BCL.

The idea

PageIndex popularized vectorless RAG: instead of chunking documents and searching via embeddings, you build a hierarchical tree (like a table of contents with summaries) and let an LLM reason its way to the right section. It reportedly achieved 98.7% on FinanceBench, versus roughly 30-50% for vector RAG on complex documents.

PageIndex is Python. I ported the markdown path to C#.

How it works

.md file → extract headers → slice text per section → build tree → assign IDs → JSON
                                       │
                                  [optional: thin small subtrees]
                                  [optional: LLM summarizes each node]
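
To make the first four stages of that pipeline concrete, here is a rough sketch of how they could fit together. It is not the library's actual code: SketchNode and TreeSketch are hypothetical names, and I'm assuming ATX-style headers only, section text that includes its own header line, and zero-padded node IDs assigned in document order.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical node type for illustration; not MarkdownIndexer's actual class.
public sealed class SketchNode
{
    public string Title = "";
    public int Level;
    public string NodeId = "";
    public string Text = "";
    public List<SketchNode> Children = new List<SketchNode>();
}

public static class TreeSketch
{
    // Assumption: only ATX headers ("# Title" through "###### Title") are recognized.
    static readonly Regex Header = new Regex(@"^(#{1,6})\s+(.+?)\s*$");

    public static List<SketchNode> Build(string markdown)
    {
        var lines = markdown.Replace("\r\n", "\n").Split('\n');
        var flat = new List<SketchNode>();
        var starts = new List<int>();
        bool inFence = false;

        // Step 1: extract headers, ignoring '#' lines inside fenced code blocks.
        for (int i = 0; i < lines.Length; i++)
        {
            if (lines[i].TrimStart().StartsWith("```", StringComparison.Ordinal))
            {
                inFence = !inFence;
                continue;
            }
            if (inFence) continue;
            var m = Header.Match(lines[i]);
            if (!m.Success) continue;
            flat.Add(new SketchNode { Title = m.Groups[2].Value, Level = m.Groups[1].Length });
            starts.Add(i);
        }

        // Step 2: slice text per section, from each header up to the next header.
        for (int k = 0; k < flat.Count; k++)
        {
            int end = k + 1 < flat.Count ? starts[k + 1] : lines.Length;
            flat[k].Text = string.Join("\n", lines, starts[k], end - starts[k]).Trim();
        }

        // Step 3: build the tree; the stack holds the current chain of ancestors.
        var roots = new List<SketchNode>();
        var stack = new Stack<SketchNode>();
        foreach (var node in flat)
        {
            while (stack.Count > 0 && stack.Peek().Level >= node.Level) stack.Pop();
            if (stack.Count == 0) roots.Add(node);
            else stack.Peek().Children.Add(node);
            stack.Push(node);
        }

        // Step 4: assign IDs in document (pre-order) order. The format is an assumption.
        int next = 1;
        foreach (var root in roots) AssignIds(root, ref next);
        return roots;
    }

    static void AssignIds(SketchNode node, ref int next)
    {
        node.NodeId = (next++).ToString("D4");
        foreach (var child in node.Children) AssignIds(child, ref next);
    }
}
```

Calling TreeSketch.Build on a document with a top-level section, one subsection, and a second top-level section gives two roots, with the subsection nested under the first. Thinning and summarization are left out here; they would run over the finished tree.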

Output is a structured tree in which every node has a title, a node_id, and either a summary (leaf nodes) or a prefix_summary (branch nodes, where it acts as a signpost). The downstream LLM reads the compact tree, which carries summaries but no section text, selects the relevant node_ids, fetches only those sections, and generates the answer.
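
The retrieval side can be sketched the same way. This is a minimal illustration, not the library's API: IndexNode and CompactTree are hypothetical names, the "nodes" key for children is my assumption, and I use System.Text.Json here only for brevity. ToJson produces the text-free view the LLM reads, and Fetch returns the full text for the node_ids it picked.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical shape for illustration; JSON keys mirror the field names in the post.
public sealed class IndexNode
{
    public string Title = "";
    public string NodeId = "";
    public string? Summary;        // expected on leaf nodes
    public string? PrefixSummary;  // expected on branch nodes
    public string Text = "";       // full section text, kept out of the compact view
    public List<IndexNode> Children = new List<IndexNode>();
}

public static class CompactTree
{
    // Projects a node to title, node_id and its summary; section text is omitted.
    static Dictionary<string, object> Project(IndexNode n)
    {
        var d = new Dictionary<string, object>
        {
            ["title"] = n.Title,
            ["node_id"] = n.NodeId
        };
        if (n.Children.Count == 0)
        {
            d["summary"] = n.Summary ?? "";
        }
        else
        {
            d["prefix_summary"] = n.PrefixSummary ?? "";
            d["nodes"] = n.Children.Select(c => Project(c)).ToList(); // key name is an assumption
        }
        return d;
    }

    // The compact view handed to the LLM for node selection.
    public static string ToJson(IEnumerable<IndexNode> roots) =>
        JsonSerializer.Serialize(roots.Select(r => Project(r)).ToList());

    // Returns the full text of the selected nodes, in document order.
    public static List<string> Fetch(IEnumerable<IndexNode> roots, ISet<string> ids)
    {
        var hits = new List<string>();
        var pending = new Stack<IndexNode>(roots.Reverse());
        while (pending.Count > 0)
        {
            var n = pending.Pop();
            if (ids.Contains(n.NodeId)) hits.Add(n.Text);
            for (int i = n.Children.Count - 1; i >= 0; i--) pending.Push(n.Children[i]);
        }
        return hits;
    }
}
```

The point of the split is that the selection prompt only pays for summaries; the full text of a section enters the context only after its node_id has been chosen.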

What’s interesting

The two vendored files take every external concern (token counting, LLM calls) as a Func<> delegate, so callers plug in whatever libraries they already use. The files themselves are pure algorithm.
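
Here is roughly what that delegate style looks like in practice. SummarizerSketch, its constructor parameters, and the threshold rule (short sections are used as their own summary) are all assumptions made for illustration; the real signatures live in the repo.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical class for illustration; not MarkdownIndexer's actual API.
public sealed class SummarizerSketch
{
    private readonly Func<string, int> _countTokens;          // caller-supplied tokenizer
    private readonly Func<string, Task<string>> _callLlm;     // caller-supplied LLM client
    private readonly int _threshold;

    public SummarizerSketch(
        Func<string, int> countTokens,
        Func<string, Task<string>> callLlm,
        int threshold = 200)
    {
        _countTokens = countTokens ?? throw new ArgumentNullException(nameof(countTokens));
        _callLlm = callLlm ?? throw new ArgumentNullException(nameof(callLlm));
        _threshold = threshold;
    }

    // Assumed rule: sections under the token threshold are returned as-is;
    // longer ones are sent to the LLM delegate with a summarization prompt.
    public async Task<string> SummarizeAsync(string sectionText)
    {
        if (_countTokens(sectionText) < _threshold) return sectionText;
        return await _callLlm("Summarize this section:\n\n" + sectionText);
    }
}
```

In a test you can pass a whitespace-splitting lambda as the token counter and a lambda returning Task.FromResult("...") as the LLM; in production you would wrap your real tokenizer and client in the same two delegates. Nothing in the class needs to know which one it got.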

If you work with .NET and deal with long structured documents (docs, wikis, specs), take a look: github.com/ypyl/MarkdownIndexer.