Language & Vision Model Training:
Official Statements Digitized for Rigorous Training of Language Models (LMs), Vision-Language Models (VLMs), & Optical Character Recognition (OCR).
Utilized to Fine-Tune Custom Table Extraction Against Higher Financial Benchmarks for Machine-Ready Markdown, JSON, XML, or HTML Conversion.
Official Statement Maps, Boundaries, and Project Photos Utilized to Train Matching of Place-Based Indicators for Geocoding & Material Risk Analysis.

COMPLETED:
California Debt and Investment Advisory Commission (CDIAC)
MuniLM and our vision & language models have fully ingested the entire history of municipal bond activity in the State of California from 1984 to the present, digitizing over 30,000 PDF Official Statements and processing over 6.2 million pages of raw intelligence.

Next Steps: State Coverage Expansion Down to the Special District & Parcel
MuniLM will grow its knowledge base, data lakes, and fine-tuned model training by expanding to Texas, Colorado, and Washington.
Where full state coverage is not available, MuniLM will also build its corpus wherever city and local official statements, municipal finance data, and GIS tools can be leveraged.
We also offer backend customization applications for governments, firms, and research institutions.
High-Fidelity Extraction Unlocking Decades of Municipal Disclosures

Precise Digitization to Train Language & Vision Models for the Public Finance Domain
MuniLM achieves high-fidelity extraction by training our language (LMs), vision (VLMs), and optical character recognition (OCR) models strictly on municipal data. By processing over 6.2 million pages of raw intelligence from 1984 to the present, we are developing a proprietary pipeline that allows us to scale our programmatic extraction against the most complex, unstructured disclosures in the public finance domain.
This foundational dataset actively trains our domain-specific language models to natively understand the nuances of public finance. This creates a scalable architecture capable of instantly exporting future municipal disclosures into publication-ready XML, JSON, and Markdown.
By pairing our extraction methodology with intensive market datasets from the UChicago Center for Municipal Finance, we are training the definitive foundational model for the municipal bond market. MuniLM is evolving into a sovereign analytical engine capable of programmatic, multi-dimensional financial, economic, and environmental analysis. As we scale, we will democratize institutional research, unlocking the capacity to evaluate overlapping district liabilities and material risk down to the special-purpose district and parcel levels.
Transform Municipal Research Through Natural Language Querying and Cross-Referencing

Granular & Intuitive Spatial-Financial Analytics
Our roadmap bridges the gap between raw data extraction and actionable market intelligence. We have mapped 209,316 CUSIPs from California alone down to the special district level, and mapped national market data on over 65,000 issuers to Congressional Districts.
By building our software around conversational prompting for analysts, we connect official statements, material risk, due diligence, and geospatial information. We develop vision and language models trained to extract hyper-local text disclosures alongside boundary maps and project photos, enabling the precise geocoding of municipal investments, housing, and infrastructure.
MuniLM allows analysts to instantly query hyper-specific, multi-sector data from Official Statements, including overlapping debt percentages, parcel-by-parcel assessed valuations, hospital and school operations, retail sales breakdowns, agricultural commodity rankings, investment pool portfolios, and 10-year macroeconomic labor shifts, directly from historical source material.
Moving beyond our foundational California baseline of 30,000+ digitized disclosures, our extraction pipeline is actively expanding to target issuers in Texas, Colorado, and Washington. By leveraging over 700,000 extracted boundary maps and project photos, we are training our vision models to natively pair place-based text narratives with exact spatial polygons to advance our geocoding capabilities.
Process Preliminary Statements Privately, Running Locally On Your Own Device & Network
