Educational
Bond data is messy by nature—not because systems are broken, but because the source material itself is inconsistent, varied, and complex
Oct 14, 2025 @ London by Natasha Salmi
At ClimateAligned, we're tackling data quality with unprecedented transparency and human expertise. Learn how we handle the inherent challenges of bond data through AI-assisted tools, expert judgment, and complete transparency.
I recently sat down with Leo Browning, our ML expert and senior engineer, to discuss something most data providers don't talk about: what happens when the data isn't perfect.
The reality is that bond data quality challenges aren't primarily about system failures—they stem from the source material itself being inconsistent, varied, and complex. At ClimateAligned, we're taking a different approach: using AI-assisted tools to efficiently identify discrepancies, applying human expertise to make judgment calls, and providing unprecedented transparency so you can trace our reasoning for every data point.
Here's the fundamental challenge: we're creating semi-structured data from unstructured, complex, and highly varied sources.
Think about it: bond documentation comes in every format imaginable. Japanese bonds might report in yen while their placement amounts are in USD. Municipal bonds might aggregate reporting across multiple series. French documents need translation. Pre-issuance frameworks use different terminology than post-issuance reports. And that's just scratching the surface.
"The edge cases are inherent to the data," Leo explained. "Even if you had a perfect system, you'd still have these challenges. It's not about the AI being imperfect—though it is—it's that the source material itself is inconsistent."
When we standardise this information into a consistent format—so you can actually analyse thousands of bonds together—discrepancies emerge. Not because something went wrong, but because standardisation reveals the underlying inconsistencies that were always there in the source documents.
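To make that concrete, here's a minimal sketch of what standardisation involves. The field names and figures are invented for illustration, not our actual pipeline: two sources describe the same bond in different shapes, and only once both are mapped onto one consistent record does the underlying inconsistency become visible.

```python
# Invented field names and figures, purely illustrative.
issuance_feed = {"ISIN": "XS0000000000", "Amt": 50_000_000, "Ccy": "USD"}
impact_report = {"bond_id": "XS0000000000", "total_allocated": "7.6bn", "devise": "JPY"}

def parse_amount(text: str) -> float:
    """Parse loosely formatted amounts such as '7.6bn' or '250m'."""
    multipliers = {"bn": 1e9, "m": 1e6}
    for suffix, factor in multipliers.items():
        if text.lower().endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    return float(text)

def standardise(feed: dict, report: dict) -> dict:
    """Map both sources onto one consistent record."""
    return {
        "isin": feed["ISIN"],
        "placement_amount": float(feed["Amt"]),
        "placement_currency": feed["Ccy"],
        "allocated_amount": parse_amount(report["total_allocated"]),
        "reporting_currency": report["devise"],
    }

record = standardise(issuance_feed, impact_report)
# The USD/JPY inconsistency was always in the source documents;
# standardisation just makes it visible and comparable across thousands of bonds.
print(record["placement_currency"], "vs", record["reporting_currency"])
```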
The challenges we encounter aren't always straightforward errors—they're often symptoms of underlying source data inconsistencies.
Take currency mismatches as a simple example. If we were converting currencies incorrectly, that would be purely an error on our part. But if our data providers occasionally provide the wrong currency alongside their numbers, that's a source data inconsistency. And when you move into actual bond documents, things get even more complex: you might see bonds with placement amounts in one currency while all their reporting is done in another.
Leo said, "Japanese bonds will be issued in USD, but the reporting is all in yen. This is not an error—there's nothing wrong about that. It's just inconsistent because not everybody does that. And more than that, you don't know who doesn't do that."
This variability means you're often looking for symptoms of data issues rather than the data errors themselves.
For instance, consider our biggest current challenge: allocation vs. placement mismatches. We extract allocation data from post-issuance reports, but it doesn't always match the placement information on the bond. Sometimes the gap is significant: we currently have about 600 cases with differences of more than 10x.
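As a rough illustration of how cases like these surface, here's a sketch of symptom checks. The fields and thresholds are made up for the example, not our production rules; the point is that checks flag bonds for human review rather than auto-correct them.

```python
from dataclasses import dataclass

@dataclass
class Bond:
    isin: str
    placement_amount: float
    placement_currency: str
    reported_allocation: float
    reporting_currency: str

def review_flags(bond: Bond, ratio_threshold: float = 10.0) -> list[str]:
    """Return reasons this bond needs a human look."""
    flags = []
    if bond.placement_currency != bond.reporting_currency:
        # Not necessarily wrong (e.g. USD placement, JPY reporting), just inconsistent.
        flags.append("reporting currency differs from placement currency")
    if bond.placement_amount > 0:
        ratio = bond.reported_allocation / bond.placement_amount
        if ratio > ratio_threshold or ratio < 1 / ratio_threshold:
            flags.append(f"allocation is {ratio:.1f}x the placement amount")
    return flags

bond = Bond("XS0000000000", 50_000_000, "USD", 7_600_000_000, "JPY")
print(review_flags(bond))
```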
Why do these mismatches happen? The causes vary: sometimes an issuer reports at an aggregate level, and sometimes the documentation simply doesn't explain the gap.
"We had a case where an entity was reporting at a very aggregate level," Leo explained, "so all these bonds have really small placement amounts, but they report everything under this aggregate umbrella. In a case like that, you might be able to do a semi-systematic correction—maybe fix 10 or 15 at once if you make a judgment call."
Each mismatch requires investigation. Sometimes our team members will spend 25 minutes hunting through documentation trying to find what actually happened for a single bond. "And sometimes," Leo said, "there's just nothing. No documentation. So then what do you do?"
Most data quality issues can't be fixed systematically. If you could write a rule to catch and fix an error type in bulk, the fix itself would be relatively straightforward (though finding the pattern is the hard part). But that's not what we're dealing with.
"AI produces data that often isn't very systematic," Leo noted, "but AI tools also let us make semi-systematic corrections. Maybe not one-off, maybe not 100 at a time based on a rule—something in the middle."
Our workflow looks like this: the AI extracts data and stores its reasoning, our tools flag discrepancies, our team investigates and makes the judgment calls, and every decision gets documented alongside the data point.
That last part is crucial. Anyone who's worked with bond data manually knows what happens without documentation: you spend hours marking data points, making judgment calls, and six months later when someone asks why you classified something a certain way, you honestly can't remember—you processed 200 of those that day.
Our system is built to avoid that. When our AI extracts data, it stores its reasoning. When we make corrections, we adjust the reasoning. This means when you see a number in our product, you can click through to understand where it came from and what judgment calls were made.
"If an analyst were doing this by hand, they wouldn't have the time or energy to write out the reasoning behind every decision," Leo said. "But we have it built into our system, which makes identifying and correcting errors much more efficient."
We're realistic about resource allocation. We can't manually verify every single bond, and we don't need to.
High priority: Major bonds that everyone holds need to be airtight. These get extra scrutiny even if nothing looks obviously wrong.
Medium priority: Systematic patterns and discrepancies that we can catch with our tools. We investigate these regularly.
Lower priority: Smaller, less-held bonds where the data is broadly correct. We're still responsive when users report issues, but we don't proactively audit every detail.
"It's a hybrid approach," Leo explained. "Find broad systematic problems, check those, manually fix them. Look at the most important bonds. And stay very receptive when people tell us something's wrong."
Here's what makes our approach different: we show you our work.
Traditional data providers operate as black boxes. If you find an error, you report it and wait months for a correction—if you get one at all. You have no visibility into how the data was created or what assumptions were made.
We've built transparency into every level: the AI's reasoning is stored with every extracted data point, every correction records the judgment call behind it, and every number in the product can be traced back to its source.
"The semi-structured nature of our data is actually a strength in edge cases," Leo said. "It gives us the flexibility to go back, look at the reasoning, and make nuanced corrections. You're not locked into rigid rules that break down when reality gets messy."
Bond data quality is challenging—and anyone who tells you otherwise hasn't spent enough time in the documents. The source material is inconsistent. The reporting varies wildly. Edge cases are inevitable.
What matters is having a process that can surface discrepancies systematically, apply expert judgment where rigid rules break down, document the reasoning behind every decision, and respond quickly when something looks wrong.
We're not claiming perfection. We're claiming transparency, rapid iteration, and a team that actually understands the documents we're processing.
"This is an ongoing process," Leo told me. "But I think that's the point. It's not an achievement blog post—it's about how we think about data quality as a continuous practice."
If you've ever spent hours manually correcting bond data, or waited months for a provider to fix an obvious error, or wondered why a number looked suspicious but had no way to investigate—we see you. And we're building something different.