Meet Kevin: Building AI into sustainability reporting at Watershed

Kevin leads the engineering team building Flexible Reporting, Watershed's next-generation reporting platform—and sits at the intersection of fast-moving sustainability regulation, enterprise-grade data infrastructure, and LLMs. We sat down to talk about the technical challenges of making AI work in high-stakes regulatory contexts, what it's like to straddle the individual contributor/engineering manager line, and why he's learning the accordion.

We're hiring on Kevin's team in London. Learn more about our open roles.

Before Watershed, you were at Meta. What made you decide to make the move to climate tech?

I'd actually heard about Watershed when I first joined Meta—it was a small, young company and I thought it was really great. But I wanted to develop my skills first, and then apply them somewhere I felt passionate about.

I think a combination of things made it feel like the right moment. There are constant reminders that climate change is a crisis, like the wildfires. I felt like I'd developed enough elsewhere and now wanted to apply it to something that genuinely mattered. Watershed's mission was really attractive, and so were the startup energy and the very high technical bar.

That high technical bar makes sense for a software engineer. But your background is actually in physics—you studied at Harvard and competed at the International Physics Olympiad (IPhO). Does any of that still show up in how you work?

I think physics gives you a really systematic, structural way of solving problems. Computer science teaches you the same thing, but competing in physics at that level meant exercising those skills pretty intensely.

IPhO problems usually need nothing more than elementary equations, but they can still be very hard. Watershed engineering is kind of similar. We're not always tackling large-scale algorithm questions, and that's not the point. A lot of it is plumbing data from one shape to another; the real challenge is deeply understanding customer needs. It's not always at the technical cutting edge, but the domain is interesting, and the work is genuinely exciting and so important.

Speaking of customer needs: your team sits at the intersection of user-facing product and fast-moving regulations and frameworks like CSRD, CDP, IFRS S2, and SBTi. How do you build something flexible when the rules themselves keep changing?

I think the best approach is similar to what makes great software in general: you get deep into understanding the customer domain and experience. You internalize the data models and mental models behind as many frameworks as you can, and then you can build a flexible foundation. Two things we've been able to unlock are collaborative document editing and a requirements data collection system: high-level product services that any customer can use flexibly. The flexibility lives in the foundation; opinionated guardrails help people along the golden path, but the flexibility is always there underneath.

That foundation is what you're rebuilding now with Flexible Reporting. Why start from scratch rather than iterate on what was there?

We've learned a lot more about how customers actually build reports since we built the previous version. We've also learned what customers care about in a climate product with enterprise-grade assurance, like versioning and auditability, and those are really hard to retrofit. We decided to build from the ground up so we could bake versioning and those guarantees in at every level. We think that's a real moat for Watershed, because it's so hard to do well if you try to add it later.

And as you're building that foundation, where does AI actually fit in? Where does it genuinely help with reporting, and where have you found it to be noisy?

The pipe dream is for AI to just write a report for you. People produce so much written content about their sustainability goals and progress, and ideally you could just feed it in and get a full report out. We're still pretty far from that, and I think anyone claiming otherwise is glossing over a lot of subtle ways it can go wrong—and for a high-stakes investor- and regulator-facing document, that's not something we think our users should have to deal with.

So right now we're building up from the atomic jobs: getting people started with a draft answer based on previous documents, giving suggestions and helping iterate against document requirements, using it as a checklist and judge. We've found it particularly helpful for framework-specific things—like helping with CDP scoring, or helping calibrate tone for what a specific regulatory report expects in terms of over- versus under-disclosure. Over time, as we hone these and models get more powerful, we'll be able to layer it up toward that one-click report generator dream.

Getting AI output right seems especially thorny in this context. Sustainability reporting sits on top of a lot of structured data: emissions figures, regulatory frameworks with specific schemas. How does that shape how you think about using LLMs?

The job of our platform is to give a clean, structured, and—critically—auditable path for data to flow through the system. Starting with activity and emissions data, through the regulatory frameworks, out the other side with full traceability to the original source. LLMs can help at a lot of those steps: writing the right queries, pulling the right data out of frameworks, and interpreting custom shareholder reports or supplier requests. But without guardrails, they'll produce hallucinations. The platform's job is to give LLMs a safe place to do their work—so that whatever they produce has the same guarantees as if a human had done it.

For someone who's a strong AI engineer and could work anywhere right now, why is sustainability reporting an interesting technical problem to work on?

I think sustainability is a place where you can really push the limits of what LLMs are actually good at: text, reasoning, iteration. One of the biggest time savings we're going after is the ability to take a completed report and extract and adapt its content into a different reporting framework. Companies have to file so many reports, and there's so much overlap. The technical challenge is building a data pipeline that transforms that polished output reliably and auditably at every step, so the final result is still something the user can trust. Building in citations and building determinism into an inherently non-deterministic system: that's the key challenge. And then there's the mission piece: this is a place where bringing your talents will genuinely help the world.

On that note—what's something about working at Watershed you wouldn't have understood until you actually started here?

I think it's the nature of the customers. Going in, you might expect to be fighting “the man” a little, with customers reluctantly doing climate work, or trying to greenwash and needing to be held back. But the customers are genuinely passionate and excited. They hold themselves to a high bar and really want to work together with us to hit it. We're fortunate to work with people who are mission-aligned in the same way we are.

You're leading that team now as an engineering manager. What's surprised you most about how being a strong individual contributor made you a better manager, or vice versa?

One thing I think a lot of people come to realize is that to get big things done, you need a team—and to make that team work, you need to be thinking a lot about dependencies and collaboration. That's true even when you're writing code and drafting technical designs. The mode shifts a bit when you become a manager, but those instincts carry over.

And as a manager, you're in the room for a lot more cross-functional conversations: understanding what customer success managers (CSMs) need, what account executives (AEs) need. That gives you a better intuition for how to build in a way that's actually effective for the whole team, not just the engineering side.

Okay, shifting gears. You moved to London in 2023—what's been the biggest adjustment?

There's probably some joke answer about getting to go to the pub more often, which is genuinely great. But personally, I think there's just so much easily accessible art, culture, and green space. I was remote before, so working in the London office has been a big shift—there's a really tight-knit culture, and the in-person vibe is something I've really enjoyed. And being closer to our European customers, who have some of the most complex regulatory requirements, has given me a lot of exposure to really interesting use cases.

And once you were settled in London, you apparently took up Irish dancing. You were on the ballroom team at Harvard, so how do the two compare?

Irish dance and standard ballroom have one thing in common—neither really knows what to do with the arms. But Irish dance has pushed me into Irish trad music and culture more broadly, and I'm now learning the accordion.

Between the dancing and the accordion, you clearly have varied interests. Your gaming taste is also pretty eclectic—if you had to psychoanalyze yourself based on what you play, what would you conclude?

Probably something about being a communist roguelike puzzle fan. Disco Elysium with the communism route, Hades, Myst, Riven, a lot of Nancy Drew games. There's definitely a throughline of games about corporate culture and economics, and puzzle games that reward thinking carefully about systems. I've actually been working on a puzzle game design of my own!

Last question—what custom Watershed Slack emoji best represents you, and what do you use it for?

The "yay" red fox going up and down doing its little dance. It's generally celebratory—there's a lot of good work happening and things being shipped, and positivity matters. Climate optimism is everything.

Kevin's work reflects what it looks like to build at the frontier of two fast-moving fields at once: AI that actually has to be right, and climate regulation that isn't slowing down.

We're hiring on Kevin's team. If you're based in London and you want to push the limits of what LLMs can do in high-stakes, real-world contexts, we'd love to hear from you.