Operations · 2026-06-25

The AI Risk AEC Firms Aren't Talking About: The Data You Feed It

Everyone in AEC is worried about AI hallucination. The bigger risk is the data you're feeding it without thinking.

The hallucination conversation gets airtime because it's visible. The model says something wrong, an engineer notices, the firm has a story to tell at the next AI panel. Embarrassing but contained.

The data-feed risk is the opposite. It's invisible. Nothing breaks. The output looks reasonable. And the firm slowly trains itself, its decisions, and its proposals on data nobody validated.

Why the hallucination conversation is the wrong conversation

Hallucination is a model problem. It happens when the model generates text that sounds plausible but isn't grounded in the input. It's visible because a competent reviewer catches it — the model said the span was 40m when the drawing shows 60m, the model cited a code clause that doesn't exist, the model attributed a quote to the wrong party.

These failures are frustrating. They're also reasonably manageable. You build a review step. You don't use AI for the clause-number-sensitive work. You cross-check model outputs against source documents before they leave the firm.

The data-feed problem is different in kind, not just degree.

When you point an AI tool at firm data — past proposals, meeting transcripts, contract templates, project reports — you're not just asking it to generate text. You're anchoring its outputs to a corpus that represents your firm's accumulated work history. If that corpus is clean, current, and representative of how you want to work, the outputs will be calibrated well. If the corpus is stale, mislabelled, or full of work you've since stopped doing, the outputs will be calibrated to the wrong thing — and nothing will obviously break.

Three patterns that have caused real damage

A firm fed five years of proposals into a corpus to "learn from past wins." Half were unsuccessful bids for work the firm had since stopped doing. The model started generating proposals in the style of work the firm could no longer do profitably.

The firm's intent was reasonable: use past proposals to shape future ones. But the dataset included both wins and losses, and both work the firm still did and work it had abandoned as unprofitable. The model had no way to distinguish these. It treated everything in the corpus as signal. Proposals started coming out with methodology language and pricing structure that reflected the old, lower-margin work — and the firm's win rate on the new, higher-margin work it was trying to pursue dropped.

Nobody noticed for two submission cycles. The proposals "looked fine." The problem only surfaced when a senior director read a proposal draft and noticed it was framed for a project type the firm had explicitly decided to exit 18 months earlier.

A firm asked an AI tool to summarise a long client meeting from a transcript. The transcript had three speakers misattributed. The summary baked the misattributions into a follow-up email that went to the client.

Transcription tools misattribute speakers regularly — particularly in construction site meetings where multiple people speak quickly, over background noise, with similar accents. The AI tool that generated the summary had no way to know the transcript was wrong. It processed the text as given.

The follow-up email attributed a commitment to a client representative who hadn't made it. The client replied to correct the record. Minor, recoverable — but the kind of incident that erodes trust in a relationship that depends on accurate documentation.

A firm uploaded its standard contract template to a contract-review tool. The template hadn't been updated since 2024 and contained a liability clause the firm's insurer no longer accepted. The tool happily benchmarked new contracts against it.

The contract-review tool was doing exactly what it was told: identify deviations from the firm's standard template and flag anything that looked like a concession. It had no way to know the template itself was out of date. Every new contract the firm reviewed was benchmarked against a liability standard the firm's insurer had explicitly rejected.

In each case, the model didn't hallucinate. It did exactly what it was asked to do. The data was the problem.
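The stale-template case in particular lends itself to a simple guard: refuse to use a reference document as a benchmark unless someone has confirmed it recently. The sketch below is illustrative only. The `ReferenceDoc` record, its `last_reviewed` field, the example date, and the 365-day threshold are all assumptions, not features of any particular contract-review tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ReferenceDoc:
    """Hypothetical metadata record for a document used as a benchmark."""
    name: str
    last_reviewed: date  # when a qualified person last confirmed it is current


def is_stale(doc: ReferenceDoc, today: date, max_age_days: int = 365) -> bool:
    """Return True when the document has gone unreviewed for more than max_age_days."""
    return today - doc.last_reviewed > timedelta(days=max_age_days)


def require_fresh(doc: ReferenceDoc, today: date, max_age_days: int = 365) -> ReferenceDoc:
    """Raise instead of silently benchmarking against an unreviewed document."""
    if is_stale(doc, today, max_age_days):
        raise ValueError(
            f"{doc.name!r} was last reviewed on {doc.last_reviewed.isoformat()}; "
            "re-validate it before using it as a benchmark."
        )
    return doc


# Example data (invented): a template last reviewed in early 2024.
template = ReferenceDoc("standard-contract-template", date(2024, 3, 1))
print(is_stale(template, today=date(2026, 6, 25)))
```

A check like this does not know whether the liability clause is acceptable. It only forces a person to look at the template before a tool treats it as the standard.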

Why this risk is harder to manage than hallucination

Hallucination is visible because it's usually wrong in specific, checkable ways. The AI said 40m; the drawing says 60m. A reviewer catches it.

The data-feed problem produces outputs that are internally consistent and match the corpus. A reviewer looking at the output for obvious errors won't find them — because there are none. The output is a faithful reflection of flawed inputs. The firm has to audit the corpus itself to find the problem, and that audit requires someone who knows what the firm's data should look like versus what it actually contains.

Most firms don't have that person. Most firms have someone who manages the software and someone who manages the projects, but nobody whose job it is to ask: "Is the data we're feeding our AI tools still representative of how we want to work?"
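A first pass at auditing the corpus can be as simple as counting what is actually in it. The sketch below assumes each document carries hypothetical `service_line` and `outcome` tags. The labels and sample records are invented for illustration. It reports what share of the corpus comes from exited service lines or from bids that were not won.

```python
from collections import Counter

# Illustrative records only; a real corpus would load these tags from
# wherever the firm keeps its document metadata.
corpus = [
    {"service_line": "residential", "outcome": "won"},
    {"service_line": "residential", "outcome": "lost"},
    {"service_line": "infrastructure", "outcome": "won"},
    {"service_line": "infrastructure", "outcome": "lost"},
]
exited_lines = {"residential"}  # assumption: service lines the firm has left


def composition_report(docs, exited):
    """Return the share of documents that are off-target, keyed by reason."""
    total = len(docs)
    reasons = Counter()
    for doc in docs:
        if doc["service_line"] in exited:
            reasons["exited_service_line"] += 1
        if doc["outcome"] != "won":
            reasons["not_a_win"] += 1
    # A document can count under both reasons, so shares need not sum to 1.
    return {reason: count / total for reason, count in reasons.items()}


print(composition_report(corpus, exited_lines))
```

The numbers do not decide anything by themselves. They give whoever owns the corpus a concrete starting point for asking whether those proportions match how the firm wants to work.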

The fix isn't more sophisticated AI

The fix is the unglamorous discipline of curating what goes in.

Three questions worth asking before you point any AI tool at firm data:

Has anyone reviewed this dataset in the last 12 months for things that should no longer be in it?

Data accumulates. Old project reports, superseded templates, cancelled bids, work from a service line the firm has since exited — all of it ends up in the same folder structure. AI tools don't distinguish between current and historical. Unless someone actively curates, the corpus grows and the signal degrades.

Is what's in here representative of the work we want more of, or representative of work we used to do?

This is the question the proposal corpus case gets at. If your five years of past proposals include two years of work you've since repositioned away from, and you're using that corpus to generate new proposals, you're anchoring the tool to the wrong target. The dataset needs to reflect your current positioning, not your historical average.

If a junior engineer learned only from this dataset, would they make decisions you'd back?

This is the most useful calibration question. An AI tool anchored to your firm data will develop something like a judgment — a sense of what's normal, what's acceptable, what's risky. If the answer to this question is "no" or "I don't know," the dataset isn't ready.
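The first two questions can be partly turned into an explicit inclusion rule, so that what reaches the tool is a deliberate selection and not a folder dump. What follows is a minimal sketch, assuming hypothetical metadata fields (`outcome`, `service_line`, `last_reviewed`), invented sample proposals, and a 12-month review window. The third question still needs human judgment.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Proposal:
    """Hypothetical metadata for one proposal in the corpus."""
    title: str
    outcome: str          # e.g. "won" or "lost"
    service_line: str
    last_reviewed: date   # when someone last confirmed it belongs in the corpus


def curate(proposals, active_lines, today, review_window_days=365):
    """Split proposals into kept and dropped, recording why each was dropped."""
    cutoff = today - timedelta(days=review_window_days)
    kept, dropped = [], []
    for p in proposals:
        reasons = []
        if p.last_reviewed < cutoff:
            reasons.append("not reviewed in window")
        if p.service_line not in active_lines:
            reasons.append("exited service line")
        if p.outcome != "won":
            reasons.append("not a win")
        if reasons:
            dropped.append((p, reasons))
        else:
            kept.append(p)
    return kept, dropped


proposals = [
    Proposal("Bridge rehabilitation", "won", "infrastructure", date(2026, 2, 10)),
    Proposal("Housing estate", "won", "residential", date(2026, 1, 15)),
    Proposal("Rail depot", "lost", "infrastructure", date(2023, 5, 1)),
]
kept, dropped = curate(proposals, active_lines={"infrastructure"}, today=date(2026, 6, 25))
for p, reasons in dropped:
    print(f"dropped {p.title}: {', '.join(reasons)}")
```

Excluding every lost bid is a policy choice, not a rule. A firm might reasonably keep some losses as labelled counter-examples. The point of the sketch is that the choice is written down and each exclusion has a recorded reason.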

Who wins this AI cycle

The firms that win this AI cycle won't be the ones with the cleverest tools. They'll be the ones who curated their inputs before they automated their outputs.

This is unglamorous work. It doesn't make a good conference presentation. There's no product demo for "we reviewed and cleaned five years of proposal data before connecting it to an AI tool." But it's the work that separates firms whose AI outputs improve over time from firms whose AI outputs drift toward their worst historical average.

What's the one dataset at your firm that's powering decisions and hasn't been audited in the last year?

I write about AI for engineering and construction firms weekly:
→ Full breakdown: https://sigmametrix.net/insights/ai-risk-data-you-feed
→ Free AI Readiness Audit (7 questions → your 2-page playbook): https://sigmametrix.net/audit
→ Newsletter for AEC firm directors: https://sigmametrix.kit.com/8686be4583