What we learned, what we measured, and where we’re going — a transparent look at how we built an agentic AI platform for one of the most demanding professional domains.
1. Why We Started Here
The engineering and construction sector sits at an uncomfortable intersection: highly regulated, deeply document-intensive, and legally liable for every number that ends up in a technical submission. Eurocodes run to thousands of pages. Technical approvals cross-reference tables, graphics, and annexes in ways that defeat simple search. Bills of quantities require tolerance-based matching against product catalogues. Architectural plans require geometric reasoning to extract quantities accurately.
When we looked at the AI tooling available for this domain, we found the same answer everywhere: general-purpose retrieval on top of a cloud language model. Fast to demo. Unreliable in production. And architecturally incompatible with the one requirement every engineering firm shares.
"Their data cannot leave their network. That is not a preference — it is a professional and legal requirement. That gap is what Norma by Lumen-IT is built to close."
2. What Sovereign AI Means in Practice
Sovereign AI is an architectural commitment, not a feature. Every design decision flows from four non-negotiables:
No query, no document fragment, no intermediate result leaves the customer's infrastructure. Enforced at the network level, not just by policy.
Every component can be audited, replaced, or extended by the customer's own team. If Lumen-IT ceased to exist tomorrow, the system keeps running.
Pre-trained models are deployed into each customer instance. Customer documents shape retrieval — they never shape model weights.
No proprietary API with usage pricing that compounds over time. No migration cost. No dependency on a third party's roadmap decisions.
For a firm handling confidential tender documents, proprietary product specifications, or client structural assessments, these are not limitations — they are the only acceptable architecture.
3. Why Single-Turn Q&A Fails This Domain
Early in development we validated a hypothesis quickly: a standard retrieve-and-answer pattern produces plausible-sounding outputs that fail on precision. An engineer asking about shear resistance under Eurocode 2 doesn't need a plausible paragraph — they need the correct formula, the correct variables, the correct national annex values, and a citation they can verify. A plausible paragraph with a wrong coefficient is worse than no answer.
We built a supervised agentic pipeline in which each step of an engineering workflow is handled by a specialized agent, and no output is released without mandatory human sign-off.
Before any output reaches the human review step, an automated quality gate evaluates the answer for consistency and completeness. Only outputs that pass proceed. Generic AI tools typically have no equivalent step: answers go directly from model to user with no verification layer.
Every approved output carries a complete log: which sources were cited with document name and page number, which engineer approved it, and when. Exportable for inclusion in technical submissions.
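The flow described above (quality gate, then human sign-off, then audit log) can be sketched in a few lines of Python. This is a minimal illustration and not Norma's actual implementation: the names `DraftAnswer`, `quality_gate`, `release`, and `AuditRecord` are hypothetical, and the gate shown here only checks that an answer is non-empty and properly cited, whereas the real gate evaluates consistency and completeness.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Citation:
    document: str  # source document name
    page: int      # page number within that document


@dataclass
class DraftAnswer:
    question: str
    text: str
    citations: list[Citation] = field(default_factory=list)


@dataclass
class AuditRecord:
    question: str
    answer: str
    citations: list[Citation]
    approved_by: str
    approved_at: datetime


def quality_gate(draft: DraftAnswer) -> bool:
    """Hypothetical stand-in for the automated gate: require a non-empty
    answer and at least one citation with a document name and page."""
    if not draft.text.strip() or not draft.citations:
        return False
    return all(c.document.strip() and c.page > 0 for c in draft.citations)


def release(draft: DraftAnswer, engineer: Optional[str]) -> Optional[AuditRecord]:
    """Release an answer only if the gate passes AND an engineer signed off."""
    if not quality_gate(draft):
        return None  # rejected before human review
    if not engineer:
        return None  # no sign-off, no output
    return AuditRecord(
        question=draft.question,
        answer=draft.text,
        citations=list(draft.citations),
        approved_by=engineer,
        approved_at=datetime.now(timezone.utc),
    )


# Illustrative data only; the document name and engineer id are made up.
draft = DraftAnswer("example question", "example answer",
                    [Citation("example-norm.pdf", 12)])
record = release(draft, engineer="j.doe")
```

The point of the design is that `release` has exactly one path that produces an output, and that path always yields a record naming the sources, the approving engineer, and the approval time.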
4. How We Measure 95% Accuracy
Accuracy claims without methodology are marketing. Our April 2026 benchmark ran 75 questions across 5 structured test rounds, evaluated manually by domain experts against ground-truth answers. The test covered Eurocode Q&A, formula retrieval, LV matching (LV: Leistungsverzeichnis, the German bill of quantities), and deliberate out-of-scope traps designed to trigger hallucination. Each answer was rated correct, partially correct, or incorrect, with failure root causes documented per round and addressed as fixes.
After two structured fix cycles, the knowledge layer reached 97% accuracy on this benchmark. Hallucination protection scored 100%: every out-of-scope question was correctly refused, with zero fabricated answers. We publish the methodology because the number is only as credible as the method behind it.
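To make the scoring scheme concrete, here is a small Python sketch of how expert ratings could be turned into an accuracy figure and a refusal rate. It is an assumption-laden illustration: the text does not say how "partially correct" answers are weighted, so `partial_credit` is a hypothetical parameter (default 0, i.e. strict scoring), and the sample ratings below are invented, not the benchmark data.

```python
from collections import Counter

RATINGS = ("correct", "partial", "incorrect")


def accuracy(ratings: list[str], partial_credit: float = 0.0) -> float:
    """Share of answers rated correct. `partial_credit` is an assumed
    weight for partially correct answers (0.0 = count them as wrong)."""
    if not ratings:
        raise ValueError("no ratings supplied")
    counts = Counter(ratings)
    unknown = set(counts) - set(RATINGS)
    if unknown:
        raise ValueError(f"unknown ratings: {sorted(unknown)}")
    score = counts["correct"] + partial_credit * counts["partial"]
    return score / len(ratings)


def refusal_rate(refused: list[bool]) -> float:
    """Share of out-of-scope trap questions that the system refused."""
    if not refused:
        raise ValueError("no trap questions supplied")
    return sum(refused) / len(refused)


# Invented sample: 8 correct, 1 partially correct, 1 incorrect.
sample = ["correct"] * 8 + ["partial"] + ["incorrect"]
strict = accuracy(sample)                       # partials count as wrong
lenient = accuracy(sample, partial_credit=0.5)  # partials count as half
traps = refusal_rate([True, True, True])        # every trap was refused
```

Reporting both the strict and the lenient figure, together with the weighting used, keeps a headline percentage comparable across test rounds.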
5. The Hardware Reality
The most common objection to on-premise AI is hardware cost — and it is based on a category error. The assumption is that on-premise AI requires the same compute as a frontier cloud model. It doesn't.
We run on Small Language Models combined with a structured domain knowledge base that encodes all relevant data, document relationships, and norm cross-references. The model doesn't need to "know" Eurocode — it reasons over verified, structured content. This delivers domain accuracy at a fraction of the compute footprint of large general-purpose models. Domain-specific fine-tuning is the logical next step — already on our roadmap.
What it takes: a mid-range workstation GPU. Not a data centre. Not an H100 cluster. Not cloud credits that compound every month. It is a one-time hardware investment, with typical break-even against cloud API pricing in under 12 months for firms running compliance workflows daily.
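The break-even claim is simple arithmetic: the one-time hardware cost divided by the monthly saving over cloud usage. The Python sketch below shows the calculation. All figures in the example are hypothetical placeholders chosen for round numbers, not Lumen-IT pricing or measured customer costs, and `break_even_months` is an illustrative helper name.

```python
import math


def break_even_months(hardware_cost: float,
                      monthly_cloud_cost: float,
                      monthly_onprem_running_cost: float = 0.0) -> float:
    """Months until a one-time hardware purchase is paid back by the
    monthly saving versus cloud API spend. Returns infinity when
    on-premise running costs are not lower than the cloud spend."""
    saving = monthly_cloud_cost - monthly_onprem_running_cost
    if saving <= 0:
        return math.inf
    return hardware_cost / saving


# Hypothetical inputs: 6000 for the workstation, 800 per month in cloud
# API fees, 50 per month for power and maintenance on-premise.
months = break_even_months(6000, 800, 50)
print(f"break-even after {months:.1f} months")  # prints: break-even after 8.0 months
```

With these placeholder inputs the purchase pays for itself in 8 months. The result is sensitive to query volume, since `monthly_cloud_cost` grows with usage while the hardware cost does not.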
6. What We've Proven in Live Deployments
Two customer deployments are live as of mid-2026. Different sectors, different workflow profiles — both validating the core architecture.
Technical approval analysis (cross-referencing Zulassung tables against production conditions), LV specification matching against product catalogue using tolerance logic, and quantity extraction from architectural plans with automatic opening deductions.
● Live (Production)
Eurocode Q&A with formula rendering and inline diagram extraction. Source citations to document name, section, and page number. Norm version tracking: every answer flags which version of the standard was used.
● Live (Production)
In both cases the primary value is not speed but reproducibility and traceability. The same question asked twice produces the same answer with the same sources. Every output can be reviewed, challenged, and verified.
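The tolerance-based LV matching mentioned for the first deployment lends itself to a short illustration. The Python sketch below assumes the simplest possible rule, an absolute tolerance per numeric property, and returns matches in sorted order so the same input always yields the same output. The class names, property names, and catalogue entries are hypothetical; the production matching rules are not described in this article.

```python
from dataclasses import dataclass


@dataclass
class Product:
    sku: str
    properties: dict[str, float]  # e.g. {"thickness_mm": 100.0}


@dataclass
class Requirement:
    name: str         # property the LV position specifies
    target: float     # requested value
    tolerance: float  # allowed absolute deviation, same unit as target


def satisfies(product: Product, requirements: list[Requirement]) -> bool:
    """True if every required property exists and lies within tolerance."""
    for req in requirements:
        value = product.properties.get(req.name)
        if value is None or abs(value - req.target) > req.tolerance:
            return False
    return True


def match_position(requirements: list[Requirement],
                   catalogue: list[Product]) -> list[str]:
    """Return matching SKUs, sorted so results are reproducible."""
    return sorted(p.sku for p in catalogue if satisfies(p, requirements))


# Made-up catalogue and LV position for illustration.
catalogue = [
    Product("B-098", {"thickness_mm": 98.0, "width_mm": 600.0}),
    Product("A-104", {"thickness_mm": 104.0, "width_mm": 600.0}),
    Product("C-110", {"thickness_mm": 110.0, "width_mm": 600.0}),
]
position = [Requirement("thickness_mm", 100.0, 5.0),
            Requirement("width_mm", 600.0, 0.0)]
print(match_position(position, catalogue))  # prints: ['A-104', 'B-098']
```

Because the rule is a plain comparison rather than a model judgment, a rejected product can always be explained by pointing at the property that fell outside tolerance.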
7. What We're Building Next
We are direct about what is live and what is planned. Customers making infrastructure decisions deserve accurate information.
| Capability | Description | Status |
|---|---|---|
| Agentic Pipeline | Supervised multi-agent workflow with reflection guard and human-in-the-loop approval | ● Live |
| Structured Knowledge Retrieval | Domain knowledge base with cross-reference reasoning across Eurocodes, approvals and technical documents | ● Live |
| LV Matching | Deterministic tolerance logic for matching LV positions against product catalogues | ● Live |
| Plan Takeoff | Quantity extraction from architectural plans with geometric deduction of openings | ● Live |
| Audit Trail | Full output log with engineer sign-off, timestamps, and PDF export | ● Live |
| NVI Engine | Eliminates hallucinated norm values from compliance-critical workflows | ◎ Planned — Q3 2026 |
| Norm Versioning | Automatic flagging when answers reference superseded norm versions, with update impact assessment | ◎ Planned |
| Multi-Doc Search | Unified search across Eurocodes, NDT standards, inspection reports, and BIM data | ◎ Planned |
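
As a small illustration of the Plan Takeoff row, the opening deduction can be reduced to its arithmetic core: net wall area equals gross area minus the area of doors and windows. The Python sketch below assumes axis-aligned rectangular walls and openings that do not overlap and lie fully inside the wall. Extracting that geometry from a plan is the hard part and is not shown here; `Rect`, `Wall`, and `net_area` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Rect:
    width_m: float
    height_m: float

    @property
    def area(self) -> float:
        return self.width_m * self.height_m


@dataclass
class Wall:
    outline: Rect
    openings: list[Rect] = field(default_factory=list)


def net_area(wall: Wall) -> float:
    """Gross wall area minus all openings (assumed non-overlapping)."""
    deducted = sum(o.area for o in wall.openings)
    if deducted > wall.outline.area:
        raise ValueError("openings exceed the wall area")
    return wall.outline.area - deducted


# Made-up wall: 5 m x 3 m with one door (1 m x 2 m) and one window (2 m x 1.5 m).
wall = Wall(Rect(5.0, 3.0), [Rect(1.0, 2.0), Rect(2.0, 1.5)])
print(net_area(wall))  # prints: 10.0
```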
The NVI engine is the capability we are most often asked about. When live, it will solve the single biggest remaining risk in AI-assisted engineering compliance. We begin development in Q3 2026.
Do You Have a Use Case Close to This?
We are actively engaged with engineering and construction firms exploring sovereign AI for their specific workflows. If your team handles norm compliance, approval analysis, LV processing, or plan quantity takeoff — let's talk. A focused POC is scoped in one call and running in two to three weeks.