Building Sovereign AI for Engineering & Construction

What we learned, what we measured, and where we’re going — a transparent look at how we built an agentic AI platform for one of the most demanding professional domains.

Sovereign AI for Engineering & Construction — Lumen-IT Blog

1. Why We Started Here

The engineering and construction sector sits at an uncomfortable intersection: highly regulated, deeply document-intensive, and legally liable for every number that ends up in a technical submission. Eurocodes run to thousands of pages. Technical approvals cross-reference tables, graphics, and annexes in ways that defeat simple search. Bills of quantities require tolerance-based matching against product catalogues. Architectural plans require geometric reasoning to extract quantities accurately.

When we looked at the AI tooling available for this domain, we found the same answer everywhere: general-purpose retrieval on top of a cloud language model. Fast to demo. Unreliable in production. And architecturally incompatible with the one requirement every engineering firm shares.

"Their data cannot leave their network. That is not a preference — it is a professional and legal requirement. That gap is what Norma by Lumen-IT is built to close."

2. What Sovereign AI Means in Practice

Sovereign AI is an architectural commitment, not a feature. Every design decision flows from four non-negotiables:

All inference runs locally.

No query, no document fragment, no intermediate result leaves the customer's infrastructure. Enforced at the network level, not just by policy.

Open-source foundation throughout.

Every component can be audited, replaced, or extended by the customer's own team. If Lumen-IT ceased to exist tomorrow, the system would keep running.

No model trained on customer data.

Pre-trained models are deployed into each customer instance. Customer documents shape retrieval — they never shape model weights.

No vendor lock-in.

No proprietary API with usage pricing that compounds over time. No migration cost. No dependency on a third party's roadmap decisions.

For a firm handling confidential tender documents, proprietary product specifications, or client structural assessments, these are not limitations — they are the only acceptable architecture.

3. Why Single-Turn Q&A Fails This Domain

Early in development we validated a hypothesis quickly: a standard retrieve-and-answer pattern produces plausible-sounding outputs that fail on precision. An engineer asking about shear resistance under Eurocode 2 doesn't need a plausible paragraph — they need the correct formula, the correct variables, the correct national annex values, and a citation they can verify. A plausible paragraph with a wrong coefficient is worse than no answer.

The Agentic Pipeline

We built a supervised agentic pipeline where each step in an engineering workflow is handled by a specialized agent — and where no output reaches the engineer without mandatory human sign-off:

📄 Document Extraction
🔍 Retrieval & Matching
Verification
🔁 Reflection Guard
👷 Human Approval
📋 Report & Audit Log

Reflection Guard

Before any output reaches the human review step, an automated quality gate evaluates the answer for consistency and completeness. Only outputs that pass proceed. Generic AI tools have no equivalent step — answers go directly from model to user with no verification layer.
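To make the control flow concrete, here is a minimal Python sketch of a guard followed by a human approval step. It is an illustration under stated assumptions, not Norma's actual code: the `Draft` type, `reflection_guard`, `run_pipeline`, and the `approve` callback are hypothetical names, and the guard only checks that a draft has a non-empty answer and at least one citation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Draft:
    """Hypothetical container for one candidate answer."""
    question: str
    answer: str
    citations: List[str] = field(default_factory=list)


def reflection_guard(draft: Draft) -> bool:
    """Toy quality gate: needs a non-empty answer and at least one citation."""
    return bool(draft.answer.strip()) and len(draft.citations) > 0


def run_pipeline(draft: Draft, approve: Callable[[Draft], bool]) -> Optional[Draft]:
    """Release the draft only if it passes the guard and a human approves it."""
    if not reflection_guard(draft):
        return None  # blocked before human review
    if not approve(draft):
        return None  # rejected by the engineer
    return draft


# Placeholder content; the citation string is invented for illustration.
cited = Draft("example question", "example answer", ["example-norm.pdf, p. 12"])
uncited = Draft("example question", "example answer")
```

The point of the ordering is that the `approve` callback is never invoked for a draft the guard rejects, so the engineer only reviews outputs that have already cleared the automated check.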

Full Audit Trail

Every approved output carries a complete log: which sources were cited with document name and page number, which engineer approved it, and when. Exportable for inclusion in technical submissions.
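One way to picture such a log entry is as an immutable record. The sketch below is an assumption about shape only: the field names, the `AuditRecord` and `Citation` types, and the JSON export are hypothetical (the platform itself exports to PDF), and the values are placeholders.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class Citation:
    document: str  # source document name
    page: int      # page number within that document


@dataclass(frozen=True)
class AuditRecord:
    answer: str
    citations: Tuple[Citation, ...]
    approved_by: str
    approved_at: str  # ISO 8601 timestamp in UTC


def export_record(record: AuditRecord) -> str:
    """Serialise one record to JSON (an assumed format, chosen for brevity)."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Placeholder values, not taken from any real deployment.
record = AuditRecord(
    answer="(approved answer text)",
    citations=(Citation(document="example-norm.pdf", page=12),),
    approved_by="engineer@example.com",
    approved_at=datetime(2026, 4, 1, 9, 30, tzinfo=timezone.utc).isoformat(),
)
exported = export_record(record)
```

Freezing the dataclasses means a record cannot be altered after approval without creating a new one, which is the property an audit trail depends on.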

4. How We Measure 95% Accuracy

Accuracy claims without methodology are marketing. Our April 2026 benchmark ran 75 questions across 5 structured test rounds, evaluated manually by domain experts against ground-truth answers. The test covered Eurocode Q&A, formula retrieval, LV (Leistungsverzeichnis, the German bill of quantities) matching, and deliberate out-of-scope traps designed to trigger hallucination. Each answer was rated correct, partially correct, or incorrect. Failure root causes were documented per round and addressed with fixes.

75 test questions: 5 structured rounds, domain-specific, expert-evaluated
97% final accuracy: after structured fix cycles, April 2026
100% hallucination guard: 10/10 out-of-scope questions refused, zero fabricated answers

After two structured fix cycles, the knowledge layer reached 97% accuracy on this benchmark. Hallucination protection scored 100%: all 10 out-of-scope questions were correctly refused, with zero fabricated answers. We publish the methodology because the number is only as credible as the method behind it.

97%: knowledge layer accuracy (75 questions, 5 rounds, April 2026)
100%: hallucination protection (zero fabricated answers on out-of-scope questions)
100%: audit trail on every approved output
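For readers who want to reproduce this kind of scoring on their own test set, the sketch below turns per-answer ratings into an accuracy figure. The weights are an assumption: we do not state above how partially correct answers are weighted, so half credit is used here purely for illustration. The function names are hypothetical and the data is a toy set, not the April 2026 results.

```python
from collections import Counter
from typing import List

# Assumed weights: the article does not say how partial answers are scored.
WEIGHTS = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0}


def accuracy(ratings: List[str]) -> float:
    """Weighted share of credit earned across all rated answers."""
    if not ratings:
        raise ValueError("no ratings to score")
    counts = Counter(ratings)
    earned = sum(WEIGHTS[label] * n for label, n in counts.items())
    return earned / len(ratings)


def refusal_rate(refused: List[bool]) -> float:
    """Share of out-of-scope questions the system declined to answer."""
    if not refused:
        raise ValueError("no out-of-scope questions to score")
    return sum(refused) / len(refused)


# Toy data for illustration only.
toy_ratings = ["correct"] * 8 + ["partial"] * 1 + ["incorrect"] * 1
toy_refusals = [True] * 10  # every out-of-scope question refused
```

Keeping the refusal rate separate from the accuracy score matters: an out-of-scope question has no correct answer to rate, only a correct behaviour (declining).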

5. The Hardware Reality

The most common objection to on-premise AI is hardware cost — and it is based on a category error. The assumption is that on-premise AI requires the same compute as a frontier cloud model. It doesn't.

Small Language Models — domain-grounded

We run on Small Language Models combined with a structured domain knowledge base that encodes all relevant data, document relationships, and norm cross-references. The model doesn't need to "know" Eurocode — it reasons over verified, structured content. This delivers domain accuracy at a fraction of the compute footprint of large general-purpose models. Domain-specific fine-tuning is the logical next step — already on our roadmap.

What you actually need

A mid-range workstation GPU. Not a data centre. Not an H100 cluster. Not cloud credits that compound every month. A one-time hardware investment — with typical break-even against cloud API pricing under 12 months for firms running compliance workflows daily.
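The break-even arithmetic is simple enough to check against your own numbers. The sketch below assumes a one-time hardware cost, a flat monthly cloud bill, and an optional monthly on-premise running cost (power, maintenance). All figures are hypothetical and chosen only to show the calculation; they are not Lumen-IT or cloud-vendor pricing.

```python
import math


def break_even_months(hardware_cost: float,
                      cloud_cost_per_month: float,
                      onprem_cost_per_month: float = 0.0) -> int:
    """First whole month in which cumulative savings cover the hardware."""
    monthly_saving = cloud_cost_per_month - onprem_cost_per_month
    if monthly_saving <= 0:
        raise ValueError("on-premise never breaks even at these rates")
    return math.ceil(hardware_cost / monthly_saving)


# Hypothetical figures, for illustration only.
months = break_even_months(hardware_cost=6000,
                           cloud_cost_per_month=800,
                           onprem_cost_per_month=200)
```

With these made-up inputs the saving is 600 per month, so the hardware pays for itself after 10 months. The result is sensitive to usage: a firm that runs compliance workflows only occasionally will have a lower monthly cloud bill and a longer break-even period.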

6. What We've Proven in Live Deployments

Two customer deployments are live as of mid-2026. Different sectors, different workflow profiles — both validating the core architecture.

Deployment 01
Building Materials Manufacturer

Technical approval analysis (cross-referencing Zulassung, i.e. technical approval, tables against production conditions), LV specification matching against the product catalogue using tolerance logic, and quantity extraction from architectural plans with automatic opening deductions.

● Live — Production
Deployment 02
Structural Diagnostics Firm

Eurocode Q&A with formula rendering and inline diagram extraction. Source citations to document name, section, and page number. Norm version tracking — every answer flags which version of the standard was used.

● Live — Production

In both cases the primary value is not speed — it is reproducibility and traceability. The same question asked twice produces the same answer with the same sources. Every output can be reviewed, challenged, and verified.
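Tolerance-based LV matching, as used in Deployment 01, is a good example of why reproducibility is achievable: the comparison itself can be fully deterministic. The sketch below is a simplified assumption of how such a match might look. The `Product` type, the single `thickness_mm` attribute, the 5% relative tolerance, and the catalogue entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Product:
    name: str
    thickness_mm: float


def within_tolerance(required: float, offered: float, rel_tol: float) -> bool:
    """True if offered deviates from required by at most rel_tol (0.05 = 5%)."""
    return abs(offered - required) <= rel_tol * abs(required)


def match_position(required_thickness_mm: float,
                   catalogue: List[Product],
                   rel_tol: float = 0.05) -> List[Product]:
    """Return catalogue products within tolerance, closest match first."""
    hits = [p for p in catalogue
            if within_tolerance(required_thickness_mm, p.thickness_mm, rel_tol)]
    # Ties on distance are broken by name so the ordering never varies.
    return sorted(hits, key=lambda p: (abs(p.thickness_mm - required_thickness_mm), p.name))


# Hypothetical catalogue entries.
catalogue = [Product("Board A", 96.0), Product("Board B", 100.0), Product("Board C", 120.0)]
matches = match_position(100.0, catalogue)
```

Here Board B (exact match) ranks ahead of Board A (4% off), and Board C (20% off) is excluded. Because the filter and the sort key are pure functions of the inputs, the same LV position always yields the same ranked list.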

7. What We're Building Next

We are direct about what is live and what is planned. Customers making infrastructure decisions deserve accurate information.

Capability | Description | Status
Agentic Pipeline | Supervised multi-agent workflow with reflection guard and human-in-the-loop approval | ● Live
Structured Knowledge Retrieval | Domain knowledge base with cross-reference reasoning across Eurocodes, approvals and technical documents | ● Live
LV Matching | Deterministic tolerance logic for matching LV positions against product catalogues | ● Live
Plan Takeoff | Quantity extraction from architectural plans with geometric deduction of openings | ● Live
Audit Trail | Full output log with engineer sign-off, timestamps, and PDF export | ● Live
NVI Engine | Eliminates hallucinated norm values from compliance-critical workflows | ◎ Planned (development starts Q3 2026)
Norm Versioning | Automatic flagging when answers reference superseded norm versions, with update impact assessment | ◎ Planned
Multi-Doc Search | Unified search across Eurocodes, NDT standards, inspection reports, and BIM data | ◎ Planned
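As a rough illustration of what "geometric deduction of openings" means for Plan Takeoff, the sketch below computes a net wall area from a gross rectangle and a list of openings. It assumes rectangular, non-overlapping shapes that have already been extracted from the plan; the `Rect` type and `net_wall_area` function are hypothetical, and real measurement rules may treat small openings differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    width_m: float
    height_m: float

    @property
    def area_m2(self) -> float:
        return self.width_m * self.height_m


def net_wall_area(wall: Rect, openings: List[Rect]) -> float:
    """Gross wall area minus the area of all openings (doors, windows)."""
    deducted = sum(o.area_m2 for o in openings)
    if deducted > wall.area_m2:
        raise ValueError("openings exceed wall area; check the extracted geometry")
    return wall.area_m2 - deducted


# Hypothetical geometry: a 5 m x 3 m wall with one door and one window.
wall = Rect(width_m=5.0, height_m=3.0)        # 15 m2 gross
openings = [Rect(1.0, 2.0), Rect(2.0, 1.5)]   # 2 m2 door + 3 m2 window
net = net_wall_area(wall, openings)
```

The sanity check on total opening area is the kind of guard that catches extraction errors early, before a wrong quantity reaches a bill of quantities.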

The NVI engine is the capability we are most often asked about. It is designed to eliminate hallucinated norm values from compliance-critical workflows, which we see as the single biggest remaining risk in AI-assisted engineering compliance. Development begins in Q3 2026.

Do You Have a Use Case Close to This?

We are actively engaged with engineering and construction firms exploring sovereign AI for their specific workflows. If your team handles norm compliance, approval analysis, LV processing, or plan quantity takeoff — let's talk. A focused POC is scoped in one call and running in two to three weeks.
