Fabric Moves Your Data. But Who Checks the Math?

Microsoft Fabric is doing real work. It ingests data from dozens of sources, lands it in OneLake, and makes that data available to analytics and reporting tools. The orchestration works. The connectors work. Data movement, as a problem, is largely solved.

But moving data and validating data aren’t the same problem.

Mining and manufacturing companies we work with have spent months tuning their Fabric pipelines, only to discover they were scaling the wrong math. A plant’s OEE calculation was silently wrong. A cost allocation formula had been misapplied for years. Fabric moved the bad data faithfully and at scale across the enterprise. The pipeline was perfect. The numbers were broken.

That’s the gap between data plumbing and data modelling, and it’s the one most data architecture conversations skip entirely.

The Gap Between Moving Data and Understanding It

Data plumbing is infrastructure. Where is the data? How do we move it? Can we trust the pipeline? A Fabric pipeline that ingests sensor readings from manufacturing equipment every 60 seconds is doing plumbing. The data lands correctly, timestamps are preserved, and no rows are lost. That’s valuable, but it tells you nothing about whether your interpretation of that data is sound.

Data modelling is logic. What does this data mean? How should it be calculated? What aggregates are legitimate? Whether OEE should be calculated as (Good Units Produced / Maximum Possible Units) or decomposed as Availability × Performance × Quality is a modelling question, a business decision dressed as math.
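Under consistent definitions, the two formulations give the same number; the business decision is what counts as "planned time" and "ideal rate". A worked sketch, with illustrative figures only:

```python
# Illustrative numbers: one 480-minute shift on a single line.
planned_time = 480        # minutes scheduled for production
run_time = 400            # minutes actually running
ideal_rate = 10           # units per minute at rated speed
total_units = 3600        # units produced during run_time
good_units = 3420         # units passing quality checks

# Decomposed form: Availability x Performance x Quality
availability = run_time / planned_time               # 0.8333...
performance = total_units / (ideal_rate * run_time)  # 0.90
quality = good_units / total_units                   # 0.95
oee_decomposed = availability * performance * quality

# Direct form: Good Units / Maximum Possible Units
oee_direct = good_units / (ideal_rate * planned_time)

# With consistent inputs, the two forms agree exactly.
assert abs(oee_decomposed - oee_direct) < 1e-12
```

The equivalence only holds when every term uses the same definitions, which is exactly why the choice of formulation is a governance question rather than a math question.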

Fabric’s semantic layer can encode DAX formulas and expose them in Power BI. But it can’t enforce them. A report author can bypass the governed measure and drag raw fields into a calculation. A dashboard builder can average a ratio metric across sites rather than recalculate it from components. Nothing stops them.
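To see how averaging a ratio metric goes silently wrong, here’s a minimal sketch with hypothetical per-site figures:

```python
# Hypothetical per-site data: (good_units, max_possible_units).
sites = {
    "site_a": (900, 1000),   # OEE 0.90, small site
    "site_b": (4000, 8000),  # OEE 0.50, large site
}

# What a dashboard author often does: average the per-site ratios.
naive_avg = sum(g / m for g, m in sites.values()) / len(sites)  # 0.70

# What the governed measure should do: recalculate from components.
total_good = sum(g for g, _ in sites.values())
total_max = sum(m for _, m in sites.values())
recalculated = total_good / total_max  # 4900 / 9000 = 0.544...
```

The naive average overstates fleet OEE because it weights a 1,000-unit site the same as an 8,000-unit one.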

There’s a deeper problem. The semantic layer can’t enforce domain rules that nobody has coded into DAX. It can’t say: “This facility has a 10-minute ramp-up phase each shift; don’t include that in OEE.” It can’t say: “When this specific pump appears in the dataset, that’s a known downtime category, not a failure.” Those rules live in your engineers’ heads. You can encode them in Fabric notebooks or Spark jobs, but they end up as code buried in a pipeline, not as governed rules that operations teams can audit or update.

The semantic layer handles statistical and aggregative calculations well: summing revenue, counting transactions, and averaging response times. It struggles where calculations are conditional, domain-specific, or built on unstated assumptions that your operations team knows but nobody has codified.

This is where operational intelligence platforms sit. They don’t replace Fabric’s plumbing or Power BI’s visualisation. They add a third layer:

  1. The visualisation layer (Power BI, Grafana, custom dashboards) displays numbers and trends.
  2. The data engineering layer (Fabric, Airflow, dbt) moves and transforms raw data.
  3. The operational intelligence layer (Capstone) validates, models, and governs the mathematical rules that define business metrics.

Capstone sits in that third layer. It owns the metric definitions: which aggregations are legitimate, which calculations are forbidden, and which transformations depend on the domain context. Its formula engine rejects calculations that violate defined aggregation rules. If someone tries to average OEE across sites instead of recalculating from components, the engine flags it. It doesn’t own the pipelines. It doesn’t own the dashboards. It owns the rules. When those rules are wrong, you know exactly where to look.
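As a sketch of the principle (not Capstone’s actual API), an aggregation-rule check can be as simple as a metric registry that names the legitimate operations and rejects everything else:

```python
# Hypothetical registry of governed metrics and their legal aggregations.
ALLOWED_AGGREGATIONS = {
    "revenue": {"sum"},                      # additive metric
    "oee": {"recalculate_from_components"},  # ratio metric: never averaged
}

def validate_aggregation(metric: str, method: str) -> None:
    """Raise if a metric is aggregated in a way its definition forbids."""
    allowed = ALLOWED_AGGREGATIONS.get(metric)
    if allowed is None:
        raise KeyError(f"no governed definition for metric '{metric}'")
    if method not in allowed:
        raise ValueError(
            f"'{method}' is not a legitimate aggregation for '{metric}'; "
            f"allowed: {sorted(allowed)}"
        )

validate_aggregation("revenue", "sum")   # passes silently
# validate_aggregation("oee", "mean")    # would raise ValueError
```

The value is less in the check itself than in where the rule lives: one registry that can be audited and versioned, instead of the same constraint re-implemented in every report.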

| Concern | Fabric | Capstone |
| --- | --- | --- |
| Data ingestion and movement | Yes | No |
| Schema harmonisation | Yes | No |
| Pipeline orchestration | Yes | No |
| Statistical aggregation (sums, counts, averages) | Yes | Yes, with validation constraints |
| Ratio metric aggregation (OEE, cost per unit, yield) | Possible in DAX, not enforced | Enforced by formula engine |
| Domain-specific business rules | Manual coding in DAX | Built into the model |
| Metric auditability and versioning | Limited | Full formula audit trail |
| Organisational hierarchy roll-ups | Requires manual DAX modelling | Native |
| Cross-system metric validation | No | Yes |

“Can’t We Just Build This in dbt?”

You don’t need Capstone specifically to add a modelling layer. A competent data team can build validation rules in dbt (a widely used SQL transformation framework), custom Python, or Fabric notebooks. If you’ve got data engineers on staff who also understand your manufacturing domain, you can build the equivalent yourself.

The question is whether that’s the best use of their time.

Domain-specific modelling logic (shift boundaries, equipment exclusions, ratio metric aggregation rules, multi-site hierarchy roll-ups) needs to be maintained by people who understand the operations, not just the code. When that logic lives in dbt models or Python scripts, every change requires a data engineer. When it lives in a purpose-built platform, operations teams can audit and update metric definitions without touching a pipeline.

Then there’s governance. dbt tests check data quality: nulls, ranges, freshness. They don’t check whether your cost allocation formula accounts for a new ERP cost category someone added last Tuesday. A modelling platform enforces business rule constraints continuously. A code-based approach relies on someone remembering to update the tests.

We’re not arguing that dbt or custom code can’t work. For organisations running multi-site operations with hundreds of calculated metrics, the maintenance cost of a code-based modelling layer tends to outstrip the cost of a purpose-built one within 12 to 18 months, across the deployments we’ve been involved in.

Integration Pattern 1: Fabric → Capstone

The most common pattern is one-directional: Fabric pipelines feed data into Capstone.

Fabric ingests raw data from operational systems (sensor streams, ERP events, equipment logs) and performs basic standardisation: schema harmonisation, null handling, timestamp alignment. It doesn’t try to calculate anything. That standardised data lands in OneLake, and Capstone ingests it. The modelling layer takes over from there: business rules are applied, calculations executed, metrics validated against constraints and domain logic.

Take a mining operation. Fabric delivers ore arrival times, processing duration, equipment codes, and operator IDs. Capstone knows the domain rules: which equipment codes indicate planned maintenance (exclude from availability), what constitutes a shift boundary, how to weight multi-equipment circuits. Capstone outputs planned_availability, effective_utilization, and cost_per_tonne_processed back to OneLake.

One thing to watch: Fabric’s refresh schedule and Capstone’s calculation window need to align. If Fabric lands a partial day’s data at 11 p.m. and Capstone calculates at midnight, you get incomplete metrics until the next cycle. You need to coordinate pipeline completion signals so Fabric triggers Capstone only after the full daily extract is confirmed complete. It’s solvable, but it’s a design decision you have to make explicitly. We’ve seen teams skip this and spend weeks debugging phantom data discrepancies.
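One way to make that coordination explicit is a completion marker the modelling run checks before it starts. The manifest shape below is illustrative, not a Fabric or Capstone contract:

```python
import datetime as dt

def should_trigger_calculation(manifest: dict, business_date: dt.date) -> bool:
    """Only start the modelling run once the pipeline has written a
    'complete' marker for the full business date (hypothetical shape)."""
    entry = manifest.get(business_date.isoformat())
    return bool(entry) and entry.get("status") == "complete"

# A partial 11 p.m. load must not trigger the midnight calculation.
manifest = {"2024-06-01": {"status": "partial", "rows": 18_000}}
assert not should_trigger_calculation(manifest, dt.date(2024, 6, 1))

# Once the full extract lands, the calculation is allowed to run.
manifest["2024-06-01"] = {"status": "complete", "rows": 24_000}
assert should_trigger_calculation(manifest, dt.date(2024, 6, 1))
```

The same idea works with any signalling mechanism: a sentinel file in OneLake, a row in a control table, or a pipeline-completion event, as long as the modelling side checks it before calculating.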

Integration Pattern 2: Capstone → Fabric

Less common, but powerful: Capstone outputs enriched data back into Fabric.

Capstone generates not just metrics, but enriched datasets: events tagged with their root cause, production data labelled with quality outcomes, downtime events categorised by impact. These enriched tables get written back to OneLake as new data sources. Fabric can then use them as reference data for further enrichment or as trusted sources for downstream BI tools.

A practical example: a manufacturing plant has thousands of short downtime events logged as stop_code entries. In one facility we worked with, over 80% of them were noise. Capstone analyses the patterns (duration, frequency, equipment combination, shift timing) and classifies them as either “normal operation” (false alarms) or “genuine failures” requiring root-cause investigation. This classified downtime feed gets written back to OneLake. Fabric reports can now distinguish between noise and signal, and maintenance teams see only actionable events.
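A rule-based sketch of that kind of classification, with hypothetical thresholds that a real deployment would tune per facility:

```python
# Thresholds are illustrative; real classification also weighs
# equipment combination and shift timing, as described above.
def classify_downtime(duration_s: float, recurrences_per_shift: int) -> str:
    """Label a stop event as operational noise or a genuine failure."""
    if duration_s < 120 and recurrences_per_shift > 5:
        return "normal_operation"   # short and frequent: likely micro-stops
    if duration_s >= 1800:
        return "genuine_failure"    # long stop: investigate root cause
    return "review"                 # ambiguous: leave for an engineer

events = [(45, 12), (3600, 1), (300, 2)]
labels = [classify_downtime(d, f) for d, f in events]
```

Writing the labels back as a column on the downtime table is what lets downstream Fabric reports filter noise without re-deriving the rules.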

The edge case to watch is dependency loops. When Capstone writes enriched data back to OneLake, Fabric pipelines that consume OneLake tables may pick up Capstone’s output and re-process it. Design your pipeline to distinguish between raw source tables and Capstone-enriched tables, or you’ll process the same data twice and wonder why your numbers doubled.
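A simple guard is a naming convention that ingestion pipelines filter on. The prefix below is a hypothetical convention, not a Fabric feature:

```python
# Hypothetical convention: enriched tables carry a distinct prefix so
# ingestion pipelines can skip them and avoid reprocessing loops.
ENRICHED_PREFIX = "capstone_"

def raw_source_tables(table_names: list[str]) -> list[str]:
    """Return only tables an ingestion pipeline should consume."""
    return [t for t in table_names if not t.startswith(ENRICHED_PREFIX)]

tables = ["erp_costs", "sensor_readings", "capstone_downtime_classified"]
assert raw_source_tables(tables) == ["erp_costs", "sensor_readings"]
```

Separate schemas or workspaces achieve the same separation; what matters is that raw and enriched tables are distinguishable by rule, not by memory.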

Which Problems Are You Solving?

| Your problem | The right tool |
| --- | --- |
| Data lives in 47 places and needs to be in one place | Fabric |
| Data is in one place but you don’t know if the math is right | Capstone |
| Report authors produce different numbers for the same KPI | Capstone (enforced metric definitions) |
| Pipeline is unreliable or slow | Fabric |
| Ratio metrics are being averaged instead of recalculated | Capstone |
| Need real-time streaming from SCADA/MES | Fabric |
| Need domain-specific rules (shift exclusions, equipment categories) | Capstone |
| Need audit trail for metric definition changes | Capstone |

Most organisations have both categories of problems. Most haven’t separated them clearly. The cost of conflating them is real: we’ve seen a single misapplied cost allocation formula propagate across three sites for over a year before anyone caught it. The pipeline worked perfectly the entire time.

Why Spreadsheets Lie About Cost Per Unit shows how ratio metric errors scale silently across sites. Your OEE Dashboard Is Hiding $88,000 a Day translates the gap between your reported OEE and reality into what it’s actually costing you. Or contact us to talk through your architecture.