Image: Near-IR spectrum of liquid ethanol, recorded with an Ocean Optics NIR-512 InGaAs spectrometer through a ~2 cm optical path. It is a rough, non-quantitative spectrum, and the region between about 1400 and 1600 nm is saturated. Original upload by Deglr6328 at English Wikipedia, 10 September 2006; transferred to Wikimedia Commons by Sevela.p.

I recently had to audit a production NIR spectroscopy system used for biomass analysis. On paper, it sounded impressive: “state of the art chemometrics,” “advanced multivariate calibration,” “industry ready models.” In reality? It was a textbook example of how to do NIR modeling wrong in ways that quietly transfer risk and liability from the software vendor straight onto the client’s balance sheet.


If your organization is buying NIR-based biomass analytics “as a service,” you should know exactly what to look out for, because this kind of thing is much more common than anyone likes to admit.


When “Advanced Analytics” Is Just Bad Math in a Nice UI

The first red flag was the Mahalanobis distance implementation used for locally weighted regression (LWR). On the surface: “We use robust Mahalanobis distance to select local neighbors.” Sounds serious, right?

Under the hood, the system was doing this:

  • Centering the data ✅
  • Not scaling properly before computing the covariance matrix ❌

That means wavelengths with naturally higher variance (e.g., the strong water bands near 1450 and 1940 nm) completely dominated the distance metric. The “nearest neighbors” were not chemically similar samples – they were just spectrally loud in the most variable regions.

Bottom line: every single prediction was based on the wrong neighborhood of calibration samples. The math still ran, the UI still plotted nice graphs, but the scientific meaning was broken.
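To make the scaling issue concrete, here is a minimal sketch of neighbor selection with and without autoscaling. It is not the audited system's code. It assumes the distance is computed in a truncated PCA score space, which is common in NIR work; that is where scaling changes the neighbor ranking, because a Mahalanobis distance built from a full, invertible covariance matrix is unaffected by per-wavelength rescaling. The function name, `n_pcs`, and `k` are illustrative choices.

```python
import numpy as np

def mahalanobis_neighbors(X_cal, x_new, n_pcs=5, k=10, autoscale=True):
    """Return (indices, distances) of the k calibration spectra closest to x_new.

    Distances are Mahalanobis distances in a truncated PCA score space.
    """
    mean = X_cal.mean(axis=0)
    # Autoscaling divides each wavelength by its standard deviation so that
    # high-variance bands do not decide which components are retained.
    if autoscale:
        scale = X_cal.std(axis=0, ddof=1)
        scale = np.where(scale > 0, scale, 1.0)  # guard against constant channels
    else:
        scale = np.ones(X_cal.shape[1])

    Z = (X_cal - mean) / scale
    z_new = (x_new - mean) / scale

    # PCA via SVD of the centered (and optionally scaled) calibration matrix.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_pcs].T                               # loadings
    score_var = s[:n_pcs] ** 2 / (Z.shape[0] - 1)  # variance of each score

    T = Z @ P
    t_new = z_new @ P

    # Scores are uncorrelated, so the Mahalanobis distance reduces to a
    # variance-weighted Euclidean distance in score space.
    d2 = (((T - t_new) ** 2) / score_var).sum(axis=1)
    order = np.argsort(d2)[:k]
    return order, np.sqrt(d2[order])
```

Running it twice on the same query spectrum, once with `autoscale=True` and once with `autoscale=False`, and comparing the returned indices is a quick way to see how much the neighborhood depends on the scaling choice.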

If your vendor can’t explain exactly how they compute distances and why, that’s not a “detail”, it’s a risk.


Preprocessing: When the Pipeline Is Upside Down

Next, the spectral preprocessing pipeline.

What it did:

  1. Derivative
  2. SNV
  3. Detrend

What it should have done (per decades of NIR practice and standards like ASTM/ISO):

  1. SNV (to correct scatter)
  2. Detrend (to fix baseline)
  3. Then derivative (to enhance features)

Doing derivatives before scatter correction is like sharpening a blurry, dirty photo and then trying to clean it afterwards. All you do is amplify garbage.
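The recommended order above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a reference implementation: the derivative here is a plain finite difference from `np.gradient`, used as a stand-in for the Savitzky-Golay derivative that is more usual in practice, and the detrend degree of 2 is an assumed default.

```python
import numpy as np

def snv(X):
    """Standard normal variate: center and scale each spectrum (row)."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mu) / sd

def detrend(X, degree=2):
    """Subtract a low-order polynomial baseline fitted to each spectrum."""
    idx = np.linspace(-1.0, 1.0, X.shape[1])
    V = np.vander(idx, degree + 1)                  # polynomial design matrix
    coef, *_ = np.linalg.lstsq(V, X.T, rcond=None)  # one fit per spectrum
    return X - (V @ coef).T

def first_derivative(X, wavelengths):
    """Finite-difference first derivative along the wavelength axis."""
    return np.gradient(X, wavelengths, axis=1)

def preprocess(X, wavelengths):
    """Scatter correction, then baseline removal, then derivative."""
    return first_derivative(detrend(snv(X)), wavelengths)
```

Writing the pipeline as one explicit composition like `preprocess` also makes the order auditable: anyone reviewing the code can read the sequence off a single line.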

The impact:

  • Noise pumped up unnecessarily
  • Models needing more PLS components just to chase preprocessing artifacts
  • Prediction errors higher than they needed to be

This isn’t obscure, cutting-edge theory. This is basic NIR hygiene taught in short courses and textbooks.

So if your supplier waves away the preprocessing details with “we use standard chemometric methods,” ask them to write down the exact sequence. The order matters.


PLS Components by Vibes, Not Validation

Then there was the way they chose the number of PLS components.

Instead of using cross-validation to find the sweet spot between underfitting and overfitting, the logic was literally:

number_of_components = number_of_samples / 5, capped at 30

No theory, no justification. And yes, they did calculate PRESS and RMSECV… and then simply ignored them when deciding how many components to keep.

The result was painfully predictable:

  • Small calibration sets: overfitted models that look great in-sample, fall apart out-of-sample
  • Larger calibration sets: underfitted models that never reach the performance they could

If a system picks PLS components by a hard-coded rule instead of data-driven validation, it’s not doing chemometrics. It’s doing numerology.
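For contrast with the hard-coded rule, here is a rough sketch of data-driven selection: compute RMSECV for each candidate number of components and keep the smallest model that is close to the minimum. Everything here is illustrative. The PLS1 fit is a compact NIPALS implementation written so the example runs with NumPy alone, and the 2% tolerance in `choose_components` is an assumed parsimony heuristic, not a standard value.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Fit PLS1 by NIPALS; returns (x_mean, y_mean, coefficient vector)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:            # nothing left in X that covaries with y
            break
        w = w / norm
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        c = yr @ t / tt
        Xr = Xr - np.outer(t, p)    # deflate X
        yr = yr - c * t             # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, beta

def rmsecv_curve(X, y, max_comp, n_folds=5, seed=0):
    """RMSECV for 1..max_comp components using shuffled k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    press = np.zeros(max_comp)
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        for a in range(1, max_comp + 1):
            xm, ym, beta = pls1_fit(X[train], y[train], a)
            pred = (X[test] - xm) @ beta + ym
            press[a - 1] += np.sum((y[test] - pred) ** 2)
    return np.sqrt(press / len(y))

def choose_components(rmsecv, tolerance=0.02):
    """Smallest model whose RMSECV is within `tolerance` of the minimum."""
    limit = rmsecv.min() * (1 + tolerance)
    return int(np.argmax(rmsecv <= limit)) + 1
```

The point is the shape of the logic, not the exact heuristic: the number of components comes out of `rmsecv_curve`, whereas a rule like `min(n_samples // 5, 30)` never looks at the validation error at all.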


Cross-Validation You Can’t Trust

Just to make things worse, they combined arbitrary component selection with 20-fold cross-validation on calibration sets of about 50 samples.

Do the math:

  • ~50 samples / 20 folds ≈ 2–3 samples per test fold

That’s nowhere near enough to give stable, trustworthy error estimates. The R² and RMSE numbers coming out of that setup look precise, but they’re statistically noisy. You can easily see swings of ±0.15 in R² just by reshuffling folds.
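A simple way to check how stable a cross-validated metric is on your own data is to repeat the cross-validation with different random fold assignments and look at the spread. The sketch below does that. The ridge regression in `ridge_fit_predict` is only a stand-in model so the example is self-contained; in practice you would pass in whatever model the vendor actually uses. All function names are illustrative.

```python
import numpy as np

def fold_sizes(n_samples, n_folds):
    """Sizes of the test folds produced by an even k-fold split."""
    return [len(f) for f in np.array_split(np.arange(n_samples), n_folds)]

def repeated_cv_r2(fit_predict, X, y, n_folds=5, n_repeats=20, seed=0):
    """Pooled cross-validated R^2 for several random fold assignments.

    fit_predict(X_train, y_train, X_test) must return predictions for X_test.
    """
    rng = np.random.default_rng(seed)
    ss_tot = np.sum((y - y.mean()) ** 2)
    scores = []
    for _ in range(n_repeats):
        pred = np.empty(len(y), dtype=float)
        for test in np.array_split(rng.permutation(len(y)), n_folds):
            train = np.setdiff1d(np.arange(len(y)), test)
            pred[test] = fit_predict(X[train], y[train], X[test])
        scores.append(1.0 - np.sum((y - pred) ** 2) / ss_tot)
    return np.array(scores)

def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Stand-in model (ridge regression) so the example runs on its own."""
    xm, ym = X_train.mean(axis=0), y_train.mean()
    A = X_train - xm
    beta = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]),
                           A.T @ (y_train - ym))
    return (X_test - xm) @ beta + ym
```

`fold_sizes(50, 20)` returns folds of 2 and 3 samples, which is the arithmetic above. The range of values returned by `repeated_cv_r2` (for example `scores.max() - scores.min()`) tells you how much of a reported R² is down to the luck of the shuffle.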

If someone shows you dazzling performance metrics from tiny calibration sets with very high fold counts, be skeptical. It’s very easy to generate pretty numbers that don’t survive contact with the real world.


No Outlier Detection = Garbage In, Garbage Out

Perhaps the most worrying finding: there was no outlier detection whatsoever.

No Hotelling T², no Q-residuals, no leverage checks, nothing to catch:

  • Instrument glitches
  • Mispositioned samples
  • Contaminated or unusual materials
  • Typo-riddled reference lab values

In real NIR work, a single ugly outlier in a 50-sample calibration can inflate RMSE by 40–60%. If nobody is looking for these, you’re building models on landmines. If your NIR vendor can’t show you how they detect (and handle) spectral and compositional outliers, you’re the one absorbing that uncertainty.
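As an illustration of what a basic spectral outlier check involves, here is a minimal PCA-based sketch that computes Hotelling T² and Q residuals. It makes two simplifying assumptions that a production system would probably revisit: the number of components is fixed by hand, and the control limits are empirical percentiles of the calibration set rather than the distribution-based limits usually quoted in textbooks. Function names are illustrative.

```python
import numpy as np

def fit_pca_monitor(X_cal, n_pcs=3):
    """PCA model of the calibration spectra: (mean, loadings, score variances)."""
    mean = X_cal.mean(axis=0)
    Z = X_cal - mean
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_pcs].T
    score_var = s[:n_pcs] ** 2 / (len(X_cal) - 1)
    return mean, P, score_var

def t2_and_q(X, model):
    """Hotelling T^2 (distance inside the model) and Q (distance to the model)."""
    mean, P, score_var = model
    Z = X - mean
    T = Z @ P
    t2 = np.sum(T ** 2 / score_var, axis=1)
    resid = Z - T @ P.T            # part of each spectrum the model cannot explain
    q = np.sum(resid ** 2, axis=1)
    return t2, q

def flag_outliers(X, model, t2_cal, q_cal, pct=99.0):
    """Flag samples beyond empirical percentiles of the calibration statistics."""
    t2, q = t2_and_q(X, model)
    return (t2 > np.percentile(t2_cal, pct)) | (q > np.percentile(q_cal, pct))
```

A high T² means the sample is extreme along directions the model knows about; a high Q means the spectrum contains something the calibration set never showed, such as a glitch or a contaminant. Reference-value outliers need a separate check on the y-residuals.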


Under the Hood: 1990s Architecture in 2025 Clothing

The scientific issues would have been bad enough. The software engineering choices made them worse.

A few highlights:

  • Database abuse:
    Thousands of individual INSERT statements in loops instead of bulk operations. Think minutes of processing time where milliseconds would do. That translates directly into slower throughput and unnecessary infrastructure cost.
  • Global variables everywhere:
    Critical things like database connections and large matrices thrown into global scope. That’s a recipe for subtle bugs, memory leaks, and impossible-to-debug behavior – especially if anyone ever tries to parallelize or scale.
  • Zero error handling:
    File loads, database queries, and parsing operations all assumed to succeed. When they didn’t, the system didn’t fail loudly; it just carried on in a half-broken state. That’s how you end up with corrupted results without even knowing it.
  • Infinite polling loop:
    A process that wakes up every few seconds, manually polls a database, and goes back to sleep. This was a perfectly normal pattern in the 1990s. In 2025, with event-driven architectures and proper APIs, it’s a red flag.

These are not “stylistic” concerns. Poor engineering choices increase the chance of silent failure – exactly the kind of failure that’s most dangerous in analytical systems.
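To show what the database and error-handling points look like in practice, here is a small sketch using Python's built-in `sqlite3` module. The `predictions` table and its columns are hypothetical, invented for this example; the audited system's schema is not shown in this post. The idea is a single bulk insert inside a transaction that either succeeds completely or fails loudly.

```python
import sqlite3

def store_predictions(conn, rows):
    """Insert many (sample_id, constituent, value) rows in one transaction.

    Any failure raises, and the transaction is rolled back so that no
    partial batch is left behind.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO predictions (sample_id, constituent, value) "
            "VALUES (?, ?, ?)",
            rows,
        )

# Hypothetical schema, in memory, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions ("
    "sample_id TEXT NOT NULL, constituent TEXT NOT NULL, value REAL NOT NULL)"
)
store_predictions(conn, [("S001", "lignin", 28.5), ("S002", "lignin", 26.2)])
```

If a row in the batch violates a constraint (say, a missing value), `store_predictions` raises `sqlite3.IntegrityError` and nothing from that batch is stored, which is the opposite of carrying on in a half-broken state.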


What’s Missing That Should Be Standard

More telling than what was broken was what wasn’t there at all.

No Calibration Transfer

Real-world NIR setups drift over time and differ between instruments. Serious platforms have:

  • Drift monitoring
  • Slope/bias correction
  • Calibration transfer strategies

Without that, you get:

  • 2020: Lignin = 28.5%
  • 2024 (same sample, same instrument): Lignin = 26.2%
  • 2024 (another instrument): Lignin = 31.1%

Same biomass, different “truths”.
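The simplest of these tools, slope/bias correction, fits in a few lines. The sketch below assumes you have a set of transfer standards measured on both a primary and a secondary instrument (or on the same instrument at two points in time) and that the disagreement between them is roughly linear. The function names are illustrative.

```python
import numpy as np

def fit_slope_bias(pred_secondary, pred_primary):
    """Least-squares line mapping secondary predictions onto the primary scale."""
    slope, bias = np.polyfit(pred_secondary, pred_primary, deg=1)
    return slope, bias

def apply_slope_bias(pred, slope, bias):
    """Correct new predictions from the secondary instrument."""
    return slope * np.asarray(pred, dtype=float) + bias
```

Slope/bias correction only repairs a linear offset in the predictions. If the spectra themselves differ in shape between instruments, a spectral transfer method is needed instead.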

No Model Monitoring

Once models were deployed, they were never monitored. No control charts, no alerts, no drift detection, no automated triggers for recalibration.

From a distance, it looked like a stable system. Up close, it was more like a “fire-and-forget” model dump.
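Basic monitoring does not need much machinery. As a sketch, assume you periodically run a check sample with a known reference value and track the residual (prediction minus reference). A Shewhart-style control chart with a simple run rule already catches both sudden failures and slow drift. The 3-sigma limits and the run length of 8 are conventional defaults used here as assumptions, and the function names are illustrative.

```python
import numpy as np

def control_limits(baseline_residuals, n_sigma=3.0):
    """Control limits (low, center, high) from baseline check-sample residuals."""
    center = float(np.mean(baseline_residuals))
    sigma = float(np.std(baseline_residuals, ddof=1))
    return center - n_sigma * sigma, center, center + n_sigma * sigma

def check_residuals(residuals, limits, run_length=8):
    """Return (out_of_limits, drifting) for a sequence of new residuals."""
    low, center, high = limits
    r = np.asarray(residuals, dtype=float)
    out_of_limits = bool(np.any((r < low) | (r > high)))

    # Run rule: `run_length` consecutive points on the same side of the center
    # line suggest a systematic shift even if every point is within limits.
    drifting = False
    run, prev = 0, 0.0
    for cur in np.sign(r - center):
        run = run + 1 if (cur != 0 and cur == prev) else int(cur != 0)
        prev = cur
        if run >= run_length:
            drifting = True
            break
    return out_of_limits, drifting
```

Either flag would be a reasonable trigger for an alert and a closer look at whether the model needs recalibration.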

No Uncertainty

Clients only saw point estimates:

“Lignin = 28.5%”

No confidence intervals, no indication of prediction uncertainty, no guidance on when a sample is near a decision threshold.

If you’re making six-figure purchasing decisions based on these numbers, that missing ±1–2% error band can be the difference between a smart decision and a very expensive mistake.
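Even a crude uncertainty estimate changes how a number gets used. The sketch below attaches an approximate interval to a prediction and refuses to call a sample above or below a threshold when the interval straddles it. It assumes roughly normal, constant prediction error summarized by a single RMSEP from independent validation; the coverage factor of 2 and the RMSEP of 0.5 used in the example are illustrative assumptions, not values from the audited system.

```python
def prediction_interval(value, rmsep, coverage_factor=2.0):
    """Approximate interval: prediction +/- coverage_factor * RMSEP."""
    half_width = coverage_factor * rmsep
    return value - half_width, value + half_width

def decision(value, threshold, rmsep, coverage_factor=2.0):
    """Classify a prediction against a threshold, allowing for uncertainty."""
    low, high = prediction_interval(value, rmsep, coverage_factor)
    if low > threshold:
        return "above"
    if high < threshold:
        return "below"
    return "indeterminate"

# Lignin = 28.5% with an assumed RMSEP of 0.5 percentage points:
print(prediction_interval(28.5, 0.5))   # (27.5, 29.5)
print(decision(28.5, 28.0, 0.5))        # indeterminate
```

An "indeterminate" result is useful information: it tells the client that this sample should go to the reference lab before money changes hands.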


The Compliance Illusion

If you work in a regulated or quality-conscious environment, this part matters a lot.

On paper, you might hear: “We follow ISO and ASTM guidelines.”

In practice, this system:

  • Didn’t meet key requirements of ISO 12099 (e.g., outlier detection, ongoing validation, method documentation, QC procedures)
  • Didn’t align with ASTM E1655 expectations for preprocessing validation, independent validation sets, and QA/QC
  • Had no audit trails, no formal validation protocols, and no quality management system around the analytics

That means anyone trying to use such a system in GMP, ISO 17025, or 21 CFR Part 11 contexts would be standing on very thin ice.


Why This Happens (And Why It Matters)

It’s tempting to blame this on “bad developers,” but the root causes are broader:

  • Systems built without real chemometrics expertise
  • Statistical methods implemented once and never peer-reviewed
  • Old codebases accumulating tech debt instead of being refactored
  • No culture of validation, monitoring, or quality management

Meanwhile, the marketing layer keeps evolving: prettier dashboards, nicer client reports, more buzzwords. It’s very easy for a buyer to confuse a polished UI with solid science under the hood.

But at the end of the day, nature doesn’t care about your UI. Bad math is still bad math, even if you wrap it in modern front-end frameworks.


If You’re a Client: Questions You Should Be Asking

If you’re buying NIR biomass analysis from a third party, you don’t need to become a chemometrics expert overnight. But you should feel comfortable asking pointed questions like:

  • How do you choose the number of PLS components? Show me.
  • What is your preprocessing pipeline, in exact order? Why that order?
  • How do you detect and handle outliers?
  • How do you handle instrument drift and calibration transfer?
  • How do you monitor model performance over time?
  • Can you provide uncertainty estimates, not just point predictions?
  • How do you align with ISO 12099 / ASTM E1655?

If the answers are vague, defensive, or hand-wavy, take that seriously. “Trust us, it’s proprietary” is not good enough when million-dollar decisions depend on the output.


Fixing It: What a Grown-Up NIR Platform Looks Like

A scientifically and technically robust NIR biomass system doesn’t have to be flashy, but it does need to:

  • Use mathematically consistent distance metrics
  • Apply preprocessing in a scientifically justified order
  • Select model complexity via proper cross-validation
  • Detect and handle outliers as a first-class concern
  • Support calibration transfer and drift monitoring
  • Provide uncertainty, not just point predictions
  • Run on maintainable, well-tested, well-instrumented code
  • Live inside an actual quality management framework

That’s the minimum bar for anyone claiming to provide “industrial-grade” NIR biomass analytics.


Final Thought

NIR spectroscopy is a powerful, mature technology. In the right hands, with the right modeling discipline, it can save enormous amounts of time and money and unlock deep insights into biomass composition. But when shortcuts are taken in the math, in the validation, or in the engineering, the risk doesn’t disappear. It just moves.

It moves from the vendor’s cost of doing things properly…
…to the client’s cost of bad decisions made on beautifully wrong numbers.

If you rely on NIR for biomass analysis, make sure you know which side of that line you’re on.

By Kemal

A bioprocess engineer, modeler, machine learning dreamer.