RINSE | NMR cleanser

what is rinse

A small, honest restorer.
No tricks, no proprietary ops.

RINSE trains a three-stage 1D U-Net on paired clean / dirty spectra and runs inference on simulated or experimental data. The online server workflow uses a fixed, pre-trained model.

01 / input

Dirty spectrum · 2 channels

Real and imaginary parts of an NMR spectrum carrying noise and artefacts, fed to the network as a (W, 2, N) tensor in WCN layout (width, channels, batch), normalised per spectrum.
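A minimal sketch of how such an input tensor could be assembled. The helper names `pack_spectra` and `denormalise` are hypothetical, and max-abs scaling is only an assumed normalisation; the actual scheme in RINSE may differ.

```julia
# Hypothetical helper: pack N complex spectra of length W into a (W, 2, N) array.
# Max-abs scaling is an assumed normalisation, not necessarily the one RINSE uses.
function pack_spectra(spectra::AbstractMatrix{<:Complex})
    W, N = size(spectra)
    x = Array{Float32}(undef, W, 2, N)
    scales = Vector{Float32}(undef, N)
    for n in 1:N
        s = maximum(abs, view(spectra, :, n))
        s = s == 0 ? one(s) : s              # avoid dividing by zero on an all-zero spectrum
        scales[n] = s
        x[:, 1, n] .= real.(view(spectra, :, n)) ./ s   # channel 1: real part
        x[:, 2, n] .= imag.(view(spectra, :, n)) ./ s   # channel 2: imaginary part
    end
    return x, scales
end

# Undo the scaling on a (W, 1, N) prediction using the stored per-spectrum scales.
denormalise(ŷ::AbstractArray{<:Real,3}, scales) = ŷ .* reshape(scales, 1, 1, :)
```

Keeping the per-spectrum scale next to the tensor is what makes it possible to return predictions on the original intensity scale later.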

02 / model

1D U-Net · 3 down / 3 up

Encoder channels base → 2× → 4× → 8×. Two residual blocks at the bottleneck. Nearest-neighbour upsampling, skip-concat on channels.

03 / output

Clean spectrum · 1 channel

A single 1×1 head convolution emits the predicted real-valued clean spectrum. Denormalised back to the original scale before saving.

prediction architecture

UNet1D · three down, three up.

Spatial dimension W is padded up to a multiple of 8 so three rounds of 2× mean-pooling stay exact. At each decoder stage the matching encoder feature is concatenated on the channel axis before the conv block. A final 1×1 head collapses to one output channel, and the result is cropped back to the original length.

encoder block · conv3 + BN + relu + resblock
bottleneck · conv3 + 2× resblock
upsample · nearest ×2 + conv3
skip connection
1×1 head

residual block

ResBlock1D

Pre-activation residual block: two BatchNorm + relu + Conv(3) stages with an identity skip, x + conv2(relu(bn2(conv1(relu(bn1(x)))))).
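The formula above maps fairly directly onto Flux's built-in layers. The sketch below is one possible rendering, not the code in model.jl; the name `resblock1d` is hypothetical, and it assumes length-preserving padding and an unchanged channel count.

```julia
using Flux

# Sketch of a residual block on (W, C, N) arrays.
# SkipConnection(f, +) computes f(x) + x, i.e. the identity skip in the formula.
resblock1d(ch::Integer) = SkipConnection(
    Chain(
        BatchNorm(ch, relu),                      # bn1 followed by relu
        Conv((3,), ch => ch; pad = SamePad()),    # conv1, keeps the length W
        BatchNorm(ch, relu),                      # bn2 followed by relu
        Conv((3,), ch => ch; pad = SamePad()),    # conv2
    ),
    +,
)

block = resblock1d(8)
x = randn(Float32, 16, 8, 2)    # W = 16, C = 8, N = 2
y = block(x)                    # same shape as x
```

Because both convolutions keep width and channel count, the sum with the input is always shape-compatible.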

default widths

base = 32 channels

Encoder widths are 32 → 64 → 128 → 256 at the bottleneck. Decoder mirrors them on the way up.
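As a quick check on the numbers above, the widths and the signal length at each level follow from the base width and three 2× poolings. The function names here are illustrative only.

```julia
# Channel widths at encoder levels 0..3 for a given base width.
encoder_widths(base::Integer) = [base * 2^k for k in 0:3]

# Signal length at each level after 0..3 rounds of 2x pooling
# (W is assumed to be a multiple of 8 already).
pooled_lengths(W::Integer) = [W ÷ 2^k for k in 0:3]

encoder_widths(32)     # [32, 64, 128, 256]
pooled_lengths(1024)   # [1024, 512, 256, 128]
```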

padding trick

multiple-of-8 guard

Before the first encoder, the input is right-padded with zeros so W mod 8 == 0. After the head, the output is cropped back to the original width.
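The guard can be sketched in a few lines of plain Julia. `pad_to_multiple` and `crop_to` are hypothetical names; the real model may do this inside its forward pass.

```julia
# Right-pad the width axis of a (W, C, N) array with zeros up to a multiple of m.
# Returns the padded array together with the original width for later cropping.
function pad_to_multiple(x::AbstractArray{T,3}, m::Integer = 8) where {T}
    W = size(x, 1)
    Wp = cld(W, m) * m                       # smallest multiple of m that is >= W
    if Wp == W
        return (x, W)
    end
    xp = zeros(T, Wp, size(x, 2), size(x, 3))
    xp[1:W, :, :] .= x
    return (xp, W)
end

# Crop the width axis back to the original length.
crop_to(y::AbstractArray{<:Any,3}, W::Integer) = y[1:W, :, :]
```

For example, a spectrum of width 13 is padded to 16, so it halves cleanly to 8, 4 and 2 before being cropped back to 13 at the output.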

dirty signal model

How the dirty spectrum is built.

For training data the dirty spectrum is the clean one plus additive noise scaled by the inverse of a chosen SNR. Higher SNR means less noise; lower SNR means the network sees much harsher inputs.

f_dirty(t) = f(t) + η_SNR(t)
η_SNR(t) = (|f(0)| / SNR) · ξ_t
where ξ_t is i.i.d. standard normal; SNR is dimensionless.
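A minimal sketch of this noise model. The name `make_dirty` is hypothetical; for complex input, `randn` draws circularly symmetric complex normals with unit variance, which is an assumption about how ξ is meant for complex signals.

```julia
using Random

# Add i.i.d. Gaussian noise with standard deviation |f(0)| / SNR to a signal.
function make_dirty(f::AbstractVector, snr::Real; rng::AbstractRNG = Random.default_rng())
    σ = abs(first(f)) / snr                       # noise scale from the first point
    return f .+ σ .* randn(rng, eltype(f), length(f))
end

rng = MersenneTwister(1)
clean = zeros(Float64, 1000)
clean[1] = 10.0
dirty = make_dirty(clean, 5.0; rng = rng)         # noise standard deviation is 10 / 5 = 2
```

Passing an explicit `rng` keeps generated training sets reproducible.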

peak-aware nmr loss

Why plain MSE throws peaks away.

NMR peaks are sparse, sharp, and carry almost all the useful information in the spectrum. A plain MSE objective is dominated by the much larger flat regions, and the network learns to suppress peaks along with the noise. RINSE weights the loss by the target's local curvature so peak apices matter more.

ℒ = MSE(ŷ, y) + λ_peak · MSE_w(ŷ, y) + λ_L1 · L1(ŷ, y) + λ_integ · IntegMatch(ŷ, y; k)
where w(y) = min_w + α · relu(−d²y/dx²) / max(relu(−d²y/dx²))

mse

Mean squared error over all points. The reference term.

always on · weight = 1

peak-weighted mse

MSE re-weighted by the target's local curvature, so peak apices carry a weight of up to min_w + α while flat regions carry min_w.

λ_peak = 5.0 · α = 10.0
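A small sketch of the curvature weight w(y) and the weighted MSE. It assumes a central second difference for d²y/dx² and min_w = 1.0, neither of which is stated here; `peak_weights` and `weighted_mse` are hypothetical names.

```julia
# Curvature-based weights: w = min_w + α * relu(-y'') / max(relu(-y'')),
# with y'' approximated by a central second difference (an assumed discretisation).
function peak_weights(y::AbstractVector{<:Real}; min_w = 1.0, α = 10.0)
    n = length(y)
    c = zeros(float(eltype(y)), n)
    for i in 2:n-1
        c[i] = max(-(y[i-1] - 2y[i] + y[i+1]), 0)   # relu of the negative curvature
    end
    m = maximum(c)
    # A completely flat target has no curvature: fall back to uniform weights.
    return m > 0 ? min_w .+ α .* c ./ m : fill(float(min_w), n)
end

# Mean of the squared error, weighted point by point.
weighted_mse(ŷ, y; kw...) = sum(peak_weights(y; kw...) .* (ŷ .- y) .^ 2) / length(y)

y = [0.0, 0.0, 1.0, 0.0, 0.0]
peak_weights(y)    # [1.0, 1.0, 11.0, 1.0, 1.0]
```

The weights depend only on the target y, so they can be computed outside the differentiated part of the loss.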

l1

Small L1 term to discourage over-smoothing and keep narrow features.

λ_L1 = 0.05

integral match

Bins the spectrum into k segments and penalises per-bin area mismatch. Conserves total spectral area.

λ_integ = 0.5 · k = 32
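One way the binned area penalty could look. The exact form of IntegMatch is not given here, so the mean squared difference of per-bin sums below is an assumption, and both function names are hypothetical.

```julia
# Split 1:n into k contiguous bins of near-equal size and sum y over each bin.
function bin_areas(y::AbstractVector{<:Real}, k::Integer)
    n = length(y)
    edges = [round(Int, j * n / k) for j in 0:k]     # edges[1] == 0, edges[end] == n
    return [sum(@view y[edges[j]+1:edges[j+1]]) for j in 1:k]
end

# Assumed penalty: mean squared difference between per-bin areas.
integ_match(ŷ, y; k = 32) = sum(abs2, bin_areas(ŷ, k) .- bin_areas(y, k)) / k

y = collect(1.0:8.0)
bin_areas(y, 4)    # [3.0, 7.0, 11.0, 15.0]
```

Moving intensity between two points inside the same bin leaves this term unchanged; only area that leaks across bin boundaries is penalised.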

note: the curvature weight and integral scale are wrapped in Zygote.ignore — they depend only on y, which has no gradient, and the weight computation uses in-place writes that Zygote wouldn't handle cleanly anyway.

Use the online version

online server workflow

Upload, process, preview, download.

The online RINSE server accepts JEOL .jdf, Bruker fid + acqus, or numeric TXT/CSV inputs. The browser sends the upload to a secure PHP gatekeeper, which validates the files and parameters, creates a private job folder, writes parameters.toml, and launches CPU-only Julia inference. When processing completes, the result page displays a downsampled preview and provides one token-protected ZIP download containing CSV, NPZ, JLD2, preview JSON, and run metadata.
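On the Julia side, reading parameters.toml needs only the TOML standard library. The key names below are purely illustrative; the actual schema written by the gatekeeper is not documented here.

```julia
using TOML

# Hypothetical contents: these key names are illustrative, not RINSE's real schema.
toml_text = """
[input]
format = "bruker"

[processing]
zero_fill_factor = 2
line_broadening_hz = 1.0

[inference]
device = "cpu"
"""

params = TOML.parse(toml_text)
fmt = params["input"]["format"]

# Re-check the format on the Julia side as well, even though the gatekeeper validated it.
fmt in ("jeol", "bruker", "text") || error("unsupported format: $fmt")
```

For a file on disk, `TOML.parsefile(path)` returns the same nested dictionary.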

01 / submit

Choose format and upload

Users submit JEOL, Bruker, or TXT/CSV data with controlled file-size limits and server-side validation.

02 / process

NMRflux preprocessing + RINSE

The backend applies zero filling, apodization, Fourier transform, and CPU-only model inference.
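For orientation, two of these preprocessing steps can be sketched without extra packages. This is not NMRflux's API: the function names are hypothetical, and exponential line broadening is just one common apodization choice, assumed here. The Fourier transform step is left out to keep the sketch standard-library only.

```julia
# Exponential apodization: multiply the FID by exp(-π * lb * t),
# with line broadening lb in Hz and dwell time dt in seconds.
apodize(fid::AbstractVector, lb::Real, dt::Real) =
    fid .* exp.(-π .* lb .* dt .* (0:length(fid)-1))

# Zero filling: append zeros until the FID has n_out points.
function zero_fill(fid::AbstractVector, n_out::Integer)
    n_out >= length(fid) || throw(ArgumentError("n_out must be at least length(fid)"))
    return vcat(fid, zeros(eltype(fid), n_out - length(fid)))
end

fid = ones(ComplexF64, 4)
processed = zero_fill(apodize(fid, 1.0, 0.1), 8)   # 4 damped points, then 4 zeros
```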

03 / receive

Preview and ZIP download

The result page shows a secure preview plot and provides a single ZIP package for download.

install · train · infer

Command-line workflow for developers.

① install

# from a julia 1.10+ repl in your default env
using Pkg
Pkg.add(["CUDA", "Flux", "JLD2", "BSON",
         "TOML", "Functors", "Plots", "Zygote"])
Pkg.develop(path="/path/to/NMRflux")

② train

$ julia scripts/train.jl dataset.jld2
$ julia scripts/train.jl dataset.jld2 configs/training_config.toml

# resume from a full checkpoint (state + optimizer)
$ julia scripts/train.jl dataset.jld2 cfg.toml --restart model_last.bson

# weight-only warm start (fresh optimizer, new lr)
$ julia scripts/train.jl dataset.jld2 cfg.toml --init    model_best.bson

③ infer

$ julia scripts/infer.jl infer-simulated    model.bson dataset.jld2
$ julia scripts/infer.jl infer-experimental model.bson dataset.jld2

# point at a directory to use every .bson as an ensemble
$ julia scripts/infer.jl infer-simulated    models_dir/ dataset.jld2
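How the ensemble members are combined is not spelled out here. The sketch below assumes a plain point-by-point average, and both helper names are hypothetical.

```julia
using Statistics

# List checkpoint files in a directory, sorted so the order is reproducible.
bson_files(dir::AbstractString) =
    sort(filter(f -> endswith(f, ".bson"), readdir(dir; join = true)))

# Assumed combination rule: average the member predictions point by point.
ensemble_mean(preds::AbstractVector{<:AbstractArray}) = mean(preds)

ensemble_mean([[1.0, 2.0], [3.0, 4.0]])   # [2.0, 3.0]
```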

④ plot

$ julia scripts/infer.jl plot-simulated        result.jld2 7
$ julia scripts/infer.jl plot-simulated-all    result.jld2
$ julia scripts/infer.jl plot-experimental     result.jld2 0 700.0 4.76
$ julia scripts/infer.jl plot-experimental-all result.jld2

output directory

Every infer.jl run creates a fresh, timestamped folder in the working directory. The result .jld2, all PNG plots produced by the plot-* commands, and a run.log all land inside it — nothing overwrites across runs.

          inference_output_2026-04-18_14-02-07/
          ├── cleansed_real_demo_simulated.jld2 # predictions
          ├── plots/ # png images from plot-*
          │ └── *.png
          └── run.log
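A folder name in this pattern can be produced with the Dates standard library. This is a sketch of the idea rather than the code in infer.jl; `run_dir_name` and `make_run_dir` are hypothetical names.

```julia
using Dates

# Build a folder name like inference_output_2026-04-18_14-02-07.
run_dir_name(t::DateTime = now()) =
    "inference_output_" * Dates.format(t, dateformat"yyyy-mm-dd_HH-MM-SS")

# Create the run folder and its plots/ subfolder under `parent`.
function make_run_dir(parent::AbstractString = pwd(); t::DateTime = now())
    dir = joinpath(parent, run_dir_name(t))
    mkpath(joinpath(dir, "plots"))     # mkpath also creates `dir` itself
    return dir
end

run_dir_name(DateTime(2026, 4, 18, 14, 2, 7))   # "inference_output_2026-04-18_14-02-07"
```

Second-level resolution in the name is what keeps consecutive runs from colliding.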
        

training checkpoints

EVERY N EPOCHS

model_last.bson

Overwritten at every checkpoint step and again at exit. Use this to resume a run that was interrupted.

ON IMPROVEMENT

model_best.bson

Overwritten whenever validation loss drops below the previous best. This is what the inference script loads by default.

ON DECADE THRESHOLD

model_E-K_epoch_<E>_vloss_<X>.bson

A tagged archive snapshot each time the validation loss crosses a new decade threshold. These stack — point infer.jl at the directory to ensemble them.
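One reading of the decade rule, sketched under assumptions: K is taken as the largest integer with vloss ≤ 10^(-K), and the way <X> is formatted is a guess. `decade` and `decade_snapshot` are hypothetical names.

```julia
# Decade index K such that vloss <= 10^(-K); e.g. 3.2e-3 is below 1e-2, so K = 2.
decade(vloss::Real) = floor(Int, -log10(vloss))

# Return a snapshot filename if vloss reaches a deeper decade than best_k, else nothing.
# The tag format is an assumption modelled on model_E-K_epoch_<E>_vloss_<X>.bson.
function decade_snapshot(vloss::Real, epoch::Integer, best_k::Integer)
    k = decade(vloss)
    k > best_k || return nothing
    return "model_E-$(k)_epoch_$(epoch)_vloss_$(vloss).bson"
end

decade_snapshot(3.2e-3, 40, 1)   # "model_E-2_epoch_40_vloss_0.0032.bson"
decade_snapshot(5.0e-3, 41, 2)   # nothing: still in the same decade
```

The training loop would keep `best_k` as running state and update it whenever a snapshot is written.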

layout

          RINSE/
          ├── README.md
          ├── configs/
          │ └── training_config.toml
          ├── scripts/
          │ ├── train.jl # training entrypoint
          │ └── infer.jl # inference + plotting cli
          └── src/
           ├── RINSE.jl # main module, includes the rest
           ├── constants.jl # named constants
           ├── config.jl # Trainer + LossConfig structs
           ├── tensor_utils.jl # complex ↔ real reshaping, norms
           ├── model.jl # UNet1D architecture
           ├── losses.jl # peak-aware nmr loss
           ├── metrics.jl # mean train / val loss eval
           ├── checkpoints.jl # bson save / load, device safe
           ├── data_io.jl # training / inference dataset loaders
           ├── plotting.jl # training loss curves
           ├── inference.jl # infer_simulated, infer_experimental
           ├── inference_plotting.jl
           └── training.jl # main training loop
        

Rinse artefacts and restore NMR spectra.
