
Review-Trust Pipeline: How We Make Reviews Reliable
Reliable review analysis requires transparency. At Collected.reviews, we use our own method: the Review-Trust Pipeline. It filters out noise, detects manipulation, and assesses reviews for reliability so that every theme score truly means something. Below, you can read how it works – with concrete figures.
Dataset
For this measurement, we used the dataset EU Retail Reviews v1.3, containing a total of 182,450 reviews (of which 169,732 were unique after deduplication). The period covers 1 January to 30 September 2025, with data from the Netherlands, Germany, Belgium and Austria, in the languages NL, DE and EN. The analysis was carried out using pipeline version 2.4.0.
Why This Is Necessary
Not all reviews are equally valuable. We identify three structural issues:
- Manipulation – spikes in short periods, copied texts, or reward campaigns.
- Noise – incomplete sentences, duplicate submissions, non-experiential opinions.
- Bias – mostly extreme experiences are shared, or platforms moderate selectively.
To correct such distortion, we assess each review across six signals.
The Five Steps of Our Pipeline
1. Intake and Normalisation – All reviews are converted into a uniform schema (text, date, star rating, metadata), and exact duplicates are removed (see the sketch below).
2. Identity and Behaviour – We analyse account age, posting frequency, device patterns and timing clusters (where the source allows).
3. Text Signals – We look for semantic repetition, template phrases, and extreme sentiment without supporting detail.
4. Incentive Detection – Language indicating a benefit (discount, cashback, gift card) results in the label “incentivised”.
5. Weighting and Normalisation – Each review receives a trust score (0–1). Theme scores are weighted and time-corrected (recent reviews count more than old ones).
Important: We never delete anything arbitrarily; we evaluate it. Transparency over censorship.
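To make the intake step concrete, here is a minimal Python sketch of the uniform schema and the exact-duplicate filter. The schema fields (text, date, stars, metadata) follow the step above; the class and function names and the deduplication key are our own illustrative assumptions, not the production code.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Review:
    # Uniform schema from the intake step (field names are illustrative).
    text: str
    date: datetime
    stars: int
    metadata: dict = field(default_factory=dict)
    trust: float = 1.0          # filled in later by the weighting step (0-1)
    incentivised: bool = False  # filled in later by incentive detection

def normalise(raw: dict) -> Review:
    """Map a raw source record onto the uniform schema."""
    return Review(
        text=raw["text"].strip(),
        date=datetime.fromisoformat(raw["date"]),
        stars=int(raw["stars"]),
        metadata=raw.get("metadata", {}),
    )

def drop_exact_duplicates(reviews: list[Review]) -> list[Review]:
    """Remove byte-identical texts; near-duplicates are handled later by the text signals."""
    seen: set[str] = set()
    unique: list[Review] = []
    for r in reviews:
        key = r.text.lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```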
Key Signals and Thresholds
| Signal | Threshold | Effect |
| --- | --- | --- |
| Duplicate / near-duplicate | ≥ 0.88 semantic overlap | lower trust |
| Timing spike | peak within 12 hours vs baseline | lower weighting |
| Incentive language | word list + context | label “incentivised” |
| Template phrases | repetition score > 0.75 | lower trust |
| Lack of detail | extreme sentiment without facts | lower trust |
| Account signals | young account + high output | lower trust |
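For illustration, a minimal sketch of how the near-duplicate check could be applied, using the 0.88 threshold from the table. The similarity measure itself (cosine similarity over word bigrams) is a simplified stand-in of our own; the pipeline’s actual semantic-overlap model is not specified here.

```python
import math
from collections import Counter

DUP_THRESHOLD = 0.88  # ">= 0.88 semantic overlap" from the table above

def bigrams(text: str) -> Counter:
    # Word bigrams as a crude stand-in for a semantic representation (assumption).
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

def overlap(a: str, b: str) -> float:
    """Cosine similarity between bigram count vectors."""
    va, vb = bigrams(a), bigrams(b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def is_near_duplicate(a: str, b: str) -> bool:
    return overlap(a, b) >= DUP_THRESHOLD
```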
Weighting Model
Each component receives a weight; the formula in short:
trust = 1 − (0.35·D + 0.20·S + 0.20·I + 0.10·T + 0.10·P + 0.05·A)

| Component | Symbol | Weight |
| --- | --- | --- |
| Duplicate / near-duplicate | D | 0.35 |
| Timing spike | S | 0.20 |
| Incentive language | I | 0.20 |
| Template phrases | T | 0.10 |
| Lack of detail | P | 0.10 |
| Account signals | A | 0.05 |
| Time decay | λ | 0.015 |
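In code, the weighting reads as follows. The weights and λ come from the table above; two assumptions are ours: each component signal is scaled to 0–1, and the time correction is applied as an exponential decay exp(−λ · age in days), since the article only states λ = 0.015 and that recent reviews count more than old ones.

```python
import math

WEIGHTS = {"D": 0.35, "S": 0.20, "I": 0.20, "T": 0.10, "P": 0.10, "A": 0.05}
LAMBDA = 0.015  # time-decay constant from the table

def trust_score(signals: dict[str, float]) -> float:
    """trust = 1 - (0.35*D + 0.20*S + 0.20*I + 0.10*T + 0.10*P + 0.05*A).

    Each signal is assumed to be scaled to [0, 1]; missing signals count as 0.
    """
    penalty = sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)
    return max(0.0, 1.0 - penalty)

def time_weight(age_days: float) -> float:
    """Exponential decay so recent reviews weigh more than old ones.

    The exponential form exp(-lambda * age) is our assumption; the article
    only gives lambda = 0.015.
    """
    return math.exp(-LAMBDA * age_days)

# Example: a full near-duplicate (D = 1.0) with some incentive language (I = 0.5)
print(round(trust_score({"D": 1.0, "I": 0.5}), 2))  # 1 - (0.35 + 0.10) = 0.55
print(round(time_weight(90), 3))                     # ~0.259 after 90 days
```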
Mini Results (Q1–Q3 2025)
| Metric | Value |
| --- | --- |
| Share of near-duplicates | 6.8% |
| Share of incentivised reviews | 12.4% |
| Median trust score | 0.73 |
| Average theme score correction | +4.6 points |
| Detected spike events | 89 |
This correction yields more representative theme scores: a sector with many promotional campaigns no longer appears artificially positive.
Example Cases
| Case | Signal | Effect on trust |
| --- | --- | --- |
| C-1274 | 35 identical sentence parts within 2 hours | −0.22 |
| C-2091 | Coupon mention + referral link | −0.18 |
| C-3310 | 40 reviews from a new account within 24 hours | −0.26 |
Normalisation and Reporting
After weighting, we first normalise per platform (to compensate for moderation differences) and then cross-platform via z-score, so that all results appear on a single scale (0–100); a sketch of this step follows the list below. On the company page, we display:
- weighted theme scores,
- sentiment distribution,
- reliability band (CI),
- share of incentivised reviews.
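A minimal sketch of the normalisation chain described above: z-scores within each platform, a second z-score across the pooled results, and a mapping onto 0–100. The clipping at ±3σ for the final mapping is our own illustrative choice; the article only says the results land on a single 0–100 scale.

```python
from statistics import mean, stdev

def zscores(values: list[float]) -> list[float]:
    """Standardise a list of scores; degenerate cases map to 0."""
    if len(values) < 2:
        return [0.0] * len(values)
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def cross_platform_scores(scores_by_platform: dict[str, list[float]]) -> dict[str, list[float]]:
    """Step 1: z-score within each platform (absorbs moderation differences).
    Step 2: z-score again across the pooled values so all platforms share one scale."""
    within = {p: zscores(v) for p, v in scores_by_platform.items()}
    pooled = [z for zs in within.values() for z in zs]
    pooled_z = zscores(pooled)
    # Re-split the pooled z-scores back per platform, preserving order.
    out: dict[str, list[float]] = {}
    i = 0
    for p, zs in within.items():
        out[p] = pooled_z[i:i + len(zs)]
        i += len(zs)
    return out

def to_0_100(z: float, clip: float = 3.0) -> float:
    """Map a z-score to the 0-100 display scale.
    Clipping at +/-3 sigma and mapping linearly is our own assumption."""
    z = max(-clip, min(clip, z))
    return (z + clip) / (2 * clip) * 100
```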
Limitations
- Not every platform provides device or account data.
- Short reviews remain difficult to assess.
- Source bias: audience per source may differ from the actual customer base.
- Irony or sarcasm is not always accurately recognised.
That’s why we report with margins and definitions rather than absolute truths.
What This Means for You
For Consumers
Trust patterns, not outliers, and check for the labels “incentivised” and “low repetition”.
For Companies
Address themes with high impact and low trust (e.g. billing or delivery time) for quick gains.