Hi everyone,
Reading through the recent threads on anonymous surveys here, three honest admissions stood out to me:
- Joffm called the current situation "ethical anonymity" — because name/email are often still collected (for incentives, reminders), so the tool isn't really in a fully anonymous state; anonymity rests on the organizer behaving well.
- Holch pointed out that with fine structures (department, sub-department, unit...), you quickly drop into single-digit groups where one or two answers re-identify a specific person — and concluded "staff surveys are the hardest thing there is."
- And several users captured the core feeling: even when you promise the tokens are separated, someone always thinks "if the boss really wants to know what I wrote, he'll find a way." What they wanted was to be able to say "I cannot attribute responses" — not just promise it.
That exact gap (anonymity that rests on trust, plus the small-cohort re-identification problem) is what VERA, an open-source protocol I've been building, is meant to close. I'd value this community's feedback on whether it could become a LimeSurvey plugin.
The idea: instead of publishing raw counts, publish an aggregate protected by differential privacy, so that no individual response — not even the lone outlier in a single-digit unit — is recoverable from the result. I structured it as a threat model of explicit "gates", each with an honest status:
- Gate 1 — Noise mechanism (CLOSED): aggregates perturbed via OpenDP (an audited DP library); epsilon = 0.5 computed analytically with meas.map(), not estimated.
- Gate 3 — Small cohorts: below a minimum participant threshold, nothing is published at all (directly addresses Holch's single-digit problem).
- Gate 4 — Composition: a capped privacy budget refuses further queries once exhausted, so anonymity can't be peeled away by re-querying.
- Gate 7 — Cohort differencing, the "49/1" attack (prototype, crypto to harden): one single-use token per participant per consultation enforces a partition. The partition logic works and is tested; the blind-signature primitive is still a homemade prototype that must be replaced by an audited library (RFC 9474) before production — I'm explicit about this.
- Gate 8 — Direct outlier inference (measured): leakage on the atypical respondent stays negligible thanks to the Gate 7 partition.
No raw responses retained after aggregation.
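To make Gates 1, 3 and 4 concrete, here is a minimal Python sketch of how they combine for a single yes/no question. It is not the VERA code and it does not use OpenDP: the Laplace sampler below is a plain standard-library stand-in, and the names (`PrivacyBudget`, `release_count`), the threshold of 10 and the total budget of 1.0 are assumptions chosen for illustration. Only epsilon = 0.5 comes from the description above.

```python
import random

MIN_COHORT = 10          # assumed threshold; the real value is a policy choice
EPSILON_PER_QUERY = 0.5  # per-release epsilon quoted for Gate 1
TOTAL_BUDGET = 1.0       # assumed cap on cumulative epsilon


class BudgetExhausted(Exception):
    """Raised when a release would exceed the privacy budget (Gate 4)."""


class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total):
        self.total = total
        self.spent = 0.0

    def charge(self, epsilon):
        # Refuse the query instead of silently exceeding the cap.
        if self.spent + epsilon > self.total + 1e-12:
            raise BudgetExhausted("privacy budget exhausted")
        self.spent += epsilon


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    # Illustration only: naive floating-point sampling has known
    # weaknesses, which is why an audited library is used in practice.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_count(answers, budget, epsilon=EPSILON_PER_QUERY,
                  min_cohort=MIN_COHORT, rng=None):
    """Return a noisy count of True answers, or None if suppressed."""
    rng = rng or random.SystemRandom()
    # Gate 3: below the threshold, publish nothing at all.
    # (Assumption: a suppressed release spends no budget.)
    if len(answers) < min_cohort:
        return None
    # Gate 4: charge the budget before computing anything.
    budget.charge(epsilon)
    # Gate 1: a count has sensitivity 1 when each participant
    # contributes exactly one answer, so the Laplace scale is 1/epsilon.
    true_count = sum(1 for a in answers if a)
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    budget = PrivacyBudget(TOTAL_BUDGET)
    unit_of_five = [True, False, True, True, False]
    department = [True] * 30 + [False] * 20
    print(release_count(unit_of_five, budget))  # suppressed: None
    print(release_count(department, budget))    # noisy count
    print(release_count(department, budget))    # second and last release
```

With these assumed values, a third call on `department` raises `BudgetExhausted`, and the caller only ever receives the noisy figure, never the raw answers. The sensitivity-1 argument depends on each participant being counted once, which is what the Gate 7 partition is there to enforce.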
The goal is exactly what people in those threads asked for: turning "trust us, the tables are separated" into a mathematical guarantee that the link simply cannot be made.
What I deliberately do NOT claim: network-level observers (for example, someone logging IP addresses upstream of the survey) and coercion are out of scope; the GDPR qualification (anonymization vs pseudonymization, Art. 4(5) and Recital 26) is left to a DPO/CNIL opinion.
Code, full threat model, reproducible proof:
github.com/taha-vera/Protocole-Vera
Questions:
1. Is there interest in a plugin that publishes DP-protected aggregates instead of raw counts?
2. Which LimeSurvey event/hook would be the right place to intercept aggregation before results are stored or exported?
3. Has anyone here explored differential privacy (beyond pseudonymization tools like ALIIAS)?
Happy to be challenged on any gate.
Taha