Hi everybody,
before I place my feature request here: thanks to the LS team and the community for the great work on LS! I'm grateful to have LS for collecting data and not to have to employ other software and services. And since LS is open, problems like the one below have a chance of getting solved here before the competitors solve them.
Protecting private data & GDPR
This request is about data security. In an ongoing project I need a high level of security for the respondents' data, both the personal information in the token table and the response data. Additionally, the EU General Data Protection Regulation (GDPR), which is enforceable from May 25th 2018, unifies the laws for handling and protecting personal data in the EU. It "regulates the processing by an individual, a company or an organisation of personal data relating to individuals in the EU." (
ec.europa.eu/info/law/law-topic/data-pro...ation-gdpr-govern_en
) So I guess the majority of LS users are affected.
The effects of the GDPR on online surveys have been discussed before in other threads:
•
www.limesurvey.org/de/foren/can-i-do-thi...urvey-delete-me-link
•
www.limesurvey.org/forum/can-i-do-this-w...-of-tokens-responses
Anonymization, pseudonymization, encryption
Anonymization and pseudonymization of data play a key role (the latter is mentioned 15 times in the GDPR, so I read). Especially for data that is online (even if it is behind a login), encryption is essential: in a data breach, even pseudonymized data endangers respondents if it contains keys, personal info like email addresses, or other ways to infer a person from the answers. If data gets lost, the collectors of unencrypted data will probably be in trouble. The question is how LimeSurvey could be equipped so that personal information, i.e. the token table and the response table, is not stored unencrypted in the database. Unencrypted storage is problematic, and a data breach (unauthorized access to LS or just to the DB) would definitely have very serious consequences, e.g. the obligation to inform all potentially affected respondents and very serious penalties.
I'd like to propose two functionalities that would help me (and maybe some other survey researchers here) to sleep a little better before May 25th and after. I'm not a programmer and thus cannot implement these functionalities, though I'm still interested in helping wherever I can. I hope that there are other, more sophisticated LS users around who are interested (I think most of us working with data of EU citizens should be interested).
Two aims:
1) Encrypted personal data
The personal data in the token table should be protected by all means: it should not be accessible as plaintext in the database, neither for people with unauthorized access nor for people with authorized access to the data (e.g. employees at the provider hosting your LS installation).
A possible solution would be to upload the token table with certain fields pre-encrypted (first name, last name, email) and have a plugin that temporarily decrypts them (after a password has been entered) only when LS accesses the token table to send out the invitation and reminder emails. The plugin would have to "see" which entries are encrypted, maybe via a simple 0/1 flag in an additional field. And ideally the plugin would only decrypt the rows, i.e. the respondents, that the mails are actually sent to.
The offline pre-encryption could be done in many ways, e.g. with the R package sodium (I had to delete the link to be able to post this). The encryption algorithm used for pre-encrypting the data "offline" certainly has to match the decryption algorithm used "online" in LS. This would be the first step towards having the personal information encrypted completely (almost all the time).
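To make the idea above a bit more concrete, here is a minimal sketch in Python (not an existing LS plugin). The column names (tid, firstname, lastname, email), the "encrypted" flag column and all function names are assumptions for illustration only. The cipher is passed in as a function, and the base64 stand-in used here is NOT encryption: it only keeps the sketch runnable without extra libraries and would have to be replaced by a real cipher (e.g. from libsodium) that matches on the offline and the online side.

```python
import base64
from typing import Callable, Dict, List, Set

# Hypothetical column names; "encrypted" is the proposed 0/1 flag field.
SENSITIVE_FIELDS = ("firstname", "lastname", "email")


def placeholder_encrypt(plaintext: str) -> str:
    """Stand-in only: base64 is NOT encryption. Swap in a real cipher."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def placeholder_decrypt(ciphertext: str) -> str:
    """Inverse of the stand-in above; a real cipher would need the password/key."""
    return base64.b64decode(ciphertext.encode("ascii")).decode("utf-8")


def pre_encrypt(rows: List[Dict], encrypt: Callable[[str], str]) -> List[Dict]:
    """Offline step: encrypt the sensitive fields of every row and set the flag."""
    protected_rows = []
    for row in rows:
        protected = dict(row)
        for field in SENSITIVE_FIELDS:
            protected[field] = encrypt(row[field])
        protected["encrypted"] = 1
        protected_rows.append(protected)
    return protected_rows


def decrypt_for_mailing(
    rows: List[Dict], token_ids: Set[int], decrypt: Callable[[str], str]
) -> List[Dict]:
    """Online step: return decrypted copies of only the rows that get a mail."""
    to_mail = []
    for row in rows:
        if row["tid"] not in token_ids:
            continue  # rows that are not mailed are never decrypted
        plain = dict(row)  # work on a copy, the stored row stays encrypted
        if row.get("encrypted") == 1:
            for field in SENSITIVE_FIELDS:
                plain[field] = decrypt(row[field])
            plain["encrypted"] = 0
        to_mail.append(plain)
    return to_mail


tokens = [
    {"tid": 1, "firstname": "Ada", "lastname": "Example", "email": "ada@example.org"},
    {"tid": 2, "firstname": "Ben", "lastname": "Sample", "email": "ben@example.org"},
]
stored = pre_encrypt(tokens, placeholder_encrypt)
mail_batch = decrypt_for_mailing(stored, {2}, placeholder_decrypt)
```

In this sketch the decrypted values only exist in the temporary copies returned for the mail batch; the stored rows keep their encrypted fields and the flag value 1.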
2) Encrypted Responses
As I wrote before in another thread on the GDPR and encryption (see bottom) (and in
www.limesurvey.org/forum/plugins/113944-...crypt-how-to-decrypt
), the LSEncrypt plugin was developed to encrypt the whole response table via asymmetric encryption, but a) it does not work at the moment (not for 2.6.7LTS or later versions) and b) if it worked, it would bring security at the cost of comfort in working with LS: encrypted responses cannot be viewed, edited or deleted inside LS, the standard export function does not work, the timings are not exported (?), and the data cannot be accessed via the API. The latter is bad, since directly loading the data into R via the LimeR plugin (
github.com/cloudyr/limer
) works like a charm.
Imho a solution to this could be a new or adapted plugin that, on "Submit":
a) takes the unencrypted answers and saves these as an encrypted string into one single table cell in the existing answer table
b) keeps the entries unencrypted that are not personal information and helpful when displayed in the Response Summary (id, lastpage, completed, startlanguage, startdate, datestamp)
c) keeps an encrypted version of the token
d) overwrites the rest of the answers with empty strings ""
The data would then be exported as usual and decrypted offline with the private key, e.g. in R. This solution would keep the functionality of the response table, the export function (incl. export of timings), the ability to access the responses via the API, etc.
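Steps a) to d) and the offline decryption could be sketched roughly like this in Python (again not an existing plugin). The column name "encrypted_payload", the question codes Q1/Q2 and the function names are assumptions of mine. The cipher is passed in as a function; with asymmetric encryption, the encrypt function would use the public key on the server and the decrypt function the private key offline. The base64 stand-in below is NOT encryption and is only there so the sketch runs without extra libraries.

```python
import base64
import json
from typing import Callable, Dict

# Columns that stay readable in the Response Summary (step b).
KEEP_PLAIN = ("id", "lastpage", "completed", "startlanguage", "startdate", "datestamp")
PACKED_COLUMN = "encrypted_payload"  # hypothetical name for the single cell


def placeholder_encrypt(plaintext: str) -> str:
    """Stand-in only: base64 is NOT encryption. Swap in a real public-key cipher."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def placeholder_decrypt(ciphertext: str) -> str:
    """Inverse of the stand-in; a real cipher would need the private key."""
    return base64.b64decode(ciphertext.encode("ascii")).decode("utf-8")


def protect_response(row: Dict, encrypt: Callable[[str], str]) -> Dict:
    """On submit: pack all answers into one encrypted cell and blank the rest."""
    answers = {k: v for k, v in row.items() if k not in KEEP_PLAIN and k != "token"}
    protected = {k: row[k] for k in KEEP_PLAIN if k in row}      # step b
    protected[PACKED_COLUMN] = encrypt(json.dumps(answers))      # step a
    if "token" in row:
        protected["token"] = encrypt(row["token"])               # step c
    for column in answers:
        protected[column] = ""                                   # step d
    return protected


def restore_response(row: Dict, decrypt: Callable[[str], str]) -> Dict:
    """Offline, after export: unpack the encrypted cell back into the columns."""
    restored = dict(row)
    packed = restored.pop(PACKED_COLUMN)
    restored.update(json.loads(decrypt(packed)))
    if "token" in restored:
        restored["token"] = decrypt(row["token"])
    return restored


response = {
    "id": 7,
    "lastpage": 3,
    "startlanguage": "en",
    "token": "abc123",
    "Q1": "yes",
    "Q2": "free text that might identify me",
}
stored = protect_response(response, placeholder_encrypt)
recovered = restore_response(stored, placeholder_decrypt)
```

Because the answer columns still exist (just empty), the response table keeps its shape, so the standard export and API access would see the same columns plus one extra cell.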
(My) conclusion
1) (i.e. encrypting the token table) is necessary, since losing an unencrypted table with the first name, last name and email address (and further information) of respondents would be a clear data breach and would make it necessary to contact and inform all these respondents.
2) (i.e. encrypting the response table) is not "as necessary" as 1) if the data is anonymized or pseudonymized and it is not possible to infer respondents from their answers. As I'd question the latter, I'd say that 2) is definitely necessary as well, since there is a chance that respondents become identifiable by their answers, especially when they enter text and give information about themselves.
So I think that 1) and 2) are both necessary and together would be absolutely great. We'd have a high security level for both the personal information and the potentially sensitive responses. There would be some additional work before and after a survey, caused by encryption and decryption, but there would not be any other loss of comfort in working with LS.
I'd like to help here, but I'm neither a programmer nor an encryption expert, just an LS and R user interested in protecting the respondents and their (sensitive) answers. If you think that I can be of help, please let me know.
Remark: I still use 2.6.7LTS, so a solution that also works with the LTS version would be great!
Thanks for reading and all the best,
G
@LS Forum Team: Please, please increase the number of links that can be mentioned in the forum entries - I'd like to reference my sources correctly, but again had to delete some (also links to other threads, etc.). Thanks a lot!