Encryption of token table and responses

6 years 8 months ago #167706 by socius
Hi everybody,

before I place my feature request here: thanks to the LS Team and the community for the great work on LS! I'm grateful to have LS for collecting data and not to have to employ other software and services. And since LS is open, problems like the one below have a chance of getting solved before the competitors solve them.

Protecting private data & GDPR
This request is about data security. In an ongoing project I am faced with the need for high security of the respondents' data, both for the personal information in the token table and for the response data. Additionally, the EU General Data Protection Regulation (GDPR), which is enforceable from May 25th 2018, unifies the laws for handling and protecting personal data in the EU. It "regulates the processing by an individual, a company or an organisation of personal data relating to individuals in the EU." ( ec.europa.eu/info/law/law-topic/data-pro...ation-gdpr-govern_en ) So I guess the majority of LS users are affected.

In other threads the effects of the GDPR on online surveys have been discussed before:
www.limesurvey.org/de/foren/can-i-do-thi...urvey-delete-me-link
www.limesurvey.org/forum/can-i-do-this-w...-of-tokens-responses

Anonymization, pseudonymization, encryption
Anonymization and pseudonymization of data play a key role (the latter is mentioned 15 times in the GDPR, so I read). Especially for data that is online (even if it's behind a login), encryption is essential, since even pseudonymized data endangers respondents in a data breach if it contains keys, personal info like email addresses, or other ways to infer a person from the answers. If data gets lost, the collectors of unencrypted data will probably be in trouble. The question is how LimeSurvey could be equipped so that it does not store personal information (the token table and the response table) unencrypted in the database. Unencrypted storage is problematic, and a data breach (unauthorized access to LS or just to the DB) would definitely have very serious consequences, e.g. the obligation to inform all potentially affected respondents, and very serious penalties.

I'd like to propose two functionalities that would help me (and maybe some other survey researchers here) to sleep a little better before May 25th and after :-) . I'm not a programmer and thus cannot implement these functionalities myself, though I'm happy to help wherever I can. I hope that there are other, more sophisticated LS users around who are interested (I think most of us working with data of EU citizens should be interested ;-) ).

Two aims:
1) Encrypted personal data
The personal data in the token table should be protected by all means: it should not be accessible as plaintext in the database, neither to people with unauthorized access nor even to people with authorized access (e.g. employees at the provider hosting your LS installation).

A possible solution would be to upload the token table with certain fields pre-encrypted (first name, last name, email) and have a plugin that does the temporary decryption (after entering a password) only when LS accesses the token table to send out the invitation and reminder emails. The plugin would have to "see" which entries are encrypted, maybe with a simple 0/1 flag as an additional field. And ideally the plugin would only decrypt the rows, i.e. respondents, that the mails are sent to.

The offline pre-encryption could be done in many ways, e.g. with the R package sodium (I had to delete the link to be able to post this). The encryption algorithm used for pre-encrypting the data "offline" certainly has to match the decryption algorithm "online" in LS. This would be the first step towards having the personal information encrypted completely (almost all the time).
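
To make the pre-encryption idea a bit more concrete, here is a minimal sketch in Python (any language would do; the post itself mentions R). It only shows the bookkeeping: which fields get replaced by ciphertext and how the 0/1 flag is set. The flag column name `encrypted`, the field list and the function names are assumptions for illustration, not an existing LimeSurvey feature, and the cipher is passed in by the caller. The `stand_in_cipher` at the end only reverses bytes so the sketch runs without extra libraries; it is not encryption and would have to be replaced by a real public-key routine that matches the plugin's decryption.

```python
import base64
from typing import Callable, Dict, Iterable, List

# Hypothetical names for illustration: the field list and the "encrypted"
# flag column are assumptions, not an existing LimeSurvey feature.
SENSITIVE_FIELDS = ("firstname", "lastname", "email")
FLAG_COLUMN = "encrypted"


def pre_encrypt_tokens(
    rows: Iterable[Dict[str, str]],
    encrypt: Callable[[bytes], bytes],
    fields: Iterable[str] = SENSITIVE_FIELDS,
) -> List[Dict[str, str]]:
    """Return copies of the token rows with the sensitive fields encrypted.

    `encrypt` is supplied by the caller and must match whatever decryption
    the server-side plugin would use. Ciphertext is base64-encoded so that
    it survives CSV upload and text columns.
    """
    fields = tuple(fields)
    protected_rows = []
    for row in rows:
        protected = dict(row)  # never modify the caller's data
        touched = False
        for name in fields:
            value = protected.get(name)
            if value:  # leave empty or missing fields alone
                ciphertext = encrypt(value.encode("utf-8"))
                protected[name] = base64.b64encode(ciphertext).decode("ascii")
                touched = True
        # The 0/1 flag tells the plugin whether this row needs decryption.
        protected[FLAG_COLUMN] = "1" if touched else "0"
        protected_rows.append(protected)
    return protected_rows


def stand_in_cipher(data: bytes) -> bytes:
    """Placeholder that only reverses the bytes. This is NOT encryption."""
    return data[::-1]
```

The token itself is left untouched here, so LS could still match respondents to responses; only the name and email fields become unreadable in the database.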

2) Encrypted Responses
As I wrote before in another thread on the GDPR and encryption (see bottom) (and in www.limesurvey.org/forum/plugins/113944-...crypt-how-to-decrypt ), the LSEncrypt Plugin was developed to encrypt the whole response table via asymmetric encryption, but a) it does not work at the moment (not for 2.6.7LTS or later versions) and b) if it worked, it would bring security at the cost of comfort when working with LS: encrypted responses cannot be viewed, edited or deleted inside LS, the standard export function does not work, the timings are not exported (?), and the data cannot be accessed via API (which is bad, since loading it directly into R via the LimeR Plugin ( github.com/cloudyr/limer ) works like a charm).

Imho a solution to this could be a new/adapted plugin that on "Submit":
a) takes the unencrypted answers and saves these as an encrypted string into one single table cell in the existing answer table
b) keeps the entries unencrypted that are not personal information and helpful when displayed in the Response Summary (id, lastpage, completed, startlanguage, startdate, datestamp)
c) keeps an encrypted version of the token
d) overwrites the rest of the answers with empty strings ""

The data would then be exported as usual and decrypted offline with the private key, e.g. in R. This solution would keep the functionality of the response table, the export function (incl. export of timings), the ability to access the responses via API etc.
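
Steps a) to d) could look roughly like the following sketch. It is written in Python only to show the data handling; a real plugin would have to do this in PHP inside LimeSurvey. The plain column names are taken from the list in b), while the column name `encrypted_answers`, the JSON packing and the function names are assumptions for illustration. The cipher is again passed in by the caller, and `stand_in_cipher` merely reverses bytes (not encryption) so the sketch runs on its own. Values are assumed to be strings, numbers or empty.

```python
import base64
import json
from typing import Any, Callable, Dict

# Columns that stay readable, as listed in b) of the post.
KEEP_PLAIN = ("id", "lastpage", "completed", "startlanguage", "startdate", "datestamp")
TOKEN_COLUMN = "token"
BLOB_COLUMN = "encrypted_answers"  # hypothetical name for the single cell


def _to_text(ciphertext: bytes) -> str:
    """Base64-encode ciphertext so it fits into a text column."""
    return base64.b64encode(ciphertext).decode("ascii")


def pack_response(row: Dict[str, Any], encrypt: Callable[[bytes], bytes]) -> Dict[str, Any]:
    """Return a copy of one response row, packed as described in a) to d)."""
    packed: Dict[str, Any] = {}
    answers: Dict[str, Any] = {}
    for column, value in row.items():
        if column in KEEP_PLAIN:
            packed[column] = value  # b) keep harmless metadata readable
        elif column == TOKEN_COLUMN:
            # c) keep only an encrypted version of the token
            packed[column] = _to_text(encrypt(str(value).encode("utf-8")))
        else:
            answers[column] = value
            packed[column] = ""  # d) overwrite the answer with an empty string
    # a) all answers end up as one encrypted string in a single cell
    payload = json.dumps(answers, ensure_ascii=False, sort_keys=True)
    packed[BLOB_COLUMN] = _to_text(encrypt(payload.encode("utf-8")))
    return packed


def stand_in_cipher(data: bytes) -> bytes:
    """Placeholder that only reverses the bytes. This is NOT encryption."""
    return data[::-1]
```

Because the row keeps its shape (same columns, one extra cell), the usual export and API access would still return something sensible, just with empty answer columns.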

(My) conclusion
1) (i.e. encrypting the token table) is necessary, since losing an unencrypted table with first name, last name and email address (and further information) of respondents would be a clear data breach and make it necessary to contact all these respondents and to inform them.
2) (i.e. encrypting the response table) is not "as necessary" as 1), if the data is anonymized or pseudonymized and it's not possible to infer respondents from their answers. As I'd question the latter, I'd say that 2) is also necessary, definitely!, since there is a chance that respondents become identifiable by their answers, especially when they enter text and give information about themselves.

So I think that 1) and 2) both are necessary and together would be absolutely great. We'd have a high security level for both the personal information and the potentially sensitive responses. There would be some additional work before and after a survey, caused by encryption/decryption, but there would not be any other loss of comfort in working with LS.



I'd like to help here, but I'm neither a programmer nor an encryption expert, just an LS and R user interested in the protection of the respondents and their (sensitive) answers. If you think that I can be of help, please let me know.
Remark: I still use 2.6.7LTS - so a solution that also works with the LTS Version would be great!

Thanks for reading and all the best,
G


@LS Forum Team: Please, please increase the number of links that can be mentioned in the forum entries - I'd like to reference my sources correctly, but again had to delete some (also links to other threads, etc.). Thanks a lot!
The topic has been locked.
6 years 8 months ago - 6 years 8 months ago #167960 by DenisChenu
Replied by DenisChenu on topic Encryption of token table and responses

socius wrote: e.g. employees at the provider hosting your LS installation

On the contrary:
Before GDPR: you are responsible for a data breach.
After GDPR: you AND your subcontractor are responsible. Your hoster is a subcontractor, so if the data breach was caused by one of its employees, the hoster is responsible, not you.


My general opinion about encryption :
1. If you allow decrypting on the server, there is no difference: you need to put the private key on the server, so a server breach breaks the security.
2. Your SQL server must be protected anyway, so why would decrypting on the server change anything? Same as 1: a server breach is needed for token access.

The only real solution is to decrypt only locally, in no other way, and never to share the private key. The LimeSurvey GUI then cannot give access to decrypted data.

[edit]
We cannot encrypt tokens and still have decrypted access to them. We can only:
- Delete the token when it's not needed anymore (for example in afterSurveyCompleted)

Encrypting all responses really seems the best solution; maybe we can add a way to encrypt all responses and delete some response answers when the survey is submitted.

Assistance on LimeSurvey forum and LimeSurvey core development are on my free time.
I'm not a LimeSurvey GmbH member. - Professional support - Plugins, theme and development .
I don't answer to private message.
Last edit: 6 years 8 months ago by DenisChenu.
6 years 7 months ago #168220 by socius
Replied by socius on topic Encryption of token table and responses
Hi Denis,

thanks a lot for your answer! I'm really sorry for answering so late - I did not receive a note that there was a new answer and did not check till yesterday.

Concerning responsibility: yes, but I think the damage would still stick to the ones who conducted or commissioned the survey - even if the ISP is responsible here - I do not want to check this out ;-)

1) encryption of token table

As I understand you, @Denis, the encryption of the token table (or temporary decryption of the email addresses etc. for sending out invitations, etc.) does not make sense, since the decryption on the server would render this insecure, right?

If you allow decrypting on the server, there is no difference: you need to put the private key on the server, so a server breach breaks the security. [...] The only real solution is to decrypt only locally, in no other way, and never to share the private key. The LimeSurvey GUI then cannot give access to decrypted data.


Would that still hold in the case of asymmetric encryption where you encrypt via public key and only enter the private key to decrypt each time something needs to be decrypted? I see that the private key - that would be entered - could be intercepted, but if the latter is not the case: would that be a viable solution?


2) encryption of the response table.

Encrypting all responses really seems the best solution; maybe we can add a way to encrypt all responses and delete some response answers when the survey is submitted.

I think so - that sounds promising! My question here is, whether the responses have to be encrypted altogether or respondent-wise... The result of thinking about the response table so far (I repeat myself, but I hope this also gets more precise every time - does it? ;-)

If it's possible to collect all the answers as usual in the response table and, on "Submit", save the individual answers as an encrypted string in a single cell of this table (deleting most and keeping some of the entries), this would be perfect. I imagine encrypting each response separately rather than the whole DB, so each row in the response table (i.e. each respondent) would finally contain:
  • lots of empty cells in the table (deleted or overwritten with ""),
  • some non-empty cells (id, lastpage, completed, startdate, datestamp, and perhaps the token).
Remark: I'd need to keep the token, since I have to link the data of a number of surveys by a key, and the token would be the easiest way. If the responses are encrypted, having the key available does not seem too dangerous to me.
  • one cell in the table that contains all the answers collected in a string and is encrypted

Consequences
I assume that this functionality would make it relatively easy to manage encryption and work with encrypted data.
  • The unencrypted, not emptied cells of the response would be visible in the LS GUI.
We and LS could "see" which respondents took part (and which did not) and it would thus be possible to send out reminder emails. (The latter is not possible with LSEncrypt, because the encrypted responses are not accessible for LS and it's not possible to monitor the response, send reminders, etc.)
  • This would also make it possible to access the response-table via API.
I access LimeSurvey from R ("free software environment for statistical computing and graphics", see www.r-project.org/ ) with limeR ( github.com/cloudyr/limer ), and this works perfectly (highly recommended!). The question is then whether it's possible to decrypt the downloaded data directly in R. There are a number of packages in R for encrypting data (e.g. openssl CRAN.R-project.org/package=openssl , sodium CRAN.R-project.org/package=sodium ). If this workflow were possible, it would be the easiest and most platform-independent approach (more so than e.g. LSEncrypt), since R is free and runs on Windows, Mac, Linux and Unix. That holds even if the further analysis is done not in R but in tools like SPSS, Stata or SAS, since R can save the data in any of these formats.
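
The offline side of this workflow would be the reverse step: take an exported row, decrypt the single cell, and put the answers back into their columns. A rough sketch of that step follows, in Python for illustration (the same logic should carry over to R). It assumes the answers were packed as a JSON object, base64-encoded, in a column called `encrypted_answers`; that layout and all names are assumptions, not something LS or LSEncrypt does today. The decryption routine is passed in by the caller; `stand_in_decipher` only reverses bytes and is not real decryption.

```python
import base64
import json
from typing import Any, Callable, Dict

BLOB_COLUMN = "encrypted_answers"  # hypothetical name of the encrypted cell


def unpack_response(row: Dict[str, Any], decrypt: Callable[[bytes], bytes]) -> Dict[str, Any]:
    """Restore the answer columns of one exported row from its encrypted cell."""
    payload = decrypt(base64.b64decode(row[BLOB_COLUMN]))
    answers = json.loads(payload.decode("utf-8"))
    # Keep the readable metadata columns, drop the blob, fill in the answers.
    restored = {column: value for column, value in row.items() if column != BLOB_COLUMN}
    restored.update(answers)
    return restored


def stand_in_decipher(data: bytes) -> bytes:
    """Placeholder that only reverses the bytes. This is NOT decryption."""
    return data[::-1]
```

Applied to every row of the export, this would give back an ordinary data set that can then be saved for SPSS, Stata, SAS and so on; the private key would never have to leave the local machine.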

Additionally (to me) it would also be important to:
  • have the timings table exported together with the data. That should not be a problem using the usual export facilities in LS (but I'm not sure if the timings data can be accessed via API, can they?)


Ok - so many wishes at once ;-) Thanks for your effort - and for reading this.

Thanks and all the best,
G
6 years 7 months ago - 6 years 7 months ago #168282 by DenisChenu
Replied by DenisChenu on topic Encryption of token table and responses
1) encryption of token table

As I understand you, @Denis, the encryption of the token table (or temporary decryption of the email addresses etc. for sending out invitations, etc.) does not make sense, since the decryption on the server would render this insecure, right?

Yes, and LimeSurvey is http accessible, not the database.

I see that the private key - that would be entered - could be intercepted, but if the latter is not the case: would that be a viable solution?

The security would be there, but I'm unsure we have a way to do it easily …

2) encryption of the response table.

My question here is, whether the responses have to be encrypted altogether or respondent-wise...

It can only be respondent-wise, since it's done when the user submits the survey (for the last time).
But maybe that is the issue with the current plugin? We decrypt the whole thing, but it's encrypted one by one … Thanks for the idea ;)

The question is then whether it's possible to decrypt the downloaded data directly in R.

Arg here, difficult too …

There is a number of packages in R for encrypting data

Yes encrypt, not decrypt, right ?


else

be still able to pipe data from one survey to another via end-url (I guess the URL will be built before encrypting the data, right?) manual.limesurvey.org/Workarounds:_Surve...using_the_survey_URL

Yes, the current plugin does it afterwards.

Last edit: 6 years 7 months ago by DenisChenu.
6 years 7 months ago #168432 by socius
Replied by socius on topic Encryption of token table and responses
Hi!

Thank you Denis for your response!

Ok - so (1) the temporary decryption of pre-encrypted personal data for sending out emails etc. seems not feasible, but (2) still has a chance, if we get the decryption part working or someone can replace the encryption part with one that matches the corresponding decryption function (e.g. in R using openssl - see below).

>> There is a number of packages in R for encrypting data
> Yes encrypt, not decrypt, right ?

Almost "anything" is possible in R since it's a programming language with lots of user-contributed packages - and of course it's possible to both encrypt and decrypt :-) The question is whether it's possible to use the same algorithm in LimeSurvey for encryption and in R for decryption. That could be possible with the openssl package CRAN.R-project.org/package=openssl.

With this package it's possible to use openssl functions via R, e.g. create different kinds of keypairs (rsa, dsa, ec) cran.r-project.org/web/packages/openssl/openssl.pdf#page=11 and to encrypt and decrypt via envelope_encrypt() resp. envelope_decrypt() cran.r-project.org/web/packages/openssl/openssl.pdf#page=7 .



I remember that I tried this way to decrypt data that was encrypted via LSEncrypt - but I was missing at least one piece of information - the session key:

An envelope contains ciphertext along with an encrypted session key and optionally an initialization vector. The encrypt_envelope generates a random IV and session-key which is used to encrypt the data with AES stream cipher. The session key itself is encrypted using the given RSA key (see rsa_encrypt) and stored or sent along with the encrypted data. Each of these outputs is required to decrypt the data with the corresponding private key.


I have no idea how to get this out of LS - though I think that it's possible - are there any encryption experts around? :-)

LSEncrypt's encrypt() function uses openssl_public_encrypt() and to me it seems that this function makes use of all the things we'd need to decrypt offline (but as a PHP and encryption novice I have my problems here :-)
See this part of the LSEncrypt code on github.com/SamMousa/limesurvey-encrypt/c...9b1edf2c2d69fe60R167
Code:
        protected function encrypt($publicKey, $data)
        {
            // Generate a random password for symmetric encryption.
            $symmetricKey = openssl_random_pseudo_bytes(50);
            $encryptedKey = '';
            if (openssl_public_encrypt($symmetricKey, $encryptedKey, $publicKey))
            {
                // Use the encrypted key as basis for the IV.
                $iv = substr($encryptedKey, 0, mcrypt_get_iv_size(MCRYPT_BLOWFISH, MCRYPT_MODE_CBC));
                $encryptedData = mcrypt_encrypt(MCRYPT_BLOWFISH, $symmetricKey, $data, MCRYPT_MODE_CBC, $iv);
                // Key has constant length so no separator is necessary.
                return $encryptedKey . $encryptedData;
            }
            /**
             * @todo Proper error handling.
             * 
             */
 
        }
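
Reading the function above, the returned string seems to carry all three pieces needed offline: the RSA-encrypted symmetric key comes first, its leading bytes double as the IV, and the Blowfish ciphertext follows. An RSA ciphertext is as long as the key's modulus (256 bytes for a 2048-bit key) and Blowfish has an 8-byte block, so the string can be split by position alone. The sketch below does only that splitting, in Python for illustration; it assumes the stored value is the raw, unmodified return value of `encrypt()` (if the plugin encodes it before saving, that would have to be undone first), and the function and type names are made up.

```python
from typing import NamedTuple

BLOWFISH_BLOCK_BYTES = 8  # Blowfish block size, which is also the CBC IV size


class LSEncryptParts(NamedTuple):
    encrypted_key: bytes  # symmetric key, RSA-encrypted with the public key
    iv: bytes             # first bytes of encrypted_key, as in the PHP code
    ciphertext: bytes     # Blowfish-CBC encrypted data


def split_lsencrypt_blob(blob: bytes, rsa_key_bits: int) -> LSEncryptParts:
    """Split the concatenated output of the quoted encrypt() function.

    Assumes `blob` is exactly $encryptedKey . $encryptedData with no
    additional encoding applied on top.
    """
    key_len = (rsa_key_bits + 7) // 8  # RSA ciphertext length = modulus length
    if len(blob) < key_len:
        raise ValueError("blob is shorter than one RSA ciphertext")
    encrypted_key = blob[:key_len]
    ciphertext = blob[key_len:]
    if len(ciphertext) % BLOWFISH_BLOCK_BYTES != 0:
        raise ValueError("ciphertext is not a whole number of Blowfish blocks")
    iv = encrypted_key[:BLOWFISH_BLOCK_BYTES]
    return LSEncryptParts(encrypted_key, iv, ciphertext)
```

Offline, `encrypted_key` would then be decrypted with the private key to recover the 50-byte symmetric key, which together with `iv` decrypts `ciphertext`. Since mcrypt pads the last block with zero bytes, trailing NUL bytes would probably need to be stripped from the result.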
 

My question is also if this can handle entries that are longer than the key (I hope that this is not a stupid question, but some entries on php.net/manual/de/function.openssl-public-encrypt.php mention this).
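
On the length question: as far as I can read the quoted function, only the 50-byte symmetric key goes through openssl_public_encrypt(); the answers themselves are encrypted with Blowfish, which has no such limit. The limit applies to RSA itself: with the default PKCS#1 v1.5 padding, 11 bytes of every block are needed for padding, so the usable size is the key size in bytes minus 11. A small sketch of that arithmetic, in Python (the function names are made up for illustration):

```python
PKCS1_V15_OVERHEAD = 11   # padding bytes needed by PKCS#1 v1.5 encryption
SYMMETRIC_KEY_BYTES = 50  # from openssl_random_pseudo_bytes(50) in the quoted code


def max_rsa_plaintext_bytes(key_bits: int, overhead: int = PKCS1_V15_OVERHEAD) -> int:
    """Largest plaintext a single RSA encryption can take for this key size."""
    return (key_bits + 7) // 8 - overhead


def fits_in_rsa(plaintext_bytes: int, key_bits: int) -> bool:
    """True if data of this length can be RSA-encrypted directly."""
    return plaintext_bytes <= max_rsa_plaintext_bytes(key_bits)
```

For a 2048-bit key this gives 245 bytes, so the 50-byte symmetric key fits easily, while a long free-text answer would not fit if it were passed to RSA directly. That is the reason for the two-step (hybrid) scheme in the code above.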


So, I will play around with these R functions to get more familiar with encryption per se, but I'm sure I won't be much help in finding a way to do the encryption part in LS. I can only wait for some encrypted data to come and try to decrypt it in R ;-) and I will try to get funding for this part of my wishlist and contribute that way. I think I asked this somewhere before, but are there any crowdfunding plans where users could gather to collect budget for certain functions in R?


Thanks a lot and all the best,
G