Why store private key on server?

2 years 4 months ago #242103 by r0bis
## Limesurvey user permissions

For the readonly permissions the user needs to have permissions to Export the survey.

Thanks, that is very good to know.

- Keep the current global key
- Key management page: list of public keys
- Create a new key: lets the user get the private key, which is not saved on the server.

Yes, this last point is likely to be difficult to explain to _all_ users: your private key (decryption key) is not going to be stored, so save it, or else the data you collect will be irretrievably lost. I am tempted to say - we can give users a tutorial on how to create keys suitable for sodium, and we could provide a method to verify them, but let the user create both keys and give only the public key to LS. Otherwise there will be users who just say yes-yes-yes to everything without understanding, lose the LimeSurvey-generated private key, carry on collecting data, and then find they cannot decrypt it (because they did not read everything and lost or never saved their private key). If they are capable of going through the trouble of generating the key pair (which is fairly trivial, actually), they should be in a good position to keep their keys and not bother LimeSurvey developers with unanswerable questions like - how do I decrypt my data if I don't have... my private key.
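A rough sketch of what "create the keys yourself and verify them" could look like in Python. It assumes the scheme would use libsodium's `crypto_box` key pairs (32 bytes each) with sealed boxes, and it uses the third-party PyNaCl package for the actual cryptography; nothing in this thread confirms that LimeSurvey would pick exactly this construction, and the function names are made up for illustration.

```python
import base64
import binascii
from typing import Tuple

SODIUM_BOX_KEY_BYTES = 32  # crypto_box public and secret keys are both 32 bytes


def looks_like_box_key(key_b64: str) -> bool:
    """Cheap format check: valid base64 that decodes to exactly 32 bytes."""
    try:
        raw = base64.b64decode(key_b64, validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(raw) == SODIUM_BOX_KEY_BYTES


def generate_keypair_b64() -> Tuple[str, str]:
    """Create a key pair locally; returns (public_b64, private_b64)."""
    from nacl.public import PrivateKey  # third-party: pip install pynacl

    private_key = PrivateKey.generate()
    public_b64 = base64.b64encode(bytes(private_key.public_key)).decode("ascii")
    private_b64 = base64.b64encode(bytes(private_key)).decode("ascii")
    return public_b64, private_b64


def keys_match(public_b64: str, private_b64: str) -> bool:
    """Round trip: seal a probe with the public key, open it with the private key."""
    from nacl.exceptions import CryptoError
    from nacl.public import PrivateKey, PublicKey, SealedBox

    probe = b"key-check"
    sealed = SealedBox(PublicKey(base64.b64decode(public_b64))).encrypt(probe)
    try:
        opened = SealedBox(PrivateKey(base64.b64decode(private_b64))).decrypt(sealed)
    except CryptoError:
        return False
    return opened == probe
```

Only the public string would ever be pasted into LS. The private string stays with the user, and `keys_match` gives a way to confirm, before any data is collected, that the saved private key really opens what the public key seals.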

- In survey settings: list of keys: «Key used to encrypt data» (defaults to the global one)

Okay, the global probably means the one that LS generates for the current encryption method. I would say - let the user clearly choose between the two encryption methods. The current method protects mainly against rogue database administrators, the method we are talking about - against webserver attacks and state actors.
 
- Survey activation disables updating the key (what about tokens?)

I think the token is just something that belongs to service data (like submitdate, response id, seed etc.). I would say - encrypt _just_ the submitted response fields, not service data. Nothing generated by LimeSurvey should be encrypted. Tokens are only needed to ensure one user - one response. We need to protect only what is inside the response, not the token. The token does not identify the data uniquely - unless we can decrypt the user responses (like "my name, my address" etc.).

- If the key is not the global one: no decryption, with some warnings:
     - Responses cannot be reloaded for encrypted questions
     - Exports return the encrypted data (specific format?)
     - Statistics are disabled for encrypted questions

Yes, statistics are disabled for encrypted questions, that is for sure. The other points - yes, they make sense. 

- About the way to decrypt: do we need to offer some tools?
Maybe create a plugin similar to limesurvey-encrypt; with sodium it would be easier.

I would say we need to allow this to be done with publicly available tools. I could vouch for R; Python, obviously, would do the trick too. If we have a scheme where the responses are encrypted with the public key, I think we end up with data tables where the delimiters (such as commas in .csv) and the service fields (see above) are unencrypted, but the responses from the participants are encrypted. Once the responses are encrypted with the public key, the server cannot decrypt them, because it does not hold the private key. This means we cannot give the user a file (like .csv) that is itself encrypted with the public key, because then the responses would be encrypted twice (the second time when the export file is made). So we can just give out the .csv file where the structure (commas and line breaks) and the service data are plaintext but the responses are encrypted. Then it is up to the user to decrypt them locally.

I am not sure whether LimeSurvey developers could provide a Java GUI tool that would take a user-pasted private key and decrypt those responses - i.e. save the csv fully in plaintext locally. They could, but it is added maintenance, although it would be pretty cool. Maybe such a tool is already available; it would be pretty simple. As for R, I could write a readme on how to do that, but making a reliable GUI tool is beyond me <sorry>.
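To make the "plaintext structure, encrypted cells" idea concrete, here is a hedged Python sketch of the local decryption step. It assumes each encrypted cell is a base64-encoded libsodium sealed box and that the service columns carry the names shown; both are guesses for illustration, not something LimeSurvey defines. The cell decryptor is passed in as a function, so the CSV handling does not depend on any particular crypto library; `sealed_box_decryptor` shows one possible decryptor using the third-party PyNaCl package.

```python
import base64
import csv
import io
from typing import Callable, Collection

# Hypothetical names for the plaintext service columns; adjust to the real export.
SERVICE_COLUMNS = frozenset({"id", "submitdate", "token", "seed"})


def sealed_box_decryptor(private_key_b64: str) -> Callable[[str], str]:
    """Build a cell decryptor from a base64 private key (needs PyNaCl)."""
    from nacl.public import PrivateKey, SealedBox  # pip install pynacl

    box = SealedBox(PrivateKey(base64.b64decode(private_key_b64)))
    return lambda cell: box.decrypt(base64.b64decode(cell)).decode("utf-8")


def decrypt_export(
    csv_text: str,
    decrypt_cell: Callable[[str], str],
    service_columns: Collection[str] = SERVICE_COLUMNS,
) -> str:
    """Return the CSV with every non-empty, non-service cell decrypted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({
            # Service data and empty cells pass through untouched.
            column: value if column in service_columns or not value
            else decrypt_cell(value)
            for column, value in row.items()
        })
    return out.getvalue()
```

With a real key this would be called as `decrypt_export(text, sealed_box_decryptor(my_private_key))`, entirely on the user's own machine, so the private key never needs to leave it.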

- When the user submits: the whole response line is deleted and saved encrypted in another table (how to serialise it before encrypting? CSV? JSON?)

I think the responses are an array, and each element of the array gets encrypted individually and stored in the respective MySQL table cell. Maybe there are better ways to do it, but I just don't know.
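The per-cell idea can be sketched as follows. This is Python for illustration only (LimeSurvey itself is PHP), the service field names are hypothetical, and the use of libsodium sealed boxes through the third-party PyNaCl package is an assumption about how the public-key step might be done.

```python
import base64
from typing import Callable, Collection, Dict, Optional

# Hypothetical names for fields generated by the survey software itself.
SERVICE_FIELDS = frozenset({"id", "submitdate", "token", "seed"})


def sealed_box_encryptor(public_key_b64: str) -> Callable[[str], str]:
    """Build a cell encryptor from a base64 public key (needs PyNaCl)."""
    from nacl.public import PublicKey, SealedBox  # pip install pynacl

    box = SealedBox(PublicKey(base64.b64decode(public_key_b64)))
    return lambda value: base64.b64encode(
        box.encrypt(value.encode("utf-8"))
    ).decode("ascii")


def encrypt_response(
    row: Dict[str, Optional[str]],
    encrypt_cell: Callable[[str], str],
    service_fields: Collection[str] = SERVICE_FIELDS,
) -> Dict[str, Optional[str]]:
    """Encrypt each submitted answer on its own; leave service data readable."""
    return {
        field: value if field in service_fields or value is None
        else encrypt_cell(value)
        for field, value in row.items()
    }
```

Because only the public key is needed here, the server can store every answer encrypted without ever being able to read it back.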

- A new button to get the whole encrypted data: the entire file is encrypted.

Yes, a button should be there to get the responses, however the file would be a plaintext structure containing encrypted responses, demarcated by the said structures. I just don't think it makes sense to even try to encrypt the file. As I said above - if the responses are encrypted at the point of storing them into the database, and we do not have the decryption key on the server (the whole idea of this thread), then we cannot encrypt the whole downloadable file. If we try to do so, we will encrypt the data structures and service data once, and the survey responses twice. I would say - just download the file as a regular .csv, know the responses are encrypted and apply your decrypting method - either automatic like with R or Python, or use an open source GUI tool to do that. 

That's it, I hope this is not confusing.

r0berts

2 years 4 months ago #242109 by DenisChenu

Yes, this last point is likely to be difficult to explain to _all_ users: your private key (decryption key) is not going to be stored, so save it, or else the data you collect will be irretrievably lost. I am tempted to say - we can give users a tutorial on how to create keys suitable for sodium, and we could provide a method to verify them, but let the user create both keys and give only the public key to LS. Otherwise there will be users who just say yes-yes-yes to everything without understanding, lose the LimeSurvey-generated private key, carry on collecting data, and then find they cannot decrypt it (because they did not read everything and lost or never saved their private key). If they are capable of going through the trouble of generating the key pair (which is fairly trivial, actually), they should be in a good position to keep their keys and not bother LimeSurvey developers with unanswerable questions like - how do I decrypt my data if I don't have... my private key.

 
Good point here ! Your idea is better !
:+1:

Okay, the global probably means the one that LS generates for the current encryption method. I would say - let the user clearly choose between the two encryption methods. The current method protects mainly against rogue database administrators, the method we are talking about - against webserver attacks and state actors.

 
Yes : global/default is the current one.

I think the token is just something that belongs to service data (like submitdate, response id, seed etc.). I would say - encrypt _just_ the submitted response fields, not service data. Nothing generated by LimeSurvey should be encrypted. Tokens are only needed to ensure one user - one response. We need to protect only what is inside the response, not the token. The token does not identify the data uniquely - unless we can decrypt the user responses (like "my name, my address" etc.).

 
For tokens: I mean the token data: firstname, email and attributes. But if we encrypt them without having the key to decrypt, we cannot use them.
You're right: use the truly private key for the Response table.

Yes, statistics are disabled for encrypted questions, that is for sure. The other points - yes, they make sense. 
 
We need to «just» replace the encrypt / decrypt functions for SurveyDynamic and Response: if the survey settings use a truly private key, return the encrypted data in all other functions (export, statistics, reload etc. … maybe empty data on reload only, but that is an issue to fix afterwards).

- About the way to decrypt: do we need to offer some tools?
Maybe create a plugin similar to limesurvey-encrypt; with sodium it would be easier.
If users create their own key pair, it is less needed. We can add it after the feature is done.

I think the responses are an array, and each element of the array gets encrypted individually and stored in the respective MySQL table cell. Maybe there are better ways to do it, but I just don't know.

 
All of this was about another way to encrypt, not the proposed solution, where you can keep some unencrypted data for analysis, export etc. for users without the private key, and keep more data that, if needed, is validated by the user with the private key.
Sorry for introducing this solution here and creating more confusion.

PS: I updated the Mantis issue with the previous proposal.

Assistance on LimeSurvey forum and LimeSurvey core development are on my free time.
I'm not a LimeSurvey GmbH member. - Professional support - Plugins, theme and development .
I don't answer to private message.
The following user(s) said Thank You: r0bis

2 years 4 months ago - 2 years 4 months ago #242192 by r0bis
Thank you. I think this is quite right that the offline decryption tool could be something of a future benefit/feature. 

I understand about the encryption of tokens alongside names and e-mails in the central participant database. I think this might be not as critical as one might think. If I am a very security conscious organisation with a strong perimeter and I do not particularly trust the servers that data are stored on, I think I would not wish to put any personally identifiable information on a LS server, unless I am fully in control over the server (in which case I would install LS within the perimeter). And then I might not need encryption of the records.

So if I need the benefits of LimeSurvey (of which there are many) and I want to be really safe with my data, I will use a different method for sensitive information. I will create a local protected database file (say encrypted Excel, or SQLite) and store the sensitive personally identifiable information there. One table column will contain a one-way hash generated from the sensitive information, so that the record can be identified uniquely.
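One way the local code column could be generated, sketched in Python with only the standard library. Note that this swaps in a keyed hash (HMAC-SHA256 with a secret that never leaves the local machine) instead of a bare hash: a plain hash of something like a name and a birth date can be reversed by simply hashing candidate values, while a keyed one cannot be reproduced without the secret. The function name and the choice of identifying fields are illustrative assumptions.

```python
import hashlib
import hmac
import unicodedata


def participant_code(secret_key: bytes, *identifying_fields: str) -> str:
    """Derive a stable, non-reversible code from locally held identifying data."""
    # Normalise so that " Jane Doe " and "jane doe" give the same code.
    normalised = "\x1f".join(
        unicodedata.normalize("NFKC", field).strip().casefold()
        for field in identifying_fields
    )
    digest = hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

The resulting 64-character hex string is what would be stored next to the patient record locally and handed to the survey, while the secret key is kept alongside the protected local file.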

In distributing survey links to my respondents I will embed the one-way hash as the answer to an invisible question in my survey, therefore I will be able to attribute the responses to my population locally and I will not store any identifiable information on the LS server outside my perimeter. One could ask - why encrypt responses then?
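Building the per-respondent links could look like the sketch below. It assumes the LimeSurvey installation accepts a question code as a URL parameter to prefill that question; the base URL, survey ID and the question code `pcode` are all hypothetical, and the exact URL layout should be checked against the installed LimeSurvey version.

```python
from urllib.parse import urlencode


def survey_link(base_url: str, survey_id: int, question_code: str,
                code: str, lang: str = "en") -> str:
    """Return a survey URL that carries the participant code as a prefilled answer."""
    query = urlencode({"lang": lang, question_code: code})
    return f"{base_url.rstrip('/')}/index.php/{survey_id}?{query}"


# Example with made-up values:
link = survey_link("https://ls.example.org/", 123456, "pcode", "ab12")
```

The question receiving the code would be hidden from the respondent, so the code travels with the response without any name or e-mail address ever reaching the server.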

Well, response encryption can be very important for two reasons. One is that it enhances my status as an organisation that takes all reasonable steps to protect confidential information. The second is when I need free text responses. What if it is a survey of patients and one of them, either deliberately or by accident, writes personally identifiable information in those responses? It should happen rarely, but with a large number of patients it will happen at some point. If I have made sure the responses are encrypted and only I can decrypt them, this is not a problem. Come to think of it, encryption could be of benefit even in the case of purely numerical answers. Even if a patient by accident extracts their one-way hash and makes it public on social media, the hash is stored encrypted (because it is just one of the survey responses), and if an attacker were to take control of the server, all they would have is a set of encrypted responses.

I hope it is clear why in this scheme I am not worried about storage in the central participant database (although some organisations might be). I just see that storage in the LimeSurvey central database requires a decryption method with the secret key on the server, because otherwise how could the server send out e-mails?

As regards the token - I think there might not be a need to encrypt it in the scheme that I have described. Its function is primarily to prevent duplicate responses or to link the responses to the participant in the central participant database. If an anonymous token were to be stolen, the worst thing would be that someone could submit a bogus response before the real respondent manages to. In that case the respondent (if they cannot submit their response) would alert us and we would know something is amiss. We would identify the real responses by our one-way hash code anyway, so we would discard the bogus response. I think the current encryption mechanism might be enough for the LS central database, because organisations with a less stringent imperative for protecting confidential data will store respondent information (such as e-mail addresses) on the server, while those that have a high bar for protecting data will very likely want to use LS in the way I have described.

Sorry, this was a long answer, but I hope it makes clear why encrypting just the responses would be the simpler and better way to go.
 

r0berts
Last edit: 2 years 4 months ago by r0bis.
The following user(s) said Thank You: DenisChenu

2 years 4 months ago #242196 by DenisChenu

In distributing survey links to my respondents I will embed the one-way hash as the answer to an invisible question in my survey, therefore I will be able to attribute the responses to my population locally and I will not store any identifiable information on the LS server outside my perimeter. One could ask - why encrypt responses then?
 
I have already worked for staff delegations where the anonymity of the answers had to be strong.
We ended up handing out token codes drawn from a hat,
even though I was the only person with access to the token codes.

2 years 4 months ago #242208 by r0bis
Hmm, here I think I have a different paradigm. The responses must NOT be anonymous, i.e. WE need to be able to identify which patient the responses belong to. What we cannot allow is for anyone else (which includes server administrators and potential attackers) to identify the patient and his/her response.

r0berts

2 years 4 months ago #242222 by DenisChenu

Hmm, here I think I have a different paradigm. The responses must NOT be anonymous, i.e. WE need to be able to identify which patient the responses belong to. What we cannot allow is for anyone else (which includes server administrators and potential attackers) to identify the patient and his/her response.
 
I think there is no automatic way to do this.

Keep the token table without any personal information, and keep the token information on your own computer.

2 years 4 months ago - 2 years 4 months ago #242241 by r0bis
I am sorry, I think I am not making myself clear. We do not make use of tokens at all, because there is no participant table on the LS server. In the survey responses the LimeSurvey token field will be NULL, but I will identify the participant on my end - locally, by the code that I have assigned to the participant (as described above). This kind of security is important if you want to survey the same population repeatedly. For example, you want to monitor treatment progress and you use the same questionnaire every 2 weeks for every individual patient over the course of a year, or several years.

Because I manage the patient information on my end, it is my problem how to generate the patient-identifying code (which I call a one-way hash, but I think you are calling a token - which is why I then think of the LimeSurvey token). I can write up how to automate the process locally using R, or perhaps Python. I definitely do not think that LimeSurvey should try to make identifiers for the patients. The way LimeSurvey operates is fine - I can already do the process automatically on my end, so really just having the encrypted responses on the LS server and the decryption key with me, locally -- that is what would be useful for organisations dealing with information submitted by patients.
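The local attribution step for repeated surveys might look like this Python sketch. It assumes the responses have already been decrypted locally into dictionaries, that the participant code sits in a field called `pcode`, and that `submitdate` is an ISO-formatted string so it sorts chronologically; all three are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Response = Dict[str, str]


def attribute_responses(
    responses: List[Response],
    patients_by_code: Dict[str, str],
    code_field: str = "pcode",
) -> Tuple[Dict[str, List[Response]], List[Response]]:
    """Group responses per patient in date order; set aside unknown codes."""
    matched: Dict[str, List[Response]] = defaultdict(list)
    unmatched: List[Response] = []
    for response in responses:
        patient = patients_by_code.get(response.get(code_field))
        if patient is None:
            unmatched.append(response)  # bogus or mistyped code: review or discard
        else:
            matched[patient].append(response)
    for series in matched.values():
        series.sort(key=lambda r: r["submitdate"])
    return dict(matched), unmatched
```

Each patient then gets a time series of questionnaires, and anything submitted with a code that is not in the local table never gets mixed into the real data.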

 

r0berts
Last edit: 2 years 4 months ago by r0bis.

2 years 4 months ago #242242 by holch
If there is no personal, identifiable data on Limesurvey, what additional value does the encryption give?

Help us to help you!
  • Provide your LS version and where it is installed (own server, uni/employer, SaaS hosting, etc.).
  • Always provide a LSS file (not LSQ or LSG).
Note: I answer at this forum in my spare time, I'm not a LimeSurvey GmbH employee.
The following user(s) said Thank You: DenisChenu, r0bis

2 years 4 months ago - 2 years 4 months ago #242260 by r0bis
I have explained it a couple of posts above, but I will gladly recap:
It is relatively OK with numeric data. If you want to go a step further and allow free text responses, you can have a situation where people disclose personal data, either their own or that of others, inadvertently (or deliberately). For an example, think of mental healthcare settings. But if the responses are encrypted, you do not need to worry about free text responses. Even if LimeSurvey is compromised, the responses are safe.

I am not sure if you know just how difficult it is to establish trust around personal data in healthcare settings, and these are big organisations. They need good open-source mechanisms to collect and use patient feedback, and the setup with LS and R or Python would fit that very well, especially if responses could be encrypted so that they could not be decrypted if the server were to be compromised.

r0berts
Last edit: 2 years 4 months ago by r0bis.
