Welcome to the LimeSurvey Community Forum

Ask the community, share ideas, and connect with other LimeSurvey users!

Why store private key on server?

  • r0bis (Topic Author, Senior Member)
1 year 9 months ago #242063 by r0bis
Replied by r0bis on topic Why store private key on server?
Many thanks for discussing this. 

Just a note about keys. Sodium supports both symmetric and asymmetric cryptography. In symmetric cryptography one key is used both to encrypt and to decrypt, so it must be stored somewhere on the server or in the database for LS to be able to encrypt, and if the server were compromised the data could be decrypted with that key. That is not what LS uses. Instead, LS uses asymmetric encryption with two keys, a private one and a public one. They are generated at the same time by a program (similar to ssh-keygen); the private key _can_ be password-protected, but does not have to be. The essential point is that the private key should be kept in a very safe place (e.g. your own computer, a memory stick, etc.), while the public key may be kept anywhere (even on public keyservers). Data encrypted with the public key can only be decrypted with the private key. If both keys (public and private) are on the server and the server is compromised, the situation is no better than with a symmetric encryption key.
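To make the asymmetry concrete, here is a toy textbook-RSA sketch in Python. It is purely illustrative: the primes are tiny, there is no padding, and it is not what libsodium does (libsodium's public-key boxes are built on X25519 key exchange, not RSA). The function names are made up for this example.

```python
from math import gcd
from typing import Tuple

Key = Tuple[int, int]  # (modulus n, exponent)

def make_toy_keypair(p: int = 61, q: int = 53, e: int = 17) -> Tuple[Key, Key]:
    """Derive a toy (public, private) key pair from two small primes. NOT secure."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p - 1) * (q - 1)")
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (n, e), (n, d)

def encrypt(public_key: Key, message: int) -> int:
    """Anyone holding the public key can encrypt."""
    n, e = public_key
    if not 0 <= message < n:
        raise ValueError("message must be an integer in [0, n)")
    return pow(message, e, n)

def decrypt(private_key: Key, ciphertext: int) -> int:
    """Only the holder of the private key can decrypt."""
    n, d = private_key
    return pow(ciphertext, d, n)

public_key, private_key = make_toy_keypair()
ciphertext = encrypt(public_key, 42)  # e.g. a numeric survey answer
assert decrypt(private_key, ciphertext) == 42
```

In terms of this thread: `decrypt` needs `private_key`, so if that value sits on the same server as `public_key`, an attacker who takes the server gets both.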

> You might consider having a key-management section, where you can import/export public/private keys and assign and remove them on surveys and surveygroups.
> A way to backup keys might be important for LimeSurvey cloud users as well.

Yes, indeed, absolutely: something like the key management in Gitea or GitHub. In the Gitea scenario you are responsible for creating the public/private key pair, and then you upload the public key (which is easy, it is just a string of text). That might actually be the better way to implement the second encryption scenario. If that is done, I would volunteer to write the English manual page for it. That would be encryption "method two" (the more secure one), and it would require the user to generate the keys, which is not too difficult actually. The benefit is that LS never has to hold the private key, not even temporarily in server memory. (If LS were to generate the key pair, it would have to be reasonably sure that the user has kept the private key, which is doable but may be cumbersome.)

What should be there is a mechanism for the user to verify that the key pair works before they start data collection. It should not be too difficult: the user could be given some string encrypted with the public key, together with clear instructions on how to decrypt it and paste it back into LS for verification. Something like that. Or it may be as simple as asking the user to submit some test data to the survey and decrypt it on their local computer using their private key.

That does bring up a complication, but there are open-source solutions for it, such as opencsv, and a local application could be built on that basis. The problem is that the CSV arrives with its structure (commas and line breaks) unencrypted, in plain text so to speak, while the values between the commas are encrypted. Processing it in R would be fairly trivial, and I am sure there could be other solutions; however, I think they do not need to be provided by LimeSurvey. Only something to make sure that the public key LS sees on the server works with the private key the user has.
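The verification step could work as a simple challenge-response. A minimal sketch, assuming a hypothetical `encrypt_with_public_key` callable that wraps whatever public-key primitive LS would actually use (for instance a libsodium sealed box). The server keeps only a SHA-256 digest of a random challenge and compares it in constant time when the user pastes back the decrypted text.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Tuple

# Hypothetical: wraps the real public-key primitive (e.g. a libsodium sealed box)
EncryptFn = Callable[[bytes], bytes]

def issue_challenge(encrypt_with_public_key: EncryptFn) -> Tuple[str, bytes]:
    """Create a random challenge, encrypt it for the user, keep only its digest."""
    challenge = secrets.token_hex(16).encode("ascii")         # 32 random hex characters
    stored_digest = hashlib.sha256(challenge).hexdigest()     # what the server keeps
    return stored_digest, encrypt_with_public_key(challenge)  # ciphertext goes to the user

def check_response(stored_digest: str, pasted_plaintext: str) -> bool:
    """True if the text the user pasted back matches the original challenge."""
    candidate = hashlib.sha256(pasted_plaintext.strip().encode("utf-8")).hexdigest()
    return hmac.compare_digest(stored_digest, candidate)  # constant-time comparison
```

A successful round trip shows that the user holds the private key matching the uploaded public key, without the server ever seeing that private key.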

I understand - it would make things a little bit more complex, but not everyone needs to care about survey data encryption. People in healthcare have to, though.  

> Slightly off-topic:
> That scenario is the only one really offering security. With a private key on the webspace I don't see much protection. The main attack vector is via the webserver and not via the database server. The credentials to access the database would be available to an attacker of the webspace.

Thanks; yes, exactly, this is why I started this topic and raised the feature request.

r0berts
The following user(s) said Thank You: DenisChenu

  • r0bis (Topic Author, Senior Member)
1 year 9 months ago #242064 by r0bis
Replied by r0bis on topic Why store private key on server?
In a public-key encryption system, anyone with a public key can encrypt a message, yielding a ciphertext, but only those who know the corresponding private key can decrypt the ciphertext to obtain the original message. [Wikipedia]

r0berts
The following user(s) said Thank You: DenisChenu

  • DenisChenu (LimeSurvey Community Team & Official Partner)
1 year 9 months ago #242068 by DenisChenu
Replied by DenisChenu on topic Why store private key on server?
Great thanks !

Then we can have something like

- Keep the current global key
- Key management page: list of public keys
- Create a new key: allow the user to get the private key, but do not save it on the server
- In survey settings: a list of keys, «Key used to encrypt data» (defaults to the global one)
- Survey activation disables changing the key (what about tokens?)
- If the key is not the global one: no decryption, and some warnings:
     - Responses cannot be reloaded for encrypted questions
     - Exports contain the encrypted data (specific format?)
     - Statistics are disabled for encrypted questions
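The settings logic in this list could be modelled roughly as below. This is a hypothetical sketch, not LimeSurvey code: the class, the `"global"` key id and the capability names are invented for illustration. The only rules encoded are the ones listed: the key defaults to the global one, it is locked once the survey is active, and anything needing plaintext is switched off when the server does not hold the private key.

```python
from dataclasses import dataclass
from typing import Dict

GLOBAL_KEY_ID = "global"  # hypothetical id of the current server-held key pair

@dataclass
class SurveyEncryptionSettings:
    survey_id: int
    key_id: str = GLOBAL_KEY_ID  # «Key used to encrypt data», defaults to the global key
    active: bool = False

    def set_key(self, key_id: str) -> None:
        """Choose the encryption key; refused once the survey is active."""
        if self.active:
            raise RuntimeError("the key cannot be changed on an active survey")
        self.key_id = key_id

    @property
    def server_can_decrypt(self) -> bool:
        # Only the global key pair has its private key stored on the server
        return self.key_id == GLOBAL_KEY_ID

    def capabilities(self) -> Dict[str, bool]:
        can = self.server_can_decrypt
        return {
            "reload_encrypted_answers": can,
            "statistics_on_encrypted_questions": can,
            "plaintext_export": can,  # otherwise exports contain ciphertext
            "show_warning": not can,
        }
```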

About the way to decrypt: do we need to offer some tools?

Maybe create a plugin similar to limesurvey-encrypt, but using sodium, which is easier.

When the user submits: the whole response row is deleted and saved encrypted in another table (how to serialize it before encrypting? CSV? JSON?)
A new button to get all the encrypted data: the whole file is encrypted.
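For the "CSV? JSON?" question, one option is to serialize the whole row as JSON and encrypt it as a single blob. A sketch under assumptions: `encrypt`/`decrypt` are placeholders for the real public-key primitive, and the `encrypted_responses` table and the function names are hypothetical. `sort_keys=True` keeps the serialization deterministic.

```python
import json
import sqlite3
from typing import Any, Callable, Dict

BytesFn = Callable[[bytes], bytes]  # placeholder for the real encrypt/decrypt primitive

def archive_response(conn: sqlite3.Connection, response_id: int,
                     row: Dict[str, Any], encrypt: BytesFn) -> None:
    """Serialize a whole response row to JSON and store it as one encrypted blob."""
    payload = json.dumps(row, sort_keys=True, ensure_ascii=False).encode("utf-8")
    conn.execute(
        "INSERT INTO encrypted_responses (response_id, ciphertext) VALUES (?, ?)",
        (response_id, encrypt(payload)),
    )

def restore_response(ciphertext: bytes, decrypt: BytesFn) -> Dict[str, Any]:
    """Offline side: decrypt with the private key and parse the JSON back into a row."""
    return json.loads(decrypt(ciphertext).decode("utf-8"))

conn = sqlite3.connect(":memory:")  # stand-in for the LS database
conn.execute(
    "CREATE TABLE encrypted_responses (response_id INTEGER PRIMARY KEY, ciphertext BLOB)"
)
```

The trade-off is that nothing in the row, not even the submit date, is usable on the server afterwards; the field-by-field alternative discussed in this thread keeps service data readable.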
 

Assistance on LimeSurvey forum and LimeSurvey core development are on my free time.
I'm not a LimeSurvey GmbH member. - Professional support - Plugins, theme and development .
I don't answer to private message.
The following user(s) said Thank You: r0bis

  • r0bis (Topic Author, Senior Member)
1 year 9 months ago #242103 by r0bis
Replied by r0bis on topic Why store private key on server?
> ## LimeSurvey user permissions
>
> For read-only permissions, the user needs permission to export the survey.

Thanks, that is very good to know.

> - Keep the current global key
> - Key management page: list of public keys
> - Create a new key: allow the user to get the private key, but do not save it on the server

Yes, this last point is likely to be difficult to explain to _all_ users: your private key (decryption key) is not going to be stored, so save it, or else the data you collect will be irretrievably lost. I am tempted to say that we can give users a tutorial on how to create keys suitable for sodium and provide a method to verify them, but let the user create both keys and give LS only the public key. Otherwise some users will just say yes-yes-yes to everything without understanding, lose the LimeSurvey-generated private key AND still collect the data, and then be unable to decrypt it (because they did not read everything and lost, or did not save, their private key). If they are capable of going through the trouble of generating the key pair (which is fairly trivial, actually), they should be in a good position to keep the keys and not bother the LimeSurvey developers with unanswerable questions like: how do I decrypt my data if I don't have... my private key?

> - In survey settings: a list of keys, «Key used to encrypt data» (defaults to the global one)

Okay, "global" probably means the key that LS generates for the current encryption method. I would say: let the user clearly choose between the two encryption methods. The current method protects mainly against rogue database administrators; the method we are talking about protects against webserver attacks and state actors.
 
> - Survey activation disables changing the key (what about tokens?)

I think the token is just something that belongs to the service data (like submitdate, response id, seed, etc.). I would say: encrypt _just_ the submitted response fields, not the service data. Nothing generated by LimeSurvey should be encrypted. Tokens are only needed to ensure one user, one response. We need to protect only what is inside the response, not the token. On its own, the token does not identify anyone unless we can decrypt the user's responses (like "my name, my address", etc.).

> - If the key is not the global one: no decryption, and some warnings:
>      - Responses cannot be reloaded for encrypted questions
>      - Exports contain the encrypted data (specific format?)
>      - Statistics are disabled for encrypted questions

Yes, statistics are disabled for encrypted questions, that is for sure. The other points - yes, they make sense. 

> - About the way to decrypt: do we need to offer some tools?
> Maybe create a plugin similar to limesurvey-encrypt, but using sodium, which is easier.

I would say we need to allow this to be done with publicly available tools. I can vouch for R; Python, obviously, would also do the trick. If we have a scheme where we encrypt the responses with the public key, I think we are stuck with data tables where the delimiters (such as the commas in a .csv) are unencrypted and the service fields (see above) are not encrypted either, but the responses from the participants are. So if the responses (and only the responses) are encrypted once with the public key, we cannot decrypt them on the server, because we do not have the private key there. This means we cannot give the user a file (like a .csv) encrypted with the public key, because then we would have encrypted the responses twice (the second time when making the export file). So we can just give the user a .csv file where the structure (commas and line breaks) and the service data are plain text but the responses are encrypted. Then it is up to the user to decrypt them locally. I am not sure whether the LimeSurvey developers could provide a Java GUI tool that would take a private key pasted by the user and decrypt those responses, i.e. save the .csv fully in plain text locally. They could, but then it is added maintenance, although it would be pretty cool. But maybe such a tool is already available; it would be pretty simple. As for R, I could write a readme on how to do that, but making a reliable GUI tool is beyond me (sorry).
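The local decryption step for such an export might look like this in Python. Assumptions: the export is a CSV with a header row, each encrypted cell is a text string (e.g. base64), the names in `SERVICE_COLUMNS` are an illustrative guess rather than an authoritative list, and `decrypt_cell` is a hypothetical callable wrapping the private-key decryption. Commas, line breaks and service columns pass through untouched; empty (unanswered) cells stay empty.

```python
import csv
import io
from typing import Callable, Iterable

# Illustrative guess at service columns that stay in plain text
SERVICE_COLUMNS = ("id", "submitdate", "lastpage", "startlanguage", "seed", "token")

def decrypt_export(csv_text: str, decrypt_cell: Callable[[str], str],
                   service_columns: Iterable[str] = SERVICE_COLUMNS) -> str:
    """Decrypt only the answer cells of an exported CSV, keeping its structure."""
    service = set(service_columns)
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({
            # service data and empty (unanswered) cells pass through unchanged
            column: value if column in service or not value else decrypt_cell(value)
            for column, value in row.items()
        })
    return out.getvalue()
```

Parsing with a real CSV reader, rather than splitting on commas, matters because decrypted free-text answers may themselves contain commas or line breaks; the writer quotes them as needed.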

> - When the user submits: the whole response row is deleted and saved encrypted in another table (how to serialize it before encrypting? CSV? JSON?)

I think the responses are an array, and each element of the array gets encrypted individually and stored in the respective MySQL table cell. Maybe there are better ways to do it, but I just don't know.
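Field-by-field encryption, with service data left in plain text, could be sketched like this. Assumptions: `encrypt_field` stands in for the real public-key primitive and returns text (e.g. base64), and `SERVICE_FIELDS` is an illustrative guess at LimeSurvey's service columns, not the authoritative list. Note that numeric answers become opaque strings, which is why statistics cannot run on encrypted questions.

```python
from typing import Any, Callable, Dict, FrozenSet, Mapping

# Illustrative guess at LimeSurvey service columns; everything else is an answer
SERVICE_FIELDS: FrozenSet[str] = frozenset(
    {"id", "token", "submitdate", "lastpage", "startlanguage", "seed"}
)

def encrypt_answers(row: Mapping[str, Any], encrypt_field: Callable[[str], str],
                    service_fields: FrozenSet[str] = SERVICE_FIELDS) -> Dict[str, Any]:
    """Encrypt each answer cell on its own; leave service data in plain text."""
    stored: Dict[str, Any] = {}
    for column, value in row.items():
        if column in service_fields or value is None:
            stored[column] = value  # service data and NULL (unanswered) cells unchanged
        else:
            stored[column] = encrypt_field(str(value))  # one ciphertext per table cell
    return stored
```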

> - A new button to get all the encrypted data: the whole file is encrypted.

Yes, a button should be there to get the responses, however the file would be a plaintext structure containing encrypted responses, demarcated by the said structures. I just don't think it makes sense to even try to encrypt the file. As I said above - if the responses are encrypted at the point of storing them into the database, and we do not have the decryption key on the server (the whole idea of this thread), then we cannot encrypt the whole downloadable file. If we try to do so, we will encrypt the data structures and service data once, and the survey responses twice. I would say - just download the file as a regular .csv, know the responses are encrypted and apply your decrypting method - either automatic like with R or Python, or use an open source GUI tool to do that. 

That's it, I hope this is not confusing.

r0berts

  • DenisChenu (LimeSurvey Community Team & Official Partner)
1 year 9 months ago #242109 by DenisChenu
Replied by DenisChenu on topic Why store private key on server?

> Yes, this last point is likely to be difficult to explain to _all_ users: your private key (decryption key) is not going to be stored, so save it, or else the data you collect will be irretrievably lost. I am tempted to say that we can give users a tutorial on how to create keys suitable for sodium and provide a method to verify them, but let the user create both keys and give LS only the public key. Otherwise some users will just say yes-yes-yes to everything without understanding, lose the LimeSurvey-generated private key AND still collect the data, and then be unable to decrypt it (because they did not read everything and lost, or did not save, their private key). If they are capable of going through the trouble of generating the key pair (which is fairly trivial, actually), they should be in a good position to keep the keys and not bother the LimeSurvey developers with unanswerable questions like: how do I decrypt my data if I don't have... my private key?

 
Good point! Your idea is better!
:+1:

> Okay, "global" probably means the key that LS generates for the current encryption method. I would say: let the user clearly choose between the two encryption methods. The current method protects mainly against rogue database administrators; the method we are talking about protects against webserver attacks and state actors.

 
Yes: global/default is the current one.

> I think the token is just something that belongs to the service data (like submitdate, response id, seed, etc.). I would say: encrypt _just_ the submitted response fields, not the service data. Nothing generated by LimeSurvey should be encrypted. Tokens are only needed to ensure one user, one response. We need to protect only what is inside the response, not the token. On its own, the token does not identify anyone unless we can decrypt the user's responses (like "my name, my address", etc.).

 
By token, I mean the token data: first name, email and attributes. But if we encrypt them without having the key to decrypt them, we cannot use them.
You're right: use a truly private key for the response table.

> Yes, statistics are disabled for encrypted questions, that is for sure. The other points - yes, they make sense.
 
We «just» need to replace the encrypt/decrypt functions for SurveyDynamic and Response: if the survey settings use a truly private key, return the encrypted data in all other functions (export, statistics, reload, etc.; maybe empty data on reload only, but that is an issue to fix later).
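The rule "return the encrypted data everywhere else" amounts to one branch at the single point where stored answers are read. A hypothetical Python sketch of that branch only; it is not the actual SurveyDynamic or Response API, and `decrypt_with` is a placeholder for the existing decryption call.

```python
from typing import Callable, Optional

def read_answer(stored: str, server_private_key: Optional[bytes],
                decrypt_with: Callable[[str, bytes], str]) -> str:
    """Single read path for stored answers (hypothetical, not LimeSurvey's API)."""
    if server_private_key is None:
        # Survey uses a user-held key: export, statistics and reload get the ciphertext
        return stored
    # Global key: the server holds the private key and can decrypt as it does today
    return decrypt_with(stored, server_private_key)
```

Routing every caller through one function like this keeps the behaviour consistent, so no export or statistics path can accidentally expect plaintext it cannot get.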

> - About the way to decrypt: do we need to offer some tools?
> Maybe create a plugin similar to limesurvey-encrypt, but using sodium, which is easier.
If users create their own key pair, it is less needed. We can add it after the feature is done.

> I think the responses are an array, and each element of the array gets encrypted individually and stored in the respective MySQL table cell. Maybe there are better ways to do it, but I just don't know.

 
All of this was about another way to encrypt, not the proposed solution, where you can keep some unencrypted data for analysis, export, etc. for users without the private key, and get more when needed through validation by the user who holds the private key.
Sorry for introducing this solution here and creating more confusion.

PS: I updated the Mantis issue with the previous proposal.

The following user(s) said Thank You: r0bis

  • r0bis (Topic Author, Senior Member)
1 year 9 months ago - 1 year 9 months ago #242192 by r0bis
Replied by r0bis on topic Why store private key on server?
Thank you. I think this is quite right that the offline decryption tool could be something of a future benefit/feature. 

I understand about the encryption of tokens alongside names and e-mails in the central participant database. I think this might not be as critical as one might think. If I am a very security-conscious organisation with a strong perimeter and I do not particularly trust the servers that data are stored on, I think I would not wish to put any personally identifiable information on an LS server unless I am fully in control of the server (in which case I would install LS within the perimeter). And then I might not need encryption of the records.

So if I need the benefits of LimeSurvey (of which there are many) and I want to be really safe with my data, I will use a different method for sensitive information. I will create a local, protected database file (say an encrypted Excel file, or SQLite) and store the sensitive personally identifiable information there. One table column will contain a one-way hash generated from the sensitive information, so that the record can be identified uniquely.
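Generating such codes locally could be sketched as below. One deliberate swap: instead of a bare hash, this uses a keyed hash (HMAC-SHA256) with a secret that stays inside your perimeter, because identifiers such as names have little entropy and a plain hash of them could be reversed by simply trying candidates. The field choice (name plus a patient number), the 20-character truncation and the in-memory SQLite register are assumptions for illustration; a real register would be a protected file.

```python
import hashlib
import hmac
import sqlite3

def participant_code(local_secret: bytes, *identifying_fields: str) -> str:
    """Keyed one-way code (HMAC-SHA256) derived from identifying data."""
    normalized = (field.strip().lower() for field in identifying_fields)
    message = "\x1f".join(normalized).encode("utf-8")  # unit separator avoids ambiguous joins
    return hmac.new(local_secret, message, hashlib.sha256).hexdigest()[:20]  # 80 bits

# Local register that never leaves your perimeter (in memory here for illustration)
register = sqlite3.connect(":memory:")
register.execute("CREATE TABLE participants (code TEXT PRIMARY KEY, name TEXT, patient_no TEXT)")

def enrol(local_secret: bytes, name: str, patient_no: str) -> str:
    """Store the identifying data locally and return the code to embed in the survey link."""
    code = participant_code(local_secret, name, patient_no)
    register.execute("INSERT INTO participants VALUES (?, ?, ?)", (code, name, patient_no))
    return code
```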

In distributing survey links to my respondents I will embed the one-way hash as the answer to an invisible question in my survey, therefore I will be able to attribute the responses to my population locally and I will not store any identifiable information on the LS server outside my perimeter. One could ask - why encrypt responses then?
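Building the personal links could look like this. Assumptions: `pid` is a hypothetical URL parameter that you would map to the hidden question (for example through the survey's panel-integration settings), and the `index.php/<survey id>` path is a placeholder for your installation's URL format.

```python
from urllib.parse import urlencode

def survey_link(base_url: str, survey_id: int, code: str,
                param: str = "pid", lang: str = "en") -> str:
    """Personal survey link carrying the participant code as a URL parameter."""
    query = urlencode({"lang": lang, param: code})  # percent-encodes if needed
    return f"{base_url.rstrip('/')}/index.php/{survey_id}?{query}"
```

Because the code sits in the URL it will appear in web-server logs; that is tolerable here only because the code is a keyed hash, not personal data.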

Well, response encryption can be very important for two reasons. One is that it strengthens my standing as an organisation that takes all reasonable steps to protect confidential information. The second is when I need free-text responses. What if it is a survey of patients and one of them, either deliberately or by accident, writes personally identifiable information in those responses? It should happen rarely, but with a large number of patients it will happen at some point. If I have made sure the responses are encrypted and only I can decrypt them, this is not a problem. Come to think of it, encryption could be beneficial even for purely numerical answers. Even if a patient accidentally obtains their one-way hash and makes it public on social media, the hash is still stored encrypted (because it is just one of the survey responses), and if an attacker were to take control of the server, all they would have is a set of encrypted responses.

I hope it is clear why, in this scheme, I am not worried about storage in the central participant database (although some organisations might be). I just see that storage in the LimeSurvey central database requires a decryption method with the secret key on the server, because otherwise how could the server send out e-mails?

As for the token, I think there might be no need to encrypt it in the scheme I have described. Its function is primarily to prevent duplicate responses or to link the responses to the participant in the central participant database. If an anonymous token were stolen, the worst that could happen is that someone submits a bogus response before the real respondent manages to. In that case the respondent (if they cannot submit their response) would alert us and we would know something is amiss. We would identify the real responses by our one-way hash code anyway, so we would discard the bogus response. I think the current encryption mechanism might be enough for the LS central database, because organisations with a less stringent imperative to protect confidential data will store respondent information (such as e-mail addresses) on the server, while those with a high bar for protecting data will very likely want to use LS in the way I have described.
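Discarding bogus responses then reduces to a set-membership test on the decrypted code. A minimal sketch, assuming the responses have already been decrypted locally into dictionaries and that the hidden question's column is called `pid` (a hypothetical name):

```python
from typing import Iterable, List, Mapping, Set, Tuple

def split_responses(responses: Iterable[Mapping], known_codes: Set[str],
                    code_field: str = "pid") -> Tuple[List[Mapping], List[Mapping]]:
    """Separate responses whose decrypted participant code is in the local register."""
    genuine: List[Mapping] = []
    suspect: List[Mapping] = []
    for response in responses:
        target = genuine if response.get(code_field) in known_codes else suspect
        target.append(response)
    return genuine, suspect
```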

Sorry, this was a long answer, but I hope it makes clear why encrypting just the responses would be the simpler and better way to go.
 

r0berts
Last edit: 1 year 9 months ago by r0bis.
The following user(s) said Thank You: DenisChenu

  • DenisChenu (LimeSurvey Community Team & Official Partner)
1 year 9 months ago #242196 by DenisChenu
Replied by DenisChenu on topic Why store private key on server?

> In distributing survey links to my respondents I will embed the one-way hash as the answer to an invisible question in my survey, therefore I will be able to attribute the responses to my population locally and I will not store any identifiable information on the LS server outside my perimeter. One could ask - why encrypt responses then?
 
I have worked with staff representatives where the anonymity of the answers had to be strong.
We ended up handing out token codes drawn from a hat,
even though I was the only person with access to the token codes.

  • r0bis (Topic Author, Senior Member)
1 year 9 months ago #242208 by r0bis
Replied by r0bis on topic Why store private key on server?
Hmm, here I think I have a different paradigm. The responses must NOT be anonymous, i.e. WE need to be able to identify which patient the responses belong to. What we cannot allow is for anyone else (which includes server administrators and potential attackers) to identify the patient and his/her response.

r0berts

  • DenisChenu (LimeSurvey Community Team & Official Partner)
1 year 9 months ago #242222 by DenisChenu
Replied by DenisChenu on topic Why store private key on server?

> Hmm, here I think I have a different paradigm. The responses must NOT be anonymous, i.e. WE need to be able to identify which patient the responses belong to. What we cannot allow is for anyone else (which includes server administrators and potential attackers) to identify the patient and his/her response.
 
I think there is no automatic way to do this.

Use a token table without any personal information, and keep the token information on your computer.

  • r0bis (Topic Author, Senior Member)
1 year 9 months ago - 1 year 9 months ago #242241 by r0bis
Replied by r0bis on topic Why store private key on server?
I am sorry, I think I am not making myself clear. We do not use tokens at all, because there is no participant table on the LS server. In the survey responses the LimeSurvey token field will be NULL, but I will identify the participant on my end, locally, by the code I have assigned to the participant (as described above). This kind of security is important if you want to use a survey repeatedly for the same population. For example, you want to monitor treatment progress and you use the same questionnaire every 2 weeks for every individual patient over the course of a year, or several years.
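For repeated measurements, the local analysis step mostly means grouping decrypted responses by participant code and ordering them in time. A sketch under assumptions: the column names `pid`, `submitdate` and `score` are hypothetical, and `submitdate` is assumed to start with an ISO date (`YYYY-MM-DD`).

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Mapping, Tuple

def trajectories(responses: Iterable[Mapping], code_field: str = "pid",
                 date_field: str = "submitdate",
                 score_field: str = "score") -> Dict[str, List[Tuple[date, float]]]:
    """Group decrypted responses by participant code, ordered by submission date."""
    by_code: Dict[str, List[Tuple[date, float]]] = defaultdict(list)
    for r in responses:
        day = date.fromisoformat(str(r[date_field])[:10])  # "YYYY-MM-DD ..." -> date
        by_code[r[code_field]].append((day, r[score_field]))
    return {code: sorted(points, key=lambda p: p[0]) for code, points in by_code.items()}
```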

Because I manage the patient information on my end, it is my problem how to generate the patient-identifying code (which I call a one-way hash, but I think you are calling it a token, which made me think of the LimeSurvey token). I can write up how to automate the process locally using R, or perhaps Python. I definitely do not think that LimeSurvey should try to create identifiers for the patients. The way LimeSurvey operates is fine; I can already do the process automatically on my end. So really, just having the encrypted responses on the LS server and the decryption key with me, locally, is what would be useful for organisations dealing with information submitted by patients.

 

r0berts
Last edit: 1 year 9 months ago by r0bis.

  • holch (LimeSurvey Community Team)
1 year 9 months ago #242242 by holch
Replied by holch on topic Why store private key on server?
If there is no personal, identifiable data on Limesurvey, what additional value does the encryption give?

Help us to help you!
  • Provide your LS version and where it is installed (own server, uni/employer, SaaS hosting, etc.).
  • Always provide a LSS file (not LSQ or LSG).
Note: I answer at this forum in my spare time, I'm not a LimeSurvey GmbH employee.
The following user(s) said Thank You: DenisChenu, r0bis

  • r0bis (Topic Author, Senior Member)
1 year 9 months ago - 1 year 9 months ago #242260 by r0bis
Replied by r0bis on topic Why store private key on server?
I have explained it a couple of posts above, but I will gladly recap:
It is relatively OK with numeric data. If you want to go a step further and allow free-text responses, you can have a situation where people disclose personal data, either their own or someone else's, inadvertently (or deliberately). For example, think of mental healthcare settings. But if the responses are encrypted, you need not worry about free-text responses. Even if LimeSurvey is compromised, the responses are safe.

I am not sure whether you know just how difficult it is to be trusted with personal data in healthcare settings, but these are big organisations, and they need good open-source mechanisms to collect and use patient feedback. The setup with LS and R or Python would fit that very well, especially if responses could be encrypted so that they could not be decrypted if the server were compromised.

r0berts
Last edit: 1 year 9 months ago by r0bis.

Moderators: tpartner, holch