Caching and Load Balancing with Redis

2 years 1 month ago #178510 by bruce78
I've been playing around with ways to speed up and potentially scale LS with Redis and I've got a couple of findings that I think are quite interesting.

LimeSurvey's own cache
Assuming you've installed Redis to your satisfaction, add the following to your config.php file. As I understand it (and please correct me if I'm wrong), this uses Redis as a cache for the admin section of LS. I don't think this impacts people taking surveys.
		'hostname' => '',

Using Redis to cache user sessions
Redis doesn't solve the problem of LS writing out the full survey shown to a respondent before the first page of the survey is displayed, as outlined here. It does let you save your session files to RAM, on either the same machine or a different one, as you choose.

I added the following to my php config...
php_admin_value open_basedir /usr/share/php
php_admin_value session.save_handler redis
php_admin_value session.save_path tcp://



The open_basedir stuff came up during debugging. I'm not sure if it's essential, but it came up on my machine.

The advantage here is that LS sessions are now written to RAM first and not to an SSD (Redis subsequently writes its own db to disk to ensure a degree of session persistence). The session data is then always read from RAM and not from a drive. You can also use Redis to help load balance LS: if you have multiple servers running LS and they all use the same Redis db, you should be able to build in some redundancy.
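
For the load-balancing idea to work, every LS server has to point its session.save_path at the same Redis instance. Here is a minimal Python sketch of that check, assuming phpredis-style paths of the form tcp://host:port?database=N and the Redis default port 6379 when none is given. The function names and the example hosts are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def redis_target(save_path):
    """Return (host, port, database) for a phpredis-style session.save_path."""
    parts = urlsplit(save_path)
    query = parse_qs(parts.query)
    # Assumption: database 0 is used when no ?database= option is present.
    database = int(query.get("database", ["0"])[0])
    # Assumption: fall back to the Redis default port when none is given.
    return (parts.hostname, parts.port or 6379, database)


def share_one_store(save_paths):
    """True if every web server's save_path resolves to the same Redis db."""
    return len({redis_target(path) for path in save_paths}) == 1


if __name__ == "__main__":
    # Hypothetical hosts: two web servers pointing at one shared Redis.
    shared = [
        "tcp://redis.example.internal:6379?database=2",
        "tcp://redis.example.internal?database=2",
    ]
    print(share_one_store(shared))
```

Note that a path pointing at localhost names a different Redis on every web server, so sessions would not be shared even though the setting looks identical on each machine.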

I've set up HAProxy to do SSL termination, and I'm sure you can do the same with nginx.
The topic has been locked.
2 years 1 month ago #178516 by DenisChenu
About open_basedir: leave it unset if you can
php_admin_value open_basedir none

Afterwards, once it works, you can restrict it.

2 years 1 month ago #178518 by bruce78
Yeah, I had to turn on the open_basedir stuff due to it throwing up an error message...
1 year 8 months ago #184604 by fclim19
Sorry to reply to a 4-month-old thread, but I have been trying to get a Redis cache system going for a while now. I have applied the same settings listed in the first post, except for open_basedir, which is commented out in my php.ini file.

Testing with the Redis cache works great, but once I get under a high load it starts to fail, and my users are asked for their token again in the middle of the survey (I use a token-required setup, and the token is supplied in the URL at the start).

I'm running version 3.17.3+190429 with php-fpm 7.2 and Apache 2.4. Does anyone have a suggestion on what I can look for to see why this is happening? Due to the high volume of survey takers I get (and the fact that a survey takes between 30 and 40 minutes to complete), I would like to cluster and load balance multiple servers during peak times.

		'hostname' => 'localhost',

php_value[session.save_handler] = redis
php_value[session.save_path]    = "tcp://"
php_value[soap.wsdl_cache_dir]  = /var/opt/rh/rh-php72/lib/php/wsdlcache

I thought at first it might be the timeout options in php.ini, since sessions are set to time out after 24 minutes by default, but this doesn't seem to be the case: I get reports of users getting a missing token message after only a couple of questions.
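
To make the timeout reasoning concrete: PHP's default session.gc_maxlifetime is 1440 seconds (the 24 minutes mentioned above), and it counts from the last time the session was touched, not from the start of the survey. Here is a minimal Python sketch, assuming every request refreshes the session lifetime and that request times are given in seconds. The function name is made up for illustration.

```python
# PHP's default session.gc_maxlifetime, in seconds (24 minutes).
GC_MAXLIFETIME_DEFAULT = 1440


def session_survives(request_times, lifetime=GC_MAXLIFETIME_DEFAULT):
    """True if no idle gap between consecutive requests exceeds the lifetime.

    Assumption: each request resets the session's expiry, so only the gaps
    between requests matter, not the total time spent in the survey.
    """
    gaps = (later - earlier
            for earlier, later in zip(request_times, request_times[1:]))
    return all(gap <= lifetime for gap in gaps)


if __name__ == "__main__":
    # Hypothetical respondent: one request per minute for 40 minutes.
    steady = list(range(0, 40 * 60 + 1, 60))
    print(session_survives(steady))
```

Under that assumption a 30 to 40 minute survey is fine as long as nobody idles for more than 24 minutes on a single page, which fits the observation that the failures show up after only a couple of questions.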
1 year 8 months ago #184618 by jelo
Are you able to reproduce the issue on your own? I wouldn't rule out a bug inside the LimeSurvey release. Non-file-based sessions are not tested that much.

An issue that appears only under load sounds like a hint. But perhaps it's not load but the number of unfinished surveys (which is different from a direct peak load issue).

Check the phpinfo inside LimeSurvey to see whether the PHP-FPM settings are really applied to the application. PHP-FPM adds an additional layer of confusion when it comes to PHP settings, and Apache .htaccess doesn't work the same way with PHP-FPM.

What is the exact error message/behavior? Somewhere in the survey, people see the token entry screen? I'm asking again because the session error message "The CSRF token could not be verified" was sometimes confused with the LimeSurvey token.

1 year 8 months ago #184621 by fclim19
I'm told that the message is not "The CSRF token could not be verified". It just shows a screen asking for a token to continue. It does this at random intervals for each user: sometimes one question in, sometimes 20.

To better clarify, our users are given a link that gets them to the correct survey as well as a token to proceed. The survey can take some time to complete as there are around 50 to 60 questions on average, and at some point (randomly), a user is shown a screen asking for a token to continue.

I double-checked the phpinfo inside LimeSurvey, and php-fpm is enabled with error tracking. No errors are showing up in the PHP error log.

I did notice a couple of items while checking some of the Apache worker settings. I made sure I had the correct number of threads and child processes with the event worker. I'm using an RDS database, and it looks like I may have even hit a limit on simultaneous connections. I raised the limit to 1200 simultaneous connections (before, it was somewhere around 320; AWS does a weird calculation to get their values here).
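
The "weird calculation" can be sketched roughly as follows, assuming the database engine is MySQL or MariaDB (the thread does not say), where the default RDS parameter group derives max_connections as DBInstanceClassMemory / 12582880. The function names, the memory figure, and the server counts below are made up for illustration.

```python
# Assumption: RDS for MySQL/MariaDB, whose default parameter group sets
# max_connections = DBInstanceClassMemory / 12582880 (memory in bytes).
RDS_MYSQL_BYTES_PER_CONNECTION = 12582880


def rds_default_max_connections(instance_memory_bytes):
    """Approximate default max_connections for a given amount of memory."""
    return instance_memory_bytes // RDS_MYSQL_BYTES_PER_CONNECTION


def connections_needed(web_servers, php_workers_per_server):
    """Worst case: every PHP worker holds one database connection."""
    return web_servers * php_workers_per_server


if __name__ == "__main__":
    # Hypothetical sizing: 4 GiB of instance memory, 3 web servers with
    # 150 PHP-FPM workers each.
    limit = rds_default_max_connections(4 * 1024**3)
    needed = connections_needed(3, 150)
    print(limit, needed, needed <= limit)
```

The memory figure RDS plugs into the formula is somewhat lower than the nominal RAM of the instance class, so real defaults come out a bit under this estimate. Comparing the limit with the total number of PHP workers across all web servers gives a quick idea of whether the database can become the bottleneck once more servers are added.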

I won't get hit hard again until Monday morning, so any changes will have to be verified then. If this is going to be an issue with Redis, has anyone considered using a shared file system like GlusterFS to share sessions? Would that even work? Thanks again,
1 year 7 months ago #185325 by fclim19
So I haven't gotten hit like I expected, and I won't for a couple of months. In the meantime, I have found a couple of things that may be the cause of my issue. I suspect some data stored in cookies, specifically _ga and _gid, which both contain timestamps (in Unix epoch). Reading through my Apache logs, I see POST requests such as:

POST /survey/index.php?r=survey/index&sid=895875&1559328555545

I see that last number there (1559328555545) is the same number stored in the _ga and _gid cookies. What I'm seeing is that at the time I received complaints of the survey breaking, I get multiple POST requests with that same timestamp (the 895875 is my survey id). Could it be that at times I'm getting hit so hard that I'm getting requests at the same time, down to the millisecond? That's the only thing I can think of. So now I'm wondering what can I do to extend this timestamp and add some random numbers at the end as an extra measure to prevent these collisions? Maybe this is a topic for another thread, I worry I'm getting too far away from the original topic. Any suggestions are greatly appreciated.
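
As a rough sanity check on the same-millisecond question, the May figures from this thread can be turned into an average request rate. This is a back-of-the-envelope Python sketch under loud assumptions: 23 weekdays in the month, one POST per question with 55 questions per survey (the midpoint of 50 to 60), and requests arriving independently at a constant rate (a Poisson model). None of this comes from the server logs, and the function names are made up for illustration.

```python
import math


def average_request_rate(surveys, requests_per_survey, window_seconds):
    """Average requests per second over the busy window."""
    return surveys * requests_per_survey / window_seconds


def prob_shared_millisecond(rate_per_second):
    """Poisson probability that a given millisecond sees two or more requests."""
    lam = rate_per_second / 1000.0  # expected requests per millisecond
    return 1.0 - math.exp(-lam) * (1.0 + lam)


if __name__ == "__main__":
    # Assumptions: 188,000 surveys, 55 POSTs each, 23 weekdays x 6 busy hours.
    window = 23 * 6 * 3600
    rate = average_request_rate(188_000, 55, window)
    p = prob_shared_millisecond(rate)
    print(f"average rate: {rate:.1f} requests/s")
    print(f"chance a given millisecond has 2+ requests: {p:.6f}")
    print(f"expected such milliseconds per hour: {p * 3_600_000:.0f}")
```

Real traffic is burstier than a constant-rate model, so peak figures would be higher than these averages. Whether two requests in the same millisecond can actually break a session is a separate question that this arithmetic does not answer.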

Also, for those who may be curious: at my peak over the month of May, I was hit hard from 6am to noon every weekday (traffic outside those times was minimal and I received no complaints). My users completed over 188,000 surveys in that time frame. I expect this to grow as my organization expands internationally.