Welcome to the LimeSurvey Community Forum

Bias in rand function for random number generation ?

1 year 5 months ago #258040 by lea_crtl
Hi,

I've observed a weird phenomenon. I defined a random variable in my survey using an equation-type question, defining it as follows:
{floor(rand(1,2.9999))}
This is meant to carry out some A/B (here, 1/2) testing. However, out of 65 responses, I recorded 43 "1" versus 22 "2". I would have expected this to be much more balanced. Is something wrong in my code? Or could it simply be that chance went against me and that it will end up balanced with more respondents?
(I know there are other ways to do A/B testing in LimeSurvey, using randomization groups for example, but since I launched the survey already I would like to avoid doing such major modifications.)

Thank you very much for your help!

1 year 5 months ago #258041 by holch
Getting close to an even distribution will only happen with big samples. n=65 is certainly not a big sample, and I am not overly surprised that you are getting an uneven distribution of 1 vs. 2. However, why are you using (1,2.9999) and not (1,2)?

Afaik PHP rand already creates random integers.
www.php.net/manual/en/function.rand.php

Of course we can't exclude a possible bias (afaik no random number generator is 100% unbiased) in the rand function of PHP, but I would attribute the distribution you are showing rather to chance than to a bias in this dimension.

Test it for n=1000 or n=10000 and see what happens.
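
The effect of sample size can also be sketched outside LimeSurvey. The Python snippet below is only an illustration, not LimeSurvey's own code: `share_of_ones` is a hypothetical helper, and `randint(1, 2)` merely stands in for the survey's rand(1,2). It prints the share of "1" draws for n=65, n=1000 and n=10000.

```python
import random


def share_of_ones(n, seed=0):
    """Draw n values from {1, 2} and return the fraction that came out as 1."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    ones = sum(1 for _ in range(n) if rng.randint(1, 2) == 1)
    return ones / n


if __name__ == "__main__":
    for n in (65, 1000, 10000):
        print(n, round(share_of_ones(n), 3))
```

Changing the seed gives a feel for how much the share moves around at each sample size.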

Help us to help you!
  • Provide your LS version and where it is installed (own server, uni/employer, SaaS hosting, etc.).
  • Always provide a LSS file (not LSQ or LSG).
Note: I answer at this forum in my spare time, I'm not a LimeSurvey GmbH employee.

1 year 5 months ago - 1 year 5 months ago #258042 by holch
Also, you are aware that whenever you call this equation within the same response, the random number is generated again, right? E.g. when the respondent gets to this equation question, the dice is rolled and a random number is created. However, if you call this equation again later in the questionnaire, e.g. to check which random number it was, the dice is rolled again and the random number can differ from the initial one.

You need to prevent that by checking whether the equation has already been filled and, if so, keeping the current number; generate a random number only if it is empty.

There are many examples here in the forum how to do this.

Last edit: 1 year 5 months ago by holch.

1 year 5 months ago #258045 by Joffm
And for your information:
LimeSurvey uses the Mersenne-Twister
en.wikipedia.org/wiki/Mersenne_Twister

Joffm

Volunteers are not paid.
Not because they are worthless, but because they are priceless
The following user(s) said Thank You: lea_crtl

1 year 5 months ago #258047 by lea_crtl
Thank you for this reply. Since I only use conditions based on this equation (e.g. I display a question group if x == 1), the dice is not rolled multiple times. Thank God, since this would have invalidated the whole experiment.
But you're right: I'm going to use rand(1,2) instead. And I will wait for more respondents to see if it balances.
Thanks again!
Léa

1 year 5 months ago - 1 year 5 months ago #258048 by holch

Thank you for this reply. Since I only use conditions based on this equation (e.g. I display a question group if x == 1), the dice is not rolled multiple times.


I think it is. Afaik the condition of the question will call the equation, and the function is called again. Let's see if Joffm agrees. I think he has tested this in practice in the past. I think I have tested it too, but I'm not sure anymore.

But you can easily test it. Create a small sample survey, let LimeSurvey generate the random number, then on another page create a condition for two questions, one shown if 1, the other if 2, and note down the results.

Last edit: 1 year 5 months ago by holch.

1 year 5 months ago - 1 year 5 months ago #258050 by holch
OK, I can confirm that the dice is rolled again if a condition is called. Just did 4 tests in a small survey with the equation question on the first page and then two text display questions, one is shown if the random number is 1, the other if the random number is 2 (all in LS 6, but this should be valid for ALL previous Limesurvey versions down to 1.x).

Then I went through the survey 4 times.

Results:
1) (2) --> (2)
2) (2) --> (1)
3) (1) --> (2)
4) (2) --> (1)

Sorry for being the bearer of bad news. But better now than later. Do your own tests, but I think you will have to start over.

Last edit: 1 year 5 months ago by holch.

1 year 5 months ago - 1 year 5 months ago #258051 by lea_crtl
Hmm this is very concerning but I just checked and out of 63 responses I noticed no occurrences of cases where the value of the variable changes after condition call. I attached the .lss export of my survey, in case you're interested to investigate. The random variable is called "randDecision" and groups 24412 to 24424 and 24413 to 24433 are called only if it is equal to 1. Groups 25262 to 25271 and 25272 to 25281 are called only if it is equal to 2.
Last edit: 1 year 5 months ago by lea_crtl. Reason: clarification

1 year 5 months ago - 1 year 5 months ago #258055 by Joffm
Well, we do not know your design.
But:
If your construct "rand(1,2)" is in a group with other questions the value is changed each time you click.
This is the same behaviour that you see in Excel, where a random number is changed each time you enter something in any cell of the sheet.

In many cases this doesn't matter.
If you later display some groups with "randnum==1" or "randnum==2", you can see in your data which group was answered.
So the stored value of randnum is not really relevant.
BUT it is really relevant if you use tailoring, e.g. display images with something like <img src=".../myimage{randnum}.jpg" />

Therefore we use this construct {if(is_empty(randnum),rand(1,2),randnum)}
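
To illustrate why the guard matters, here is a rough Python model of the two constructs. It is a sketch under assumptions, not LimeSurvey's implementation: the function names are hypothetical, `None` stands in for an empty answer, and the loop simply assumes the equation is re-evaluated several times within one response.

```python
import random


def plain_equation(stored, rng):
    # Models {rand(1,2)}: every evaluation draws a fresh number,
    # regardless of what is already stored.
    return rng.randint(1, 2)


def guarded_equation(stored, rng):
    # Models {if(is_empty(randnum),rand(1,2),randnum)}:
    # draw only while nothing is stored, otherwise keep the stored value.
    return rng.randint(1, 2) if stored is None else stored


def evaluate_repeatedly(equation, evaluations, seed=0):
    """Return the stored value after each of several re-evaluations."""
    rng = random.Random(seed)
    stored = None  # assumption: the answer starts out empty
    history = []
    for _ in range(evaluations):
        stored = equation(stored, rng)
        history.append(stored)
    return history


if __name__ == "__main__":
    print("plain:  ", evaluate_repeatedly(plain_equation, 10))
    print("guarded:", evaluate_repeatedly(guarded_equation, 10))
```

In this model the guarded history repeats the first draw, while the plain one can flip between 1 and 2 on any evaluation.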

Coming back to your distribution: as @holch said, it is not really surprising in a sample of N=65.
Did you do a test in Excel?
I did.
Here is the result of 50 runs:
 

If you really have only a small sample, you have to monitor your survey.
If you notice that group 1 is full, you can always change the equation to "rand(2,2)" so that the rest of your participants are led to group 2.
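
For the distribution question above, the probability of a split at least as uneven as the observed one can also be computed exactly instead of simulated. The sketch below assumes a perfectly fair, independent 50/50 draw per respondent; `two_sided_binomial_p` is a hypothetical helper name, not something LimeSurvey or Excel provides.

```python
from math import comb


def two_sided_binomial_p(k, n):
    """Probability of a split at least as uneven as k vs. n-k under a fair 50/50 draw."""
    extreme = max(k, n - k)
    # Number of outcomes with at least `extreme` hits on one side.
    upper_tail = sum(comb(n, i) for i in range(extreme, n + 1))
    # For p = 0.5 the distribution is symmetric, so the other side is a mirror image.
    # Cap at 1 because a perfectly even split would count the middle twice.
    return min(1.0, 2 * upper_tail / 2 ** n)


if __name__ == "__main__":
    # 43 "1" versus 22 "2" out of 65 responses, as reported in the first post.
    print(two_sided_binomial_p(43, 65))
```

The same function can be called with the current counts while the survey is running to judge whether an imbalance is still within what chance alone would produce.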

Joffm


 

Last edit: 1 year 5 months ago by Joffm.
The following user(s) said Thank You: lea_crtl

1 year 5 months ago #258057 by lea_crtl
Thank you. I'm realizing that my original question was a bit stupid (it's just the basic behaviour of a random variable), but at least it made me aware of the fact that my random variables change during the survey. So if I understood you correctly, the condition == 1 call does not change the value of the variable, right? At least this is what I've observed until now...
And although I don't use tailoring for this specific variable, I do for others. I'm going to use your piece of code to circumvent the problem.
Thank you very much to both of you!

1 year 5 months ago - 1 year 5 months ago #258059 by holch

So if I understood you well the condition == 1 call does not change the value of the variable, right?


In my opinion it does. Well, it doesn't change the number necessarily. It just triggers a new draw, which in your case has a 50/50 chance of changing the number or leaving as is.

Because in your case the condition doesn't call a number, but rather a question which contains a formula.

What is strange is that you don't seem to perceive any changes throughout your test, because in my opinion you should see changes: with every call to the equation the dice should be rolled if you do not use the if(is_empty( construct that Joffm posted. My test also shows this. In some cases the random number was 2 at the beginning, and on the next page, where the conditions were, it was a different one.
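
A back-of-the-envelope calculation shows why seeing no changes at all would be unexpected if every response really triggered a fresh draw. The Python sketch below assumes exactly one independent re-draw per response with a 50% chance of landing on the same value; `prob_no_change` is a hypothetical helper, and the real number of re-evaluations per response is not known from this thread.

```python
def prob_no_change(responses, p_same=0.5):
    """Chance that an independent re-draw reproduces the first value in every response."""
    return p_same ** responses


if __name__ == "__main__":
    # 63 responses, as checked in the post above.
    print(prob_no_change(63))
```

If the stored values never differ across that many responses, the more likely explanation is that the re-draw does not happen (or is not stored) in that particular survey design, not that chance hid it.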

I will have a look at your survey later, but the group numbers already scare me a little.... :-)

Last edit: 1 year 5 months ago by holch.

1 year 5 months ago #258060 by holch

The random variable is called "randDecision" and groups 24412 to 24424 and 24413 to 24433 are called only if it is equal to 1. Groups 25262 to 25271 and 25272 to 25281 are called only if it is equal to 2.


Those numbers don't help at all, because they are group IDs created by your LimeSurvey installation, and the IDs in my installation are obviously different from those in yours. I would need the names of the groups.

And you have about 5 different equations that generate a random number. Not sure I understand.

It will take a while to understand what you are doing there.


1 year 5 months ago #258061 by lea_crtl
Hi, yes I'm sorry this is not a minimal example at all, my survey is quite complicated. The names might not help either (many are identical), so I'm just going to give you the Group and Question orders. I hope this works.

The variables defined in questions 5/1 to 5/4 are other random variables that I use for tailoring (as Joffm explained), respectively in questions 7/0 & 8/0, 9/0 & 10/0, 11/0 & 12/0, 13/0 & 14/0. Again, contrary to what Joffm said, I did not notice any switch in values when I call these variables in my image paths. With 63 responses, can I conclude that it simply does not happen?

Variable defined in question 5/7 is the variable my original question was about. If it is equal to 1, groups 15 to 18, 24 to 33, and 46 to 55 will display. If it is 2, it will be groups 19 to 22, 34 to 43 and 56 to 65.

1 year 5 months ago #258065 by Joffm
What I see, is that the random numbers are generated as
 
but in the dataset you see this
 

The click on "next" changes the random numbers.
So it is dangerous to use the plain "rand()" function.

If this is fine for your survey, i.e. the correct questions/groups are displayed as stored in the dataset, you don't need to worry about it.

And your question about the distribution is answered.

BTW: In version 6.x there is no class "hidden".
As Limesurvey now is based on bootstrap 5, this class is called "d-none"

Joffm

1 year 5 months ago #258068 by Joffm
One last word.
Your advantage is that the generation of the random numbers takes place in a separate group.
So it doesn't matter that they change when you leave this group.
It would affect the survey if there were questions inside this group that depend on the random number.

And of course, if participants go back to the demographics and pass the "random" group again, they will be surprised to see different questions this time.
But this is the only remaining issue, and you can avoid it by hiding the "back" button entirely or, via JavaScript, in the first question after the "random" group.

Joffm
 
