Bias in rand function for random number generation ?

8 months 4 weeks ago #258040 by lea_crtl
Hi,

I've observed a weird phenomenon. I defined a random variable in my survey using an equation type question, defined as follows:
{floor(rand(1,2.9999))}
This is meant to carry out some A/B (here, 1/2) testing. However, out of 65 responses, I recorded 43 "1" versus 22 "2". I would have expected this to be much more balanced. Is something wrong in my code? Or could it be that chance simply went against me and that it will end up balanced with more respondents?
(I know there are other ways to do A/B testing in LimeSurvey, using randomization groups for example, but since I launched the survey already I would like to avoid doing such major modifications.)

Thank you very much for your help !

8 months 4 weeks ago #258041 by holch
Getting close to an even distribution will only happen with big samples. n=65 is certainly not a big sample, and I am not overly surprised that you are getting such a distribution of 1 vs 2. However, why are you using (1,2.9999) and not (1,2)?

Afaik PHP rand already creates random integers.
www.php.net/manual/en/function.rand.php
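
The difference between the two variants can be sketched outside LimeSurvey. The Python snippet below is only an illustration, not LimeSurvey or PHP code: it assumes, hypothetically, that the random function returned a continuous value between 1 and 2.9999 that is then rounded down, and compares that with a plain integer draw from {1, 2}. The function names and the seed are made up for the example.

```python
import random

def floor_of_float_draw(rng, low=1.0, high=2.9999):
    """Draw a continuous uniform float in [low, high] and round it down."""
    return int(rng.uniform(low, high))  # int() floors positive values

def integer_draw(rng, low=1, high=2):
    """Draw an integer uniformly from {low, ..., high}, both ends included."""
    return rng.randint(low, high)

# Probability of getting 1 from the float-and-floor variant:
# the interval [1, 2) has length 1 out of a total length of 1.9999.
p_one_float = (2 - 1) / (2.9999 - 1)

rng = random.Random(2024)  # fixed seed so the run is repeatable
n = 100_000
share_float = sum(floor_of_float_draw(rng) == 1 for _ in range(n)) / n
share_int = sum(integer_draw(rng) == 1 for _ in range(n)) / n

print(f"theoretical P(1), float+floor: {p_one_float:.6f}")
print(f"simulated share of 1s, float+floor: {share_float:.4f}")
print(f"simulated share of 1s, integer draw: {share_int:.4f}")
```

Under that assumption both variants are practically 50/50, so the 2.9999 upper bound would not by itself explain an uneven split.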

Of course we can't exclude a possible bias (afaik no random number generator is 100% unbiased) in the rand function of PHP, but I would attribute the distribution you are showing rather to chance than to a bias in this dimension.

Test it for n=1000 or n=10000 and see what happens.
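
Such a test can also be run outside LimeSurvey. The following Python sketch is an illustration under the assumption of a fair 50/50 integer draw; it does not use LimeSurvey's or PHP's generator, and the function names and seeds are hypothetical. It computes the exact probability of a split at least as uneven as 43 vs 22 and simulates the share of 1s for several sample sizes.

```python
import math
import random

def two_sided_binomial_p(k, n):
    """Exact probability, under a fair 50/50 draw, of a split at least as
    uneven as k versus n - k (counting both directions)."""
    extreme = max(k, n - k)
    upper_tail = sum(math.comb(n, i) for i in range(extreme, n + 1))
    # Double the upper tail for the symmetric lower tail; cap at 1.
    return min(1.0, 2 * upper_tail / 2 ** n)

def simulated_share_of_ones(n, seed):
    """Share of 1s among n fair draws from {1, 2}, with a fixed seed."""
    rng = random.Random(seed)
    ones = sum(rng.randint(1, 2) == 1 for _ in range(n))
    return ones / n

print(f"P(split at least as uneven as 43/22): {two_sided_binomial_p(43, 65):.4f}")
for n in (65, 1000, 10000):
    print(f"n={n}: share of 1s = {simulated_share_of_ones(n, seed=n):.3f}")
```

The simulated shares are expected to move closer to 0.5 as n grows, which is the effect described above.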

I answer at the LimeSurvey forum in my spare time, I'm not a LimeSurvey GmbH employee.
No support via private message.

8 months 4 weeks ago - 8 months 4 weeks ago #258042 by holch
Also, you are aware that whenever you call this equation within the same response the random number is generated again, right? E.g. when the respondent gets to this equation question, the "dice is rolled" and a random number is created. However, if you call this equation again later in the questionnaire, e.g. to check which random number it was, the dice is rolled again and the random number can differ from the initial one.

You need to prevent that by checking whether the equation has already been filled: if so, keep the current number, and generate a random number only if it is empty.

There are many examples here in the forum how to do this.

Last edit: 8 months 4 weeks ago by holch.

8 months 4 weeks ago #258045 by Joffm
And for your information:
LimeSurvey uses the Mersenne Twister:
en.wikipedia.org/wiki/Mersenne_Twister

Joffm

Volunteers are not paid.
Not because they are worthless, but because they are priceless
The following user(s) said Thank You: lea_crtl

8 months 4 weeks ago #258047 by lea_crtl
Thank you for this reply. Since I only use conditions based on this equation (e.g. I display a question group if x == 1), the dice is not rolled multiple times. Thank God, since this would have thrown away the whole experiment.
But you're right: I'm going to use rand(1,2) instead. And I will wait for more respondents to see if it balances.
Thanks again!
Léa

8 months 4 weeks ago - 8 months 4 weeks ago #258048 by holch

Thank you for this reply. Since I only use conditions based on this equation (e.g. I display a question group if x == 1), the dice is not rolled multiple times.


I think it is. Afaik the condition of the question will call the equation and the function is called again. Let's see if Joffm agrees. I think he has tested this in practice in the past. I think I have tested it too, but I'm not sure anymore.

But you can easily test it. Create a small sample survey, let LimeSurvey generate the random number, then on another page create a condition for two questions, one shown if 1, the other if 2, and note down the results.

Last edit: 8 months 4 weeks ago by holch.

8 months 4 weeks ago - 8 months 4 weeks ago #258050 by holch
OK, I can confirm that the dice is rolled again if a condition is called. Just did 4 tests in a small survey with the equation question on the first page and then two text display questions, one is shown if the random number is 1, the other if the random number is 2 (all in LS 6, but this should be valid for ALL previous Limesurvey versions down to 1.x).

Then I went through the survey 4 times.

Results:
1) (2) --> (2)
2) (2) --> (1)
3) (1) --> (2)
4) (2) --> (1)

Sorry for being the bearer of bad news. But better now than later. Do your own tests, but I think you will have to start over.

Last edit: 8 months 4 weeks ago by holch.

8 months 4 weeks ago - 8 months 4 weeks ago #258051 by lea_crtl
Hmm this is very concerning but I just checked and out of 63 responses I noticed no occurrences of cases where the value of the variable changes after condition call. I attached the .lss export of my survey, in case you're interested to investigate. The random variable is called "randDecision" and groups 24412 to 24424 and 24413 to 24433 are called only if it is equal to 1. Groups 25262 to 25271 and 25272 to 25281 are called only if it is equal to 2.
Last edit: 8 months 4 weeks ago by lea_crtl. Reason: clarification

8 months 4 weeks ago - 8 months 4 weeks ago #258055 by Joffm
Well, we do not know your design.
But:
If your construct "rand(1,2)" is in a group with other questions the value is changed each time you click.
This is the same behaviour that you see in Excel, where a random number is changed each time you enter something in any cell of the sheet.

In many cases this doesn't matter.
If you later display some groups with "randnum==1" or "randnum==2", you can see in your data which group was answered.
So the stored value of randnum is not really relevant.
BUT it is really relevant if you use tailoring, e.g. display images with something like <img src=".../myimage{randnum}.jpg" />

Therefore we use this construct {if(is_empty(randnum),rand(1,2),randnum)}
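
What this guard does can be modelled outside LimeSurvey. The Python sketch below is not ExpressionScript and does not reproduce LimeSurvey's internals; it only imitates the pattern "draw when empty, otherwise keep the stored value". The function names are hypothetical, and None stands in for an empty answer.

```python
import random

def evaluate_unguarded(stored, rng):
    """Model of {rand(1,2)}: every evaluation draws a new number."""
    return rng.randint(1, 2)

def evaluate_guarded(stored, rng):
    """Model of {if(is_empty(randnum),rand(1,2),randnum)}:
    draw only when nothing is stored yet, otherwise keep the stored value."""
    if stored is None:  # None stands in for an empty answer
        return rng.randint(1, 2)
    return stored

def run(evaluate, evaluations, seed):
    """Evaluate the equation repeatedly and record the stored value each time."""
    rng = random.Random(seed)
    stored = None
    history = []
    for _ in range(evaluations):
        stored = evaluate(stored, rng)
        history.append(stored)
    return history

print("unguarded:", run(evaluate_unguarded, 10, seed=1))
print("guarded:  ", run(evaluate_guarded, 10, seed=1))
```

In the guarded run the first draw is kept for all later evaluations, while the unguarded run can switch between 1 and 2 from one evaluation to the next.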

Coming back to your distribution: as @holch said, it is not really surprising in a sample of N=65.
Did you do a test in Excel?
I did.
Here is the result of 50 runs:
[Image: result of 50 runs in Excel]

If you really have only a small sample, you have to monitor your survey.
If you notice that group 1 is full, you can always change the equation to "rand(2,2)" so that the rest of your participants will be led to group 2.

Joffm


 

Last edit: 8 months 4 weeks ago by Joffm.
The following user(s) said Thank You: lea_crtl

8 months 4 weeks ago #258057 by lea_crtl
Thank you. I'm realizing that my original question was a bit stupid (it's just the basic behaviour of a random variable) but at least it made me aware of the fact that my random variables change during the survey. So if I understood you well the condition == 1 call does not change the value of the variable, right? At least this is what I've observed until now...
And although I don't use tailoring for this specific variable, I do for others. I'm going to use your piece of code to circumvent the problem.
Thank you very much to both of you!

8 months 4 weeks ago - 8 months 4 weeks ago #258059 by holch

So if I understood you well the condition == 1 call does not change the value of the variable, right?


In my opinion it does. Well, it doesn't change the number necessarily. It just triggers a new draw, which in your case has a 50/50 chance of changing the number or leaving as is.

Because in your case the condition doesn't call a number, but rather a question which contains a formula.

What is strange is that you don't seem to perceive any changes throughout your test, because in my opinion you should see changes: with every call to the equation the dice should be rolled again if you do not use the if(is_empty( construct that Joffm posted. My test also shows this. In some cases the random number was 2 at the beginning, and on the next page, where the conditions were, it was a different one.
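
To put a number on how strange that would be, here is a small Python calculation. It rests on two simplifying assumptions that are not confirmed anywhere in this thread: that each of the 63 checked responses had exactly one re-roll, and that each re-roll independently keeps the original value with probability 0.5. The function name is hypothetical.

```python
def prob_no_change(responses, p_same=0.5):
    """Probability that none of the responses shows a changed value,
    assuming one independent re-roll per response that keeps the
    original value with probability p_same."""
    return p_same ** responses

print(f"P(no change in 63 responses): {prob_no_change(63):.3e}")
```

Under those assumptions, seeing no change at all in 63 responses would be practically impossible, so either the value is not re-rolled in that setup or the stored value does not reflect the re-roll.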

I will have a look at your survey later, but the group numbers already scare me a little.... :-)

Last edit: 8 months 4 weeks ago by holch.

8 months 4 weeks ago #258060 by holch

The random variable is called "randDecision" and groups 24412 to 24424 and 24413 to 24433 are called only if it is equal to 1. Groups 25262 to 25271 and 25272 to 25281 are called only if it is equal to 2.


Those numbers don't help at all, because they are group IDs created by your LimeSurvey installation, and obviously the IDs in my installation differ from those in yours. I would need the names of the groups.

And you have about 5 different equations that generate a random number. Not sure I understand.

This will take a while to understand what you are doing there.
