Welcome to the LimeSurvey Community Forum

Ask the community, share ideas, and connect with other LimeSurvey users!

random values: a more stable distribution?

jonathanz
Topic Author
New Member

2 years 7 months ago #234220 by jonathanz

random values: a more stable distribution? was created by jonathanz

Hello everyone,

I have a general question about creating random numbers. Thanks to the help of all the people in this forum, it was pretty easy for me to understand how to create random numbers and how to assign different questions and groups to them.

To create a random number and assign it to one of 3 different groups, I used this code:
if(is_empty( randnumber.NAOK ), rand(1, 3), randnumber.NAOK )
Inside each of my 3 groups I used another code to randomly pick 1 of 2 questions:
rand(1,2)

So far everything works and it does exactly what I want. But I ran my survey 26 times and noticed huge differences between the 3 groups: the first group of questions appeared 13 times, the second 5 times and the third 8 times.

So my question is whether there is any way to get a more stable distribution. To be honest, I did not understand the difference between my first code and my second, and I don't know if either of them is better suited for a more even distribution. Is there anything I can generally do better, or is this just how things work?

Best,
Jonathan


holch
LimeSurvey Community Team

2 years 7 months ago - 2 years 7 months ago #234225 by holch

Replied by holch on topic random values: a more stable distribution?

So far everything works and it does exactly what I want. But I ran my survey 26 times and noticed huge differences between the 3 groups: the first group of questions appeared 13 times, the second 5 times and the third 8 times.

This isn't really surprising, is it? n=26 is a fairly small sample, so a not-so-even distribution isn't overly surprising in this case.

You need big numbers to get a fairly even distribution, but even then it is not guaranteed. That is how random numbers work.

With each draw, each number has the same chance of being drawn. So while people are commonly surprised if the same number is drawn 3 or 4 times in a row, this isn't that surprising at all. Each time, there is a 33.33% chance that it is the same number as before. That is lower than the 66.67% chance that another number is drawn, but it is still fairly high.

To be honest, I did not understand the difference between my first code and my second, and I don't know if either of them is better suited for a more even distribution. Is there anything I can generally do better, or is this just how things work?

The difference between the two codes will not improve / worsen the distribution. What the first code does is to check if there is already a random number stored in the question. If so, it will take the already stored number, if not (so the question is currently empty) it will create a random number.

This is important, because the random number is drawn every time you "touch" the question that creates it. This means that, depending on your survey, the random number might change during the survey.

E.g. imagine you draw the random number at the beginning, let's say it is "2". Later in the questionnaire you access it again (e.g. for a relevance equation to show/hide a certain question or question group). In this case the equation will be calculated again, and there is again an equal chance of 33.3% each for 1, 2 or 3 to be drawn. If you use this random number just once, this does not have a huge impact on your survey. But if you want to use the random number more than once in the survey, or if you allow respondents to go back in the questionnaire, it is absolutely essential to use this code:

Code:

{if(is_empty( randnumber.NAOK ), rand(1, 3), randnumber.NAOK )}

and never just

Code:

{rand(1, 3)}
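
To make the difference more tangible, here is a small Python analogy. It is not LimeSurvey code; the function names and the `stored` variable are made up for illustration, and `stored` merely stands in for the answer saved in the equation question.

```python
import random

def draw_unguarded(rng):
    # Analogy for {rand(1, 3)}: a fresh value on every evaluation.
    return rng.randint(1, 3)

def draw_guarded(stored, rng):
    # Analogy for {if(is_empty(randnumber.NAOK), rand(1, 3), randnumber.NAOK)}:
    # draw only when nothing is stored yet, otherwise reuse the stored value.
    return rng.randint(1, 3) if stored is None else stored

rng = random.Random(42)  # arbitrary seed, only for repeatability

stored = None
guarded_values = []
for _ in range(5):  # five evaluations, e.g. five page changes
    stored = draw_guarded(stored, rng)
    guarded_values.append(stored)

unguarded_values = [draw_unguarded(rng) for _ in range(5)]

print("guarded:  ", guarded_values)    # the same value five times
print("unguarded:", unguarded_values)  # may change between evaluations
```

The guarded version keeps whatever was drawn first, which is the behaviour you want when the number is used in several relevance equations.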

I see very little chance to make the random number more equally distributed, other than have a big number of respondents.

I was wondering if using more numbers, e.g. rand(1,9), and then using 1-3, 4-6 and 7-9 to determine the group the respondent falls into would make a difference. I would need to wrap my head around this and study whether it could have any impact. My gut says yes, because each number has a smaller chance of being drawn next. But then the chance that it is one of the 3 numbers that determine the same group is also 33.33%. So I am not sure if this makes any difference.
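
A quick way to check the rand(1,9) idea is to add up the probabilities exactly. The Python sketch below assumes that rand() draws every integer in its range with equal probability; `group_of` is a hypothetical helper that applies the 1-3 / 4-6 / 7-9 bucketing.

```python
from fractions import Fraction

def group_of(value):
    # 1-3 -> group 1, 4-6 -> group 2, 7-9 -> group 3
    return (value - 1) // 3 + 1

# Each of the 9 values is assumed to have probability 1/9.
probabilities = {1: Fraction(0), 2: Fraction(0), 3: Fraction(0)}
for value in range(1, 10):
    probabilities[group_of(value)] += Fraction(1, 9)

for group, p in probabilities.items():
    print(f"group {group}: {p}")
```

Under that assumption every group still ends up with a probability of exactly 1/3 per draw, so the wider range should not change how evenly the groups fill up.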

But in any way, you need a bigger sample to get a more even distribution.

What you can do is adapt your relevance equations manually when you see that one bucket got filled more than another.

E.g. if the 1 gets more, just hide this question for a while and add the 1 to the other question which is shown for 2 (so for some time it will be shown for 1 AND 2). This is something you can use towards the end to fill up the "least filled bucket" to have a more or less even distribution.

Help us to help you!

Provide your LS version and where it is installed (own server, uni/employer, SaaS hosting, etc.).
Always provide a LSS file (not LSQ or LSG).

Note: I answer at this forum in my spare time, I'm not a LimeSurvey GmbH employee.

Last edit: 2 years 7 months ago by holch.


Joffm
LimeSurvey Community Team

2 years 7 months ago - 2 years 7 months ago #234230 by Joffm

Replied by Joffm on topic random values: a more stable distribution?

Hi,

But I ran my survey 26 times and noticed huge differences between the 3 groups: the first group of questions appeared 13 times, the second 5 times and the third 8 times.

If you look into a statistical table like "Documenta Geigy" (I know, it's a bit old), you see that the 95% confidence interval for a base of 26 and an expected value of 8.67 is 4.48 - 14.48.

Did you check it with Excel?
I did it 10 times and got the following distributions.
1. G1: 8, G2: 9, G3: 9
2. G1: 3, G2: 12, G3: 11
3. G1: 11, G2: 8, G3: 7
4. G1: 8, G2: 6, G3: 12
5. G1: 8, G2: 8, G3: 10
6. G1: 6, G2: 11, G3: 9
7. G1: 5, G2: 11, G3: 10
8. G1: 6, G2: 11, G3: 9
9. G1: 15, G2: 8, G3: 3
10. G1: 8, G2: 12, G3: 6
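
The same experiment can be repeated outside Excel. The following Python sketch simulates 10 runs of 26 respondents with a uniform draw from 1 to 3; the seed is arbitrary and `simulate_run` is just an illustrative helper, so the counts will differ from the list above.

```python
import random
from collections import Counter

def simulate_run(n_respondents, n_groups, rng):
    # One uniform draw per respondent, then count how often each group came up.
    draws = Counter(rng.randint(1, n_groups) for _ in range(n_respondents))
    return [draws.get(group, 0) for group in range(1, n_groups + 1)]

rng = random.Random(2022)  # arbitrary seed, only for repeatability
runs = [simulate_run(26, 3, rng) for _ in range(10)]

for i, counts in enumerate(runs, start=1):
    line = ", ".join(f"G{g}: {c}" for g, c in enumerate(counts, start=1))
    print(f"{i}. {line}")
```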

As @holch said:
If you have such a small sample, you have to "fine tune" manually. If you think one group is full, change the relevance equations.
This is possible even in an activated survey.

There are other options, like using the SAVEDID, e.g.
{SAVEDID-3*floor((SAVEDID-1)/3)}
to achieve a series of 1,2,3,1,2,3,1,2,3,...
But beware of terminations. If every third respondent terminates, the third group will again not be filled.
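
To see what the SAVEDID expression produces, here is the same arithmetic as a Python sketch. `cycle_value` and `group_counts` are hypothetical helper names, and the dropout pattern is a deliberately extreme, made-up case.

```python
def cycle_value(saved_id, n_groups):
    # Mirrors SAVEDID - n*floor((SAVEDID-1)/n) for positive integer ids.
    return saved_id - n_groups * ((saved_id - 1) // n_groups)

def group_counts(saved_ids, n_groups):
    counts = [0] * n_groups
    for saved_id in saved_ids:
        counts[cycle_value(saved_id, n_groups) - 1] += 1
    return counts

print([cycle_value(i, 3) for i in range(1, 10)])
# [1, 2, 3, 1, 2, 3, 1, 2, 3]

# Made-up worst case: of 30 starters, every third one terminates.
completed_ids = [i for i in range(1, 31) if i % 3 != 0]
print(group_counts(completed_ids, 3))
# [10, 10, 0]
```

The cycle itself is perfectly even; the imbalance comes only from who actually completes.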

And this way you can calculate both your random numbers at the same time by
{SAVEDID-6*floor((SAVEDID-1)/6)}
Here you get 1,2,3,4,5,6,1,2,3,4,5,6 and can set the relevance equations of your three groups like
1. randnum<3
2. randnum==3 or randnum==4
3. randnum>4

And the randomization inside (1-2) you do by
1: randnum/2-floor(randnum/2)==0 (even numbers) or you enter: randnum==2 or randnum==4 or randnum==6
2: randnum/2-floor(randnum/2)!=0 (odd numbers)
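
As a sanity check, the mapping can be written out in Python. This is only a sketch of the relevance logic described above (group 2 is taken to be randnum 3 or 4, the values left between the other two conditions); `assign` is a hypothetical name.

```python
def assign(randnum):
    # Group relevance: 1-2 -> group 1, 3-4 -> group 2, 5-6 -> group 3
    if randnum < 3:
        group = 1
    elif randnum in (3, 4):
        group = 2
    else:
        group = 3
    # Inner randomization: even numbers -> question 1, odd numbers -> question 2
    question = 1 if randnum % 2 == 0 else 2
    return group, question

assignments = {randnum: assign(randnum) for randnum in range(1, 7)}
for randnum, (group, question) in assignments.items():
    print(f"randnum={randnum}: group {group}, question {question}")
```

Each of the six values lands on a different group/question pair, so one number is enough for both randomizations.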

Joffm
And now a last word.
You did not answer the initial questions.
I wonder, why.
So unfortunately I can't explain more sophisticated options which use plugins.
And you did not say, how the survey is done. Is it an open survey, is it closed with a participant table?

Volunteers are not paid.
Not because they are worthless, but because they are priceless

Last edit: 2 years 7 months ago by Joffm.


jonathanz
Topic Author
New Member

2 years 7 months ago - 2 years 7 months ago #235480 by jonathanz

Replied by jonathanz on topic random values: a more stable distribution?

Thank you for your fast reply. We want to run an open survey that is advertised in a newsletter. That's why it's hard for us to estimate how many people are going to take part, and why we think we need to balance the distribution better. It could be well over 100 or just 20. That's why the SAVEDID option sounds pretty good, and I tried to implement it right away. Unfortunately it did not work out for me.

I also played around with LimeSurvey and now have 2 approaches:
- The first one is exactly as I described it in my question above: 3 groups with different questions, and inside those groups 1 random question out of 2.
- The second approach is that I have 6 different groups, all of them unique. That makes it a little easier for me, because I no longer need the rand(1,2) inside the randomized groups and only need one rand(1,6) at the beginning of the survey. But I would only use that for the SAVEDID approach, because in my tests I got the feeling that the distribution problems get even worse with a higher number of groups. I may be wrong, but that's the feeling I got.

Maybe it's important to say that after the questions for the groups I have some personal questions that are the same for everyone and are not assigned to a particular group.

As you described it, the SAVEDID approach sounds more suitable for our survey, but as I already mentioned, it didn't work out for me. The survey skips all of the groups and goes directly to the unassigned questions.

I assume that the equation question at the beginning of the survey should be named randnum and that I should enter {SAVEDID-6*floor((SAVEDID-1)/6)} as the question text (for my second approach). Then I assign the groups with:
randnum==1
randnum==2
...
randnum==6
Do you maybe see what my mistake is? And would it be okay to send the .lss if I don't manage to solve it on my own? We use Community Edition Version 5.3.22+220628.
Thank you again for your help!

Last edit: 2 years 7 months ago by jonathanz.


Joffm
LimeSurvey Community Team

2 years 7 months ago #235529 by Joffm

Replied by Joffm on topic random values: a more stable distribution?

If you do not show exactly what you did, I cannot see the problem.

Provide a lss export.

Joffm


jonathanz
Topic Author
New Member

2 years 7 months ago #236023 by jonathanz

Replied by jonathanz on topic random values: a more stable distribution?

Sorry for my late reply. I modified the survey just now because the lss file was way too big to upload. I use a picture for every question, so those are deleted now, but I guess that doesn't change anything about how the survey should work logically. I hope that helps to see my mistake.

I also want to ask whether this SAVEDID logic can be applied to my first approach, the one with a random inside a random. I assume that it should be the same, right? Once I have figured out how it actually works...

My last question is whether there are any differences in performance. We don't know how many people are going to take the survey. It could be 1000 or fewer than 100 (it's advertised in a very large newsletter, but it is just for the sake of science, so no one can win anything etc.). So performance-wise, are there any differences between the SAVEDID method and the random method?

Have a great evening!


jonathanz
Topic Author
New Member

2 years 7 months ago #236025 by jonathanz

Replied by jonathanz on topic random values: a more stable distribution?

File Attachment:

File Name: limesurvey...8(1).lss
File Size:1,843 KB


holch
LimeSurvey Community Team

2 years 7 months ago #236027 by holch

Replied by holch on topic random values: a more stable distribution?

Depending on how you host, n=1000 is nothing. Yes, it could be a problem if everyone received and opened the newsletter at the same time and also clicked on the survey link at the same time. But that practically never happens.

First of all, it is good practice to send out mass emails in "waves" anyway. Besides, everyone has a different schedule and opens emails at different times, and some will save the survey to respond to later.


holch
LimeSurvey Community Team

2 years 7 months ago #236028 by holch

Replied by holch on topic random values: a more stable distribution?

And don't expect a much better distribution from the SAVEDID approach. Yes, the numbers themselves are handed out more evenly. But a SAVEDID is assigned whenever someone clicks the link, even if they don't really start the survey or don't complete it. So depending on your luck, a lot of people with the same ending of the SAVEDID will not continue...


davebostockgmail
Elite Member

2 years 7 months ago #236284 by davebostockgmail

Replied by davebostockgmail on topic random values: a more stable distribution?

It would seem to me that you are not looking for a randomisation of groups for your project but an even distribution (I may be wrong?)

If you want evenly distributed groups then you could look at this

forums.limesurvey.org/forum/can-i-do-thi...er-all-needed-solved

as a way to ensure that distribution


Joffm
LimeSurvey Community Team

2 years 7 months ago - 2 years 7 months ago #236292 by Joffm

Replied by Joffm on topic random values: a more stable distribution?

Well, each approach has its pros and cons.
SAVEDID: If most of the starters will complete - fine; if a lot of respondents will not complete, not better than the simple random number.
Function "StatCount" (only completes): If a lot of respondents start the survey at the same time they all are put into the same group.
"Least Filled": No matter if you count "ALL" or only "COMPLETES" the same two problems. And no matter if you use the plugin or an ajax call.

So, no method is the "only one".
If the responses come in slowly, you may use a "least filled" approach.
As the survey is advertised in a newsletter, there will be a lot who start, see the first page and terminate.
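
The "least filled" idea and its weak spot can be sketched in a few lines of Python. This is not the plugin's or statCount's actual API, just an illustration; `least_filled` is a hypothetical helper and the counts are the ones reported in the first post.

```python
def least_filled(counts):
    # counts: group number -> responses counted so far.
    # Ties are broken in favour of the lowest group number.
    return min(sorted(counts), key=lambda group: counts[group])

counts = {1: 13, 2: 5, 3: 8}
next_group = least_filled(counts)
print("next respondent goes to group", next_group)

# If counts are only updated on completion, a burst of simultaneous
# starters all see the same counts and all get the same group.
burst = [least_filled(counts) for _ in range(5)]
print("five simultaneous starters:", burst)
```

With slowly arriving responses the counts are refreshed between respondents, so the bursts mostly disappear.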

In the German part I wrote a long article about it:
forums.limesurvey.org/forum/german-forum...ragen-gruppen#211405
and again in my "Tutorial 4. Gleichungen, Zu- und andere Fälle", Chapter 2.1.
(Which means we could have handled all of this in the German part as well.)

But, just try in Excel.
Compare the frequencies of a simple random number (1-6) to a clockwise distribution (1,2,3,4,5,6,1,2,3,...) with randomly distributed terminations (here you may vary between 20% and 70%, or whatever)
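
If you prefer a script to Excel, the comparison could look roughly like the Python sketch below. Everything in it is an assumption for illustration: 100 starters, 6 groups, terminations that hit each starter independently with the given rate, and an arbitrary seed.

```python
import random

def completes_per_group(groups, completed, n_groups):
    # Count only respondents who finished the survey.
    counts = [0] * n_groups
    for group, done in zip(groups, completed):
        if done:
            counts[group - 1] += 1
    return counts

def simulate(n_starters, n_groups, dropout_rate, rng):
    # The same terminations are applied to both assignment methods.
    completed = [rng.random() >= dropout_rate for _ in range(n_starters)]
    random_groups = [rng.randint(1, n_groups) for _ in range(n_starters)]
    cyclic_groups = [(i % n_groups) + 1 for i in range(n_starters)]
    return (completes_per_group(random_groups, completed, n_groups),
            completes_per_group(cyclic_groups, completed, n_groups))

rng = random.Random(1)  # arbitrary seed, only for repeatability
for dropout_rate in (0.2, 0.5, 0.7):
    rand_counts, cyc_counts = simulate(100, 6, dropout_rate, rng)
    print(f"dropout {dropout_rate:.0%}")
    print("  random:", rand_counts, "spread", max(rand_counts) - min(rand_counts))
    print("  cyclic:", cyc_counts, "spread", max(cyc_counts) - min(cyc_counts))
```

Run it with a few different seeds and dropout rates to get a feeling for how much each method varies.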

Consider: the 95% confidence interval for
n=1000, x=16.6% is 14.5% - 19%.
So you can expect 145 to 190 respondents in each group. That's not too bad.
Now you may adapt the relevance equations so that only the below-average groups are filled.
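
For a rough cross-check of such an interval without a table, the usual normal approximation for a proportion can be used. The Python sketch below is only that approximation (with z = 1.96 for 95%); printed tables may use an exact method, so the numbers can differ slightly.

```python
import math

def normal_approx_interval(n, p, z=1.96):
    # Normal approximation: p +/- z * sqrt(p * (1 - p) / n)
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error

n = 1000
low, high = normal_approx_interval(n, 1 / 6)
print(f"share per group: {low:.1%} to {high:.1%}")
print(f"respondents per group: {low * n:.0f} to {high * n:.0f}")
```

For n=1000 and six equally likely groups this gives roughly 14.4% to 19.0%, which is close to the table values.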

And the only way to get a really equal distribution:
Let the survey stay open until all groups have reached the goal.
Then you remove the overquota; let your grandmother decide.

Joffm

By the way:
Did you really insert all images base64 encoded (by copy/paste)?
I am not surprised that this small survey has nearly 2MB.
Use the HTML <img> tag instead.

Last edit: 2 years 7 months ago by Joffm.

Moderators: tpartner, holch
