User:Patrick/random sequence
For a random sequence of independent characters of an alphabet of p characters, almost surely the limit as n tends to infinity of (the number of occurrences of a given subsequence of length s) divided by n, is p^(-s).
The odds of getting the word "banana" typed are 26^6 or 1 in 308,915,776.
say i made a program make random letters, and it stops making them when it finds "banana".
Ex:
wjkotcbanana
How then would i classify the results of this single test?
is it 6 in 12 because 6 of 12 letters make the word banana? is it 1 in 12 because out of 12 letters the word banana appears once? or is there a specific formula that i havent mentioned? Please help.
172.129.238.63 01:54, 24 July 2007 (UTC)
- It depends on what you want to express. If you want to express how many letters out of those produced make up the word banana, the answer is: 6 (out of 12). If you want to state how often you found the word banana for every letter produced, it is once (on 12 times). If you want to express the odds of a certain event occurring – usually such terminology as 1 in 308,915,776 is reserved for the odds of an event occurring – you have to make clear what the event is. The probability of finding the string banana at least once in a random string of 12 letters from a 26-letter alphabet is equal to (7×266−1)/2612, which is about 7 in 308,915,776 or 1 in 44,130,825. The probability of finding the string banana exactly once in a random string of 12 letters from a 26-letter alphabet, appearing as the last 6 letters, is equal to (266−1)/2612, which is about 1 in 308,915,776. --Lambiam 03:13, 24 July 2007 (UTC)
Shirley, you mean the average number of characters typed before the word "banana" (first) appears. I would state the result as "On one run, a monkey typed a total of 12 characters when the word banana first appeared." 202.168.50.40 01:30, 25 July 2007 (UTC)
- To whom is the word you meant to refer in this declaration? --Lambiam 02:47, 25 July 2007 (UTC)
- It's a joke! Shirley, you would know what a joke is. 202.168.50.40 03:32, 25 July 2007 (UTC)
- You edited your posting of 01:30, 25 July 2007 (UTC) after my reaction, which is cheating. Without the later addition of a second sentence, it was totally unclear that the word "average" was supposed to refer to the results of a single test. --Lambiam 03:56, 25 July 2007 (UTC)
- The word "average" is not meant to refer to the result of a single run. It is meant to mean the "arithmetic mean" of many many runs. In a real run, it would be extremely unlikely that the word "banana" would appear as early as 12 characters typed. 202.168.50.40 04:31, 25 July 2007 (UTC)
- I calculate the average number of typed characters when the word "banana" first appear to be an obscenely large number of 321272406 202.168.50.40 04:45, 25 July 2007 (UTC)
Above the correct number of 266 = 308915776 has already been mentioned. --Lambiam 06:46, 25 July 2007 (UTC)- 1 in 26^6 is just the odds of getting the word "banana" when you type 6 characters. It is not the average number of characters typed when the word "banana" first appeared. Try and consider the average number of coin toss needed before you encounter 6 heads in a row. It would not be 2^6. 211.28.119.3 08:53, 25 July 2007 (UTC)
- To type the first 100 characters of the play Hamlet, requires an average of 3268647867246256383381332100041691484373976788312974266629140102414955744756908184404049903032490380904202638084876187965749304595652472251350 typed characters on the typewriter. I can't find the number for 1000 characters because my program crashed. PS. I assume the alphabet consists only of 26 unique characters. 202.168.50.40 05:00, 25 July 2007 (UTC)
- I calculate the average number of typed characters when the word "banana" first appear to be an obscenely large number of 321272406 202.168.50.40 04:45, 25 July 2007 (UTC)
- The word "average" is not meant to refer to the result of a single run. It is meant to mean the "arithmetic mean" of many many runs. In a real run, it would be extremely unlikely that the word "banana" would appear as early as 12 characters typed. 202.168.50.40 04:31, 25 July 2007 (UTC)
- You edited your posting of 01:30, 25 July 2007 (UTC) after my reaction, which is cheating. Without the later addition of a second sentence, it was totally unclear that the word "average" was supposed to refer to the results of a single test. --Lambiam 03:56, 25 July 2007 (UTC)
- It's a joke! Shirley, you would know what a joke is. 202.168.50.40 03:32, 25 July 2007 (UTC)
- Suppose we are trying to get a sequence of l correct letters from an alphabet of size s. Let be the expected number of additional letters needed if the last k letters (but no more) are correct. Then we have . The solution is . We are interested in , which is equal to . For , this is 321272406, like 202 has mentioned. -- Meni Rosenfeld (talk) 14:23, 25 July 2007 (UTC)
- For l = 1000, by the way, Mathematica is of course more than capable of finding an exact result, but the short version is "roughly ". -- Meni Rosenfeld (talk) 14:28, 25 July 2007 (UTC)
- There seems to be something wrong. Take e.g. the alphabet {a,b} and the sequence aa, then seems correct, but , not 2. And for the sequence ab, after correctly having an "a", the expected number of additional letters needed is less than for the sequence aa: if the next character is wrong we have aa, and we do not have to start from the beginning but again already the first letter is correct. I get for this sequence.--Patrick 06:01, 26 July 2007 (UTC)
- You're right, I messed up the summation of the geometric series - it should be (which is the same as before for k=0). You are also correct that my assumption that a wrong letter resets our position is only valid for a sequence of identical letters, where I have mistakenly assumed that it generalizes to every sequence. So the exact values depend on the structure of the desired sequence - in particular, for "banana", it will be smaller than the value I have given (which is correct only for "zzzzzz"). Sorry for those mistakes. -- Meni Rosenfeld (talk) 14:53, 26 July 2007 (UTC)
- Fortunately, none of this has any bearing on your previous question regarding HHHHHH - it is still 126 (which is remarkably larger than 64). -- Meni Rosenfeld (talk) 14:59, 26 July 2007 (UTC)
- Yes, and I am beginning to understand why this is: if you continue tossing the coin, 1 in 64 sequences is HHHHHH provided that HHHHHHH (7 Hs) is counted as two sequences of 6, etc.; it is 1 in 126 if overlapping sequences are not counted.--Patrick 16:22, 26 July 2007 (UTC)
- Since two occurrences of "banana" cannot overlap, would not the average number of letters that has to be typed be exactly 266 = 308,915,776?--Patrick 16:41, 26 July 2007 (UTC)
- I can't say I am completely convinced by your argument, but solving the equations indeed gives a result of 308915776. If only we had figured this out before Lambiam has struck out his comment... :) -- Meni Rosenfeld (talk) 17:53, 26 July 2007 (UTC)
average there is 1 sequence HHHHHH for every 64 tosses, provided that HHHHHHH (7 Hs) is counted as two sequences of 6, etc.; it is 1 sequence HHHHHH for every 126 tosses if overlapping sequences are not counted. For each sequence of tosses starting after a sequence HHHHHH and ending with the next occurrence of the same, there are on average also .5 such sequences starting with the second H, .25 starting with the third, etc.--Patrick 00:02, 27 July 2007 (UTC)