Kind of like the study of IQ. Those with the highest score are the
most interested in it...
Still, an anthropologist would probably be interested in the results
and the study methods.
Xanth, don't let the negative comments dissuade you from your
chosen path. Our personal quests are what separate us from animals.
Originally posted by hopscotch
My point is that members should not be able to rec posts in that forum, as the recommendations themselves are completely meaningless and are nought but further evidence of the lethargic mediocrity of all pointless religious argument.
I disagree.
But I do agree that those particular recs are pointless, hence this ingenious plan...
Thread 26003
D
Originally posted by XanthosNZ
I believe they make up a large portion of 'unknown', as I said a good portion of that category went to a handful of posters (no it didn't include me).
Also, Gatecrasher yes there is more to what I looked at then what I posted. I didn't go to the trouble of processing it and writing it up in case the whole lot just got ignored. I'm not sure if I'll bother to do it now.
Why heck Xan, it included you on this Thread twice already. Looks to me like if you did such a thorough and exhaustive study over two and a half months, you would have taken the time to process it too. Maybe it wouldn't have been ignored or rebuked at all then. Or did you mean to say than rather than then? Or should I have said than? This writing stuff for us amateurs is so confusing isn't it?
After I started this survey I added another method of analysis: letter frequency. I wrote a quick and dirty Java program (thanks to Starrman for helping to optimise my code so it didn't take all day to run) that counts how often each letter occurs in a block of text and then adds it to a running total.
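For anyone wanting to try the same kind of count, here is a minimal sketch of the tally described above. It is in Python rather than Java and is not the program used for the graph; the names `letter_counts` and `letter_frequencies` are made up for illustration, and it assumes only the 26 unaccented letters A to Z are counted, ignoring case.

```python
from collections import Counter
import string

# Assumption: only unaccented ASCII letters count; digits, punctuation,
# whitespace and accented characters are skipped.
ASCII_LETTERS = set(string.ascii_letters)


def letter_counts(text):
    """Count each letter A-Z in one block of text, ignoring case."""
    return Counter(ch.upper() for ch in text if ch in ASCII_LETTERS)


def letter_frequencies(blocks):
    """Add the counts of every block to a running total, then normalise.

    Returns a dict mapping each letter A-Z to its share of all letters seen.
    """
    total = Counter()
    for block in blocks:
        total.update(letter_counts(block))
    n = sum(total.values())
    if n == 0:
        return {letter: 0.0 for letter in string.ascii_uppercase}
    return {letter: total[letter] / n for letter in string.ascii_uppercase}


if __name__ == "__main__":
    # Two tiny example "posts"; real input would be the text of each recced post.
    freqs = letter_frequencies(["Hello", "World!"])
    for letter, share in sorted(freqs.items(), key=lambda kv: -kv[1])[:3]:
        print(letter, round(share, 3))
```

Counting per block and merging into one total keeps memory use flat however many posts are fed in, which matters at millions of letters.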
Here is the processed data:
http://img.photobucket.com/albums/v694/XanthosNZ/recsletters.gif
The reference is the established frequency of letters in written English (I believe there were no more than one or two non-English recced posts). Now obviously my sample is of a finite size and therefore has inherent error. I haven't included error bars on my graph because I'm too lazy to work out how; however, I can tell you that thanks to the huge sample size (upwards of 8 million letters) the error is extremely small.
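On sizing those error bars: one rough approach is the standard error of a proportion, sqrt(p(1-p)/n). The sketch below assumes every letter is an independent draw, which real text is not (letters cluster inside words and posts), so the figure is best read as a lower bound. The 12.7% value for E is the commonly quoted English reference figure, used here only as an example input, and `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(p, n):
    """Standard error of an observed proportion p from n independent draws."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # Example input only: E at about 12.7% of 8 million letters.
    se = standard_error(0.127, 8_000_000)
    # An approximate 95% interval is the observed share plus or minus 1.96 * se.
    print(f"standard error: {se:.6f}")   # about 0.00012
    print(f"95% half-width: {1.96 * se:.6f}")
```

Under that independence assumption the bars would be far too narrow to see on a graph of this scale, which fits the claim that the sampling error is extremely small.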
I was happy to note a similar overall shape (as it meant my code had worked and not given me useless data) but I expected a much closer match. Playing around (using sources from gutenberg.org and my code) I get an extremely close match to the reference with even relatively short samples (say 100,000 letters).
To me this indicates that something inherent in recced posts is skewing the letter frequencies.
Letters that there are much more of in recced posts include A, G, H, O and W. At the other end are E, I, R, S, U and Y. Does anyone know of anything that could explain this?
Originally posted by XanthosNZ
Over the past two and a half months I've been running an exhaustive study to categorize every rec given during the period. I kept my progress quiet because I wanted to release my findings on my own schedule.
[b]Forum
Nice and simple really. Recs are split into the forum the post they are given to is in.
http://img.photobucket.com/albums/v694/Xanth ...[text shortened]... .
NOTE: I promise that the graphs are what they are advertised as unlike STANGs.[/b]
Are you trying to get recced for doing this?
Originally posted by XanthosNZ
Letters that there are much more of in recced posts include A, G, H, O and W. At the other end are E, I, R, S, U and Y. Does anyone know of anything that could explain this?
Is it statistically significant?
Instead of using an external reference, you should use all RHP posts (recced and non-recced) as your reference. There may be a preponderance of certain words and names that are used at RHP that would create a bias for various letters.
Once you have that raw data, statistical techniques can be used to find out whether frequency differences are within the realms of chance (after all, you could never expect an exact match) or whether they are statistically significant.
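One such technique is a chi-square goodness-of-fit test. The sketch below is a hedged illustration, not anything run in this thread: `chi_square_statistic` is a hypothetical name, the reference frequencies are treated as exact (if the reference is itself a sample of RHP posts, a two-sample test would be the stricter choice), and letters are assumed independent. With 26 letters there are 25 degrees of freedom, for which the 5% critical value is 37.652.

```python
def chi_square_statistic(observed_counts, reference_freqs):
    """Chi-square goodness-of-fit statistic for letter counts.

    observed_counts: dict of letter -> count in the sample (e.g. recced posts)
    reference_freqs: dict of letter -> expected share, summing to 1
    Letters absent from reference_freqs are ignored.
    """
    n = sum(observed_counts.get(letter, 0) for letter in reference_freqs)
    stat = 0.0
    for letter, share in reference_freqs.items():
        expected = n * share
        if expected > 0:
            diff = observed_counts.get(letter, 0) - expected
            stat += diff * diff / expected
    return stat


# 5% critical value of the chi-square distribution with 25 degrees of freedom
# (26 letters minus 1).
CRITICAL_25_DF_5_PERCENT = 37.652


def differs_significantly(observed_counts, reference_freqs):
    """True if the statistic exceeds the 5% critical value for 26 letters."""
    stat = chi_square_statistic(observed_counts, reference_freqs)
    return stat > CRITICAL_25_DF_5_PERCENT
```

With a sample in the millions, even tiny differences will clear this threshold, so the size of each letter's gap is worth reporting alongside the verdict.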
Originally posted by Gatecrasher
Is it statistically significant?
Instead of using an external reference, you should use all RHP posts (recced and non-recced) as your reference. There may be a preponderance of certain words and names that are used at RHP that would create a bias for various letters.
A preponderance?
Rec'd!
Originally posted by Gatecrasher
Is it statistically significant?
Instead of using an external reference, you should use all RHP posts (recced and non-recced) as your reference. There may be a preponderance of certain words and names that are used at RHP that would create a bias for various letters.
Once you have that raw data, statistical techniques can be used to find out whet ...[text shortened]... after all, you could never expect an exact match) or whether they are statistically significant.
I believe it is statistically significant; the sheer number of letters sampled makes the confidence interval very small. You are correct about checking against RHP posts to see if the same skew is present. Thanks to my code this is easy enough. I've taken the first page of threads from every forum (I copy and paste them into my code, which strips out everything but post content; don't worry Russ, no script accesses RHP, that bit is done by me).
This seems to give a higher P and slightly higher H (perhaps from 'RHP' itself?) but nothing resembling the skew in recced posts.
Originally posted by XanthosNZ
Over the past two and a half months I've been running an exhaustive study to categorize every rec given during the period. I kept my progress quiet because I wanted to release my findings on my own schedule.
[b]Forum
Nice and simple really. Recs are split into the forum the post they are given to is in.
http://img.photobucket.com/albums/v694/Xanth ...[text shortened]... .
NOTE: I promise that the graphs are what they are advertised as unlike STANGs.[/b]
Jesus H. Christ! You have way too much time on your hands. Should you be attending a class or at least pounding a few back in the local pub?
Originally posted by XanthosNZ
I believe it is statistically significant, the sheer number of letters sampled makes the confidence interval very small.
Hmm, well that is a very interesting finding.
Perhaps certain letters are "friendlier" and "nicer" than others. When it comes to the vowels, I've always "liked" A and O, and never been very fond of E, I and U. Couldn't tell you why.
I once analyzed lotto payouts and determined that certain numbers are much more popular than others. Whenever the popular numbers were selected, the dividends dropped. When unpopular numbers were selected, the dividends rose. Unfortunately, with 50%-60% of the take being creamed off by the operators, the difference in popularity was never quite large enough to create a profitable betting system.
So maybe the letters that make up the words that make up the sentences that make up the posts do influence how we feel about the message. On a subconscious level, of course, because I doubt that anyone goes, "Ooooh, four X's in that post... Rec."
Originally posted by Gatecrasher
Hmm, well that is a very interesting finding.
Perhaps certain letters are "friendlier" and "nicer" than others. When it comes to the vowels, I've always "liked" A and O, and never been very fond of E, I and U. Couldn't tell you why.
I once analyzed lotto payouts and determined that certain numbers are much more popular than others. Whenever the ...[text shortened]... urse, because I doubt that anyone goes, "Ooooh, four X's in that post... Rec."
Four X is a beer, right?? Rec!