Recs: A Study of Use and Meaning

XanthosNZ

Cancerous Bus Crash

General

30 Jan 06 04:33

sonhouse

Fast and Curious

slatington, pa, usa

Joined: 28 Dec 04
Moves: 53419

30 Jan 06 11:14

2 edits

Kind of like the study of IQ. Those with the highest score are the most interested in it...
Still, an anthropologist would probably be interested in the results and the study methods.
Xanth, don't let the negative comments dissuade you from your chosen path. Our personal quests are what separate us from animals.

Ragnorak

For RHP addons...

tinyurl.com/yssp6g

Joined: 16 Mar 04
Moves: 15013

30 Jan 06 11:17

1 edit

Originally posted by hopscotch
My point is that members should not be able to rec posts in that forum as the recommendations themselves are completely meaningless and are nought but further evidence of the lethargic mediocrity of all pointless religious argument.

I disagree.

But I do agree that those particular recs are pointless, hence this ingenious plan...
Thread 26003

D

cashthetrash

PoPeYe

This is embarrasking

Joined: 17 Nov 05
Moves: 44152

30 Jan 06 11:19

Originally posted by XanthosNZ
I believe they make up a large portion of 'unknown'; as I said, a good portion of that category went to a handful of posters (no, it didn't include me).

Also, Gatecrasher yes there is more to what I looked at then what I posted. I didn't go to the trouble of processing it and writing it up in case the whole lot just got ignored. I'm not sure if I'll bother to do it now.

Why heck, Xan, it included you on this thread twice already. Looks to me like if you did such a thorough and exhaustive study of two and a half months you would have taken the time to process it too. Maybe it wouldn't have been ignored or rebuked at all then. Or did you mean to say than rather than then? Or should I have said than? This writing stuff for us amateurs is so confusing, isn't it?

XanthosNZ

Cancerous Bus Crash

p^2.sin(phi)

Joined: 06 Sep 04
Moves: 25076

30 Jan 06 11:48

1 edit

After I started this survey I added another method of analysis: letter frequency. I wrote a quick and dirty Java program (thanks to Starrman for helping to optimise my code so it didn't take all day to run) that counts how often each letter occurs in a block of text and then adds the counts to a running total.
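The Java program itself isn't shown in the thread. As a rough illustration of the counting step described above, here is a minimal Python sketch. The function names and the two sample posts are made up for the example, and it assumes only the 26 unaccented letters A to Z are counted, with case folded.

```python
from collections import Counter
import string


def count_letters(text, totals=None):
    """Add the A-Z letter counts of `text` to `totals` and return `totals`."""
    if totals is None:
        totals = Counter()
    # Keep only ASCII letters, then fold case so 'r' and 'R' count together.
    totals.update(ch.upper() for ch in text if ch in string.ascii_letters)
    return totals


def frequencies(totals):
    """Convert raw counts into relative frequencies for all 26 letters."""
    n = sum(totals.values())
    if n == 0:
        return {}
    return {letter: totals[letter] / n for letter in string.ascii_uppercase}


# Hypothetical input: each string stands in for one post's text.
posts = ["Rec'd!", "Four X is a beer right?? Rec!"]
totals = Counter()
for post in posts:
    count_letters(post, totals)

for letter, share in sorted(frequencies(totals).items()):
    if share > 0:
        print(f"{letter}: {share:.3f}")
```

Keeping one running `Counter` across posts mirrors the "add it to a total" step, so the text never has to be held in memory all at once.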

Here is the processed data:
http://img.photobucket.com/albums/v694/XanthosNZ/recsletters.gif

The reference is the established frequency of letters in written English (I believe there were no more than one or two non-English recced posts). Now obviously my sample is of a finite size and therefore has inherent error. I haven't included the error bars on my graph because I'm too lazy to work out how; however, I can tell you that thanks to the huge sample size (upwards of 8 million letters) the error is extremely small.
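For a sense of scale on those missing error bars, here is a hedged back-of-the-envelope sketch. It assumes each letter can be treated as an independent draw (a simple binomial model, which real text does not strictly satisfy), uses a commonly cited approximate share of about 12.7% for E, and takes 8 million as the sample size quoted above. The function names are illustrative only.

```python
import math


def standard_error(p, n):
    """Standard error of an observed proportion p from n independent draws."""
    return math.sqrt(p * (1 - p) / n)


def proportion_interval(p, n, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation)."""
    half_width = z * standard_error(p, n)
    return p - half_width, p + half_width


# Assumed inputs: ~12.7% for E, 8 million letters sampled.
p_e = 0.127
n_letters = 8_000_000

low, high = proportion_interval(p_e, n_letters)
print(f"E: {low:.5f} to {high:.5f}")
```

Under these assumptions the half-width comes out at a few hundredths of a percentage point, which is why sampling error alone is unlikely to account for visible gaps on a chart.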

I was happy to note a similar overall shape (as it meant my code had worked and not given me useless data) but I expected a much closer match. Playing around (using sources from gutenberg.org and my code) I get an extremely close match to the reference with even relatively short samples (say 100,000 letters).

To me this indicates that something inherent in recced posts is skewing the letter frequencies.

Letters that are much more common in recced posts include A, G, H, O and W. Much less common are E, I, R, S, U and Y. Does anyone know of anything that could explain this?

Freddie2008

9 Edits

London

Joined: 28 Sep 04
Moves: 110329

30 Jan 06 11:49

Originally posted by XanthosNZ
Over the past two and a half months I've been running an exhaustive study to categorize every rec given during the period. I kept my progress quiet because I wanted to release my findings on my own schedule.

[b]Forum
Nice and simple really. Recs are split into the forum the post they are given to is in.
http://img.photobucket.com/albums/v694/Xanth ...[text shortened]... .

NOTE: I promise that the graphs are what they are advertised as unlike STANGs.[/b]

Are you trying to get recced for doing this?

XanthosNZ

Cancerous Bus Crash

p^2.sin(phi)

Joined: 06 Sep 04
Moves: 25076

30 Jan 06 11:55

1 edit

Originally posted by Freddie2006
Are you trying to get recced for doing this?

I'm trying to bring scientific rigour to an area where it is sorely needed. Never again shall mankind look at a post, notice "2 recommendations" and sit in wonder. That's my aim.

Bosse de Nage

Zellulärer Automat

Spiel des Lebens

Joined: 27 Jan 05
Moves: 90892

30 Jan 06 12:19

Originally posted by XanthosNZ
I'm trying to bring scientific rigour to an area where it is sorely needed.

Rigor mortis will have to do for now.

Gatecrasher

Whale watching

33°36'S 26°53'E

Joined: 05 Feb 04
Moves: 41150

30 Jan 06 13:01

Originally posted by XanthosNZ
Letters that are much more common in recced posts include A, G, H, O and W. Much less common are E, I, R, S, U and Y. Does anyone know of anything that could explain this?

Is it statistically significant?

Instead of using an external reference, you should use all RHP posts (recced and non-recced) as your reference. There may be a preponderance of certain words and names that are used at RHP that would create a bias for various letters.

Once you have that raw data, statistical techniques can be used to find out whether frequency differences are within the realms of chance (after all, you could never expect an exact match) or whether they are statistically significant.
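One standard technique for this kind of question is a chi-square goodness-of-fit test. The sketch below is only an illustration: it treats the reference frequencies as fixed and exact (comparing two measured samples would strictly call for a test of homogeneity), and the three-letter alphabet and all the counts are invented toy data, not figures from the study.

```python
def chi_square_statistic(observed_counts, reference_freqs):
    """Sum of (observed - expected)^2 / expected over every reference letter."""
    total = sum(observed_counts.values())
    stat = 0.0
    for letter, ref in reference_freqs.items():
        expected = ref * total          # count we'd expect under the reference
        observed = observed_counts.get(letter, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


# Toy data: a made-up three-letter alphabet, purely for illustration.
reference = {"A": 0.5, "B": 0.3, "C": 0.2}
observed = {"A": 60, "B": 25, "C": 15}

stat = chi_square_statistic(observed, reference)

# Degrees of freedom = number of categories - 1.
# 5% critical values from standard chi-square tables:
CRITICAL_5PCT_DF2 = 5.991    # toy example, 3 categories
CRITICAL_5PCT_DF25 = 37.652  # full alphabet, 26 categories

print("statistic:", round(stat, 3))
print("significant at 5%:", stat > CRITICAL_5PCT_DF2)
```

For the full alphabet the same function applies with 26 categories and 25 degrees of freedom. With millions of letters even tiny differences can cross the threshold, so the size of each gap matters as much as the verdict.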

hopscotch

Joined: 09 Jun 04
Moves: 39731

30 Jan 06 13:05

Originally posted by Gatecrasher
Is it statistically significant?

Instead of using an external reference, you should use all RHP posts (recced and non-recced) as your reference. There may be a preponderance of certain words and names that are used at RHP that would create a bias for various letters.

a preponderance?

Rec'd!

XanthosNZ

Cancerous Bus Crash

p^2.sin(phi)

Joined: 06 Sep 04
Moves: 25076

30 Jan 06 13:12

Originally posted by Gatecrasher
Is it statistically significant?

Instead of using an external reference, you should use all RHP posts (recced and non-recced) as your reference. There may be a preponderance of certain words and names that are used at RHP that would create a bias for various letters.

Once you have that raw data, statistical techniques can be used to find out whet ...[text shortened]... after all, you could never expect an exact match) or whether they are statistically significant.

I believe it is statistically significant; the sheer number of letters sampled makes the confidence interval very small. You are correct about checking against RHP posts to see if the same skew is present. Thanks to my code this is easy enough. I've taken the first page of threads from every forum (I copy and paste into my code and it strips out everything but post content; don't worry Russ, no script accesses RHP, that bit is done by me).

This seems to give a higher P and slightly higher H (perhaps from 'RHP' itself?) but nothing resembling the skew in recced posts.

Hand of Hecate

Thug

Playing with matches

Joined: 08 Feb 05
Moves: 14634

30 Jan 06 13:26

Originally posted by XanthosNZ
Over the past two and a half months I've been running an exhaustive study to categorize every rec given during the period. I kept my progress quiet because I wanted to release my findings on my own schedule.

[b]Forum
Nice and simple really. Recs are split into the forum the post they are given to is in.
http://img.photobucket.com/albums/v694/Xanth ...[text shortened]... .

NOTE: I promise that the graphs are what they are advertised as unlike STANGs.[/b]

Jesus H. Christ! You have way too much time on your hands. Should you be attending a class or at least pounding a few back in the local pub?

Gatecrasher

Whale watching

33°36'S 26°53'E

Joined: 05 Feb 04
Moves: 41150

30 Jan 06 13:47

Originally posted by XanthosNZ
I believe it is statistically significant; the sheer number of letters sampled makes the confidence interval very small.

Hmm, well that is a very interesting finding.

Perhaps certain letters are "friendlier" and "nicer" than others. When it comes to the vowels, I've always "liked" A and O, and never been very fond of E, I and U. Couldn't tell you why.

I once analyzed lotto payouts and determined that certain numbers are much more popular than others. Whenever the popular numbers were selected, the dividends dropped. When unpopular numbers were selected the dividends rose. Unfortunately, with 50%-60% of the take being creamed off by the operators, the difference in popularity was never quite large enough to create a profitable betting system.

So maybe the letters that make up the words that make up the sentences that make up the posts do influence how we feel about the message. On a subconscious level, of course, because I doubt that anyone goes, "Ooooh, four X's in that post... Rec."

Chakan

Oro!

Fear The Cow

Joined: 23 Nov 01
Moves: 34289

30 Jan 06 13:49

Originally posted by Gatecrasher
Hmm, well that is a very interesting finding.

Perhaps certain letters are "friendlier" and "nicer" than others. When it comes to the vowels, I've always "liked" A and O, and never been very fond of E, I and U. Couldn't tell you why.

I once analyzed lotto payouts and determined that certain numbers are much more popular than others. Whenever the ...[text shortened]... urse, because I doubt that anyone goes, "Ooooh, four X's in that post... Rec."

Four X is a beer right?? Rec!

Bosse de Nage

Zellulärer Automat

Spiel des Lebens

Joined: 27 Jan 05
Moves: 90892

30 Jan 06 14:03

Originally posted by Chakan
Four X is a beer right?? Rec!

Let's get XXXX'd.

Phlabibit

Mystic Meg

tinyurl.com/3sbbwd4

Joined: 27 Mar 03
Moves: 17242

30 Jan 06 14:03

Recs are what keep me posting... I have not got a rec in a week or so, so I don't have the energy I need to write great posts like I used to.

Rec Doldrums!

P-