Algorithm to detect trolls!:

http://phys.org/news/2015-04-algorithm-online-trolls.html#ajTabs

Like they are SO hard to identify🙂

Originally posted by sonhouse
http://phys.org/news/2015-04-algorithm-online-trolls.html#ajTabs

Like they are SO hard to identify🙂
For people... as ever, what is simple for humans is frequently hard for computers.

Originally posted by sonhouse
http://phys.org/news/2015-04-algorithm-online-trolls.html#ajTabs

Like they are SO hard to identify🙂
Have you heard of Orwell?

Originally posted by stevemcc
Have you heard of Orwell?
I read about him a long time ago, around 1984 I think.

Originally posted by sonhouse
http://phys.org/news/2015-04-algorithm-online-trolls.html#ajTabs

Like they are SO hard to identify🙂
That's rather depressing.


http://phys.org/news/2015-04-algorithm-online-trolls.html#ajTabs

The researchers report that it was relatively easy to spot FBUs [future-banned users] and to convert what they had found to something a computer could understand, starting with what they called an Automated Readability Index. After writing their algorithm and working out issues, the team reports that they were able to spot FBUs with an 80 percent accuracy rate after just ten posts. That is not high enough for website owners, of course: banning non-trolls by mistake 20 percent of the time could drive away visitors. But it could possibly be used as a way to assist moderators.


They haven't posted the actual calculation, so it's not necessarily clear what they have done.

But this just set off my Bayesian badness detector.

If we read this to mean that this algorithm is 80% reliable at determining whether a post is from a Troll [T]
or a Regular User [RU], then the number of false positives is not 20% [except in a very narrow and improbable
set of circumstances], because the number of false positives is going to depend on the ratio of T to RU.

If we say for the sake of argument that 20% of posters are T, then this algorithm will correctly label 80% of that
20% as T. [0.8*0.2=0.16] [16% of the total]
However the algorithm will also incorrectly label 20% of the 80% of RU as T. [0.2*0.8=0.16] [16% of the total]
So we land up with 32% [16%+16%] of all users labelled as T, and of those only half are actually T.

If we ban everyone the algorithm identifies as T, then we will be incorrectly banning 16% of the users [and failing
to ban 20% of the actual trolls, or 4% of the total user base].

So the "banning non-trolls by mistake" number is 16%.
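A minimal Python sketch of the 20% case, assuming (as above) that "80% accuracy" means 80% of trolls get labelled T and 80% of regular users get labelled RU. The function name `label_breakdown` and its parameters are made up for illustration; every output is a fraction of the whole user base, except precision, which is the share of those labelled T who really are trolls.

```python
# Sketch only: assumes "80% accuracy" = 80% sensitivity AND 80% specificity.
# label_breakdown is a hypothetical helper, not anything from the article.

def label_breakdown(troll_share, sensitivity=0.8, specificity=0.8):
    """Split ALL users into outcomes of the troll-labelling algorithm."""
    regular_share = 1.0 - troll_share
    true_pos = sensitivity * troll_share              # trolls labelled T
    false_pos = (1.0 - specificity) * regular_share   # regular users labelled T
    false_neg = (1.0 - sensitivity) * troll_share     # trolls labelled RU
    labelled_t = true_pos + false_pos                 # everyone labelled T
    precision = true_pos / labelled_t                 # share of labelled T who are T
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "labelled_t": labelled_t,
        "precision": precision,
    }

result = label_breakdown(0.20)
for name, value in result.items():
    print(f"{name:>10}: {value:.1%}")
```

Running it should print the same 16% / 16% / 4% / 32% / 50% figures worked out by hand above.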

But let's make it so that only 5% of users are trolls...

If we say for the sake of argument that 5% of posters are T, then this algorithm will correctly label 80% of that
5% as T. [0.05*0.8=0.04] [4% of the total]
However the algorithm will also incorrectly label 20% of the 95% of RU as T. [0.95*0.2=0.19] [19% of the total]
So we land up with 23% [4%+19%] of all users labelled as T, and of those only ~17.4% are actually T. [(4/23)*100]

So in this case we have 19% of the total user base incorrectly labelled as trolls.

The share of users wrongly labelled as T will converge on 20% as the proportion of trolls becomes very small. However,
I have been assuming that the false positive rate is the same as the false negative rate. They may not be, and we don't
know which [if either] the article was referring to.
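A rough Python sketch of that convergence, with the false positive rate and false negative rate as separate parameters so the unequal case can be tried as well. The names `wrongly_labelled_share`, `precision`, `false_pos_rate` and `false_neg_rate` are hypothetical; both rates default to 20%, which reproduces the 20% and 5% cases above.

```python
# Sketch only: the rates are assumptions, since the article doesn't say
# whether "80% accuracy" refers to false positives, false negatives, or both.

def wrongly_labelled_share(troll_share, false_pos_rate=0.2):
    """Fraction of ALL users who are regular users but get labelled T."""
    return false_pos_rate * (1.0 - troll_share)

def precision(troll_share, false_pos_rate=0.2, false_neg_rate=0.2):
    """Fraction of users labelled T who really are trolls."""
    caught = (1.0 - false_neg_rate) * troll_share
    wrongly = wrongly_labelled_share(troll_share, false_pos_rate)
    return caught / (caught + wrongly)

# As trolls get rarer, the wrongly labelled share creeps up towards 20%
# while the chance that a flagged user is really a troll collapses.
for share in (0.20, 0.05, 0.01, 0.001):
    print(f"trolls {share:6.1%}: wrongly labelled {wrongly_labelled_share(share):5.1%} "
          f"of all users, precision {precision(share):5.1%}")
```

Passing a different value, e.g. `precision(0.05, false_pos_rate=0.05)`, shows how much a lower false positive rate alone would change the picture.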


The prior probability matters.

https://xkcd.com/1132/
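The same point written as Bayes' theorem, using "flagged" as a shorthand (introduced here, not from the article) for "labelled T by the algorithm", and plugging in the 5% case with the assumed 80%/20% rates:

```latex
P(T \mid \text{flagged})
  = \frac{P(\text{flagged} \mid T)\,P(T)}
         {P(\text{flagged} \mid T)\,P(T) + P(\text{flagged} \mid RU)\,P(RU)}
  = \frac{0.8 \times 0.05}{0.8 \times 0.05 + 0.2 \times 0.95}
  = \frac{0.04}{0.23} \approx 0.174
```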


Originally posted by sonhouse
I read about him a long time ago, around 1984 I think.
... while holidaying on a farm ...

I have an algorithm for trolls:

if his name is Metal Brain or RJHinds, he is a troll, else not so.

This algorithm has several severe limitations.