1. Standard member royalchicken
    CHAOS GHOST!!!
    Elsewhere
    Joined
    29 Nov '02
    Moves
    17317
    25 Jun '03 16:56
    Originally posted by Acolyte


    Also, could you give me the objective function that the hiring officer is trying to maximise? It can't be as simple as 'probability of best candidate being selected', because then he would just interview everyone so he could be sure of finding the best one.
    I still don't understand the merits of this strategy.

    This is because I haven't explained it properly...😳.

    The officer INTERVIEWS the first M, and rejects them out of hand. He then interviews the rest, and hires the first one whose score on the objective criteria is higher than that of everyone interviewed before him. If no such candidate exists, he hires the last candidate by default (that is what I forgot to mention). Furthermore, remember the crucial detail: A PROSPECTIVE PRANCER MUST BE ACCEPTED OR REJECTED IMMEDIATELY FOLLOWING HIS INTERVIEW (before others are interviewed). Given this strategy, I am asking you to tell me what value of M (with M nonzero) maximizes the probability that the best candidate is chosen. Also, given that this is done properly, what is the probability that the best candidate is hired, and where should you put yourself in line? I think that I may have left this crucial detail out of my initial explanation, which gave rise to the problems you astutely mentioned.

    Your second objection is answered by this detail. Sorry about the lack of clarity.
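
    The cutoff rule described above is easy to try out numerically. Here is a small Python sketch of my own (not code from the thread); the candidate count, cutoff, trial count and seed are all arbitrary knobs:

    ```python
    import random

    def hire(n, m, rng):
        """One round of the strategy: reject the first m out of hand, then
        hire the first later candidate who beats everyone seen so far,
        or the last candidate by default. Returns True if the best of all
        n candidates (rank n-1) was hired."""
        scores = list(range(n))          # distinct ranks, higher = better
        rng.shuffle(scores)
        threshold = max(scores[:m]) if m > 0 else -1
        for s in scores[m:-1]:
            if s > threshold:
                return s == n - 1        # first candidate to beat the threshold
        return scores[-1] == n - 1       # nobody qualified: last one by default

    def p_best(n, m, trials=20000, seed=1):
        """Estimate the probability that the best candidate is hired."""
        rng = random.Random(seed)
        return sum(hire(n, m, rng) for _ in range(trials)) / trials
    ```

    For N = 30 this puts the success probability near 0.37 for M in the low teens, in line with the discussion that follows.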
  2. Joined
    26 Apr '03
    Moves
    26771
    26 Jun '03 08:30
    The best candidate can choose where to stand. He knows that the interviewer wants to minimise his workload. Therefore the cleverest candidate stands at the back. The interviewer discards the first M-2 people, interviews the second to last, and hires the last one. Since only a very good candidate will work out that it is best to stand at the back, this hiring technique will work admirably.
  3. Donation Acolyte
    Now With Added BA
    Loughborough
    Joined
    04 Jul '02
    Moves
    3790
    26 Jun '03 12:52
    Originally posted by royalchicken
    The officer INTERVIEWS the first M, and rejects them out of hand...
    Oh right, now it makes sense. So the quality of the last candidate is irrelevant, and if the best candidate of the other N-1 candidates is one of the first M, the hiring officer is bound to choose the last candidate. Is that right?
  4. Standard member royalchicken
    CHAOS GHOST!!!
    Elsewhere
    Joined
    29 Nov '02
    Moves
    17317
    26 Jun '03 16:04
    Originally posted by Acolyte
    Oh right, now it makes sense. So the quality of the last candidate is irrelevant, and if the best candidate of the other N-1 candidates is one of the first M, the hiring officer is bound to choose the last candidate. Is that right?
    That is correct. Sorry I forgot to tell you that detail before. Some linguistic rigidity on my part or some such.
  5. Joined
    26 Apr '03
    Moves
    26771
    26 Jun '03 22:58
    Ok - my thoughts so far: (mucho apologies about previous attempt)
    Forget about the instantly rejected people, for now

    Given his inefficient search procedure (without the rejection step), say the probability that the interviewer will select the best candidate from n is f[n]

    The probability that, with the rejection step, the interviewer will select the best candidate from the total group of G is (n/G)*f[n], where n = G - M is the number of candidates remaining after the rejected batch
    (i.e. the probability that the best candidate is not in the rejected batch, and is then selected)

    f[n] is a summation for all p from 1..n of the values of function g[p,n], which gives the probability that, if the best candidate is at position p within the group of n, the search procedure will find him.

    How does g[p,n] work? Well:
    g[1,n] = 0 (for n ≠ 1) and 1 (for n = 1)
    g[2,n] = 1 for all n > 1
    g[x,n], where x > 2, is more complicated. It is a summation of functions h[b,c,d], where h[b,c,d] is the probability that a random selection of c candidates from a set of d will all rank worse than b

    b ranges from 1..n-1, and represents the ranking of the first candidate within the group (with better candidates higher)
    c = p - 2, and represents the number of candidates between the first and p
    d = n - 2, and is the total number of candidates, excluding the best candidate and the first candidate

    I'm getting there, but I am going to bed now. Perhaps someone else can carry this on?


  6. Standard member royalchicken
    CHAOS GHOST!!!
    Elsewhere
    Joined
    29 Nov '02
    Moves
    17317
    27 Jun '03 01:07
    Okay thus far... (Russ, we need a 'cryptic' smiley.)
  7. Joined
    26 Apr '03
    Moves
    26771
    27 Jun '03 08:37

    I'm getting there, but I am going to bed now. Perhaps someone else can carry this on?
    A little bit more before I go to work... I didn't need parameter c
    h[b,d] represents the probability that a random selection of candidates from d will all rank worse than candidate b,

    obviously there are a total of d-b candidates in d that rank worse than b. The total number of ways of arranging these d-b candidates is (d-b)!
    (where ! means factorial)
    the total number of arrangements of d-b candidates selected from the total set of d-1 is (d-1)!/(b-1)! (I take 1 from d because I am excluding the b'th candidate)

    so h[b,d] = (d-b)!.(b-1)!/(d-1)!

    Wow - I'll have to look up binomial expansions, I only did a chemistry degree.
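
    iamatiger's closed form can be sanity-checked by brute force for small cases. A Python sketch of my own (not from the thread): ranks run 1..d with 1 the best, so "worse than b" means a rank greater than b, and the selection size d-b follows his counting argument:

    ```python
    from itertools import permutations
    from math import factorial

    def h_formula(b, d):
        # h[b,d] = (d-b)! . (b-1)! / (d-1)!  -- the closed form above
        return factorial(d - b) * factorial(b - 1) / factorial(d - 1)

    def h_bruteforce(b, d):
        """Probability that an ordered selection of d-b candidates, drawn
        from the d-1 candidates other than b, consists only of candidates
        ranking worse than b."""
        others = [r for r in range(1, d + 1) if r != b]
        picks = list(permutations(others, d - b))
        good = sum(all(r > b for r in p) for p in picks)
        return good / len(picks)
    ```

    For example, h[2,5] comes out as 1/4 both ways.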
  8. Standard member royalchicken
    CHAOS GHOST!!!
    Elsewhere
    Joined
    29 Nov '02
    Moves
    17317
    28 Jun '03 02:32
    A little more work than is really necessary actually...
  9. Donation Chris
    Site Admin
    Wimbledon
    Joined
    21 Feb '01
    Moves
    26275
    01 Jul '03 16:15 (1 edit)
    Coincidentally, after listening to Jon Tickle's theory of women on BB4, I wrote a little program to test his hypothesis (that given 30 women, you look at the first 10 you meet, and then select the next woman you meet who is better than any you have previously met!!)

    This is the same problem in a different wrapper, so I have some ready-made results...

    Using N=30, I tried this for each value of M from 0 to 29 - and for each value, did 100 runs and calculated the percentage of times the best 'candidate' was selected.

    This shows, for example, that by interviewing just the first 3 candidates and then selecting the next candidate who is better than any of the first three (or the last candidate if the best candidate was in the first three), the average quality of the selected candidate will be approximately 82.15 (if all candidates have a random quality between 0 and 99). Jon's theory that M=10 was optimal is shown to be questionable by this sample, which produced a 79.18 quality rating...

    0 = 49.32
    1 = 69.44
    2 = 80.92
    3 = 82.15
    4 = 82.03
    5 = 81.26
    6 = 82.58
    7 = 80.95
    8 = 83.75
    9 = 78.60
    10 = 79.18
    11 = 73.39
    12 = 80.25
    13 = 76.59
    14 = 74.81
    15 = 73.37
    16 = 75.39
    17 = 66.94
    18 = 65.13
    19 = 61.22
    20 = 64.87
    21 = 61.36
    22 = 54.48
    23 = 53.23
    24 = 57.24
    25 = 54.51
    26 = 60.61
    27 = 51.65
    28 = 53.25
    29 = 50.67
  10. Standard member royalchicken
    CHAOS GHOST!!!
    Elsewhere
    Joined
    29 Nov '02
    Moves
    17317
    01 Jul '03 22:30 (2 edits)
    Quite impressive. For N=30, 11 is about the optimum M. I like your version of the problem better too, never having been one to hang around guys in purple tights...

    I think you've got it. Can you tell me exactly how it works, though?
  11. Standard member royalchicken
    CHAOS GHOST!!!
    Elsewhere
    Joined
    29 Nov '02
    Moves
    17317
    15 Jul '03 15:22 (1 edit)
    Chris, maybe offer an accolade for the complete explanation of this one...let's see if the Answer Prancer will deliver 😉.

    Also, can someone explain the fairly subtle reason why the theoretical answer (M=11 for N=30) disagrees with Chris's data? The program did not quite preserve everything.
  12. Donation Chris
    Site Admin
    Wimbledon
    Joined
    21 Feb '01
    Moves
    26275
    17 Jul '03 12:56
    Originally posted by royalchicken
    Chris, maybe offer an accolade for the complete explanation of this one...let's see if the Answer Prancer will deliver 😉.

    Also, can someone explain the fairly subtle reason why the theoretical answer (M=11 for N=30) disagrees with Chris's data? The program did not quite preserve everything.
    To help, here is the code that was used...

    ITEMS = 30
    SAMPLES = 1000

    FOR M = 0 TO ITEMS - 1

        TOTAL = 0

        * For each value of M, take a reasonable number of samples

        FOR LOOP1 = 1 TO SAMPLES
            ITEM = 0
            BEST = 0

            * Look at the first 'M' items and find the best

            FOR LOOP2 = 1 TO M
                ITEM = RND(100)
                IF ITEM > BEST THEN
                    BEST = ITEM
                END
            NEXT LOOP2

            * Look at the remaining items and stop if any is better than BEST,
            * otherwise take the last item

            FOR LOOP3 = 1 TO (ITEMS - M)
                ITEM = RND(100)
                IF ITEM > BEST THEN

                    * Exit loop

                    LOOP3 = ITEMS + 1
                END
            NEXT LOOP3

            * Add the score of this item to the current total for 'M'

            TOTAL += ITEM
        NEXT LOOP1

        PRINT "Score for " : M : " = " : (TOTAL / SAMPLES)

    NEXT M
  13. Standard member royalchicken
    CHAOS GHOST!!!
    Elsewhere
    Joined
    29 Nov '02
    Moves
    17317
    17 Jul '03 18:06 (4 edits)
    Right then...I guess I should explain. Here's how it works, as compared to what the program looks for. Let P(Best) be the probability that the best prancer candidate is selected; we seek to maximise this. Let p(i) denote the ith prancer candidate (i=1,2,....,N). Clearly:

    P(Best) = SUM (P[p(i) is best]*P[p(i) is selected])

    Where the sum is taken over i=M+1,M+2,...,N.

    Since for each i, P[p(i) is best] = 1/N, and the probability that p(i) is selected, given that it is best, is M/(i-1) (this is the chance that the best of its i-1 predecessors falls among the first M, so that no earlier candidate triggers a hire), we have:

    P(Best) = (M/N)*SUM (1/(i-1)) for i = M+1,...,N

    or (and I'll only get into the details of this step if requested):

    P(Best) ~ (M/N)*(log N - log M)

    Where log denotes a natural logarithm. We need the value of M that maximises this, so we differentiate the above wrt M and set that equal to zero:

    log N - 1 = log M

    or exp(log N - 1) = M

    or M = N/e, where e = 2.7182818...

    Thus the best value of M is N/e, and the maximum probability of getting the best prancer is 1/e, or about 37%. So, for example, if there are 100 potential prancers, 37 should be rejected out of hand and the probability of getting the absolute best prancer will be 37%.
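
    Before the logarithmic approximation comes in, the sum can also be evaluated exactly. A short Python check of my own against the formula above:

    ```python
    def p_best_exact(n, m):
        """P(Best) = (M/N) * SUM(1/(i-1)) for i = M+1,...,N, as derived above
        (valid for 1 <= m <= n-1)."""
        return (m / n) * sum(1 / (i - 1) for i in range(m + 1, n + 1))

    # For N = 30: the exact optimum is M = 11 (note 30/e is about 11.04),
    # with P(Best) about 0.3787.
    best_m = max(range(1, 30), key=lambda m: p_best_exact(30, m))
    ```

    For N = 100 the same exact computation gives M = 37, matching the N/e rule.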

    Therefore, Jon Tickle was not off by much. But can anyone tell me why this theoretical answer does not square with Chris's program.....? (There is a wee problemette, as iamatiger might put it.)



    EDIT Also, I'd be interested in hearing whether anyone can come up with a more efficient algorithm for this sort of thing. This was the best I could do, but I suspect that proving optimality here would be quite difficult.

    EDIT THE SECOND Chris, I like your profile quote.
  14. Donation Chris
    Site Admin
    Wimbledon
    Joined
    21 Feb '01
    Moves
    26275
    18 Jul '03 09:57 (3 edits)
    Originally posted by royalchicken
    Right then...I guess I should explain. Here's how it works, as compared to what the program looks for. Let P(Best) be the probability that the best prancer candidate is selected; we seek to maximise this. Let p(i) denote the ith pr ...[text shortened]... te difficult.

    EDIT THE SECOND Chris, I like your profile quote.
    Here is the output from the program when it is adjusted to count the number of times the BEST item was selected. The number outside the brackets is the average quality (0-99) of the selected candidate. The number in the brackets is the number of times the best candidate of all 30 was selected. Each value of M had 100,000 runs. This shows that the value M=11 produces the highest number of "best" selections, but for average candidate quality, M=4 seems to be best. I'm confused.

    0 = 50.09 (3385)
    1 = 73.34 (13061)
    2 = 79.99 (19805)
    3 = 82.54 (24612)
    4 = 83.27 (28262)
    5 = 83.43 (31196)
    6 = 82.91 (33701)
    7 = 82.08 (35444)
    8 = 81.00 (36360)
    9 = 79.94 (37118)
    10 = 78.78 (37702)
    11 = 77.67 (38357)
    12 = 76.05 (37680)
    13 = 74.56 (37015)
    14 = 73.32 (36543)
    15 = 71.79 (35559)
    16 = 70.21 (34047)
    17 = 68.98 (32934)
    18 = 67.27 (31145)
    19 = 65.85 (29642)
    20 = 64.13 (27452)
    21 = 62.63 (25384)
    22 = 61.00 (22936)
    23 = 59.62 (20759)
    24 = 57.98 (18088)
    25 = 56.50 (15435)
    26 = 54.80 (12605)
    27 = 53.29 (9659)
    28 = 51.67 (6574)
    29 = 50.01 (3307)

    EQU QUALITY TO 15000
    EQU SAMPLES TO 100000
    EQU ITEMS TO 30

    FOR M = 0 TO ITEMS - 1

        TOTAL = 0
        GOTBESTCOUNT = 0

        * For each value of M, take a reasonable number of samples

        FOR LOOP1 = 1 TO SAMPLES
            ITEM = 0
            BEST = -1

            * Look at the first 'M' items and find the best

            FOR LOOP2 = 1 TO M
                ITEM = RND(QUALITY)
                IF ITEM > BEST THEN
                    BEST = ITEM
                END
            NEXT LOOP2

            * Look at the remaining items and stop if any is better than BEST,
            * otherwise take the last item

            PICKEDPOS = -1

            FOR LOOP3 = M + 1 TO ITEMS
                ITEM = RND(QUALITY)
                IF ITEM > BEST THEN

                    * Exit loop

                    PICKEDPOS = LOOP3
                    LOOP3 = ITEMS + 1
                END
            NEXT LOOP3

            * If an item was found that was better than BEST, see if it is the
            * best of ALL the items by examining the remaining items

            IF PICKEDPOS > -1 THEN

                GOTBEST = 1
                FOR LOOP4 = PICKEDPOS + 1 TO ITEMS
                    IF RND(QUALITY) > ITEM THEN
                        GOTBEST = 0
                        LOOP4 = ITEMS + 1
                    END
                NEXT LOOP4

                GOTBESTCOUNT += GOTBEST

            END

            * Add the score of this item to the current total for 'M'

            TOTAL += ITEM
        NEXT LOOP1

        PRINT M : " = " : (TOTAL / SAMPLES / (QUALITY / 100)) : " (" : GOTBESTCOUNT : ")"

    NEXT M
  15. Standard member royalchicken
    CHAOS GHOST!!!
    Elsewhere
    Joined
    29 Nov '02
    Moves
    17317
    18 Jul '03 15:50
    Originally posted by Chrismo
    Here is the output from the program when it is adjusted to count the number of times the BEST item was selected. The number outside the brackets is the average quality (0-99) of the selected candidate. The number in the brackets is the number of times the best candidate of all 30 was selected. Each value of M had 100,000 runs. This shows that the value M ...[text shortened]... "best" selections, but for average candidate quality, M=4 seems to be best. I'm confused.

    You are correct 😀. The problem asks for the way to produce the highest probability of selecting the best candidate, which, as you say, happens at M=11. In practice, shooting for highest average quality seems better from a business standpoint, though.

    Can anyone work out, in view of what I said before, how to get the best average candidate quality in general (noting that Chris was correct in saying M=4 in that case)?
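
    One numerical way into that question is to simulate both objectives side by side. This is a rough Monte Carlo sketch of my own (uniform qualities on [0,1); seeds and trial counts arbitrary), not code from the thread:

    ```python
    import random

    def run_once(n, m, rng):
        """Apply the cutoff-M rule; return (quality hired, hired the best?)."""
        scores = [rng.random() for _ in range(n)]
        threshold = max(scores[:m], default=-1.0)
        hired = next((s for s in scores[m:-1] if s > threshold), scores[-1])
        return hired, hired == max(scores)

    def averages(n, m, trials, seed):
        """Average hired quality and best-candidate frequency over many runs."""
        rng = random.Random(seed)
        qual = best = 0.0
        for _ in range(trials):
            q, b = run_once(n, m, rng)
            qual += q
            best += b
        return qual / trials, best / trials
    ```

    Sweeping M for N = 30 reproduces Chris's pattern: average hired quality peaks at a small M (around 4 or 5), while the frequency of landing the overall best peaks near M = 11.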