1. Subscriber Russ
    RHP Code Monkey
    RHP HQ
    Joined
    21 Feb '01
    Moves
    2396
    24 Feb '08 12:15 (1 edit)
    First off, thanks to everyone who has been very supportive since we came back up. A pleasant surprise, because I thought we might be on the defensive for some time, particularly since we had a series of hiccups preceding the extended outage. So, thanks everyone, we appreciate it.

    As nearly everyone will be aware, site performance was starting to get quite miserable before the upgrade, but this was also leading to other problems. Stressed servers crash more often, and fail in other areas too. A few efforts to apply quick fixes to buy time didn’t provide the benefit we hoped for, so we got new hardware on order, and started migrating literally the second they came online. (And in the case of one machine, before it was even available to us, such was the urgency.)

    Even so, the downtime lasted much longer than intended, primarily because of the lack of preparation. Most jobs (particularly those that result in downtime) are rehearsed a number of times to figure out the most efficient way of doing them with the least downtime. This was not an option in this instance, because we had to move to the new kit as soon as possible. This resulted in blunders that extended the original estimates significantly. (Ask Chris how he felt when I destroyed 2 hours of work at 6am….) Anyway, the best thing that happened was accepting that we would not be able to get back online anytime soon, taking a breath, and then starting again.

    So, why the big rush to new hardware, and how come we didn’t see that one coming? Since this site started we have always been pushing our hardware, but once we notice degradation in performance, we normally roll up our sleeves and optimise. New hardware is not normally the answer to a slow site; badly designed scripts (scripts that don’t scale) are normally at fault. So we had hoped in this instance, as before, that we could make better use of what we had. But in this case we were in a corner, and there was nowhere to go. So we needed to get to new hardware ASAP.

    We also upgraded numerous software components, including the OS (now 64-bit on the database) and the database server software. Furthermore, we bundled up a few upgrades that were planned for the future. Rather than have another lengthy outage later, we rolled it all into one. That increased the risk, but it appears to have been a success.

    We do actually have plenty more to do still, but there should be no more serious disruption as a result.

    Lessons learnt to prevent this happening again? Always make sure we have more contingency on our server resources. As simple as that.

    Thanks for listening,

    -Russ

    Oh, and we are now using carbon neutral hosting. 🙂 Trees. We love them.
  2. Cavan, Ireland
    Joined
    30 Apr '07
    Moves
    3516
    24 Feb '08 12:46
    That's good to know - I have an idea of what you guys were doing now....I think 😛
  3. Nixa, MO USA
    Joined
    04 May '07
    Moves
    16406
    24 Feb '08 13:35
    I AM NOT A COMPUTER "Techie" so I don't understand a word said in the above!! ALL I know is that the site is operating so much faster and smoother it's almost unbelievable!! I can now lose my games MUCH faster!!!
    Gary Thomas
  4. UK
    Joined
    16 Dec '02
    Moves
    71100
    24 Feb '08 14:13
    Yes it is definitely pretty zippy now. I suppose with stability you can't really tell this early on, but it seems ok from here.

    Good job.
  5. Joined
    25 Jan '07
    Moves
    116633
    24 Feb '08 14:25
    good job guys!!!
    thnx a lot
  6. Joined
    27 Nov '05
    Moves
    3146
    24 Feb '08 14:49
    LOL, sounds like me, I can lose much faster now. Well said, my friend
  7. Subscriber coquette
    Already mated
    Omaha, Nebraska, USA
    Joined
    04 Jul '06
    Moves
    1114459
    24 Feb '08 18:02
    thanks for the info. i, too, am losing games faster than ever!
  8. The sky
    Joined
    05 Apr '05
    Moves
    10385
    24 Feb '08 18:05
    Originally posted by coquette
    thanks for the info. i, too, am losing games faster than ever!
    That would make a good slogan - "Join RHP and lose games faster than ever!"
  9. Joined
    25 Dec '07
    Moves
    5268
    24 Feb '08 18:52
    When will I be able to claim a timeout win? During the suspension, 3 games were not finished; now I have 2 games in timeout, but I can't claim the win.

    So, when will it be possible to do it...
  10. Standard member MrJohn
    A Chess Friend :-)
    Texas :-)
    Joined
    20 Nov '06
    Moves
    718
    24 Feb '08 20:03 (6 edits)
    Originally posted by Russ
    so we got new hardware on order, and started migrating literally the second they came online.

    Yeah, my first week selling computers for IBM (many years ago), I sold a small (i.e., air-cooled) mainframe to some guys in exactly your situation. It was a one-call close: they needed iron fast! :-)

    Anyway, I know what you went thru and I hope it will be easier for you, now. Thanks very much for your care of us. :-)

    John :-)
  11. Joined
    07 Jun '05
    Moves
    5301
    24 Feb '08 21:46
    Originally posted by Russ

    ...

    Thanks for listening,
    Thanks for explaining Russ. It is good to know what is going on/went on. I hope you manage to have some weekend soon.
  12. Standard member thyme
    Undutchable
    was here
    Joined
    23 Jul '07
    Moves
    83545
    24 Feb '08 22:11
    Thanks. I think you did a really good job. Site performance has improved notably.

    Lol about the losing faster .. that seems to be happening to me as well...
  13. Standard member GalaKev
    Borderer
    Scotland
    Joined
    09 Apr '07
    Moves
    33614
    24 Feb '08 22:39
    Originally posted by Bebcho
    When will I be able to claim a timeout win? During the suspension, 3 games were not finished; now I have 2 games in timeout, but I can't claim the win.

    So, when will it be possible to do it...
    Harsh, considering the problems with the site.

    Although in certain circumstances, I can understand your frustration. In three of my games, the players have not moved in 12 days in any game. One is the last game on a round 1 of a torno, and the next round can't progress.

    But not everybody has access every day; some may only log on, shall we say, once a week at a library. Give it time, the site will hopefully be back to usual shortly.
  14. Subscriber Russ
    RHP Code Monkey
    RHP HQ
    Joined
    21 Feb '01
    Moves
    2396
    25 Feb '08 00:22
    Originally posted by GalaKev
    Harsh, considering the problems with the site.

    Although in certain circumstances, I can understand your frustration. In three of my games, the players have not moved in 12 days in any game. One is the last game on a round 1 of a torno, and the next round can't progress.

    But not everybody has access every day; some may only log on, shall we say, once a week at a library. Give it time, the site will hopefully be back to usual shortly.
    Timeout suspension will be lifted within 24 hours.

    -Russ
  15. This is embarrasking
    Joined
    17 Nov '05
    Moves
    44152
    25 Feb '08 02:01
    Originally posted by Russ
    Timeout suspension will be lifted within 24 hours.

    -Russ
    Russ, you really should think about doing this stuff for a living some day. You're actually pretty good. I am glad you got some of the kinks ironed out. I can imagine the stress was pretty tough. But you know what, I think everyone pretty much understands. Don't sweat it. I'm glad you guys got it up and running again. Always remember.

    There are only two tools needed in any job:

    1) Duct Tape
    2) WD40.

    If it wiggles use duct tape, if it don't use WD40.