A digest of all things wϋnderful

Posts in category Uncategorized

Email Jail

It started off with a conversation like this:

Dave: “We are going to turn off access to your email while you are on vacation”

Me: “What?!??!! How much access? What about emergency issues?”

Dave: “Yes, your Exchange account will be turned off on the evening your vacation starts and then enabled again when you return”

Me: “What if a server goes down? What if a client needs something critical? What if there is a major disaster that needs to be fixed?”

Dave: “We will be fine for a few days, your only task is to document the process when you return”

So the journey began. I had always kept track of my email on holidays (not obsessively, in my opinion, but in moderation) to see if there were critical items that needed to be addressed by me. I mainly did this so that I did not have to spend one to two days digging out of the huge inbox that would build up while I was away. I also wanted to make sure that nothing was dropped while I was away and that there would not be a huge issue awaiting me upon my return.

As I was closing out items on my last day before holiday I went through the usual processes. I set up my out of office message informing anyone who contacted me via email that I would be unable to respond until I returned and to contact our main office if it was an emergency. I discussed outstanding items with my team and gave them my personal email address so they could contact me in case of an extreme emergency. I went through my items in progress, help desk tickets and outstanding sales items. Everything was ready for the transition, except that I received an emergency call from a client 20 minutes before the cutoff point. I started to work the issue and gave as much assistance as I could before rapidly writing notes to pass along to another sales engineer.

Into the Great Unknown

Then it happened…

I received an email at my personal address telling me that my account was disabled. I immediately checked my accounts and sure enough, there was no access. Many things went through my head at this point. I could log in with the admin account and simply turn my account back on with no one noticing. But in the spirit of the experiment I resisted this temptation. I did see that I still had access to my Skype and help desk accounts, so I could still check on a few things, but overall my main email was now locked.

During the Lockout Period

  •  Day 1

After going through the entire night without email I awoke with a bit of panic. My usual morning routine is to clear out the email items that have collected the night before, issue responses and in general give a little maintenance to my inbox. It felt really unnatural to not start my day with an activity that I almost religiously do.

My wife was enjoying her morning coffee and in our conversation I kept bringing up the fact that I could not check in on things and did not know the status of certain items. My wife especially enjoyed seeing me squirm as she was very appreciative of the experiment. After about an hour or so, I started to do other tasks around the house and my mind slowly forgot about my earlier panic. By noon I was easing into holiday mode and things were getting easier and easier to forget.

By the time that dinner rolled around I started to feel the need to check on items. I work about an 8 hour time difference from our head office and the early afternoon to evening is when the most activity occurs for me.  Again it really felt like an out of body experience.  I actually opened my iPad and pulled my inbox down in hopes that a refresh would magically happen.  It did not.

  • Day 2

I awoke the second day still missing my morning routine, but today came with more ease. Thoughts of checking entered and left my mind without much distraction during my day. Since we did not have internet at the house where we were staying, I was only reminded of checking when my wife’s sister’s kids complained about not being able to check Facebook or download the latest app for their phone or iPad. We did go into town for dinner this evening and there was WiFi at the restaurant, and man, could you tell. EVERYONE had their phones out and was taking pictures of the dinner and posting them to Facebook and other services. Which makes me ask the question: is it more important to have proof you did something than to actually do it?

I did not bring my phone but forgot that I had my Skype account installed on my wife’s phone.  When she connected to the WiFi there was a message on that account.   We argued about me checking it but I defended this by saying that I was still on the hook for personal email and did not have access the majority of the time and a Skype message counted as reaching out through personal mediums for a response.   It turned out to be a client that I have a very good connection with (we had done business when he was on holiday earlier this year).   He had a simple question that needed to be answered.   I responded to him via personal email explaining my email situation and that I could get a response in the next 2 hours if he emailed me his question.   He simply responded “It’s OK, it is not urgent and we will address it next week.  Have a good vacation.”   At that point my mind was completely put at ease that if the most crucial clients accepted I was away then everyone else probably did as well.

  • Day 3

Bliss. Really that is all I can say. Things finally reached normality again and it felt like when I was a kid (when there was no internet, or only very, very, very, unbearably slow internet. Think BBS boards). We spent all of our time outside enjoying the weather at a nearby lake. We created games from our imaginations and surroundings and remembered a few old games as well (for those of you who don’t know CUPS [not the printing service], ours was not as polished as the video though). Basically it felt like a summer holiday should. We also had some of the kids mention that they had not been able to have real conversations because either they or their parents were on their phone or tablet most of the time at home. There is a video that recently went viral on the web that I think does a very good job of illustrating this point: “I Forgot my Phone”. A large family cookout that night making Hungarian Gulyás certainly washed away any thoughts of checking email.

  • Day 4

Anxiety. I thought I would wake up to the same euphoria that I had experienced the previous day but that was not the case. I had an uneasy feeling of anxiety about what was going to be waiting for me upon my return. Was there some big issue that needed my attention that had festered for a few days and was now wildly out of control? Was there a client with an issue that was probably small, but whose mood had needlessly soured because my holiday added to the wait? Don’t get me wrong, my last day was still great, but in the back of my mind I always had these feelings of anxiety creeping in.

 First Day Back

As expected my inbox was stuffed full like a turkey on Thanksgiving.

How much?

Inbox count: 1,186. That may seem extreme to most, but keep in mind that part of my job is systems administration and I receive a lot of status emails from our various services. Whether they are running correctly, have a warning or have an error, they still report via email. All told, when the service emails were weeded out I was down to about 350 meaningful messages. I had considered the “Email Bankruptcy” technique that Wired Magazine suggested but thought that was a bit extreme. Instead I took the time-tested approach of pruning and prioritizing.

How to address them?

  • Outlook Conversation Mode: The next task was how to make sure I answered the most pressing issues first. I found that Microsoft Outlook’s conversation mode made things MUCH clearer, as I did not have to go through all the responses to an important email sent to the “all” list in order to see the root issue and whether it applied to me. At this point I found that there were actually about 41 emails that needed my direct attention that day.
  • Set filters and use flags: As an admin, I already had many filters in place so the most mundane emails could be deleted without needing to be read (system logs, software updates, notifications). I then used flags to mark the most important emails to answer on the first day back, those to answer in the next few days and those to answer by the end of the week at the latest.
  • Let the senders of critical items know there will be a delay: The first step at this point was to email the senders of the most crucial items, saying that I would be a bit delayed in my response due to the backlog from my time off. Not something I normally do, but after reading tips this seemed like a good idea.
  • Work through your critical items with your inbox window minimized and incoming notifications turned off: Next were the items that I considered to be “on fire”. The client that was having the issue had accepted that I was away and was OK with waiting a few days for a response (not the big deal I made it out to be in my head). The problem was that my train of thought was constantly being interrupted by newly arriving items. The solution was to simply minimize the inbox window and turn off all new message notifications. I found that my concentration and my ability to process responses rapidly increased.
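
The pruning and flagging steps above can be sketched in a few lines of Python. This is only an illustration, not the actual Outlook filters and flags from this post: the Message fields, the subject markers and the rule order are all hypothetical assumptions.

```python
from dataclasses import dataclass

# Hypothetical subject markers for automated mail; real filters would be site-specific.
NOISE_MARKERS = ("system log", "software update", "notification")


@dataclass
class Message:
    subject: str
    from_client: bool      # assumption: the sender is a client, not a colleague or service
    addressed_to_me: bool  # assumption: sent to me directly, not just to an "all" list


def triage(message):
    """Return the bucket a message falls into, mirroring the flags described above."""
    subject = message.subject.lower()
    if any(marker in subject for marker in NOISE_MARKERS):
        return "delete"
    if message.from_client and message.addressed_to_me:
        return "today"
    if message.addressed_to_me:
        return "next few days"
    return "end of week"


inbox = [
    Message("Nightly system log for web01", False, False),
    Message("Report template is failing", True, True),
    Message("Re: all-staff lunch order", False, False),
]
for message in inbox:
    print(triage(message), "-", message.subject)
```

The point of the sketch is the ordering: throw away the automated noise first, then decide what has to be answered today, and only then worry about the rest.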

Time?

I was able to dig out to a relatively stable status in about 2 hours.  I am sure things could be faster when I do this process again as I was doing things for the first time but all in all it was not that bad.

Going Forward

There are a few things that I noticed that I could do to minimize the impact of the email backlog the next time I am away.  Here are some things that I will try in the future to reduce the total amount of email.

  • Unsubscribe or remove useless email: I received a LOT of systems log updates and over the years I have been added to many mailing lists in my company to the point where I am receiving duplicates of mail messages.  On a daily basis this is not bad, I just do a quick delete when they come in and move on with my day.  However I am starting to take the Apple approach to notifications on this issue.  If something is alright and there is no need to give it attention, should I receive a notification?  I don’t think so.  This is what monitoring applications and services are for.   I plan to get a service like Nagios fully configured for all our services and if I want to know if something is OK, I can GO THERE and see that the service is up and running instead of clogging my inbox.  I really only need to be notified if something is wrong or about to cause an error.  I will also be unsubscribing from the long list of online stores that I have ordered things from over the years.  Their ad emails keep clogging my inbox needlessly.
  • Take the same process I used on my return and apply it daily or at least a few times a week: There were many efficiency items that I discovered upon my return and they effectively let me minimize the time I spent in my inbox. I will apply some to my daily routine but I also thought about using the entire process on returns from shorter breaks (think Monday morning after your weekend break). This might relieve some of the anxiety in the future when I go on holiday as there would not be as much backlog when I return.
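
The "only tell me when something is wrong" idea from the first item above can be shown with a tiny sketch. It borrows the OK / WARNING / CRITICAL state names that Nagios uses, but it is not Nagios configuration; the function name and the rule of staying quiet while a problem state is unchanged are assumptions made for illustration.

```python
def should_notify(previous_state, current_state):
    """Send mail only when a service enters a new non-OK state."""
    if current_state == "OK":
        return False  # healthy services stay out of the inbox
    return current_state != previous_state  # do not repeat an alert already sent


# A hypothetical run of check results for one service.
history = ["OK", "OK", "WARNING", "WARNING", "CRITICAL", "OK"]
previous = None
for state in history:
    if should_notify(previous, state):
        print("notify:", state)
    previous = state
```

With a rule like this, the status of a healthy service is something you go and look up on the monitoring page rather than something that lands in your inbox every night.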

Success or Failure?

Overall I have to say that the experiment was a success. I learned a lot about the habits I keep as well as many things I could improve in my daily processes to be a more efficient user of email. It was not easy, that I have to say, but it was needed and in the end I became better because of it. There are a few items to address for the next time I do this experiment or if others try it.

  • Always on society – this is a bigger issue than I want to tackle here, but when you are unplugged for a few days you really see it from the outside looking in. How much connectivity is enough, and how much is too much? The lines are blurred but a total disconnect really makes it apparent. The video I posted earlier demonstrates this very well: “I Forgot my Phone”.
  • Fear and anxiety – that you will be needed for a critical task and not be available or, worse yet, that someone creates more work by doing it wrong or a different way.
  • Digging out upon return – It was not as bad as I assumed and the majority of my email was filled with useless notifications.  Unsubscribing and trimming down this set should make for a more manageable inbox on a daily basis as well as when returning from a long trip.
  • The feeling that you are not needed – when you leave and your projects run smoothly and clients are happy, this should be a good feeling. When you return and ask how everything went and your office responds “just fine”, this again should be a good feeling. However, there is a side effect to all your hard work and preparation: you feel like you are no longer needed and anyone could do your job. This is obviously not true, but there is something in the subconscious that creates this feeling, similar to the Andy Griffith Show episode where Opie and Andy are left as housekeepers when Aunt Bee goes on a trip to Mount Pilot.

Potential Problems?

  • What to do with admins that have access to the Exchange server to prevent the temptation to log in via RDP and reset their password or unlock their account?
  • What about forwarding their email to a private account before their access is shut off and communicating that way?
  • Skype (Phone/Chat Software)? Help desk software? Other independent services if SSO has not been implemented
  • Access to other services that don’t communicate via email, i.e. web services?
  • How do priority items filter through when your email account is shut off?

Windward Code Wars 2012 – Think You Can Do Better?

This last weekend Windward hosted its first annual code war competition. Many of you may ask what is a code war exactly. As we have defined it a code war is a competition where programmers design an AI (Artificial Intelligence) that must solve a programming problem in a specified amount of time. The goal of the AI is to best the other competitors taking home bragging rights for the leanest, meanest, most intelligent program above all.

Windward asked for submissions from many of the top 25 computer science universities around the world. The code war was set up to run all submissions at each school, and then the top 2 finalists would move on to the semifinals and championship hosted on Windward’s servers.

Our basic problem was to build an AI modeled after the game RoboRally (which Windward staff play frequently). The basic premise of RoboRally is that you are a robot in a factory. You can move forward, backward, left and right in any combination. The goal is for the robot to reach each of the flags on the board without dying. This sounds fairly simple but there are many obstacles that stand in your way such as conveyor belts, pits, lasers, walls and, most of all, other competing robots that also fire lasers. A basic game starts with the player drawing 9 cards and choosing the best five of those cards to get their robot to the flag. Each time you are hit by a laser, you lose a damage point. Lose all 9 damage points and your robot powers down wherever it currently sits on the board (also leaving you open to fire from others, conveyor belts, lasers, pits, you get the idea). You have three lives as a robot, and when maximum damage is reached or you fall off the board or into a pit, you lose one of these lives. There are chances to heal (by landing on a flag or health square) but these are few and far between.
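
To give a feel for the core decision an AI has to make, here is a minimal sketch of "pick the best five of nine cards" by brute force. It is not any team's submission and not the Windward client API: the card names, the coordinate system and the scoring (Manhattan distance to the flag) are assumptions, and it ignores walls, pits, conveyor belts, lasers and other robots entirely.

```python
from itertools import permutations

# Headings: 0 = north, 1 = east, 2 = south, 3 = west (y grows toward the south).
DELTAS = [(0, -1), (1, 0), (0, 1), (-1, 0)]
# Hypothetical card names; negative steps mean moving backward.
MOVES = {"MOVE1": 1, "MOVE2": 2, "MOVE3": 3, "BACKUP": -1}
TURNS = {"LEFT": -1, "RIGHT": 1, "UTURN": 2}


def apply_card(state, card):
    """Return the new (x, y, heading) after playing one card."""
    x, y, heading = state
    if card in TURNS:
        return x, y, (heading + TURNS[card]) % 4
    dx, dy = DELTAS[heading]
    steps = MOVES[card]
    return x + dx * steps, y + dy * steps, heading


def run(cards, start):
    """Play the cards in order and return the final state."""
    state = start
    for card in cards:
        state = apply_card(state, card)
    return state


def best_five(hand, start, flag):
    """Try every ordered choice of 5 cards and keep the one that ends closest to the flag."""
    def distance_after(cards):
        x, y, _ = run(cards, start)
        return abs(x - flag[0]) + abs(y - flag[1])
    return min(permutations(hand, 5), key=distance_after)


hand = ["MOVE1", "MOVE2", "MOVE3", "LEFT", "RIGHT", "RIGHT", "UTURN", "BACKUP", "MOVE1"]
start = (0, 0, 1)  # at the origin, facing east
plan = best_five(hand, start, flag=(4, 2))
print(plan, run(plan, start))
```

Nine cards taken five at a time in order is only 15,120 sequences, so brute force is cheap. The hard part for a real entry is scoring each sequence once hazards and other robots are taken into account.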

Windward had to make some changes to bring this game into the digital age as well as make sense for an AI. For a full description check out the game rules we implemented here. Basically it came down to a decision by the development team on whether we should reward combat or not. After a few internal code wars, we decided that rewarding combat was best so AIs would not turtle in corners away from combat. Our basic game consisted of a server written in C# that presented the game board and accepted network communication from each game client. The clients were written in C#, Java, Python and C++. Each of our developers assisted in getting all of this running and tested. As with all software, bugs happen: our rule was that if you found a bug you could report it and exploit it. Only the team that reported the bug would be informed if a patch was put in place, and we would not inform the other teams that the bug existed. A tactical advantage, depending on the bug and your viewpoint.

On January 28th, we presented the code war in 3 time zones. Logistically this had its challenges. We did not want to present the problem until 10 AM in each time zone. This meant setting up several different meetings and getting our staff ready to handle each time zone and the questions the teams might have. In short, we used a combination of GoToMeeting and Ustream for the audio and video and several email aliases to handle the flood of questions. Maintaining the FAQ page became a full-time job. Over the next few hours we kicked off all 3 time zones and then we had a little break before the real chaos hit.

10 AM kicked off and we had the east coast schools up and running after a short description of the problem. About 20 min later the questions started coming in and man did we need everyone there to stay on top of them.

 

As it approached 6 PM east coast time (the competition ran from 10 AM to 6 PM) the regionals were being run and the final code submissions started pouring in, literally. This is where we will be much more organized next year: we had no standard submission process for the code we received. We received programs in all forms, from just the file or files that were changed (we loved this: drop the files in to overwrite the stock client and you are off and running) all the way to fully compiled binaries and entire clients (these were the most difficult, as we had to work out which client was used and then get the code to compile on our servers). Somehow we frantically troubleshot all the programs over the next few hours and were ready for the last regional, hosted by our founder David Thielen at Harvey Mudd College.

The Mudd finals were hosted live and there were several rounds. From what we heard and saw, all of the students were having a good time cheering as their AIs drew combat against their fellow classmates. We also received many emails from the regional proctors and professors saying how much fun their students had. Once the Mudd regional had aired we had limited time to test the final programs and get them loaded onto our servers to broadcast live for the championship. We held three semifinal rounds, and the top 2 teams from each semifinal as well as the two third-place entries with the best aggregate scores were entered into the “elite eight” for a showdown to claim the title of Windward Code War 2012 Champions.

The final finished with the University of Wisconsin – Eau Claire taking grand champion, followed by the University of Massachusetts as runner-up and Purdue University in third place. The additional results as well as live streaming of the finals can be found here. The contestants were thrilled, but I wanted to take a moment to look at something that I found when I reran the final.

Now keep in mind that we ran 10 games in the final, and the best aggregate score across all ten games was the winner, so we did do several runs. Since I have a slightly skeptical nature, I went ahead and ran the final two additional times. The results are below, and I did find some interesting things.

First Run

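| Team | Game 1 | Game 2 | Game 3 | Game 4 | Game 5 | Game 6 | Game 7 | Game 8 | Game 9 | Game 10 | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HMC_League_of_SmashCraft | 72 | 102 | 78 | 74 | 125 | 51 | 55 | 37 | 88 | 51 | 733 |
| UMASS_Skyward_Nord | 80 | 47 | 54 | 47 | 76 | 53 | 71 | 87 | 65 | 68 | 648 |
| HMC_Trolololol | 57 | 49 | 76 | 32 | 44 | 39 | 55 | 36 | 74 | 70 | 532 |
| UW_Grande_Letra_O | 54 | 32 | 39 | 43 | 51 | 37 | 61 | 80 | 33 | 43 | 473 |
| PU_BotBebo | 17 | 31 | 83 | 53 | 35 | 78 | 47 | 32 | 40 | 30 | 446 |
| GATECH_dBuggers | 44 | 42 | 37 | 26 | 31 | 57 | 81 | 59 | 10 | 43 | 430 |
| CU_PEDOBEAR | 11 | 37 | 13 | 26 | 45 | 73 | 41 | 17 | 47 | 31 | 341 |
| CMU_Siddant | 0 | 53 | 71 | 38 | 4 | 29 | 10 | 10 | 1 | 10 | 226 |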

 

Second Run

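| Team | Game 1 | Game 2 | Game 3 | Game 4 | Game 5 | Game 6 | Game 7 | Game 8 | Game 9 | Game 10 | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PU_BotBebo | 70 | 73 | 46 | 38 | 59 | 59 | 32 | 51 | 75 | 34 | 537 |
| HMC_Trolololol | 43 | 72 | 65 | 30 | 45 | 53 | 39 | 76 | 43 | 52 | 518 |
| UW_Grande_Letra_O | 44 | 40 | 77 | 48 | 73 | 26 | 48 | 48 | 56 | 55 | 515 |
| UMASS_Skyward_Nord | 52 | 69 | 40 | 43 | 42 | 53 | 66 | 31 | 44 | 66 | 506 |
| HMC_League_of_SmashCraft | 119 | 26 | 38 | 34 | 44 | 39 | 68 | 62 | 31 | 32 | 493 |
| GATECH_dBuggers | 46 | 67 | 25 | 41 | 62 | 39 | 17 | 19 | 62 | 22 | 400 |
| CU_PEDOBEAR | 34 | 13 | 40 | 38 | 43 | 25 | 19 | 22 | 34 | 41 | 309 |
| CMU_Siddant | 15 | 4 | 14 | 16 | 58 | 32 | 3 | 20 | 20 | 16 | 198 |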

 

What I found was that the top 5 teams were consistently there just in different order each time so on any given “Saturday” these teams could win. The others were consistently fighting for the bottom slots. This made me wonder what was going on, especially in the first additional run I did where team League of Smashcraft from Harvey Mudd College trounced everyone.
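
The aggregate totals listed above for each team make this easy to check. The sketch below uses only the ten-game total of each team from the two additional runs; the helper names are illustrative.

```python
# Ten-game aggregate totals, copied from the two additional runs above.
first_run = {
    "HMC_League_of_SmashCraft": 733, "UMASS_Skyward_Nord": 648,
    "HMC_Trolololol": 532, "UW_Grande_Letra_O": 473, "PU_BotBebo": 446,
    "GATECH_dBuggers": 430, "CU_PEDOBEAR": 341, "CMU_Siddant": 226,
}
second_run = {
    "PU_BotBebo": 537, "HMC_Trolololol": 518, "UW_Grande_Letra_O": 515,
    "UMASS_Skyward_Nord": 506, "HMC_League_of_SmashCraft": 493,
    "GATECH_dBuggers": 400, "CU_PEDOBEAR": 309, "CMU_Siddant": 198,
}


def ranking(totals):
    """Team names ordered from highest to lowest aggregate score."""
    return sorted(totals, key=totals.get, reverse=True)


def top_n(totals, n=5):
    """The set of teams in the first n places, ignoring their order."""
    return set(ranking(totals)[:n])


same_top_five = top_n(first_run) == top_n(second_run)
print("Same five teams on top in both runs:", same_top_five)
for place, (a, b) in enumerate(zip(ranking(first_run), ranking(second_run)), start=1):
    print(place, a, "|", b)
```

Comparing the sets rather than the ordered lists is what captures the "same teams, different order" observation; the bottom three places do not move at all between the two runs.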

Team LofS had what I think is a great strategy. Watching them I saw their AI was moving from flag to flag but trying to encounter combat all along the way. Remember you are rewarded for combat as well as reaching flags, and when a flag is reached you regain health. When team LofS was down at a corner flag and there were lots of other competitors trying for the same flag, they would move off the flag, take a shot and then move back on the flag. This yielded maximum points from combat and flags while regaining health each time. As you can see in the first additional run, this strategy ruled. However, when this team had to move between flags they ran the risk of getting killed early because they encountered so much combat before they could regenerate health. The end result is that over many runs LofS usually died ¾ of the way through the round but racked up lots of points before doing so, thereby solidifying their place at the top.

Team Grande Letra O from University of Wisconsin – Eau Claire had a much different strategy and I am still trying to see a pattern without diving into the code. From what I could observe from their runs, they relied on a combat escape sequence. The escape sequence came into play early, so they usually started behind in points while they avoided intense combat. But as the player pool dwindled they were forced to use the combat side of their AI, and at that point they would always mount a phenomenal comeback.

So after reviewing these two, we at Windward have decided to look at the submissions more closely to see what each of the implementations did. This is where I have a request for the open source community. We got a pretty healthy mix of Python and Java submissions followed by C# and a few C++. I would like to send a challenge to anyone who is willing to do the same and see how you stack up against our finalists.

Keep in mind that if you really want to compare yourself you will only work in the eight hour timeframe we allotted for all our teams. I am curious to see how everyone turns out, so if you do take this challenge or host it for a team of your friends, let us know your results at codewars@windward.net

 

http://www.windward.net/codewar/Windward_Code_War_2012.zip

 

 

Windows Update Erroneous Error – Black Screen After Boot

I went to do my daily check of our external servers when I noticed that one that hosts several of our virtual machines was not responding. Since I have a Dell RAC card installed I logged in via the web console to see what was going on. I started an active console session and saw nothing but a black screen and the server was unresponsive. I figured this may have been a glitch in the RAC card web console so I went to reboot the machine via the console. All reboot and restart options were greyed out. At this point it was time for a field trip to the data center.

 

Upon gaining physical access to the machine I saw there was nothing but a black screen on the console so I manually rebooted. The memory test completed, RAID array verification messages flew by and then nothing but a black screen. No error messages at all. At this time I knew I had a sick server and decided to take it back to the office for further investigation.

Upon arriving at the office I loaded up BartPE to take a look. The hard disks were present and the volumes were intact but diskpart.exe was showing an error on the mirror. My first thought was that there were bad sectors on the disk so I went to run a chkdsk to verify this. My second thought was that the mirror was bad, so I was going to break the mirror, check each disk and rebuild with a new disk if needed.

To launch chkdsk I put in the Windows Server 2008 R2 boot disk, booted it up and selected the repair option. I then ran the memory diagnostics. After a thorough check, unbelievably, the server started like normal. A quick check of the disks showed that the mirror was intact and everything was functional. I decided to run chkdsk to scan the disks for errors. This needed to be run on a scheduled reboot. So I rebooted the machine and… same result. Memory and RAID diagnostic messages followed by a black screen and no response from the console.

After a LOT of digging in log files, I found out what was finally happening.

I figured out that Windows Update put a read only lock on the local Windows Update DB.    The updates never successfully completed on reboot.  When I tried to reboot again the updates tried to complete one last time but gave up internally.   At this point a bug was encountered that erases the boot loader config which tells Windows which volume and partition to launch.  Basically it cleared out the boot loader file.   The only feedback I got was a black screen after the memory tests when it booted.

I ran a memory diagnostic again and exited halfway through. This allowed me to boot the OS and manually rebuild the boot config file. However, a reboot triggered the same bug. It was only after doing the whole process a second time that I set the network to DHCP to connect to the internet, uninstalled the Windows updates that had recently come in, reinstalled them and rebuilt the Windows Update DB. After that the updates came in fine, the DB was no longer read-only and the server rebooted successfully each time.
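
This post does not record the exact commands, so treat the following as a rough sketch under stated assumptions rather than the procedure that was followed here. It only prints a commonly documented sequence for resetting the Windows Update store (stop the wuauserv service, rename the SoftwareDistribution folder, start the service again) plus, as a further assumption, bootrec /rebuildbcd from the Windows recovery environment for the boot configuration. Nothing is executed, and the function name and default path are illustrative.

```python
import ntpath


def windows_update_reset_plan(windir=r"C:\Windows"):
    """Return the commands as strings; nothing is executed here."""
    store = ntpath.join(windir, "SoftwareDistribution")
    return [
        "net stop wuauserv",                        # stop the Windows Update service
        f'ren "{store}" SoftwareDistribution.old',  # set the old update store aside
        "net start wuauserv",                       # the service recreates the store
    ]


# Assumption: run from the Windows recovery environment command prompt.
BOOT_CONFIG_REBUILD = "bootrec /rebuildbcd"

for command in windows_update_reset_plan():
    print(command)
print(BOOT_CONFIG_REBUILD)
```

Renaming the folder rather than deleting it keeps the old store around in case something needs to be recovered from it.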

Both hard disks turned out to be fine, but I had to break the mirror in the process as I thought the bad mirror was adding to the issue. I rebuilt the mirror and the server is going back to the data center this AM to be returned to production.

Windows Update robbed a few hours of my life. I would like them back but alas, such is work in IT. Just glad to have the production server back online and I hope others find this useful.