28th March, 2017, 09:00 AM.
Waiting at the office elevator on the ground floor, the lunch bag in one hand and the cell phone in the other...
I have the habit of checking emails on my cell phone, as the crappy elevator often drives me nuts by making me wait a long time.
I pulled down the message list in the email app to fetch the latest. One message caught my attention; the subject said “Site down, Urgent”.
Oh yeah, there was a plethora of such messages with the same subject, as I scrolled down and down and down the screen... incessantly.
All those emails had started coming in from 06:30 AM today. Unfortunately, I hadn’t had a chance to peep into my emails earlier because of a sleepless night yesterday (don’t ask me why, it is a different story.. 🙂 ). I woke up late, got ready for work in a hurry and just rushed to the office.
Quickly opened the latest message. It said, “... Mr. M (//my respected boss’s name//) & company have made our site down...”
“F..king shit..” were the words that came out of me, loud and automatic... The people standing beside me gave me strange looks.
I didn’t give a damn. I started reading the details in the other similar emails and, still reading, rushed into the elevator only when I sensed some commotion among my fellow passengers.
Well, those emails were from one of my clients, saying his website had been completely down since 19:00 that day, bombarding us, complaining about the performance of the system, etc., and asking us to bring the site back up ASAP. It is in production and nearly 30,000 customers have accounts on that site. We have a maintenance contract with them.
“Seems like my day has started crappy... Hmm...” But my thoughts were interrupted as I heard “Third floor...” from the lift boy.
Gave him a thankful look and got out of the elevator.
While walking, I punched in a reply to the latest message, “Just reached office, we will take a look and get back to you…”, and hit ‘send’ as the security guard was opening the main door for me.
Before going to my office, I glanced at the relevant developer’s seat. Empty.
Switched on my workstation monitor and logged in. Searched for the production host in the huge list of saved PuTTY sessions in the multi-tabbed PuTTY.
Found the entry and double-clicked on it. “My bad...” The PuTTY console gave the message, “unable to use key file, the location, blah blah blah..” I was not able to log in to the production Linux box!
I went to the location, E:\ on my desktop, to check for the .ppk file. Boom... there was no .ppk file there for the login auth.
That was good, anyway, because it told me why I was not able to log in. My hard disk had been replaced recently and I had forgotten to back up that .ppk file.
I didn’t have much time to fiddle with the keygen tool to generate a new key and all, as it was a production issue. So I called up my QA person, who was working remotely for maternity reasons.
She answered my call: “Looking at the site-down emails...”
I interrupted her: “OK. Please log in to the production host... and see what’s happening... I’m not able to log in... here...”
“Can you send me the .ppk file you have, in an email, ASAP?” I asked her, as I wanted to debug on my own given the severity and priority of the issue.
Got the .ppk file in 2 minutes. Meanwhile, I figured out (googled) how to attach a .ppk file to a saved PuTTY session for auth purposes.
“Voila”... logged in to the production host. The irony is, I had to figure out some other stuff before figuring out the actual problem.
Just before starting my investigation, I had a look at our internal chat messenger contact list to see whether my in-house developer was at his desk. No, his icon was still offline.
OK, and in a matter of 5 minutes, I found out (using some networking and Linux commands) that somebody (???!!!???) had rebooted the production box at around 19:00 that day, and that some services like Tomcat, Apache etc. had not started on system reboot.
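For the curious, that kind of check can be sketched in a few lines of Python. This is not the exact set of commands used that morning, only a minimal illustration. It assumes a Linux box that exposes `/proc/uptime`, and that Apache and Tomcat listen on their usual default ports (80 and 8080). The function names `boot_time`, `port_is_open` and `down_services` are made up for this example.

```python
import socket
from datetime import datetime, timedelta


def boot_time(uptime_text, now):
    """Work out when the box booted from the contents of /proc/uptime.

    The first field of /proc/uptime is the number of seconds since boot.
    """
    uptime_seconds = float(uptime_text.split()[0])
    return now - timedelta(seconds=uptime_seconds)


def port_is_open(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def down_services(host, services, check=port_is_open):
    """Return the names of services whose port is not accepting connections.

    `services` maps a service name to the port it is assumed to listen on.
    """
    return [name for name, port in services.items() if not check(host, port)]
```

On the server itself, one might call `boot_time(open("/proc/uptime").read(), datetime.now())` to see when the last reboot happened, and `down_services("localhost", {"apache": 80, "tomcat": 8080})` to list what did not come back up. The ports are assumptions and would need to match the actual setup.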
And the resolution was pretty quick and a cakewalk, because I had made our guys document all the necessary steps in our local Redmine portal.
The next steps, like informing the client and creating a new task to add some init scripts, happened pretty fast.
The reason I’m posting this story is not that we had a problem, found the root cause and gave the resolution.
It is about how people behave towards others when they have suffered a loss and are seeking help in a helpless state. It seems we need to think about how we would feel and behave if such a problem happened to us, instead of deriving useless meanings. The response should be productive.
I don’t know.
And in a different context: this is appraisal period. Any hung-up souls should REBOOT themselves to refine or rejuvenate their inner resources in order to set the targets for the coming new year... Happy Ugaadi.