A defender has to think of thousand ways, an attacker has to think of just one.
First job, first experience and the new hires group I am part of, decides to have a Hackathon for the new hires. A Hackathon that is true to its literal sense. I said “Why not? Let’s do it.” I always wanted to be on the other side of the event though.
I thought to myself if I’m going to conduct a Hackathon, why not conduct one that is Microsoft wide. Truth dawned and I realized that I should stop kidding. New into the company and you want to test the mettle of the ones who have been in the system for this long, that’s not just bold, that’s probably many levels ahead of that.
Voila! The bunch of folks who took this project actually wanted it to be at least Microsoft IT wide. The gang-leader still tried to keep it low profile until we have something substantial, but a Hackathon, which was MSIT wide was definitely on the cards. I was like a kid who has been told that you have been given the responsibility to test the mettle of the entire world.
We brainstormed a lot for hours and then, started the marathon coding sessions. I thought that we will do this and that and see how it pans out. But by the end of the day, it was pretty clear that I am still no one who can pull this event off. All I had was the lamest SQL hack that anyone can embed in an application and the simplest form of application that can be reverse engineered (which I later realized that I wasn’t supposed to code it anyway as that was part of another hacking event). And on that, another epic moment dawned on me, how are we going to identify that someone was hacked and who was that someone’s hacker.
Bummer! All for so much enthusiasm. The gang-leader was still very much optimistic and ecstatic about whatever the day had delivered. I wanted to join him but all I could think of was either this guy is actually a pro or we are just destined to doom. We left at 2 that night. And to be honest, I had no clue what exactly did we do that took us so much time.
Taking a step back, one thing which was pretty clear in our mind was that there would be two different hacking events which would go on in parallel. The one that I was working on, was where every team would be given a server of their own which was running on a VM. A web form application, which had bugs, was deployed on these servers (VMs) with an extra catch that everything including the Application Pools were stopped. The other event consisted of a hackable central server. The latter event was more like a game where difficulty level increases with every difficulty one just surpasses.
Then began our next night-out-coding session. There were a few informal hangouts about how to go about completing this project, but this probably was the only proper coding session after the first night. There was something different about this session though. The moment I entered the conference room where we all were meeting, I was greeted with this giant and beautiful architecture of the whole system. God, that was big! The first thing that popped in my mind was who can actually even imagine drawing that. No perks for guessing, it was the gang-leader. He went on to explain how he envisions the whole architecture. The architecture wasn’t complicated per se. In fact, if I were to draw one such, I would also have drawn one which was something closer in resemblance. But the point is – It was just big… insanely big.
Anyways, we started to code. I picked my designing part of the client side server. In our informal hangouts, we had discussed a few vulnerabilities that we can use and expose for exploitation. Still unclear about how to identify the hacker and who hacked whom, I picked on six or so vulnerabilities and started working on them.
A hacked webpage would lead the server script to crash. In other words, an exception would be thrown. It was this Error 500 that we relied on to identify the hack. A custom error page for Error 500 was designed. When a page crashed, it invoked this page. This error page, in turn looked for a shared folder where flags (GUIDs, basically) mapped to the corresponding user’s crashed page were stored. This GUID was displayed to the attacker which he had to submit to us so that we can identify the attacker and the one who was attacked. The folder was refreshed with newer flags to maintain extra cautiousness against a “friendly-play” (Friends generally follow the “tu bhai hai naa!” (Come on man! You’re my bro.) strategy. They’ll ask their losing friends to share their flags. The already losing friend thus decides that they are losing anyways, why not let your best friend win.)
Though this flag generation and distribution module might appear small, it was the most critical module in the whole architecture as it was the only way we could identify someone who did the attack and the guy who was attacked. To keep it totally aloof from any crash, it was again divided into two separate modules – one module just cared about the flag generation and the other one just cared about the distribution process. This helped in avoiding deadlock conditions and conditions where a lock could be on hold for long when the files were queued for multiple read operations. Any key assigned that was older than the currently generated one by a level of 2 became invalid. Further these were developed as Windows services which started on boot to keep it hidden from users’ view.
After 3-4 such marathons, we were ready with everything. Everything… Sigh! So much for the hullabaloo. Then the testing season started. And the initial test phases passed off with flying colors. I was actually ecstatic. We syspreped the VHDs for client and server thinking that everything that was supposed to be done is done and we were ready for the word Go! But, thanks for one curious fella who still wasn’t satisfied.
We sat one night together to test it out from head to toe in totally isolated conditions with none of our credentials involved. And then every horror that we could have imagined came to life. The feeling could be described in two simple words – Nothing worked.
The first thing that we found was, while creating the application, I left my credentials involved somewhere in it. So, when the system started in total isolation (as an administrator), the application looked for my credentials (which never existed in the first place on the server) instead of kicking off things as an administrator. Trying to debug it piece-by-piece became a pain in the butt. Realizing that it probably would become much more tedious, I created everything from scratch without using my credentials.
The next thing we found was, no matter how current the flags were and howsoever valid they happen to be, the central server said the flag being submitted by the capturer is invalid. We banged our heads to figure out the issue for nearly more than 3 hours but to no avail. With no plausible visible issue and out of frustration, I started counting the number of characters in the GUIDs (on someone’s suggestion, can’t recall who). Much Ado About Nothing. It turns out that flags being generated and the ones being distributed differed by that last always eluding character. But hey, anything that fixes your pending bug is soothing.
Then it turns out that a person can hack himself. That probably could have been the worst thing to have happened. All everyone had to do was submit all his flags when he failed in his endeavours to hack others. Yes, he would lose flags for submitting his own flag, but then he would get the points for at least submitting the flags, won’t he?
This whole thing was already blowing up in our face. What if something like this happens when the event is actually on? I can’t even fathom the consequences. Anyhow we held our senses instead of going into panic mode and carried on with our work.
This time to be extra cautious, we created multiple fake participants and started the game again. Surprise, surprise! It blew up again. The culprit this time again were the bloody flags. Though it may not appear as a big issue, it was very subtle. The distribution of the flags was pathetically slow. Be it a network issue or a processing power one, this should not happen. What if the flag expires before it is actually distributed by the distributing service. The flag owner will go on to be the best defender without even touching his system. Small fix, but it definitely needed one.
All it needs is that one last kick to make you feel that you just can’t do it. Feeling that somehow everything is working as it is supposed to work, I was beginning to feel now that maybe, just maybe, we can pull this off. But there has to be that one last thing. The last night before the event, we again tested the whole scenario from end to end. The issue this time couldn’t have been subtler. We realized that after about half an hour or so, everything just about came to a freeze. Something was eating up the whole memory. We always felt this but we never paid any heed to it, given the kind of issues that kept coming up. Even after a reboot, the same thing happened. Looking up the memory usage, it was clear that the utilization shot up just after a few minutes. It was just that it was almost a complete freeze which happened after that long duration. The issue this time was with an exploit we wanted the gamers to explore. But as the general bugs are, the developers don’t have any clue about it.
I, for one, actually felt as if I would get lost into the code just by looking at it. The code was just perfect, at least it appeared to be. Think man, you just don’t have that spirit. Which coding principles did you evade when coding which led to this? What could have been that blunder?!
Logs, they always come in handy! It is one thing which differentiates between a good and an awesome programmer. One doesn’t understand the essence of logs unless he experiences it firsthand. Always leave a trail somewhere so that you can think through your mess. The trail is your guide to improvement. When staring at the code didn’t work, we sorted to logging. A dummy log was created to look into the issue and bam, there you have your fix! The issue this time was an open port which was left opened in the memory and was never closed. But ports? I didn’t do any port.open() thingy. And here fellas, you realize another truth, “Why one should not rely on garbage collection!” There was a disposable object which was created and was supposed to be disposed which opened this port. So, we had to dispose it manually. Tired of all the staring, we syspreped for the upteempth time and then left with a sad face. Maybe this was just not supposed to be.
So what’s the point of this all? Why this big write up? Developers just write the code and think that they have won the battle. All it took was a few night outs to realize what a failure it can be if it were only the developers who drive this tech industry. Testers know their way and man, they know it well! They know how to break and what will make the system break. They are the dudes!
And as for the event, it went kickass! :)