# Git – VSO (Onboarding/Migration from TFS)

Overview

Before we start with Git (in VSO), let's revisit the difference between a centralized VCS (version control system) and a distributed VCS. With a centralized VCS, the repository is stored on a central server and developers check out a working copy. With a distributed VCS, every developer maintains a full copy of the repository, including its history. Carrying this forward: TFS stores the entire repository along with its history on a central server, while with Git, developers clone the entire repository, history included, to their own machines and maintain it there. So, as we can see, the version control paradigm itself is different when it comes to Git.

Advantages

• Branching and merging – Branching and merging work like a breeze in Git. Whenever a developer has to test something, all he has to do is create a new branch in his own local repo, without touching the main branch's code (since everybody maintains the entire source base, VSO acts as a reference that can be thought of as the untouched code base).
• Offline repo access – Every change made, every code check-in, the entire change history is available offline. You never have to be online to retrieve the history of any check-in (think: relief from the VPN connection).
• Saving your team from build breaks – Every change you make happens on your own branch. This avoids the hassle of rushed check-ins pushing potentially unstable changes to your team's reference repo.
• You commit constantly, saving your work as you code – As most developers code on their own local branch (without affecting the central repo), you can always commit your code to your own branch. This prevents the loss of changes that can happen with a blind "get latest and overwrite", especially at the peak of essential and important code changes.

Disadvantages

• Initial learning curve – The first and most-complained-about disadvantage of using Git is the initial learning curve.
• Code duplication/redundancy – Everyone maintains the code base, which leads to source code redundancy. Then again, this can be viewed as a boon: no one depends on the central server for the code base, and anyone can clone the Git repo from another developer's machine without even going to the Internet.

Before we begin

First off, this document relies heavily on command-line tools. So, be warned.

Tools that might come in handy:

• Git (obviously)
• PoshGit (for a rich PowerShell integration)

Other things to do/keep in mind:

• While installing Git, use the option to integrate with the Windows command prompt, and let Git be added to the PATH variable.
• Enable Alternate Credentials in VSO so that your PowerShell/cmd commands can talk directly to VSO. You can do that by opening your VSO account and following the steps shown in the pictures below.

You can also set a shorthand username if you want and then use that username whenever prompted on the PS/Cmd prompt.

Creating a Git repo

With this small introductory difference covered, let's start with the basics of Git.

There are two scenarios when it comes to putting a project under Git tracking.

1. You are starting the project. So, you basically need to import your project into Git.
2. You are cloning an existing Git project from another Git server.
• Let's start with the first method. If you are starting a new project and you want Git to track it, you will first have to set up your project folder for Git. It can be done with a very simple command –
git init
The above command creates a new hidden subdirectory, .git, which contains all the necessary files, including the repo skeleton. Keep in mind that initializing the folder with Git hasn't added any of your files to tracking; only the folder was set up for Git tracking.
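As a quick sketch of what this looks like end to end (assuming Git is installed; the folder and file names here are illustrative):

```shell
# Initialize a brand-new folder as a Git repo.
mkdir DemoProject && cd DemoProject
git init                    # creates the hidden .git subdirectory

# The repo skeleton lives in .git, but nothing is tracked yet:
echo "hello" > readme.txt
git status                  # readme.txt appears under "Untracked files"
```

Only after `git add` (the next step) does Git actually start tracking the files.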

If you use PoshGit for the above command (just launch PowerShell in the current folder and run git init, or open PowerShell in a Git folder and PoshGit picks it up automatically), you will see a red-lettered count appear beside the name master. This count indicates the number of files/folders still to be tracked by Git. We can now add all these files for Git tracking with the command –

git add *    # Adds all the files and folders in the current folder. You can also add individual files the same way.

Let's commit these changes as our initial commit, and our first Git project is ready.

git commit -m "First commit"

Note: There will be cases where you want some files to exist but not be checked in. Prime examples are the bin and obj folders. You can do so by listing their paths in the .gitignore file, which lives in the root folder of your repository.
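For instance, a few lines like these keep build output out of Git (the patterns are illustrative; adjust them to your solution layout):

```shell
# Append typical build-output patterns to .gitignore in the repo root.
cat >> .gitignore <<'EOF'
bin/
obj/
*.suo
*.user
EOF

# Files matching these patterns no longer show up as untracked:
git status
```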

• Now, let's look at the second case, which is the more common one. If you have to get an existing Git repo so that you can start contributing/developing, you can do that with a simple command –
git clone <VSO_URL> FolderName
The above command creates a directory FolderName, initializes a .git directory inside it, and checks out a working copy of the latest version from the main branch with all its history. Cloning in Git can be equated with creating a workspace and doing a Get Latest in TFS.

Note: Checkout in TFS is not the same as checkout in Git. In TFS, when we check out a file, we open it for editing. In Git, when we do a checkout, we switch to a branch at its latest commit. So, if we do a checkout on the current working branch, we lose all the uncommitted changes in our working branch, and Git points back to the last commit.
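A small demonstration of that difference in a throwaway repo (file names are illustrative):

```shell
# One committed file in a fresh repo.
git init demo && cd demo
echo "v1" > notes.txt
git add notes.txt
git commit -m "First commit"

# Edit the file, then "check it out" the Git way:
echo "v2" > notes.txt
git checkout -- notes.txt

# The uncommitted edit is gone; notes.txt is back at the last commit ("v1").
cat notes.txt
```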

Migrating from TFS to Git in VSO

The simplest way to migrate from TFS with all the commit history is to use the CodePlex project git-tf, which depends on a JRE. With git-tf installed, you can migrate your commits to Git with the following command –

git tf clone http://TFSPath:8080/tfs/CollectionName $/Project/Main FolderName --deep

The above command clones the TFS repository with its entire commit history into the folder FolderName.

The problem we now face is that the check-ins carry the users' domain aliases instead of the email addresses which Git uses to track users. If we do a "git log" to check the commits made, we see the domain aliases in place of the emails. We need to rewrite the history of these commits to reflect the email names. Git provides the filter-branch command to rewrite history, which obviously makes it a very powerful tool. Using this command is considered bad practice in an ongoing project, so one should use it with caution and only in the rarest of emergencies.

We are going to use the filter-branch command in a Git script to rewrite Git history and map the users to their commits. The following script replaces a domain alias in the log with its email address ("alpha" is the assumed alias here) –

git filter-branch -f --env-filter '
ALIAS="FAREAST\\alpha"
CORRECT_EMAIL="alpha@microbeta.com"
if [ "$GIT_COMMITTER_EMAIL" = "$ALIAS" ];
then
    export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$ALIAS" ];
then
    export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
fi' -- --all

(Mind the two hyphens before env-filter.) You can find the list of all the aliases to be updated with the following command –

git log --format='%aE' | sort -u

Difference between local branch and remote/origin branch

One of the core fundamentals where Git and TFS differ is their implementation of the branching mechanism. To begin with, as Git is a distributed VCS, the notion of branching exists on every developer's repo. This means a developer can create a branch on his own local system without even touching the central reference server (VSO, when used in integration with it). So a developer's machine can have an extra branch – say, Harsh2 – which doesn't exist on the remote reference repo in VSO. However, as we know from our TFS experience, branching in TFS happens on the central TFS server, and we just map our workspaces and check out files for editing.

Branching and Merging

No introduction to Git is good enough without actually explaining how Git manages branching and merging. Branching in Git is as simple as

git branch BranchName

You can then switch to this branch with

git checkout BranchName

Or, simply, you can use the shorthand

git checkout -b BranchName

to create and directly switch to the branch BranchName. The branching mechanism is so simple that developers branch out from the feature branch even for a bug fix, so that they don't end up checking in faulty code even by mistake. Once done with the required changes, they switch back to the feature branch, do a merge and then delete their test branch, if they deem fit.

Let's take this in more detail with a use case. Your team Alpha is working on project Omega, which uses Git as its VCS. A bug suddenly comes up in the main branch and needs your immediate attention. How will you go about it?

• Commit all the changes you were working on (in your local branch).
git commit -m "Some relevant message"
• Switch to the main branch (say, main).
git checkout main
• Create a new branch from the main branch for the hotfix and switch to it.
git checkout -b hotfix
• Work out all your changes for the fix and then commit.
git commit -m "Hotfix for the bug abcd"
• Switch back to the main branch and merge.
git checkout main
git merge hotfix
• Now that you are done with the fix and merge, you can delete the hotfix branch, switch back to your local dev branch and carry on with your work.
git branch -d hotfix    # -d – the delete option

As can be seen above, branching and merging really are a breeze in Git. Often, the files a developer is working on are not being worked on by him alone, which leads to conflicts when checking in. One can resolve them with git mergetool, or do a simple merge inside VS and then do a check-in.

Note: At any point during development, if you want to check the status of the changes in your project, you can use the command "git status". It gives a list of all files that have been changed, added, deleted or renamed. Git not only keeps track of all the files it has been tracking in the Git folder; it can also see files which have been created but not yet tracked. "git status" shows these in a separate bucket, as untracked files.

Rollbacks

The simplest way to roll back the changes to a file since the last commit is

git checkout FileName

If you want to roll back your changes to all the tracked files in the current folder, you can try

git checkout *

If, however, you want to reset the entire repo to the last committed state, you can use

git reset --hard    # mind the two hyphens before "hard"

You need to be cautious here, however: a hard reset throws away every uncommitted change in the repo.
Note that a plain "git reset --hard" resets to the latest commit; it does not move you back a commit. To drop the last commit as well, use "git reset --hard HEAD~1" – run it a second time and you go back one commit further.

If you want to remove untracked files (files which you had created but never added to Git tracking), you can try

git clean -d -f    # d – directories, f – forced

You again need to be cautious here, as there is no going back. If you want to preview all the damage you are about to do with the above command, you can do a dry run with the option --dry-run (two hyphens before dry-run) or -n.

Fetch/Pull/Push (and Sync in VS)

When you are done with all your changes in your local branch and have committed everything, you will probably want to sync these changes to the central server. But before you do that, you probably want to do a "Get latest" and merge any changes that exist. The commands for this are fetch and pull.

• git fetch
• git pull

In the simplest terms, "git pull" does a "git fetch" and follows it up with a "git merge". One can do a "git fetch" any time to update the remote-tracking branches; this won't affect the local branches, though. Developers, in fact, do this on a regular basis to keep their remote-tracking branches updated. You do a "git pull" when you think your local branch is ready to be updated with the remote changes. "Sync" is not a Git concept, however; in Visual Studio, Sync combines these operations (a pull followed by a push), with many other options available.

A beautiful thing to mention here about Git: if you had committed your changes to your local branch and did a "git pull" later, the merge into your local branch is again treated as a new commit. So when you push to the remote branch, you will see that you actually made two commits instead of one – the first was your change commit and the second a merge commit, if Git had to do a merge on "git pull".
Now that you have pulled all the remote changes and merged them with your own changes, you can push them to VSO by simply running the following command –

git push

Note: You will be asked to enter your alternate credentials when using the above commands from PowerShell.

Some other issues and their workarounds:

• Currently, if your file path exceeds the 260-character path-length (MAX_PATH) limit set by Windows, you won't be able to check in, or check out for edit, any such file. There are two ways around this problem:
• Keep your root project folder at the first level of your directory tree, and keep its name as short as possible.
• Create a virtual drive on the project folder itself, using Command Prompt or PowerShell, and then do the check-in; this shortens the directory path. Use the command subst to substitute the path with a virtual drive, and then switch to the drive just created. After you are done with all the commits, you can switch back to your original drive and delete the virtual drive with the option /d. If you forget to delete the drive, it gets deleted automatically on a system reboot.
• You can also impersonate someone else and commit as him through Visual Studio's Git settings. But you can still find out who actually made the commit in VSO; it's just that you have to view the complete Git commit in VSO to see who did it.

# Design & Implement Azure Storage and Monitoring Azure VM

Today I had to give a session for the Developing Azure Solutions certification on Design & Implement Azure Storage and Monitoring Azure VM. This is what I ended up creating for the session. The sway covers Storage Pools & Spaces and how to configure them. It also covers geo-replication and disk caching. In the monitoring section, the sway talks about enabling diagnostics and configuring it. It also covers endpoint monitoring, alert configuration when a threshold value for a VM metric is reached, and the other monitoring metrics that come along with it. Once again, the link is here.
# A Tester can be the best attacker

A defender has to think of a thousand ways; an attacker has to think of just one.

First job, first experience, and the new-hires group I am part of decides to have a Hackathon for the new hires. A Hackathon true to its literal sense. I said, "Why not? Let's do it." I always wanted to be on the other side of the event, though. I thought to myself, if I'm going to conduct a Hackathon, why not conduct one that is Microsoft-wide? Truth dawned and I realized that I should stop kidding. New to the company and you want to test the mettle of the ones who have been in the system this long? That's not just bold, that's probably many levels ahead of that.

Voila! The bunch of folks who took this project actually wanted it to be at least Microsoft IT-wide. The gang-leader still tried to keep it low-profile until we had something substantial, but a Hackathon which was MSIT-wide was definitely on the cards. I was like a kid who has been told that he has been given the responsibility to test the mettle of the entire world.

We brainstormed for hours, and then started the marathon coding sessions. I thought we would do this and that and see how it pans out. But by the end of the day, it was pretty clear that I am still no one who can pull this event off. All I had was the lamest SQL hack anyone can embed in an application, and the simplest form of application that can be reverse-engineered (which, I later realized, I wasn't supposed to code anyway, as that was part of another hacking event). And then another epic moment dawned on me: how were we going to identify that someone was hacked, and who that someone's hacker was? Bummer! All for so much enthusiasm.

The gang-leader was still very much optimistic and ecstatic about whatever the day had delivered. I wanted to join him, but all I could think was: either this guy is actually a pro, or we are just destined to doom. We left at 2 that night.
And to be honest, I had no clue what exactly we did that took us so much time.

Taking a step back: one thing that was pretty clear in our minds was that there would be two different hacking events going on in parallel. The one I was working on was where every team would be given a server of their own, running on a VM. A web-form application, which had bugs, was deployed on these servers (VMs), with the extra catch that everything, including the Application Pools, was stopped. The other event consisted of a hackable central server. The latter event was more like a game where the difficulty level increases with every level one surpasses.

Then began our next night-out coding session. There were a few informal hangouts about how to go about completing this project, but this was probably the only proper coding session after the first night. There was something different about this session, though. The moment I entered the conference room where we were all meeting, I was greeted with this giant and beautiful architecture of the whole system. God, that was big! The first thing that popped into my mind was: who can actually even imagine drawing that? No prizes for guessing – it was the gang-leader. He went on to explain how he envisioned the whole architecture. The architecture wasn't complicated per se. In fact, if I were to draw one, I would also have drawn something close in resemblance. But the point is – it was just big… insanely big.

Anyway, we started to code. I picked the design of the client-side server. In our informal hangouts, we had discussed a few vulnerabilities that we could plant and expose for exploitation. Still unclear about how to identify the hacker and who hacked whom, I picked six or so vulnerabilities and started working on them. A hacked webpage would lead the server script to crash; in other words, an exception would be thrown. It was this Error 500 that we relied on to identify the hack.
A custom error page for Error 500 was designed. When a page crashed, it invoked this page. This error page, in turn, looked in a shared folder where flags (GUIDs, basically) mapped to the corresponding user's crashed page were stored. The GUID was displayed to the attacker, who had to submit it to us so that we could identify both the attacker and the one who was attacked. The folder was refreshed with newer flags as extra caution against "friendly play". (Friends generally follow the "tu bhai hai naa!" ("Come on man! You're my bro.") strategy: they'll ask their losing friends to share their flags. The already-losing friend decides that since they are losing anyway, why not let the best friend win?)

Though this flag generation and distribution module might appear small, it was the most critical module in the whole architecture, as it was the only way we could identify who attacked and who was attacked. To keep it totally aloof from any crash, it was divided into two separate modules – one cared only about flag generation and the other only about the distribution process. This helped in avoiding deadlock conditions, and conditions where a lock could be held for long when the files were queued for multiple read operations. Any key older than the currently generated one by a level of 2 became invalid. Further, these were developed as Windows services which started on boot, to keep them hidden from the users' view.

After 3-4 such marathons, we were ready with everything. Everything… sigh! So much for the hullabaloo. Then the testing season started, and the initial test phases passed with flying colors. I was actually ecstatic. We sysprepped the VHDs for client and server, thinking that everything that was supposed to be done was done and we were ready for the word Go! But thanks to one curious fella who still wasn't satisfied…
We sat together one night to test it out from head to toe in totally isolated conditions, with none of our credentials involved. And then every horror we could have imagined came to life. The feeling could be described in two simple words – nothing worked.

The first thing we found was that, while creating the application, I had left my credentials somewhere in it. So, when the system started in total isolation (as an administrator), the application looked for my credentials (which never existed on the server in the first place) instead of kicking things off as the administrator. Trying to debug it piece by piece became a pain in the butt. Realizing that it would probably become much more tedious, I recreated everything from scratch without using my credentials.

The next thing we found was that, no matter how current the flags were and howsoever valid they happened to be, the central server said the flag being submitted by the capturer was invalid. We banged our heads for more than three hours to figure out the issue, but to no avail. With no plausible issue visible, and out of frustration, I started counting the number of characters in the GUIDs (on someone's suggestion – can't recall whose). Much Ado About Nothing: it turned out that the flags being generated and the ones being distributed differed by that last, always-eluding character. But hey, anything that fixes your pending bug is soothing.

Then it turned out that a person could hack himself. That probably could have been the worst thing to happen. All anyone had to do was submit all his own flags when he failed in his endeavours to hack others. Yes, he would lose flags for submitting his own flag, but then he would get the points for at least submitting flags, wouldn't he? This whole thing was already blowing up in our faces. What if something like this happened when the event was actually on? I couldn't even fathom the consequences. Anyhow, we held our senses instead of going into panic mode and carried on with our work.
This time, to be extra cautious, we created multiple fake participants and started the game again. Surprise, surprise! It blew up again. The culprit this time was, again, the bloody flags. Though it may not appear to be a big issue, it was very subtle: the distribution of the flags was pathetically slow. Be it a network issue or a processing-power one, this should not happen. What if a flag expired before it was actually distributed by the distributing service? The flag's owner would go on to be the best defender without even touching his system. A small fix, but it definitely needed one.

All it needs is that one last kick to make you feel that you just can't do it. Feeling that somehow everything was working as it was supposed to, I was beginning to feel that maybe, just maybe, we could pull this off. But there had to be that one last thing. The last night before the event, we again tested the whole scenario end to end. The issue this time couldn't have been subtler. We realized that after about half an hour or so, everything just about came to a freeze. Something was eating up the whole memory. We had always felt this, but we never paid any heed to it, given the kind of issues that kept coming up. Even after a reboot, the same thing happened. Looking at the memory usage, it was clear that utilization shot up after just a few minutes; it was just that the almost-complete freeze happened only after that long duration.

The issue this time was with an exploit we wanted the gamers to explore. But as general bugs go, the developers didn't have any clue about it. I, for one, actually felt as if I would get lost in the code just by looking at it. The code was just perfect – at least it appeared to be. Think, man, you just don't have that spirit. Which coding principles did you evade while coding that led to this? What could that blunder have been?!

Logs – they always come in handy! They are the one thing that differentiates between a good and an awesome programmer.
One doesn't understand the essence of logs unless he experiences it firsthand. Always leave a trail somewhere so that you can think through your mess. The trail is your guide to improvement. When staring at the code didn't work, we resorted to logging. A dummy log was created to look into the issue, and there it was, right in our face! The issue this time was a port which was left open in memory and was never closed. But ports? We didn't do any port.open() thingy in that module. And here, fellas, you realize another truth: why one should not rely on garbage collection! A disposable object had been created (which opened this port) and was supposed to be disposed of, so we had to dispose of it manually.

Tired of all the staring, we sysprepped for the umpteenth time and then left with sad faces. Maybe this was just not supposed to be.

So what's the point of all this? Why this big write-up? Developers just write the code and think they have won the battle. All it took was a few night-outs to realize what a failure it can be if it were only the developers who drove this tech industry. Testers know their way, and they know it well! They know how to break things, and what will make the system break. They are the dudes! And as for the event, it went kickass! :)

# Downloading the web folder

People generally ask, "How can we download the whole web server?" They keep looking for different software to do this, but they forget that they already have one, built in (generally) to their very own Linux system. All they have to do is run

wget -H -r --level=1 -k -p http://domain_name/address_of_the_folder

to download the folder. You can change the level of recursion for the download by changing the value of --level according to your needs. That's it! You have your web folder ready with you.
# The Must Have in Linux

Every time I install a distribution of Linux (generally Fedora) on someone's system, the first question the owner asks me after the installation is what other software he may need apart from the ones already installed. Well, this post is about those extra applications, and it covers almost everything one may need. If you feel I have missed out on something, you can add a comment and I will add it to this list.

Autoplus+
I start off with this simple script that will rid you of most of your headaches. Flash, Google Earth, Skype, audio-video codecs, VirtualBox, Imagination, DropBox – it installs just about all the daily-usage things. The only catch is that it works only for Fedora.

Chrome
The browser you just cannot miss is Chrome. Yes, Firefox is already there on most distros, but you just cannot miss this one.

Guake/Yakuake
A drop-down console that keeps your terminal at your fingertips. While Guake is meant for Gnome, Yakuake is meant for KDE.

VLC (VideoLAN)
It's like the list is never complete if you don't see VLC on it. This amazing open-source player runs on almost all systems, be it Mac, Windows, Linux or any other UNIX implementation.

Amarok
Talk about songs, and one just can't forget Amarok. Though intended for KDE desktops, it works equally well on Gnome. The lack of multimedia-key support in Gnome can also be worked around with the Gnome multimedia keys script.

Xchm
Again, Okular is there to support .chm files, but it comes nowhere near xchm. Try it to see the difference.

XBMC
XBMC media center is another open-source media hub for a TV experience on your laptop. The skins are so beautiful that you will definitely fall in love with it.

VirtualBox
This freely available virtualizer is a must for any geek (be it a tech or non-tech one). Try a new operating system or run a Windows application – you will definitely find it handy.
VMware
Another virtualizer, but with extra command over the hardware, which leads to higher data-transfer speeds. Even the networking and network configuration are easier.

Qemu
Qemu is both an open-source emulator and a virtualizer. If you are a tech geek, you must give this a try.

Wine
Wine lets you run your Windows applications straight on your Linux. It is not an emulator in the strict sense, as it does not emulate each processor instruction as any other emulator would; instead, it provides the software libraries which Windows software may require during installation. It is still under active development.

Unrar
This adds the rar codec for extracting rar archives.

A comic book geek? Well, these cbr and cbz readers are definitely for you.

JDownloader
If you are a heavy Internet user who downloads stuff from RapidShare, MediaFire, HotFile or other such file-sharing sites every day, then this download manager will be a boon for your daily dose.

GoldenDict
As the name says, this is a freely available dictionary which will always be running in the notification area for your help.

NetBeans
A fully-fledged IDE, completely written in Java. PHP, Java, C/C++, Groovy or Ruby – you can do your development with this IDE.

HandBrake
HandBrake is an open-source media converter with a clean and simple interface. You can also convert your media files to the mkv container format.

GParted
GParted is partitioning software to create, resize, move, delete, format and reformat your partitions. It can also format a partition with the NTFS file system.

RecordMyDesktop
RecordMyDesktop is a desktop-session recorder which is both easy to use and easy to configure. It comes in both command-line and GUI modes.

TrueCrypt
A powerful encryption tool that can be used to create on-the-fly encrypted volumes and partitions/drives.

Super Grub Disk
This will come in real handy when you are in need of a system rescue. It will help in restoring your boot loader.
And as a download manager, don't forget to add DownThemAll to your Firefox.

One more thing I would like to add: while configuring your PPP modem, the libusb1-devel rpm/deb package is generally missing on your system. So, don't forget to install it before you start configuring your PPP modem.

I also felt like including software such as GIMP, Brasero, ffmpeg, Totem, OpenOffice/LibreOffice, Transmission/KTorrent and nmap, but they generally come bundled with almost all distributions. Well, you may also enlist your recommendations in the comments if I have missed a few. I will update this list accordingly. :)

# The annoying 'yum Error'

One of the most annoying errors I have faced on Fedora is the yum error:

Error: Cannot retrieve repository metadata (repomd.xml) for repository: fedora. Please verify its path and try again

After a lot of googling and going through forums, I have made a list of solutions that can fix this problem.

-> Sometimes you may face this error just after a fresh install. It can then be fixed with

yum clean all
yum clean metadata
yum clean dbcache

This can also help in cases where yum was working just a few hours back, the problem rose suddenly, and you have no idea why.

-> One of the most common fixes is editing the fedora repo file: uncomment the baseurl line, comment the mirrorlist line, and then edit the /etc/hosts file, adding

80.239.156.215 mirrors.fedoraproject.org
213.129.242.84 mirrors.rpmfusion.org

Well, it is the most common fix, but it has never helped me.

-> Another fix is to disable the repo that causes this error and then do the yum update. This is what I found in some forums, but none of the solution-seekers were satisfied. One can still try it; it might be just as helpful.

-> Sometimes you may need to fix the rpm db.
Type

rpm -vv --initdb

If one still gets an error, he can go further and rebuild the db:

rm -f /var/lib/rpm/__db*
rpm -vv --rebuilddb

-> If you are behind a proxy, you may have forgotten to export the proxy settings. You can do so with

export HTTP_PROXY=http://username:password@IP:port
export FTP_PROXY=http://username:password@IP:port

For a permanent solution (for console/text-mode Internet access), create a proxy.sh (or proxy.csh) file in /etc/profile.d/ containing

export HTTP_PROXY=http://username:password@IP:port
export FTP_PROXY=http://username:password@IP:port
export http_proxy=http://username:password@IP:port
export ftp_proxy=http://username:password@IP:port

and then log out and log in again. And for yum itself, add

proxy=http://username:password@IP:port

to /etc/yum.conf and do the update.

# Compiling Hadoop codes

After wandering for around a month and a half and pulling my hair off my head, it feels good when the work starts heading in a definite direction. Understanding how Hadoop works and actually coding for it are miles apart. One of the problems I faced was that I couldn't compile any of the Hadoop code I wrote – not even the examples given in the books. The error that came up looked something like

xyz.java:5: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.Path;
^
xyz.java:6: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.*;
^

and so on. The basic problem is the classpath. We need to set the classpath when compiling so that the Hadoop library files can be referenced during compilation. This can be done with

javac -classpath hadoop-common-0.21.0.jar <filename.java>

You can add the -verbose option to the command line to see what's actually going on during compilation.

Though I did this on Linux, the OS doesn't really matter; the same syntax applies on Windows as well.

With this, you are done with the compilation of your Hadoop code. Jar your files and then execute them.