Streamline your TFS to Git migration with Gitflow
As a long-time TFS user (and a VSS user before that), I only knew what is commonly called centralized version control (CVCS). I "bound" my code to a remote repository, which let me get the latest changes and start modifying them locally. Editing a file locally told the server that I had "checked out" that file, either exclusively or non-exclusively, to avoid conflicts with other users. Essentially, my local copy of the repository was a live, connected representation of what was on the server in real time. This model works fairly well for a small, co-located team with an on-site server, where you can collaborate very closely with your teammates. That's how we used to organize our workforce: everyone under one roof.
Where this model starts to fall apart is in how software teams work in today's distributed world. In the brave new remote world I've lived in for the past decade, centralized anything can be a challenge. If I'm trying to edit a locked file at 2am while the teammate who locked it is asleep, my options are to go to bed or to make my changes offline and hope I can reconcile them once the file becomes available. I'm not saying there aren't processes you can layer on top of CVCS to avoid those kinds of conflicts, but I've found that most of them are quite expensive to implement in terms of resources and productivity.
Imagine branching a large project for a small feature. In TFS, a complete physical copy of the entire project is made on the server and then pulled down locally before you can start making changes. Now imagine 10 developers all working on features for that same project. Expensive, complicated, and slow all come to mind. Not to mention that without a server connection, all your branches and copies mean nothing: you have no change history, no idea how the remote branches relate to each other, and you can't even diff or merge.
I'm only scratching the surface here, but I think you get the point. My team and I had simply reached the stage where we needed to find a better way.
So, Git to the rescue? Yes and no. Git, arguably the most popular distributed version control system (DVCS), doesn't quite get it done on its own. With all its power, Git can introduce an entirely new set of headaches, some of them more painful than any centralized horror story I've encountered. Git is so powerful, in fact, that a single developer can rewrite (rebase) the history of a shared branch. They can then overwrite (force push) any number of 'remotes' with their version of history, wreaking havoc on everyone who copied (cloned) that repo before the rewrite. This is possible because when you clone or create a Git repository, its entire history comes with it, right there on disk in the inconspicuous .git folder. Powerful? Yep. Awesome? Yep. Scary as hell? Yep. It's peer-to-peer coding; it's decentralized; it's both connected and disconnected at the same time; it's almost anything you want it to be, in almost any process you want to use it with. "I see your version of history, but this is how it really went down."
git push origin master -f
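To see why that one command is so scary, here's a minimal sketch you can run in a scratch directory. It assumes git is installed. A local bare repository stands in for the shared server, and the identities, file names, and commit messages are made up for the demo. Amending a commit that has already been pushed gives it a new hash, so a plain push is rejected as non-fast-forward, while -f quietly replaces the server's history.

```shell
#!/usr/bin/env bash
# Sketch: rewriting pushed history and forcing it onto the "server".
# Assumes git is installed; a local bare repo stands in for origin.
set -e

WORK=$(mktemp -d)
git init -q --bare "$WORK/origin.git"
git clone -q "$WORK/origin.git" "$WORK/me" 2>/dev/null   # hides the empty-repo warning
cd "$WORK/me"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"; git config user.email "demo@example.com"

echo one > file.txt; git add file.txt; git commit -qm "first"
echo two >> file.txt; git commit -qam "second"
git push -q origin master

# Rewrite the last (already pushed) commit: new message, therefore a new hash.
git commit -q --amend -m "second (rewritten)"

# A plain push is refused: the server's tip is no longer an ancestor of ours.
if git push -q origin master 2>/dev/null; then PLAIN_PUSH=accepted; else PLAIN_PUSH=rejected; fi

# -f overwrites the server's branch; the original "second" is no longer on its master.
git push -q -f origin master
REMOTE_MSG=$(git --git-dir="$WORK/origin.git" log -1 --format=%s master)
echo "plain push: $PLAIN_PUSH; server tip now: $REMOTE_MSG"
```

If you ever genuinely need to overwrite a shared branch, git push --force-with-lease is a somewhat safer variant: it refuses to overwrite the remote branch if it has moved since you last fetched it.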
Which brings me to Gitflow. Gitflow is essentially a branching model that gives you the safety and control of a centralized system while keeping most of Git's decentralized power. You still have the freedom of having the repository and all its history locally, but releases, merges, and collaboration all happen on the server. The server keeps the 'true' version of what's being worked on (the develop branch) and the 'true' version of what's in production (the master branch). Beyond those two long-lived branches can be any number of branches that match a specific workflow, all kept sane by standard branch naming conventions. For example, feature/my-dev-feature would branch directly off the develop branch on origin, so I can check it out and work locally while pushing to the server (origin) every so often for safety and potential collaboration. It's process, rules, a code of conduct if you will. If Git were life, Gitflow would be the laws we follow to maintain a civilized society. Louis C.K. says it best: "The number one thing preventing murder is the law against murder"... dark, funny, and probably true ;)
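Here's a rough sketch of starting that feature branch, assuming a scratch setup: a local bare repository plays the role of origin, and a throwaway clone seeds it with master and develop (on a real team those would already exist). The branch name matches the example above; everything else, including the identities and file names, is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: start feature/my-dev-feature from origin's develop branch.
# Assumes git is installed; a local bare repo stands in for origin.
set -e

WORK=$(mktemp -d)
git init -q --bare "$WORK/origin.git"
git --git-dir="$WORK/origin.git" symbolic-ref HEAD refs/heads/master

# Seed origin with master and develop (on a real team these already exist).
git clone -q "$WORK/origin.git" "$WORK/seed" 2>/dev/null
cd "$WORK/seed"
git symbolic-ref HEAD refs/heads/master
git config user.name "Seed"; git config user.email "seed@example.com"
echo v1 > app.txt; git add app.txt; git commit -qm "Initial release"
git push -q origin master
git checkout -q -b develop
git push -q origin develop

# A developer clones and branches directly off the server's develop.
git clone -q "$WORK/origin.git" "$WORK/dev"
cd "$WORK/dev"
git config user.name "Dev"; git config user.email "dev@example.com"
git checkout -q -b feature/my-dev-feature origin/develop
echo "work in progress" > feature.txt; git add feature.txt; git commit -qm "Start my-dev-feature"

# Push every so often for safety; -u records origin's copy as the upstream.
git push -q -u origin feature/my-dev-feature

# List the branch names the server now has.
REMOTE_BRANCHES=$(git ls-remote --heads origin | cut -f2)
echo "$REMOTE_BRANCHES"
```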
Merging in Gitflow
You've got a few options here, depending on how you want to design your workflow. Before we go there, though, note that as the developer you have some nice ways to control how the commit history will look before it reaches the remote origin branch. With an interactive rebase (git rebase -i), you can review each commit on your branch and choose which ones to keep as they are (pick) and which ones to combine (squash). So commit small and often, then clean up when you're ready to merge. It's a very powerful way to organize changes so they're easy for others to understand.
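An interactive rebase normally opens your editor on a todo list of pick lines, which you edit by hand. The sketch below scripts the same kind of cleanup so it can run unattended, which is a swap-in for the manual step rather than how you'd usually do it: the follow-up commit is created with git commit --fixup, git rebase -i --autosquash moves it under its target and marks it as a fixup, and GIT_SEQUENCE_EDITOR=true simply accepts that todo list. The repository, branch, and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: commit small and often, then squash before merging.
# Repo, branch, and file names are hypothetical.
set -e

WORK=$(mktemp -d)
cd "$WORK"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Demo"; git config user.email "demo@example.com"
echo base > app.txt; git add app.txt; git commit -qm "Base"

git checkout -q -b feature/login
echo form > login.txt; git add login.txt; git commit -qm "Add login form"
FORM=$(git rev-parse HEAD)
echo validation > validate.txt; git add validate.txt; git commit -qm "Add validation"
# A small follow-up fix, recorded as "fixup! Add login form".
echo "form v2" > login.txt; git commit -q -a --fixup="$FORM"

BEFORE=$(git rev-list --count develop..HEAD)

# By hand: git rebase -i develop, move the fix under its target and change
# its "pick" to "fixup" (or "squash" to keep its message too).
# Here --autosquash prepares that todo list and GIT_SEQUENCE_EDITOR=true
# accepts it unchanged, so no editor opens.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash develop

AFTER=$(git rev-list --count develop..HEAD)
echo "commits on feature/login: $BEFORE -> $AFTER"
git log --format=%s develop..HEAD
```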
The best way for me to think of a pull request in terms of workflow is as a code review. When your process requires a formal review (say, because you're the gatekeeper of a large project with many contributors), a pull request is a great way to collaborate on a proposed merge. After years of doing code reviews, I've found this to be the most satisfying part of my enterprise migration. Combine it with a CI server that builds each branch as it's pushed, and you get yet another layer of quality control whenever a pull request comes in. It doesn't get much better than that for a formal review and merge process. Tools matter here, though. Products like GitHub, Bitbucket, Stash, and others exist to make this as easy and second-nature as possible. The commit message is only the start of the discussion; it's the reasoning, the collaboration, and the line-by-line comments that make this such a powerful way to integrate code.
Fast-forward vs 3-way merging
In TFS, every merge is a 3-way merge: when merging changes from one branch to another, you get a dedicated merge changeset to resolve potential conflicts, even when there are none to resolve. Git streamlines this a bit with what's called a fast-forward merge, used when your changes remain on a linear path. For example, if you branch from develop, make your changes on a feature branch, and merge back in before anyone else changes develop, your commits are simply stacked on top of develop. No 3-way merge is required because the path from develop to your feature stayed linear, which lets the develop branch "fast-forward" to the end of your merged commits. Now, you could just as easily have made the changes directly on develop, but why would you when you have this kind of safety? You can branch cheaply, just in case someone commits changes while you're making yours, and then merge back in as if the branch never happened. Safe, clean, and even one click with the right tools. When Git can't perform a fast-forward merge, it falls back to a standard 3-way merge, which lets you resolve any conflicts and records a dedicated merge commit in your repo history. You can also force a merge commit (git merge --no-ff) even when a fast-forward is possible.
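A small sketch of all three cases in a throwaway repository may help (branch and file names are made up): a fast-forward while develop hasn't moved, a fallback 3-way merge after someone else commits to develop first, and a merge commit forced with --no-ff. Counting merge commits with git rev-list --merges shows the difference.

```shell
#!/usr/bin/env bash
# Sketch: fast-forward vs 3-way merges (branch and file names are made up).
set -e

WORK=$(mktemp -d)
cd "$WORK"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Demo"; git config user.email "demo@example.com"
echo base > app.txt; git add app.txt; git commit -qm "Base"

# 1) develop hasn't moved since we branched: the merge can fast-forward.
git checkout -q -b feature/a
echo a > a.txt; git add a.txt; git commit -qm "Feature A"
git checkout -q develop
git merge -q --ff-only feature/a
FF_MERGES=$(git rev-list --merges --count develop)

# 2) Someone commits to develop first: history diverges, so no fast-forward.
git checkout -q -b feature/b
echo b > b.txt; git add b.txt; git commit -qm "Feature B"
git checkout -q develop
echo fix > fix.txt; git add fix.txt; git commit -qm "Teammate change"
if git merge -q --ff-only feature/b 2>/dev/null; then FF_B=yes; else FF_B=no; fi
git merge -q --no-edit feature/b        # falls back to a 3-way merge commit
THREE_WAY_MERGES=$(git rev-list --merges --count develop)

# 3) A fast-forward would be possible, but --no-ff forces a merge commit anyway.
git checkout -q -b feature/c
echo c > c.txt; git add c.txt; git commit -qm "Feature C"
git checkout -q develop
git merge -q --no-ff --no-edit feature/c
ALL_MERGES=$(git rev-list --merges --count develop)

echo "merge commits: $FF_MERGES after ff, $THREE_WAY_MERGES after 3-way, $ALL_MERGES after --no-ff"
```

git merge --ff-only is handy when you'd rather have Git refuse than silently create a merge commit.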
In some cases, I consider combining the two approaches the best of both worlds. Say you have a long-running feature that you branched off the main branch. Over the course of development, other team members have merged changes to the main branch, leaving your feature branch out of sync with the latest code. Before completing the feature, you have two basic options: 3-way merge the latest changes into your feature branch, creating a dedicated merge commit, or rebase your branch onto the main branch, creating a linear history that allows a fast-forward merge. That doesn't mean there won't be conflicts, but with a rebase you deal with those conflicts and history changes on your local branch before the Gitflow merge onto the remote develop branch. Rewriting history this way is safe because you haven't pushed these changes to the main remote branch yet, so rewriting them locally won't affect others who've cloned it (as long as nobody else is building on your feature branch). When you view your history afterward, it looks as though each change was made one after the other. You've effectively slid all of the latest changes in behind your new ones, making it look as if they'd been there all along.
That's the best-of-both-worlds part: you avoid unnecessary merge commits while pulling new changes into a long-running feature, and you still get one controlled merge commit onto the main branch when you complete the feature. You could, of course, fast-forward merge your feature, but then you'd have no visual record of when that code branched off and when it was integrated back into the main code base. What this ultimately means to me is that I have a choice, which is really what all of this is about. It's less about the "right" way and more about the power of choice and what makes the most sense for your project or your enterprise.
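Here's a sketch of the rebase option in a scratch repository, with made-up branch names and develop standing in for the main branch. The feature branch is never pushed in this demo, which is what makes rewriting it harmless. After the rebase, develop is an ancestor of the feature tip, so the history is linear and has no merge commits.

```shell
#!/usr/bin/env bash
# Sketch: catch a long-running feature up with develop by rebasing.
# develop stands in for the main branch; names are made up.
set -e

WORK=$(mktemp -d)
cd "$WORK"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Demo"; git config user.email "demo@example.com"
echo base > app.txt; git add app.txt; git commit -qm "Base"

git checkout -q -b feature/long-running
echo step1 > feature.txt; git add feature.txt; git commit -qm "Feature step 1"
echo step2 >> feature.txt; git commit -qam "Feature step 2"

# Meanwhile, teammates' work lands on develop.
git checkout -q develop
echo x > teammate1.txt; git add teammate1.txt; git commit -qm "Teammate change 1"
echo y > teammate2.txt; git add teammate2.txt; git commit -qm "Teammate change 2"

# Option 1 (not used here): git merge develop, adding a merge commit to the feature.
# Option 2: replay the unpushed feature commits on top of the latest develop.
git checkout -q feature/long-running
git rebase -q develop

# develop is now an ancestor of the feature tip, so the history is linear again.
if git merge-base --is-ancestor develop feature/long-running; then LINEAR=yes; else LINEAR=no; fi
FEATURE_MERGES=$(git rev-list --merges --count feature/long-running)
git log --format=%s develop~2..feature/long-running   # newest first
```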
On our team, we rebase onto develop before the merge, use an interactive rebase to organize the feature's new history, and finally create a single merge commit back onto develop. Once that's done, we delete both the local and remote feature branches to reduce clutter.
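Putting those steps together, here's one way the routine might look end to end. It's a sketch under assumptions: a local bare repository acts as origin, a second clone plays the teammate who updates develop in the meantime, and the branch and file names are hypothetical. The rebase onto develop and the interactive cleanup happen in a single git rebase -i pass, with --autosquash and GIT_SEQUENCE_EDITOR=true standing in for hand-editing the todo list. Because the feature branch was pushed earlier, the remote copy is deleted first; that also drops the stale origin/feature/report tracking ref, so git branch -d should check the branch against develop rather than against the pre-rebase commits.

```shell
#!/usr/bin/env bash
# Sketch of the routine: rebase onto develop, tidy history, one merge commit,
# then delete the feature branch on origin and locally.
# Assumes git is installed; a local bare repo stands in for origin and a second
# clone plays a teammate. Branch and file names are hypothetical.
set -e

WORK=$(mktemp -d)
git init -q --bare "$WORK/origin.git"
git --git-dir="$WORK/origin.git" symbolic-ref HEAD refs/heads/develop
git clone -q "$WORK/origin.git" "$WORK/dev" 2>/dev/null
cd "$WORK/dev"
git symbolic-ref HEAD refs/heads/develop
git config user.name "Dev"; git config user.email "dev@example.com"
echo base > app.txt; git add app.txt; git commit -qm "Base"
git push -q origin develop

# Feature work, pushed along the way for safety.
git checkout -q -b feature/report
echo draft > report.txt; git add report.txt; git commit -qm "Add report"
FIRST=$(git rev-parse HEAD)
echo final > report.txt; git commit -q -a --fixup="$FIRST"
git push -q -u origin feature/report

# Meanwhile a teammate pushes to develop.
git clone -q "$WORK/origin.git" "$WORK/mate"
cd "$WORK/mate"
git config user.name "Mate"; git config user.email "mate@example.com"
echo other > other.txt; git add other.txt; git commit -qm "Teammate change"
git push -q origin develop
cd "$WORK/dev"

# Steps 1 and 2: rebase onto the latest develop and fold in the fixup, in one pass.
git fetch -q origin
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash origin/develop

# Step 3: a single explicit merge commit onto develop, then publish it.
git checkout -q develop
git merge -q --ff-only origin/develop
git merge -q --no-ff --no-edit feature/report
git push -q origin develop

# Step 4: clean up. Deleting the remote branch first also removes the stale
# origin/feature/report ref, so `git branch -d` compares against HEAD (develop).
git push -q origin --delete feature/report
git branch -q -d feature/report
```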
Bringing it all together
These are just some of the ways Git and Gitflow have changed how my team works together. A big part of using a workflow like this is also adopting tools that embrace it. In my particular setup, I use SourceTree and the Visual Studio Git tools as my Git clients, Stash and Visual Studio Online as my Git servers, Jira for issue tracking, and Bamboo and TeamCity as my CI servers. Each tool in the chain enforces the workflow and links everything together as you move through the process. For example, a feature in Jira becomes a feature branch on Stash with the issue ID in the branch name. When that feature branch is pushed to the remote server, the CI server detects it, checks it out, and builds it, all of which traces back to the issue in Jira. Powerful stuff, with a killer audit trail for quality control. With Git, though, no tool can (or should) completely replace the command line. If you're going to embrace Git, embrace the CLI in all its glory. Not a day goes by when I don't fall back to the shell. Some things are just faster and easier that way, so the sooner you get used to it, the more empowered and productive you'll be. I still learn something new every day, which is probably what I'm enjoying most of all.
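As an illustration of that branch-to-issue link, here's a small sketch that pulls an issue key out of the current branch name and uses it to prefix a commit message. The key PROJ-123, the feature/KEY-description naming pattern, and the regular expression are all assumptions for the demo; how Jira, Stash, or your CI server actually recognizes issue keys depends on how those integrations are configured.

```shell
#!/usr/bin/env bash
# Sketch: derive a (hypothetical) Jira issue key from the branch name.
# PROJ-123 and the feature/KEY-description convention are assumptions.
set -e

WORK=$(mktemp -d)
cd "$WORK"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Demo"; git config user.email "demo@example.com"
git commit -q --allow-empty -m "Base"

# The hypothetical issue PROJ-123 becomes a feature branch carrying its key.
git checkout -q -b feature/PROJ-123-export-to-csv

BRANCH=$(git rev-parse --abbrev-ref HEAD)
# First KEY-NUMBER token: an uppercase project key, a dash, then digits.
ISSUE_KEY=$(printf '%s\n' "$BRANCH" | grep -oE '[A-Z][A-Z0-9]+-[0-9]+' | head -n 1)
echo "branch $BRANCH -> issue $ISSUE_KEY"

# For example, a commit-msg hook or CI step could prefix messages with the key.
git commit -q --allow-empty -m "$ISSUE_KEY: Add CSV export"
git log -1 --format=%s
```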
But wait, there's more! Here are some great resources to get you Gitflowing in no time:
- Migrate an existing project from TFS to GitHub with changeset history intact