Achieving Git Nirvana: Branching
Branching is a critical operation in day-to-day code management using Git. With proper practices in place, teams can leverage branching to produce higher quality code with less effort. Unfortunately, the value of branching is often misunderstood by coders new to the version control system. This is especially the case for teams that are more familiar with systems like Subversion and Perforce.
To understand why this might be the case, we need to go back to Git basics. We’re not attempting to provide an exhaustive tutorial on Git branching in this post. Instead, we’re going to focus on why Git branches are so useful, and how a touch of best practice around branching can make the data in your repositories much more useful. This will, in turn, help you produce better code with less effort. For readers interested in good tutorials, we’ve included some links at the end of the post, but we’ll assume that readers are familiar with the basic Git commands involved in managing branches. With that said, we will do a tiny bit of Git explaining to establish a context for this discussion.
Unlike previous-generation version control systems, each commit to a Git repository records something akin to a “snapshot” of the working environment. Of course, the snapshot is optimized for efficiency: unchanged files aren’t stored again in each snapshot; they’re simply referenced. Some people suggest the analogy that Git is, in a sense, a miniature versioning file system.
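To make the snapshot idea concrete, here is a minimal sketch that builds a throwaway repository with two commits and compares the stored object IDs of each file across the two snapshots. The file names, the identity settings, and the temporary directory are illustrative assumptions, not part of any real project.

```shell
#!/bin/sh
set -e

# Throwaway repository; names and identities below are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# First snapshot: two files.
echo "stable content" > unchanged.txt
echo "version 1" > changing.txt
git add .
git commit -q -m "First snapshot"

# Second snapshot: only one file is modified.
echo "version 2" > changing.txt
git commit -q -am "Second snapshot"

# <commit>:<path> resolves to the object ID of that file in that snapshot.
unchanged_first=$(git rev-parse HEAD~1:unchanged.txt)
unchanged_second=$(git rev-parse HEAD:unchanged.txt)
changing_first=$(git rev-parse HEAD~1:changing.txt)
changing_second=$(git rev-parse HEAD:changing.txt)

echo "unchanged.txt: $unchanged_first -> $unchanged_second"
echo "changing.txt:  $changing_first -> $changing_second"
```

Both snapshots list both files, but the unchanged file resolves to the same object ID in each, while the modified file gets a new one.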
Now comes the good part. A Git branch is simply a reference to one of these commits. Git branches are not completely separate copies of the source tree. However, because each branch refers to a commit, it represents a complete and independent working environment. Team members working on a branch have exactly the same facilities that they would have if they were working on the main line.
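The "branch as reference" idea can be checked directly. The sketch below, which assumes a throwaway repository and a hypothetical branch named feature/example, shows that a new branch resolves to the same commit as the branch it was created from, and that committing on it moves only that one reference.

```shell
#!/bin/sh
set -e

# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# Creating a branch copies no files; it only adds a new reference.
git branch feature/example
main_commit=$(git rev-parse main)
feature_commit=$(git rev-parse feature/example)
echo "main:            $main_commit"
echo "feature/example: $feature_commit"

# A commit on the branch advances that reference and leaves main alone.
git checkout -q feature/example
echo "work in progress" > feature.txt
git add feature.txt
git commit -q -m "Start feature work"

main_after=$(git rev-parse main)
feature_after=$(git rev-parse feature/example)
git for-each-ref --format='%(refname:short) %(objectname)' refs/heads
```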
Git’s powerful and extremely accessible branching leads many organizations to execute every change, even very small ones, in a separate branch. This practice allows much better change encapsulation and tracking. Teams following such a discipline can ensure that their mainline codebase remains completely stable, while the branches under development have the time and isolation required to fully mature. As coders switch between branches, each branch represents a complete working environment, at the corresponding stage of development.
When the team is ready to incorporate the changes made on a branch back into the main line of development, a merge operation brings the independent set of commits together with the commits that make up the primary line. At this point, the branch reference can be deleted. Don’t worry: the commit history will still be there. It just won’t be referenced by the branch. Deleting successfully merged branches helps keep repositories “tidy,” and it also prevents a programmer from accidentally checking out a branch that has already been merged.
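The merge-then-delete cycle might look like the following sketch. It assumes a throwaway repository and a hypothetical branch named feature/greeting; `git branch -d` is used because it refuses to delete a branch that has not been merged.

```shell
#!/bin/sh
set -e

# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# Do some work on a branch.
git checkout -q -b feature/greeting
echo "hi there" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
feature_tip=$(git rev-parse HEAD)

# Merge it back into the main line.
git checkout -q main
git merge -q --no-edit feature/greeting

# List branches already merged into main, then delete the merged one.
git branch --merged main
git branch -d feature/greeting

# The branch name is gone, but its commit is still part of main's history.
still_reachable=no
git merge-base --is-ancestor "$feature_tip" main && still_reachable=yes
remaining=$(git for-each-ref --format='%(refname:short)' refs/heads/feature)
echo "commit still reachable from main: $still_reachable"
```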
Branches can play another critical role in supporting future analysis of your repositories. Assigning branch names that incorporate both a motivation for the branch and a description of the change is a best practice employed by many teams. For example, a good name for a branch that addresses a defect might be “bugfix/JIRA-2084-null-pointer-exception”. Here, “bugfix” specifies the motivation for the branch, while “JIRA-2084-null-pointer-exception” gives some sense of the changes made in the branch. This team separates the components of the branch name with the “/” character.
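One way to see the payoff of this naming convention is to filter and split branch names by prefix. The sketch below reuses the branch name from the example above; the second branch, feature/JIRA-2101-export-csv, is a hypothetical name added only to show the filtering.

```shell
#!/bin/sh
set -e

# Throwaway repository; identities and the "feature/..." branch are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# Create a branch whose name carries motivation and description.
git checkout -q -b "bugfix/JIRA-2084-null-pointer-exception"
branch=$(git rev-parse --abbrev-ref HEAD)

# Split the name at the first "/" using plain shell parameter expansion.
motivation=${branch%%/*}
description=${branch#*/}
echo "motivation:  $motivation"
echo "description: $description"

# With a consistent prefix, branches can be listed by motivation.
git checkout -q main
git branch "feature/JIRA-2101-export-csv"
bugfix_branches=$(git for-each-ref --format='%(refname:short)' refs/heads/bugfix)
echo "bugfix branches: $bugfix_branches"
```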
There are a couple of small notes on using branches for this purpose. In situations where a “fast-forward” merge can be performed (typically, no changes have been made to the master, or other “current,” branch subsequent to the creation of the target branch), Git doesn’t need to create a new commit at the merge point. It simply integrates the histories by moving the current branch tip to the same point as the target branch tip. Once this operation is completed, all of the commits from the target branch will be available through the current branch. The only downside is that, if the target branch name is then deleted, the information regarding motivation and description will be lost. If a team’s development process often leads to this situation, they might want to consider using the “no fast-forward” option to git merge:
git merge --no-ff branch_name
This option forces the creation of a merge commit, which will include the name of the branch in the commit message by default. Fortunately, GitHub’s default “Create a merge commit” option for pull requests uses “no fast-forward,” so organizations that merge pull requests this way won’t have to worry about this issue. Note that GitHub’s squash and rebase merge options do not create a merge commit.
The --no-ff option forces a commit to be created for the merge. By default, the branch name is stored in the commit message.
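To see what the “no fast-forward” option preserves, the following sketch merges a branch that could have been fast-forwarded, deletes the branch, and then reads the branch name back out of the merge commit. The repository is a throwaway, and the exact wording of the default merge message can vary with Git version and configuration, so the sketch only prints it.

```shell
#!/bin/sh
set -e

# Throwaway repository; identities and file names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# Work on a descriptively named branch; main does not move in the meantime,
# so a plain merge would be a fast-forward.
git checkout -q -b "bugfix/JIRA-2084-null-pointer-exception"
echo "null check" > fix.txt
git add fix.txt
git commit -q -m "Guard against null input"
branch_tip=$(git rev-parse HEAD)

# Force a merge commit even though a fast-forward is possible.
git checkout -q main
git merge -q --no-ff --no-edit "bugfix/JIRA-2084-null-pointer-exception"
git branch -d "bugfix/JIRA-2084-null-pointer-exception"

# The merge commit has the branch tip as its second parent, and its
# subject line still records the deleted branch's name.
second_parent=$(git rev-parse --verify -q 'HEAD^2')
merge_subject=$(git log -1 --format=%s)
echo "merge subject: $merge_subject"
```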
If your team is not currently making full use of branches, you might want to reconsider, because they are a powerful tool that will help you build higher quality software with less effort. Specific benefits include:
- Branches isolate your changes from the main line of code, giving you the time and flexibility to fully develop your changes without impacting the stability of the main codebase.
- You need to bundle the changes that make up a GitHub pull request into a branch.
- The changes that make up the branch can be analyzed and the risk of merging the branch can be assessed. This allows you to accept pull requests with low risk, and hold higher risk requests for further review.
- Branch names can be used to convey critical information about the motivation for, and changes made in, a branch. This will be extremely valuable in future analyses.
- The commit history of the branch is maintained in your repository, even if the reference is deleted.
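As a rough illustration of analyzing a branch before merging it, the sketch below counts the commits a branch adds and lists the files it touches relative to the main line. The repository, the branch name feature/report, and the file names are hypothetical; what counts as “low risk” is left to the team.

```shell
#!/bin/sh
set -e

# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# Two commits on a branch, touching two files.
git checkout -q -b feature/report
echo "report v1" > report.txt
git add report.txt
git commit -q -m "Add report"
echo "see report.txt" >> README.txt
git commit -q -am "Mention report in README"
git checkout -q main

# Commits on the branch that are not yet on main.
commit_count=$(git rev-list --count main..feature/report)
git log --oneline main..feature/report

# Files changed on the branch since it diverged from main, with a size summary.
changed_files=$(git diff --name-only main...feature/report | sort)
git diff --stat main...feature/report

echo "commits to merge: $commit_count"
```

The two-dot form (main..feature/report) selects the commits that only the branch has, while the three-dot form of git diff compares the branch against the point where it diverged from main.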
Some good resources that cover the basics of Git branching.