Software Development Audits
Due Diligence for Software Development


Researching Safer Software Purchases

Towards Safer Software Purchases


Most organizations cannot operate without using externally developed software. However, relying on external software brings risks. We are writing a report on the risks of using externally developed software and the strategies that can be used to remediate those risks. Because of this, we invite you to participate in this short survey.

It should take less than five minutes, and if you leave your email address we'll send you a free copy of the report when it is finished.

https://goo.gl/forms/WLUbnc0XnWzcMtmA3

Jan Princen
Hurrah for our Partners

Partnerships are very important to what we do at Grip. Our clients have very specific needs that often require a tailored solution.

The reasons why they want to optimize their software development processes, and what exactly they want to achieve, differ widely. Some are mostly interested in increasing developer efficiency, while others are legally bound to not let defects slip into production.

Different development teams use different combinations of off-the-shelf and home-grown development tools, so sometimes some customization is needed to get the right data ready for analysis.

Our partners help clients state their problems, connect their systems with Grip’s analysis engines and help them take action on the results.

Partner in the spotlight

Zenergy Technologies is a North Carolina-based company that provides software delivery solutions.

Chris Laney from Zenergy

"Grip's analysis tools help us to establish a baseline at our clients. We use this baseline to improve upon the specific goals of our clients and then use Grip to periodically measure the effectiveness of the improvements", says Chris Laney, President and CEO of Zenergy Technologies.

Interested in becoming a partner?

Let’s chat! Please reach out at jan @ grip.qa

Jan Princen
Relative Complexity and Sizing

One of the most important tools in the Agile Development toolbox is point-based sizing of user stories. In contrast to time-based estimation in hours and days, points are used to estimate sizes in a relative way. Two stories of equal size have equal points.

While this sounds nice in theory, it’s hard to get an intuition for what that means. What is a point? What criteria do we use to get to our point size? If you ask 10 agile teams about their sizing criteria, you’ll probably get 10 different answers. Some use effort, others technical complexity. Sometimes points are not even numerical but rather ‘Small’, ‘Extra Large’, or even ‘Cat’, ‘Dog’, ‘Cow’. Especially in teams that are just starting to use point-based sizing there can even be different criteria within the team.

Before diving into a few ways to develop an intuition about relative sizing, let’s briefly review the benefits of using points.

Time-based sizing

With time-based sizing there is a big problem: the impression of accuracy. If I ask a developer for a time estimate and she tells me it’s going to take 4 hours, I can reasonably expect that task to be done 4 hours later. Unfortunately, this is not how it works.

Especially in software development the variation in estimates and reality can be very large. While a 400-hour project might take 500 hours to complete, a 1-hour task can just as easily take 5 hours to complete. That’s a 400% increase compared to 25% for the larger project. The estimation error is not always relative to the original estimate.

Another issue with time-based sizing is that while the best estimators (I have yet to find one) might be fairly accurate in estimating their own work, the approach breaks down when trying to estimate the required effort for a team. Time-based estimates depend on who implements a task, the technology used, the state of the codebase, the rate and length of interruptions, previous experience with similar problems, and other things that vary wildly across people and tasks.

A third problem is psychological. If 4 hours are estimated for a task but I’m done in 2, I often feel like I missed something. I don’t want to deliver my work with time to spare, so I take a step back and look for improvements. Finishing early a number of times might get future estimates adjusted downwards. Time-estimated tasks tend to converge towards the estimate, not towards the optimum.

All things considered, time-based estimates are really inaccurate for predicting the duration of a single task done by a single person.

Point-based sizing

Point-based sizing, on the other hand, does not aim for individual accuracy. In particular, the ambiguity about what a ‘point’ is makes it impossible to assess whether a single story will be exactly 5 points or not.

Instead, points shine when aggregated over a larger number of stories. Points get their true value from empirical evidence. “The past N sprints we were able to deliver X points”. This is referred to as the velocity of a team.
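As a minimal sketch (the sprint history below is hypothetical), velocity can be computed as a simple average of the points delivered in recent sprints and then used as a guide for how many points fit in the next one:

    # Minimal sketch: velocity as an average of points delivered in recent sprints.
    # The sprint history below is made up for illustration.
    completed_points = [21, 18, 25, 22, 19]  # points delivered in the last five sprints

    def velocity(history, window=3):
        """Average points delivered over the last `window` sprints."""
        recent = history[-window:]
        return sum(recent) / len(recent)

    print(f"Velocity: {velocity(completed_points):.1f} points per sprint")
    # A team would typically plan the next sprint around this number.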

The lack of a measurable quantity for what a point represents makes it hard to estimate in points. Teams should find a common framework of what to include in their estimation, and above all remain consistent over time.

Qualities of a point

It’s difficult to describe points in terms of measurable qualities. This will differ from team to team. Instead, what we can do is describe what points should not be. Looking at the drawbacks of time-based estimation, we can state that points should not:

- Change based on technology used
- Change based on experience of the developer
- Change based on interruptions or distractions
- Change based on the maturity of the existing code
- Change based on the order of other stories
- Change based on availability of team members

Each one of these properties will likely change the number of hours a story takes to implement. But instead of incorporating that into each story individually, these factors impact the overall velocity instead.

Using this framework, we can clearly improve our productivity by addressing the above items individually. By improving, for instance, the maturity of the existing code, we can measurably increase our velocity by delivering more points each sprint. It helps the team find bottlenecks and improve them by asking the question: “How can we increase our overall velocity?”

Using these guidelines means that implementing a story today might take multiple sprints, while doing the same story later, when there are more building blocks already in place, might take significantly less time. While at first this may seem counterintuitive, this is exactly how we want to use points. Having those building blocks in place means a direct increase in velocity, and in turn, in the capability to deliver more stories, faster.

Kamiel Wanrooij
Pull Requests: The Lodestones of Collaboration

My intent in this article is to share a few tips that we’ve found valuable for incorporating Pull Requests into our workflow. First, though, I’ll spend just a tiny bit of time on the basics for those who aren’t familiar with the concept.

At a basic level, a Pull Request could be thought of as a wrapper for submitting contributions to a development project that is using a distributed version control system. The information incorporated in a Pull Request includes, at the highest level, the source repo & branch, the destination repo & branch and meta information related to the request. The source repo may be either a clone, or a fork, of the destination repo. The workflow data associated with each of these and the support incorporated within the Pull Request mechanism allows team members to review the proposed changes, discuss alternatives and even push additional commits (if required).

At GripQA, we use Git to manage our sources, with GitHub as our hosting solution. We have a mix of public and private repos; these notes should apply to both equally. So far, we’ve tended to stick with the Shared Repository model for our approach to collaboration. It may be worth mentioning that we consider Pull Requests to be part of the workflow, not necessarily a feature of the version control system. Other code management tools, like Bitbucket, also support Pull Request functionality.

The following are in no particular order, but they’ve helped us make more effective use of this powerful tool for efficient code collaboration:

  • While the Fork and Pull model works great for well-known open source projects, we find that the Shared Repository model is better suited for our relatively small teams of senior software engineers who collaborate closely on a daily basis. This will certainly change when some of our public repos begin receiving proposed changes from outside the immediate team.
  • We strongly encourage teams to use all available tools to ensure that the changes included in the Pull Request are sound. This should include defect analysis, entropy analysis and appropriate static code analysis.
  • This should go without saying, but it is the Pull Request submitter’s absolute responsibility to ensure that the code builds and that all tests, including new tests, pass successfully before the Pull Request is submitted.
  • The team member merging the Pull Request has ultimate responsibility for ensuring that the code is properly reviewed and tested. We currently don’t have hard standards, because the scope of the changes covered in each Pull Request can vary widely. We trust our folks to make the right judgment on these issues and to use the appropriate tools to ensure that they’re making the correct call.
  • Size your Pull Requests appropriately. Ideally, each Pull Request should address exactly one issue. The changes included in a Pull Request should be small enough to be reasonably handled during the review process, but large enough to let reviewers get a full picture of the proposed changes. Of course, this will vary depending on the density of your code and the relationship between your developers. One rule of thumb that we use is that the effort required to review a set of changes increases exponentially with the magnitude of the Pull Request. Of course, defect and entropy analysis can help you estimate the effort required for sufficient review.
  • Don’t be afraid to push additional changes to the Pull Request to respond to feedback from the discussion. Commits made to the branch of a Pull Request will automatically be incorporated into the Pull Request.
  • Absolutely use a new branch for each Pull Request. Further, we strongly encourage maintainers to delete the branch once a Pull Request has been successfully merged. Do not continue committing to a branch after the Pull Request has been accepted and merged. Note that this is different from the situation in the previous item. Here, we’re talking about continuing to commit changes after the Pull Request represented by the branch has been merged. This will result in one of the classic GitHub gotchas.
  • Make liberal use of the ability to add comments / questions to a Pull Request – one of their primary uses for our team is fostering a conversation regarding the proposed changes.
  • You have the option of forgoing Pull Requests if you’re both the owner and the only contributor to a shared repo. If nobody is going to be reviewing your changes, and if you’re following proper branch hygiene, you can probably just push directly to your own repos. At least that’s what I do… However, if I’m proposing a change to a repository maintained by another team member, I always go with a full GitHub Pull Request.

Used properly, Pull Requests are a powerful tool to help your team collaborate more effectively and function more efficiently.

Good references for Pull Requests include:

Kamiel Wanrooij
Technical Debt: Here Be Dragons

We all have notions about what technical debt is, and we all know that it’s something that we should be concerned about. Development teams encounter it on a regular basis, and it can have a profoundly negative impact on their long term productivity. Unfortunately, it’s extremely difficult to prevent technical debt from accumulating for most software development efforts. In this post I’ll cover some of the reasons why I believe that we regularly underestimate the impact of technical debt, and then, almost invariably, fail to mitigate it once it has crept into our projects.

Components of technical debt

When considering the concept of debt, our primary interest focuses on two components:

  1. The principal amount that we’ll need to repay at a later time
  2. The interest that we pay on the amount owed up until the point that the debt is settled. The interest, and possible compounding thereof, makes the total amount that we’ll have to repay larger (sometimes significantly) than the amount borrowed.

Just like financial debt is borrowing money from your future self, technical debt can be thought of as borrowing productivity from your future team.

Technical debt accrues from a number of sources. Sometimes it’s simply due to a lack of experience with the technology, or incomplete knowledge about the problem, that results in necessary remediation steps later. Perhaps the chosen solution has a negative impact on the project’s future. Often, immediate time pressures lead to making changes without considering the negative impacts on future productivity. As technical debt exchanges time gained today for time lost in the future, it might superficially appear to make sense to ‘finish it quickly now, clean it up later’. The problem is that it’s not an even exchange. We pay the debt back with interest!

Underestimating technical debt

A number of factors negatively impact the trade-off between present convenience and long term cost. We’ll examine some of those that have the greatest impact on long term productivity in the following sections:

Undervaluing productivity in the future

We all procrastinate on occasion. Sure, it sounds appealing: “I’ll relax for half an hour longer and finish my chores later.” If we valued our future free time as highly as we do our present free time this exchange would not make sense. Yet we often choose to do something ‘later’ because we think we can get more value from our current available time. We’ll throw caution to the wind in order to finish up a feature quickly today so that we can get on to more important things, while mentally noting that we’ll need to come back to fix it up later. Right…

Underestimating the overhead

Even when we correctly put off some maintenance tasks to focus on current priorities, we often underestimate the extra work that we will need to do later. If a quick-and-dirty fix introduces 30 minutes of overhead for future releases, that’s 5 hours for the next 10 releases. If this is slightly underestimated, say we have to explain it to a new team member, and maybe forget to do it once, that additional overhead can quickly add up to 40 or 50 minutes per release. That’s an additional 3 hours on top of the original 5. Now, even if we did make the right call for 5 hours, does that still feel right if it turns out to be 8 instead?
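To make the arithmetic above concrete, here’s a small sketch using the hypothetical numbers from this paragraph; the point is simply that a small recurring overhead adds up to a very different total than the one we signed up for:

    # Hypothetical sketch of the overhead arithmetic described above.
    releases = 10
    estimated_overhead_min = 30  # minutes of extra work per release, as estimated
    actual_overhead_min = 48     # minutes per release once explanations and oversights creep in

    estimated_total_h = estimated_overhead_min * releases / 60
    actual_total_h = actual_overhead_min * releases / 60

    print(f"Estimated: {estimated_total_h:.0f} hours over {releases} releases")  # 5 hours
    print(f"Actual:    {actual_total_h:.0f} hours over {releases} releases")     # 8 hours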

Is it worth the time? (source: XKCD.com)

XKCD shared an illustration of how improving a task by a certain amount of time can result in significant cost savings over time. You get the point. Technical debt IS EXACTLY THE OPPOSITE. Introducing a minimal amount of overhead up front can have a very negative impact on future productivity. This “interest payment” drag adds up quickly.

Pay some now, or pay more later?

Once a project has started incurring technical debt, the immediate question is when to start servicing the debt. Since the technical debt often requires rework to correct it, another trade-off must be made. Either start paying the servicing costs (interest) now and defer truly corrective action until some point in the future, or bite the bullet immediately and fix it properly, now. This is very similar to the reasoning that introduced the technical debt in the first place. We’re unlikely to come to a different conclusion when faced with similar circumstances. Thus, the technical debt will likely persist for longer than originally expected.

So?

The factors that we’ve discussed combine with a myriad of others to make the consequences of our choices much more severe in terms of  the total cost of a deferred task. Unfortunately, we tend to discount future productivity relative to the allure of accomplishing something today. In addition, the extra work is often underestimated and persists in the system for far longer than expected. In my experience this results in technical debt sometimes never being resolved. I’ve seen teams lose as much as an incremental 15% of productivity each year due to maintenance overhead. That’s right, 15% additional loss each passing year. That means that after 7 years, 100% of the capacity will be spent on avoidable maintenance costs. Sound familiar? This is the graveyard of brittle systems.

Fixing technical debt

There’s little doubt that we inevitably underestimate both the magnitude and the long term cost of technical debt. Of course, an “easy” solution would be to just not settle for inferior solutions. Sadly, that’s not always possible. Some of the tactics that my teams employ to successfully manage the level of technical debt include:

  • When first making the conscious decision to cut corners, we don’t ask ourselves how much time we can save today. Rather, we consider the worst case scenario if we postpone the proper solution. More often than not, the actual cost of delaying the proper solution doesn’t outweigh the cost of the technical debt. If we can’t delay, it’s probably worth doing it right.
  • If we can’t delay, but also can’t afford to invest the time to do it properly, we take appropriate measures to ensure that we don’t let our biases minimize the maintenance overhead. We log both the decision and the expected impact. Then we keep track of what it actually costs the team. We use this information to learn from these decisions and to make better-informed decisions the next time we encounter similar circumstances. In our current team, we simply record marks on a sticky note for every fifteen minutes that someone is affected by a specific problem area. During our biweekly retrospectives we discuss all of these problem areas to see if the overhead is still acceptable.
  • To prevent technical debt from accumulating in a project, we commit to budget for the cost of fixing any outstanding issues as soon as possible – preferably before the technical debt is introduced. We make it very clear to all stakeholders that this is just part of the investment required to implement the change. We estimate the cost required to fix it properly and agree on that up front. Then, when the time comes to stabilize the code, everyone is on the same page.

While steps like these require some project team discipline, they will minimize the build up of technical debt and mitigate the burden. It’s often tempting to let technical debt build in order to address immediate priorities. If you’re choosing to do this, just remember the old saying:

Pay me now, or pay me later…

P.S.  Who knew that the commonly accepted “Here Be Dragons” artifice was just wrong?

Kamiel Wanrooij
Achieving Git Nirvana: Branching

Branching is a critical operation in day-to-day code management using Git. With proper practices in place, teams can leverage branching to produce higher quality code with less effort. Unfortunately, the value of branching is often misunderstood by coders new to the version control system. This is especially the case for teams that are more familiar with systems like Subversion and Perforce.

To understand why this might be the case, we need to go back to Git basics. We’re not attempting to provide an exhaustive tutorial on Git Branching (also here) in this post. Instead we’re going to focus on why Git branches are so useful, and how a touch of best practices around branching can make the data in your repositories much more useful. This will, in turn, help you produce better code with less effort. If readers are interested in good tutorials, we’ve included some links above and at the end of the post, but we’ll assume that readers are familiar with the basic Git commands involved in managing branches. With that said, we will do a tiny bit of Git explaining to establish a context for this discussion.

Unlike previous generation version control systems, each commit to a Git repository records something akin to a “snapshot” of the working environment. Of course, the snapshot is optimized for efficiency, and unchanged files aren’t fully instantiated in each snapshot, they’re simply referenced. Some people suggest the analogy that Git is, in a sense, a miniature versioning file system.

Now comes the good part. Git branches are simply references to one of these commits. Git branches are not completely separate copies of the source tree. However, due to the reference to a commit, each branch represents a complete, and independent, working environment. Team members executing against a branch have exactly the same facilities that they would encounter if they were working on the main line.

Git’s powerful and extremely accessible branching leads many organizations to execute every change, even very small ones, in a separate branch. This practice allows much better change encapsulation and tracking. Teams following such a discipline can ensure that their mainline codebase remains completely stable, while the branches under development have the time and isolation required to fully mature. As coders switch between branches, each branch represents a complete working environment, at the corresponding stage of development.

When the team is ready to incorporate the changes made on a branch back into the main line of development, a merge operation brings the independent set of commits together with the commits that inform the primary line. At this point, the branch reference can be deleted. Don’t worry, the commit history will still be there, it just won’t be referenced by the branch. Deleting successfully merged branches helps keep repositories “tidy” and it also prevents a programmer from accidentally checking out a branch that has already been merged.

Branches can play another critical role in supporting future analysis of your repositories. Assigning branch names that incorporate both a motivation for the branch and a description of the change is a best practice employed by many teams. For example, a good name for a branch that addresses a defect might be “bugfix/JIRA-2084-null-pointer-exception”. Here, “bugfix” specifies the motivation for the branch, while “JIRA-2084-null-pointer-exception” gives some sense of the changes made in the branch. This team separates the components of the branch name with the “/” character.

There are a couple of small notes on using branches for this purpose. In situations where a “fast-forward” merge can be performed (typically, no changes have been made to the master, or other “current,” branch subsequent to the creation of the target branch), Git doesn’t need to create a new commit at the merge point; it simply integrates the histories by moving the current branch tip to the same point as the target branch tip. Once this operation is completed, all of the commits from the target branch will be available through the current branch. The only downside is that, if the target branch name is then deleted, the information regarding motivation and description will be lost. If a team’s development process is such that they are often encountering this situation, they might want to consider using the “no fast-forward” option to git merge.

git merge --no-ff branch_name

This option forces the creation of a merge commit, which will include the name of the branch in the commit message, by default.  Fortunately, GitHub uses the “no fast-forward” option for merging pull requests, so organizations using GitHub pull requests to manage merges won’t have to worry about this issue at all.

 

The --no-ff option forces a commit to be created for the merge. By default, the branch name is stored in the commit message.

If your team is not currently making full use of branches, you might want to reconsider, because they are a powerful tool that will help you build higher quality software with less effort. Specific benefits include:

  • Branches isolate your changes from the main line of code, giving you the time and flexibility to fully develop your changes without impacting the stability of the main codebase
  • You need to bundle the changes that make up a GitHub pull request into a branch.
  • The changes that make up the branch can be analyzed and the risk of merging the branch can be assessed. This allows you to accept pull requests with low risk, and hold higher risk requests for further review.
  • Branch names can be used to convey critical information about the motivation for, and changes made in, a branch. This will be extremely valuable in future analyses.
  • The commit history of the branch is maintained in your repository, even if the reference is deleted.

Further reading:

Some good resources that cover the basics of Git branching.

Kamiel Wanrooij
Introducing Defect Analysis

Introduction

Defect Analysis (DA) explores a number of important indicators regarding the state of a team’s software and the effectiveness of its development process. At GripQA, we study the output of our defect analysis algorithm to identify anomalies that may represent issues requiring the project team’s attention. I’m particularly fond of this technique as it combines a methodology that is relatively easy to comprehend with results that correspond well to our empirical data.

Defect Analysis falls into a category of techniques that we refer to as Code Correlation Analysis (CCA). The fundamental premise of CCA is that the history of a project’s codebase is an extremely rich source of information about both what has happened with the project and why it happened.  CCA works by associating some factor that can be extracted from the information in a project’s repository and that can then be analyzed in terms of pull requests (or commits).  Some examples include: defects, complexity, duplication and style guide violations. The association between the factor that we intend to study and the interactions with the repository is key to this methodology.

In order to make the most effective use of CCA techniques, we need both a target for the investigation (for DA, as we’ll be discussing in this post, we’re exploring the source of defects in the project) and a set of hypothetical causes for the issue that we’re investigating. The current implementation of DA concentrates on two potential sources of defects, the individual responsible for the code involved and the programming language used.

This discussion presents a high level explanation of the algorithm we use for defect analysis and also explores some of the insight that we can gain from applying this technology.  In order to be more accessible to a larger group of readers, we’ll gloss over some of the gory details of the algorithm. However, the source code will soon be joining our other open source offerings for those that are interested in the nitty-gritty.

Although DA is just one of the many technologies that comprise GripQA’s Software Development Intelligence suite, it is also one of the easiest to explain, and the results generally compare favorably with the observed data for the projects that we’ve analyzed.

Algorithm

The algorithm for Defect Analysis is fairly straightforward.  First we identify all pull requests that incorporate bug fixes. Then we generate a list of the files changed by each commit in the selected pull requests and extract the number of lines changed for each file.

 

 

Lines per Pull Request

We store the results in a matrix, represented in this post by a table that would look something like “Lines Per Pull Request.”

 

Relative File Contribution

Next, we need a relative measure of each file’s contribution to the pull request.  We’ll get this by converting the raw numbers of lines into ratios calculated as the number of changed lines in each file divided by the total number of lines in the pull request. The resulting table is shown in “Relative File Contribution.”

Contributing Factors

Now, we’re in a position to begin exploring factors that might be contributing to the defects addressed by our pull requests.

In order to simplify this example, we’ll assume that all pull requests were simultaneous.  For our real analysis, we scan each file’s history for each pull request.

Do some team members contribute more than their share of defects?

 

Individual Contribution to Files

The individuals who wrote the code in the files that were included in a pull request might be one factor that contributes to the defect(s) addressed by the pull request.  We can get a sense of this by extracting the list of individuals who changed the file along with the total number of lines that they added/changed over the life of the file. Note that we’re not filtering for only lines that were changed for a specific pull request, at this point. Instead, we’re counting every line attributable to each individual. We’ll take care of relative contribution to the defect in a later step.  Again, this information is stored in a matrix, presented here in the “Individual Contribution to Files” table, and again the raw numbers of lines are converted to ratios as shown in “Individual Contribution to Files as a Ratio.”

 

Individual Contribution to Files as a Ratio

At this point we’re ready to understand how each individual’s efforts contributed to each of the pull requests. For those who remember their linear algebra, it might be obvious that we’re about to perform a matrix multiplication. We’ll designate the “Individual Contribution to Files as a Ratio” table as the operations matrix and the “Relative File Contribution” table as the input matrix ([person contribution to files] x [files to pull requests]). As our matrix dimensions are 4×3 and 3×2, our pair of matrices is conformable for multiplication, and the result is shown in “Individual Contribution to Pull Requests.”
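Since the tables referenced above are shown as images, here is a rough sketch of the same calculation with made-up numbers: four contributors, files A through C, and two pull requests. Only the matrix shapes and the multiplication itself follow the description in the text; the values are hypothetical.

    import numpy as np

    # Hypothetical "Individual Contribution to Files as a Ratio": rows are contributors,
    # columns are files A, B, C; each column sums to 1.
    individual_to_files = np.array([
        [0.50, 0.40, 0.20],  # Alice
        [0.30, 0.20, 0.10],  # Ted
        [0.10, 0.30, 0.30],  # contributor 3
        [0.10, 0.10, 0.40],  # contributor 4
    ])

    # Hypothetical "Relative File Contribution": rows are files A, B, C,
    # columns are Pull Requests 1 and 2; each column sums to 1.
    files_to_pull_requests = np.array([
        [0.60, 0.00],
        [0.40, 0.30],
        [0.00, 0.70],
    ])

    # [person contribution to files] x [files to pull requests] -> a 4x2 result,
    # i.e. each individual's share of responsibility for each pull request.
    individual_to_pull_requests = individual_to_files @ files_to_pull_requests
    print(individual_to_pull_requests)
    print(individual_to_pull_requests.sum(axis=0))  # each column should sum to ~1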

 

Individual Contribution to Pull Requests

With the information in “Individual Contribution to Pull Requests,” we have a sense of the “responsibility” of each individual for a given pull request. Another way to think about this is that the people who wrote the code in a file share responsibility for a defect in direct proportion to both how much code each individual wrote and how much code in the file was changed to address the defect. For the purposes of this analysis, we are not explicitly tracking whether, for example, “Ted” specifically wrote the lines of code that had to be changed to “fix” a given defect.

As the Pull Request columns each sum to 1 (accepting a bit of rounding error), we can pat ourselves on the back for performing the matrix multiplication correctly.

Now for the fun part: we can start thinking about what we could deduce from the data in “Individual Contribution to Pull Requests.” One might observe that Alice’s efforts contributed to both of the pull requests. Further, nearly half of the contribution to Pull Request 1 came from Alice. However, if we go back to the “Individual Contribution to Files as a Ratio” table, we’ll observe that Alice also had the largest code contribution. Since Alice is a major contributor to the files in question, we would naturally expect her to also share a corresponding responsibility for the pull requests. Clearly we are well served to consider the information in the “Individual Contribution to Pull Requests” table in the context of each individual’s total code contribution.

 

Individual Code to Defect Contribution

We can get a rough idea of the expected individual contribution to defects, if we do a bit more data manipulation.  To generate the data shown in “Individual Code to Defect Contribution“, we start with the “Individual Contribution to Files” table, sum each individual’s contributions across all files and then calculate the ratio of each person’s contribution to the total team contribution. We put this number in the “Lines/Total” column of the “Individual Code to Defect Contribution” table. Finally, we average each individual’s contribution to pull requests to generate the data in the “PR Avg” column.

 

Defect to Code Correlation

From the information shown in the “Individual Code to Defect Contribution” table and graphed in the “Defect to Code Correlation” chart, we can see that there is a reasonably good correlation between each team member’s contribution to the project and their contribution to the code that resulted in defects.  There don’t appear to be any glaring anomalies that we can attribute to individual team members. We can probably conclude that this team is well balanced and that each individual’s defect contribution is roughly where we expect it to be.

Does the programming language used contribute to more defects?

 

Programming Language Used for Each File

Another factor that could contribute to defects is the programming language used. As is generally the case, each file contains code written in a single language, so we’ll use matrix values of 1 to mark a file written in a given language and 0 if the file was not written in that language. Mapping this to Files A, B & C gives us the information in “Programming Language Used for Each File.”

 

Language Contribution to Pull Requests

Given the 1-to-1 relationship between programming languages and files, we don’t need to calculate any ratios. We can just go ahead and multiply the “Programming Language Used for Each File” matrix by the “Relative File Contribution” matrix ([language contribution to files] x [files to pull requests]) to see how our programming languages correlate to pull requests. The results are presented in the “Language Contribution to Pull Requests” table.

 

Language Code to Defect Contribution

It appears that JavaScript had a greater contribution to pull requests, but more of the codebase was written in JavaScript, so we need to dig a little deeper. Once again, we’ll create a table that attempts to compare code contribution to defect contribution. We’ll also add a new column, the ratio of Defects to Code. This is a quick way to compare multiple programming languages and rank their impact on defect creation. The results are shown in the “Language Code to Defect Contribution” table.
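Again, the real numbers live in the tables; the sketch below just shows the ratio computation with hypothetical shares for JavaScript and Ruby:

    # Hypothetical figures, for illustration of the Defects-to-Code ratio only.
    code_share = {"JavaScript": 0.65, "Ruby": 0.35}    # share of the codebase per language
    defect_share = {"JavaScript": 0.75, "Ruby": 0.25}  # average contribution to defect pull requests

    for language, code in code_share.items():
        ratio = defect_share[language] / code
        note = "worth a closer look" if ratio > 1.0 else "in line with its code share"
        print(f"{language:10s} defects/code ratio: {ratio:.2f} ({note})")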

 

Defect to Code Ratios by Language

Perhaps it’s easiest to appreciate the difference between JavaScript and Ruby for this project by looking at a bar graph. In “Defect to Code Ratios by Language,” a lower ratio is better. Ratios greater than 1.0 might suggest that either the choice of implementation language or the team’s skills with the selected technology are not optimal for the situation.

Conclusion

Once we can establish a correlation between a factor in a project’s codebase and an anomaly in the project’s defects, we can start exploring ways to address the issue. Some common measures to mitigate concerns around contributions from team members include pair programming, additional training, increasing the frequency and thoroughness of code reviews, and a greater focus on unit testing. When the issues trace back to the selection of programming languages, training, code reviews, additional testing and refactoring are among the possible remedies.

As suggested earlier, anything that can be directly associated with a pull request / commit can be used for Defect Analysis. The specific analyses discussed here are two of the explorations that we’ve found to be particularly useful. We’ll add others over time, and we can work with project teams to implement measurements that might be more illuminating for their unique situations.

Hopefully, after reading this post, you share my enthusiasm for Code Correlation Analysis in general and Defect Analysis in particular. Algorithms like these are powerful tools to help us move the field of Software Development Intelligence forward towards a time when we are fully embracing Data Driven Software Development.

Kamiel Wanrooij
Entropy for measuring Software Maturity

We’ve just released the first version of our commit entropy analysis tool to measure software maturity. This is the first in a series of blog posts that goes into detail about entropy in software development.

In a software development project, change is one of the only constant factors. Requirements can change, as can the technical considerations and environmental circumstances. Our job as software project managers and engineers is largely managing this ability to change.

As software projects grow, the ability to change often diminishes. This is in contrast to the rate of change, which generally increases through the first releases until a project enters maintenance mode and, eventually, reaches End-Of-Life. This difference makes software projects unpredictable and has given rise to methodologies like Agile, Scrum and Lean to streamline the rate of change. These methodologies do not, however, help increase the ability of a software project to support this rate of change.

One way to measure a software project’s ability to keep up with the rate of change is by utilizing the metric of entropy.

What is entropy?

Entropy is a term from information theory that is inspired by the concept of entropy in thermodynamics. In thermodynamics, and in general, entropy is a measure of disorder in a system. It’s this disorder that we are also interested in when it comes to software development.

Entropy in the context of information was first defined by Claude Shannon in 1948 in his famous paper “A Mathematical Theory of Communication”. Shannon defines entropy as the amount of information you need to encode a message; in other words, how much unique information is present in the message. If you have a coin that always turns up ‘heads’, you don’t need anything to record the outcome of a coin toss. For a regular coin, you need one ‘bit’ of information to track whether the coin came up heads or tails. For a six-sided die: about 2.6 bits (yes, in entropy, you can have fractions of bits).

 

Claude Shannon

This concept is often used in cryptography. The ‘entropy’ of a password is how many bits are required to store all possible combinations of that password. A 4-digit PIN code carries less entropy than 16 alphanumeric characters with special characters mixed in. In cryptography, higher entropy means that a password is harder to brute-force, since there are more possible combinations.

The same concept can be applied to changes made in a software project. If a change only impacts a small part of the system, that change can be recorded with very few bits of information. If changes touch a large part of a system, you need many more bits to encode that change.

Using this logic, we can determine the impact of changes by calculating the entropy that each change carries. In a typical software project, the larger the part of the system you need to modify to implement a feature or change, the harder it is to implement that change. Therefore, looking at the entropy of past changes tells us something about our ability to make those changes efficiently.
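One way to make this concrete (our tool’s exact formula may differ; this is just a sketch with hypothetical commits) is to apply Shannon’s formula to the distribution of changed lines across the files touched by a change:

    import math

    def change_entropy(lines_changed_per_file):
        """Shannon entropy, in bits, of a change based on how its changed lines
        are spread across files. A change confined to one file has entropy 0;
        a change spread evenly over many files has high entropy."""
        total = sum(lines_changed_per_file)
        entropy = 0.0
        for lines in lines_changed_per_file:
            if lines:
                p = lines / total
                entropy -= p * math.log2(p)
        return entropy

    # Hypothetical commits for illustration.
    print(change_entropy([120]))         # 0.0 bits: everything in a single file
    print(change_entropy([40, 40, 40]))  # ~1.58 bits: spread evenly over three files
    print(change_entropy([10] * 8))      # 3.0 bits: spread evenly over eight files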

Coupling in software

One of the most common goals in software architecture is managing coupling. Coupling is the dependency of one part of the code on another. They are ‘coupled’ together, either explicitly or implicitly.

Explicit coupling happens when one part of the code directly depends on or uses the other. This is unavoidable, but should be carefully managed. A tightly coupled system can become brittle and hard to change. Most design patterns that deal with explicit coupling implement some form or part of the SOLID or DRY principles.

SOLID is an abbreviation of five best practices in Object Oriented software development: Single responsibility, Open-closed, Liskov substitution, Interface segregation and Dependency inversion. The impact of these best practices is beyond the scope of this article, but they’re all designed to help create maintainable software architectures.

DRY stands for Don’t Repeat Yourself, and is an often-heard mantra for developers. Not only does repeating yourself create additional work now, it also increases the maintenance burden later on. All repeated sections will probably require the same bug fixes and changes applied to them, if the developer even remembers that the duplicate sections exist!

Implicit coupling can occur when there is no direct relationship in the code between two parts, but they are conceptually or otherwise linked together. This is usually harder to detect, since it requires knowledge of how different components interact to see if changes in one also affect another.

Kamiel Wanrooij
Definition of Ready

In one of our first posts we described our Definition of Done: a set of guidelines that states when a story may be considered done and ready for deployment to production. In this post we will discuss our Definition of Ready, which states when a story is considered ready for development.

 

Ready

During our bi-weekly planning sessions we often encountered stories that needed more discussion (grooming) before being fit to add to a sprint. So to keep the planning efficient and devote as much time as possible to sizing and determining what should go in the sprint, we decided to create a Definition of Ready: a set of guidelines similar to the Definition of Done, but with the purpose of defining when a story is ready to add to a sprint.

Definition of Ready

Our Definition of Ready currently consists of the following:

The stakeholder

Who gains the most when the story is implemented? In most cases there’s only a single stakeholder and in most cases that’s the end user. But we’ve discovered it’s good to explicitly point this out. It reminds us to view the story from this stakeholder’s perspective, fulfilling the story’s intent better.

The problem

What problem are we going to solve? This might be the most important part of the user story, as it describes the purpose of its existence. The problem is the part that’s going to be tested; it carries all of the story’s intent and therefore requires an apt description. The problem often arises from the stakeholder (or a representative), and the product owner uses it as the main indicator to determine the story’s priority relative to other stories.

The solution

What solution do we think is the best (or least worst) to solve the problem? Before writing the story, the product owner has some discussion with one or more members of the team to determine what solution is best. A conclusion can also be that more research is needed. In that case we create tasks (business-valueless stories) and time-box these to facilitate the solution for the actual story. We strive to have all the discussion needed to find the solution out of the way before the story is considered ready.

Acceptance criteria

Acceptance criteria are small, atomic and testable manifestations of the solution. They represent the implementation from the stakeholder’s perspective, not the developer’s perspective. Again: a story is written for the stakeholder, and this forces the developer implementing the story to think like the stakeholder in order to create the required business value. A story requires at least one acceptance criterion.

Kamiel Wanrooij
The Accuracy of our Predictions

At Grip we use software development data to predict the success of an application and suggest improvements to increase the chance of success.

We divide the key factors that contribute to success into three goals. These goals are aligned with the objectives of delivering maximum value in the shortest possible time with the lowest possible costs.

The three goals are:

User Satisfaction, Velocity, and Costs

 

Scoreboard

Each of these goals incorporates measurable indicators that have an impact on an organization’s results.

We refer to these indicators as goal measurements.

By applying sophisticated analytics, machine learning and simulations to the data that we collect from the development process, we predict the outcomes for the goals and goal measurements.

In this post we’ll discuss the accuracy of our predictions for some of our goal measurements.

We’re going to compare the predicted results with the actual scores from our own development activities.

For this discussion, we will examine Defect Removal Efficiency (DRE), Story Points per Collaborator (SPC) and Requirements per Collaborator (REQ). We’ll be examining the actual results from our work over the twenty weeks before February 9 to determine how accurately Grip can predict whether each indicator improves or gets worse.

The period and goal measurements were selected because we have complete, fully analyzed data for all three during this period.

The Grip with our own data is public, so feel free to check it out here.

Defect Removal Efficiency

DRE compares the defects found and removed prior to release with the defects found after release, and is an important indicator of quality (see, for instance, the work of Capers Jones).
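DRE is commonly expressed as the fraction of all known defects that were removed before release; here is a minimal sketch with hypothetical counts:

    def defect_removal_efficiency(pre_release_defects, post_release_defects):
        """DRE = defects removed before release / all defects found (pre + post)."""
        total = pre_release_defects + post_release_defects
        return pre_release_defects / total if total else 1.0

    # Hypothetical example: 90 defects fixed before release, 10 reported afterwards.
    print(f"DRE: {defect_removal_efficiency(90, 10):.0%}")  # 90%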

In the graph below we plotted our predicted score against our actual score for the past 20 weeks:

 

Defect Removal Efficiency

Let’s look at the Direction (whether the DRE was increasing, decreasing or remaining static):

 

DRE and Predicted DRE

Out of the 19 weeks where we have a direction, we predicted the direction correctly 11 times.
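The comparison itself is straightforward; below is a sketch of how such a directional score could be computed, simply comparing the sign of the week-over-week change in the prediction with the sign of the change in the actual score (the weekly values are hypothetical, not our actual data):

    def directional_accuracy(predicted, actual):
        """Count weeks where the predicted direction (up, down or flat) matches the actual one."""
        def directions(series):
            return [(b > a) - (b < a) for a, b in zip(series, series[1:])]  # +1, -1 or 0
        hits = sum(p == a for p, a in zip(directions(predicted), directions(actual)))
        return hits, len(predicted) - 1

    # Hypothetical weekly DRE scores, for illustration only.
    predicted = [0.80, 0.82, 0.81, 0.85, 0.84]
    actual    = [0.78, 0.83, 0.80, 0.84, 0.86]
    hits, weeks = directional_accuracy(predicted, actual)
    print(f"Direction correct in {hits} of {weeks} weeks")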

Story Points Closed per Collaborator

SPC is an indicator of the velocity with which the team is performing. It is important in agile planning sessions. SPC quantifies how much a team can accomplish per unit of time.

 

Story Points Closed per Collaborator

Let’s look at the Direction:

 

SPC and Predicted SPC

Over the 19 weeks, Grip correctly predicted whether our performance was improving, staying the same or declining 13 times.

Requirements Closed per Collaborator

 

Requirements Closed per Collaborator

Similar to the situation for story points closed per collaborator, the REQ results provide insight regarding how much value the team can create in a given unit of time. This sub-goal helps to determine if the team is generating the highest possible output from the available resources.

 

REQ and Predicted REQ

Here we correctly predicted the direction 14 times.

Conclusion – the Accuracy of our Predictions

We are pleased to observe the directional accuracy of Grip’s predictions.  While the magnitude of our predictions still requires refinement, the trends highlighted by Grip are valuable.  The insight gained from knowing which way a team’s performance is trending enables the development organization to make far better decisions, more quickly.

For this study,  the best possible result would have been 57 correct directional predictions (3 sub-goals across 19 weeks).   From those possible 57 results, we have accurate directional predictions for 38 instances.  Grip was correct exactly two-thirds of the time.  This should be more than enough for a development team to verify their own “gut feelings”.

We expect the prediction magnitudes to more closely match actual performance, and the directional predictions to become even more accurate, as we train the system with more data, both over time from our own efforts and by adding analysis of other projects.

Interested in getting predictions on your own development process?

Join our beta: http://grip.qa/beta

Jan Princen
Welcome to our first AI team member

At Grip.QA we’re firm believers in using our own products to experience what our users experience on a daily basis. Today, we decided to take that to the next level: we’re making our own public Grip dashboard a full-time team member.

 

Instead of just using our dashboard as a guide, we’re giving it a voice equal to that of all the other team members. This means acting on its advice during sprint planning sessions, asking for estimations, and listening during the retrospective.

Are you crazy?

No. Well, maybe. Time will tell. But the only way to experience the consequences of our own advice is to live by it rigorously. Make changes to our plans and see how they play out. Experience first hand what failure and success feel like. It’s almost like learning

 

As a side note, the team is obviously still in charge. We determine our goals before each sprint and decide on the sprint’s focus.

How will this work?

Every planning session we’ll pick a few improvements from our improvement dashboard to work on based on our goals for the sprint. These often impact the size and number of user stories, and trigger us to reduce risk and complexity. The team’s commitment is influenced by looking at our predicted velocity for the upcoming sprint compared to the previous sprints.

 

During the sprint we’ll keep an eye on the technical improvements we committed to. This includes codebase metrics like lines of code and duplication, but also our testing effort and bug fixes. We also keep an eye on whether our predicted goals move up or down during the sprint.

At retrospective time we evaluate whether we’ve met our business goals. Did we do better or worse than expected? Than predicted? We quickly go over the improvements we committed to and see how far we’ve come.

Results

We’re keeping a record of how this works in practice here on the blog. If you want to follow the experiment live, head over to our public dashboard, which is updated in real time.

If you want to help us make Grip the best predictive analytics tool for software development, you can share your feedback through our public beta program!

Kamiel Wanrooij
Machine Learning to Improve Software Development – Grip Public Beta Release

Almost There!

After nine months of heated discussions, late night coding, morning mock ups, weekend writing and whisky nights (@kmile does not drink beer…) we are proud to announce our public beta.

We started Grip with the idea to improve software development through data analysis and now (finally) have a first version of our application live.

For a software application Grip can:

  • Automatically harvest and analyze live data from the application’s software development process from a limited number of sources (currently GitHub, JIRA, BitBucket, SonarQube and Pivotal Tracker)
  • Present our analysis of this data in web based dashboards
  • Set software development goals for a software application based on the data we collect
  • Make predictions on reaching these goals based on the data
  • Recommend what areas of the application’s software development process to improve in order to reach those goals

How does Grip work?

We identified three categories of Software Development Goals, based on the traditional project management triangle, that can be scored against.

These categories are:

Velocity – Represents the actual productivity of your development activities. It quantifies how much your team(s) can accomplish per unit of time.

User Satisfaction – Gives you a sense for how happy your users/clients are with your offering. It helps you understand whether user issues are being addressed.

Costs – Costs, as used here, actually describes a concept more closely aligned to cost efficiency. That is, are you generating the highest possible output from your development resources?

The ability to reach these goals is dependent on what happens during development.

Deciding how to best reach these goals is now often done based on the experience of the people creating the software, and while that experience is valuable, this approach too often fails.

We set out to create a more data-driven approach, where we use live data generated during development to make predictions about the outcome: the software product in use.

So after a few conversations we came up with the following sketch:

 

The input variables are based on what happens in a software development process before your application is in production. We measure elements of the people that make it, the process by which it is made, the technologies used and the code the team produces.  The output variables are based on the goals described above.

Besides collecting the data from a software development process we also decided to use machine learning and simulations to find out about relations between inputs and desirable outputs.

Our Approach: Simulation and Machine Learning for Software Development

 

Our Approach

Not there yet

We are very proud of what we have achieved, but we also know we are not there yet.

We still have very limited data so our predictions and recommendations can be inaccurate, our models are rudimentary and we still need to define better goals. But we hope you like what we have so far.

One of the reasons that we released our public beta is that we want feedback. We want to better understand which software development goals are important, and having more data will improve our analysis.

If you are interested, feel free to sign up for our beta and we will talk to you about connecting as soon as we can.

You can also have a look at our development through our own Grip here: http://app.grip.qa/gripqa/grip

 Things we’re Working On:

  • Improved UX
  • Better Goal Representation
  • More connected information sources
  • More capabilities for managing users of your grip
  • Automated onboarding
  • And of course more accurate predictions and recommendations
Jan Princen