Why DevOps: Problems the Practice Helps Solve in Your Business
Most people in the IT industry have heard of DevOps, yet many are confused about what it actually means. In this post, we’ll discuss why DevOps is necessary to improve your IT department and your business.
You might have heard the following statements about DevOps:
- DevOps is close collaboration between the Development and Operations teams.
- DevOps is using automation.
- DevOps is treating your infrastructure as code.
- DevOps is frequent, smaller deployments to production.
- DevOps is using Agile methodologies.
All of the above are true, and there are many more statements that try to define DevOps. To understand it better, it helps to first know why we need DevOps and what problems it actually solves.
Let’s first discuss four key areas commonly used to measure the performance of an IT organization, and how a traditional IT organization rates in each of them. These four key areas are identified in the Puppet Labs State of DevOps Report.
- Deployment Frequency: How often code is deployed to production.
- Lead Time for Changes: How long it takes to go from code commit to code running in production.
- Mean Time to Recover: How long it takes to recover from failures and unplanned outages.
- Change Failure Rate: The percentage of changes to production that require subsequent remediation.
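To make these four definitions concrete, here is a minimal sketch that computes them from a list of production deployments. The `Deployment` record format and the `four_key_metrics` function are hypothetical, invented for illustration; real numbers would come from your own CI/CD and incident tooling. Two simplifying assumptions: lead time is reported as the median commit-to-production time, and recovery time is measured from the moment a failed deployment went live until service was restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record format for illustration)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it started running in production
    failed: bool = False                     # did it need remediation (rollback, hotfix)?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def four_key_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four key metrics over an observation window of period_days."""
    if not deployments:
        raise ValueError("need at least one deployment")

    # Deployment Frequency: deployments per day over the window.
    frequency = len(deployments) / period_days

    # Lead Time for Changes: median commit-to-production time.
    lead_time = median(d.deployed_at - d.committed_at for d in deployments)

    # Change Failure Rate: share of deployments that needed remediation.
    failures = [d for d in deployments if d.failed]
    failure_rate = len(failures) / len(deployments)

    # Mean Time to Recover: average time from failed deployment to restoration.
    # (statistics.mean does not accept timedelta, so sum and divide manually.)
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    mttr = sum(recoveries, timedelta()) / len(recoveries) if recoveries else None

    return {
        "deployment_frequency_per_day": frequency,
        "lead_time_for_changes": lead_time,
        "change_failure_rate": failure_rate,
        "mean_time_to_recover": mttr,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    history = [
        # Committed 90 days before release; no problems.
        Deployment(t0, t0 + timedelta(days=90)),
        # Committed 60 days before release; failed and took 2 days to restore.
        Deployment(t0 + timedelta(days=120), t0 + timedelta(days=180),
                   failed=True, restored_at=t0 + timedelta(days=182)),
    ]
    print(four_key_metrics(history, period_days=180))
```

With the made-up history above, which resembles the "slow IT" organization described below, the sketch reports two deployments in 180 days, a 75-day median lead time, a 50% change failure rate, and a 2-day mean time to recover.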
I’ll refer to these four metrics throughout the rest of this post. When in doubt, please refer back to the definitions above.
Infrequent Deployments to Production
A traditional IT department, sometimes labelled “slow IT”, has extremely long development cycles and a very low deployment frequency. In such an organization, code is typically deployed to production about once every six months.
Different IT departments often don’t collaborate well. Development teams wait on Operations/Service Delivery staff because they can’t get resources quickly enough, sometimes waiting weeks for servers. Operations can also take a long time to deploy code to production. In many software projects, a release is a manually intensive, multi-step, and rather scary process, which further lowers deployment frequency and lengthens lead time for changes. Why scary? Because if one step goes wrong, the entire application might fail, and running the steps in even a slightly different order can also lead to catastrophic failures. In a slow-performing organization, lead time for changes can be anywhere from one to six months.
Factors Causing Unreliable Releases
In a traditional organization, a massive amount of manual testing is needed before code can be deployed to production. Repetitive manual testing, where testers must work through hundred-page documents every time the code changes, puts a lot of stress on the QA team; as a result, critical issues can be missed, leading to a higher change failure rate. A shortage of QA resources, and spreading those resources across multiple projects, also raises the change failure rate.
Developers are often incentivized to ship code faster with little focus on quality, resulting in unreliable releases and a higher change failure rate.
Lack of Productivity
Time spent on activities that don’t produce results hurts productivity. A development team often waits for new environments to be created or for existing environments to be updated. Developers also wait for change requests to be approved before they can make changes. If you are on a QA team, you might be waiting for the development team to finish its work before anything is ready for testing. All of this waiting results in low deployment frequency and high lead time for changes.
Problem of Knowledge Transfer
Another major problem is that many project teams and consultants leave after deployment, sometimes with very little documentation or knowledge transfer. Finding people who know the system well can be difficult, because the poor soul supporting it may not know it intimately. Making a major change is challenging for that person, since the core knowledge no longer exists in one visible place. All of this leads to slow application recovery, i.e. a higher mean time to recover, when critical issues occur.
Problems in Creating the Right Environments
Creating environments can also cause problems. Many organizations manage the configuration of their production environment through a team of operations people, and any needed change (configuration, database, etc.) is carried out manually on the servers. As a result, operations can take a long time to prepare an environment for a release. Servers in the same cluster can also end up with different artifacts, such as different versions of the libraries the applications require, or different patch levels. All this manual work often results in low deployment frequency and high lead time for changes.
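To illustrate the drift problem, here is a minimal sketch that compares what is installed on each server in a cluster and reports anything that differs. It assumes a hypothetical inventory format (a dict mapping each server name to its libraries and patch level); the `find_drift` function, server names, and versions are all made up for illustration, and in practice the data would come from whatever inventory or configuration tool you already use.

```python
from typing import Optional


def find_drift(inventory: dict[str, dict[str, str]]) -> dict[str, dict[str, Optional[str]]]:
    """Return every item whose version differs across servers.

    inventory maps server name -> {item name -> version}. In the result,
    None means the item is missing on that server.
    """
    # Every item seen on any server in the cluster.
    all_items = set().union(*inventory.values())
    drift = {}
    for item in sorted(all_items):
        versions = {server: items.get(item) for server, items in inventory.items()}
        # More than one distinct value means the servers have drifted apart.
        if len(set(versions.values())) > 1:
            drift[item] = versions
    return drift


if __name__ == "__main__":
    # Hypothetical three-server cluster configured by hand over time.
    cluster = {
        "app-01": {"libssl": "3.0.13", "patch_level": "2024-05"},
        "app-02": {"libssl": "3.0.13", "patch_level": "2024-02"},
        "app-03": {"libssl": "3.0.13"},
    }
    for item, versions in find_drift(cluster).items():
        print(item, versions)
```

Here `libssl` matches everywhere and is not reported, while `patch_level` differs between app-01 and app-02 and is missing on app-03. Manual, server-by-server changes are exactly how such differences creep in unnoticed.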
All of these problems cause end-users to lose trust in IT. They also create major conflict between development, operations, and the business, when in fact all of these departments should be working closely together and collaborating effectively.
In the next post in this series, I’ll define exactly what DevOps is and how it solves all of the challenges described above.