The goal of a software development team is to deliver great service to the customer.
But you can’t always anticipate what will happen when your applications start running in a real environment.
To extend the Agile idea of short feedback cycles, the team needs to establish good feedback loops that tell them how their applications are running in production and what they need to do to improve them.
What is DevOps (development and operations)?
DevOps is a combination of tools, practices, and cultural philosophy that integrates processes between software development and operations teams in an organization.
It removes barriers between the two teams, shortens the development lifecycle, and increases the speed at which an organization can deliver high-quality applications and services.
What are the Critical Elements of a Successful DevOps Process?
You need to have all of the following elements for a good DevOps process to work:
- Telemetry/Alerting
- An Agile Framework
- Automated Testing
- Continuous Integration/Continuous Deployment
1. Telemetry / Alerting
Telemetry can come in many forms, but operationally, you should focus on two main kinds:
- Application Logs
- Metrics
Application Logs as Telemetry
Application logs tell you what kinds of data are flowing through the system and what kinds of errors you are encountering. They should help you quickly find where the problems in the system are.
Ideally, you are using a log search tool (e.g. Kibana) that lets you follow logs across the various components of your system, so you can trace a single data event as it travels from one component to the next.
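One common way to make a single event traceable is to attach the same identifier to every log line it produces. The following is a minimal sketch of that idea using Python's standard logging module. The component names (intake, billing), the event_id field, and the handle_order function are illustrative assumptions, not part of any particular product; a real setup would ship these JSON lines to a log indexer rather than an in-memory stream.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so a log indexer can filter on fields."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "component": record.name,
            # event_id is supplied by the caller through the `extra` argument
            "event_id": getattr(record, "event_id", None),
            "message": record.getMessage(),
        })


def build_logger(component, stream):
    """Create a logger for one component that writes JSON lines to `stream`."""
    logger = logging.getLogger(component)
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def handle_order(order, stream):
    """Pass one event through two hypothetical components, tagging every log line."""
    event_id = str(uuid.uuid4())
    intake = build_logger("intake", stream)
    billing = build_logger("billing", stream)
    intake.info("order received", extra={"event_id": event_id})
    if order["amount"] <= 0:
        billing.error("invalid amount", extra={"event_id": event_id})
    else:
        billing.info("order charged", extra={"event_id": event_id})
    return event_id


# Usage: every line written for this order carries the same event_id.
log_stream = io.StringIO()
traced_id = handle_order({"amount": 0}, log_stream)
log_lines = [json.loads(line) for line in log_stream.getvalue().splitlines()]
```

Searching the indexed logs for one event_id then returns the full path of that event across components, including the component where it failed.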
Metrics as Telemetry
Metrics count and measure things. You can use these to track when various components are being called (ones owned by the Development team as well as ones external to the team), and how long they are taking to respond.
Or, you can count the number of times certain kinds of errors are thrown.
It is often useful to also have a “heartbeat” metric, which simply indicates that your application is actually running and not hung or crashed.
You should be able to create graphs of this data in a “system dashboard” (e.g. Grafana) and make it available to the Operations or Development teams.
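The three kinds of metrics described above (call counts, response times, and a heartbeat) can be sketched in a few lines. This is an in-memory illustration only, assuming a hypothetical Metrics class; a real application would send these values to a metrics database instead of keeping them in dictionaries.

```python
import time
from collections import defaultdict


class Metrics:
    """Hypothetical in-memory metrics store: counters, timings, and a heartbeat."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the class can be tested with a fake clock
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)
        self.last_heartbeat = None

    def count(self, name, amount=1):
        """Increment a named counter, e.g. a specific kind of error."""
        self.counters[name] += amount

    def timed(self, name, func, *args, **kwargs):
        """Call `func`, recording how long it took and whether it raised."""
        start = self._clock()
        try:
            return func(*args, **kwargs)
        except Exception:
            self.count(name + ".errors")
            raise
        finally:
            self.timings[name].append(self._clock() - start)
            self.count(name + ".calls")

    def heartbeat(self):
        """Record that the application is still running."""
        self.last_heartbeat = self._clock()

    def is_alive(self, max_silence_seconds):
        """True if a heartbeat arrived within the allowed silence window."""
        if self.last_heartbeat is None:
            return False
        return self._clock() - self.last_heartbeat <= max_silence_seconds
```

Wrapping calls to external components in timed gives the call counts and response times for the dashboard, while a missing heartbeat distinguishes a hung application from one that is simply idle.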
Once you have Metrics established, you can implement alerts based on events or thresholds. This will free you up from having to check logs or reports all the time.
An alert may tell you that an application has encountered too many errors, or that data seems to be outside a “normal” range: for example, traffic to your website is too low or too high, or your server response times are too long.
You can use a time-series database such as InfluxDB to track the metrics and an alerting component like Kapacitor to send a message to your Operations or Development team via email or a service such as PagerDuty or Slack.
The alert should contain enough contextual information to help the team quickly track down and diagnose the problem, such as which server environment, which component, and what type of data is causing it.
The team can configure the alerting system to notify on a broad range of possible problems, and then “tune” the alerts over time to reduce the number of false alerts.
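Threshold alerting with context can be sketched as follows. This is plain Python for illustration, not the configuration language of Kapacitor or any other alerting tool; the Threshold class, the evaluate function, and the metric names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical rule: the metric is 'normal' while low <= value <= high."""
    metric: str
    low: float
    high: float


def evaluate(thresholds, readings, environment, component):
    """Return one alert dict per reading that falls outside its normal range."""
    alerts = []
    for rule in thresholds:
        value = readings.get(rule.metric)
        if value is None:
            continue  # no reading for this metric, so this rule is skipped
        if value < rule.low or value > rule.high:
            alerts.append({
                # context that helps the team locate the problem quickly
                "environment": environment,
                "component": component,
                "metric": rule.metric,
                "value": value,
                "normal_range": (rule.low, rule.high),
                "direction": "below" if value < rule.low else "above",
            })
    return alerts


# Usage: traffic is far too low, response time is fine, so one alert is raised.
rules = [
    Threshold("requests_per_minute", 100, 5000),
    Threshold("response_ms", 0, 800),
]
alerts = evaluate(
    rules,
    {"requests_per_minute": 12, "response_ms": 350},
    environment="production",
    component="web",
)
```

Tuning the alerts over time then amounts to adjusting the low and high values of each rule until false alerts become rare.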
2. An Agile Framework
As your Development team works on new features, they should be prepared to take on unplanned tasks based on information gathered from telemetry. This might include fixing applications, adjusting configurations, or adding capacity.
An Agile Framework such as Scrum or Kanban gives you the flexibility to change plans quickly. If using a framework such as Scrum, then you should allocate time in every Sprint to take on production changes.
3. Automated Testing
In cases where a configuration or code change is necessary, good test automation lets the team know that their change did not inadvertently break something else. A complex piece of code can have hundreds of tests.
If manual testing takes days, then the team cannot move quickly to fix problems. Automated testing can often determine if a fix is okay within minutes.
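As a small illustration of what such automation looks like, here is a sketch using Python's built-in unittest module. The parse_timeout helper is a hypothetical piece of configuration code invented for the example; the point is that the whole suite runs in well under a second and can be repeated after every change.

```python
import unittest


def parse_timeout(raw):
    """Hypothetical config helper: turn '30s' or '2m' into a number of seconds."""
    units = {"s": 1, "m": 60}
    value, unit = raw[:-1], raw[-1:]
    if unit not in units or not value.isdigit():
        raise ValueError(f"bad timeout: {raw!r}")
    return int(value) * units[unit]


class ParseTimeoutTests(unittest.TestCase):
    def test_seconds(self):
        self.assertEqual(parse_timeout("30s"), 30)

    def test_minutes(self):
        self.assertEqual(parse_timeout("2m"), 120)

    def test_rejects_unknown_unit(self):
        with self.assertRaises(ValueError):
            parse_timeout("5h")


def run_suite():
    """Run the tests programmatically, as a build server would on every change."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseTimeoutTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

If someone later changes parse_timeout while fixing a production problem, a failing test reports the regression within seconds instead of after days of manual checking.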
4. Continuous Integration / Continuous Deployment
The Development team should be using a code repository with a good branching process to support Continuous Integration and Continuous Deployment best practices.
The team should be able to make the needed change, run the automated unit tests, merge the change into a production branch, run the automated system tests, and release the change into a production environment within a relatively short amount of time.
In the worst case, if the change does not solve the problem (or creates other ones), the team should be able to quickly roll back the change.
Making small, incremental changes continuously can lower the risk and impact of the change, as opposed to waiting and gathering a large number of changes and releasing them all at once.
When releasing a large number of changes all at once, you often run the risk of having to roll back all of the changes if something goes wrong with one change.
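The release flow described in this section (test gates, then deployment, then a quick rollback if the change misbehaves) can be sketched as a single function. This is an illustrative outline only: release, and the deploy, rollback, and healthy callables passed to it, are hypothetical stand-ins for whatever your build server and deployment tooling provide.

```python
def release(change, stages, deploy, rollback, healthy):
    """Run each test gate in order, deploy if all pass, roll back if unhealthy.

    stages   -- list of (name, check) pairs; check(change) returns True on success
    deploy   -- deploys the change and returns the previously live version
    rollback -- restores the version returned by deploy
    healthy  -- returns True if production looks fine after the deployment
    """
    for name, check in stages:
        if not check(change):
            return f"rejected at {name}"
    previous = deploy(change)
    if healthy():
        return "released"
    rollback(previous)
    return "rolled back"


# Usage with in-memory stand-ins for real infrastructure.
state = {"live": "v1"}


def fake_deploy(change):
    previous = state["live"]
    state["live"] = change
    return previous


def fake_rollback(previous):
    state["live"] = previous


stages = [
    ("unit tests", lambda change: True),
    ("system tests", lambda change: change != "v-bad"),
]
outcome = release("v2", stages, fake_deploy, fake_rollback, healthy=lambda: True)
```

Because each release carries one small change, a rollback restores only that change rather than discarding a large batch of unrelated work.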
Why is DevOps Important?
DevOps enables the software development and operations teams to work together closely to solve problems in the development lifecycle in a quicker, more accurate, and more efficient way.
Before DevOps, the responsibility for building and testing an application until it was “production ready” belonged to the software development team. Then, the code was handed off to an operations team that would deploy it into production and keep it running.
This separation of responsibilities often led to numerous problems. Sometimes, the Operations team wouldn’t completely understand the complex software they were responsible for keeping up and running.
Other times, they felt the Software Development team didn’t build the software to be run operationally in a production environment.
For example, the software didn’t include enough telemetry to tell the Operations team how it was running, or it wasn’t architected to scale up as demand increased.
Adopting DevOps Best Practices Today
DevOps is a journey and not a destination. Beginning to adopt these practices will help your team down the path of DevOps.
DevOps skills and tools are continuously evolving. Similar to embracing Agile methodologies, DevOps is part of a complete culture change within an organization.
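The overall solution will include many other elements, such as on-call and escalation procedures, incident tracking systems, and so on. However, if you are trying to put a DevOps process in place, you will first need to make the four elements described above (telemetry and alerting, an Agile framework, automated testing, and Continuous Integration/Continuous Deployment) work well.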