A Case Study on Using 100% Cloud-Based Resources with Automated Software Delivery
We, [KaDeeCorp Continuous Delivery in the Cloud](http://bvajjala.github.io/Consulting), help organizations, typically large ones, create one-click software delivery systems so that they can deliver software in a more rapid, reliable and repeatable manner (AKA [Continuous Delivery](http://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912 "Continuous Delivery book")). This only works when Development works with Operations. As has been written elsewhere in this series, this means changing the hearts and minds of people, because most organizations are used to working in ‘siloed’ environments. In this entry, I focus on implementation by describing a real-world case study in which a team of Systems and Software Engineers brought Continuous Delivery Operations to the Cloud.
For years, we’ve helped customers with Continuous Integration and Testing, so more of our work was with Developers and Testers. Several years ago, we hired a Sys Admin/Engineer/DBA who was passionate about automation. As a result, we began assembling multiple two-person “[DevOps](http://en.wikipedia.org/wiki/DevOps "DevOps on Wikipedia")” teams consisting of a Software Engineer and a Systems Engineer, both of whom are big-picture thinkers and not just “Developers” or “Sys Admins”. These days, we put together these targeted teams of Continuous Delivery and Cloud experts with hands-on experience as Software Engineers and Systems Engineers so that organizations can deliver software as quickly and as often as the business requires.
A couple of years ago we already had a few people in the company who were experimenting with Cloud infrastructures, so we saw a great opportunity to provide cloud-based delivery solutions. In this case study, I cover a project we are currently working on for a large organization. It is a new Java-based web services project, so we’ve been able to implement solutions using our recommended software delivery patterns rather than being constrained by legacy tools or decisions. However, as I note, we aren’t without constraints on this project. If I were you, I’d call “BS!” on any “case study” in which everything went flawlessly and assume it was an extremely small or a theoretical project in the author’s mind. This is the real deal. Enough said, on to the case study.
Fast Facts
Industry: Healthcare, Public Sector
Profile: The customer is making available to all, free of charge, a series of software specifications and open source software modules that together make up an oncology-extended Electronic Health Record capability.
Key Business Issues: The customer wanted all team members to have “unencumbered” access to infrastructure resources, without the usual “request and wait” queue-based procedures present in most organizations.
Stakeholders: Over 100 people, consisting of Developers, Testers, Analysts, Architects, and Project Management.
Solution: Continuous Delivery Operations in the Cloud
Key Tools/Technologies: Amazon Web Services (AWS), including Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic Block Store (EBS), etc.; Jenkins, JIRA Studio, Ant, Ivy, Tomcat and PostgreSQL
The Business Problem
The customer was used to dealing with long, drawn-out processes with Operations teams that lacked agility. They were accustomed to submitting Word documents via email to an Operations team, attending multiple meetings and getting their environments set up weeks or months later. We were compelled to develop a solution that reduced or eliminated these problems, which are all too common in many large organizations (Note: each problem is identified by a letter and number, for example P1, and referred to later):
- Unable to deliver software to users on demand (P1)
- Queued requests for provisioned instances (P2)
- Unable to reprovision precise target environment configuration on demand (P3)
- Unable to provision instances on demand (P4)
- Configuration errors in target environments that create deployment bottlenecks while Operations and Development teams troubleshoot them (P5)
- Underutilized instances (P6)
- No visibility into purpose of instance (P7)
- No visibility into the costs of instance (P8)
- Users cannot terminate instances (P9)
- Increased Systems Operations personnel costs (P10)
Our Team
We put together a four-person team to create a solution for delivering software and managing the internal Systems Operations for this 100+ person project. We also hired a part-time Security expert. The team consists of two Systems Engineers and two Software Engineers focused on Continuous Delivery and the Cloud. One of the Software Engineers is the Solutions Architect/PM for our team.
Our Solution
We began with the end in mind, based on the customer’s desire for unencumbered access to resources. To us, “unencumbered” did not mean without controls; it meant providing automated services instead of queued “request and wait for the Ops guy to fulfill the request” processes. Our approach is that every resource is in the cloud, whether Software as a Service (SaaS), Platform as a Service (PaaS) or Infrastructure as a Service (IaaS), to reduce operations costs (P10) and increase efficiency. As a result, effectively all project resources are available on demand in the cloud. We have also automated the software delivery process to Development and Test environments and are working on one-click delivery to production. For each part of the solution, I’ve identified the problem it solves, from the list above, in parentheses (P1, P8, etc.). The solution includes:
- On-Demand Provisioning – All hardware is provided via EC2’s virtual instances in the cloud, on demand (P2). We’ve developed a “Provisioner” (PaaS) that gives any authorized team member the capability to click a button and get their project-specific target environment (P3) in the AWS cloud, thus providing unencumbered access to hardware resources (P4). The Provisioner also gives all authorized team members the capability to monitor instance usage (P6) and adjust accordingly. Users can terminate their own virtual instances (P9).
- Continuous Delivery Solution so that the team can deliver software to users on demand (P1):
  - Automated build script using Ant – used to drive most of the other automation tools
  - Dependency Management using Ivy. We will be adding Sonatype Nexus
  - Database Integration/Change using Ant and Liquibase
  - Automated Static Analysis using Sonar (with CheckStyle, FindBugs, JDepend, and Cobertura)
  - Test framework hooks for running JUnit, etc.
  - Remote deployment using reusable custom Ant scripts that rely on Java Secure Channel and web container configuration. However, we will be starting to move to a more robust deployment tool such as ControlTier
  - Automated document generation using Grand, SchemaSpy (ERDs) and UMLGraph
  - Continuous Integration server using Hudson
  - Continuous Delivery pipeline system – we are customizing Hudson to emulate a Deployment Pipeline
- Issue Tracking – We’re using the JIRA Studio SaaS product from Atlassian (P10), which provides issue tracking, version-control repository, online code review and a Wiki. We also manage the relationship with the vendor and perform the user administration including workflow management and reporting.
- Development Infrastructure – The customer selected numerous tools for Requirements Management and Test Management and Execution, including HP QC, LoadRunner, SoapUI and Jama Contour. Our team installed and manages many of these tools on the EC2 instances
- Instance Management – Any authorized team member is able to monitor virtual instance usage by viewing a web-based dashboard (P6, P7, P8) we developed. This helps identify instances that should no longer be in use or may be eating up too much money. There is a policy that test instances (e.g. Sprint Testing) are terminated at least every two weeks. This promotes ephemeral environments and test automation.
- Deployment to Production – Much of the pre-production infrastructure is in place, but we will be adding some additional automation features to make it available to users in production (P1). The deployment sites are unique in that we aren’t hosting a single instance used by all users and it’s likely the software will be installed at each site. One plan is to deploy separate instances to the cloud or to virtual instances that are shipped to the user centers
- System Monitoring and Disaster Recovery – We use CloudKick to notify us of instance errors or anomalies. EC2 provides us with some monitoring as well. We will be implementing a more robust monitoring solution using Nagios or something similar in the coming months. Through automation and supporting processes, we’ve implemented a disaster recovery solution.
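The Instance Management policy above (purpose and cost visible per instance, test instances terminated at least every two weeks) can be illustrated with a small sketch. This is not the code behind our dashboard or Provisioner; the `Instance` record, its field names, the flat hourly rate and the sample values are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

# Policy from the text: test instances are terminated at least every two weeks.
MAX_TEST_AGE = timedelta(days=14)


@dataclass
class Instance:
    """Hypothetical record for one virtual instance; field names are illustrative."""
    instance_id: str
    purpose: str          # why the instance exists (P7)
    owner: str
    launched_at: datetime
    hourly_cost: float    # assumed flat hourly rate in USD (P8)
    is_test: bool


def accrued_cost(instance: Instance, now: datetime) -> float:
    """Cost since launch, assuming the flat hourly rate above."""
    hours = (now - instance.launched_at).total_seconds() / 3600
    return round(hours * instance.hourly_cost, 2)


def is_overdue(instance: Instance, now: datetime) -> bool:
    """True for a test instance that has outlived the two-week policy."""
    return instance.is_test and (now - instance.launched_at) > MAX_TEST_AGE


def dashboard_rows(instances: List[Instance], now: datetime) -> List[Dict[str, object]]:
    """One row per instance, most expensive first, flagging overdue test instances."""
    rows = [
        {
            "instance_id": i.instance_id,
            "purpose": i.purpose,
            "owner": i.owner,
            "cost_usd": accrued_cost(i, now),
            "terminate": is_overdue(i, now),
        }
        for i in instances
    ]
    return sorted(rows, key=lambda row: row["cost_usd"], reverse=True)


if __name__ == "__main__":
    now = datetime(2011, 6, 30)
    fleet = [
        Instance("i-test01", "sprint-testing", "qa", datetime(2011, 6, 10), 0.5, True),
        Instance("i-ci01", "ci-server", "ops", datetime(2011, 1, 1), 0.25, False),
    ]
    for row in dashboard_rows(fleet, now):
        print(row)
```

Sorting by accrued cost puts the instances that are “eating up too much money” at the top, and the `terminate` flag only ever applies to test instances, so long-lived infrastructure such as a CI server is never flagged by this rule.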
Benefits
The benefits are primarily around removing the common bottlenecks from processes so that software can be delivered to users and team members more often. Also, we think our approach of providing on-demand services instead of queue-based requests increases agility and significantly reduces costs. Here are some of the benefits:
- Deliver software more often – to users and internally (testers, managers, demos)
- Deliver software more quickly – since the software delivery process is automated, we identify the SVN tag and click a button to deliver the software to any environment
- Software delivery is rapid, reliable and repeatable. All resources can be reproduced with a single click – source code, configuration, environment configuration, database and network configuration are all checked in, versioned and part of a single delivery system.
- Increased visibility into environments and other resources – All preconfigured virtual hardware instances are available for any project member to provision without needing to submit forms or attend countless meetings
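To make “identify the SVN tag and click a button” a little more concrete, here is a minimal sketch of the deployment pipeline idea: an ordered list of stages run against one tag, stopping at the first failure so a broken build never reaches a later environment. It is not our Hudson configuration; the stage names, the `run_pipeline` and `delivered` helpers and the stub stage are hypothetical, and in a real setup each stage would invoke the corresponding Ant target or tool.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a function that takes the SVN tag and reports success.
Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(tag: str, stages: List[Stage]) -> List[Tuple[str, bool]]:
    """Run stages in order for one tag; stop at the first stage that fails."""
    results: List[Tuple[str, bool]] = []
    for name, step in stages:
        passed = step(tag)
        results.append((name, passed))
        if not passed:
            break  # later stages (e.g. deploy) are never attempted
    return results


def delivered(results: List[Tuple[str, bool]], stages: List[Stage]) -> bool:
    """True only if every stage ran and passed."""
    return len(results) == len(stages) and all(passed for _, passed in results)


def always_pass(tag: str) -> bool:
    """Stub stage for illustration; a real stage would call out to a build tool."""
    return True


# Hypothetical stage names loosely following the automation listed in the text.
EXAMPLE_STAGES: List[Stage] = [
    ("build", always_pass),
    ("unit-tests", always_pass),
    ("static-analysis", always_pass),
    ("database-changes", always_pass),
    ("deploy", always_pass),
]

if __name__ == "__main__":
    outcome = run_pipeline("release-1.0", EXAMPLE_STAGES)
    for name, passed in outcome:
        print(f"{name}: {'ok' if passed else 'FAILED'}")
    print("delivered:", delivered(outcome, EXAMPLE_STAGES))
```

Keeping the stages as plain data means the same pipeline definition can be pointed at any environment, which is the property that makes delivery repeatable rather than a one-off manual procedure.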
Tools
Here are some of the tools we are using to deliver this solution. Some of the tools were chosen by our team exclusively and some by other stakeholders on the project.
- AWS EC2 – Cloud-based virtual hardware instances
- AWS S3 – Cloud-based storage. We use S3 to store temporary software binaries and backups
- AWS EBS – Elastic Block Store. We use EBS to attach PostgreSQL data volumes
- Ant – Build Automation
- CloudKick – Real-time Cloud instance monitoring
- ControlTier – Deployment Automation. Not implemented yet.
- HP LoadRunner – Load Testing
- HP Quality Center (QC) – Test Management and Orchestration
- Ivy – Dependency Management
- Jama Contour – Requirements Management
- Jenkins – Continuous Integration Server
- JIRA Studio – Issue Tracking, Code Review, Version-Control, Wiki
- JUnit – Unit and Component Testing
- Liquibase – Automated database change management
- Nagios (or Zenoss) – System Monitoring. Not implemented yet
- Nexus – Dependency Management Repository Manager (not implemented yet)
- PostgreSQL – Database used by the Development team. We’ve written scripts that automate database change management
- Provisioner (Custom Web-based) – Target Environment Provisioning and Virtual Instance Monitoring
- Puppet – Systems Configuration Management
- QTP – Test Automation
- SoapUI – Web Services Test Automation
- Sonar – code quality analysis (Includes CheckStyle, PMD, Cobertura, etc.)
- Tomcat/JBoss – Web container used by Development. We’ve written scripts to automate the deployment and container configuration
Solutions we’re in the process of Implementing
We’re less than a year into the project and have much more work to do. Here are a few projects we’re in the process of implementing or will be starting soon:
- System Configuration Management – We’ve started using Puppet, and we plan to expand how it’s used
- Deployment Automation – The move to a more robust Deployment automation tool such as ControlTier
- Development Infrastructure Automation – Automating the provisioning and configuration of tools such as HP QC in a cloud environment
What we would do Differently
Typically, if we were to start a Java-based project and recommend tools for testing, requirements and test management, we might choose the following, based on the particular need:
- Selenium with SauceLabs
- JIRA Studio for Test Management
- JIRA Studio for Requirements Management
- JMeter – or other open source tool – for Load Testing
However, as on most projects, there are many stakeholders who have their preferred approaches and tools they are familiar with, the same way our team does. Overall, we are pleased with how things are going so far, and the customer is happy with the infrastructure and approach that are in place at this time. I could probably do another case study on dealing with multiple SaaS vendors, but I will leave that for another post.
Summary
There’s much more I could have written about what we’re doing, but I hope this gives you a decent perspective on how we’ve implemented a DevOps philosophy with Continuous Delivery and the Cloud, and how this has led our customer to a more service-based, unencumbered and agile environment.