Balaji Vajjala's Blog

A DevOps Blog from the Trenches

6 Challenges in Implementing Enterprise Continuous Delivery

Implementing Continuous Delivery (CD) is listed as a key initiative by many enterprises. CD’s ability to rapidly and repeatedly bring feature improvements to market aligns naturally with business initiatives to accelerate time-to-market and stay a step ahead of the competition, while maintaining quality.

It also helps to meet the expectations of today’s “always-on” consumer, who has come to expect simple, one-click installation of applications that update automatically and regularly. However, putting CD into practice can be challenging, to say the least, particularly in the context of the existing development and release environment in a large enterprise. In this article, I’ll examine some key challenges that we’ve encountered when helping clients introduce CD and related automation, along with some suggestions on how to deal with them.

Challenge-1 Massive, monolithic applications

A key aspect of CD is making small, incremental changes to an application, to enable fast feedback and faster fixes. Large, tightly-coupled applications with multiple components that need to be compiled, tested and deployed together are tough to update incrementally, leading to long development, test and deploy cycles.

Quality control and root-cause analysis are harder too, as many changes are being implemented at the same time. And because each release procedure differs slightly, it is hard to create a standardized delivery pipeline and benefit from the resulting increase in reliability.

When faced with this challenge in the past, we have initiated a workstream to incrementally break components of the application out into separate modules. These can then be built and deployed independently, allowing for faster feedback cycles with smaller changesets.

Challenge-2 Minimal automation

We shouldn’t be automating for the sake of automation: manual activities aren’t “banned” from a CD pipeline on principle. However, a high percentage of manual steps will likely slow down your delivery pipeline and increase the chance of errors, preventing you from scaling your CD implementation. To meet your throughput and consistency goals, you usually need to either automate the bulk of the manual steps in your delivery process or replace them with suitable alternatives.

It is important to treat this automation effort as seriously as any other development effort, applying appropriate design, coding and testing practices in order to avoid ending up with an impossible-to-maintain “ball of mud”. The Infrastructure as Code movement has made significant progress in this area, for instance by promoting test-driven development of provisioning and deployment automation and providing supporting tooling.
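To make “test-driven” concrete for provisioning code, here is a minimal sketch using the testinfra library (my choice for the example; any server-spec style tool works), with nginx standing in for whatever your automation installs. Write these assertions first, then write the provisioning code that makes them pass.

# test_webserver.py -- run with: py.test --hosts=ssh://my-test-box test_webserver.py
# "nginx" and the file path below are illustrative placeholders.

def test_nginx_is_installed(host):
    assert host.package("nginx").is_installed

def test_nginx_is_running_and_enabled(host):
    nginx = host.service("nginx")
    assert nginx.is_running
    assert nginx.is_enabled

def test_site_config_is_deployed(host):
    assert host.file("/etc/nginx/conf.d/site.conf").exists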

Challenge-3 Limited environments

A limited pool of shared test environments increases the risk of bottlenecks during your CD implementation. You need to “block” or “reserve” an environment to prevent two pipelines running side by side from attempting to deploy and test in the same environment. Measures also need to be taken to prevent one pipeline from blocking an environment for too long, or from always beating the other pipeline to the required environment, leading to “starvation” for the other project.

Furthermore, misconfigured or “broken” environments that have been unexpectedly modified by previous teams or test runs are one of the leading causes of deployment failures. If you plan to run delivery pipelines at scale, a dynamic pool of available, “clean” target environments is required. Private, public or hybrid cloud platforms, coupled with provisioning and configuration management tools, allow you to grow and shrink this pool automatically and on demand.
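To make the “reserve, use, recycle” idea concrete, here is a purely illustrative Python sketch of such a pool; the environment objects and their provision() method are hypothetical stand-ins for your cloud and configuration management tooling.

import queue
from contextlib import contextmanager

class EnvironmentPool:
    """Hands out test environments to pipelines, one pipeline per environment."""

    def __init__(self, environments):
        self._free = queue.Queue()
        for env in environments:
            self._free.put(env)

    @contextmanager
    def lease(self, timeout=1800):
        # Blocks until an environment is free, so two pipelines never deploy
        # and test against the same target at the same time.
        env = self._free.get(timeout=timeout)
        try:
            yield env
        finally:
            env.provision()      # hypothetical: rebuild to a known-good, "clean" state
            self._free.put(env)  # only clean environments go back into the pool

# Usage inside a pipeline stage (deploy and run_tests are placeholders):
#
# with pool.lease() as env:
#     deploy(build, env)
#     run_tests(env)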

Challenge-4 Enterprise release management

As soon as we approach QA or production in most enterprise environments, an increasing number of release management requirements must be met: creation of a change ticket, placing the change on the agenda of the next Change Board meeting, receiving Change Board approval, confirming deployment windows, and so on.

How do you integrate such requirements into your delivery pipelines?

Some possibilities:

  • Simply cap all delivery pipelines at the test stage, before running into any release management conditions. The goal is typically to take CD further than just test environments, though.

  • Integrate the various release management steps into the pipeline, e.g. by manually and, eventually, automatically creating and scheduling a change ticket, or by automatically setting a start time on the pipeline’s deployment phase from the change management system (see the sketch after this list).

  • Revisit the need for certain change management conditions. The origin of such practices is typically to ensure that only changes of an approved level of quality and stability make it to production – precisely the level of quality and stability that prior stages of a delivery pipeline are intended to verify.
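To give a rough idea of the second option, here is a hypothetical sketch of a pipeline step that raises a change ticket and then waits for its approved deployment window; the URL and field names are placeholders, not any real change management product’s API.

import time
import requests

CM_API = "https://change-management.example.com/api"   # placeholder URL

def create_change_ticket(release_id):
    # Raise the ticket as soon as the release candidate enters the pipeline.
    resp = requests.post("%s/changes" % CM_API,
                         json={"release": release_id, "type": "standard"})
    resp.raise_for_status()
    return resp.json()["id"]

def wait_for_deployment_window(change_id, poll_seconds=300):
    # Block the deployment stage until the Change Board has approved the ticket.
    while True:
        change = requests.get("%s/changes/%s" % (CM_API, change_id)).json()
        if change["status"] == "approved":
            return change["window_start"]   # schedule the deploy phase from this
        if change["status"] == "rejected":
            raise RuntimeError("Change %s was rejected" % change_id)
        time.sleep(poll_seconds)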

Challenge-5 Managing multiple custom pipelines

In a large organization with a diverse service portfolio spanning different technology platforms, departments, customers, and teams, there will be many pipelines to manage as you scale your CD implementation.

If every pipeline ends at a different stage in the delivery process, it is difficult to compare metrics such as cycle time, throughput, or the percentage of successful executions.

A large set of pipelines is easier to manage if each one is based on a standard template. Standardized pipelines also allow for more meaningful comparative reporting, as well as providing useful feedback that can be applied to other pipelines. Templates can be as simple as a shared Wiki page, but can also be supported by CD tools. The number of templates you should start with depends on the variation across your service portfolio; one per technology stack is often a useful starting point. Over time, you will hopefully be able to consolidate towards just a handful of pipeline types.
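As an illustration, a template per technology stack can be little more than a shared list of stages from which each service’s pipeline is generated; the stage names and structure below are made up for the example.

# One template per technology stack; every service's pipeline is generated
# from the same stage list, which keeps metrics comparable across pipelines.
JAVA_STACK_TEMPLATE = [
    "compile",
    "unit-test",
    "package",
    "deploy-to-test",
    "functional-test",
    "deploy-to-production",
]

def pipeline_for(service, template=JAVA_STACK_TEMPLATE):
    return {"service": service,
            "stages": [{"name": stage} for stage in template]}

pipelines = [pipeline_for(svc) for svc in ("billing", "search", "frontend")]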

Challenge-6 Ownership and security

Automated delivery pipelines span multiple teams of the IT organization, something that becomes especially apparent when pipeline stages fail and it’s hard to determine who needs to fix them.

Every pipeline stage needs one or more owners tasked not only with fixing problems and getting the delivery stream running again, but also with contributing to feedback-driven improvement of the pipeline as a whole. Since visibility into the state of the entire pipeline is important for all stakeholders, any orchestration tool you consider should offer a suitable security model.

For example, developers will probably need to examine the results of a functional test phase to help identify the cause of test failures. However, they should not be able to disable or modify the configuration of the functional testing step.

In Summary

Analyzing which of the common challenges to putting CD into practice apply in your situation should be a first preparatory step in your implementation. Gain an accurate picture of your current baseline, structure your implementation in measurable phases, and then work on dealing with these challenges to clear the way for your first delivery pipelines with defined roles and responsibilities for each phase.

Mitigating any challenges that you identify early in the project cycle will help your implementation progress smoothly and help you address future challenges.

Your CD implementation will then be on the way to providing faster releases, more reliable feature delivery and steady improvement driven by quicker feedback and better insight.

Best Practices of Continuous Integration

What is Continuous Integration

Martin Fowler has the best description of CI:

Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily – leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.

Implementing Practical Continuous Deployment

Early this year, I had the pleasure of speaking at a local user group about some of our experiences with continuous delivery and deployment. The slides are available online, but the talk generated a lot of discussion at the time and I’d like to recap some of it here.

Jenkins Job Builder and How to Extend It

What is Jenkins Job Builder

Jenkins Job Builder is an extremely good tool for managing your Jenkins CI jobs: it takes simple descriptions from YAML files and uses them to configure Jenkins.

# a free-style job
# job-template.yaml
- job:
    name: testjob
    project-type: freestyle
    defaults: global
    disabled: false
    display-name: 'Fancy job name'
    concurrent: true
    quiet-period: 5
    workspace: /srv/build-area/job-name
    block-downstream: false
    block-upstream: false

Then put your Jenkins access credentials into a jenkins.ini file:

[jenkins]
user=USERNAME
password=USER_TOKEN
url=JENKINS_URL
ignore_cache=IGNORE_CACHE_FLAG

With the job configuration above, you just need to run:

$ jenkins-jobs --conf jenkins.ini update job-template.yaml 

Your job testjob is then created on your Jenkins server.

The project was created by the OpenStack infrastructure team, which uses it to manage the OpenStack CI environment; it works fairly well.

How it works

There is no magic behind it: jenkins-jobs just converts job-template.yaml into a Jenkins XML job definition and uses the Jenkins remote API to send a create request.

Try the following to see this for yourself:

$ jenkins-jobs test job-template.yaml -o .

An XML file named testjob is then created:

<?xml version="1.0" ?>
<project>
  <actions/>
  <description>

&lt;!-- Managed by Jenkins Job Builder --&gt;</description>
  <keepDependencies>false</keepDependencies>
  <disabled>false</disabled>
  <displayName>Fancy job name</displayName>
  <blockBuildWhenDownstreamBuilding>false</blockBuildWhenDownstreamBuilding>
  <blockBuildWhenUpstreamBuilding>false</blockBuildWhenUpstreamBuilding>
  <concurrentBuild>true</concurrentBuild>
  <customWorkspace>/srv/build-area/job-name</customWorkspace>
  <quietPeriod>5</quietPeriod>
  <canRoam>true</canRoam>
  <properties/>
  <scm class="hudson.scm.NullSCM"/>
  <builders/>
  <publishers/>
  <buildWrappers/>
</project>

Now you can use curl to send the request (testjob) directly:

$ curl --user USER:PASS -H "Content-Type: text/xml" -s --data "@testjob" "http://jenkins-server/createItem?name=testjob"
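For the curious, here is roughly the same request expressed in Python with the requests library (an assumption for this sketch; jenkins-jobs uses its own HTTP client under the hood):

import requests

# Placeholders: use the same values as in your jenkins.ini
JENKINS_URL = "http://jenkins-server"
USER, TOKEN = "USERNAME", "USER_TOKEN"

with open("testjob") as f:
    config_xml = f.read()

resp = requests.post(
    "%s/createItem" % JENKINS_URL,
    params={"name": "testjob"},
    data=config_xml,
    headers={"Content-Type": "text/xml"},
    auth=(USER, TOKEN),
)
resp.raise_for_status()   # a non-2xx status means the job was not created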

How to recreate your Jenkins job

Looks great. Finally, you need to think about how to recreate an existing Jenkins job in YAML. That is also simple: just download its config.xml

$ curl --user USER:PASS http://jenkins-server/job/testjob/config.xml

Or open the configuration page in a browser (http://jenkins-server/job/testjob/configure) and map it back to the YAML file.

You need to read Jenkins Job Builder’s documentation to learn this mapping. In general, it has top-level macros such as builders, which are connected to the real Python builders module that performs the transformation from YAML to XML.

What you state in the YAML file like this:

- job:
    name: test_job
    builders:
      - shell: "make test"

it will be converted to

<builders>
  <hudson.tasks.Shell>
    <command>make test</command>
  </hudson.tasks.Shell>
</builders>
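That conversion is done by a small Python function. A simplified sketch of what the built-in shell builder does (the real implementation lives in Jenkins Job Builder’s builders module) looks like this:

import xml.etree.ElementTree as XML

def shell(parser, xml_parent, data):
    # data is the value from the YAML, e.g. "make test"
    shell = XML.SubElement(xml_parent, 'hudson.tasks.Shell')
    XML.SubElement(shell, 'command').text = data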

How to extend

It is great to see that Jenkins Job Builder already has lots of default modules to support your typical Jenkins jobs, but there are exceptions, such as less popular Jenkins plugins or your own plugins.

Then it is time to extend it. The existing documentation on extending is not clear enough, so I will use an example to show how it works; the code is in the jenkins-buddy project on GitHub.

The ArtifactDeployer Plugin is used as the example; it is a popular plugin for deploying artifacts to another folder.

And I want the YAML to look like this:

# artifactdeploy.yaml
- job:
    name: test-job
    publishers:
      - artifactdeployer:
          includes: 'buddy-*.tar.gz'
          remote: '/project/buddy'

Write the code to do the transformation

Now I need to download an existing job to see what the XML looks like. Using curl as above, I got something like:

<publishers>
   ...  
  <org.jenkinsci.plugins.artifactdeployer.ArtifactDeployerPublisher plugin="artifactdeployer@0.27">
<entries>
  <org.jenkinsci.plugins.artifactdeployer.ArtifactDeployerEntry>
<includes>buddy-*.tar.gz</includes>
<basedir></basedir>
<excludes></excludes>
<remote>/project/buddy</remote>
<flatten>false</flatten>
<deleteRemote>false</deleteRemote>
<deleteRemoteArtifacts>false</deleteRemoteArtifacts>
<deleteRemoteArtifactsByScript>false</deleteRemoteArtifactsByScript>
<failNoFilesDeploy>false</failNoFilesDeploy>
  </org.jenkinsci.plugins.artifactdeployer.ArtifactDeployerEntry>
</entries>
<deployEvenBuildFail>false</deployEvenBuildFail>
  </org.jenkinsci.plugins.artifactdeployer.ArtifactDeployerPublisher>
..
</publishers> 

It belongs to the publishers section, so I wrote the jenkins_buddy/modules/publishers.py module and added one function, artifactdeployer:

import logging
import xml.etree.ElementTree as XML

def artifactdeployer(parser, xml_parent, data):
    logger = logging.getLogger("%s:artifactdeployer" % __name__)
    logger.debug(data)  # handy while working out the YAML-to-XML mapping
    # Recreate the XML structure the plugin stores in config.xml
    artifactdeployer = XML.SubElement(xml_parent, 'org.jenkinsci.plugins.artifactdeployer.ArtifactDeployerPublisher')
    entries = XML.SubElement(artifactdeployer, 'entries')
    entry = XML.SubElement(entries, 'org.jenkinsci.plugins.artifactdeployer.ArtifactDeployerEntry')
    XML.SubElement(entry, 'includes').text = data['includes']
    XML.SubElement(entry, 'remote').text = data['remote']

This is the core part that handles the conversion.

Hook it into Jenkins Job Builder

Now you need to hook this script into Jenkins Job Builder. Thanks to Python’s entry_points mechanism, this is straightforward.

Create the plugin-related script and package structure, then add a new entry point in setup.py:

#setup.py in jenkins-buddy
entry_points={
    'jenkins_jobs.publishers': [
    'artifactdeployer=jenkins_buddy.modules.publishers:artifactdeployer',
    ],
}

This tells jenkins-jobs: when you meet the new keyword artifactdeployer under publishers, let jenkins_buddy.modules.publishers:artifactdeployer handle it.

Verify it

Build the pip package locally and install it:

$ python setup.py sdist
$ pip install dist/jenkins-buddy-0.0.5.zip

Then verify the new job definition. Bingo, it works:

$ jenkins-jobs test artifactdeploy.yaml -o . 

Make it more complete by checking the Jenkins plugin’s Java code

Maybe you noticed that this is a bit of a hack, since I skipped converting some parameters and guessed what the XML would look like. If you want to make it more complete, you need to check the Java code directly.

src/main/java/org/jenkinsci/plugins/artifactdeployer/ArtifactDeployerPublisher.java is the class we need to look at:

@DataBoundConstructor
public ArtifactDeployerPublisher(List<ArtifactDeployerEntry> deployedArtifact, boolean deployEvenBuildFail) {
    this.entries = deployedArtifact;
    this.deployEvenBuildFail = deployEvenBuildFail;
    if (this.entries == null)
        this.entries = Collections.emptyList();
}

It maps the XML directly into internal data; if you need to know more, learn how to develop a Jenkins plugin.

Node.js Deployment: Building and Configuring on Amazon Linux AMI

Logging in and updating system to latest

SSH into your shiny new VM.

Now let’s update the system to the latest packages:

sudo yum update

Install OS dependencies

We’re going to build Node.js from source; some dependencies (such as gcc) are required:

sudo yum install gcc-c++ make openssl-devel git

Deploy/Release Workflow from GitHub

Workflow: Deploying/Releasing Apps from Development to Production

Deploying is a big part of the lives of most of our Engineering employees. We don’t have a release manager and there are no set weekly deploys. Developers and designers are responsible for shipping new stuff themselves as soon as it’s ready. This means that deploying needs to be as smooth and safe a process as possible.

The best system we’ve found so far to provide this flexibility is to have people deploy branches. Changes never get merged to master until they have been verified to work in production from a branch. This means that master is always stable; a safe point that we can roll back to if there’s a problem.

The basic workflow goes like this:

  • Push changes to a branch in GitHub
  • Wait for the build to pass on our CI server (Jenkins)
  • Tell Hubot to deploy it
  • Verify that the changes work and fix any problems that come up
  • Merge the branch into master

Not too long ago, however, this system wasn’t very smart. A branch could accidentally be deployed before the build finished, or even if the build failed. Employees could mistakenly deploy over each other. As the company has grown, we’ve needed to add some checks and balances to help us prevent these kinds of mistakes.

Safety First

The first thing we do now, when someone tries to deploy, is make a call to Janky to determine whether the current CI build is green. If it hasn’t finished yet or has failed, we’ll tell the deployer to fix the situation and try again.

Next we check whether the application is currently “locked”. The lock indicates that a particular branch is being deployed in production and that no other deploys of the application should proceed for the moment. Successful builds on the master branch would otherwise get deployed automatically, so we don’t want those going out while a branch is being tested. We also don’t want another developer to accidentally deploy something while the branch is out.

The last step is to make sure that the branch we’re deploying contains the latest commit on master that has made it into production. Once a commit on master has been deployed to production, it should never be “removed” from production by deploying a branch that doesn’t have that commit in it yet.

We use the GitHub API to verify this requirement. An endpoint on the github.com application exposes the SHA1 that is currently running in production. We submit this to the GitHub compare API to obtain the “merge base”, or the common ancestor, of master and the production SHA1. We can then compare this to the branch that we’re attempting to deploy to check that the branch is caught up. By using the common ancestor of master and production, code that only exists on a branch can be removed from production, and changes that have landed on master but haven’t been deployed yet won’t require branches to merge them in before deploying.
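Here is a back-of-the-envelope version of that check, sketched with the public GitHub compare API; the production-SHA endpoint, repository name, branch, and token below are placeholders for the internal pieces described above.

import requests

GITHUB_API = "https://api.github.com"
REPO = "my-org/my-app"                            # placeholder
HEADERS = {"Authorization": "token MY_TOKEN"}     # placeholder token

def compare(base, head):
    url = "%s/repos/%s/compare/%s...%s" % (GITHUB_API, REPO, base, head)
    resp = requests.get(url, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

# Hypothetical internal endpoint exposing the SHA1 currently in production
production_sha = requests.get("https://my-app.example.com/deploy/sha").json()["sha"]

# Common ancestor ("merge base") of master and what is running in production
merge_base = compare("master", production_sha)["merge_base_commit"]["sha"]

# The branch may deploy only if it already contains that ancestor
status = compare(merge_base, "my-feature-branch")["status"]
if status not in ("ahead", "identical"):
    raise SystemExit("Branch is behind production; merge master into it first.")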

If it turns out the branch is behind, master gets merged into it automatically. We do this using the new Merging API that we’re making available today. This merge starts a new CI build like any other push-style event, which starts a deploy when it passes.

At this point the code actually gets deployed to our servers. We usually deploy to all servers for consistency, but a subset of servers can be specified if necessary. This subset can be by functional role — front-end, file server, worker, search, etc. — or we can specify an individual machine by name, e.g., ‘fe7’.

Watch it in action

What now? It depends on the situation, but as a rule of thumb, small to moderate changes should be observed running correctly in production for at least 15 minutes before they can be considered reasonably stable. During this time we monitor exceptions, performance, tweets, and do any extra verification that might be required. If non-critical tweaks need to be made, changes can be pushed to the branch and will be deployed automatically. In the event that something bad happens, rolling back to master only takes 30 seconds.

All done!

If everything goes well, it’s time to merge the changes. At GitHub, we use Pull Requests for almost all of our development, so merging typically happens through the pull request page. We detect when the branch gets merged into master and unlock the application. The next deployer can now step up and ship something awesome.

How do we do it?

Most of the magic is handled by an internal deployment service called Heaven. At its core, Heaven is a catalog of Capistrano recipes wrapped up in a Sinatra application with a JSON API. Many of our applications are deployed using generic recipes, but more complicated apps can define their own to specify additional deployment steps. Wiring it up to Janky, along with clever use of post-receive hooks and the GitHub API, lets us hack on the niceties over time. Hubot is the central interface to both Janky and Heaven, giving everyone in Campfire great visibility into what’s happening all of the time. As of this writing, 75 individual applications are deployed by Heaven.

Designing A RESTful API That Doesn’t Suck

As we’re getting closer to shipping the first version of devo.ps and we are joined by a few new team members, the team took the time to review the few principles we followed when designing our RESTful JSON API. A lot of these can be found on apigee’s blog (a recommended read). Let me give you the gist of it:

  • Design your API for developers first, they are the main users. In that respect, simplicity and intuitiveness matter.

  • Use HTTP verbs instead of relying on parameters (e.g. ?action=create). HTTP verbs map nicely with CRUD:

    • POST for create,
    • GET for read,
    • DELETE for remove,
    • PUT for update (and PATCH too).
  • Use HTTP status codes, especially for errors (authentication required, error on the server side, incorrect parameters)… There are plenty to choose from, here are a few:

    • 200: OK
    • 201: Created
    • 304: Not Modified
    • 400: Bad Request
    • 401: Unauthorized
    • 403: Forbidden
    • 404: Not Found
    • 500: Internal Server Error
  • Simple URLs for resources: first a noun for the collection, then the item. For example /emails and /emails/1234; the former gives you the collection of emails, the second one a specific one identified by its internal id.

  • Use verbs for special actions. For example, /search?q=my+keywords.

  • Keep errors simple but verbose (and use HTTP codes). We only send something like { message: "Something terribly wrong happened" } with the proper status code (e.g. 401 if the call requires authentication) and log more verbose information (origin, error code…) in the backend for debugging and monitoring. (See the sketch below.)
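To make these conventions concrete, here is a minimal sketch using Flask (my choice for the example; the email resource and messages are illustrative, not our actual API):

from flask import Flask, jsonify, request

app = Flask(__name__)
EMAILS = {"1234": {"id": "1234", "subject": "Hello"}}   # illustrative data

@app.route("/v1/emails", methods=["GET", "POST"])
def emails_collection():
    if request.method == "POST":
        email = request.get_json(force=True)
        EMAILS[email["id"]] = email
        return jsonify(email), 201                      # 201: Created
    return jsonify(list(EMAILS.values())), 200          # 200: OK

@app.route("/v1/emails/<email_id>", methods=["GET", "PUT", "DELETE"])
def email_item(email_id):
    if email_id not in EMAILS:
        # Simple but verbose error body, with the proper status code
        return jsonify({"message": "No email with id %s" % email_id}), 404
    if request.method == "DELETE":
        del EMAILS[email_id]
        return jsonify({}), 200
    if request.method == "PUT":
        EMAILS[email_id] = request.get_json(force=True)
    return jsonify(EMAILS[email_id]), 200

if __name__ == "__main__":
    app.run()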

Relying on HTTP status codes and verbs should already help you keep your API calls and responses lean enough. Less crucial, but still useful:

  • JSON first, then extend to other formats if needed and if time permits.
  • Unix time, or you’ll have a bad time.
  • Prepend your URLs with the API version, like /v1/emails/1234.
  • Lowercase everywhere in URLs.

I Can Haz Init Script

Something went awfully wrong, and a rogue process is eating up all of the resources on one of your servers. You have no other choice but to restart it. No big deal, really; this is the age of disposable infrastructure after all. Except when it comes back up, everything starts going awry. Half the stuff supposed to be running is down and it’s screwing with the rest of your setup.

INIT SCRIPTS, Y U NO LIKE?

You don’t get to think about them very often, but init scripts are a key piece of a sound, scalable strategy for your infrastructure. It’s a mandatory best practice. Period. And there are quite a few things in the way of getting them to work properly at scale in production environments. It’s a tough world out there.

What we’re dealing with…

Packages

Often enough, you’re gonna end up installing a service using the package manager of your distro: yum, apt-get, you name it. These packages usually come with an init script that should get you started.

Sadly, as your architecture grows in complexity, you’ll probably run into some walls. Wanna have multiple memcache buckets, or several instances of redis running on the same box? You’re out of luck buddy. Time to hack your way through:

  • Redefine your start logic,
  • Load one or multiple config files from /etc/defaults or /etc/sysconfig,
  • Deal with the PIDs, log and lock files,
  • Implement conditional logic to start/stop/restart one or more of the services,
  • Realize you’ve messed something up,
  • Same player shoot again.

Honestly: PITA.

Built from source

First things first: you shouldn’t be building from source (unless you really, really need to).

Now if you do, you’ll have to be thorough: there may be samples of init scripts in there, but you’ll have to dig them out. /contrib, /addons, …it’s never in the same place.

And that makes things “fun” when you’re trying to unscrew things on a box:

  • You figured out that MySQL is running from /home/user/src/mysql,
  • You check if there’s an init script: no luck this time…
  • You try to understand what exactly launched mysqld_safe,
  • You spend a while digging into the bash history smiling at typos,
  • You stumble on a run.sh script (uncommented, of course) in the home directory. Funny enough, it seems to be starting everything from MySQL, NGINX and php-fpm to the coffee maker.
  • You make a mental note to try and track down the “genius” who did that mess of a job, and get busy with converting everything to a proper init script.

Great.

Why existing solutions suck

Well, based on what we’ve just seen, you really only have two options:

  1. DIY; but if you’re good at what you do, you’re probably also lazy. You may do it the first couple times, but that’s not gonna scale, especially when dealing with the various flavors of init daemons (upstart, systemd…),
  2. Use that thing called “the Internet”; you read through forum pages, issue queues, gists and if you’re lucky you’ll find a perfect one (or more likely 10 sucky ones). Kudos to all those who shared their work, but you’ll probably be back to option 1.

We can do better than this

You’ll find a gazillion websites for pictures of kittens, but as far as I know, there is no authoritative source for init scripts. That’s just not right: we have to fix it. A few things I’m aiming for:

  • Scalable; allow for multiple instances of a service to be started at once from different config files (see the memcache/redis example),
  • Secure; ensure configtest is run before a restart/reload (because, you know, a faulty config file preventing the service from restarting is kind of a bummer),
  • Smart; ensuring for example that the cache is aggressively flushed before restarting your database (so that you don’t end up waiting 50 minutes for the DB to shut down cleanly).

I’ve just created a repo where I’ll be dumping various init scripts that will hopefully be helpful to others. I’d love to get suggestions or help.

And by the way, things are not much better with applications, though we’re trying our best to improve that too with tools like pm2 (fresh and shiny; more about it in a later post).

Goodbye node-forever

Goodbye node-forever, hello PM2

[PM2 logo]

It’s no secret that the devo.ps team has a crush on Javascript; node.js in the backend, AngularJS for our clients, there isn’t much of our stack that isn’t at least in part built with it. Our approach of building static clients and RESTful JSON APIs means that we run a lot of node.js, and I must admit that, despite all of its awesomeness, node.js is still a bit of a headache when it comes to running in production. Tooling and best practices (think monitoring, logging, error traces…) are still lacking when compared to some of the more established languages.

So far, we had been relying on the pretty nifty node-forever. Great tool, but a few things were missing:

  • Limited monitoring and logging abilities,
  • Poor support for process management configuration,
  • No support for clusterization,
  • Aging codebase (which meant frequent failures when upgrading Node).

This is what led us to write PM2 over the past couple of months. We thought we’d give you a quick look at it while we’re nearing a production-ready release.

So what’s in the box?

First things first, you can install it with npm:

npm install -g pm2

Let’s open things up with the usual comparison table:

Feature               Forever   PM2
Keep alive            ✓         ✓
Coffeescript          ✓         ✓
Log aggregation                 ✓
API                             ✓
Terminal monitoring             ✓
Clustering                      ✓
JSON configuration              ✓

And now let me geek a tad more about the main features…

Native clusterization

Node v0.6 introduced the cluster feature, allowing you to share a socket across multiple networked Node applications. Problem is, it doesn’t work out of the box and requires some tweaking to handle master and children processes.

PM2 handles this natively, without any extra code: PM2 itself will act as the master process and wrap your code into a special clustered process, as Node.js does, to add some global variables to your files.

To start a clustered app using all available CPUs, you just need to type something like this:

$ pm2 start app.js -i max

Then:

$ pm2 list

Which should display something like this (ASCII UI FTW):

[screenshot: pm2 list output]

As you can see, your app is now forked into multiple processes depending on the number of CPUs available.

Monitoring a la termcaps-HTOP

It’s nice enough to have an overview of the running processes and their status with the pm2 list command. But what about tracking their resource consumption? Fear not:

$ pm2 monit

You should get the CPU usage and memory consumption by process (and cluster).

[screenshot: pm2 monit output]

Disclaimer: node-usage doesn’t support MacOS for now (feel free to PR). It works just fine on Linux though.

Now, what about checking on our clusters and the GC cleaning of the memory stack? Let’s assume you already have an HTTP benchmark tool (if not, you should definitely check out wrk):

$ express bufallo     // Create an express app
$ cd bufallo
$ npm install
$ pm2 start app.js -i max
$ wrk -c 100 -d 100 http://localhost:3000/

In another terminal, launch the monitoring option:

$ pm2 monit

W00t!

Realtime log aggregation

Now you have to manage multiple clustered processes: one that’s crawling data, another that’s processing stuff, and so on and so forth. That means logs, lots of them. You can still handle it the old-fashioned way:

$ tail -f /path/to/log1 /path/to/log2 ...

But we’re nice, so we wrote the logs feature:

$ pm2 logs

[screenshot: pm2 logs output]

Resurrection

So things are nice and dandy, your processes are humming and you need to do a hard restart. What now? Well, first, dump things:

$ pm2 dump

From there, you should be able to resurrect things from file:

$ pm2 kill      // let's simulate a pm2 stop
$ pm2 resurrect // All my processes are now up and running

API health endpoint

Let’s say you want to monitor all the processes managed by PM2, as well as the status of the machine they run on (and maybe even build a nice Angular app to consume this API…):

$ pm2 web

Point your browser at http://localhost:9615, aaaaand… done!

And there’s more…

  • Full tests,
  • Generation of update-rc.d (pm2 startup), though still very alpha,
  • Development mode with auto restart on file change (pm2 dev), still very drafty too,
  • Log flushing,
  • Management of your applications fleet via JSON file,
  • Log uncaught exceptions in error logs,
  • Log of restart count and time,
  • Automated killing of processes exiting too fast.

What’s next?

Well first, you could show your love on Github (we love stars): https://github.com/Unitech/pm2.

We developed PM2 to offer an advanced and complete solution for Node process management. We’re looking forward to getting more people to help us get there: pull requests are more than welcome. A few things are already on the roadmap that we’ll get to once we have a stable core:

  • Remote administration/status checking,
  • Built-in inter-processes communication channel (message bus),
  • V8 GC memory leak detection,
  • Web interface,
  • Persistent storage for monitoring data,
  • Email notifications.

Special thanks to Makara Wang for concepts/tools and Alex Kocharin for advice and pull requests.

Ansible Simply Kicks Ass

The devo.ps team has been putting quite a few tools to the test over the years when it comes to managing infrastructures. We’ve developed some ourselves and have adopted others. While the choice to use one over another is not always as clear-cut as we’d like (I’d love to rant about monitoring but will leave that for a later post), we’ve definitely developed a bit of a crush on Ansible in the past 6 months. We went through years of using Puppet, then Chef and more recently Salt Stack, before Ansible gained unanimous adoption among our team.

What makes it awesome? Well, off the top of my head:

  • It’s agent-less and works by default in push mode (that last point is subjective, I know).
  • It’s easy to pick up (honestly, try and explain Chef or Puppet to a developer and see how long that takes you compared to Ansible).
  • It’s just Python. That makes it easier for people like me to contribute (Ruby is not necessarily that mainstream among ops) and also means minimal dependencies at install time (Python ships by default with most Linux distributions).
  • It’s picking up steam at an impressive pace (I believe we’re at 10 to 15 pull requests a day).
  • And it has all of the good stuff: idempotence, roles, playbooks, tasks, handlers, lookups, callback plugins…

Now, Ansible is still very much in its infancy and some technologies may not yet be supported. But there are a great deal of teams pushing hard on contributions, including us. In the past few weeks, for example, we’ve contributed both Digital Ocean and Linode modules. And we have a lot more coming, including some experimentations with Vagrant.

Now, an interesting aspect of Ansible, and one that makes it so simple, is that it comes with a tool belt by default. Understand that it is shipped with a range of modules that add support for well-known technologies: EC2, Rackspace, MySQL, PostgreSQL, rpm, apt, and so on. This now includes our Linode contribution, which means that with the latest version of Ansible you can spin up a new Linode box as easily as:

ansible all -m linode -a "name='my-linode-box' plan=1 datacenter=2 distribution=99 password='p@ssword' "

Doing this with Chef would probably mean chasing down a knife plugin for adding Linode support, and would require a full Chef stack (say hello to RabbitMQ, Solr, CouchDB and a gazillion smaller dependencies). Getting Ansible up and running is as easy as:

pip install ansible

Et voilà! You gotta appreciate the simple things in life. Especially the life of a sysadmin.