Balaji Vajjala's Blog

A DevOps Blog from Trenches

Simple Zookeeper cluster

Sometimes I need to run a ZooKeeper ensemble on my development box to test my application in a production-like environment. I found that recreating the whole ensemble from scratch is much faster than cleaning it up with the ZooKeeper CLI tool. To automate this process I created a bash script, which I want to share in this blog post. I hard-coded all the paths in the script using my usual conventions; you might need to change them to yours, which should be fairly straightforward.

Before you can use the script, you need to install ZooKeeper on your box. Here is what I did on my machine:

$ cd /opt
$ sudo mkdir zookeeper
$ sudo chown -R andrey:admin zookeeper
$ cd zookeeper
$ wget http://apache.mirror.rafal.ca/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
$ tar xf zookeeper-3.4.5.tar.gz
$ rm zookeeper-3.4.5.tar.gz
$ ln -s zookeeper-3.4.5 zookeeper

In the end you should have ZooKeeper installed in the /opt/zookeeper/zookeeper directory.

Now download, chmod, and run the script. It will create the following files:

/opt/zookeeper/zookeeper/cluster
├── server1
│   ├── conf
│   │   ├── log4j.properties
│   │   └── zoo.cfg
│   ├── data
│   │   └── myid
│   └── logs
├── server2
│   ├── conf
│   │   ├── log4j.properties
│   │   └── zoo.cfg
│   ├── data
│   │   └── myid
│   └── logs
├── server3
│   ├── conf
│   │   ├── log4j.properties
│   │   └── zoo.cfg
│   ├── data
│   │   └── myid
│   └── logs
├── start.sh
└── stop.sh
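
For reference, each server's zoo.cfg is just a standard single-host ensemble configuration; a minimal sketch of what goes into server1 looks something like this (the ports are assumptions, adjust to taste; server2 and server3 get clientPort 2182 and 2183 and myid 2 and 3):

CLUSTER=/opt/zookeeper/zookeeper/cluster
cat > $CLUSTER/server1/conf/zoo.cfg <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$CLUSTER/server1/data
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
EOF
echo 1 > $CLUSTER/server1/data/myid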

This is the minimal configuration for a 3-node ensemble (cluster), three nodes being the recommended minimum for production. To start the cluster, run the following commands:

$ cd /opt/zookeeper/zookeeper
$ cluster/start.sh
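
For the curious, start.sh is little more than a loop launching zkServer.sh once per server. A rough sketch, relying on the ZOOCFGDIR and ZOO_LOG_DIR environment variables that zkEnv.sh honors in the 3.4 line (check your version):

#!/bin/bash
# start.sh (sketch): bring up all three servers of the ensemble
cd /opt/zookeeper/zookeeper
for i in 1 2 3; do
  ZOOCFGDIR=$PWD/cluster/server$i/conf ZOO_LOG_DIR=$PWD/cluster/server$i/logs \
    bin/zkServer.sh start
done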

Check the log files to see whether the cluster started successfully:

$ tail -f cluster/server{1,2,3}/logs/zookeeper.out

When the cluster is up and running, you can test your application. After you are done, shut down the cluster and verify that no ZooKeeper processes are left running:

$ cluster/stop.sh
$ ps -ef | grep java

To recreate a clean cluster, just run the script again:

$ ./zookeeper-init-ensemble.sh

Git Productivity enhancements

If I haven’t told you yet, Git is awesome, and yes, I do most of my work from the command line, so in order to make my life easier I did two things:

apt-get install bash-completion
wget https://raw.github.com/git/git/master/contrib/completion/git-completion.bash -O /etc/bash_completion.d/git

Upon your next login (or right away, if you execute source /etc/bash_completion.d/git) you will have all the bash completion you need for Git at your fingertips.

Another awesome script that makes your life easier with Git is git-prompt.sh, which you can also include in your bash profile like so:

wget https://raw.github.com/git/git/master/contrib/completion/git-prompt.sh

Then add a line to ~/.profile that sources it in your login shell; see the header of git-prompt.sh for more details.
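
A minimal example of what that line (plus a prompt tweak) can look like; the script location and the GIT_PS1_* options you enable are up to you:

# ~/.profile (sketch): show the current branch in the prompt
source ~/git-prompt.sh                  # wherever you saved the downloaded script
export GIT_PS1_SHOWDIRTYSTATE=1         # mark unstaged (*) and staged (+) changes
export PS1='\u@\h:\w$(__git_ps1 " (%s)")\$ '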

How to run apt-get update before Puppet

One of the problems I keep running into with Puppet is that the packages I’m trying to install are very new, and I need to run apt-get update to refresh the repositories first. Because Puppet is not a script runner, it gets rather annoying to get the update to run before you install any packages.

I found a lot of solutions around the web but this one seems to work best for me.
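
If you just need a blunt stopgap, a thin wrapper that refreshes the package index before every Puppet run also does the job; a quick sketch (the manifest path is just an example):

#!/bin/bash
# run-puppet.sh (sketch): make sure the apt cache is fresh before applying the manifest
sudo apt-get update -qq
sudo puppet apply /etc/puppet/manifests/site.pp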

Ruby gems are still not safe to use

In light of the recent Rubygems.org security compromise, the community has been looking at ways to make Rubygems.org and Ruby gems in general more secure. The project is still ongoing, and feel free to help out in #rubygems on Freenode, but here is a highlight of what I think are some of the main issues.

Some of the issues highlighted here are taken from Ben Smith’s enlightening (but scary) talk at Aloha Ruby Conference.

Disclaimer

I am not a security expert. I am just a Ruby developer and gem author who is worried about the current state of the Ruby gems ecosystem. I am also worried that the next negative news around Ruby will involve the problems described below.

What are Ruby gems and what is Rubygems.org?

For those not familiar, Rubygems.org is the most popular repository of “gems” for the Ruby language. Gems are libraries made up of Ruby (and optionally C) code and can be uploaded by anyone who registers for an account. Rubygems.org currently hosts 50,685 gems, which have been downloaded 1,259,533,358 times since July 2009. Ruby gems are not hosted only on Rubygems.org; anyone can run their own repository, but Rubygems.org is by far the most widely used.

Current state

Some parts of the current infrastructure are worrying.

  • Hard to tell if gems were changed on the repo. It took the Rubygems.org volunteers more than 24 hours to verify every gem’s checksum against external mirrors.

  • Impossible to tell if gems were uploaded by the gem owner. It is currently very hard to know whether a gem was actually uploaded by its owner. Developer machines can be compromised, and a user’s API credentials for Rubygems.org are kept unencrypted in ~/.gem/credentials.

  • Gem owner isn’t notified of new gem uploads. When a gem developer’s credentials are compromised, a new version of the gem can be uploaded without the gem owner ever knowing.

  • Impossible to notify a gem user of compromised gems. When a gem developer’s credentials are compromised, it is hard to notify anyone who uses any of the gems published by that developer about the situation.

  • Gems can run code on install. This is probably the most interesting attack vector for the foreseeable future. It seems this capability was largely unintentional, as it ties into the fact that Ruby gems can contain C code that has to be compiled at install time. Running code on install means that a gem can steal the unencrypted Rubygems.org credentials, which can then be used to modify the compromised user’s gems and spread the malicious code further.

Proposals for change

  • Notify gem owners of newly published gems. Adding a simple email notification to the gem owner would at least allow for easier detection of compromised gems. Sadly, at that point the gem is already compromised and possibly already spread over mirrors and downloaded by users.

  • Secure developers’ Rubygems.org credentials. This is pretty simple: my SSH key has a passphrase on it, and so should my Rubygems.org credentials. Stealing a Rubygems.org API key is easy; using one that requires a passphrase is a lot harder.

  • Stop running code on gem install. I totally see the need for having C extensions in Ruby gems, and those extensions need to be compiled, but we seriously need to find a way to compile C code without allowing arbitrary code execution when a gem is installed.

  • Automatically mirror gems and checksums. A system involving the automatic mirroring of gems and their checksums to other servers would definitely have made the verification of gems a lot easier in the last few days.

  • Force signing of gems. Yes, you can sign your gems, but almost nobody does (and neither do I). Additionally, it’s a pain to force the use of signed keys on the gem user’s side, not to mention futile as long as most gems aren’t signed. Signing is the way to go, though, and work on this has started. It’s a difficult topic, and work is being done to make it as painless as possible for users and developers.

  • Notify gem users of unsigned/insecure gems. The gem command-line tool (together with tools like Bundler) should be updated to allow verification of signatures, which will let it notify gem users of unsigned or compromised gems.

How can I help?

  • Code: rubygems, rubygems-trust (a fork for implementing a signed approach)
  • Discussion: #rubygems and #rubygems-trust on Freenode

Did I miss anything?

Please let me know and I’ll add it to the list.

Installing Chef Server - On CentOS 5.8

Following Fuse day (#6), and given the very poor documentation and the number of bugs found in the Chef Solo cookbooks for the Chef OSS server, I put together a set of scripts which attempt to clear away all the clutter around installing a Chef OSS server.

There is a Git repository on GitHub which will install Chef Server on CentOS 5.8 & 6, and I will be adding support for Ubuntu in the near future (it’s in the works). There is no magic here, just a fair amount of trial and error which led me to automate it; it simply took too much time to do manually over and over again.

During my attempt I was planning on using Chef Solo to do the work based on this wiki page, but there were so many bugs in it that it led me to use the rble repository instead.

Some notes on Puppet

I’ve been playing with Puppet recently in order to finally teach myself a bit about server management. I decided on Puppet because… well… I didn’t have time to play with Chef yet.

I can’t show any of my code because it contains some stuff I’d rather not open up, but here are some of my global notes on Puppet that I wanted to share.

The good

  • It did the job. I now have a few scripts that I can use to quickly set up a server for Rails, including NGINX, PostgreSQL, Unicorn, Monit, and much more.
  • Quick deployment. I can now deploy a new Rails app to a server within minutes!
  • The Learning Puppet series is a good starting point and explains most of the basics
  • Low tech. Running a puppet script really doesn’t involve much more than running: puppet apply path/to/puppet/file.pp

The bad

  • No single-server deployment solution. There doesn’t seem to be a best practice for using Puppet with just one server. I know that the serious people have to manage many, many servers, but I think Puppet could be made more accessible to newcomers with a good, solid solution for running it on a single box. Many of us learn new things by trying them out on our own hobby projects before using them in big-business contexts. I have resorted to using Capistrano for deployment, but it just feels wrong somehow.
  • Not many great modules. Puppet has a module system which allows anyone to package their solutions and share them with the community. Sadly, most of the modules are old, unmaintained, and often broken. Additionally, the modules often don’t solve the problems in a way that I’d like them to, forcing me to write my own.
  • Convoluted language. Puppet requires Ruby to run, but the DSL is not Ruby, nor is it JavaScript, JSON, YAML, or anything else that so many developers already know. The architecture for defining classes, types, and modules is convoluted, backwards, and feels very awkward. I think one of the reasons there aren’t many well-written modules is very much related to this.
  • Compiling from source. One of the biggest missing features seems to be some good architecture for installing anything that isn’t packaged up. I often want to run a different Ruby, Nginx, Apache, PHP version than is in the package repositories. I know this is a hard problem, but again I wish there was some kind of best practice.
  • Ordering from hell. Puppet does not run your actions in the order specified in your .pp file. Instead, you tell it when something has a requirement. In my experience almost everything has a requirement, and specifying the ordering becomes a nightmare and a real frustration.
  • Missing features. There are a few features that are still missing. One of the most important ones (in my eyes) is the ability to create a folder recursively (e.g. mkdir -p path/with/multiple/folders). Instead you are forced to create every layer as a new statement.

Conclusion

Puppet will do for now, but I wish it were a bit more opinionated about how it should be used. The language is not pretty and very verbose, and the lack of best practices for single-server deployment is a real omission.

Does anyone know how Chef performs in these regards?

RavenDB NuGet Review

Ayende has been doing a series of posts about how simple and fast RavenDB is, using NuGet as an example and pointing out some complex, poorly performing queries that have been bogging down nuget.org recently.

Series 1

  1. NuGet Perf Problems, part I
  2. Nuget Perf Problem, Part II-Importing To RavenDB
  3. NuGet Perf, Part III-Displaying the Packages page
  4. NuGet Perf, Part IV-Modeling the packages
  5. NugGet Perf, Part V-Searching Packages
  6. NuGet Perf, Part VI AKA how to be the most popular dev around
  7. NuGet Perf, Part VII AKA getting results is only half the work
  8. NuGet Perf, Part VIII: Correcting a mistake and doing aggregations

Series 2

  1. NuGet Perf, The Final Part – Load Testing – Setup
  2. NuGet Perf, The Final Part – Load Testing – The Tests
  3. NuGet Perf, The Final Part – Load Testing – The Results
  4. NuGet Perf, The Final Part – Load Testing – Results ^ 2
  5. NuGet Perf, The Final Part – Load Testing – Source Code

This series piqued my interest given that I’ve been working on Lucene.Net.Linq and integrating it with NuGet.Server. I haven’t worked directly with RavenDB, but I have used products, such as Octopus Deploy, that are built on it. It seems like a pretty cool product and I don’t have anything against it. However, there are some problems with the blog series that might mislead the casual reader about what has actually been accomplished, especially the declarations of victory and some misleading performance numbers.

I know the blog series is only meant to show off RavenDB and give potential users an idea of what their code might look like, but nonetheless some hefty claims are made about performance in comparison to the Entity Framework / SQL Server solution in use at nuget.org. Let’s take a look.

Pick a Schema, Any Schema Will Do

Of course if you’re going to model data in a new persistence layer, it makes perfect sense to simplify or improve the way your data is stored and indexed for later retrieval. Ayende points out some problems with the dense, non-semantic way some fields are stored in nuget, such as dependency versions and tags. He uses these as examples that illustrate how RavenDB can handle nested collections of strings and complex objects.

That’s great and all, but if we’re redesigning the schema, we’ve already changed the rules. Nuget.org is having some trouble specifically because of a poorly designed schema that requires too many joins and some other complex hoop-jumping to execute some frequently used queries.

Furthermore, regardless of how the data is stored behind the server, NuGet uses an OData API to expose its packages to clients. If you change the schema, you have to do it in a way that remains compatible with that client API, or the millions of installed instances suddenly stop working.

Semantic Version Sort vs. Lexical Sort

One of the primary ways that packages are sorted in NuGet is by version. Unfortunately, the complex nature of semantic versions, such as 0.9-alpha or 1.0.5, makes it impossible to sort them correctly using a basic lexical sort. For example, 1.0-alpha should get sorted before 1.0, and 1.2 should get sorted before 1.10. With lexical sort they do not.
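
A quick shell experiment makes the point; semantically the order should be 1.0-alpha, 1.0, 1.2, 1.10, but a lexical sort gives:

$ printf '1.0\n1.0-alpha\n1.2\n1.10\n' | LC_ALL=C sort
1.0
1.0-alpha
1.10
1.2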

This problem was pointed out in a comment on Part IV but left unresolved.

Review

In using NuGet as an example for RavenDB, the following features are not addressed:

  1. Providing a backwards-compatible OData endpoint
  2. Sorting correctly by package version
  3. Storing and retrieving package contents (only package metadata is addressed)
  4. Adding new packages into the index
  5. Keeping track of download counters (per-package and total)

Without addressing adding new packages and tracking downloads, the data store is effectively read-only, meaning that caches never get invalidated and there are no writes to disk. Based on these criteria, is it really fair to compare performance benchmarks between this sample and a real, production system?

The Load Tests

In the next part of the series, the sample system is put under some load to see what kind of performance RavenDB can provide when there are concurrent users. The Tests post goes into some detail about the load-testing plan.

Reviewing the load test, we have another obvious problem: only a handful of sample queries are being used. When the number of search queries is low, say, under 10, the system under load can simply cache the results from the previous time the query was executed and return that to the user. Since there isn’t any variance in which columns are used to sort, what page is retrieved, etc., the system under load doesn’t have to think very hard at all.

Combine the simplicity of the test plan with the fact that the system under test is not doing any writes at all, and you might as well be slinging static html at that point. I guess it does validate, if nothing else, that the caching built into RavenDB works.

One indicator that the load test isn’t really hitting any critical thresholds is that Pages/Sec grows pretty linearly as User Load ramps up. If the system hit such a threshold, one would expect Pages/Sec to peak at a certain rate and (ideally) sustain that rate as concurrency continues to grow.

Summary

To reiterate, I’m not trying to bash RavenDB, which I’m sure is an awesome product and probably really would perform very effectively under load similar to that experienced by nuget.org. However, casual observers may come away with unrealistic expectations after reading the performance results offered up on ayende.com.

Why Do I Care

If Ayende had picked virtually any example other than NuGet packages to do a blog series on, I probably would have read along and not thought much about any of it. But having worked on NuGet.Server + Lucene.Net.Linq for several months earlier this year, the subject matter is near and dear to my heart.

In future posts I’ll do some more compare and contrast of RavenDB vs. Lucene.Net.Linq, and maybe run a load test of my own against our custom NuGet.Server builds.

How Angie’s List Uses Octopus Deploy

I’ve mentioned in a few of my posts that my team has been using Octopus Deploy. Now I’ll describe in more detail some of the ways we’ve integrated with Octopus that may help others.

The Before Times

Octopus started a beta phase in late 2011, so what were we doing before? Since I started working in an asp.net shop in 2006, I’ve been surprised by the lack of robust deployment tools for the ecosystem.

The Web Deployment Projects system provided by Microsoft has always seemed like a non-starter for me. When you have mature processes like version control, continuous integration, automated testing, and quality assurance, how does it make sense to hand the keys over to a developer running Visual Studio to click “Deploy” on their desktop? What guarantee do you have that the code has been checked in, that it builds, that the tests pass, that QA signed off?

Even if you only give trusted users permission to deploy, I’ve never understood how it makes sense that you would need Visual Studio to deploy your projects. I mean, Visual Studio is for development. It isn’t even needed by a build server to compile your code. But now QA (or whoever is allowed to perform deployments) needs to have Visual Studio installed somewhere and check out the code and build it themselves? So much for automation. So much for predictability. So much for repeatability.

So it was with a lack of viable alternatives that I started writing a deployment tool in 2007. The tool was named Bazooka (because it “shoots” software onto servers), and we used it for 5 years. We never released it as open source because it made several assumptions particular to us and we didn’t take the time to clean it up.

Bazooka was a web application that watched a directory for release candidates to show up as they were prepared by our build server. Each release candidate contained a deployment descriptor that had some meta data and a list of components that could be deployed to servers that matched certain roles like WebServer or AppServer.

Bazooka used xml configuration for everything. The deployment descriptor was xml, the server and environment configuration was xml, the permissions were xml, and the log files that recorded what had been deployed where were also xml. This made Bazooka pretty slow as deployment logs piled up, but it beat the heck out of doing anything more heavy-weight with sql.

Bazooka was, in reality, just a web interface to MSBuild. When you went through the deployment wizard, Bazooka simply executed MSBuild in the directory specified by each component with some properties saying which server to deploy to, which target to execute, what the environment was, etc.

We could have easily kept improving Bazooka, but we had actual code to write for our business, so Bazooka had some weaknesses that we never really addressed. For example, managing servers and environments was done by hand-editing the xml files. We didn’t build a UI for that. Also, it was pretty tough to answer simple questions like, “What was the most recent deployment of my project to the staging environment?”

But it worked as a fine stand-in for 5 years until someone built something better.

Octopus Trial

It was fun playing with Octopus during the beta period and seeing another approach to a web based deployment tool. In the beginning there were plenty of things Bazooka had that Octopus didn’t (yet), but development has been pretty rapid with Octopus delivering frequent releases.

The one time we questioned whether using Octopus was the right choice was when Paul Stovell, the creator/developer of Octopus Deploy, took a few months to rewrite the persistence layer, converting it from Entity Framework to RavenDB. It was the right decision to make and we’re happy in the long run, but we found ourselves in the lurch waiting for some important features and bugfixes. We came out fine on the other side though.

We also found that using a local file-based NuGet feed with Octopus doesn’t scale very well, and switching to NuGet Server provided no benefit either. This inspired us to create a custom fork of NuGet Server that uses Lucene.Net and Lucene.Net.Linq to provide a scalable, lightning fast internal feed for Octopus.

The Switch

As we integrated some pilot projects with Octopus, we slowly stopped using Bazooka and eventually turned off Bazooka integration. Some quick stats of our Octopus configuration today:

  • 48 projects
  • 18,080 release candidates
  • 1,405 deployments

Octopus has done a decent job managing our high demands. We have experienced some slow page loads here and there, but Paul has been very responsive about troubleshooting and optimizing these.

Reusing Deployment Scripts

Since we already had a highly automated deployment system, we wanted to preserve our existing capabilities while finding ways to improve the system.

One problem that Octopus doesn’t solve for you is how to share deployment scripts across projects. By default, Octopus will execute deployment scripts contained in each project, but there isn’t a built-in or standard way to reuse common functionality.

One of the first projects we set up in Octopus was one we named OctopusScripts. This project consists of a collection of PowerShell modules that we want to be available everywhere. When we deploy the project, the deployment scripts install the modules into a standard location where PowerShell will probe for them. Then, from other projects, we can simply start a script with:

Import-Module SmokeTest

Reuse Moar

Moving most of our scripts into PowerShell modules was working great, but we started to notice that our Deploy.ps1 scripts still looked awfully repetitious.

All of our web projects follow the same basic deployment recipe:

  1. PreDeploy
    1. Disable machine in load balancer
    2. Create IIS site and app pool definitions if missing
    3. Update site host-header bindings as needed
  2. Let Octopus update the document root to point to the new application
  3. PostDeploy
    1. Execute smoke tests against the server and abort if any URLs return a non-200 response
    2. Enable machine in load balancer

Of course we have different configurations and different URLs to use for smoke tests in each project. We ended up creating a configuration-based, modular script collection so that each project simply needs to include a stub:

Import-Module Fool-Octopus
Invoke-OctopusDeploymentTasks

The Invoke-OctopusDeploymentTasks function looks at whatever variables and configuration are present and figures out which steps to execute. The same scripts are used for Windows Service type projects and others too; based on conventions, they figure out whether they need to run web steps, create services, etc.

If the stub is missing from a project (because why duplicate the stub?), our build scripts automatically insert stub Deploy.ps1, PreDeploy.ps1, and PostDeploy.ps1 files.

We think we’re about as DRY as we can get with regards to our deployment scripts.

Ad-hoc PowerShell

One thing we’d like to see that hasn’t made it into Octopus yet is the concept of ad-hoc PowerShell scripts. Basically, we want to be able to run an arbitrary script during a deployment, only once. It doesn’t need to run once on each machine being deployed to, just once and then done. There’s a story card on Paul’s Trello board that we’re looking forward to. In the meantime, we’ve been emulating this behavior by deploying a small package to a dummy server and letting the script run there.

We mostly want this feature to simplify tasks such as sending email/other notifications when deployments are beginning or completed. It might also be useful for a green/blue style deployment model where the load balancer needs to toggle just once after the servers have been updated.

Without the Ad-hoc feature, one of the stumbling points we run into is sometimes forgetting to check the “Force redeployment” checkbox that Octopus leaves unchecked. When we forget, some steps get skipped leading to confusing results.

Looking Ahead

Because of the level of automation we integrated with Octopus, our business is able to deploy software more frequently and more reliably than ever before. In the coming months and years, we look forward to seeing improvements in Octopus features that will help us with cloud deployments to AWS or Azure. Octopus has definitely filled a gap in our deployment capabilities, allowing us to deliver value to our business quickly, iteratively and predictably.

How Angie’s List Uses NuGet (Part 2)

Last time I talked about how my development team progressed from having all of our .net code in a single repository with a single solution to using a more modular architecture complete with encapsulated domains.

When we started using this approach, we were still limited in a few ways:

  • Everyone needs to integrate with the newest code
  • Difficult to patch an old version of a dependency
  • Cascading failures on the build server

Even though we broke up the ProjectReference rat’s nest, we still had an implicit dependency on various shared code. It all had to be checked out and built in the right order.

Binary Package Management

The next logical step was to further decouple our shared code by packaging it up and publishing those packages. If we could do that, we could decide when to upgrade dependencies on a product by product basis.

There are two package managers in the .net ecosystem: OpenWrap and NuGet.

When we started shopping around, OpenWrap had been around longer and seemed to be the better choice. There’s a comparison of the products on Stack Overflow.

We worked with OpenWrap for over 6 months and during that time started to find some problems around integration with Visual Studio and ReSharper. OpenWrap wants to manage dependencies per solution, and we have many cases where we want to control dependencies at a per project level. We also started to notice that NuGet was getting new versions released on a fairly regular schedule, while OpenWrap 2.0 was in unstable beta limbo for over a year.

Around the same time, we started playing with Octopus Deploy for deploying our code. Since Octopus uses NuGet packages for deployment, we figured it would make sense to standardize on one package management system for both deployments and dependency management. It’s true that those are separate problem spaces, but having fewer build scripts is always a good thing.

Thoughts on NuGet

Conventions

NuGet has several conventions that make it easy to create simple packages that others can reference. You can share assemblies and content easily, and when you want to customize anything there are some powershell extension points you can hook into.

One problem we run into is that when building packages, sometimes there’s a NuGet convention we want to customize or suppress, and often we can’t.

For example, if you create a nuspec and place it adjacent to a csproj file, NuGet will look at the project and automatically inject metadata and content into the package. For some things, you can override this behavior with explicit specifications in the nuspec, but the behavior can be surprising and confusing.

Dependency Scoping

NuGet supports the concept of transitive dependencies… sort of. If you install package A, and A depends on package B, NuGet will go find a version of B and install it while installing A. However, NuGet doesn’t do any record keeping to remember that B is a transitive dependency. To your project, A and B appear simply as direct dependencies.

There may be cases where A depends on B at runtime, but consumers of A shouldn’t need to code against B at design time.

There may be other cases where B is an optional dependency for A, and A can be used without it.

Since NuGet doesn’t have a concept of scope, it only has one simplistic approach to dealing with transitive dependencies: treat them just like direct dependencies.

Upgrade Behavior

When you ask NuGet to update a specific package, it will first look for updates to the transitive dependencies that the package depends on. This may seem obvious or desirable to some, but personally I find it confusing. You can control this behavior with the -IgnoreDependencies flag in the Package Manager Console, but oddly you don’t get that option in the command-line nuget.exe or the Visual Studio GUI package manager.

Package Feed Performance

We use continuous integration, and every successful build produces “release candidate” versions of packages. We generate 50 to 100 packages a day.

Using the simple NuGet UNC share quickly failed to scale, so next we tried NuGet.Server and found that it doesn’t perform well either.

NuGet Gallery seemed like overkill with its SQL Server requirement, so I started optimizing NuGet.Server. This project ended up taking quite a while, but the good news is that the fruits of the labor are now open source on GitHub at https://github.com/themotleyfool/NuGet.

For more information about that project, see my previous post.

Refactoring Applications and Shared Code

We try to use Semantic Versioning to communicate breaking changes in the packages we publish, so sometimes when we want to use a refactoring tool like Change Method Signature or Use Base Class it would be nice to have application and shared code loaded into a single instance of Visual Studio.

We created a tool called SlimJim that generates these Solution files on the fly.

If you create a Solution with application code and shared library code, ReSharper will be smart enough to apply refactoring tools across the projects even though ProjectReference style references are not being used.

However, Visual Studio won’t know the correct order to build projects in, and won’t automatically copy outputs from shared libraries over to applications.

We extended SlimJim to convert assembly references to project references and back to address this limitation.

Conclusion

In terms of capability and maturity, we’re in a much better place than we were a few years ago. However, we still have a ways to go in terms of productivity and workflow.

NuGet has helped us move in the right direction and we hope to see further enhancements and even contribute some more of our own as we develop them.

How Angie’s List uses NuGet (part 1)

This post describes how we came to using binary package management. In the next part I’ll get into NuGet.

In the beginning, there was one repository and it held all the projects for The Motley Fool, and it was good. There were around a dozen asp.net web projects, a smattering of service and console apps, and a bunch of class libraries to hold shared code. There was one Solution (sln) to rule them all.

As time went on, we found that there are downsides to the one-giant-solution approach to .net development:

  • Big Ball of Mud
  • Slow builds
  • Tight coupling
  • Configuration hell
  • Hard to release different applications on different schedules

Typically our larger applications would be split into several projects following a typical N-tier layered architecture:

  • Web
  • Service
  • Domain
  • Data Access

Despite our attempts to encapsulate data access and domain logic behind the service project, code ended up leaking out to the point where domain projects were using types and methods from unrelated domain projects. Cats and dogs were sleeping together.

Around this time Steven Bohlen presented a talk to the Washington DC Alt.NET User Group titled “Domain Driven Design Implementation Patterns in .NET”. While some of us were already familiar with concepts of DDD, this talk lit a spark for us to try fixing our big ball of mud.

In late 2010 we started to make some changes. Instead of having one giant repository, shared code would be split out into separate repositories. We also took this opportunity to introduce a new project organization and architecture.

We established one repository to hold utility code, broken into specific class libraries:

  • Fool.Abstractions – similar in spirit to System.Web.Abstractions; adds interfaces and wrappers to various FCL types that lack them
  • Fool.Lang – similar in spirit to Jakarta Commons Lang; adds general utility classes and methods not found elsewhere
  • Other projects that extend 3rd party class libraries to make them easier for us to work with in standardized ways.

Then we established another repository to hold Domain Driven, er, Domains. For example, many of our applications and web sites deal with stock market data, so one of our business domains is Quotes. In the Quotes Domain we have these projects:

  • Fool.Quotes – contains service interfaces and value types; serves as an API to the domain
  • Fool.Quotes.Core – contains domain logic, models, and entities; serves as a private implementation
  • Fool.Quotes.Web.Api – exposes Fool.Quotes interfaces over a RESTful web API

The key to keeping our domains distinct and decoupled is to keep Core projects private. While Core is required at runtime, it should never be referenced at compile time. To bridge the gap, we use Dependency Injection to provide concrete implementations.

Domains may depend on other domains provided that they consume each other through the API project. That way entities and business logic are kept focused on their own concerns and don’t leak out to other problem areas where they don’t fit.

Gluing It Together

Having projects split into different repositories and different solutions meant that we couldn’t simply have one mega Solution that includes everything. That’s by design, so good on us. But this introduces a problem: we still need to reference code from our utility projects and DDD projects in our applications. The first solution we came up with was to use the AssemblyFolders registry key to have our libraries appear in the Add Reference dialog. Then, to solve the runtime dependency on our private Core assemblies, we install those into the GAC so they can be loaded using reflection by our IoC container.

This approach worked fine, mostly. But we encountered some downsides after using it for a while:

  • Need to have all library code checked out and built on each development machine
  • No built-in way to manage different versions of the same dependency
  • GAC considered harmful
  • Hard to debug build errors and runtime errors

Using Continuous Integration means we’re producing new builds dozens of times a day, so it isn’t practical for us to manage different assembly versions for each build. Like most shops, we leave our assembly versions at 1.0.0.0 despite injecting actual version information into the AssemblyInformationalVersion attribute.

In order to support parallel development, we needed to find a more flexible way of managing dependencies, and at this point we started to look at binary package management.