20 Minute Dev Cycles

Here at Nordstrom, I develop and operate a web service delivering product information for 16 million items.

Our service is horizontally scalable and able to support millions of requests daily from each of our 280+ stores.

Yesterday, a user of our service needed a change to our API for a demo they were giving in 20 minutes. Specifically, they needed our API to respond to EAN-13 codes as well as UPC-A codes. A 12-digit UPC-A code can be transformed into an EAN-13 code by simply prefixing it with a zero. So, they wanted our API to return valid product information whether they submitted a UPC-A number or an EAN-13 number.
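To make the transformation concrete, here is a minimal sketch in Go. It is not the service's actual code: the function names toEAN13 and validCheckDigit are made up for illustration. It assumes the standard EAN-13 check digit rule (weights 1 and 3 alternating from the left, total divisible by 10), under which a UPC-A code keeps its check digit when a zero is prefixed.

```go
package main

import (
	"errors"
	"fmt"
)

// toEAN13 returns the 13-digit form of a UPC-A or EAN-13 code.
// (Hypothetical helper name, for illustration only.)
func toEAN13(code string) (string, error) {
	for _, r := range code {
		if r < '0' || r > '9' {
			return "", errors.New("code must contain only digits")
		}
	}
	switch len(code) {
	case 12: // UPC-A: prefix a zero
		return "0" + code, nil
	case 13: // already EAN-13
		return code, nil
	default:
		return "", fmt.Errorf("expected 12 or 13 digits, got %d", len(code))
	}
}

// validCheckDigit reports whether a 13-digit code satisfies the EAN-13
// check digit rule: digits weighted 1,3,1,3,... from the left must sum
// to a multiple of 10.
func validCheckDigit(ean13 string) bool {
	if len(ean13) != 13 {
		return false
	}
	sum := 0
	for i := 0; i < len(ean13); i++ {
		c := ean13[i]
		if c < '0' || c > '9' {
			return false
		}
		weight := 1
		if i%2 == 1 {
			weight = 3
		}
		sum += weight * int(c-'0')
	}
	return sum%10 == 0
}

func main() {
	for _, code := range []string{"036000291452", "4006381333931", "12345"} {
		ean, err := toEAN13(code)
		if err != nil {
			fmt.Println(code, "->", err)
			continue
		}
		fmt.Println(code, "->", ean, "check digit ok:", validCheckDigit(ean))
	}
}
```

Storing and looking up every code in its 13-digit form means the rest of the service only ever has to deal with one format.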

Here’s how we coded the change to our service and canary deployed it to our 5-server cluster with a few minutes to spare.

Coding the Change

The first 8 minutes were spent coding the requested feature. My coworker and I (not quite a pizza-box team) are fans of microservice architecture and keep our service code as small and single-minded as possible. This means we can make any changes we’d like to the service as long as the existing internal and external contracts are kept. It also means fewer mistakes are made, as each service is simple and easy to reason about. This service, for example, is a mere 218 lines of Go code. The function implementing our endpoint is in fact 15 lines long. We needed to add 5 more (well-spaced) lines to implement the feature.
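The real endpoint code is not shown here, but a change of this size might look roughly like the sketch below. Everything in it is an assumption made for illustration: the product type, the in-memory catalog, the /products/ path, and the function names findProduct and handleProduct are all hypothetical. The point is only that accepting both formats can come down to a few lines in front of an existing lookup.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// product is a hypothetical stand-in for the real product record.
type product struct {
	GTIN string `json:"gtin"`
	Name string `json:"name"`
}

// catalog is a hypothetical in-memory store keyed by 13-digit code.
var catalog = map[string]product{
	"0036000291452": {GTIN: "0036000291452", Name: "example item"},
}

// findProduct looks up a product by UPC-A or EAN-13 code.
func findProduct(code string) (product, bool) {
	// The added feature: treat a 12-digit UPC-A code as its EAN-13 form.
	if len(code) == 12 {
		code = "0" + code
	}
	p, ok := catalog[code]
	return p, ok
}

// handleProduct serves GET /products/{code} (path is an assumption).
func handleProduct(w http.ResponseWriter, r *http.Request) {
	code := strings.TrimPrefix(r.URL.Path, "/products/")
	p, ok := findProduct(code)
	if !ok {
		http.NotFound(w, r)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(p)
}

func main() {
	// Exercise the handler in-process, without opening a network port.
	for _, code := range []string{"036000291452", "0036000291452", "999"} {
		rec := httptest.NewRecorder()
		req := httptest.NewRequest(http.MethodGet, "/products/"+code, nil)
		handleProduct(rec, req)
		fmt.Println(code, "->", rec.Code)
	}
}
```

Because existing 13-digit requests take the same path as before, the external contract stays intact for current callers.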

Compiling, Integrating and Packaging

We spent less than one minute compiling and packaging a new artifact for deployment. Why did it take so little time? Because we use Docker (introduced in 2013) and Make (introduced in 1977). With a single command, make release, our new code is cross-compiled for the target architecture (linux/amd64), added to a new Docker image and pushed to our Docker repository.

executable=ic-api-gtin
account=nordstrom
tag=ic-api-gtin

build/$(executable): *.go
	mkdir -p build
	go build -o build/$(executable)

build/container: dist/$(executable)
	docker build --no-cache -t $(executable) .
	mkdir -p build
	touch build/container

dist/$(executable): *.go
	mkdir -p dist
	GOOS=linux GOARCH=amd64 go build -o dist/$(executable)

.PHONY: release
release: build/container
	docker tag -f $(tag) $(account)/$(tag)
	docker push $(account)/$(tag)

The entire Docker configuration is solely concerned with pointing to the compiled binary for the service. It simply adds the binary to one of our maintained Ubuntu base images and sets it as the Docker image’s entry point.

FROM nordstrom/baseimage-ubuntu:14.04
MAINTAINER Paul Payne "paul@payne.io"

ADD dist/ic-api-gtin /bin/ic-api-gtin
ENTRYPOINT ["/bin/ic-api-gtin"]

Go compiles lightning-fast. Docker builds lightning-fast. Make keeps us from wasting time remembering commands and fat-fingering the deploy (especially in high-pressure situations). The only real time needed for this step is to wait for the (automated) copying of our Docker image up to our Docker repo.

Deploy into Production

We then deploy to production.

Wait… what about the Testing Environment™? What about the Integration Environment™? What about Jenkins? Well, Go’s tests run in sub-second time, so we just run gofmt, go test, and go test -cover automatically every time we save our file. More importantly, though, a microservice approach means our changes are isolated from the rest of the system. Our code’s effects on the total system are well understood. Our code can be deployed independently. Static compilation means we are confident the program has everything it needs. Wrapping our service in a Docker image means the environment is completely controlled. The lack of mystery of the whole process may leave some people a bit bored.

The actual deploy to our production cluster takes the most time, but only because we canary deploy over a few minutes, which gives us the ability to roll back if needed. We use Fleet on CoreOS, so deploying a service to one of our cluster machines is as simple as restarting the service.

fleetctl stop ic-api-gtin@1.service && fleetctl start ic-api-gtin@1.service

When starting, our service automatically pulls down the latest Docker image and starts it up.

ic-api-gtin@.service

[Unit]
Description=ic-api-gtin daemon

[Service]
Type=simple
Restart=always
RestartSec=30
PermissionsStartOnly=true
ReadOnlyDirectories=/etc
EnvironmentFile=/etc/environment
Environment="RELEASE=latest"

TimeoutStartSec=5m
ExecStartPre=-/usr/bin/docker kill ic-api-gtin-%i
ExecStartPre=-/usr/bin/docker rm ic-api-gtin-%i
ExecStartPre=/usr/bin/docker pull nordstrom/ic-api-gtin:${RELEASE}

ExecStart=/usr/bin/docker run \
  --name ic-api-gtin-%i \
  --hostname ic-api-gtin-%i.cluster.local \
  --publish 19111:80 \
    nordstrom/ic-api-gtin:${RELEASE}

Since we started this service on one of five nodes behind a round-robin load balancer, a fraction (20%) of our users will start receiving responses from our new service immediately. We can also send a request to a specific node to make sure it looks good. If anything looks bad either in sniff-testing, on the metrics dashboard, or in the logs, we can easily drop the node out of rotation and make any adjustments.
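The 20% figure follows directly from how a round-robin balancer hands out requests. The sketch below simulates that, assuming an idealized balancer that cycles evenly through all nodes; the function name canaryHits is hypothetical, and the node names simply follow the unit template used above.

```go
package main

import "fmt"

// canaryHits counts how many of the given number of requests land on a
// canary node when an idealized round-robin balancer cycles through nodes.
func canaryHits(nodes []string, canary map[string]bool, requests int) int {
	hits := 0
	for i := 0; i < requests; i++ {
		node := nodes[i%len(nodes)] // round-robin: next node each request
		if canary[node] {
			hits++
		}
	}
	return hits
}

func main() {
	nodes := []string{
		"ic-api-gtin@1.service",
		"ic-api-gtin@2.service",
		"ic-api-gtin@3.service",
		"ic-api-gtin@4.service",
		"ic-api-gtin@5.service",
	}
	// Only the first node runs the new release.
	canary := map[string]bool{"ic-api-gtin@1.service": true}

	const requests = 1000
	hits := canaryHits(nodes, canary, requests)
	fmt.Printf("%d of %d requests (%d%%) hit the canary\n",
		hits, requests, hits*100/requests)
}
```

A real balancer with weights or sticky sessions would distribute traffic differently, so treat the even split as an approximation.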

In this case, when looking at our canary, we decided it would be a good time to introduce a related feature. We spent a couple more minutes repeating the above steps for the added feature; we had the time.

Everything looks good? We just roll through our other load-balanced nodes.
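We rolled through the remaining nodes by hand, but the procedure is simple enough to sketch as code. The following is a minimal illustration, not a tool we actually run: rollOut and its restart and healthy callbacks are hypothetical names, and the demo only prints the fleetctl commands rather than executing them.

```go
package main

import "fmt"

// rollOut restarts each unit in order, checking health after every restart.
// It stops at the first failure so the remaining nodes keep serving the old
// release, and returns the units that were updated and verified.
func rollOut(units []string, restart func(unit string) error, healthy func(unit string) bool) ([]string, error) {
	var done []string
	for _, u := range units {
		if err := restart(u); err != nil {
			return done, fmt.Errorf("restart %s: %w", u, err)
		}
		if !healthy(u) {
			return done, fmt.Errorf("%s failed its health check", u)
		}
		done = append(done, u)
	}
	return done, nil
}

func main() {
	// Node 1 was the canary; these are the remaining nodes.
	units := []string{
		"ic-api-gtin@2.service",
		"ic-api-gtin@3.service",
		"ic-api-gtin@4.service",
		"ic-api-gtin@5.service",
	}

	// Dry run: print the command instead of running it.
	dryRun := func(unit string) error {
		fmt.Printf("fleetctl stop %s && fleetctl start %s\n", unit, unit)
		return nil
	}
	// Placeholder check; a real one might request a known product from the node.
	alwaysHealthy := func(string) bool { return true }

	done, err := rollOut(units, dryRun, alwaysHealthy)
	fmt.Println("updated:", done, "error:", err)
}
```

Stopping at the first unhealthy node is the design choice that matters here: a bad release never reaches more than one additional machine.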

20 minutes from feature request to production deployment. Zero downtime.


2015 Paul Payne