Node.js for the Real World
Here at mobile.de we’ve been busy refactoring monolithic legacy applications into a landscape of microservice-oriented architectures. Because we believe in concepts like the single responsibility principle and separation of concerns, we think Node.js as a back end for the user-interface layer is a perfect use case. Today I want to share some of our recent learnings and approaches to running Node.js in production, specifically from developing the new mobile.de homepage. Quite a few points are more general and might also apply to other platforms.
The Real World, for Real!
A good starting point for preparing your product for release is to figure out what your specific real world actually looks like. More precisely, this means evaluating the requirements of your production environment. In contrast to development systems, where speed and ease of development are important, the values to consider in production are ease of deployment, reliability, observability and performance. Where does your application sit within your infrastructure? How does this affect your app? How does it get there? What traffic is expected? How does your app perform during traffic bursts? Do you want to use Node.js internal tools like clustering for scaling? How do you achieve zero-downtime deployments? Do you have internal guidelines to follow? This is just an excerpt of the questions we had to answer in close collaboration with our great site operations engineers.
The Way to Heaven
At mobile.de, we use our own tool called “Autodeploy” to deploy and activate software artifacts. Autodeploy has a database which serves as an inventory of applications and their mapping to individual hosts. It is able to deploy to any environment and can be used with different platforms.
As a continuous integration server we’re happy to use the open source project Jenkins, into which Autodeploy is seamlessly integrated. Jenkins takes care of our build as soon as we push code to GitHub. Among others, the steps include:
- Installing dependencies via npm and bower
- Linting (eslint)
- Running Tests (Mocha)
- Code coverage analysis (Istanbul)
- Code quality analysis (Sonar)
- Packaging source files
When Jenkins has done its job successfully, the package is queued for deployment. Builds that stem from feature or development branches are automatically deployed to their postproduction/staging host. For production, we decided not to deploy automatically but to trigger the deployment explicitly. Currently this can happen via the command line or a button in the Autodeploy UI. There is a decent article describing how we use Jenkins in more depth.
We bootstrap environments by relying heavily on configuration stored in environment variables. We use dotenv to populate the environment (process.env) from a .env file. This way we make sure each deployment environment gets its app version properly configured.
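As a minimal sketch of this approach (not our actual bootstrap code), the snippet below reads configuration from an environment object and fails fast when something is missing. The variable names NODE_ENV, PORT and LOG_LEVEL and the function name loadConfig are assumptions made for illustration only.

```javascript
// Hypothetical config loader: the variable names and defaults are illustrative.
// With dotenv installed, calling require('dotenv').config() beforehand would
// copy entries from a .env file into process.env.
function loadConfig(env) {
  const required = ['NODE_ENV', 'PORT'];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    // Fail at startup instead of running half-configured.
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  const port = Number.parseInt(env.PORT, 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be a valid port number, got "${env.PORT}"`);
  }

  return Object.freeze({
    env: env.NODE_ENV,
    port,
    logLevel: env.LOG_LEVEL || 'info', // assumed default
  });
}

// Typical usage at startup would be: const config = loadConfig(process.env);
```

Passing the environment in as an argument, rather than reading process.env inside the function, keeps the loader easy to test with plain objects.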
Drop it when it’s hot
As part of reliability we had to think about what happens if our app crashes, intentionally or not. In particular, the usual way in Node.js to recover from programmer errors (bugs) is to let the app crash, as you can’t do anything about them anyway. Have you tried turning it off and on again? It’s the fastest and most reliable way to restore app state in those cases. The important thing is to log and monitor such restarts so that the reasons behind them get fixed as soon as possible. To kill these birds with one stone, we found the process manager PM2 to come in handy.
It lets us keep our app alive, as it automatically restarts app instances in the event of a crash. We also use it to start and stop the app when it has been deployed. The tool even provides some basic monitoring and logging features, which help to gather the app’s stdout. This is especially useful for dependencies that write logs to process.stdout, which our own logger does not capture.
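The "log it, then let it crash" idea can be sketched as follows. This is an illustration under assumptions, not our production code: the helper name createCrashHandler and the shape of the log record are hypothetical. The handler writes one JSON line about the fatal error and then exits with a non-zero code, leaving the restart to a process manager such as PM2.

```javascript
const fs = require('node:fs');

// Hypothetical helper: builds a handler that logs a fatal error as one
// JSON line and then terminates the process.
function createCrashHandler(writeLine, exit) {
  return function onFatalError(err) {
    const record = {
      level: 'fatal',
      msg: err && err.message ? err.message : String(err),
      stack: err && err.stack ? err.stack : undefined,
      time: new Date().toISOString(),
    };
    writeLine(JSON.stringify(record));
    // Do not try to continue: the app state is unknown after a bug.
    exit(1);
  };
}

// Wiring for a real process: write synchronously to stderr so the line is
// not lost when the process exits right afterwards.
process.on(
  'uncaughtException',
  createCrashHandler(
    (line) => fs.writeSync(process.stderr.fd, line + '\n'),
    (code) => process.exit(code)
  )
);
```

Injecting the write and exit functions is a design choice for testability; the handler itself stays deliberately small so that little can go wrong while the process is already failing.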
Recently we had some problems with multiple PM2 god daemons running on one host after deploying a new version of our app. This led to shadow apps serving and referencing old content that was no longer available at that point, resulting in 404s. Luckily, we had our logger module running, which let us discover this issue quickly. Observability of apps running in production should be one of the top priorities. We use Logstash internally to centralize, aggregate, parse and filter log files.
As our Logstash configuration embraces JSON as its log-file format, we’re using Bunyan as the base for our logger module. Its output is line-delimited JSON by default, which makes it easy to consume. It’s built around streams, and you can define multiple output streams at different log levels. When debugging your app on your local machine, you don’t want to log to a file but want to see output as fast as possible. On the other hand, when running in production you don’t necessarily want to send debug-level information to Logstash. Bunyan makes both possible.
Here at mobile.de, each app we build needs to conform to certain guidelines. For logging, this includes, for instance, adding information like build_timestamp, app_revision or log_level. For production usage, we wrote a bunyan-logstash transform stream that adds these fields at runtime and pipes the output to a file. For local development, we use a bunyan-debug transform stream that pipes all log levels to stdout. We are currently experimenting with this setup and constantly trying to improve it. For visualizing logs we use Kibana as a dashboard. This lets us discover errors and unexpected issues instantly.
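To illustrate the idea of such a transform stream, here is a minimal sketch built only on Node.js core streams. It is not our bunyan-logstash module: the function names enrichLine and createEnrichStream are hypothetical, and the only Bunyan-specific knowledge it relies on is that records are line-delimited JSON with a numeric level field (10 = trace up to 60 = fatal).

```javascript
const { Transform } = require('node:stream');
const { StringDecoder } = require('node:string_decoder');

// Bunyan's numeric log levels mapped to readable names.
const LEVEL_NAMES = { 10: 'trace', 20: 'debug', 30: 'info', 40: 'warn', 50: 'error', 60: 'fatal' };

// Adds static fields (e.g. app_revision) and a log_level name to one JSON log line.
function enrichLine(line, staticFields) {
  const record = JSON.parse(line);
  const logLevel = LEVEL_NAMES[record.level] || String(record.level);
  return JSON.stringify({ ...record, ...staticFields, log_level: logLevel });
}

// Wraps enrichLine in a Transform that handles line-delimited JSON,
// including lines that are split across chunks.
function createEnrichStream(staticFields) {
  const decoder = new StringDecoder('utf8');
  let buffered = '';
  return new Transform({
    transform(chunk, _encoding, callback) {
      try {
        buffered += decoder.write(chunk);
        const lines = buffered.split('\n');
        buffered = lines.pop(); // keep the trailing partial line for later
        for (const line of lines) {
          if (line.trim() !== '') this.push(enrichLine(line, staticFields) + '\n');
        }
        callback();
      } catch (err) {
        callback(err); // malformed JSON surfaces as a stream error
      }
    },
    flush(callback) {
      try {
        buffered += decoder.end();
        if (buffered.trim() !== '') this.push(enrichLine(buffered, staticFields) + '\n');
        callback();
      } catch (err) {
        callback(err);
      }
    },
  });
}
```

In a setup like the one described, a stream created with createEnrichStream({ app_revision: '…', build_timestamp: '…' }) would be piped into a file stream in production, while a development variant would write to process.stdout instead.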
Watch Your Health
One of our app requirements is to have proper monitoring set up. What does monitoring mean? Essentially, it’s about collecting numeric time-series data. For mobile.de this includes asynchronously forwarding metrics to an aggregator (push style) as well as providing various endpoints to verify application health (pull style). This helps with monitoring, but also with implementing reactive behaviour in a microservice landscape.
Not suffering from NIH syndrome, we looked into multiple open source and commercial products that tackle this problem and could provide a solution for our all-new mobile.de homepage running on Node.js. As most of our apps use Graphite as a real-time graphing system, we also wanted to make use of it. We wanted to collect some default system metrics of the host, like CPU usage, memory consumption or garbage collection. In addition, the module should also provide a way to get HTTP information about incoming and outgoing connections. Unfortunately, we couldn’t find anything out there that would cover all of this and fit our needs. So, on top of node-measured and node-graphite, we built our own node-metrics module. It currently offers the following features:
- Gathering vm related metrics
  - CPU usage
  - memory consumption
  - GC stats
- Custom metrics creation
- Middlewares for (semi) automated metrics collection
  - Timers for all routes inbound.routes.[route]
  - Meters for all status codes inbound.statuses.[statusCode]
  - HTTP server middleware
  - Express middleware
- Option of periodically reporting to graphite
The module so far does a solid job and there are plans to open source it.
For visualizing collected metrics we use Grafana as a dashboard.
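The following sketch shows how timers per route and meters per status code could be collected around a plain Node.js HTTP handler. It does not reproduce our node-metrics module: createRegistry, instrument, routeKey and toGraphiteLines are hypothetical names, and the way a URL is turned into a metric name is an assumption. Only the metric prefixes inbound.routes and inbound.statuses are taken from the feature list above, and the output format follows Graphite’s plaintext protocol (metric path, value, Unix timestamp).

```javascript
// Hypothetical in-memory registry for counters (meters) and timers.
function createRegistry() {
  const meters = new Map();
  const timers = new Map();
  return {
    mark(name) {
      meters.set(name, (meters.get(name) || 0) + 1);
    },
    time(name, ms) {
      const timer = timers.get(name) || { count: 0, totalMs: 0 };
      timer.count += 1;
      timer.totalMs += ms;
      timers.set(name, timer);
    },
    snapshot() {
      return { meters: Object.fromEntries(meters), timers: Object.fromEntries(timers) };
    },
  };
}

// Assumed naming scheme: "/search?q=1" becomes "search", "/" becomes "root".
function routeKey(url) {
  const path = (url || '/').split('?')[0].replace(/^\/+|\/+$/g, '');
  return path.replace(/[^A-Za-z0-9_-]+/g, '_') || 'root';
}

// Wraps a (req, res) handler and records duration and status on 'finish'.
function instrument(registry, handler, now = () => process.hrtime.bigint()) {
  return function instrumented(req, res) {
    const start = now();
    res.once('finish', () => {
      const ms = Number(now() - start) / 1e6; // nanoseconds to milliseconds
      registry.time(`inbound.routes.${routeKey(req.url)}`, ms);
      registry.mark(`inbound.statuses.${res.statusCode}`);
    });
    handler(req, res);
  };
}

// Formats a snapshot as Graphite plaintext lines: "<path> <value> <timestamp>".
function toGraphiteLines(snapshot, timestampSec) {
  const lines = [];
  for (const [name, count] of Object.entries(snapshot.meters)) {
    lines.push(`${name}.count ${count} ${timestampSec}`);
  }
  for (const [name, timer] of Object.entries(snapshot.timers)) {
    const mean = timer.count > 0 ? timer.totalMs / timer.count : 0;
    lines.push(`${name}.mean_ms ${mean} ${timestampSec}`);
  }
  return lines;
}
```

A server would use it as http.createServer(instrument(registry, handler)), and a periodic reporter could send the lines from toGraphiteLines to the Graphite host on an interval.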
To verify that our application could handle the expected traffic easily, we ran various load tests before launching. Everything worked well until we reached a certain number of concurrent users. The application would then respond with a 500 on every second request. With monitoring and logging in place we figured out that the error was caused by engine-munger, a component of our rendering strategy. We decided to simplify our view/template implementation by throwing away the confusing construct of Dust, adaro and engine-munger. This instantly boosted performance and made our tests go green. Had our app not been observable, this critical bug would have reached production. Before deciding on a dependency, we have to make sure we fully understand it and evaluate whether it’s really necessary.
Real World Facts
On Twitter: @jonykrause
Node.js and ES6 Instead of Java – A War Story: article series by my colleague Patrick Hund about creating the Node.js based new mobile.de home page
Taming the Hydra: article series by my colleague Marc Günther about maintaining our Jenkins CI systems
"Node.js Red Pill": cartoon by Patrick Hund