Asynchronicity in Node.js

Non-Blocking I/O

One of the biggest advantages of using Node.js over Python or Ruby is that Node has a non-blocking I/O mechanism. To illustrate this, let me use an example of a line in a Starbucks coffee shop. Let’s pretend that each person standing in line for a drink is a task, and everything behind the counter — cashier, register, barista — is a server or server application. When we order a cup of regular drip coffee, like Pike, or hot tea, like Earl Grey, the barista makes it. The whole line waits while that drink is made, and the person is charged the appropriate amount.

Of course, we know that these kinds of drinks are easy to make; just pour the liquid and it’s done. But what about those fancy choco-mocha-frappe-latte-soy-decafs? What if everybody in line decides to order these time-consuming drinks? The line will be held up by each order, and it will grow longer and longer. The manager of the coffee shop will have to add more registers and put more baristas to work (or even stand behind the register him/herself). This is not good, right? But this is how virtually all traditional (blocking) server-side technologies work, except Node. Node is like a real Starbucks. When you order something, the barista yells the order to another employee, and you leave the register. Another person gives their order while you wait for your state-of-the-art eye-opener in a paper cup. The line moves: orders are processed asynchronously, and nobody blocks the queue by waiting.

This is why Node.js blows everything else away (except maybe low-level C/C++) in terms of performance and scalability. With Node, you just don’t need that many CPUs and servers to handle the load.
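To see the non-blocking behavior in code, here is a small simulation of the coffee shop. It is a sketch, not how Node schedules real I/O internally. setTimeout stands in for slow work (making a drink), and placeOrders and the drink timings are made up for illustration. Every order is taken immediately, and drinks are served in the order they finish, not the order they were placed:

```javascript
// Hypothetical orders: each takes a different (simulated) time to prepare.
// setTimeout plays the role of slow I/O; the register never waits for a drink.
// Assumes at least one order (onAllServed is never called for an empty list).
function placeOrders(orders, onAllServed) {
  var served = [];
  orders.forEach(function (order) {
    console.log('ordered: ' + order.drink);
    setTimeout(function () {
      served.push(order.drink);
      console.log('served: ' + order.drink);
      if (served.length === orders.length) onAllServed(served);
    }, order.ms);
  });
  // Printed before any drink is served: taking orders did not block.
  console.log('all orders taken');
}

placeOrders([
  { drink: 'choco-mocha-frappe-latte', ms: 300 },
  { drink: 'Pike', ms: 10 },
  { drink: 'Earl Grey', ms: 50 }
], function (served) {
  // served in this order: Pike, Earl Grey, choco-mocha-frappe-latte
  console.log('served in this order: ' + served.join(', '));
});
```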

Asynchronous Way of Coding

Asynchronicity requires a different way of thinking for programmers familiar with Python, PHP, C or Ruby. It’s easy to introduce a bug unintentionally by forgetting to end the execution of the code with a proper return expression.

Here is a simple example illustrating this scenario:

var test = function (callback) {
  return callback();
  console.log('test') //shouldn't be printed
}

var test2 = function(callback){
  callback();
  console.log('test2') //printed 3rd
}

test(function(){
  console.log('callback1') //printed first
  test2(function(){
    console.log('callback2') //printed 2nd
  })
});

Because test2 calls just callback() instead of return callback(), its string test2 is still printed after the callback finishes. In test, return callback() ends the execution, so test is never printed.
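The same rule matters in real code whenever a function reports an error through its callback. Below, parsePrice is a hypothetical example. The buggy variant forgets the return, so its callback fires twice (once with the error, once with NaN). The fixed variant calls back exactly once:

```javascript
// Buggy: no "return", so after reporting the error the function keeps going
// and calls the callback a second time.
function parsePriceBuggy(text, callback) {
  var price = parseFloat(text);
  if (isNaN(price)) {
    callback(new Error('not a number: ' + text));
  }
  callback(null, price);
}

// Fixed: "return callback(...)" ends the function right after the error.
function parsePrice(text, callback) {
  var price = parseFloat(text);
  if (isNaN(price)) {
    return callback(new Error('not a number: ' + text));
  }
  callback(null, price);
}

var calls = 0;
parsePriceBuggy('abc', function () { calls++; });
console.log('buggy version called back ' + calls + ' times'); // 2

calls = 0;
parsePrice('abc', function (err) { calls++; console.log(err.message); }); // not a number: abc
console.log('fixed version called back ' + calls + ' times'); // 1
```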


For fun I’ve added a setTimeout() delay for the callback2 string, and now the order has changed:

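var test = function (callback) {
  return callback();
  console.log('test') //shouldn't be printed
}

var test2 = function(callback){
  callback();
  console.log('test2') //printed 2nd
}

test(function(){
  console.log('callback1') //printed first
  test2(function(){
    setTimeout(function(){
      console.log('callback2') //printed 3rd
    }, 100) // the delay is arbitrary; even 0 defers the callback
  })
});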



The last example illustrates that the two functions are independent of each other. test2 doesn’t wait for its delayed callback, so its own string is printed before callback2, which appears once the timer fires. The faster function will finish sooner than the slower one. Going back to our Starbucks example, you might get your drink faster than the person who was in front of you in the line. Better for people, and better for programs! :-)

Decreasing 64-bit Tweet ID in JavaScript

As some of you might know, JavaScript numbers can represent integers exactly only up to 2^53 (53 bits of precision). This post, Working with large integers in JavaScript (part of the Numbers series), does a great job of explaining general concepts for dealing with large numbers in JS.

64-bit Tweet ID is "rounded" in JS

I had to do some research on the topic when I was rewriting the JavaScript code responsible for handling Twitter search in the Storify editor: we had tweet duplicates in the results! In its article Working with Timelines, Twitter’s official documentation says:

Environments where a Tweet ID cannot be represented as an integer with 64 bits of precision (such as JavaScript) should skip this step.

So true: the id and id_str fields in a Twitter API response were different. Apparently, the JavaScript engine just “rounds” numbers that are too large to represent exactly. :-( The task was complicated by the fact that I needed to subtract 1 from the last tweet’s ID to prevent its reappearance in a second search response. After the subtraction, I could easily pass the value to the max_id parameter of the Twitter API.
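Here is a tiny illustration of that rounding. The response body below is a hypothetical, minimal stand-in rather than real Twitter data. It uses 2^53 + 1, the smallest positive integer a JavaScript number cannot hold exactly; real tweet IDs are much larger, so they can be off by more:

```javascript
// Hypothetical, minimal response body (not real Twitter data): the same ID
// is sent once as a JSON number and once as a string.
var body = '{"id": 9007199254740993, "id_str": "9007199254740993"}';
var tweet = JSON.parse(body);

console.log(Number.MAX_SAFE_INTEGER);        // 9007199254740991
console.log(tweet.id);                       // 9007199254740992 (rounded)
console.log(tweet.id_str);                   // 9007199254740993 (exact)
console.log(Number.isSafeInteger(tweet.id)); // false
```

This is why the string field, id_str, is the one to work with.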

I’ve come across different solutions, but decided to write my own function which is simple to understand and not heavy on resources. Here is a script to decrease tweet ID which is a 64-bit number in JavaScript without libraries or recursion, to use with max_id or since_id in Twitter API:

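function decStrNum (n) {
    // work on the digit string so no precision is lost (note: "100" becomes "099")
    n = n.toString();
    var result=n;
    var i=n.length-1;
    while (i>-1) {
      if (n[i]==="0") {
        // borrow: this 0 becomes 9, then move one digit to the left
        result=result.substring(0,i)+"9"+result.substring(i+1);
        i --;
      }
      else {
        // decrement the first non-zero digit from the right and stop
        result=result.substring(0,i)+(parseInt(n[i],10)-1).toString()+result.substring(i+1);
        return result;
      }
    }
    return result;
}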

To check if it works, you can run these logs:


An alternative solution, suggested by Bob Lauer in a StackOverflow question, involves recursion and IMHO is more complicated:

function decrementHugeNumberBy1(n) {
    // make sure n is a string, as we can't do math on numbers over a certain size
    n = n.toString();
    var allButLast = n.substr(0, n.length - 1);
    var lastNumber = n.substr(n.length - 1);

    if (lastNumber === "0") {
        // borrow from the digits on the left; this digit becomes 9
        return decrementHugeNumberBy1(allButLast) + "9";
    }
    else {
        var finalResult = allButLast + (parseInt(lastNumber, 10) - 1).toString();
        return trimLeft(finalResult, "0");
    }
}

function trimLeft(s, c) {
    var i = 0;
    while (i < s.length && s[i] === c) {
        i++;
    }

    return s.substring(i);
}

Now, if you’re the type of person who likes to shoot sparrows with a howitzer, there are full-blown libraries to handle operations on large numbers in JavaScript; just to name a few: BigInteger, js-numbers and javascript-bignum.
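If your environment supports it, there is now a middle ground between hand-written string math and a full library: the built-in BigInt type (ES2020, available in Node.js 10.4 and later and in current browsers). Here is a minimal sketch; decrementId is a name chosen for illustration:

```javascript
// Assumes an engine with BigInt support (ES2020; Node.js 10.4+).
// decrementId is an illustrative name, not part of any library.
function decrementId(idStr) {
  // BigInt(string) parses every digit exactly; 1n is a BigInt literal,
  // so the subtraction never passes through a lossy Number.
  return (BigInt(idStr) - 1n).toString();
}

console.log(decrementId('290904187124985850')); // 290904187124985849
console.log(decrementId('100'));                // 99
```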

MongoDB migration with Node and Monk

Recently one of our top users complained that their Storify account was inaccessible. We checked the production database, and it appeared that the account might have been compromised and maliciously deleted by somebody using the user’s account credentials. Thanks to a great MongoHQ service, we had a backup database in less than 15 minutes.
There were two options to proceed with the migration:

  1. Mongo shell script
  2. Node.js program

Because Storify user account deletion involves deletion of all related objects — identities, relationships (followers, subscriptions), likes, stories — we’ve decided to proceed with the latter option. It worked perfectly, and here is a simplified version which you can use as a boilerplate for MongoDB migration (also at

Restoring MongoDB Records

Let’s load all the modules we need: Monk, Progress, Async, and MongoDB:

var async = require('async');
var ProgressBar = require('progress');
var monk = require('monk');
var ObjectId=require('mongodb').ObjectID;

By the way, Monk, made by LearnBoost, is a tiny layer that provides simple yet substantial usability improvements for MongoDB usage within Node.js.

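Monk takes a connection string in the standard MongoDB format, host:port/database (prefixed with username:password@ if the database requires authentication).
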
So we can create the following objects:

var dest = monk('localhost:27017/storify_localhost');
var backup = monk('localhost:27017/storify_backup');

We need to know the object ID which we want to restore:

var userId = ObjectId('YOUR-OBJECT-ID'); // replace with the user's 24-character hex _id

This is a handy restore function which we can reuse to restore objects from related collections by specifying a query (for more on MongoDB queries, see the post Querying 20M-Record MongoDB Collection). To call it, just pass the name of the collection as a string, e.g., "stories", and a query that associates objects from this collection with your main object, e.g., {authorid:userId}. The progress bar is there to show us nice visuals in the terminal.

var restore = function(collection, query, callback){
  console.log('restoring from ' + collection);
  var q = query;
  backup.get(collection).count(q, function(e, n) {
    console.log('found '+n+' '+collection);
    if (e) console.error(e);
    // total is n, so bar.complete turns true on the tick of the last insert
    // and callback runs exactly once
    var bar = new ProgressBar('[:bar] :current/:total :percent :etas', { total: n, width: 40 });
    var tick = function(e) {
      if (e) {
        console.error(e);
      }
      bar.tick();
      if (bar.complete) {
        console.log('restoring '+collection+' is completed');
        callback();
      }
    };
    if (n>0){
      console.log('adding '+ n+ ' '+collection);
      backup.get(collection).find(q, { stream: true }).each(function(element) {
        dest.get(collection).insert(element, tick);
      });
    } else {
      callback();
    }
  });
};
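The core of restore is "start n inserts, then call callback exactly once when the last one reports back." Here is that pattern on its own, runnable without MongoDB. fakeInsert is a hypothetical stand-in for dest.get(collection).insert, and insertAll uses a plain counter instead of the progress bar:

```javascript
// Hypothetical stand-in for dest.get(collection).insert(doc, cb):
// it "inserts" asynchronously and reports back through a callback.
function fakeInsert(doc, cb) {
  setTimeout(function () { cb(null); }, Math.random() * 20);
}

// Start one insert per document and call `callback` exactly once:
// after the last insert reports back, or right away if there are none.
function insertAll(docs, insert, callback) {
  var remaining = docs.length;
  if (remaining === 0) return callback(null, 0);
  var errors = 0;
  docs.forEach(function (doc) {
    insert(doc, function (e) {
      if (e) errors++;
      remaining--;
      if (remaining === 0) callback(null, docs.length - errors);
    });
  });
}

insertAll([{ a: 1 }, { a: 2 }, { a: 3 }], fakeInsert, function (e, inserted) {
  console.log('inserted ' + inserted + ' documents'); // inserted 3 documents
});
```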

Now we can use async to call the restore function mentioned above:

async.series({
  restoreUser: function(callback){   // import user element
    backup.get('users').find({_id:userId}, { stream: true, limit: 1 }).each(function(user) {
      dest.get('users').insert(user, function(e){
        if (e) {
          console.error(e);
        }
        else {
          console.log('restored user: '+ user.username);
        }
        callback();
      });
    });
  },

  restoreIdentity: function(callback){
    // hypothetical collection and field names: adjust to your identities schema
    restore('identities', {userid: userId}, callback);
  },

  restoreStories: function(callback){
    restore('stories', {authorid:userId}, callback);
  }
}, function(e) {
  if (e) console.error(e);
  console.log('restoring is completed!');
});

The full code is available at and here:

var async = require('async');
var ProgressBar = require('progress');
var monk = require('monk');
var ms = require('ms'); // not used in this simplified listing
var ObjectId=require('mongodb').ObjectID;

var dest = monk('localhost:27017/storify_localhost');
var backup = monk('localhost:27017/storify_backup');

var userId = ObjectId('YOUR-OBJECT-ID'); // replace with a real hex _id; monk should have auto casting but we need it for queries

var restore = function(collection, query, callback){
  console.log('restoring from ' + collection);
  var q = query;
  backup.get(collection).count(q, function(e, n) {
    console.log('found '+n+' '+collection);
    if (e) console.error(e);
    // total is n, so bar.complete turns true on the tick of the last insert
    // and callback runs exactly once
    var bar = new ProgressBar('[:bar] :current/:total :percent :etas', { total: n, width: 40 });
    var tick = function(e) {
      if (e) {
        console.error(e);
      }
      bar.tick();
      if (bar.complete) {
        console.log('restoring '+collection+' is completed');
        callback();
      }
    };
    if (n>0){
      console.log('adding '+ n+ ' '+collection);
      backup.get(collection).find(q, { stream: true }).each(function(element) {
        dest.get(collection).insert(element, tick);
      });
    } else {
      callback();
    }
  });
};

async.series({
  restoreUser: function(callback){   // import user element
    backup.get('users').find({_id:userId}, { stream: true, limit: 1 }).each(function(user) {
      dest.get('users').insert(user, function(e){
        if (e) {
          console.error(e);
        }
        else {
          console.log('restored user: '+ user.username);
        }
        callback();
      });
    });
  },

  restoreIdentity: function(callback){
    // hypothetical collection and field names: adjust to your identities schema
    restore('identities', {userid: userId}, callback);
  },

  restoreStories: function(callback){
    restore('stories', {authorid:userId}, callback);
  }
}, function(e) {
  if (e) console.error(e);
  console.log('restoring is completed!');
});

To launch it, run npm install (or npm update) and change the hard-coded database values.

Sample of Rapid Prototyping with JS

Rapid Prototyping with JS is a hands-on book that introduces you to rapid software prototyping using the latest cutting-edge web and mobile technologies including NodeJS, MongoDB, BackboneJS, Twitter Bootstrap, LESS, jQuery, Heroku and others.

Rapid Prototyping with JS

Here is a free sample, first chapter — Introduction, of Rapid Prototyping with JS. You can also get a free PDF from LeanPub and explore code examples at To buy a full version in PDF, Mobi/Kindle and ePub/iPad formats go to


Rapid Prototyping with JS is a hands-on book that introduces you to rapid software prototyping using the latest cutting-edge web and mobile technologies including Node.js, MongoDB, Twitter Bootstrap, LESS, jQuery, Heroku and others.

Who This Book is For

The book is designed for advanced-beginner and intermediate level web and mobile developers: somebody who has just started programming, or somebody who is an expert in other languages and frameworks like Ruby on Rails, PHP, and Java and wants to learn JavaScript and Node.js.

Rapid Prototyping with JS, as you can tell from the name, is about taking your idea to a functional prototype in the form of a web or a mobile application as fast as possible. This thinking adheres to the Lean Startup methodology; therefore, this book would be more valuable to startup founders, but big companies’ employees might also find it useful, especially if they plan to add new skills to their resume.


Mac OS X or UNIX/Linux systems are highly recommended for this book’s examples and for web development in general, although it’s still possible to hack your way on a Windows-based system.

Some cloud services require users’ credit/debit card information even for free accounts.

What to Expect

Expect a lot of coding and not much theory. All the theory we cover is directly related to some of the practical aspects and is essential for a better understanding of technologies and specific approaches to dealing with them, e.g., JSONP and cross-domain calls.

In addition to coding examples, the book covers virtually all setup and deployment step-by-step.

You’ll learn by building Message Board web/mobile applications, starting with front-end components. There are a few versions of these applications, but by the end we’ll put front-end and back-end together and deploy to a production environment. The Message Board application contains all the necessary components typical for a basic web app, and will give you enough confidence to continue developing on your own, apply for a job/promotion or build a startup!

This is a digital version of the book, so most of the links are hidden just like on any other web page, e.g., jQuery instead of The content of the book has local hyperlinks which allow you to jump to any section.

All the source code for examples used in this book is available in the book as well as in a public GitHub repository You can also download files as a ZIP archive or use Git to pull them. More on how to install and use Git will be covered later in the book. The source code files, folder structure and deployment files are supposed to work locally and/or remotely on PaaS solutions, i.e., Windows Azure and Heroku, with minor or no modifications.


This is what source code blocks look like:

var object = {};
object.name = "Bob";

Terminal commands have a similar look but start with a dollar sign, $:

$ git push origin heroku
$ cd /etc/
$ ls 

Inline filenames, path/folder names, quotes and special words/names are italicized while command names, e.g., mongod, and emphasized words, e.g., Note, are bold.

Web Basics


The bigger picture of web and mobile application development consists of the following steps:

  1. User types a URL or follows a link in her browser (aka client);
  2. Browser makes HTTP request to the server;
  3. Server processes the request, taking into account any parameters in the query string and/or body of the request;
  4. Server updates/gets/transforms data in the database;
  5. Server responds with HTTP response containing data in HTML, JSON or other formats;
  6. Browser receives HTTP response;
  7. Browser renders HTTP response to the user in HTML or any other format, e.g., JPEG, XML, JSON.
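The steps above can be sketched with Node’s built-in http module. This is a minimal example, not production code. roundTrip is a hypothetical helper that starts a throwaway server on a free port, sends it one request with a name parameter in the query string, and hands back the JSON response:

```javascript
var http = require('http');

// Starts a throwaway server, sends it one request, and hands back
// the status code and the parsed JSON body (steps 2-6 of the list).
function roundTrip(name, done) {
  // Steps 3-5: the server reads the query string and answers with JSON.
  var server = http.createServer(function (req, res) {
    var url = new URL(req.url, 'http://127.0.0.1');
    var who = url.searchParams.get('name') || 'world';
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ greeting: 'Hello, ' + who }));
  });

  // Port 0 asks the operating system for any free port.
  server.listen(0, '127.0.0.1', function () {
    var options = {
      host: '127.0.0.1',
      port: server.address().port,
      path: '/?name=' + encodeURIComponent(name),
      agent: false // one-off connection, no keep-alive
    };
    // Steps 2 and 6: the client sends the request and receives the response.
    http.get(options, function (res) {
      var body = '';
      res.setEncoding('utf8');
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        server.close();
        done(res.statusCode, JSON.parse(body));
      });
    });
  });
}

roundTrip('Bob', function (status, body) {
  console.log(status, body.greeting); // 200 Hello, Bob
});
```

In a real app, step 7 (rendering) would happen in the browser rather than in a console.log.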

Mobile applications act in the same manner as regular websites, only instead of a browser there might be a native app. Other minor differences include data transfer limitations due to carrier bandwidth, smaller screens, and more efficient use of local storage.

There are a few approaches to mobile development, each with its own advantages and disadvantages:

  • Native iOS, Android, Blackberry apps built with Objective-C and Java;
  • Native apps built with JavaScript in Appcelerator and then compiled into native Objective-C or Java;
  • Mobile websites tailored for smaller screens with responsive design, CSS frameworks like Twitter Bootstrap or Foundation, regular CSS or different templates;
  • HTML5 apps which consist of HTML, CSS and JavaScript, and are usually built with frameworks like Sencha Touch, JO, and then wrapped into a native app with PhoneGap.

Hyper Text Markup Language

Hyper Text Markup Language, or HTML, is not a programming language in itself. It is a set of markup tags which describes the content and presents it in a structured and formatted way. HTML tags consist of a tag name inside angle brackets (<>). In most cases, tags surround the content, with the end tag having a forward slash before the tag name.

In this example each line is an HTML element:

<h2>Overview of HTML</h2>
<div>HTML is a ...</div>
<link rel="stylesheet" type="text/css" href="style.css" />

The HTML document itself is an element of html tag and all other elements are children of that html tag:

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="stylesheet" type="text/css" href="style.css" />
  </head>
  <body>
    <h2>Overview of HTML</h2>
    <p>HTML is a ...</p>
  </body>
</html>

There are different flavors and versions of HTML, e.g., DHTML, XHTML 1.0, XHTML 1.1, XHTML 2, HTML 4, HTML 5. This article does a good job of explaining the differences — Misunderstanding Markup: XHTML 2/HTML 5 Comic Strip.

More information is available at Wikipedia and w3schools.

Cascading Style Sheets

Cascading Style Sheets, or CSS, is a way to format and present content. An HTML document can have several stylesheets, included either with the link tag as in the previous examples or with the style tag:

<style>
  body { padding-top: 60px; /* 60px to make some space */ }
</style>

Each HTML element can have id and class attributes:

<div id="main" class="large">Lorem ipsum dolor sit amet,  Duis sit amet neque eu.</div>

In CSS we access elements by their id, class, tag name and in some edge cases by parent-child relationship or element attribute value:

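p { /* all <p> elements */ }
div#main { /* the div with id="main" */ }
.large { /* all elements with class="large" */ }
body > div { /* divs that are direct children of body */ }
input[name="email"] { /* inputs whose name attribute is "email" */ }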

More information for further reading is available at Wikipedia and w3schools.

CSS3 is an upgrade to CSS which includes new ways of doing things such as rounded corners, borders and gradients, which were possible in regular CSS only with the help of PNG/GIF images and by using other tricks.

For more information, refer to w3schools and the CSS3 vs CSS comparison article on Smashing.


JavaScript was started in 1995 at Netscape as LiveScript. It has the same relationship with Java as a hamster and a ham :)
It is used for both client and server side development as well as in desktop applications.

To use JavaScript in an HTML document, put it inside a script tag:

<script type="text/javascript" language="javascript">
  alert("Hello world!");
  //simple alert dialog window
</script>

Usually it’s a good idea to separate JavaScript code from HTML; in this example we include the app.js file (a script tag needs an explicit closing tag, even when it has no content):

<script src="js/app.js" type="text/javascript" language="javascript"></script>

Here are the main types of JavaScript objects/classes:

  • Array object, e.g., var arr = ["apple", "orange", "kiwi"];
  • Boolean primitive object, e.g., var bool = true;
  • Date object, e.g., var d = new Date();
  • Math object, e.g., var x = Math.floor(3.4890);
  • Number primitive object, e.g., var num = 1;
  • String primitive object, e.g., var str = "some string";
  • RegExp object, e.g., var pattern = /[A-Z]+/;
  • Global properties and functions, e.g., NaN
  • Browser objects, e.g., window.location = '';
  • DOM objects, e.g., var table = document.createElement('table');

A full reference of JavaScript and DOM objects and classes, with examples, is available at w3schools.

Typical syntax for function declaration:

function Sum(a,b) {
  var sum = a+b;
  return sum;
}

Functions in JavaScript are first-class citizens due to the functional programming nature of the language. Therefore, functions can be used like other variables/objects: they can be passed to other functions as arguments and returned from them. Here, f returns a new function:

var f = function (str1){
  return function(str2){
    return str1+' '+str2;
  };
};
var a = f('hello');
var b = f('goodbye');
console.log(a('Catty')); // hello Catty
console.log(b('Doggy')); // goodbye Doggy
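The example above shows a function returning another function. Passing a function as an argument looks like this; applyTwice and addExclamation are illustrative names:

```javascript
// A function that takes another function as an argument and calls it twice.
function applyTwice(fn, value) {
  return fn(fn(value));
}

var addExclamation = function (str) { return str + '!'; };

console.log(applyTwice(addExclamation, 'hello')); // hello!!
// Built-in methods such as Array.prototype.map accept functions, too:
console.log([1, 2, 3].map(function (n) { return n * 2; })); // [ 2, 4, 6 ]
```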

JavaScript has loose/weak, dynamic typing, as opposed to the static typing of languages like C and Java, which arguably makes JavaScript a better programming language for prototyping.
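A few illustrations of what loose, dynamic typing means in practice:

```javascript
var value = 42;          // a number...
value = 'forty-two';     // ...and now a string: variables have no fixed type
console.log(typeof value); // string

// Implicit conversions happen in mixed-type expressions:
console.log('5' + 1);    // 51 (the number is converted to a string)
console.log('5' - 1);    // 4 (the string is converted to a number)
console.log(1 == '1');   // true (loose equality converts types)
console.log(1 === '1');  // false (strict equality does not)
```

The flexibility speeds up prototyping. The conversions are also a common source of bugs, which is why many developers prefer === over ==.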

More information about browser-run JavaScript is available at Wikipedia and w3schools.

Agile Methodologies

Agile software development methodology evolved due to the fact that traditional methods, like Waterfall, weren’t good enough in situations of high unpredictability, i.e., when the solution is unknown. Agile methodology includes Scrum/Sprint, Test-Driven Development, Continuous Deployment, Pair Programming and other practical techniques, many of which were borrowed from Extreme Programming.


In regard to management, Agile methodology uses the Scrum approach. More about Scrum can be read at:

Scrum methodology is a sequence of short cycles, and each cycle is called a sprint. One sprint usually lasts from one to two weeks. A sprint starts and ends with a sprint planning meeting where new tasks can be assigned to team members. New tasks cannot be added to the sprint in progress; they can be added only at the sprint meetings.

An essential part of the Scrum methodology is the daily scrum meeting, hence the name. Each scrum is a 5–15-minute meeting, often conducted in the hallway. At scrum meetings, each team member answers three questions:

  1. What have you done since yesterday?
  2. What are you going to do today?
  3. Do you need anything from other team members?

Flexibility makes Agile an improvement over the Waterfall methodology, especially in situations of high uncertainty, e.g., startups.

Advantage of Scrum methodology: effective where it is hard to plan ahead of time, and also in situations where a feedback loop is used as the main decision-making authority.

Test-Driven Development

Test-Driven Development, or TDD, consists of the following steps:

  1. Write failing automated test cases for new feature/task or enhancement by using assertions that are either true or false.
  2. Write code to successfully pass the test cases.
  3. Refactor code if needed, and add functionality while keeping the test cases passing.
  4. Repeat until the task is complete.

Advantages of Test-Driven Development:

  • fewer bugs/defects,
  • more efficient codebase,
  • provides programmers with confidence that code works and doesn’t break old functionality.

Continuous Deployment

Continuous Deployment, or CD, is the set of techniques to rapidly deliver new features, bug fixes, and enhancements to the customers. CD includes automated testing and automated deployment. By utilizing Continuous Deployment, the manual overhead is decreased, and the feedback loop time is minimized. Basically, the faster developers can get feedback from the customers, the sooner the product can pivot, which leads to more advantages over the competition. Many startups deploy multiple times in a single day in comparison to the 6–12 month release cycle which is still typical for corporations and big companies.

One of the most popular solutions for CD is the Continuous Integration server Jenkins.

Advantages of Continuous Deployment approach: decreases feedback loop time and manual labor overhead.

Pair Programming

Pair Programming is a technique in which two developers work together on one machine. One of the developers is the driver and the other is the observer. The driver writes the code and the observer watches it, assists, and makes suggestions. Then they switch roles. The driver has a more tactical role of focusing on the current task. In contrast, the observer has a more strategic role, overseeing “the bigger picture” and the ways to improve the codebase and to make it more efficient.

Advantages of Pair Programming:

  • Pairing contributes to a shorter and more efficient codebase, and introduces fewer bugs and defects.
  • As an added bonus, knowledge is passed among programmers as they work together. However, conflicts between developers are possible.


Node.js is an event-driven asynchronous I/O server-side technology for building scalable and efficient web servers. Node.js is built on Google’s V8 JavaScript engine.

The purpose and use of Node.js is similar to Twisted for Python and EventMachine for Ruby. The JavaScript implementation of Node was the third one after attempts at using the Ruby and C++ programming languages.

Node.js is not in itself a framework like Ruby on Rails; it’s more comparable to the pair PHP+Apache. Here are some Node.js frameworks: Express, Meteor, Tower.js, RailwayJS, Geddy, Derby.

Advantages of using NodeJS:

  • Developers are likely to be familiar with JavaScript due to its status as a de facto standard of application development for web and mobile.
  • One language for front-end and back-end development speeds up coding process. A developer’s brain doesn’t have to switch between different syntaxes. The learning of methods and classes goes faster.
  • With NodeJS, you could prototype quickly and go to market to do your customer development and customer acquisition early. This is an important competitive advantage over the other companies, which use less agile technologies, e.g., PHP and MySQL.
  • NodeJS is built to support real-time applications by utilizing WebSockets.

For more information, go to Wikipedia and read the articles on ReadWrite and O’Reilly.

NoSQL and MongoDB

MongoDB, from huMONGOus, is a high-performance non-relational database for huge data. The NoSQL concept came out when traditional Relational Database Management Systems, or RDBMS, were unable to meet the challenges of huge amounts of data.

Advantages of using MongoDB:

  • Scalable due to distributed nature: multiple servers and data centers could have redundant data.
  • High-performance: MongoDB is very effective for storing and retrieving data, not the relationship between elements.
  • A schemaless document store is ideal for prototyping because it doesn’t require one to know the schema in advance and there is no need for a fixed data model.

Cloud Computing

Cloud computing consists of:

  • Infrastructure as a Service (IaaS), e.g., Rackspace, Amazon Web Services;
  • Platform as a Service (PaaS), e.g., Heroku, Windows Azure;
  • Software as a Service (SaaS), e.g., Google Apps,

Cloud application platforms provide:

  • scalability, e.g., spawn new instances in a matter of minutes;
  • easy deployment, e.g., to push to Heroku you can just use $ git push;
  • pay-as-you-go plan: add or remove memory and disk space based on demands;
  • usually there is no need to install and configure databases, app servers, packages, etc.;
  • security and support.

PaaS solutions are ideal for prototyping, building minimum viable products (MVP) and for early stage startups in general.

Here is the list of most popular PaaS solutions:

HTTP Requests and Responses

Each HTTP Request and Response consists of the following components:

  1. Header: information about encoding, length of the body, origin, content type, etc.;
  2. Body: content, usually parameters or data which is passed to the server or sent back to a client;

In addition, HTTP Request contains:

  • Method: There are several methods; the most common are GET, POST, PUT, DELETE.
  • URL: host, port, path;
  • Query string, i.e., everything after a question mark in the URL.
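In Node.js (10+) and in browsers, the built-in URL class splits a URL into these parts. The address below is a made-up example:

```javascript
// WHATWG URL parser, available as a global in Node.js 10+ and in browsers.
var url = new URL('http://localhost:3000/messages.json?limit=10&sort=desc');

console.log(url.hostname);                  // localhost
console.log(url.port);                      // 3000
console.log(url.pathname);                  // /messages.json
console.log(url.search);                    // ?limit=10&sort=desc (the query string)
console.log(url.searchParams.get('limit')); // 10 (values are always strings)
```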


RESTful (REpresentational State Transfer) API became popular due to the demand in distributed systems where each transaction needs to include enough information about the state of the client. In a sense this standard is stateless because no information about the clients’ state is stored on the server, thus making it possible for each request to be served by a different system.

Distinct characteristics of RESTful API:

  • Has better scalability support due to the fact that different components can be independently deployed to different servers;
  • Replaced Simple Object Access Protocol (SOAP) because of the simpler verb and noun structure;
  • Utilizes HTTP methods: GET, POST, DELETE, PUT, OPTIONS etc.

Here is an example of simple Create, Read, Update and Delete (CRUD) REST API for Message Collection:

Method   URL                   Meaning
GET      /messages.json        Return list of messages in JSON format
PUT      /messages.json        Update/replace all messages and return status/error in JSON
POST     /messages.json        Create new message and return its id in JSON format
GET      /messages/{id}.json   Return message with id {id} in JSON format
PUT      /messages/{id}.json   Update/replace message with id {id}; if message {id} doesn’t exist, create it
DELETE   /messages/{id}.json   Delete message with id {id}, return status/error in JSON format

REST is not a protocol; it is an architectural style, which makes it more flexible than SOAP, which is a protocol. Therefore, REST API URLs could look like /messages/list.html or /messages/list.xml in case we want to support these formats.

PUT and DELETE are idempotent methods, which means that if the server receives two or more identical requests, the end result will be the same as after a single one.

GET is nullipotent and POST is not idempotent and might affect state and cause side-effects.
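To illustrate these properties, here is a sketch with a hypothetical in-memory message store. Its put, post and del functions mimic the PUT, POST and DELETE rows of the table above; no HTTP is involved:

```javascript
// Hypothetical in-memory store keyed by message id (illustration only).
var messages = {};
var nextId = 1;

// PUT /messages/{id}.json: create or replace the message with that id.
function put(id, text) {
  messages[id] = { id: id, text: text };
  return messages[id];
}

// POST /messages.json: always create a new message with a fresh id.
function post(text) {
  var id = nextId++;
  messages[id] = { id: id, text: text };
  return id;
}

// DELETE /messages/{id}.json: remove the message if it exists.
function del(id) {
  delete messages[id];
}

put(7, 'hi');
put(7, 'hi');    // same request again: still exactly one message 7
post('hello');
post('hello');   // same request again: a second, new message
del(7);
del(7);          // same request again: message 7 is simply still gone
console.log(Object.keys(messages)); // [ '1', '2' ]
```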

Further reading on REST API at Wikipedia and A Brief Introduction to REST article.

Querying 20M-Record MongoDB Collection

Storify saves a lot of metadata about social elements: tweets, Facebook status updates, blog posts, news articles, etc. MongoDB is great for storing such unstructured data, but last week I had to fix some inconsistencies in a 20-million-record Elements collection.

The script was simple: find elements, check whether they have any dependencies, and delete orphan elements; nevertheless, it kept timing out or just becoming unresponsive. After a few hours of running different modifications, I came up with a working solution.

Here are some suggestions for dealing with big collections on a Node.js + MongoDB stack:

Befriend Shell

The interactive shell, mongo, is a good place to start. To launch it, just type mongo in your terminal window:

$ mongo

Assuming your paths were set up correctly during MongoDB installation, the command will start the shell and present an angle-bracket (>) prompt.


Use JS files

To execute a JavaScript file in the Mongo shell, run:

$ mongo fix.js --shell

Queries look the same:


To output results use:




To connect to a database:

db = connect("<host>:<port>/<dbname>")

Break Down

Separate your query into a few scripts with smaller queries. You can output each script to a file (as JSON or CSV) and then look at the output and see if your script is doing what it is actually supposed to do.

To execute a JavaScript file (fix.js) and output the results into another file (fix.txt) instead of the screen, use:

$ mongo fix.js > fix.txt --shell


$ mongo --quiet fix.js > fix.txt --shell

Check count()

Simply run count() to see the number of elements in the collection:


or a cursor:


Use limit()

You can apply the limit() function to your cursor without modifying anything else in a script, to test the output without spending too much time waiting for the whole result.

For example:

 db.elements.find({…}).limit(10).forEach(function() {…});


 db.elements.find({…}).limit(1).forEach(function() {…});

is better than using:


because findOne() returns a single document, while find() with limit() still returns a cursor.

Hit Index

hint() lets you manually force a query to use a particular index:

 db.elements.find({…}).hint({active:1, status:1, slug:1});

Make sure you have actual indexes with ensureIndex():


Narrow Down

Use additional criteria such as $ne, $where, $in, e.g.:

db.elements.find({ $and:[{type:'link'}
  ,{'date.created':{$gt: new Date("November 30 2012")}}
  ,{$where: function () {
    if (this.meta&& {
      return this.meta.title!;
    } else {
      return false;
  , {'date.created': {$lt: new Date("December 2 2012")}}]}).forEach(function(e, index, array){