Express.js FUNdamentals: An Essential Overview of Express.js

Note: This text is a part of upcoming ebook JavaScript and Node FUNdamentals: A Collection of Essential Basics.

Express.js is an amazing framework for Node.js projects and used in the majority of such web apps. Unfortunately, there’s a lack of tutorials and examples on how to write good production-ready code. To mitigate this need, we released Express.js Guide: The Comprehesive Book on Express.js. However, all things start from basics, and for that reason we’ll give you a taste of the framework in this post, so you can decide if you want to continue the learning further.

Continue reading “Express.js FUNdamentals: An Essential Overview of Express.js”

ExpressWorks: an Automated Express.js/Node.js Workshop and Tutorial

ExpressWorks is an automated Express.js/Node.js workshop.

TL;DR: ExpressWorks is an automated Express.js/Node.js workshop.

Continue reading “ExpressWorks: an Automated Express.js/Node.js Workshop and Tutorial”

Solving Programming Problems

Programming problems are not that much different from mathematics or physics problems. There are usually an input and an output to which someone needs to arrive by providing an algorithm. This algorithm is typically a function or series of functions.

Programming puzzles and toy problems are good exercises to sharpen skills and prepare for technical interviews. No wonder that more and more online coding schools (e.g., CodeAcademy) make those metal workouts main staple of their courses.

Beginner programmers might benefit by applying these steps to their process of solving a programming problem:

Programming problems are not that much different from mathematics or physics problems. There are usually an input and an output to which someone needs to arrive by providing an algorithm. This algorithm is typically a function or series of functions.

Programming puzzles and toy problems are good exercises to sharpen skills and prepare for technical interviews. No wonder that more and more online coding schools (e.g., CodeAcademy) make those metal workouts main staple of their courses.

Beginner programmers might benefit by applying these steps to their process of solving a programming problem:

Continue reading “Solving Programming Problems”

The Release of Express.js Guide: The Comprehensive Book on Express.js

After weeks of writing and editing, Azat and his team are happy to announce the release of Express.js Guide: The Most Popular Node.js Framework Manual! The book is very approachable and suitable for beginners. If someone wants to save time searching the web and learn the best practices from the trenches, Express.js Guide is the book that has everything: Express.js API reference, quick start guides, 20+ meticulously explained examples and tutorials on over 270 pages with more than 60 illustrations.

Express.js is a de facto standard of Node.js development and the most popular NPM library as of today! However, as with any framework, sometimes the learning curve is steep. At HackReactor, I often asked the same questions about code organization, authentication, database connections and deployment.

Continue reading “The Release of Express.js Guide: The Comprehensive Book on Express.js”

Todo App with Express.js/Node.js and MongoDB

Todo apps are considered to be quintessential in showcasing frameworks akin to famous Todomvc.com for front-end JavaScript frameworks. In this example, we’ll use Jade, forms, LESS, AJAX/XHR and CSRF.

Note: This tutorial is a part of Express.js Guide: The Comprehensive Book on Express.js.

Todo apps are considered to be quintessential in showcasing frameworks akin to famous Todomvc.com for front-end JavaScript frameworks. In this example, we’ll use Jade, forms, LESS, AJAX/XHR and CSRF.

In our Todo app, we’ll intentionally not use Backbone.js or Angular to demonstrate how to build traditional websites with the use of forms and redirects. In addition to that, we’ll explain how to plug-in CSRF and LESS.

Example: All the source code is in the github.com/azat-co/todo-express for your convenience.

Continue reading “Todo App with Express.js/Node.js and MongoDB”

JS FUNdamentals: An Essential Overview of JavaScript

Programming languages like BASIC, Python, C has boring machine-like nature which requires developers to write extra code that’s not directly related to the solution itself. Think about line numbers in BASIC or interfaces, classes and patterns in Java.

On the other hand JavaScript inherits the best traits of pure mathematics, LISP, C# which lead to a great deal of expressiveness (and fun!).

More about Expressive Power in this post: What does “expressive” mean when referring to programming languages?

If it’s not fun, it’s not JavaScript.

Note: This text is a part of upcoming ebook JavaScript and Node FUNdamentals: A Collection of Essential Basics.

Expressiveness

Programming languages like BASIC, Python, C has boring machine-like nature which requires developers to write extra code that’s not directly related to the solution itself. Think about line numbers in BASIC or interfaces, classes and patterns in Java.

Continue reading “JS FUNdamentals: An Essential Overview of JavaScript”

Markdown Web Publishing

TL;DR: Know and use Markdown because it’s fast and convenient.

Meet Markdown

“What is a Markdown?” my editor asked me the other day. She is an experienced content and copy editor and has worked for magazines and book publishers. However, she is not familiar with the powerful and convenient Markdown because it’s still a rather unknown approach to publishing except for an elite circle of early adopters and technology professionals. Even the so called re-invented web publishing experience Medium doesn’t support Markdown, but many other services and apps including my favorites (ByWord and LeanPub) build their whole flow around Markdown! In fact, I’m such a huge fan of Markdown that I’m writing my daily journals in it as well as this blog post.

Continue reading “Markdown Web Publishing”

Interview Hacks

About a year and a half ago I wrote a ranting post about same old boring technical interview questions. Those pesky hash tables, arrays and trees! Now it’s time to revisit the topic, and enhance it with some useful tips, tricks and insights (a.k.a., hacks) that I’ve observed over the years in software engineering.

Continue reading “Interview Hacks”

Tutorial: Node.js and MongoDB JSON REST API server with Mongoskin and Express.js

Tutorial: building a JSON REST API server with Node.js and MongoDB using Mongoskin and Express.js utilizing Mocha and Super Agent for BDD/TDD.

Update3: Expess 4 version of this tutorial is available at Express.js 4, Node.js and MongoDB REST API Tutorial, and github.com/azat-co/rest-api-express (master branch). This tutorial will work with Express 3.x.

Update2: “Mongoskin removed ‘db.collection.id’ and added some actionById methods” from this pull request with this code changes. To use the code in this post, just install an older version of Mongoskin (0.5.0?). The code in the GitHub will work with Mongoskin 1.3.20.

Update2: “Mongoskin removed ‘db.collection.id’ and added some actionById methods” from this pull request with this code changes. To use the code in this post, just install an older version of Mongoskin (0.5.0?)

Update: use the enhanced code from this repository github.com/azat-co/rest-api-express (express3 branch).

Note: This text is a part of Express.js Guide: The Comprehensive Book on Express.js.

This tutorial will walk you through writing test using the Mocha and Super Agent libraries and then use them in a test-driven development manner to build a Node.js free JSON REST API server utilizing Express.js framework and Mongoskin library for MongoDB. In this REST API server, we’ll perform create, read, update and delete (CRUD) operations and harness Express.js middleware concept with app.param() and app.use() methods.

Continue reading “Tutorial: Node.js and MongoDB JSON REST API server with Mongoskin and Express.js”

Express.js Tutorial: Instagram Gallery Example App with Storify API

Storify runs on Node.js and Express.js, therefore why not use these technologies to write an application that demonstrates how to build apps that rely on third-party APIs and HTTP requests to them?

Note: This text is a part of Express.js Guide: The Comprehensive Book on Express.js.

An example of an Express.js app using Storify API as a data source is a continuation of introduction to Express.js tutorials.

Continue reading “Express.js Tutorial: Instagram Gallery Example App with Storify API”

Intro to Express.js: Parameters, Error Handling and Other Middleware

Express.js is a node.js framework that among other things provides a way to organize routes. Each route is defined via a method call on an application object with a URL patter as a first parameter (RegExp is also supported). The best practice is to keep router lean and thin by moving all the logic into corresponding external modules/files. This way important server configuration parameters will be neatly in one place, right there when you need them! :-)

Note: This text is a part of Express.js Guide: The Comprehensive Book on Express.js.

Express.js is one of the most popular and mature Node.js frameworks. You can read more about in Intro to Express.js series on webapplog.com:

To learn how to create an application from scratch please refer to the earlier post.

Request Handlers

Express.js is a node.js framework that among other things provides a way to organize routes. Each route is defined via a method call on an application object with a URL patter as a first parameter (RegExp is also supported), for example:

app.get('api/v1/stories/', function(res, req){
  ...
})

or, for a POST method:

app.post('/api/v1/stories'function(req,res){
  ...
})

It’s needless to say that DELETE and PUT methods are supported as well.
The callbacks that we pass to get() or post() methods are called request handlers, because they take requests (req), process them and write to response (res) objects. For example:

app.get('/about', function(req,res){
  res.send('About Us: ...');
});

We can have multiple request handlers, hence the name middleware. They accept a third parameter next calling which (next()) will switch the execution flow to the next handler:

app.get('/api/v1/stories/:id', function(req,res, next) {
  //do authorization
  //if not authorized or there is an error 
  // return next(error);
  //if authorized and no errors
  return next();
}), function(req,res, next) {
  //extract id and fetch the object from the database
  //assuming no errors, save story in the request object
  req.story = story;
  return next();
}), function(req,res) {
  //output the result of the database search
  res.send(res.story);
});

ID of a story in URL patter is a query string parameter which we need for finding a matching items in the database.

Parameters Middleware

Parameters are values passed in a query string of a URL of the request. If we didn’t have Express.js or similar library, and had to use just the core Node.js modules, we’d had to extract parameters from HTTP.request object via some require('querystring').parse(url) or require('url').parse(url, true) functions trickery.

Thanks to Connect framework and people at VisionMedia, Express.js already has support for parameters, error handling and many other important features in the form of middlewares. This is how we can plug param middleware in our app:

app.param('id', function(req,res, next, id){
  //do something with id
  //store id or other info in req object
  //call next when done
  next();
});

app.get('/api/v1/stories/:id',function(req,res){
  //param middleware will be execute before and
  //we expect req object already have needed info
  //output something
  res.send(data);
});

For example:

app.param('id', function(req,res, next, id){
  req.db.get('stories').findOne({_id:id}, function (e, story){
    if (e) return next(e);
    if (!story) return next(new Error('Nothing is found'));
    req.story = story;
    next();
  });
});

app.get('/api/v1/stories/:id',function(req,res){
  res.send(req.story);
});

Or we can use multiple request handlers but the concept remains the same: we can expect to have req.story object or an error thrown prior to the execution of this code so we abstract common code/logic of getting parameters and their respective objects:

app.get('/api/v1/stories/:id', function(req,res, next) {
  //do authorization
  }),
  //we have an object in req.story so no work is needed here
  function(req,res) {
  //output the result of the database search
  res.send(story);
});

Authorization and input sanitation are also good candidates for residing in the middlewares.

Function param() is especially cool because we can combine different keys, e.g.:

app.get('/api/v1/stories/:storyId/elements/:elementId',function(req,res){
  res.send(req.element);
});

Error Handling

Error handling is typically used across the whole application, therefore it’s best to implement it as a middleware. It has the same parameters plus one more, error:

app.use(function(err, req, res, next) {
  //do logging and user-friendly error message display
  res.send(500);
})

In fact, the response can be anything:

JSON string

app.use(function(err, req, res, next) {
  //do logging and user-friendly error message display
  res.send(500, {status:500, message: 'internal error', type:'internal'});
})

Text message

app.use(function(err, req, res, next) {
  //do logging and user-friendly error message display
  res.send(500, 'internal server error');
})

Error page

app.use(function(err, req, res, next) {
  //do logging and user-friendly error message display
  //assuming that template engine is plugged in
  res.render('500');
})

Redirect to error page

app.use(function(err, req, res, next) {
  //do logging and user-friendly error message display
  res.redirect('/public/500.html');
})

Error HTTP response status (401, 400, 500, etc.)

app.use(function(err, req, res, next) {
  //do logging and user-friendly error message display
  res.end(500);
})

By the way, logging is also should be abstracted in a middleware!

To trigger an error from within your request handlers and middleware you can just call:

next(error);

or

next(new Error('Something went wrong :-(');

You can also have multiple error handlers, and use named instead of anonymous functions as its shows in Express.js Error handling guide.

Other Middleware

In addition to extracting parameters, it can be used for many things, like authorization, error handling, sessions, output, and others.

res.json() is one of them. It conveniently outputs JavaScript/Node.js object as a JSON. For example:

app.get('/api/v1/stories/:id', function(req,res){
  res.json(req.story);
});

is equivalent to (if req.story is an Array and Object):

app.get('/api/v1/stories/:id', function(req,res){
  res.send(req.story);
});

or

app.get('api/v1/stories/:id',function(req,res){
  res.set({
    'Content-Type': 'application/json'
  });
  res.send(req.story);
});

Abstraction

Middleware is flexible. You can use anonymous or named functions, but the best thing is to abstract request handlers into external modules based on the functionality:

var stories = require.('./routes/stories');
var elements = require.('./routes/elements');
var users = require.('./routes/users');
...
app.get('/stories/,stories.find);
app.get('/stories/:storyId/elements/:elementId', elements.find);
app.put('/users/:userId',users.update);

routes/stories.js:

module.exports.find = function(req,res, next) {
};

routes/elements.js:

module.exports.find = function(req,res,next){
};

routes/users.js:

module.exports.update = function(req,res,next){
};

You can use some functional programming tricks, like this:

function requiredParamHandler(param){
  //do something with a param, e.g., check that it's present in a query string
  return function (req,res, next) {
    //use param, e.g., if token is valid proceed with next();
    next();
  });
}

app.get('/api/v1/stories/:id', requiredParamHandler('token'), story.show);

var story  = {
  show: function (req, res, next) {
    //do some logic, e.g., restrict fields to output
    return res.send();
  }
}   

As you can see middleware is a powerful concept for keeping code organized. The best practice is to keep router lean and thin by moving all the logic into corresponding external modules/files. This way important server configuration parameters will be neatly in one place, right there when you need them! :-)

Node.js MVC: Express.js + Derby.js Hello World Tutorial

Express.js is a popular node frameworks which uses middleware concept to enhance functionality of applications. Derby is a new sophisticated Model View Controller (MVC) framework which is designed to be used with Express as it’s middleware. Derby also comes with the support of Racer, data synchronization engine, and Handlebars-like template engine among many other features.

DerbyJS — Node.js MVC Framework

Express.js is a popular node frameworks which uses middleware concept to enhance functionality of applications. Derby is a new sophisticated Model View Controller (MVC) framework which is designed to be used with Express as it’s middleware. Derby also comes with the support of Racer, data synchronization engine, and Handlebars-like template engine among many other features.

Derby.js Installation

Let’s set up a basic Derby application architecture without the use of scaffolding. Usually project generators are confusing when people just start to learn a new comprehensive framework. This is a bare minimum “Hello World” application tutorial that still illustrates Derby skeleton and demonstrates live-templates with websockets.

Of course we’ll need Node.js and NPM which can be obtained at nodejs.org. To install derby globally run:

$ npm install -g derby

To check the installation:

$ derby -V

My version as of April 2013 is 0.3.15. We should be good to go to creating our first app!

File Structure in a Derby.js App

This is the project folder structure:

project/
  -package.json
  -index.js
  -derby-app.js
  views/
    derby-app.html
  styles/
    derby-app.less

Dependencies for The Derby.js Project

Let’s include dependencies and other basic information in package.json file:

 {
  "name": "DerbyTutorial",
  "description": "",
  "version": "0.0.0",
  "main": "./server.js",
  "dependencies": {
    "derby": "*",
    "express": "3.x"
  },
  "private": true
}

Now we can run npm install which will download our dependencies into node_modules folder.

Views in Derby.js

Views must be in views folder and they must be either in index.html under a folder which has the same name as your derby app JavaScript file, i.e., views/derby-app/index.html, or be inside of a file which has the same name as your derby app JS file, i.e., derby-app.html.

In this example “Hello World” app we’ll use <Body:> template and {message} variable. Derby uses mustach-handlebars-like syntax for reactive binding. index.html looks like this:

<Body:>
  <input value="{message}"><h1>{message}</h1>

Same thing with Stylus/LESS files, in our example index.css has just one line:

h1 {
  color: blue;
}

To find out more about those wonderful CSS preprocessors check out documentation at Stylus and LESS.

Building The Main Derby.js Server

index.js is our main server file, and we begin it with an inclusion of dependencies with require() function:

var http = require('http'),
  express = require('express'),
  derby = require('derby'),
  derbyApp = require('./derby-app');

Last line is our derby application file derby-app.js.

Now we’re creating Express.js application (v3.x has significant differences between 2.x) and an HTTP server:

var expressApp = new express(),
  server = http.createServer(expressApp);

Derby uses Racer data synchronization library which we create like this:

var store = derby.createStore({
  listen: server
});

To fetch some data from back-end to the front-end we instantiate model object:

var model = store.createModel();

Most importantly we need to pass model and routes as middlewares to Express.js app. We need to expose public folder for socket.io to work properly.

expressApp.
  use(store.modelMiddleware()).
  use(express.static(__dirname + '/public')).
  use(derbyApp.router()).
  use(expressApp.router);

Now we can start the server on port 3001 (or any other):

server.listen(3001, function(){
  model.set('message', 'Hello World!');
});

Full code of index.js file:

var http = require('http'),
  express = require('express'),
  derby = require('derby'),
  derbyApp = require('./derby-app');

var expressApp = new express(),
  server = http.createServer(expressApp);

var store = derby.createStore({
  listen: server
});

var model = store.createModel();

expressApp.
  use(store.modelMiddleware()).
  use(express.static(__dirname + '/public')).
  use(derbyApp.router()).
  use(expressApp.router);

server.listen(3001, function(){
  model.set('message', 'Hello World!');
});

Derby.js Application

Finally, Derby app file which contains code for both a front-end and a back-end. Front-end only code is inside of app.ready() callback. To start, let’s require and create an app. Derby uses unusual construction (not the same familiar good old module.exports = app):

var derby = require('derby'),
  app = derby.createApp(module);

To make socket.io magic work we need to subscribe model attribute to its visual representation, in other words bind data and view. We can do it in the root route, and this is how we define it (patter is /, a.k.a. root):

app.get('/', function(page, model, params) {
  model.subscribe('message', function() {
    page.render();  
  })  
});

Full code of derby-app.js file:

var derby = require('derby'),
  app = derby.createApp(module);

app.get('/', function(page, model, params) {
  model.subscribe('message', function() {
    page.render();  
  })  
});  

Launching Hello World App

Now everything should be ready to boot our server. Execute node . or node index.js and open a browser at http://localhost:3001. You should be able to see something like this: http://cl.ly/image/3J1O0I3n1T46.

Derby + Express.js Hello World App

Passing Values to Back-End in Derby.js

Of course static data is not much, so we can slightly modify our app to make back-end and front-end pieces talks with each other.

In the server file index.js add store.afterDb to listen to set events on message attribute:

server.listen(3001, function(){
  model.set('message', 'Hello World!');
  store.afterDb('set','message', function(txn, doc, prevDoc, done){
    console.log(txn)
    done();
  }) 
});

Full code of index.js after modifications:

var http = require('http'),
  express = require('express'),
  derby = require('derby'),
  derbyApp = require('./derby-app');

var expressApp = new express(),
  server = http.createServer(expressApp);

var store = derby.createStore({
  listen: server
});

var model = store.createModel();

expressApp.
  use(store.modelMiddleware()).
  use(express.static(__dirname + '/public')).
  use(derbyApp.router()).
  use(expressApp.router);

server.listen(3001, function(){
  model.set('message', 'Hello World!');
  store.afterDb('set','message', function(txn, doc, prevDoc, done){
    console.log(txn)
    done();
  })   
});

In Derby application file derby-app.js add model.on() to app.ready():

  app.ready(function(model){
	    model.on('set', 'message',function(path, object){
	    console.log('message has been changed: '+ object);
	  })
  });

Full derby-app.js file after modifications:

var derby = require('derby'),
  app = derby.createApp(module);

app.get('/', function(page, model, params) {
  model.subscribe('message', function() {
    page.render();
  })
});

app.ready(function(model) {
  model.on('set', 'message', function(path, object) {
    console.log('message has been changed: ' + object);
  })
});

Now we’ll see logs both in the terminal window and in the browser Developer Tools console. The end result should look like this in the browser: http://cl.ly/image/0p3z1G3M1E2c, and like this in the terminal: http://cl.ly/image/322I1u002n38.

Hello World App: Browser Console Logs

Hello World App: Terminal Console Logs

For more magic in the persistence area, check out Racer’s db property. With it you can set up an automatic synch between views and database!

Let me know if you’re interested in any specific topic for future blog post and don’t forget to checkout my JavaScript books:

The full code of all the files in this Express.js + Derby Hello World app is available as a gist at https://gist.github.com/azat-co/5530311.

Node.js OAuth1.0 and OAuth2.0: Twitter API v1.1 Examples

Recently we had to work on modification to accommodate Twitter API v1.1. The main difference between Twitter API v1.1 and, soon to be deprecated, Twitter API v1.0 is that most of the REST API endpoints now require user or application context. In other words, each call needs to be performed via OAuth 1.0A or OAuth 2.0 authentication.

Recently we had to work on modification to accommodate Twitter API v1.1. The main difference between Twitter API v1.1 and, soon to be deprecated, Twitter API v1.0 is that most of the REST API endpoints now require user or application context. In other words, each call needs to be performed via OAuth 1.0A or OAuth 2.0 authentication.

Node.js OAuth
Node.js OAuth

At Storify we run everything on Node.js so it was natural that we used oauth module by Ciaran Jessup: NPM and GitHub. It’s mature and supports all the needed functionality but lacks any kind of examples and/or interface documentation.

Here are the examples of calling Twitter API v1.1, and a list of methods. I hope that nobody will have to dig through the oauth module source code anymore!

OAuth 1.0

Let start with a good old OAuth 1.0A. You’ll need four values to make this type of a request to Twitter API v1.1 (or any other service):

  1. Your Twitter application key, a.k.a., consumer key
  2. Your Twitter secret key
  3. User token for your app
  4. User secret for your app

All four of them can be obtained for your own apps at dev.twitter.com. In case that the user is not youself, you’ll need to perform 3-legged OAuth, or Sign in with Twitter, or something else.

Next we create oauth object with parameters, and call get() function to fetch a secured resource. Behind the scene get() function constructs unique values for the request header — Authorization header. The method encrypts URL, timestamp, application and other information in a signature. So the same header won’t work for another URL or after a specific time window.

var OAuth = require('OAuth');
var oauth = new OAuth.OAuth(
      'https://api.twitter.com/oauth/request_token',
      'https://api.twitter.com/oauth/access_token',
      'your Twitter application consumer key',
      'your Twitter application secret',
      '1.0A',
      null,
      'HMAC-SHA1'
    );
    oauth.get(
      'https://api.twitter.com/1.1/trends/place.json?id=23424977',
      'your user token for this app', 
      //you can get it at dev.twitter.com for your own apps
      'your user secret for this app', 
      //you can get it at dev.twitter.com for your own apps
      function (e, data, res){
        if (e) console.error(e);        
        console.log(require('util').inspect(data));
        done();      
      });    
});

OAuth Echo

OAuth Echo is similar to OAuth 1.0. If you’re a Delegator (service to which requests to Service Provider are delegated by Consumer) all you need to do is just pass the value of x-verify-credentials-authorization header to the Service Provider in Authorization header. Twitter has a good graphics on OAuth Echo.

There is OAuthEcho object which inherits must of its methods from normal OAuth class. In case if you want to write Consumer code (or for functional tests, in our case Storify is the delegator) and you need x-verify-credentials-authorization/Authorization header values, there is a authHeader method. If we look at it, we can easily reconstruct the headers with internal methods of oauth module such as _prepareParameters() and _buildAuthorizationHeaders(). Here is a function that will give us required values based on URL (remember that URL is a part of Authorization header):

  function getEchoAuth(url) { 
  //helper to construct echo/oauth headers from URL
    var oauth = new OAuth('https://api.twitter.com/oauth/request_token',
      'https://api.twitter.com/oauth/access_token',
      "AAAAAAAAAAAAAAAAAAAA",
      //test app token
      "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB", 
      //test app secret
    '1.0A',
    null,
      'HMAC-SHA1');
    var orderedParams = oauth._prepareParameters(
      "1111111111-AAAAAA", //test user token
    "AAAAAAAAAAAAAAAAAAAAAAA", //test user secret
    "GET",
    url
    );
    return oauth._buildAuthorizationHeaders(orderedParams);
  }

From your consumer code you can maker request with superagent or other http client library (e.g., node.js core http module’s http.request):

var request = require('super agent');

request.post('your delegator api url')
  .send({...}) 	
  //your json data
  .set(
    'x-auth-service-provider',
    'https://api.twitter.com/1.1/account/verify_credentials.json')
  .set(
    'x-verify-credentials-authorization',
    getEchoAuth("https://api.twitter.com/1.1/account/verify_credentials.json"))
  .end(function(res){console.log(res.body)});

OAuth2

OAuth 2.0 is a breeze to use comparing to the other authentication methods. Some argue that it’s not as secure, so make sure that you use SSL and HTTPS for all requests.

 var OAuth2 = OAuth.OAuth2;    
 var twitterConsumerKey = 'your key';
 var twitterConsumerSecret = 'your secret';
 var oauth2 = new OAuth2(
   twitterconsumerKey,
   twitterConsumerSecret, 
   'https://api.twitter.com/', 
   null,
   'oauth2/token', 
   null);
 oauth2.getOAuthAccessToken(
   '',
   {'grant_type':'client_credentials'},
   function (e, access_token, refresh_token, results){
     console.log('bearer: ',access_token);
     oauth2.get('protected url', 
       access_token, function(e,data,res) {
         if (e) return callback(e, null);
         if (res.statusCode!=200) 
           return callback(new Error(
             'OAuth2 request failed: '+
             res.statusCode),null);
         try {
           data = JSON.parse(data);        
         }
         catch (e){
           return callback(e, null);
         }
         return callback(e, data);
      });
   });

Please note the JSON.parse() function, oauth module returns string, not a JavaScript object.

Consumers of OAuth2 don’t need to fetch the bearer/access token for every request. It’s okay to do it once and save value in the database. Therefore, we can make requests to protected resources (i.e. Twitter API v.1.1) with only one secret password. For more information check out Twitter application only auth.

Node.js oauth API

Node.js oauth OAuth

oauth.OAuth()

Parameters:

  • requestUrl
  • accessUrl
  • consumerKey
  • consumerSecret
  • version
  • authorize_callback
  • signatureMethod
  • nonceSize
  • customHeaders

Node.js oauth OAuthEcho

oauth.OAuthEcho()

Parameters:

  • realm
  • verify_credentials
  • consumerKey
  • consumerSecret
  • version
  • signatureMethod
  • nonceSize
  • customHeaders

OAuthEcho sharers the same methods as OAuth

Node.js oauth Methods

Secure HTTP request methods for OAuth and OAuthEcho classes:

OAuth.get()

Parameters:

  • url
  • oauth_token
  • oauth_token_secret
  • callback

OAuth.delete()

Parameters:

  • url
  • oauth_token
  • oauth_token_secret
  • callback

OAuth.put()

Parameters:

  • url
  • oauth_token
  • oauth_token_secret
  • post_body
  • post_content_type
  • callback

OAuth.post()

Parameters:

  • url
  • oauth_token
  • oauth_token_secret
  • post_body
  • post_content_type
  • callback

https://github.com/ciaranj/node-oauth/blob/master/lib/oauth.js

Node.js oauth OAuth2

OAuth2 Class

OAuth2()

Parameters:

  • clientId
  • clientSecret
  • baseSite
  • authorizePath
  • accessTokenPath
  • customHeaders

OAuth2.getOAuthAccessToken()

Parameters:

  • code
  • params
  • callback

OAuth2.get()

Parameters:

  • url
  • access_token
  • callback

https://github.com/ciaranj/node-oauth/blob/master/lib/oauth2.js

The authors of node.js oauth did a great job but currently there are 32 open pull requests (mine is one of them) and it makes me sad. Please let them know that we care about improving Node.js ecosystem of modules and developers community!

UPDATE: Pull request was successfully merged!

Useful Twitter API v1.1 Resources

Just because they are vast and not always easy to find.

Tools

Intro to Express.js: Simple REST API app with Monk and MongoDB

After looking at Google Analytics stats I’ve realized that there is a demand for short Node.js tutorial and quick start guides. This is an introduction to probably the most popular (as of April 2013) Node.js framework Express.js.

Why?

After looking at Google Analytics stats I’ve realized that there is a demand for short Node.js tutorial and quick start guides. This is an introduction to probably the most popular (as of April 2013) Node.js framework Express.js.

Express.js — Node.js framework
Express.js — Node.js framework

mongoui

This app is a start of mongoui project. A phpMyAdmin counterpart for MongoDB written in Node.js. The goal is to provide a module with a nice web admin user interface. It will be something like Parse.com, Firebase.com, MongoHQ or MongoLab has but without trying it to any particular service. Why do we have to type db.users.findOne({'_id':ObjectId('...')}) any time we want to look up the user information? The alternative of MongoHub mac app is nice (and free) but clunky to use and not web based.

REST API app with Express.js and Monk

Ruby enthusiasts like to compare Express to Sinatra framework. It’s similarly flexible in the way how developers can build there apps. Application routes are set up in a similar manner, i.e., app.get('/products/:id', showProduct);. Currently Express.js is at version number 3.1. In addition to Express we’ll use Monk module.

We’ll use Node Package Manager which is usually come with a Node.js installation. If you don’t have it already you can get it at npmjs.org.

Create a new folder and NPM configuration file, package.json, in it with the following content:

{
  "name": "mongoui",
  "version": "0.0.1",
  "engines": {
    "node": ">= v0.6"
  },
  "dependencies": {
    "mongodb":"1.2.14",
    "monk": "0.7.1",
    "express": "3.1.0"
  }
}

Now run npm install to download and install modules into node_module folder. If everything went okay you’ll see bunch of folders in node_modules folders. All the code for our application will be in one file, index.js, to keep it simple stupid:

var mongo = require('mongodb');
var express = require('express');
var monk = require('monk');
var db =  monk('localhost:27017/test');
var app = new express();

app.use(express.static(__dirname + '/public'));
app.get('/',function(req,res){
  db.driver.admin.listDatabases(function(e,dbs){
      res.json(dbs);
  });
});
app.get('/collections',function(req,res){
  db.driver.collectionNames(function(e,names){
    res.json(names);
  })
});
app.get('/collections/:name',function(req,res){
  var collection = db.get(req.params.name);
  collection.find({},{limit:20},function(e,docs){
    res.json(docs);
  })
});
app.listen(3000)

Let break down the code piece by piece. Module declaration:

var mongo = require('mongodb');
var express = require('express');
var monk = require('monk');

Database and Express application instantiation:

var db =  monk('localhost:27017/test');
var app = new express();

Tell Express application to load and server static files (if there any) from public folder:

app.use(express.static(__dirname + '/public'));

Home page, a.k.a. root route, set up:

app.get('/',function(req,res){
  db.driver.admin.listDatabases(function(e,dbs){
      res.json(dbs);
  });
});

get() function just takes two parameters: string and function. The string can have slashes and colons, for example product/:id. The function must have two parapemets request and response. Request has all the information like query string parameters, session, headers and response is an object to with we output the results. In this case we do it by calling res.json() function. db.driver.admin.listDatabases() as you might guess give us a list of databases in async manner.

Two other routes are set up in a similar manner with get() function:

app.get('/collections',function(req,res){
  db.driver.collectionNames(function(e,names){
    res.json(names);
  })
});
app.get('/collections/:name',function(req,res){
  var collection = db.get(req.params.name);
  collection.find({},{limit:20},function(e,docs){
    res.json(docs);
  })
});

Express conveniently supports other HTTP verbs like post and update. In the case of setting up a post route we write this:

app.post('product/:id',function(req,res) {...});

Express also has support for middeware. Middleware is just a request function handler with three parameters: request, response, and next. For example:

app.post('product/:id', authenticateUser, validateProduct, addProduct);

function authenticateUser(req,res, next) {
  //check req.session for authentication
  next();
}

function validateProduct (req, res, next) {
   //validate submitted data
   next();
}

function addProduct (req, res) {
  //save data to database
}

validateProduct and authenticateProduct are middleware. They are usually put into separate file (or files) in a big projects.

Another way to set up middle ware in Express application is to use use() function. For example earlier we did this for static assets:

app.use(express.static(__dirname + '/public'));

We can also do it for error handlers:

app.use(errorHandler);

Assuming you have mongoDB installed this app will connect to it (localhost:27017) and display collection name and items in collections. To start mongo server:

$ mongod

to run app (keep the mongod terminal window open):

$ node .

or

$ node index.js

To see the app working, open http://localhost:3000 in Chrome with JSONViewer extension (to render JSON nicely).

Tom Hanks' The Polar Express
Tom Hanks’ The Polar Express

Test-Driven Development in Node.js With Mocha

Don’t waste time writing tests for throwaway scripts, but please adapt the habit of Test-Driven Development for the main code base. With a little time spent in the beginning, you and your team will save time later and have confidence when rolling out new releases. Test Driven Development is a really really really good thing.

Who needs Test-Driven Development?

Imagine that you need to implement a complex feature on top of an existing interface, e.g., a ‘like’ button on a comment. Without tests you’ll have to manually create a user, log in, create a post, create a different user, log in with a different user and like the post. Tiresome? What if you’ll need to do it 10 or 20 times to find and fix some nasty bug? What if your feature breaks existing functionality, but you notice it 6 months after the release because there was no test!

Mocha: simple, flexible, fun
Mocha: simple, flexible, fun

Don’t waste time writing tests for throwaway scripts, but please adapt the habit of Test-Driven Development for the main code base. With a little time spent in the beginning, you and your team will save time later and have confidence when rolling out new releases. Test Driven Development is a really really really good thing.

Quick Start Guide

Follow this quick guide to set up your Test-Driven Development process in Node.js with Mocha.

Install Mocha globally by executing this command:

$ sudo npm install -g mocha

We’ll also use two libraries, Superagent and expect.js by LeanBoost. To install them fire up npm commands in your project folder like this:

$ npm install superagent
$ npm install expect.js   

Open a new file with .js extension and type:

var request = require('superagent');
var expect = require('expect.js');

So far we’ve included two libraries. The structure of the test suite going to look like this:

describe('Suite one', function(){
  it(function(done){
  ...
  });
  it(function(done){
  ...
  });
});
describe('Suite two', function(){
  it(function(done){
  ...
  });
});

Inside of this closure we can write request to our server which should be running at localhost:8080:

...
it (function(done){
  request.post('localhost:8080').end(function(res){
    //TODO check that response is okay
  });
});
...

Expect will give us handy functions to check any condition we can think of:

...
expect(res).to.exist;
expect(res.status).to.equal(200);
expect(res.body).to.contain('world');
...

Lastly, we need to add done() call to notify Mocha that asynchronous test has finished its work. And the full code of our first test looks like this:

var request = require('superagent');
var expect = require('expect.js');
  
describe('Suite one', function(){
 it (function(done){
   request.post('localhost:8080').end(function(res){
    expect(res).to.exist;
    expect(res.status).to.equal(200);
    expect(res.body).to.contain('world');
    done();
   });
  });
});

If we want to get fancy, we can add before and beforeEach hooks which will, according to their names, execute once before the test (or suite) or each time before the test (or suite):

before(function(){
  //TODO seed the database
});
describe('suite one ',function(){
  beforeEach(function(){
    //todo log in test user
  });
  it('test one', function(done){
  ...
  });
});

Note that before and beforeEach can be placed inside or outside of describe construction.

To run our test simply execute:

$ mocha test.js

To use different report type:

$ mocha test.js -R list
$ mocha test.js -R spec

Asynchronicity in Node.js

One of the biggest advantages of using Node.js over Python or Ruby is that Node has a non-blocking I/O mechanism. To illustrate this let me use an example of a line in a Starbucks coffeeshop. Let’s pretend that each person standing in line for a drink is a task, and everything behind the counter — cashier, register, barista — is a server or server application. When we order a cup of regular drip coffee, like Pike, or hot tea, like Earl Grey, the barista makes it. While the whole line waits while that drink is made, and the person is charged the appropriate amount…

Non-Blocking I/O

One of the biggest advantages of using Node.js over Python or Ruby is that Node has a non-blocking I/O mechanism. To illustrate this, let me use an example of a line in a Starbucks coffee shop. Let’s pretend that each person standing in line for a drink is a task, and everything behind the counter — cashier, register, barista — is a server or server application. When we order a cup of regular drip coffee, like Pike, or hot tea, like Earl Grey, the barista makes it. The whole line waits while that drink is made, and the person is charged the appropriate amount.

Asynchronicity in Node.js
Asynchronicity in Node.js

Of course, we know that these kinds of drinks are easy to make; just pour the liquid and it’s done. But what about those fancy choco-mocha-frappe-latte-soy-decafs? What if everybody in line decides to order these time-consuming drinks? The line will be held up by each order, and it will grow longer and longer. The manager of the coffee shop will have to add more registers and put more baristas to work (or even stand behind the register him/herself). This is not good, right? But this is how virtually all server-side technologies work, except Node. Node is like a real Starbucks. When you order something, the barista yells the order to the other employee, and you leave the register. Another person gives their order while you wait for your state-of-the-art eye-opener in a paper cup. The line moves, the processes are executed asynchronously and without blocking the queue by waiting.

This is why Node.js blows everything else away (except maybe low-level C/C++) in terms of performance and scalability. With Node, you just don’t need that many CPUs and servers to handle the load.

Asynchronous Way of Coding

Asynchronicity requires a different way of thinking for programmers familiar with Python, PHP, C or Ruby. It’s easy to introduce a bug unintentionally by forgetting to end the execution of the code with a proper return expression.

Here is a simple example illustrating this scenario:

var test = function (callback) {
  return callback();  
  console.log('test') //shouldn't be printed
}

var test2 = function(callback){
  callback();
  console.log('test2') //printed 3rd
}

test(function(){
  console.log('callback1') //printed first
  test2(function(){
  console.log('callback2') //printed 2nd
  })
});

If we don’t use return callback() and just use callback() our string test2 will be printed (test is not printed).

callback1
callback2
tes2

For fun I’ve added a setTimeout() delay for the callback2 string, and now the order has changed:

var test = function (callback) {
  return callback();  
  console.log('test') //shouldn't be printed
}

var test2 = function(callback){
  callback();
  console.log('test2') //printed 2nd
}

test(function(){
  console.log('callback1') //printed first
  test2(function(){
    setTimeout(function(){
      console.log('callback2') //printed 3rd
    },100)
  })
});

Prints:

callback1
tes2
callback2

The last example illustrates that the two functions are independent of each other and run in parallel. The faster function will finish sooner than the slower one. Going back to our Starbucks examples, you might get your drink faster than the other person who was in front of you in the line. Better for people, and better for programs! :-)

Cheat Sheets for Web Development

Web development usually involves a large number of languages each with its own syntax, keywords, special sauce and magic tricks. Here is a collection of web development cheat sheets, in no particular order, which I’ve amassed by browsing the Internet over many years of web development

Cheat sheets are great ways to organize frequently used information and keep it handy. I used cheat sheets for learning and memorizing during my crams at school, and use them now for reference.

Cheat sheet for Web Development
Cheat sheet for Web Development

Web development usually involves a large number of languages each with its own syntax, keywords, special sauce and magic tricks.
Here is a collection of web development cheat sheets, in no particular order, which I’ve amassed by browsing the Internet over many years of web development. They cover the following topics:

  • jQuery
  • CSS3
  • Git
  • Heroku
  • HTML5
  • Linux Command Line
  • Mod reWrite
  • CoffeeScript
  • JavaScript
  • CSS2
  • JavaScript DOM
  • Mac Glyphs
  • Node.js
  • PHP
  • RGB Hex
  • Sublime Text 2
  • SEO
  • WordPress

Get zip archive at http://www.webapplog.com/wp-content/uploads/web-dev-cheat-sheets.zip.

Full list of the files:


jquery12_colorcharge.png
cdiehl_firefox.pdf
CSS Help Sheet outlined.pdf
css-cheat-sheet-v2.pdf
css-cheat-sheet.pdf
CSS3 Help Sheet outlined.pdf
css3-cheat-sheet.pdf
dan-schmidt_jquery-utility-functions-type-testing.pdf
danielschmitz_jquery-mobile.pdf
davechild_css2.pdf
davechild_html-character-entities.pdf
davechild_javascript.pdf
davechild_linux-command-line.pdf
davechild_mod-rewrite.pdf
davechild_regular-expressions.pdf
dimitrios_coffeescript-cheat-sheet.pdf
gelicia_sublime-text-2-shortcuts-verbose-mac.pdf
Git_Cheat_Sheet_grey.pdf
heroku_cheatsheet.pdf
HTML Help Sheet 02.pdf
html5-cheat-sheet.pdf
i3quest_jquery.pdf
javascript_refererence.pdf
javascript-cheat-sheet-v1.pdf
JavaScript-DOM-Cheatsheet.pdf
javascriptcheatsheet.pdf
jQuery-1_3-Visual-Cheat-Sheet.pdf
Mac_Glyphs_All.pdf
Node Help Sheet.pdf
PHP Help Sheet 01.pdf
pyro19d_javascript.pdf
rgb-hex-cheat-sheet-v1.pdf
salesforce_git_developer_cheatsheet.pdf
samcollett_git.pdf
sanoj_web-programming.pdf
SEO_Web_Developer_Cheat_Sheet.pdf
skrobul_sublime-text-2-linux.pdf
WordPress-Help-Sheet.pdf
Help Sheets.zip

MongoDB migration with Node and Monk

Recently one of our top users complained that their Storify account is unaccessible. We’ve checked the production database and it appeared to be that the account might have been compromised and maliciously deleted by somebody using user’s account credentials. Thanks for a great MongoHQ service we had a backup database in less than 15 minutes.

Recently one of our top users complained that their Storify account was unaccessible. We’ve checked the production database and it appeares to be that the account might have been compromised and maliciously deleted by somebody using user’s account credentials. Thanks to a great MongoHQ service, we had a backup database in less than 15 minutes.
There were two options to proceed with the migration:

  1. Mongo shell script
  2. Node.js program

Because Storify user account deletion involves deletion of all related objects — identities, relationships (followers, subscriptions), likes, stories — we’ve decided to proceed with the latter option. It worked perfectly, and here is a simplified version which you can use as a boilerplate for MongoDB migration (also at gist.github.com/4516139).

Restoring MongoDB Records
Restoring MongoDB Records

Let’s load all the modules we need: Monk, Progress, Async, and MongoDB:

var async = require('async');
var ProgressBar = require('progress');
var monk = require('monk');
var ObjectId=require('mongodb').ObjectID;

By the way, made by LeanBoost, Monk is a tiny layer that provides simple yet substantial usability improvements for MongoDB usage within Node.JS.

Monk takes connection string in the following format:

username:password@dbhost:port/database

So we can create the following objects:

var dest = monk('localhost:27017/storify_localhost');
var backup = monk('localhost:27017/storify_backup');

We need to know the object ID which we want to restore:

var userId = ObjectId(YOUR-OBJECT-ID); 

This is a handy restore function which we can reuse to restore objects from related collections by specifying query (for more on MongoDB queries go to post Querying 20M-Record MongoDB Collection. To call it, just pass a name of the collection as a string, e.g., "stories" and a query which associates objects from this collection with your main object, e.g., {userId:user.id}. The progress bar is needed to show us nice visuals in the terminal.

var restore = function(collection, query, callback){
  console.info('restoring from ' + collection);
  var q = query;
  backup.get(collection).count(q, function(e, n) {
    console.log('found '+n+' '+collection);
    if (e) console.error(e);
    var bar = new ProgressBar('[:bar] :current/:total :percent :etas', { total: n-1, width: 40 })
    var tick = function(e) {
      if (e) {
        console.error(e);
        bar.tick();
      }
      else {
        bar.tick();
      }
      if (bar.complete) {
        console.log();
        console.log('restoring '+collection+' is completed');
        callback();                
      }
    };
    if (n>0){
      console.log('adding '+ n+ ' '+collection);
      backup.get(collection).find(q, { stream: true }).each(function(element) {
        dest.get(collection).insert(element, tick);
      });        
    } else {
      callback();
    }
  });
}

Now we can use async to call the restore function mentioned above:

async.series({
  restoreUser: function(callback){   // import user element
    backup.get('users').find({_id:userId}, { stream: true, limit: 1 }).each(function(user) {
      dest.get('users').insert(user, function(e){
        if (e) {
          console.log(e);
        }
        else {
          console.log('resored user: '+ user.username);
        }
        callback();
      });
    });
  },

  restoreIdentity: function(callback){  
    restore('identities',{
      userid:userId
    }, callback);
  },

  restoreStories: function(callback){
    restore('stories', {authorid:userId}, callback);
  }

  }, function(e) {
  console.log();
  console.log('restoring is completed!');
  process.exit(1);
});

The full code is available at gist.github.com/4516139 and here:

var async = require('async');
var ProgressBar = require('progress');
var monk = require('monk');
var ms = require('ms');
var ObjectId=require('mongodb').ObjectID;

var dest = monk('localhost:27017/storify_localhost');
var backup = monk('localhost:27017/storify_backup');

var userId = ObjectId(YOUR-OBJECT-ID); // monk should have auto casting but we need it for queries

var restore = function(collection, query, callback){
  console.info('restoring from ' + collection);
  var q = query;
  backup.get(collection).count(q, function(e, n) {
    console.log('found '+n+' '+collection);
    if (e) console.error(e);
    var bar = new ProgressBar('[:bar] :current/:total :percent :etas', { total: n-1, width: 40 })
    var tick = function(e) {
      if (e) {
        console.error(e);
        bar.tick();
      }
      else {
        bar.tick();
      }
      if (bar.complete) {
        console.log();
        console.log('restoring '+collection+' is completed');
        callback();                
      }
    };
    if (n>0){
      console.log('adding '+ n+ ' '+collection);
      backup.get(collection).find(q, { stream: true }).each(function(element) {
        dest.get(collection).insert(element, tick);
      });        
    } else {
      callback();
    }
  });
}

async.series({
  restoreUser: function(callback){   // import user element
    backup.get('users').find({_id:userId}, { stream: true, limit: 1 }).each(function(user) {
      dest.get('users').insert(user, function(e){
        if (e) {
          console.log(e);
        }
        else {
          console.log('resored user: '+ user.username);
        }
        callback();
      });
    });
  },

  restoreIdentity: function(callback){  
    restore('identities',{
      userid:userId
    }, callback);
  },

  restoreStories: function(callback){
    restore('stories', {authorid:userId}, callback);
  }

  }, function(e) {
  console.log();
  console.log('restoring is completed!');
  process.exit(1);
});
           

To launch it, run npm install/update and change hard-coded database values.

Wintersmith — Node.js static site generator

This past weekend was a very productive one for me, because I’ve started to work on and released my book’s one-page website —rapidprototypingwithjs.com. I’ve used Wintersmith to learn something new and to ship fast. Wintersmith is a Node.js static site generator. It greatly impressed me with flexibility and ease of development. In addition I could stick to my favorite tools such as Markdown, Jade and Underscore.

This past weekend was a very productive one for me, because I’ve started to work on and released my book’s one-page website —rapidprototypingwithjs.com. I’ve used Wintersmith to learn something new and to ship fast. Wintersmith is a Node.js static site generator. It greatly impressed me with flexibility and ease of development. In addition I could stick to my favorite tools such as Markdown, Jade and Underscore.

Wintersmith is a Node.js static site generator

Why Static Site Generators

Here is a good article on why using a static site generator is a good idea in general, An Introduction to Static Site Generators. It basically boils down to a few main things:

Templates

You can use template engine such as Jade. Jade uses whitespaces to structure nested elements and its syntax is similar to Ruby on Rail’s Haml markup.

Markdown

I’ve copied markdown text from my book’s Introduction chapter and used it without any modifications. Wintersmith comes with marked parser by default. More on why Markdown is great in my old post, Markdown Goodness.

Simple Deployment

Everything is HTML, CSS and JavaScript so you just upload the files with FTP client, e.g., Transmit by Panic or Cyberduck.

Basic Hosting

Due to the fact that any static web server will work well, there is no need for Heroku or Nodejitsu PaaS solutions, or even PHP/MySQL hosting.

Performance

There are no database calls, no server-side API calls, no CPU/RAM overhead.

Flexibility

Wintersmith allows for different plugins for contents and templates and you can even write you own plugin.

Getting Started with Wintersmith

There is a quick getting started guide on github.com/jnordberg/wintersmith.

To install Wintersmith globally, run NPM with -g and sudo:

$ sudo npm install wintersmith -g

Then run to use default blog template:

$ wintersmith new <path>

or for empty site:

$ wintersmith new <path> -template basic

or use a shortcut:

$ wintersmith new <path> -T basic

Similar to Ruby on Rails scaffolding Wintersmith will generate a basic skeleton with contents and templates folders. To preview a website, run these commands:

$ cd <path>
$ wintersmith preview
$ open http://localhost:8080

Most of the changes will be updates automatically in the preview mode except for the config.json file.

Images, CSS, JavaScript and other files go into contents folder.
Wintersmith generator has the following logic:

  1. looks for *.md files in contents folder,
  2. reads metadata such as template name,
  3. processes *.jade templates per metadate in *.md files.

When you’re done with your static site, just run:

$ wintersmith build

Other Static Site Generators

Here are some of the other Node.js static site generators:

More detailed overview of these static site generators is available in the post, Node Based Static Site Generators.

For other languages and frameworks like Rails and PHP take a look at Static Site Generators by GitHub Watcher Count and the “mother of all site generator lists”.

Querying 20M-Record MongoDB Collection

Storify saves a lot of meta data about social elements: tweets, Facebook status updates, blog posts, news articles, etc. MongoDB is great for storing such unstructured data but last week I had to fix some inconsistency in 20-million-record Elements collection.

Storify saves a lot of meta data about social elements: tweets, Facebook status updates, blog posts, news articles, etc. MongoDB is great for storing such unstructured data but last week I had to fix some inconsistency in 20-million-record Elements collection.

The script was simple: find elements, see if there are no dependencies, delete orphan elements, neveretheless it was timing out or just becoming unresponsive. After a few hours of running different modifications I came up with the working solution.

Here are some of the suggestions when dealing with big collections on Node.js + MongoDB stack:

Befriend Shell

Interactive shell, or mongo, is a good place to start. To launch it, just type mongo in your terminal window:

$ mongo

Assuming you have correct paths set-up during your MongoDB installation, the command will start the shell and present angle brace.

>

Use JS files

To execute JavaScript file in a Mongo shell run:

$ mongo fix.js --shell

Queries look the same:

db.elements.find({...}).limit(10).forEach(printjson);

To output results use:

print();

or

printjson();

To connect to a database:

db = connect("<host>:<port>/<dbname>")

Break Down

Separate your query into a few scripts with smaller queries. You can output each script to a file (as JSON or CSV) and then look at the output and see if your script is doing what it is actually supposed to do.

To execute JavaScript file (fix.js) and output results into another file (fix.txt) instead of the screen, use:

$ mongo fix.js > fix.txt --shell

or

$ mongo --quiet fix.js > fix.txt --shell

Check count()

Simply run count() to see the number of elements in the collection:

 db.collection.count();

or a cursor:

 db.collection.find({…}).count();

Use limit()

You can apply limit() function to your cursor without modifying anything else in a script to test the output without spending too much time waiting for the whole result.

For example:

 db.find({…}).limit(10).forEach(function() {…});

or

 db.find({…}).limit(1).forEach(function() {…});

is better than using:

 db.findOne({…})

because findOne() returns single document while find() and limit() still returns a cursor.

Hit Index

hint() index will allow you to manually use particular index:

 db.elemetns.find({…}).hint({active:1, status:1, slug:1});

Make sure you have actual indexes with ensureIndex():

 db.collection.ensureIndex({…})

Narrow Down

Use additional criteria such as $ne, $where, $in, e.g.:

db.elements.find({ $and:[{type:'link'}
  ,{"source.href":{$exists:true}}
  ,{'date.created':{$gt: new Date("November 30 2012")}}
  ,{$where: function () {
    if (this.meta&&this.data&&this.data&&this.data.link) {
      return this.meta.title!=this.data.link.title;
    } else {
      return false;
    }}} 
  , {'date.created': {$lt: new Date("December 2 2012")}}]}).forEach(function(e, index, array){
    print(e._id.str);
    });