Nested Objects in Mongoose

There is a certain magic in ORMs like Mongoose. I learned it the hard way (as usual!), when I was trying to iterate over nested object’s properties…

There is a certain magic in ORMs like Mongoose. I learned it the hard way (as usual!), when I was trying to iterate over nested object’s properties. For example, here is a schema with a nested object features defines like this:

var User = module.exports = new Schema({
  features: { 
    realtime_updates: {
      type: Boolean
    },
    storylock: {
      type: Boolean
    },
    custom_embed_style: {
      type: Boolean
    },
    private_stories: {
      type: Boolean
    },
    headerless_embed:{
      type: Boolean
    }
};

Let’s say I want to overwrite object features_enabled with these properties:

if (this.features) { 
  for (var k in this.features) {
    features_enabled[k] = this.features[k];
  }
}
console.log(features_enabled)
return features_enabled;

Not so fast, I was getting a lot of system properties specific to Mongoose. Instead we need to use toObject(), e.g.:

if (this.features.toObject()) { 
  for (var k in this.features.toObject()) {
    console.log('!',k)
    features_enabled[k] = this.features.toObject()[k];
  }
}

Remember rule number one, computer is always right. If we think that it’s wrong — look up the rule number one. :-)

MongoDB migration with Node and Monk

Recently one of our top users complained that their Storify account is unaccessible. We’ve checked the production database and it appeared to be that the account might have been compromised and maliciously deleted by somebody using user’s account credentials. Thanks for a great MongoHQ service we had a backup database in less than 15 minutes.

Recently one of our top users complained that their Storify account was unaccessible. We’ve checked the production database and it appeares to be that the account might have been compromised and maliciously deleted by somebody using user’s account credentials. Thanks to a great MongoHQ service, we had a backup database in less than 15 minutes.
There were two options to proceed with the migration:

  1. Mongo shell script
  2. Node.js program

Because Storify user account deletion involves deletion of all related objects — identities, relationships (followers, subscriptions), likes, stories — we’ve decided to proceed with the latter option. It worked perfectly, and here is a simplified version which you can use as a boilerplate for MongoDB migration (also at gist.github.com/4516139).

Restoring MongoDB Records
Restoring MongoDB Records

Let’s load all the modules we need: Monk, Progress, Async, and MongoDB:

var async = require('async');
var ProgressBar = require('progress');
var monk = require('monk');
var ObjectId=require('mongodb').ObjectID;

By the way, made by LeanBoost, Monk is a tiny layer that provides simple yet substantial usability improvements for MongoDB usage within Node.JS.

Monk takes connection string in the following format:

username:password@dbhost:port/database

So we can create the following objects:

var dest = monk('localhost:27017/storify_localhost');
var backup = monk('localhost:27017/storify_backup');

We need to know the object ID which we want to restore:

var userId = ObjectId(YOUR-OBJECT-ID); 

This is a handy restore function which we can reuse to restore objects from related collections by specifying query (for more on MongoDB queries go to post Querying 20M-Record MongoDB Collection. To call it, just pass a name of the collection as a string, e.g., "stories" and a query which associates objects from this collection with your main object, e.g., {userId:user.id}. The progress bar is needed to show us nice visuals in the terminal.

var restore = function(collection, query, callback){
  console.info('restoring from ' + collection);
  var q = query;
  backup.get(collection).count(q, function(e, n) {
    console.log('found '+n+' '+collection);
    if (e) console.error(e);
    var bar = new ProgressBar('[:bar] :current/:total :percent :etas', { total: n-1, width: 40 })
    var tick = function(e) {
      if (e) {
        console.error(e);
        bar.tick();
      }
      else {
        bar.tick();
      }
      if (bar.complete) {
        console.log();
        console.log('restoring '+collection+' is completed');
        callback();                
      }
    };
    if (n>0){
      console.log('adding '+ n+ ' '+collection);
      backup.get(collection).find(q, { stream: true }).each(function(element) {
        dest.get(collection).insert(element, tick);
      });        
    } else {
      callback();
    }
  });
}

Now we can use async to call the restore function mentioned above:

async.series({
  restoreUser: function(callback){   // import user element
    backup.get('users').find({_id:userId}, { stream: true, limit: 1 }).each(function(user) {
      dest.get('users').insert(user, function(e){
        if (e) {
          console.log(e);
        }
        else {
          console.log('resored user: '+ user.username);
        }
        callback();
      });
    });
  },

  restoreIdentity: function(callback){  
    restore('identities',{
      userid:userId
    }, callback);
  },

  restoreStories: function(callback){
    restore('stories', {authorid:userId}, callback);
  }

  }, function(e) {
  console.log();
  console.log('restoring is completed!');
  process.exit(1);
});

The full code is available at gist.github.com/4516139 and here:

var async = require('async');
var ProgressBar = require('progress');
var monk = require('monk');
var ms = require('ms');
var ObjectId=require('mongodb').ObjectID;

var dest = monk('localhost:27017/storify_localhost');
var backup = monk('localhost:27017/storify_backup');

var userId = ObjectId(YOUR-OBJECT-ID); // monk should have auto casting but we need it for queries

var restore = function(collection, query, callback){
  console.info('restoring from ' + collection);
  var q = query;
  backup.get(collection).count(q, function(e, n) {
    console.log('found '+n+' '+collection);
    if (e) console.error(e);
    var bar = new ProgressBar('[:bar] :current/:total :percent :etas', { total: n-1, width: 40 })
    var tick = function(e) {
      if (e) {
        console.error(e);
        bar.tick();
      }
      else {
        bar.tick();
      }
      if (bar.complete) {
        console.log();
        console.log('restoring '+collection+' is completed');
        callback();                
      }
    };
    if (n>0){
      console.log('adding '+ n+ ' '+collection);
      backup.get(collection).find(q, { stream: true }).each(function(element) {
        dest.get(collection).insert(element, tick);
      });        
    } else {
      callback();
    }
  });
}

async.series({
  restoreUser: function(callback){   // import user element
    backup.get('users').find({_id:userId}, { stream: true, limit: 1 }).each(function(user) {
      dest.get('users').insert(user, function(e){
        if (e) {
          console.log(e);
        }
        else {
          console.log('resored user: '+ user.username);
        }
        callback();
      });
    });
  },

  restoreIdentity: function(callback){  
    restore('identities',{
      userid:userId
    }, callback);
  },

  restoreStories: function(callback){
    restore('stories', {authorid:userId}, callback);
  }

  }, function(e) {
  console.log();
  console.log('restoring is completed!');
  process.exit(1);
});
           

To launch it, run npm install/update and change hard-coded database values.

Querying 20M-Record MongoDB Collection

Storify saves a lot of meta data about social elements: tweets, Facebook status updates, blog posts, news articles, etc. MongoDB is great for storing such unstructured data but last week I had to fix some inconsistency in 20-million-record Elements collection.

Storify saves a lot of meta data about social elements: tweets, Facebook status updates, blog posts, news articles, etc. MongoDB is great for storing such unstructured data but last week I had to fix some inconsistency in 20-million-record Elements collection.

The script was simple: find elements, see if there are no dependencies, delete orphan elements, neveretheless it was timing out or just becoming unresponsive. After a few hours of running different modifications I came up with the working solution.

Here are some of the suggestions when dealing with big collections on Node.js + MongoDB stack:

Befriend Shell

Interactive shell, or mongo, is a good place to start. To launch it, just type mongo in your terminal window:

$ mongo

Assuming you have correct paths set-up during your MongoDB installation, the command will start the shell and present angle brace.

>

Use JS files

To execute JavaScript file in a Mongo shell run:

$ mongo fix.js --shell

Queries look the same:

db.elements.find({...}).limit(10).forEach(printjson);

To output results use:

print();

or

printjson();

To connect to a database:

db = connect("<host>:<port>/<dbname>")

Break Down

Separate your query into a few scripts with smaller queries. You can output each script to a file (as JSON or CSV) and then look at the output and see if your script is doing what it is actually supposed to do.

To execute JavaScript file (fix.js) and output results into another file (fix.txt) instead of the screen, use:

$ mongo fix.js > fix.txt --shell

or

$ mongo --quiet fix.js > fix.txt --shell

Check count()

Simply run count() to see the number of elements in the collection:

 db.collection.count();

or a cursor:

 db.collection.find({…}).count();

Use limit()

You can apply limit() function to your cursor without modifying anything else in a script to test the output without spending too much time waiting for the whole result.

For example:

 db.find({…}).limit(10).forEach(function() {…});

or

 db.find({…}).limit(1).forEach(function() {…});

is better than using:

 db.findOne({…})

because findOne() returns single document while find() and limit() still returns a cursor.

Hit Index

hint() index will allow you to manually use particular index:

 db.elemetns.find({…}).hint({active:1, status:1, slug:1});

Make sure you have actual indexes with ensureIndex():

 db.collection.ensureIndex({…})

Narrow Down

Use additional criteria such as $ne, $where, $in, e.g.:

db.elements.find({ $and:[{type:'link'}
  ,{"source.href":{$exists:true}}
  ,{'date.created':{$gt: new Date("November 30 2012")}}
  ,{$where: function () {
    if (this.meta&&this.data&&this.data&&this.data.link) {
      return this.meta.title!=this.data.link.title;
    } else {
      return false;
    }}} 
  , {'date.created': {$lt: new Date("December 2 2012")}}]}).forEach(function(e, index, array){
    print(e._id.str);
    });