Querying 20M-Record MongoDB Collection

Storify saves a lot of metadata about social elements: tweets, Facebook status updates, blog posts, news articles, etc. MongoDB is great for storing such unstructured data, but last week I had to fix some inconsistencies in the 20-million-record Elements collection.

The script was simple: find elements, check whether they have any dependencies, and delete the orphans. Nevertheless, it kept timing out or becoming unresponsive. After a few hours of trying different modifications, I came up with a working solution.

Here are some suggestions for dealing with big collections on the Node.js + MongoDB stack:

Befriend Shell

The interactive shell, or mongo, is a good place to start. To launch it, just type mongo in your terminal window:

$ mongo

Assuming the correct paths were set up during your MongoDB installation, the command will start the shell and present an angle bracket prompt (>).


Use JS files

To execute a JavaScript file in the mongo shell, run:

$ mongo fix.js --shell

Queries look the same:
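
As a minimal sketch of what such a file might contain, the query below has the same shape it would have at the interactive prompt. The function name sampleLinks, the type: 'link' filter, and the sample size are assumptions for illustration; only the elements collection comes from this post.

```javascript
// fix.js: a sketch, not the original script.
// `db` is the database handle the mongo shell provides; passing it in as an
// argument keeps the function easy to try against any database.
function sampleLinks(db, size) {
  var ids = [];
  // Same syntax as at the interactive prompt: find, limit, forEach.
  db.elements.find({ type: 'link' }).limit(size).forEach(function (e) {
    ids.push(e._id);
  });
  return ids;
}
```

Because the file only defines a function, loading it with --shell leaves you at the prompt, where sampleLinks(db, 10) would return the ids of up to ten link elements.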


To output results use:
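
The mongo shell has two built-ins for this: print() for plain text and printjson() for a whole document. The sketch below wraps them so it also runs outside the shell; the console.log fallback, the helper names out and outJson, and the sample document res are assumptions made only for illustration.

```javascript
// print() and printjson() are mongo shell built-ins. Outside the shell they
// do not exist, so this sketch falls back to console.log (an assumption made
// only so the snippet can be tried anywhere).
function out(text) {
  if (typeof print === 'function') {
    print(text);
  } else {
    console.log(text);
  }
}

function outJson(doc) {
  if (typeof printjson === 'function') {
    printjson(doc);
  } else {
    console.log(JSON.stringify(doc, null, 2));
  }
}

// A hypothetical result document, standing in for whatever a query returns.
var res = { _id: 1, type: 'link' };
out('type: ' + res.type); // one plain-text line
outJson(res);             // the whole document as JSON
```

In a script meant only for the shell, you would call print(res.type) and printjson(res) directly.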




To connect to a database:

db = connect("<host>:<port>/<dbname>")

Break Down

Separate your query into a few scripts with smaller queries. You can write each script's output to a file (as JSON or CSV), then inspect that output to check that the script is doing what it is supposed to do.

To execute a JavaScript file (fix.js) and write the results to another file (fix.txt) instead of the screen, use:

$ mongo fix.js > fix.txt --shell

To keep the shell's startup messages out of the output file, add --quiet:

$ mongo --quiet fix.js > fix.txt --shell
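
One of those smaller scripts might look like the sketch below, which prints one CSV row per element so that the redirected file can be opened in a spreadsheet. The helper name toCsvRow, the column list, the type: 'link' filter, and the limit of 100 are all hypothetical.

```javascript
// Turn one document into a CSV row. Every value is quoted and embedded
// quotes are doubled, which is the usual CSV escaping convention.
function toCsvRow(doc, fields) {
  return fields.map(function (name) {
    var raw = doc[name];
    var value = raw === undefined || raw === null ? '' : String(raw);
    return '"' + value.replace(/"/g, '""') + '"';
  }).join(',');
}

// Only runs inside the mongo shell, where `db` and print() exist.
if (typeof db !== 'undefined' && typeof print === 'function') {
  db.elements.find({ type: 'link' }).limit(100).forEach(function (e) {
    print(toCsvRow(e, ['_id', 'type']));
  });
}
```

Keeping the formatting in a small pure function makes it easy to check by hand before pointing the script at 20 million records.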

Check count()

Run count() on the collection to see the total number of elements, or on a cursor to see how many documents match a query.
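
Both forms can be sketched in one small helper. The function name countReport and the type: 'link' filter are assumptions for illustration; elements is the collection from this post.

```javascript
// Return the size of the whole collection next to the size of one query's
// result set, so the two numbers can be compared before running a fix.
function countReport(db) {
  return {
    total: db.elements.count(),                        // count on the collection
    links: db.elements.find({ type: 'link' }).count()  // count on a cursor
  };
}
```

In the shell, printjson(countReport(db)) would show both numbers at once.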



Use limit()

You can apply the limit() function to your cursor without modifying anything else in a script, which lets you test the output without waiting for the whole result.

For example:

 db.elements.find({…}).limit(10).forEach(function() {…});

or

 db.elements.find({…}).limit(1).forEach(function() {…});

is better than using:

 db.elements.findOne({…});

because findOne() returns a single document, while find() with limit() still returns a cursor, so the rest of the script (for example, forEach()) keeps working unchanged.
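
One way to use this, sketched under my own assumptions, is to make the limit a parameter so that the dry run and the full run share the same code path. The function name processElements and its max and handle parameters are hypothetical.

```javascript
// Process matching elements. Pass a small `max` for a dry run, or 0 to
// process everything that matches.
function processElements(db, query, max, handle) {
  var cursor = db.elements.find(query);
  if (max > 0) {
    cursor = cursor.limit(max); // still a cursor, so forEach below is unchanged
  }
  var processed = 0;
  cursor.forEach(function (e) {
    handle(e);
    processed += 1;
  });
  return processed;
}
```

For instance, processElements(db, {type: 'link'}, 10, printjson) would print ten elements, and changing 10 to 0 would run over the whole result.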

Hit Index

hint() lets you manually tell MongoDB which index to use for a query:

 db.elements.find({…}).hint({active:1, status:1, slug:1});

Make sure the index actually exists; you can create it with ensureIndex():
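
A minimal sketch of the two steps together follows. The index keys mirror the hint() example above; the function name findWithHint is hypothetical. Note that later MongoDB releases deprecate ensureIndex() in favor of createIndex(), so depending on your version you may need the newer name.

```javascript
// Create the compound index if it is missing, then force the query to use it.
function findWithHint(db, query) {
  var keys = { active: 1, status: 1, slug: 1 };
  db.elements.ensureIndex(keys);              // does nothing if the index already exists
  return db.elements.find(query).hint(keys);  // returns a cursor, as usual
}
```

Calling explain() on the returned cursor is a reasonable way to confirm which index the query really used.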


Narrow Down

Narrow the result set with additional criteria such as $ne, $where, and $in, e.g.:

db.elements.find({ $and: [
  {type: 'link'},
  {'date.created': {$gt: new Date("November 30 2012")}},
  {$where: function () {
    // Keep only elements whose stored title differs from the link's title.
    if (this.meta && this.data && this.data.link) {
      return this.meta.title != this.data.link.title;
    } else {
      return false;
    }
  }},
  {'date.created': {$lt: new Date("December 2 2012")}}
]}).forEach(function (e) {…});
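
The example above shows $where; $in and $ne can be sketched in the same spirit. The type and date.created fields come from the query above, while the function name buildNarrowQuery and the status value 'deleted' are hypothetical.

```javascript
// Build a narrowed-down filter from plain operators only.
function buildNarrowQuery(types, from, to) {
  return {
    type: { $in: types },                    // any of the listed types
    status: { $ne: 'deleted' },              // hypothetical status value
    'date.created': { $gt: from, $lt: to }   // both date bounds on one key
  };
}

// Example filter: links and tweets created on December 1, 2012.
var narrowQuery = buildNarrowQuery(
  ['link', 'tweet'],
  new Date('November 30 2012'),
  new Date('December 2 2012')
);
```

Passing narrowQuery to db.elements.find() avoids $where entirely, which matters on a large collection because $where runs JavaScript against each candidate document and cannot use an index by itself.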

Best Regards,
Azat Mardan
Microsoft MVP | Book and Course Author | Software Engineering Leader