Installing a MongoDB replica set and backup script

While I generally stick to relational databases such as MySQL, there are times when you really only need a document store and the right tool for the job is something else. In this case, I needed to install and configure a Mac machine for a development environment that replicated the production one: a small replica set of three nodes, all running MongoDB.

In addition to installing and configuring MongoDB, it's also prudent to work out how to back up and restore your data, so this post will cover that as well.

Installing MongoDB on Mac

First off, we have to download MongoDB:

curl -O http://downloads.mongodb.org/osx/mongodb-osx-x86_64-3.0.2.tgz

And being security conscious, we'll check that the download is valid by computing its SHA hashes:

curl -LO http://downloads.mongodb.org/osx/mongodb-osx-x86_64-3.0.2.tgz.sha1
curl -LO http://downloads.mongodb.org/osx/mongodb-osx-x86_64-3.0.2.tgz.sha256

shasum mongodb-osx-x86_64-3.0.2.tgz
9018f01e80eef7428f57ad23cf1e8bcbeea6b472  mongodb-osx-x86_64-3.0.2.tgz

cat mongodb-osx-x86_64-3.0.2.tgz.sha1 
9018f01e80eef7428f57ad23cf1e8bcbeea6b472  mongodb-osx-x86_64-3.0.2.tgz 

shasum -a 256 mongodb-osx-x86_64-3.0.2.tgz
6d435f66cc25a888ab263be27106abe7ef8067189199869f3e7e8126757f5286  mongodb-osx-x86_64-3.0.2.tgz

cat mongodb-osx-x86_64-3.0.2.tgz.sha256 
6d435f66cc25a888ab263be27106abe7ef8067189199869f3e7e8126757f5286  mongodb-osx-x86_64-3.0.2.tgz
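
If you'd rather not compare the hashes by eye, shasum can check them for you, assuming the downloaded .sha1 and .sha256 files are in the plain checksum format shown above; each command should report the file as OK:

shasum -c mongodb-osx-x86_64-3.0.2.tgz.sha1
shasum -a 256 -c mongodb-osx-x86_64-3.0.2.tgz.sha256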

Convinced of the authenticity of the files, we can move on to installing:

tar -zxvf mongodb-osx-x86_64-3.0.2.tgz
sudo mkdir -p /opt/local/mongodb
sudo cp -R -n mongodb-osx-x86_64-3.0.2/ /opt/local/mongodb
echo 'export PATH=/opt/local/mongodb/bin:$PATH' >> ~/.profile
source ~/.profile
sudo mkdir -p /opt/local/mongodb/data/db

At this point you're almost ready; you just need to set the permissions correctly, so that whoever starts the mongod process can read and write to the data directory.

sudo chown -R user:group /opt/local/mongodb
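
Replace user:group with whoever will actually start mongod. On a typical single-user Mac, something like the following is usually what you want, though the staff group here is an assumption about your setup:

sudo chown -R "$(whoami)":staff /opt/local/mongodb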

And now you can run MongoDB via the mongod command:

mongod --dbpath /opt/local/mongodb/data/db
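
To quickly sanity-check that the server is up, you can ping it from another terminal with the mongo shell; it should respond with { "ok" : 1 }:

mongo --eval 'db.runCommand({ ping: 1 })'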

Setting up a local Replica Set

With the basics done and ready, we can now configure and create a three-node replica set. MongoDB's website provides documentation on this, but for completeness I'll reproduce it here, adapted to the setup described above:

First, the data directories for each replica:

mkdir -p /opt/local/mongodb/data/srv/mongodb/rs0-{0,1,2}

Next, we'll make it easier to start up our local cluster:

osascript -e 'tell app "Terminal"
	do script "mongod --port 27017 --dbpath /opt/local/mongodb/data/srv/mongodb/rs0-0 --replSet rs0 --smallfiles --oplogSize 128"
	do script "mongod --port 27018 --dbpath /opt/local/mongodb/data/srv/mongodb/rs0-1 --replSet rs0 --smallfiles --oplogSize 128"
	do script "mongod --port 27019 --dbpath /opt/local/mongodb/data/srv/mongodb/rs0-2 --replSet rs0 --smallfiles --oplogSize 128"
end tell'

Place this code into a script called start-mongodb.sh and chmod +x the file. Run it from your command line with ./start-mongodb.sh and you'll see three Terminal windows appear, each running an instance of mongod.

The arguments to each mongod instance are pretty standard; the only ones that may be out of the ordinary are --smallfiles and --oplogSize, about which the documentation says:

The --smallfiles and --oplogSize settings reduce the disk space that each mongod instance uses. This is ideal for testing and development deployments as it prevents overloading your machine. For more information on these and other configuration options, see Configuration File Options.

Running three instances of mongod does not give us a replica set quite yet. We need to tell the servers to form one first, so connect to one of them:

mongo --port 27017

Then create the configuration object:

rsconf = {
       _id: "rs0",
       members: [
                  {
                   _id: 0,
                   host: "localhost:27017"
                  }
                ]
     }
rs.initiate(rsconf)
rs.add("localhost:27018")
rs.add("localhost:27019")
rs.status()

and the last command should display something like this:

{
	"set" : "rs0",
	"date" : ISODate("2015-05-01T17:26:11.546Z"),
	"myState" : 1,
	"members" : [
		{
			"_id" : 0,
			"name" : "localhost:27017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 4607,
			"optime" : Timestamp(1430501157, 1),
			"optimeDate" : ISODate("2015-05-01T17:25:57Z"),
			"electionTime" : Timestamp(1430501100, 2),
			"electionDate" : ISODate("2015-05-01T17:25:00Z"),
			"configVersion" : 3,
			"self" : true
		},
		{
			"_id" : 1,
			"name" : "localhost:27018",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 18,
			"optime" : Timestamp(1430501157, 1),
			"optimeDate" : ISODate("2015-05-01T17:25:57Z"),
			"lastHeartbeat" : ISODate("2015-05-01T17:26:09.903Z"),
			"lastHeartbeatRecv" : ISODate("2015-05-01T17:26:11.430Z"),
			"pingMs" : 0,
			"lastHeartbeatMessage" : "could not find member to sync from",
			"configVersion" : 3
		},
		{
			"_id" : 2,
			"name" : "localhost:27019",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 13,
			"optime" : Timestamp(1430501157, 1),
			"optimeDate" : ISODate("2015-05-01T17:25:57Z"),
			"lastHeartbeat" : ISODate("2015-05-01T17:26:09.903Z"),
			"lastHeartbeatRecv" : ISODate("2015-05-01T17:26:10.027Z"),
			"pingMs" : 0,
			"configVersion" : 3
		}
	],
	"ok" : 1
}

Backing up your data

First we'll create a little bit of test data. From within the mongo shell you're already running:

use site
db.Site.insert( { page: "home", "url" : "/" } )
db.Site.insert( { page: "contact" , url : "/contact"})

We could use the mongodump and mongorestore tools to do our backup, but these tools take a while if you've got a lot of data. While in our example we obviously don't, an example is only as useful as it is scalable. Plus, the two tools don't lock writes to the database, so during the long period of time a backup or restore can take, the data may become inconsistent with the snapshot we're taking.
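
For reference, a dump and restore with those tools would look something like this; the --out path is just an example location:

mongodump --port 27017 --out /opt/local/mongodb/dump
mongorestore --port 27017 /opt/local/mongodb/dump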

Since an inconsistent backup is somewhat counter to the point of making one, we'll opt for a file system snapshot instead. First, we'll lock the database against writes:

db.fsyncLock()

Once the database is locked, you can easily take a snapshot of the data directory itself:

cd /opt/local/mongodb/
tar -czf data.bak.tar.gz data

Note that the backup will have a mongod.lock file in each replica's data directory, so you'll have to remove those before you restore from the backup.

Once you've saved a copy of the data directory you can unlock the database:

db.fsyncUnlock()
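
Putting the lock, snapshot, and unlock steps together, here's a minimal sketch of a backup script for this local setup. The paths and date-stamped archive name are assumptions based on the directories above, so adapt them to your environment:

#!/bin/bash
# backup-mongodb.sh -- lock the primary, tar the data directory, unlock again.
set -e

MONGO_HOME=/opt/local/mongodb          # install location from above
STAMP=$(date +%Y%m%d-%H%M%S)           # timestamp for the archive name

# Lock the database against writes; the lock stays in place until fsyncUnlock is called
mongo --port 27017 --eval 'db.fsyncLock()'

# Snapshot the data directory
cd "$MONGO_HOME"
tar -czf "data.bak.$STAMP.tar.gz" data

# Let writes continue
mongo --port 27017 --eval 'db.fsyncUnlock()'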

This does have some caveats, of course. When done on a local machine such as the one we've configured, it's easy to get a snapshot of every replica in one go. On an actual production setup, each instance would likely be on a different server, so you'd need to SSH into each one. To determine which node is the primary, you can ask any of the servers:

db.serverStatus().repl.primary
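
From a script you can ask the same question non-interactively; with the local set above this should print localhost:27017, matching the PRIMARY shown in rs.status():

mongo --port 27017 --quiet --eval 'db.serverStatus().repl.primary'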

Then go from there. Here's a full CLI session pretending to have a hemorrhaging database:

# start up the mongo cluster
./start-mongodb.sh

# 3 terminals should appear and we'll see our data if we ask for it:
mongo --port 27017
use site
db.Site.find()
{ "_id" : ObjectId("5543b9ac53eeee01a167b662"), "page" : "home", "url" : "/" }
{ "_id" : ObjectId("5543c6ea53eeee01a167b663"), "page" : "contact", "url" : "/contact" }

# Now we backup our data:
db.fsyncLock()
{
	"info" : "now locked against writes, use db.fsyncUnlock() to unlock",
	"seeAlso" : "http://dochub.mongodb.org/core/fsynccommand",
	"ok" : 1
}

exit
cd /opt/local/mongodb
tar -czf data.bak.tar.gz data

# Now we reconnect to mongo and unlock so writes can continue:

mongo --port 27017
db.fsyncUnlock()
{ "ok" : 1, "info" : "unlock completed" }

# And now let's screw up our data and such
exit
mv data data.corrupt
killall mongod # you will see errors here

# Pretend we noticed mongo was down and decided to restart it:
./start-mongodb.sh

# Oh no, we have bad data! Guess we need to restore our data!
# Shut down each mongod in the cluster

killall mongod

# or use service mongo stop or what have you. Then, restore:

tar -xzf data.bak.tar.gz 

# remove the lock files
rm data/srv/mongodb/rs0-{0,1,2}/mongod.lock

# start mongo services
./start-mongodb.sh

mongo --port 27017
use site
db.Site.find()
{ "_id" : ObjectId("55453e54eb69b58f76c761f7"), "page" : "home", "url" : "/" }
{ "_id" : ObjectId("55453e66eb69b58f76c761f8"), "page" : "contact", "url" : "/contact" }

# Phew we're safe!

In general, snapshotting the file system is the safest and best way to preserve all your data. Using the mongodump and mongorestore tools is fine if you don't have much data, or if your data isn't complex enough to need its entire BSON representation saved. The best part is that you can take the zipped files and save them to a backup server for safety or local use!
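
For example, pushing the archive off the machine could be as simple as the following, where the user, host, and destination path are all placeholders:

scp /opt/local/mongodb/data.bak.tar.gz backupuser@backup-server:/backups/mongodb/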