
mongodb

installation

Debian Linux has an install candidate, version 2.4, which works.

Note that version 2.4 cannot handle 3D geo coordinate searches; 3.4 and up can. Stretch ships version 3, jessie does NOT.

If you switch to mongodb-org, apt-get will remove the legacy driver. You will then need to compile libmongocxx manually to build C++ programs.

I will presume you want to install the latest version; in that case you need an extra apt source.

If you are upgrading to 3.6 from 2.X and also have data in your mongodb, you need to install 3.4 first. See upgrade below.

Otherwise I presume the latest, which is 3.6 at the time of writing: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-debian/

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2930ADAE8CAF5059EE73BB4B58712A2291FA4AD5
echo "deb http://repo.mongodb.org/apt/debian jessie/mongodb-org/3.6 main" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.6.list
sudo apt-get update
sudo apt-get install mongodb-org

If there were no errors, the mongodb daemon should be running. Try to connect with the command line tool mongo:

mongo

upgrade

It is best to install 3.4 first if you have data in your database: https://docs.mongodb.com/v3.4/tutorial/install-mongodb-on-debian/

The data will probably be readable by 3.6 once you have started the database under 3.4. I stuck with 3.4 and have not tried 3.6 after that.

quick start

Now try out the web interface at port 28017: http://localhost:28017/ (note that MongoDB 3.6 removed this HTTP interface, so this only works on older versions).

A very terse set of instructions:

mongo
use newdb;
show dbs;
row = { "member" : "value" }
db.collection.insert(row)
show collections
db.collection.find()
db.collection.remove({})
show collections
db.collection.drop();
show collections
use newdb
db.dropDatabase()
show dbs
exit

overview

A database is created when you use it, so:

use bag
show dbs

Yields:

bag (empty)
mydb 0.203125GB

The closest mysql object to a collection would be a table, but of course a collection has no table structure, because mongodb enforces no structure on documents.

If you did the use bag commands before, you will probably get nothing when showing the collections :

show collections

Again, collections are created when used, so read 'documents' below first or just do :

j = { name : "mongo" }
k = { x : 3 }

These are 2 documents. Insert them into the collection coll, which is created along the way:

db.coll.insert(j)
db.coll.insert(k)
db.coll.insert(k)

Yes, the db prefix is mandatory, and it is not the name of the database; it is always just "db". After this you will have a collection:

show collections
<pre>
 coll
 system.indexes
</pre>

system.indexes was not there before either; it contains the indexes, which you can list like this:

db.system.indexes.find()

Gives :

{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "bag.testData", "name" : "_id_" }
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "bag.coll", "name" : "_id_" }

And the command :

db.coll.find()

will show you your coll collection:

{ "_id" : ObjectId("51ed584d395b6f70b21b68a9"), "name" : "mongo" }
{ "_id" : ObjectId("51ed5850395b6f70b21b68aa"), "x" : 3 }
{ "_id" : ObjectId("51ed5851395b6f70b21b68ab"), "x" : 3 }

I inserted k twice, and you can see that each copy gets a different id, while the remainder of the document is the same.
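Why the ids differ can be seen by taking them apart. An ObjectId is 12 bytes: the first 4 are a timestamp in seconds and the last 3 are a counter. The sketch below is plain JavaScript (not shell built-ins) and decodes the three ids shown above; the function names are made up for this illustration.

```javascript
// Hypothetical helpers, not part of the mongo shell.
// An ObjectId is 24 hex characters: 8 for the timestamp (seconds since
// the epoch), 10 for a machine/process part, 6 for a counter.
function objectIdSeconds(hex) {
  return parseInt(hex.slice(0, 8), 16);
}

function objectIdCounter(hex) {
  return parseInt(hex.slice(18), 16);
}

const ids = [
  "51ed584d395b6f70b21b68a9",
  "51ed5850395b6f70b21b68aa",
  "51ed5851395b6f70b21b68ab",
];

for (const id of ids) {
  const when = new Date(objectIdSeconds(id) * 1000).toISOString();
  console.log(id, when, "counter:", objectIdCounter(id));
}
```

The two copies of k were inserted one second apart, and the counter goes up by one for every insert, so the ids can never be equal even when the documents are.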

<pre>
 db.collection.remove({})
</pre>

This deletes the contents of the collection, but the collection itself is still visible with show collections.

<pre>
 db.collection.drop()
</pre>
will wipe that entry as well.

<h4>Document</h4>
Closest object in mysql would be a table row.

How to create one is already shown above. In detail, it is a free-form json document, and it gets a generated unique id when it is inserted into a collection.

Actually... it is not json but BSON (binary JSON).

<h4>import and export</h4>
To get bulk data into mongodb, these commands come in handy. Let's start with mongoexport, because that will give a file with the correct format to import again.

<pre>
 mongoexport --db bag --collection coll --out out.json
</pre>
It will give something like this (not unexpectedly ;) :

<pre>
 { "_id" : { "$oid" : "51ed584d395b6f70b21b68a9" }, "name" : "mongo" }
 { "_id" : { "$oid" : "51ed5850395b6f70b21b68aa" }, "x" : 3 }
 { "_id" : { "$oid" : "51ed5851395b6f70b21b68ab" }, "x" : 3 }
</pre>
So it is my guess that mongoimport will accept this as input :

{ "name" : "newmongo" }
{ "x" : 33 }
{ "newfield" : 88 }

My next guess was almost correct, but the file parameter is --file, not --in so :

<pre>
 mongoimport --db bag --collection coll --file in.json
</pre>
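By default mongoimport reads one JSON document per line, which is exactly what mongoexport writes. If you want to re-import exported data as new documents, you can drop the "_id" field so that mongodb generates fresh ids. A minimal sketch in plain JavaScript (the helper name stripIds is made up, and the sample lines are the ones from the export above):

```javascript
// Hypothetical helper: turn mongoexport output into mongoimport input
// by removing the _id field from every line.
function stripIds(exportedText) {
  return exportedText
    .split("\n")
    .filter((line) => line.trim() !== "") // ignore empty lines
    .map((line) => {
      const doc = JSON.parse(line);
      delete doc._id; // mongodb will generate a new one on import
      return JSON.stringify(doc);
    })
    .join("\n");
}

const exported = [
  '{ "_id" : { "$oid" : "51ed584d395b6f70b21b68a9" }, "name" : "mongo" }',
  '{ "_id" : { "$oid" : "51ed5850395b6f70b21b68aa" }, "x" : 3 }',
].join("\n");

console.log(stripIds(exported));
// prints:
// {"name":"mongo"}
// {"x":3}
```

Write the result to a file and pass that file to mongoimport with --file.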

searching

mongodb uses regular expressions for this. Some examples :

<pre>
 db.street.find()
</pre>
This tries to return all records, but the shell only prints the first batch (20 documents by default). You can see that it is not all records by the closing line "has more"; type it to fetch the next batch.

You can limit it explicitly, but the shell will still not show more than its own limit at once:

<pre>
 db.street.find().limit(12)
</pre>
You can also search directly. Note that the key can be written without quotes, but the value was stored as a string, so it needs quotes.

<pre>
 db.street.find( {name : "Biesenhof"} )
</pre>
Result :

<pre>
 { "_id" : ObjectId("52028c76483cd6bd4c5a2e64"), "id" : "1883300000000160", "name" : "Biesenhof", "type" : "Weg", "city_id" : "2819" }
 { "_id" : ObjectId("52028c76483cd6bd4c5a2e65"), "id" : "1883300000000160", "name" : "Biesenhof", "type" : "Weg", "city_id" : "3512" }
</pre>
It appears two times, probably because this is a road ("Weg") between two cities (both records have a different city_id).

wildcards

You can use regular expressions to do some fuzzy search :

db.authors.find({name: {$regex: '^a'}});

To search for a value case-insensitively, use something like:

db.number.find({postcode:{$regex: "3151aw", $options: 'i'} })

Keep in mind that this will be rather slow. For full speed, convert everything (data and search terms) to one case, either upper or lower.
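A minimal sketch of the "one case" approach, assuming the postcodes are stored in upper case. The helper name postcodeQuery is made up for this example; it only builds the query document, which you can then hand to find().

```javascript
// Hypothetical helper: normalise user input the same way the stored data
// is assumed to be normalised (upper case, no surrounding whitespace).
function postcodeQuery(input) {
  return { postcode: input.trim().toUpperCase() };
}

// In the mongo shell this document could be passed straight to find():
//   db.number.find(postcodeQuery("3151aw"))
console.log(JSON.stringify(postcodeQuery(" 3151aw ")));
// prints {"postcode":"3151AW"}
```

Because this is a plain equality match instead of a regular expression, it can use an ordinary index on postcode.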

indexes

Or is it indices? Don't care...

I take the bag database as an example, in which every combination of postal code and house number occupies one document, so:

<pre>
 3151AW 248 
 3151AW 246
</pre>
My neighbors and I each occupy one 'row' in the database. A query on this with a bogus postal code (lower case 3151aw was enough already) took 6 seconds. That's not why I chose mongodb; it should be able to cope with shit loads of data. But here is how to add an index:

creating indexes

<pre>
 db.number.ensureIndex( { "postcode" : 1 })
</pre>
The 1 means ascending order (so you can probably guess what -1 will do 8). On version 3.0 and up, createIndex is the preferred name; ensureIndex remains as an alias.

The same find operation :

<pre>
 db.number.find({postcode:"3151aw" })
</pre>
This now returns instantly, although it could not find anything. Do we need another index on number? Well, this also returns directly (WITH a result):

<pre>
 db.number.find({postcode:"3151AW", nummer:248 })
</pre>

index info

Showing your index can be done with :

db.number.getIndexes()

Result :

[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "bag.number",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "postcode" : 1
                },
                "ns" : "bag.number",
                "name" : "postcode_1"
        }
]

As you can see, there is another index on "_id" (it is default for all collections).

index deletion

Deleting with :

<pre>
 db.number.dropIndex( { "postcode" : 1 } )
</pre>

geographical indexes

First of all, this will not work with the debian default mongo installation (2.0.6); you will need at least version 2.6. You can install it by adding this to your repository:

echo 'deb http://downloads-distro.mongodb.org/repo/debian-sysvinit dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list

And then install mongodb-org

apt-get update
apt-get install mongodb-org

Now you're not done, of course, because the client libs will not work anymore. Sigh... A newer version of them is available at mongodb.org, but it needs to be compiled from source. That can't be called a hard job, but it takes some time:

wget -O mongo-cxx-driver-legacy-0.0-26compat-2.6.3.tar.gz https://github.com/mongodb/mongo-cxx-driver/archive/legacy-0.0-26compat-2.6.3.tar.gz
tar -zxvf mongo-cxx-driver-legacy-0.0-26compat-2.6.3.tar.gz
cd mongo-cxx-driver-legacy-0.0-26compat-2.6.3/

Now this hideous compile line is needed to get things both compiled and installed :

sudo scons --full --use-system-boost --prefix=/usr install-mongoclient

Why is beyond me. For instance, the install-mongoclient target is not recognized without the --full flag, and there is more of that nonsense... just do it.

To create a geographical index, use

db.house.ensureIndex({coords:"2dsphere"})

To delete :

db.house.dropIndex({coords:"2dsphere"})
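A 2dsphere index expects the indexed field to hold GeoJSON, with coordinates in longitude, latitude order. The sketch below builds such a point and a $near query document in plain JavaScript. The helper names and the sample coordinates are made up for this illustration; only the field name coords comes from the index above.

```javascript
// Hypothetical helpers that only build documents; nothing here talks to a server.

// GeoJSON point: note the order is [longitude, latitude].
function geoPoint(lng, lat) {
  return { type: "Point", coordinates: [lng, lat] };
}

// Query for documents whose coords lie within maxMeters of the given point.
function nearQuery(lng, lat, maxMeters) {
  return {
    coords: {
      $near: {
        $geometry: geoPoint(lng, lat),
        $maxDistance: maxMeters, // in meters for a 2dsphere index
      },
    },
  };
}

// Sample values, chosen arbitrarily for the example.
console.log(JSON.stringify(nearQuery(4.3, 51.85, 500)));
```

In the mongo shell the result could be used as db.house.find(nearQuery(4.3, 51.85, 500)), and a document to insert would carry coords: geoPoint(4.3, 51.85).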

troubleshooting

recovery

I had a problem starting up after a hard crash, so you can't really blame mongodb. However, read this: https://www.quora.com/What-is-the-proper-way-to-handle-MongoDB-file-corruption. It at least states that postgres is still more robust and that you should not use mongodb for everything, only for BIG data.

But this recovery was quite straightforward :

  • mongodb uses a journal that can be used to recover after a crash
  • this is on by default in /etc/mongod.conf, so recovery is done at startup
  • in my case the journal file was corrupt
  • this means the journal was useless, so I deleted all files under /var/lib/mongodb/journal

That at least gives you the pre-crash db back.

indices

When creating an index, at one point I got:

<pre>
mmap: can't map area of size 0 file: /var/lib/mongodb/_tmp/esort.1380464575.393277965//file.20
</pre>
This is plainly because /tmp is full. Free up some space on the root filesystem and you will get better results.

C++

Programming mongodb in c/c++ means learning about binary json objects or BSON.

TODO : get samples from klopt.cpp in grid ...

BSON

Testing if a returned BSONObj is valid: it is not a pointer, so comparing with NULL won't work; use isValid():

BSONObjBuilder bb;
bb.append("nevenadres", addridd);

// points is assumed to be a BSONObj declared earlier; it is only
// overwritten when a document with a coords.coordinates field is found
cursor = this->conn.query("bag.house", bb.obj());
while (cursor->more()) {
    BSONObj cn = cursor->next();
    if (!cn.isValid())
        continue;   // skip documents that fail validation
    BSONObj coords = cn.getObjectField("coords");
    if (coords.isValid())
        points = coords.getObjectField("coordinates");
}

if (!points.isValid()) return false;

defrag is always

At startup it says :

Server has startup warnings: 
2016-12-28T12:13:20.131+0100 I CONTROL  [initandlisten] 
2016-12-28T12:13:20.131+0100 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2016-12-28T12:13:20.131+0100 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2016-12-28T12:13:20.131+0100 I CONTROL  [initandlisten] 
> 

This should be done with :

echo never > /sys/kernel/mm/transparent_hugepage/defrag

But this does not seem to work, probably because a reboot is needed.

This works:

http://unix.stackexchange.com/questions/99154/disable-transparent-hugepages#99172

In short, put these lines into /etc/rc.local:

if test -f /sys/kernel/mm/transparent_hugepage/enabled; then 
    echo never > /sys/kernel/mm/transparent_hugepage/enabled 
fi 
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then 
    echo never > /sys/kernel/mm/transparent_hugepage/defrag 
fi

After a reboot, everything is still ok.

numactl warning

When starting mongo cli, you might get a warning like this :

Wed Dec 20 10:54:01.873 [initandlisten] 
Wed Dec 20 10:54:01.873 [initandlisten] ** WARNING: You are running on a NUMA machine.
Wed Dec 20 10:54:01.873 [initandlisten] **          We suggest launching mongod like this to avoid performance problems:
Wed Dec 20 10:54:01.873 [initandlisten] **              numactl --interleave=all mongod [other options]
Wed Dec 20 10:54:01.873 [initandlisten] 
> 
bye

It only happens on servert, which obviously is "a NUMA machine". Numa has to do with allocation of memory and we should probably do what mongodb advises, but not system wide.

It turns out that /etc/init.d/mongodb has support for this but it is hijacked by systemd halfway and systemd does NOT support this. Systemd DOES suck !!

I still see no single advantage in systemd whatsoever !

So we have to do that ourselves. First make sure you installed numactl.

apt-get install numactl

Now find the startup script for mongodb and edit the line with ExecStart:

/etc/init.d/mongodb stop
find /etc/systemd | grep mongo # probably this file :
vi /etc/systemd/system/multi-user.target.wants/mongodb.service
systemctl daemon-reload
/etc/init.d/mongodb start

The changes I made:

[Service]
User=mongodb
ExecStart=/usr/bin/numactl --interleave=all /usr/bin/mongod --config /etc/mongodb.conf

This does take care of the message.

Call to undefined method MongoDB\Driver\WriteConcern::isDefault

That is the top message you get when running PHP mongodb programs. It is caused by a version mismatch between the low-level driver and the convenience driver.

The new way to use mongodb for php is to use the mongodb driver rather than the mongo driver :

On jessie the easiest way is still the 'legacy' way :

apt-get install php5-mongo    # installs version 1.5.7-1

On stretch that package is not even available anymore, so you need php-mongodb instead. However, that package is nearly unusable on its own, so you need a second layer, installed with composer.

stretch :

apt-get install php-mongodb    # installs version 1.2.3-1
composer --no-ansi require mongodb/mongodb

If you run this on a vanilla stretch install it will print :

Problem 1
  - mongodb/mongodb 1.2.0 requires ext-mongodb ^1.3.0 -> the requested PHP extension mongodb has the wrong version (1.2.3) installed.

And indeed the version listed above is not 1.3.0. However it is more effort to upgrade mongo-ext than to downgrade the other :

composer --no-ansi require mongodb/mongodb 1.1
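The "^1.3.0" in the error message is a caret constraint: at least 1.3.0, but below the next major version 2.0.0. The rough sketch below shows why the installed 1.2.3 is rejected. It is plain JavaScript, not composer's own code, the function names are made up, and it only handles plain x.y.z versions with a major version of 1 or higher.

```javascript
// Simplified illustration of a caret constraint, NOT composer's implementation.
function parseVersion(v) {
  return v.split(".").map(Number);
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// ^X.Y.Z (with X >= 1) accepts versions >= X.Y.Z and < (X+1).0.0.
function satisfiesCaret(installed, base) {
  const have = parseVersion(installed);
  const want = parseVersion(base);
  return have[0] === want[0] && compareVersions(have, want) >= 0;
}

console.log(satisfiesCaret("1.2.3", "1.3.0")); // false: the stretch package is too old
console.log(satisfiesCaret("1.3.1", "1.3.0")); // true
```

So either the extension has to go up to 1.3.0 or higher, or the library has to go down to a release that accepts the older extension, which is what the command above does.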

composer is a local install tool, you have to be in the project directory to use the installed library.