Chapter 79. MongoDB

Camel MongoDB component

Available as of Camel 2.10
According to Wikipedia: "NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases and ACID guarantees." NoSQL solutions have grown in popularity in the last few years, and heavily used sites and services such as Facebook, LinkedIn and Twitter are known to use them extensively to achieve scalability and agility.
Basically, NoSQL solutions differ from traditional RDBMS (Relational Database Management Systems) in that they don't use SQL as their query language and generally don't offer ACID-like transactional behaviour nor relational data. Instead, they are designed around the concept of flexible data structures and schemas (meaning that the traditional concept of a database table with a fixed schema is dropped), extreme scalability on commodity hardware and blazing-fast processing.
MongoDB is a very popular NoSQL solution, and the camel-mongodb component integrates Camel with MongoDB, allowing you to interact with MongoDB collections both as a producer (performing operations on the collection) and as a consumer (consuming documents from a MongoDB collection).
MongoDB revolves around the concepts of documents (not as in office documents, but rather hierarchical data defined in JSON/BSON) and collections. This component page assumes you are familiar with them. Otherwise, visit http://www.mongodb.org/.
Maven users will need to add the following dependency to their pom.xml for this component:
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-mongodb</artifactId>
    <version>x.x.x</version>
    <!-- use the same version as your Camel core version -->
</dependency>

URI format

mongodb:connectionBean?database=databaseName&collection=collectionName&operation=operationName[&moreOptions...]
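As a rough illustration of how the parts of the URI fit together, the following sketch assembles an endpoint URI from a connection bean name and a map of options. MongoDbUriSketch and its methods are hypothetical helpers, not part of camel-mongodb. Only the option names are taken from this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of camel-mongodb) that assembles a mongodb: endpoint URI. */
final class MongoDbUriSketch {

    private MongoDbUriSketch() {
    }

    /**
     * Builds "mongodb:connectionBean?key=value&..." from the given options,
     * keeping the iteration order of the map. Values are not URL-encoded here.
     */
    static String buildUri(String connectionBean, Map<String, String> options) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> option : options.entrySet()) {
            query.add(option.getKey() + "=" + option.getValue());
        }
        return "mongodb:" + connectionBean + "?" + query;
    }

    /** Example: a producer endpoint running findAll against flights.tickets. */
    static String exampleUri() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("database", "flights");
        options.put("collection", "tickets");
        options.put("operation", "findAll");
        return buildUri("myDb", options);
    }
}
```

Here exampleUri() produces the same URI used by the findAll examples on this page.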

Endpoint options

MongoDB endpoints support the following options, depending on whether they act as a producer or as a consumer (options vary based on the consumer type too).
Name Default Value Description Producer Tailable Cursor Consumer
database none Required. The name of the database to which this endpoint will be bound. All operations will be executed against this database unless dynamicity is enabled and the CamelMongoDbDatabase header is set. (/) (/)
collection none Required. The name of the collection (within the specified database) to which this endpoint will be bound. All operations will be executed against this collection unless dynamicity is enabled and the CamelMongoDbCollection header is set. (/) (/)
collectionIndex none Camel 2.12: An optional index to create when inserting new collections. (/)
operation none
Required for producers. The id of the operation this endpoint will execute. Pick from the following:
  • Query operations: findById, findOneByQuery, findAll, count
  • Write operations: insert, save, update
  • Delete operations: remove
  • Other operations: getDbStats, getColStats
(/)
createCollection true Determines whether the collection will be automatically created in the MongoDB database during endpoint initialisation if it doesn't exist already. If this option is false and the collection doesn't exist, an initialisation exception will be thrown. (/)
invokeGetLastError false (behaviour may be inherited from the connection's WriteConcern) Instructs the MongoDB Java driver to invoke getLastError() after every call. Default behaviour in version 2.7.2 of the MongoDB Java driver is that only network errors will cause the operation to fail, because the actual operation is executed asynchronously in the MongoDB server without holding up the client, to increase performance. The client can obtain the real result of the operation by explicitly invoking getLastError() on the WriteResult object returned or by setting the appropriate WriteConcern. If the backend operation has not finished yet, the client will block until the result is available. Setting this option to true will make the endpoint behave synchronously and return an Exception if the underlying operation failed. (/)
writeConcern none (driver's default) Set a WriteConcern on the operation out of MongoDB's parameterised values. See WriteConcern.valueOf(String). (/)
writeConcernRef none Sets a custom WriteConcern that exists in the Registry. Specify the bean name. (/)
readPreference none Sets a ReadPreference on the connection. Accepted values: the name of any inner subclass of ReadPreference. For example: PrimaryReadPreference, SecondaryReadPreference, TaggedReadPreference. (/)
dynamicity false If set to true, the endpoint will inspect the CamelMongoDbDatabase and CamelMongoDbCollection headers of the incoming message, and if any of them exists, the target collection and/or database will be overridden for that particular operation. Set to false by default to avoid triggering the lookup on every Exchange if the feature is not desired. (/)
writeResultAsHeader false Available as of Camel 2.10.3 and 2.11: In write operations (save, update, insert, etc.), instead of replacing the body with the WriteResult object returned by MongoDB, keep the input body untouched and place the WriteResult in the CamelMongoWriteResult header (constant MongoDbConstants.WRITERESULT). (/)
persistentTailTracking false Enables or disables persistent tail tracking for Tailable Cursor consumers. See below for more information. (/)
persistentId none Required if persistent tail tracking is enabled. The id of this persistent tail tracker, to separate its records from the rest on the tail-tracking collection. (/)
tailTrackIncreasingField none Required if persistent tail tracking is enabled. Correlation field in the incoming record which is of increasing nature and will be used to position the tailing cursor every time it is generated. The cursor will be (re)created with a query of type: tailTrackIncreasingField > lastValue (where lastValue is possibly recovered from persistent tail tracking). Can be of type Integer, Date, String, etc. NOTE: No support for dot notation at the current time, so the field should be at the top level of the document. (/)
cursorRegenerationDelay 1000ms Establishes how long the endpoint will wait to regenerate the cursor after it has been killed by the MongoDB server (normal behaviour). (/)
tailTrackDb same as endpoint's Database on which the persistent tail tracker will store its runtime information. (/)
tailTrackCollection camelTailTracking Collection on which the persistent tail tracker will store its runtime information. (/)
tailTrackField lastTrackingValue Field in which the persistent tail tracker will store the last tracked value. (/)

MongoDB operations - producer endpoints

Query operations

findById

This operation retrieves only one element from the collection whose _id field matches the content of the IN message body. The incoming object can be anything that has an equivalent to a BSON type. See http://bsonspec.org/#/specification and http://www.mongodb.org/display/DOCS/Java+Types.
from("direct:findById")
    .to("mongodb:myDb?database=flights&collection=tickets&operation=findById")
    .to("mock:resultFindById");
Supports fields filter
This operation supports specifying a fields filter. See Specifying a fields filter.

findOneByQuery

Use this operation to retrieve just one element from the collection that matches a MongoDB query. The query object is extracted from the IN message body, i.e. it should be of type DBObject or convertible to DBObject. It can be a JSON String or a HashMap. See Type conversions for more info.
Example with no query (returns any object of the collection):
from("direct:findOneByQuery")
    .to("mongodb:myDb?database=flights&collection=tickets&operation=findOneByQuery")
    .to("mock:resultFindOneByQuery");
Example with a query (returns one matching result):
from("direct:findOneByQuery")
    .setBody().constant("{ \"name\": \"Raul Kripalani\" }")
    .to("mongodb:myDb?database=flights&collection=tickets&operation=findOneByQuery")
    .to("mock:resultFindOneByQuery");
Supports fields filter
This operation supports specifying a fields filter. See Specifying a fields filter.

findAll

The findAll operation returns all documents matching a query. If no query is provided, all documents contained in the collection are returned. The query object is extracted from the IN message body, i.e. it should be of type DBObject or convertible to DBObject. It can be a JSON String or a HashMap. See Type conversions for more info.
Example with no query (returns all objects in the collection):
from("direct:findAll")
    .to("mongodb:myDb?database=flights&collection=tickets&operation=findAll")
    .to("mock:resultFindAll");
Example with a query (returns all matching results):
from("direct:findAll")
    .setBody().constant("{ \"name\": \"Raul Kripalani\" }")
    .to("mongodb:myDb?database=flights&collection=tickets&operation=findAll")
    .to("mock:resultFindAll");
Paging and efficient retrieval is supported via the following headers:
Header key Quick constant Description (extracted from MongoDB API doc) Expected type
CamelMongoDbNumToSkip MongoDbConstants.NUM_TO_SKIP Discards a given number of elements at the beginning of the cursor. int/Integer
CamelMongoDbLimit MongoDbConstants.LIMIT Limits the number of elements returned. int/Integer
CamelMongoDbBatchSize MongoDbConstants.BATCH_SIZE Limits the number of elements returned in one batch. A cursor typically fetches a batch of result objects and stores them locally. If batchSize is positive, it represents the size of each batch of objects retrieved. It can be adjusted to optimize performance and limit data transfer. If batchSize is negative, it limits the number of objects returned to those that fit within the max batch size limit (usually 4MB), and the cursor is closed. For example, if batchSize is -10, the server returns a maximum of 10 documents, and as many as can fit in 4MB, then closes the cursor. Note that this feature is different from limit() in that documents must fit within a maximum size, and it removes the need to send a request to close the cursor server-side. The batch size can be changed even after a cursor is iterated, in which case the setting will apply on the next batch retrieval. int/Integer
Additionally, you can set a sortBy criteria by putting the relevant DBObject describing your sorting in the CamelMongoDbSortBy header, quick constant: MongoDbConstants.SORT_BY.
The findAll operation will also return the following OUT headers to enable you to iterate through result pages if you are using paging:
Header key Quick constant Description (extracted from MongoDB API doc) Data type
CamelMongoDbResultTotalSize MongoDbConstants.RESULT_TOTAL_SIZE Number of objects matching the query. This does not take limit/skip into consideration. int/Integer
CamelMongoDbResultPageSize MongoDbConstants.RESULT_PAGE_SIZE Number of objects returned in the current page, i.e. after limit/skip have been applied. int/Integer
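To make the paging headers more concrete, here is a minimal sketch that computes the skip and limit values for a given page and derives the number of pages from the total size reported in CamelMongoDbResultTotalSize. FindAllPagingSketch is a hypothetical helper, not part of the component. Only the header keys come from the tables above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical paging helper; only the header keys come from the tables above. */
final class FindAllPagingSketch {

    private FindAllPagingSketch() {
    }

    /** Headers requesting the given zero-based page of pageSize documents. */
    static Map<String, Object> pageHeaders(int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        Map<String, Object> headers = new LinkedHashMap<>();
        headers.put("CamelMongoDbNumToSkip", pageIndex * pageSize); // MongoDbConstants.NUM_TO_SKIP
        headers.put("CamelMongoDbLimit", pageSize);                 // MongoDbConstants.LIMIT
        return headers;
    }

    /** Number of pages needed to cover totalSize documents (ceiling division). */
    static int pageCount(int totalSize, int pageSize) {
        if (totalSize < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalSize must be >= 0 and pageSize > 0");
        }
        return (totalSize + pageSize - 1) / pageSize;
    }
}
```

The resulting map could then be passed along with the query body, for instance through ProducerTemplate.requestBodyAndHeaders.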
Supports fields filter
This operation supports specifying a fields filter. See Specifying a fields filter.

Specifying a fields filter

Query operations will, by default, return the matching objects in their entirety (with all their fields). If your documents are large and you only require retrieving a subset of their fields, you can specify a field filter in all query operations, simply by setting the relevant DBObject (or type convertible to DBObject, such as a JSON String, Map, etc.) on the CamelMongoDbFieldsFilter header, constant shortcut: MongoDbConstants.FIELDS_FILTER.
Here is an example that uses MongoDB's BasicDBObjectBuilder to simplify the creation of DBObjects. It retrieves all fields except _id and boringField:
// route: from("direct:findAll").to("mongodb:myDb?database=flights&collection=tickets&operation=findAll")
DBObject fieldFilter = BasicDBObjectBuilder.start().add("_id", 0).add("boringField", 0).get();
Object result = template.requestBodyAndHeader("direct:findAll", (Object) null, MongoDbConstants.FIELDS_FILTER, fieldFilter);
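Because the header also accepts a JSON String, the same filter can be expressed without the driver classes. The sketch below builds such an exclusion filter from a list of field names. FieldsFilterSketch is a hypothetical helper, and it assumes plain field names that need no JSON escaping.

```java
import java.util.StringJoiner;

/** Hypothetical helper that renders a fields filter as a JSON String. */
final class FieldsFilterSketch {

    private FieldsFilterSketch() {
    }

    /**
     * Renders a filter that excludes the given fields, e.g. { "_id": 0, "boringField": 0 }.
     * Field names are assumed to contain no characters that require JSON escaping.
     */
    static String excludeFields(String... fieldNames) {
        StringJoiner json = new StringJoiner(", ", "{ ", " }");
        for (String fieldName : fieldNames) {
            json.add("\"" + fieldName + "\": 0");
        }
        return json.toString();
    }
}
```

The returned String would be set on the CamelMongoDbFieldsFilter header in place of the DBObject.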

Create/update operations

insert

Inserts a new object into the MongoDB collection, taken from the IN message body. Type conversion is attempted to turn it into DBObject or a List. Two modes are supported: single insert and multiple insert. For multiple insert, the endpoint will expect a List, Array or Collection of objects of any type, as long as they are, or can be converted to, DBObject. All objects are inserted at once. The endpoint will intelligently decide which backend operation to invoke (single or multiple insert) depending on the input.
Example:
from("direct:insert")
    .to("mongodb:myDb?database=flights&collection=tickets&operation=insert");
The operation will return a WriteResult, and depending on the WriteConcern or the value of the invokeGetLastError option, getLastError() would have been called already or not. If you want to access the ultimate result of the write operation, you need to retrieve the CommandResult by calling getLastError() or getCachedLastError() on the WriteResult. Then you can verify the result by calling CommandResult.ok(), CommandResult.getErrorMessage() and/or CommandResult.getException().
Note that the new object's _id must be unique in the collection. If you don't specify the value, MongoDB will automatically generate one for you. But if you do specify it and it is not unique, the insert operation will fail (and for Camel to notice, you will need to enable invokeGetLastError or set a WriteConcern that waits for the write result).
This is not a limitation of the component, but it is how things work in MongoDB for higher throughput. If you are using a custom _id, you are expected to ensure at the application level that it is unique (and this is a good practice too).

save

The save operation is equivalent to an upsert (UPdate, inSERT) operation, where the record will be updated, and if it doesn't exist, it will be inserted, all in one atomic operation. MongoDB will perform the matching based on the _id field.
Beware that in case of an update, the object is replaced entirely and the usage of MongoDB's $modifiers is not permitted. Therefore, if you want to manipulate the object if it already exists, you have two options:
  1. perform a query to retrieve the entire object first along with all its fields (may not be efficient), alter it inside Camel and then save it.
  2. use the update operation with $modifiers, which will execute the update at the server-side instead. You can enable the upsert flag, in which case if an insert is required, MongoDB will apply the $modifiers to the filter query object and insert the result.
For example:
from("direct:insert")
    .to("mongodb:myDb?database=flights&collection=tickets&operation=save");

update

Update one or multiple records on the collection. Requires a List<DBObject> as the IN message body containing exactly 2 elements:
  • Element 1 (index 0) => filter query => determines what objects will be affected, same as a typical query object
  • Element 2 (index 1) => update rules => how matched objects will be updated. All modifier operations from MongoDB are supported.
Multiupdates
By default, MongoDB will only update 1 object even if multiple objects match the filter query. To instruct MongoDB to update all matching records, set the CamelMongoDbMultiUpdate IN message header to true.
A header with key CamelMongoDbRecordsAffected will be returned (MongoDbConstants.RECORDS_AFFECTED constant) with the number of records updated (copied from WriteResult.getN()).
Supports the following IN message headers:
Header key Quick constant Description (extracted from MongoDB API doc) Expected type
CamelMongoDbMultiUpdate MongoDbConstants.MULTIUPDATE If the update should be applied to all objects matching. See http://www.mongodb.org/display/DOCS/Atomic+Operations boolean/Boolean
CamelMongoDbUpsert MongoDbConstants.UPSERT If the database should create the element if it does not exist boolean/Boolean
For example, the following will update all records whose filterField field equals true by setting the value of the "scientist" field to "Darwin":
// route: from("direct:update").to("mongodb:myDb?database=science&collection=notableScientists&operation=update");
DBObject filterField = new BasicDBObject("filterField", true);
DBObject updateObj = new BasicDBObject("$set", new BasicDBObject("scientist", "Darwin"));
Object result = template.requestBodyAndHeader("direct:update", new Object[] {filterField, updateObj}, MongoDbConstants.MULTIUPDATE, true);

Delete operations

remove

Remove matching records from the collection. The IN message body will act as the removal filter query, and is expected to be of type DBObject or a type convertible to it. The following example will remove all objects whose field 'conditionField' equals true, in the science database, notableScientists collection:
// route: from("direct:remove").to("mongodb:myDb?database=science&collection=notableScientists&operation=remove");
DBObject conditionField = new BasicDBObject("conditionField", true);
Object result = template.requestBody("direct:remove", conditionField);
A header with key CamelMongoDbRecordsAffected is returned (MongoDbConstants.RECORDS_AFFECTED constant) with type int, containing the number of records deleted (copied from WriteResult.getN()).

Other operations

count

Returns the total number of objects in a collection as a Long in the OUT message body. The following example counts the number of records in the "dynamicCollectionName" collection. Notice how dynamicity is enabled and the CamelMongoDbCollection header is set. As a result, the operation does not run against the "flights" collection configured on the endpoint, but against the "dynamicCollectionName" collection.
// from("direct:count").to("mongodb:myDb?database=tickets&collection=flights&operation=count&dynamicity=true");
Object result = template.requestBodyAndHeader("direct:count", "irrelevantBody", MongoDbConstants.COLLECTION, "dynamicCollectionName");
assertTrue("Result is not of type Long", result instanceof Long);
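The example above overrides only the collection. When dynamicity is enabled, the database can be overridden too, through the CamelMongoDbDatabase header. A minimal sketch of preparing both headers follows. DynamicTargetSketch is a hypothetical helper, and the header keys are the ones named in the dynamicity option description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that prepares the dynamicity headers for one Exchange. */
final class DynamicTargetSketch {

    private DynamicTargetSketch() {
    }

    /**
     * Returns the headers overriding the endpoint's database and/or collection.
     * A null argument means "keep the value configured on the endpoint".
     */
    static Map<String, Object> targetHeaders(String database, String collection) {
        Map<String, Object> headers = new LinkedHashMap<>();
        if (database != null) {
            headers.put("CamelMongoDbDatabase", database);
        }
        if (collection != null) {
            headers.put("CamelMongoDbCollection", collection);
        }
        return headers;
    }
}
```

These headers only take effect on endpoints configured with dynamicity=true.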

getDbStats

Equivalent of running the db.stats() command in the MongoDB shell, which displays useful statistic figures about the database. For example:
> db.stats();
{
	"db" : "test",
	"collections" : 7,
	"objects" : 719,
	"avgObjSize" : 59.73296244784423,
	"dataSize" : 42948,
	"storageSize" : 1000058880,
	"numExtents" : 9,
	"indexes" : 4,
	"indexSize" : 32704,
	"fileSize" : 1275068416,
	"nsSizeMB" : 16,
	"ok" : 1
}
Usage example:
// from("direct:getDbStats").to("mongodb:myDb?database=flights&collection=tickets&operation=getDbStats");
Object result = template.requestBody("direct:getDbStats", "irrelevantBody");
assertTrue("Result is not of type DBObject", result instanceof DBObject);
The operation will return a data structure similar to the one displayed in the shell, in the form of a DBObject in the OUT message body.

getColStats

Equivalent of running the db.collection.stats() command in the MongoDB shell, which displays useful statistic figures about the collection. For example:
> db.camelTest.stats();
{
	"ns" : "test.camelTest",
	"count" : 100,
	"size" : 5792,
	"avgObjSize" : 57.92,
	"storageSize" : 20480,
	"numExtents" : 2,
	"nindexes" : 1,
	"lastExtentSize" : 16384,
	"paddingFactor" : 1,
	"flags" : 1,
	"totalIndexSize" : 8176,
	"indexSizes" : {
		"_id_" : 8176
	},
	"ok" : 1
}
Usage example:
// from("direct:getColStats").to("mongodb:myDb?database=flights&collection=tickets&operation=getColStats");
Object result = template.requestBody("direct:getColStats", "irrelevantBody");
assertTrue("Result is not of type DBObject", result instanceof DBObject);
The operation will return a data structure similar to the one displayed in the shell, in the form of a DBObject in the OUT message body.

Dynamic operations

An Exchange can override the endpoint's fixed operation by setting the CamelMongoDbOperation header, defined by the MongoDbConstants.OPERATION_HEADER constant. The values supported are determined by the MongoDbOperation enumeration and match the accepted values for the operation parameter on the endpoint URI.
For example:
// from("direct:insert").to("mongodb:myDb?database=flights&collection=tickets&operation=insert");
Object result = template.requestBodyAndHeader("direct:insert", "irrelevantBody", MongoDbConstants.OPERATION_HEADER, "count");
assertTrue("Result is not of type Long", result instanceof Long);
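Since the header value is a free-form String, it may be worth checking it before sending the Exchange. The sketch below validates a name against the operations listed in the endpoint options on this page. OperationNameSketch is a hypothetical helper and does not use the component's MongoDbOperation enumeration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical check of operation names against those documented on this page. */
final class OperationNameSketch {

    /** Operation ids as listed for the "operation" endpoint option. */
    static final Set<String> DOCUMENTED_OPERATIONS = Collections.unmodifiableSet(
            new LinkedHashSet<>(Arrays.asList(
                    "findById", "findOneByQuery", "findAll", "count",
                    "insert", "save", "update",
                    "remove",
                    "getDbStats", "getColStats")));

    private OperationNameSketch() {
    }

    /** Returns the name unchanged if it is documented, otherwise throws. */
    static String requireDocumentedOperation(String name) {
        if (!DOCUMENTED_OPERATIONS.contains(name)) {
            throw new IllegalArgumentException("Unknown MongoDB operation: " + name);
        }
        return name;
    }
}
```

The validated name would then be used as the value of the CamelMongoDbOperation header.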

Tailable Cursor Consumer

MongoDB offers a mechanism to instantaneously consume ongoing data from a collection, by keeping the cursor open just like the tail -f command of *nix systems. This mechanism is significantly more efficient than a scheduled poll, due to the fact that the server pushes new data to the client as it becomes available, rather than making the client ping back at scheduled intervals to fetch new data. It also reduces otherwise redundant network traffic.
There is only one requisite to use tailable cursors: the collection must be a "capped collection", meaning that it will only hold N objects, and when the limit is reached, MongoDB flushes old objects in the same order they were originally inserted. For more information, please refer to: http://www.mongodb.org/display/DOCS/Tailable+Cursors.
The Camel MongoDB component implements a tailable cursor consumer, making this feature available for you to use in your Camel routes. As new objects are inserted, MongoDB will push them as DBObjects in natural order to your tailable cursor consumer, which will transform them into an Exchange and trigger your route logic.

How the tailable cursor consumer works

To turn a cursor into a tailable cursor, a few special flags must be signalled to MongoDB when first generating the cursor. Once created, the cursor stays open and blocks upon calling the DBCursor.next() method until new data arrives. However, the MongoDB server reserves the right to kill your cursor if new data doesn't appear after an indeterminate period. To continue consuming new data, you have to regenerate the cursor. To do so, you have to remember the position where you left off, or else you will start consuming from the top again.
The Camel MongoDB tailable cursor consumer takes care of all these tasks for you. You will just need to provide the key to some field in your data of increasing nature, which will act as a marker to position your cursor every time it is regenerated, e.g. a timestamp, a sequential ID, etc. It can be of any datatype supported by MongoDB. Date, Strings and Integers are found to work well. We call this mechanism "tail tracking" in the context of this component.
The consumer will remember the last value of this field and whenever the cursor is to be regenerated, it will run the query with a filter like: increasingField > lastValue, so that only unread data is consumed.
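The regeneration logic described above can be pictured with a small in-memory simulation. This is only a sketch of the idea (filter by increasingField > lastValue, then remember the highest value consumed). It does not talk to MongoDB or reproduce the component's actual implementation, and TailTrackingSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of tail tracking; not the camel-mongodb implementation. */
final class TailTrackingSketch<T extends Comparable<T>> {

    private final String increasingField;
    private T lastValue; // null until a record has been consumed

    TailTrackingSketch(String increasingField) {
        this.increasingField = increasingField;
    }

    /**
     * Models one cursor (re)generation: returns the records whose increasing
     * field is greater than lastValue, then advances lastValue to the highest
     * value consumed.
     */
    List<Map<String, T>> consumeUnread(List<Map<String, T>> records) {
        List<Map<String, T>> unread = new ArrayList<>();
        T highest = lastValue;
        for (Map<String, T> record : records) {
            T value = record.get(increasingField);
            // Equivalent of the filter "increasingField > lastValue".
            if (value != null && (lastValue == null || value.compareTo(lastValue) > 0)) {
                unread.add(record);
                if (highest == null || value.compareTo(highest) > 0) {
                    highest = value;
                }
            }
        }
        lastValue = highest;
        return unread;
    }

    /** The marker that would position the next regenerated cursor. */
    T lastValue() {
        return lastValue;
    }
}
```

Calling consumeUnread twice on the same data returns nothing the second time, which is the behaviour that prevents already consumed records from being delivered again.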
Setting the increasing field: Set the key of the increasing field with the tailTrackIncreasingField option on the endpoint URI. In Camel 2.10, it must be a top-level field in your data, as nested navigation for this field is not yet supported. That is, the "timestamp" field is okay, but "nested.timestamp" will not work. Please open a ticket in the Camel JIRA if you do require support for nested increasing fields.
Cursor regeneration delay: One thing to note is that if new data is not already available upon initialisation, MongoDB will kill the cursor instantly. Since we don't want to overwhelm the server in this case, a cursorRegenerationDelay option has been introduced (with a default value of 1000ms), which you can modify to suit your needs.
An example:
from("mongodb:myDb?database=flights&collection=cancellations&tailTrackIncreasingField=departureTime")
    .id("tailableCursorConsumer1")
    .autoStartup(false)
    .to("mock:test");
The above route will consume from the "flights.cancellations" capped collection, using "departureTime" as the increasing field, with a default regeneration cursor delay of 1000ms.

Persistent tail tracking

Standard tail tracking is volatile: the last value is only kept in memory. In practice you will need to restart your Camel container every now and then, at which point the last value is lost and your tailable cursor consumer starts consuming from the top again, very likely sending duplicate records into your route.
To overcome this situation, you can enable the persistent tail tracking feature to keep track of the last consumed increasing value in a special collection inside your MongoDB database too. When the consumer initialises again, it will restore the last tracked value and continue as if nothing happened.
The last read value is persisted on two occasions: every time the cursor is regenerated and when the consumer shuts down. We may consider persisting at regular intervals too in the future (flush every 5 seconds) for added robustness if the demand is there. To request this feature, please open a ticket in the Camel JIRA.

Enabling persistent tail tracking

To enable this function, set at least the following options on the endpoint URI:
  • persistentTailTracking option to true
  • persistentId option to a unique identifier for this consumer, so that the same collection can be reused across many consumers
Additionally, you can set the tailTrackDb, tailTrackCollection and tailTrackField options to customise where the runtime information will be stored. Refer to the endpoint options table at the top of this page for descriptions of each option.
For example, the following route consumes from the "flights.cancellations" capped collection, using "departureTime" as the increasing field and the default cursor regeneration delay of 1000ms. Persistent tail tracking is turned on under the "cancellationsTracker" id, and the last processed value is stored in the "lastTrackingValue" field of the "flights.camelTailTracking" collection (camelTailTracking and lastTrackingValue are the defaults).
from("mongodb:myDb?database=flights&collection=cancellations&tailTrackIncreasingField=departureTime&persistentTailTracking=true" +
     "&persistentId=cancellationsTracker")
    .id("tailableCursorConsumer2")
    .autoStartup(false)
    .to("mock:test");
Below is another example identical to the one above, but where the persistent tail tracking runtime information will be stored under the "trackers.camelTrackers" collection, in the "lastProcessedDepartureTime" field:
from("mongodb:myDb?database=flights&collection=cancellations&tailTrackIncreasingField=departureTime&persistentTailTracking=true" +
     "&persistentId=cancellationsTracker&tailTrackDb=trackers&tailTrackCollection=camelTrackers" +
     "&tailTrackField=lastProcessedDepartureTime")
    .id("tailableCursorConsumer3")
    .autoStartup(false)
    .to("mock:test");

Type conversions

The MongoDbBasicConverters type converter included with the camel-mongodb component provides the following conversions:
Name From type To type How?
fromMapToDBObject Map DBObject constructs a new BasicDBObject via the new BasicDBObject(Map m) constructor
fromBasicDBObjectToMap BasicDBObject Map BasicDBObject already implements Map
fromStringToDBObject String DBObject uses com.mongodb.util.JSON.parse(String s)
fromAnyObjectToDBObject Object DBObject uses the Jackson library to convert the object to a Map, which is in turn used to initialise a new BasicDBObject
This type converter is auto-discovered, so you don't need to configure anything manually.
