26.5. Using the Infinispan Query DSL with Spark

The Infinispan Query DSL can be used as a filter for the InfinispanRDD, allowing data to be pre-filtered at the source rather than at RDD level.

Important

Data in the cache must have been encoded with protobuf for the querying DSL to function correctly. Instructions on using protobuf encoding are found in Section 17.5, “Protobuf Encoding”.
Consider the following example which retrieves a list of books that includes any author whose name contains Doe:

Example 26.9. Filtering by a Query (Scala)

import org.infinispan.client.hotrod.impl.query.RemoteQuery
import org.infinispan.client.hotrod.{RemoteCacheManager, Search}
import org.infinispan.spark.domain._
[...]
val query = Search.getQueryFactory(remoteCacheManager.getCache(getCacheName))
    .from(classOf[Book])
    .having("author").like("Doe")
    .toBuilder[RemoteQuery].build()
    
val rdd = createInfinispanRDD[Int, Book]
    .filterByQuery[Book]](query, classOf[Book])
Projections are also fully supported; for instance, the above example may be adjusted to only obtain the title and publication year, and sorting on the latter field:

Example 26.10. Filtering with a Projection (Scala)

import org.infinispan.client.hotrod.impl.query.RemoteQuery
import org.infinispan.client.hotrod.{RemoteCacheManager, Search}
import org.infinispan.spark.domain._
[...]
val query = Search.getQueryFactory(remoteCacheManager.getCache(getCacheName))
    .select("title","publicationYear")
    .from(classOf[Book])
    .having("author").like("Doe")
    .groupBy("publicationYear")
    .toBuilder[RemoteQuery].build()
    
val rdd = createInfinispanRDD[Int, Book]
    .filterByQuery[Array[AnyRef]](query, classOf[Book])
In addition, if a filter has already been deployed to the JBoss Data Grid server it may be referenced by name, as seen below:

Example 26.11. Filtering with a Deployed Filter (Scala)

val rdd = InfinispanRDD[String,Book] = .... 
// "my-filter-factory" filter and converts Book to a String, and has two parameters
val filteredRDD = rdd.filterByCustom[String]("my-filter-factory", "param1", "param2")