6.6. Search and Text Extraction

The full-text search language and the JCR-SQL2 full-text search constraint can both find nodes using a simple, search-engine-like expression that supports wildcards and phrases.
It is easy to see how the hierarchical database matches such expressions against a node's name and against properties containing STRING, LONG, DATE, DOUBLE, DECIMAL, NAME, and PATH values. For BINARY values, however, the hierarchical database must first determine what text each value contains: it can only match a search expression against a BINARY value if it can extract text from that value. This is where text extraction comes into play.
A text extractor is a component that knows how to extract searchable text from a BINARY value. Each text extractor reports whether it can process files of a particular MIME type. If it can, the hierarchical database will (when necessary) call the extractor to obtain the searchable text for a supplied BINARY value.
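The following is a minimal sketch of that dispatch, intended only to illustrate the idea. The `TextExtractor` interface, the `PlainTextExtractor` class, and the `extractSearchableText` helper are hypothetical names invented for this example and are not the product's actual API. The sketch also assumes that the BINARY value is available as a byte array and that plain-text content is encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

public class TextExtractionSketch {

    /** Hypothetical, simplified extractor contract (not the product's real interface). */
    interface TextExtractor {
        /** Reports whether this extractor can process content of the given MIME type. */
        boolean supportsMimeType(String mimeType);

        /** Returns the searchable text found in the supplied binary content. */
        String extractFrom(byte[] binaryValue);
    }

    /** Illustrative extractor for plain text; assumes the bytes are UTF-8 encoded. */
    static class PlainTextExtractor implements TextExtractor {
        @Override
        public boolean supportsMimeType(String mimeType) {
            return mimeType != null && mimeType.startsWith("text/plain");
        }

        @Override
        public String extractFrom(byte[] binaryValue) {
            return new String(binaryValue, StandardCharsets.UTF_8);
        }
    }

    /**
     * Asks each extractor in turn whether it supports the MIME type and uses the
     * first one that does. An empty result means no text could be extracted, so
     * the binary value cannot take part in full-text matching.
     */
    static Optional<String> extractSearchableText(List<TextExtractor> extractors,
                                                  String mimeType,
                                                  byte[] binaryValue) {
        for (TextExtractor extractor : extractors) {
            if (extractor.supportsMimeType(mimeType)) {
                return Optional.of(extractor.extractFrom(binaryValue));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<TextExtractor> extractors = List.of(new PlainTextExtractor());
        byte[] content = "quarterly report".getBytes(StandardCharsets.UTF_8);

        // A supported MIME type yields searchable text.
        System.out.println(
            extractSearchableText(extractors, "text/plain", content).orElse("(no text)"));

        // An unsupported MIME type yields nothing to search.
        System.out.println(
            extractSearchableText(extractors, "application/pdf", content).orElse("(no text)"));
    }
}
```

In this sketch, a BINARY value whose MIME type no extractor supports simply produces no searchable text, which mirrors the point above that a search can only match a BINARY value when text can be extracted from it.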