3.5. The query language

The query language processor is activated in the GUI simple search entry when the search mode selector is set to Query Language. It can also be used with the KIO slave or the command line search. It broadly has the same capabilities as the complex search interface in the GUI.

The language is based on the (seemingly defunct) Xesam user search language specification.

If the results of a query language search puzzle you and you doubt what has been actually searched for, you can use the GUI Show Query link at the top of the result list to check the exact query which was finally executed by Xapian.

Here follows a sample request that we are going to explain:

          author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
      

This would search for all documents with John Doe appearing as a phrase in the author field (exactly what this is would depend on the document type, ie: the From: header, for an email message), and containing either beatles or lennon and either live or unplugged but not potatoes (in any part of the document).

An element is composed of an optional field specification, and a value, separated by a colon (the field separator is the last colon in the element). Examples: Eugenie, author:balzac, dc:title:grandet dc:title:"eugenie grandet"

The colon, if present, means "contains". Xesam defines other relations, which are mostly unsupported for now (except in special cases, described further down).

All elements in the search entry are normally combined with an implicit AND. It is possible to specify that elements be OR'ed instead, as in Beatles OR Lennon. The OR must be entered literally (capitals), and it has priority over the AND associations: word1 word2 OR word3 means word1 AND (word2 OR word3) not (word1 AND word2) OR word3. Explicit parenthesis are not supported.

As of Recoll 1.21, you can use parentheses to group elements, which will sometimes make things clearer, and may allow expressing combinations which would have been difficult otherwise.

An element preceded by a - specifies a term that should not appear.

As usual, words inside quotes define a phrase (the order of words is significant), so that title:"prejudice pride" is not the same as title:prejudice title:pride, and is unlikely to find a result.

Words inside phrases and capitalized words are not stem-expanded. Wildcards may be used anywhere inside a term. Specifying a wild-card on the left of a term can produce a very slow search (or even an incorrect one if the expansion is truncated because of excessive size). Also see More about wildcards.

To save you some typing, recent Recoll versions (1.20 and later) interpret a comma-separated list of terms as an AND list inside the field. Use slash characters ('/') for an OR list. No white space is allowed. So

author:john,lennon

will search for documents with john and lennon inside the author field (in any order), and

author:john/ringo

would search for john or ringo.

Modifiers can be set on a double-quote value, for example to specify a proximity search (unordered). See the modifier section. No space must separate the final double-quote and the modifiers value, e.g. "two one"po10

Recoll currently manages the following default fields:

Recoll 1.20 and later have a way to specify aliases for the field names, which will save typing, for example by aliasing filename to fn or containerfilename to cfn. See the section about the fields file

The field syntax also supports a few field-like, but special, criteria:

The document input handlers used while indexing have the possibility to create other fields with arbitrary names, and aliases may be defined in the configuration, so that the exact field search possibilities may be different for you if someone took care of the customisation.

3.5.1. Modifiers

Some characters are recognized as search modifiers when found immediately after the closing double quote of a phrase, as in "some term"modifierchars. The actual "phrase" can be a single term of course. Supported modifiers:

  • l can be used to turn off stemming (mostly makes sense with p because stemming is off by default for phrases).

  • o can be used to specify a "slack" for phrase and proximity searches: the number of additional terms that may be found between the specified ones. If o is followed by an integer number, this is the slack, else the default is 10.

  • p can be used to turn the default phrase search into a proximity one (unordered). Example:"order any in"p

  • C will turn on case sensitivity (if the index supports it).

  • D will turn on diacritics sensitivity (if the index supports it).

  • A weight can be specified for a query element by specifying a decimal value at the start of the modifiers. Example: "Important"2.5.