FALVEY MEMORIAL LIBRARY

Using Dismax for VuFind’s Advanced Search

  • Posted by: Demian Katz
  • Posted Date: April 15, 2011
  • Filed Under: VuFind

The Problem

One of the complexities of dealing with Solr searching is the fact that it has multiple query parsers with different strengths and weaknesses. The “Standard” query parser (sometimes referred to as the “Lucene” parser) offers traditional features like wildcards and boolean operators, but it doesn’t always do a good job when you need to search multiple index fields at the same time. The “Dismax” query parser uses fancy logic to do cross-fielded keyword searching that often seems to work like magic, but it lacks support for all the operators found in the Standard parser. VuFind currently uses a blend of these two mechanisms — most of the time, it relies on the Dismax handler, since that tends to yield the best results… but when a search contains features that Dismax can’t cope with (like a boolean AND or a * wildcard), it fails over to the Standard handler.
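The failover described above can be pictured with a small sketch. This is not VuFind's actual logic: the function name `choose_parser` is hypothetical, and the rule set is an assumption. The post only names a boolean AND and a * wildcard as examples, so OR, NOT and ? are added here purely for illustration.

```python
import re

# Assumed rule set: the post names only boolean AND and the * wildcard;
# OR, NOT and ? are illustrative additions, not taken from VuFind.
_STANDARD_ONLY = re.compile(r"\b(?:AND|OR|NOT)\b|[*?]")


def choose_parser(query: str) -> str:
    """Return "standard" if the query uses syntax Dismax can't cope with,
    otherwise "dismax" (hypothetical helper, for illustration only)."""
    return "standard" if _STANDARD_ONLY.search(query) else "dismax"


if __name__ == "__main__":
    for q in ("charles dickens", "dickens AND twain", "dick*"):
        print(q, "->", choose_parser(q))
```

Note that the match is case-sensitive, so a lowercase "and" inside ordinary keywords would still be sent to Dismax in this sketch.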

One of the big limitations of this situation was that VuFind’s advanced search screen always generated a Standard query, since the advanced search form forces the use of boolean operators, and Dismax doesn’t support booleans. This meant that advanced searches were often slightly inconsistent with basic searches, not to mention being slightly less effective in some cases. Fortunately, due to some little-known and little-documented Solr features, the next VuFind release will address this problem.

The Solution

As it turns out, the Standard query parser supports a pseudo-field called “_query_” which allows you to combine multiple non-Standard queries using Standard operators. You can specify the parser to use in each subquery through the {!parser} syntax (which Solr calls “local params”). As a result, as long as each individual field of the advanced search form can be handled by Dismax, it is possible to use the Dismax parser for the separate chunks of the advanced search while still combining the chunks together using the Standard parser’s boolean capabilities!

For example, suppose you wanted to combine a Dismax author search with a Dismax title search. You could do it through this Standard search:

_query_:"{!dismax qf=\"author^100 author2^50\"}charles dickens" AND _query_:"{!dismax qf=\"title^100 alt_title^50\"}tale of two cities"

This will perform two Dismax searches (note that you can specify qf boosts inline) and then return only the results that match both of them. It’s not pretty thanks to the need to escape quotes inside the subquery, but it works… and attractiveness doesn’t really matter when it’s all generated automatically by code. Admittedly, VuFind’s search generation logic is fairly convoluted right now, but adding support for this capability only required the addition of a few more lines, as you can see from the patch posted in JIRA, and the benefits are significant.
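To give a rough idea of how such a query might be generated by code, here is a minimal sketch in Python. It is not the VuFind patch (VuFind itself is written in PHP), and the helper names `escape_for_subquery`, `dismax_subquery` and `combine` are hypothetical. It only shows the quoting and escaping steps needed to produce the example query above.

```python
def escape_for_subquery(text):
    """Escape backslashes, then double quotes, so the text can be placed
    inside the double-quoted value of a _query_ clause."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def dismax_subquery(qf, terms):
    """Build one _query_ clause that runs `terms` through Dismax.

    `qf` maps field names to boosts, e.g. {"author": 100, "author2": 50}.
    (Hypothetical helper, not part of VuFind or Solr.)
    """
    boosts = " ".join(f"{field}^{boost}" for field, boost in qf.items())
    # Doubled braces produce the literal { and } of {!dismax ...}
    local_params = f'{{!dismax qf="{boosts}"}}'
    return f'_query_:"{escape_for_subquery(local_params + terms)}"'


def combine(subqueries, operator="AND"):
    """Join the clauses with a Standard-parser boolean operator."""
    return f" {operator} ".join(subqueries)


if __name__ == "__main__":
    query = combine([
        dismax_subquery({"author": 100, "author2": 50}, "charles dickens"),
        dismax_subquery({"title": 100, "alt_title": 50}, "tale of two cities"),
    ])
    print(query)
```

Escaping the user's terms along with the local params matters here: a quote typed into the search box would otherwise end the outer quoted value early.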

The Future

Hopefully things can be improved even further in the near future. The latest release of Solr (version 3.1) adds an “extended Dismax” parser which combines many of the best features of the Standard and Dismax parsers. This should greatly reduce the number of situations in which we need to use Standard instead of Dismax, and it may even eliminate the need for the current nest of recursive code that builds cross-field-capable Standard queries. Once I find time to upgrade VuFind’s Solr instance to the new version, I will begin investigating how much of the search logic can be simplified through the use of this new feature.
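As a rough illustration of what a request to the extended Dismax parser might look like, the sketch below assembles Solr request parameters in Python. `defType`, `q` and `qf` are standard Solr parameters; the helper name `edismax_params`, the field names and the boosts are assumptions for illustration, and nothing here reflects how VuFind will actually adopt the feature.

```python
from urllib.parse import urlencode


def edismax_params(query, qf):
    """Return Solr request parameters for an extended Dismax search.

    `qf` maps field names to boosts (hypothetical values, for illustration).
    """
    return {
        "defType": "edismax",
        "q": query,
        "qf": " ".join(f"{field}^{boost}" for field, boost in qf.items()),
    }


if __name__ == "__main__":
    params = edismax_params("dickens AND cities", {"title": 100, "author": 50})
    # urlencode percent-encodes the values for use in a request URL
    print(urlencode(params))
```

In this sketch the boolean operator stays in the user's query string and is left for the parser to interpret, rather than forcing a switch to the Standard handler.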

2 Comments »

  1. Comment by Data Center — February 23, 2012 @ 2:37 pm

    I’ve used VuFind and love how it makes things so much easier for people – especially the younger generation – to find things. We’re used to “searching” online, so this just fits our behaviour better than the old ways. One question – not sure if this is happening now or not – do you think libraries will begin storing/hosting video content, like they do books? Seems like things are headed to a “video only” society….

  2. Comment by dkatz — February 23, 2012 @ 3:12 pm

    Libraries are definitely getting into video content, though it comes with a whole different set of challenges than books. At Villanova, our digital library has just started adding audio content (see http://digital.library.villanova.edu/Philadelphia%20Ceili%20Group/) and video will follow sooner or later!

Last Modified: April 15, 2011