FALVEY MEMORIAL LIBRARY




Expanded ILS Functionality in VuFind

  • Posted by: Demian Katz
  • Posted Date: June 2, 2011
  • Filed Under: VuFind

VuFind uses simple PHP classes called ILS drivers to communicate with external integrated library systems in order to obtain information and perform actions that are outside the scope of its own index and database. This includes things like listing a patron’s checked out items or determining whether books are currently on the shelf. In the past, VuFind’s drivers have been fairly weak with regard to important patron activities like placing holds and renewing books. Several libraries have implemented local customizations to support these features, but the native support amounted, at best, to linking off to a page in a third-party OPAC.

With the forthcoming VuFind 1.2 release (date not yet determined, but probably late summer or early fall), all that will change. The VuFind driver model has been updated with robust support for expanded patron functionality (thanks largely to the tireless efforts of Luke O’Sullivan, who has been collaborating with me for months on this problem). The ILS Driver Specification has already been updated to reflect the new features, but since this is somewhat complicated, I thought a more narrative explanation of how the new features work might be beneficial.

This article is designed to explain exactly what you need to do to add hold, recall and renewal functionality to your ILS driver. It will also touch on some of the infrastructure changes in VuFind needed to support these new features, and some general best practices for extending drivers. As always, if you want more detail on anything, you are free to contact me through comments on the blog or the VuFind mailing lists.

Basic Principles

One of the complicated things about implementing a generic system for dealing with things like holds and renewals is that different systems have different capabilities and rely on different data in order to achieve these actions. Our design tries to keep as much logic inside the ILS driver as possible. VuFind interacts with the driver in two key ways:

• It queries the driver (by checking for the existence of certain methods and/or using the getConfig method) to determine which features are available. Unsupported capabilities will simply be hidden from the end user.
• It tries to feed the driver with its own data as much as possible. In many cases, the inputs to some methods are outputs from other methods. VuFind makes no assumptions about the contents of the data — it just pushes it to the appropriate places. Associative arrays and delimited strings are the driver author’s friends — these can be used to encapsulate whatever data the driver needs, and VuFind will make sure they end up in the right places. This should all become more clear when you see some examples below!
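To make the "query the driver" principle concrete, here is a minimal Python sketch (real VuFind drivers are PHP classes). The class names, the snake_case method names and the "id:item_id" HMACKeys value are all hypothetical. The caller decides what to show purely by checking which methods exist and what a getConfig-style call returns, and it falls back to a plain OPAC link when native holds are not configured.

```python
class BasicDriver:
    """Hypothetical driver that can only link out to the OPAC for holds."""

    def get_hold_link(self, record_id):
        return f"https://opac.example.edu/hold?bib={record_id}"


class FullDriver(BasicDriver):
    """Hypothetical driver that also supports native holds and renewals."""

    def get_config(self, function):
        # Settings for a feature, or None when the feature is unsupported.
        if function == "Holds":
            return {"HMACKeys": "id:item_id"}
        return None

    def place_hold(self, details):
        return {"success": True}

    def renew_my_items(self, details):
        return {}


def hold_mode(driver):
    """Choose how to present holds using only what the driver exposes."""
    config = driver.get_config("Holds") if hasattr(driver, "get_config") else None
    if config and config.get("HMACKeys") and hasattr(driver, "place_hold"):
        return "native"  # render a hold form inside the discovery layer
    if hasattr(driver, "get_hold_link"):
        return "link"  # fall back to linking out to the OPAC
    return "hidden"  # unsupported: hide the feature from users


def can_renew(driver):
    return hasattr(driver, "renew_my_items")


print(hold_mode(BasicDriver()), hold_mode(FullDriver()))  # link native
```

Relying on method presence means a driver never has to declare its capabilities anywhere else. Adding a method is enough to switch a feature on.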

The Least Common Denominator

As mentioned earlier, the simplest way to support advanced ILS features is to simply link to the ILS’ native OPAC. This does not generally provide a good user experience, but sometimes it is the only option. There are several methods you can implement if you want (or need) to settle for this minimal level of functionality:

getCancelHoldLink
getHoldLink
getRenewLink

While getHoldLink has been around for a long time, the other two methods are new… and both of them demonstrate the “driver using its own data” principle discussed above. getCancelHoldLink is fed with an entry from the array returned by getMyHolds, while getRenewLink is similarly fed from getMyTransactions. This is very convenient: when you’re retrieving information from the ILS about current holds or checkouts, it’s easy enough to pull whatever details are needed to link to the system’s OPAC… then you simply assemble them into a URL in getCancelHoldLink or getRenewLink and you’re done!
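The round trip can be sketched in Python as follows. This is an illustration only: the OPAC address, the "request_id" field and the URL layout are hypothetical. The key idea is that the holds listing stores whatever the OPAC will later need, and the link method reads it back.

```python
from urllib.parse import urlencode

OPAC_BASE = "https://opac.example.edu"  # hypothetical OPAC address


def get_my_holds(patron):
    # Hypothetical ILS response: keep any detail the OPAC link will need later.
    return [
        {"id": "b1001", "title": "Bleak House", "request_id": "R77"},
        {"id": "b2002", "title": "Hard Times", "request_id": "R78"},
    ]


def get_cancel_hold_link(hold):
    # Receives one entry from get_my_holds and turns it into an OPAC URL.
    return f"{OPAC_BASE}/cancel?{urlencode({'request': hold['request_id']})}"


for hold in get_my_holds({"cat_username": "jdoe"}):
    print(hold["title"], get_cancel_hold_link(hold))
```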

Placing Holds Inside VuFind

Obviously, the ideal solution is not linking to a legacy system; it’s filling out a form within VuFind itself. Fortunately, this is now achievable. It requires a few methods to be implemented:

getConfig – Before offering holds functionality, VuFind will call the getConfig method with a parameter of “Holds”. As the driver spec describes in more detail, the method needs to return an associative array containing entries VuFind uses to render the hold form correctly. It is up to you whether to hard-code these values in your ILS driver or pass them along from the driver’s .ini file. The most critical key is the “HMACKeys” value, which tells VuFind which form fields to use in generating an HMAC message authentication code that helps prevent users from placing holds on items they are not allowed to request. If you omit HMACKeys, VuFind will assume that native holds are disabled and will fall back to the getHoldLink approach.
getHolding – Chances are you already have a getHolding method in your driver, but you may need to augment it with some extra fields in the return array if you need extra data to place holds (for example, a “hold” vs. “recall” status, or an item ID in place of a bib ID). Fortunately, you can include any field of the getHolding return array as part of getConfig’s HMACKeys list in order to ensure that it is passed along to the placeHold method below. This allows you to pass any or all necessary data without VuFind having to know exactly what is needed! If possible, you should also make sure that your getHolding array includes the “addlink” key indicating whether or not the current user is allowed to place a hold on the current item — this key makes it possible to use the “driver” option in config.ini’s Catalog:holds_mode setting, which is usually the smartest way for VuFind to present links.
placeHold – This method receives an associative array containing patron information from patronLogin along with whatever hold form fields were activated through the settings returned by getConfig. It is responsible for actually placing the hold and then returning a success or failure status.

There are quite a few small details to line up here, but the important thing is that the driver specifies what data is needed, provides all of that data, and then uses it to place the hold. All VuFind does is pass the messages from one place to another!
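The post does not spell out how VuFind builds its HMAC, so the following is only a generic Python sketch of the idea behind HMACKeys, using the standard hmac module. The secret, the "hashKey" field name and the pipe-joined message format are assumptions. The fields named in HMACKeys are signed when the hold link is generated, and the submitted form is rejected if any signed field has been altered.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical server-side secret; never sent to the browser


def hold_hmac(fields, hmac_keys):
    """Sign only the fields named in hmac_keys (e.g. the HMACKeys list from getConfig)."""
    message = "|".join(f"{key}={fields.get(key, '')}" for key in hmac_keys)
    return hmac.new(SECRET, message.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_hold_request(form, hmac_keys):
    """Recompute the signature from the submitted fields and compare in constant time."""
    return hmac.compare_digest(hold_hmac(form, hmac_keys), form.get("hashKey", ""))


keys = ["id", "item_id"]
holding = {"id": "b1001", "item_id": "i55", "status": "Available"}
# When rendering the hold link or form, send the signature along with the signed fields.
form = dict(holding, hashKey=hold_hmac(holding, keys))
print(verify_hold_request(form, keys))  # True
print(verify_hold_request(dict(form, item_id="i99"), keys))  # False
```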

…and the rest

If you understand how holds work, the other new features are very similar, only slightly less complicated. A quick summary:

• To cancel holds, implement getCancelHoldDetails (which generates an identifier string using data passed to it from getMyHolds) and cancelHolds (which actually cancels holds based on patron data and an array of strings generated by getCancelHoldDetails).
• To renew items, implement getRenewDetails (which generates an identifier string using data passed to it from getMyTransactions) and renewMyItems (which actually renews items based on patron data and an array of strings generated by getRenewDetails). Also be sure that getMyTransactions includes an appropriate “renewable” key in its return array.
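The "identifier string" pattern shared by cancellation and renewal can be sketched in Python like this. The "item_id"/"request_id" fields, the pipe delimiter and the result structure are hypothetical, not the driver spec's required format. The driver packs what it needs into an opaque string, and later unpacks the strings that VuFind hands back.

```python
def get_cancel_hold_details(hold):
    # Pack whatever the ILS needs into one opaque, delimited string.
    return "|".join([hold["item_id"], hold["request_id"]])


def cancel_holds(patron, details):
    """Cancel each hold identified by a get_cancel_hold_details string."""
    results = {}
    for detail in details:
        item_id, request_id = detail.split("|")
        # A real driver would call the ILS here; this sketch only records the request.
        results[item_id] = {"success": True, "request_id": request_id}
    return {"count": len(results), "items": results}


holds = [{"item_id": "i1", "request_id": "R1"}, {"item_id": "i2", "request_id": "R2"}]
selected = [get_cancel_hold_details(h) for h in holds]  # e.g. checkboxes the patron ticked
print(cancel_holds({"cat_username": "jdoe"}, selected)["count"])  # 2
```

Renewals follow the same shape, with getRenewDetails producing the strings and renewMyItems consuming them.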

A Final Word on Object Orientation

That covers how to make holds work… but there’s one more detail that may affect driver authors. It is often the case that an ILS requires a version upgrade or a for-pay API plug-in to support these advanced features. In these situations, some users may want the full functionality, while others may require a more stripped-down version that only supports basic features. This is certainly the case for Voyager, where Voyager 6 users will have to settle for the old getHoldLink functionality while Voyager 7 users may have access to a RESTful API that allows every imaginable bell and whistle. Fortunately, PHP’s object-oriented model offers a simple solution: implement minimal functionality as a base class, then override and add methods in a child class to expand functionality.

The Voyager.php and VoyagerRestful.php drivers are an example of this technique in action. Similar work has been done for Horizon users with and without access to its XML API.

One useful design pattern you may notice if you look at the code for these existing drivers is that large chunks of key methods have been broken out into support methods: one that generates SQL in an abstracted associative array format and one that processes the database response. This makes it relatively easy for a child class to inject a couple of new fields into a query or process data slightly differently without having to copy and paste a large, complex method from the parent class. This design pattern is not only useful for implementing holds functionality; it’s also very handy for making minor local customizations to drivers.
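Here is a rough Python rendering of that pattern. The class names, table and column names are hypothetical and are not taken from Voyager.php. The base class describes its query as a dictionary and processes rows in a separate method, so a child class can add one column and one output field without copying the whole method.

```python
class BaseDriver:
    """Hypothetical base driver: query building and row processing are separate."""

    def holdings_sql(self, bib_id):
        # Abstract query description; values are bound separately, never interpolated.
        return {
            "select": ["item.item_id", "item.status"],
            "from": ["item"],
            "where": ["item.bib_id = :bib_id"],
            "bind": {"bib_id": bib_id},
        }

    def process_holdings_row(self, row):
        return {"item_id": row["item_id"], "status": row["status"]}

    @staticmethod
    def render_sql(parts):
        # Assemble the dictionary description into an SQL string.
        return (
            f"SELECT {', '.join(parts['select'])} "
            f"FROM {', '.join(parts['from'])} "
            f"WHERE {' AND '.join(parts['where'])}"
        )


class ExtendedDriver(BaseDriver):
    """Hypothetical child driver that needs one extra column to support holds."""

    def holdings_sql(self, bib_id):
        parts = super().holdings_sql(bib_id)
        parts["select"].append("item.hold_type")
        return parts

    def process_holdings_row(self, row):
        result = super().process_holdings_row(row)
        result["hold_type"] = row.get("hold_type", "hold")
        return result


print(BaseDriver.render_sql(ExtendedDriver().holdings_sql("b1")))
```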


Using Dismax for VuFind’s Advanced Search

  • Posted by: Demian Katz
  • Posted Date: April 15, 2011
  • Filed Under: VuFind

The Problem

One of the complexities of dealing with Solr searching is the fact that it has multiple query parsers with different strengths and weaknesses. The “Standard” query parser (sometimes referred to as the “Lucene” parser) offers traditional features like wildcards and boolean operators, but it doesn’t always do a good job when you need to search multiple index fields at the same time. The “Dismax” query parser uses fancy logic to do cross-field keyword searching that often seems to work like magic, but it lacks support for many of the operators found in the Standard parser. VuFind currently uses a blend of these two mechanisms: most of the time, it relies on the Dismax handler, since that tends to yield the best results… but when a search contains features that Dismax can’t cope with (like a boolean AND or a * wildcard), it falls back to the Standard handler.

One of the big limitations of this situation was that VuFind’s advanced search screen always generated a Standard query, since the advanced search form forces the use of boolean operators, and Dismax doesn’t support booleans. This meant that advanced searches were often slightly inconsistent with basic searches, not to mention being slightly less effective in some cases. Fortunately, due to some little-known and little-documented Solr features, the next VuFind release will address this problem.

The Solution

As it turns out, the Standard query parser supports a pseudo-field called “_query_” which allows you to combine multiple non-Standard queries using Standard operators. You can specify the parser to use in each subquery through the {!parser} syntax. As a result, as long as each individual field of the advanced search form can be handled by Dismax, it is possible to use the Dismax parser for the separate chunks of the advanced search while still combining the chunks together using the Standard parser’s boolean capabilities!

For example, suppose you wanted to combine a Dismax author search with a Dismax title search. You could do it through this Standard search:

_query_:"{!dismax qf=\"author^100 author2^50\"}charles dickens" AND _query_:"{!dismax qf=\"title^100 alt_title^50\"}tale of two cities"

This will perform two Dismax searches (note that you can specify qf boosts inline) and then return only the results that match both of them. It’s not pretty thanks to the need to escape quotes inside the subquery, but it works… and attractiveness doesn’t really matter when it’s all generated automatically by code. Admittedly, VuFind’s search generation logic is fairly convoluted right now, but adding support for this capability only required the addition of a few more lines, as you can see from the patch posted in JIRA, and the benefits are significant.
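The generation step can be sketched in a few lines of Python. This is a simplified illustration, not VuFind's actual search-building code, and the function names are hypothetical. Each advanced-search row becomes a _query_ clause using the {!dismax} local parameters, with quotes and backslashes in the user's terms escaped so they stay inside the outer quoted string. The clauses are then joined with a Standard boolean operator.

```python
def dismax_subquery(terms, qf):
    """Wrap one field's terms in a _query_ clause that uses the dismax parser."""
    local_params = '{!dismax qf=\\"' + qf + '\\"}'
    # Escape backslashes and quotes so the terms stay inside the outer quotes.
    escaped_terms = terms.replace("\\", "\\\\").replace('"', '\\"')
    return '_query_:"' + local_params + escaped_terms + '"'


def combine(clauses, operator="AND"):
    """Join dismax subqueries with a Standard-parser boolean operator."""
    return f" {operator} ".join(clauses)


print(combine([
    dismax_subquery("charles dickens", "author^100 author2^50"),
    dismax_subquery("tale of two cities", "title^100 alt_title^50"),
]))
```

The printed string is the query shown above, with straight quotes and backslash escapes.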

The Future

Hopefully things can be improved even further in the near future. The latest release of Solr (version 3.1) adds an “extended Dismax” parser which combines many of the best features of the Standard and Dismax parsers. This should greatly reduce the number of situations in which we need to use Standard instead of Dismax, and it may even eliminate the need for the current nest of recursive code that builds cross-field-capable Standard queries. Once I find time to upgrade VuFind’s Solr instance to the new version, I will begin investigating how much of the search logic can be simplified through the use of this new feature.


Java Tuning Made Easier

  • Posted by: Demian Katz
  • Posted Date: March 31, 2011
  • Filed Under: VuFind

If you run a constantly-growing Solr index (as many VuFind users do), chances are that sooner or later, you will need to do some Java tuning in order to solve performance problems. There are some good resources already on the web about this topic (for example, Sun’s Java Tuning White Paper), but they tend to be somewhat dense and technical. This article is intended to give a shorter introduction to the problem and the most basic strategies for solving it. If you need more details, by all means refer to more technical sources; I just wanted to offer an easier starting point.

Why Java Needs Tuning

The main reason Java needs tuning has to do with how it handles memory management. I was studying computer science when my university switched its curriculum from C++ to Java, so I’m very familiar with Java’s distinctive approach to this subject. In C++, the programmer is responsible for all the fine details of memory management — you have to request all of the memory that you plan to use, then return it to the operating system when you are done with it. Failing to do this properly leads to the dreaded “memory leak.” Java relieves this burden by taking an entirely different approach: the programmer uses memory without worrying about where it came from, and Java uses something called a “garbage collector” to figure out which pieces of memory are no longer needed and free them up for others to use. The C++ to Java transition caused many lessons to abruptly change from “memory management is of vital importance to all of your work” to “don’t worry about memory management; the magic box will do it for you.”

Usually, sparing the programmer from worrying about memory is a great improvement: it removes a lot of tedium from the work of writing code, and most of the time, the garbage collector just does its job, and nobody has to think about it. The problem is that for complex, memory-hungry applications like Solr, the garbage collector sometimes can’t keep up. The longer the program runs, the more time Java spends on garbage collection and the less time it spends on actually running the program. In extreme situations, a Java program can become completely unresponsive, devoting all of its effort to cleaning up after itself. If you run into problems with VuFind searches becoming extremely slow and find that the problem goes away after you restart VuFind, the garbage collector is very likely the cause. A restart frees up all memory and gives Java a clean start, so it’s usually an easy fix to performance problems… but it’s only a matter of time before garbage once again accumulates to a critical level and the problem returns!

Possible Solutions

There are basically three answers to the Java garbage collection problem, and you don’t have to pick just one. Using multiple strategies at the same time often makes sense.

• Regularly restart your Java application — call it cheating or postponing the inevitable if you like, but it’s a very simple approach: if it takes several days for your application to start performing poorly, just schedule it to automatically restart in the middle of the night every night to get a clean slate and consistent stability.

• Give Java more memory — intense garbage collection is triggered by high memory use, so the more memory you have available, the longer it will take for a program to fill it all. Adding memory reduces (or at least postpones) the need for garbage collection, and it’s as simple as changing a couple of parameters (see details in the VuFind wiki). Generally, more memory is better, but there is one important caveat: don’t give Java all of your system’s available memory, since that can crowd out your operating system and cause other performance problems. Always leave a bit of a buffer.

• Change garbage collector behavior — Java has several different garbage collection strategies available, and some of them have additional tunable parameters. This is where things start to get complicated, but Lucid Imagination’s Java Garbage Collection Boot Camp offers a good run-down of the available choices (not to mention providing some more detailed technical background). Even if you don’t understand all the gory details, knowing the available options means you can do some trial and error.

Testing Your Strategy

Trial and error is an inevitable part of solving Java tuning problems. The biggest shortcoming I found in other articles on the subject is that they don’t offer a simple strategy for doing this. Fortunately, it’s not too hard to test your progress using some simple tools.

Java is capable of recording a log of all of its garbage collection behavior, telling you how often it performs garbage collection and how long each collection takes to complete. While the exact parameter for generating a log may vary depending on the Java Virtual Machine that you are using, for VuFind’s preferred OpenJDK version, you can add something like this to your Java options:

-Xloggc:/tmp/garbage.log

As you can probably guess, this outputs the garbage collection data to a log file called /tmp/garbage.log. If you want something fancier, you could do this instead:

-Xloggc:$VUFIND_HOME/solr/jetty/logs/gc-`/bin/date +%F-%H-%M`.log

Through the magic of the Unix shell, this version stores logs inside VuFind’s solr/jetty/logs folder, naming each log file with the date and time that VuFind started up so that you can track behavior across multiple restarts.

So far, so good… except that these log files are really hard to read. Fortunately, an excellent tool exists to help visualize the data: gcviewer. With gcviewer, you can see a graph of your memory usage and the time spent on garbage collection, plus there are a number of handy statistics available (average collection time, total collection time, longest collection time, etc.). If gcviewer doesn’t meet your needs, there is also an IBM tool called PMAT which is slightly less convenient to download but which supports a broader range of log formats.

By logging data for several days between each tweak to your Java settings and using gcviewer or PMAT to analyze your logs, you can usually get a pretty good sense of whether you’ve made things better or worse… and how long it takes for your application to fall into the pit of inefficient garbage collection.
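For a quick sanity check before reaching for gcviewer, a short script can pull pause times out of a log. This Python sketch is not a substitute for those tools. It assumes the simple HotSpot -Xloggc line format, one collection per line ending in ", <seconds> secs]". Logs written with extra detail flags or by other JVMs look different and would need a different pattern.

```python
import re

# Assumes lines like "0.512: [GC 65536K->8192K(251264K), 0.0234560 secs]".
PAUSE = re.compile(r"^[\d.]+: \[(?P<kind>Full GC|GC).*?, (?P<secs>[\d.]+) secs\]")


def gc_stats(lines):
    """Summarize pause times: count, total, average, longest, and full collections."""
    pauses = []
    full = 0
    for line in lines:
        match = PAUSE.match(line.strip())
        if not match:
            continue  # skip lines in formats this sketch does not understand
        pauses.append(float(match.group("secs")))
        if match.group("kind") == "Full GC":
            full += 1
    if not pauses:
        return {"count": 0, "total": 0.0, "average": 0.0, "longest": 0.0, "full": 0}
    return {
        "count": len(pauses),
        "total": sum(pauses),
        "average": sum(pauses) / len(pauses),
        "longest": max(pauses),
        "full": full,
    }


sample = [
    "0.512: [GC 65536K->8192K(251264K), 0.0234560 secs]",
    "1.024: [GC 73728K->9216K(251264K), 0.0100000 secs]",
    "5.000: [Full GC 9216K->4096K(251264K), 0.2000000 secs]",
]
print(gc_stats(sample))
```

Comparing these numbers across log files from before and after a settings change gives a rough version of the same comparison that gcviewer makes visual.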

Conclusion

Java tuning is never going to be an easy subject to understand deeply, but that doesn’t mean you need to be afraid of it. There are several simple strategies available to help solve your problems even if you don’t know all the details of what is going on under the hood, and there are readily available tools to help you support your inevitable trial and error with empirical data. In fact, even if you are experiencing perfect performance today, it might not be a bad idea to examine garbage collection logs occasionally to see if you can prevent future problems before they become noticeable! Magic problem-solving boxes are great most of the time, but a bit of knowledge is always helpful for those times when they let you down.


Highlighting and Snippets in VuFind 1.1

  • Posted by: Demian Katz
  • Posted Date: March 23, 2011
  • Filed Under: VuFind

One of the perils of keyword-based searching is that sometimes it is not totally clear why certain results show up after performing a search. Fortunately, two common conventions help ease this problem: highlighting matching keywords and displaying snippets of text to show matches in context. The Solr index engine has supported both of these features for a long time, but VuFind has only provided robust support for them starting in version 1.1.

Activating Highlighting and Snippets in VuFind

As a VuFind administrator, if you want to take advantage of these new features, all you have to do is upgrade to VuFind 1.1 and they will be turned on by default. If you want to turn them off or adjust some of the behavior, you can make a few adjustments to your searches.ini file as described in the VuFind wiki. Unless you are interested in the technical workings behind the scenes, that is all you need to know. Have fun! Solr power users, VuFind developers and other interested techies, please read on…

Highlighting and Snippets at the Solr Level

Solr’s support for highlighting and snippets is straightforward. By means of some search parameters (set in the solrconfig.xml configuration file and/or as part of the search request), you tell Solr whether or not to apply highlighting, which fields to highlight, how to mark highlighted words, and so forth. When highlighting is requested, Solr adds a new section to its search response listing all of the highlighted phrases found in all of the documents in the search response. The highlighting information is completely separate from the main list of search results, so highlighting does not actually alter the main part of the Solr response — the details need to be merged in by the calling code.
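As a sketch of the request side, this Python snippet builds the standard Solr highlighting parameters (hl, hl.fl, hl.simple.pre, hl.simple.post, hl.fragsize). The function name and field list are illustrative. The marker strings match the ones VuFind uses, as described below. The resulting query string would be appended to a Solr select URL for your core.

```python
from urllib.parse import urlencode


def highlight_params(query, fields, fragsize=100):
    """Build Solr request parameters asking for highlighted fields."""
    return {
        "q": query,
        "hl": "true",
        "hl.fl": ",".join(fields),  # which fields to highlight
        "hl.simple.pre": "{{{{START_HILITE}}}}",  # marker before each match
        "hl.simple.post": "{{{{END_HILITE}}}}",  # marker after each match
        "hl.fragsize": str(fragsize),  # approximate snippet length; 0 means no truncation
    }


print(urlencode(highlight_params("dickens", ["title", "author"])))
```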

Problem #1: Marking Highlighted Text

One of the first problems that needs to be addressed is how to mark highlighted words in the Solr response. Solr provides hl.simple.pre and hl.simple.post parameters which can be used to specify text to mark the beginning and ending of highlighted words. The obvious first temptation is to simply stick some HTML in here — "<em>" and "</em>", for example. This can lead to pitfalls, however — if you are escaping your output, the HTML won’t make it through, and the end user will actually see the HTML code. If you are not escaping your output, then text between or around the emphasis tags may get misinterpreted as HTML, leading to garbled displays (never assume you won’t have angle brackets somewhere in your records!).

VuFind’s solution to this problem is fairly obvious — it uses markers that are extremely unlikely to show up in record text (“{{{{START_HILITE}}}}” and “{{{{END_HILITE}}}}”) and defines a special escaping routine used only for highlighted text. When displaying something that it knows has been highlighted, it first escapes any possible HTML entities, and THEN it replaces the highlighting markers with HTML code that achieves the actual highlighting logic. You can see the Smarty modifier that achieves this work here. Note that the Smarty code contains some extra logic for finding and highlighting words, since it is also designed for use by other modules of VuFind that are unable to rely on Solr’s highlighting capabilities — this logic is ignored when Solr results are being displayed.
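The escape-first, replace-second order can be illustrated in Python with the standard html module. This is a sketch of the technique, not a port of the Smarty modifier, and the choice of a <strong> tag is an assumption. Because the markers contain no HTML-special characters, they survive escaping untouched and can then be swapped for real tags safely.

```python
import html

START, END = "{{{{START_HILITE}}}}", "{{{{END_HILITE}}}}"


def highlight(text, tag="strong"):
    """Escape record text first, then swap the markers for real HTML tags."""
    safe = html.escape(text)  # the markers contain no &, <, >, or quotes, so they survive
    return safe.replace(START, f"<{tag}>").replace(END, f"</{tag}>")


print(highlight("Tom & Jerry <b> {{{{START_HILITE}}}}cat{{{{END_HILITE}}}}"))
# Tom &amp; Jerry &lt;b&gt; <strong>cat</strong>
```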

Problem #2: Merging Highlighting Data with Records

As mentioned above, Solr provides highlighting information completely separately from its search result list. This can be rather inconvenient since it requires code to look in two different places during record processing. The first temptation when encountering this problem is to write code that merges everything together, overwriting fields in the main response with highlighted versions found elsewhere in the response. However, as with many first temptations, that’s a bad idea. First of all, you will very likely lose data if you do this. In a multi-valued field, it is possible that only certain values will be highlighted and others omitted entirely. Also, unless the hl.fragsize parameter is set to 0, snippets will be truncated to only show a few words around the highlighted term. Additionally, data loss aside, it is often convenient to have both highlighted and non-highlighted versions of fields available; for example, if you want to create a link to a page about an author, you want to use the non-highlighted text for inclusion in the target URL, but you want to use the highlighted version to display the link text.

Again, VuFind works through these issues in a fairly straightforward way. For convenience, it does merge the highlighting data with the search results so that code doesn’t need to look in two completely separate arrays for information about each record. However, it doesn’t overwrite any fields; instead, it creates a fake “_highlighting” field within the body of the record and stores all of the highlighting details in there. Whenever VuFind displays a field that might be subject to highlighting, it looks in two places: first it checks the _highlighting array and displays properly processed, highlighted text if it finds any. If no highlighted version exists, it falls back to the standard, non-highlighted text. Admittedly, this adds a bit more complexity to the display templates, but it seems a reasonable price to pay to ensure data integrity. It also helps to remind template designers where they need to use the Smarty highlight modifier described above, greatly reducing the risk of any “{{{{START_HILITE}}}}” markers accidentally slipping through to the end user’s display.
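The merge-without-overwrite approach might look like this in Python, working on a Solr JSON-style response held as a dictionary. The function names and the "id" key field are assumptions. Each document gains an extra "_highlighting" entry, its stored fields stay untouched, and the display helper prefers highlighted text but falls back to the original value.

```python
def merge_highlighting(solr_response, key_field="id"):
    """Attach Solr's separate highlighting section to each doc (in place) without overwriting fields."""
    highlighting = solr_response.get("highlighting", {})
    for doc in solr_response["response"]["docs"]:
        doc["_highlighting"] = highlighting.get(doc[key_field], {})
    return solr_response["response"]["docs"]


def display_value(doc, field):
    """Return (text, is_highlighted): highlighted text if present, else the stored value."""
    highlighted = doc.get("_highlighting", {}).get(field)
    if highlighted:
        return highlighted[0], True
    return doc.get(field, ""), False


response = {
    "response": {"docs": [{"id": "1", "title": "Bleak House"}, {"id": "2", "title": "Hard Times"}]},
    "highlighting": {"1": {"title": ["{{{{START_HILITE}}}}Bleak{{{{END_HILITE}}}} House"]}, "2": {}},
}
for doc in merge_highlighting(response):
    # The raw value stays available (e.g. for a link URL); the highlighted one is for display.
    print(doc["title"], display_value(doc, "title"))
```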

Problem #3: Highlighted Text May Be Truncated

As discussed above, highlighted text may be truncated in some circumstances (by default, snippets are limited to about 100 characters). This is reasonable, since search results should be brief and easy to read. Indeed, even before it supported highlighting, VuFind already had code to trim down super-long titles in search results. The critical difference between the old title-trimming code and the new reliance on Solr snippets is that the old code always showed the beginning of a title, while Solr snippets occasionally come from the middle of a title, yielding strange-looking results. Setting the hl.fragsize parameter to 0 is an option, though that will lead to very long titles in search results. VuFind’s solution relies on another new Smarty modifier (modifier.addEllipsis.php), which compares highlighted text against non-highlighted text and adds an ellipsis to whichever end appears to have been truncated. This may not be a perfect solution, but at least it adds a little more visual context to the truncated text.
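The comparison can be sketched in Python as follows. This is an independent illustration of the idea, not a port of modifier.addEllipsis.php. The markers are stripped from the snippet, and the plain snippet is checked against the start and end of the full field value. An ellipsis is added to each end that does not line up.

```python
import re

MARKERS = re.compile(r"\{\{\{\{(?:START|END)_HILITE\}\}\}\}")


def add_ellipsis(highlighted, full_text):
    """Add '...' to whichever ends of a snippet were cut off from the full value."""
    plain = MARKERS.sub("", highlighted).strip()
    full = full_text.strip()
    if not plain or plain == full:
        return highlighted  # nothing was truncated
    result = highlighted
    if not full.startswith(plain):
        result = "..." + result
    if not full.endswith(plain):
        result = result + "..."
    return result


title = "A Tale of Two Cities: A Story of the French Revolution"
print(add_ellipsis("of {{{{START_HILITE}}}}Two{{{{END_HILITE}}}} Cities", title))
```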

There is one additional caveat that should be noted: multi-valued fields are still a problem. If a field contains five values and only two of them match search terms, then the highlighting data will only contain (at most) two values. VuFind does not currently contain any mechanisms for matching up partial highlighted results with longer lists of non-highlighted results. The problem is avoided in the simplest way possible: the highlighted fields currently used in VuFind’s search result templates (title and primary author) are single-valued. Multi-valued fields are only displayed as snippets (see below).

Problem #4: Displaying Snippets

As discussed above, there are certain Solr fields which VuFind will always display in search results: most importantly, title and author. However, keyword matches may fall outside of these displayed fields. For that reason, it is helpful to display snippets showing matches in other fields. Since there may be many snippets, and the search result listing should be kept reasonably brief, it makes sense to try to display just one snippet, preferably the most relevant one.

Snippet selection is handled by the IndexRecord record driver, the base class that handles display of all records retrieved from the Solr index. This class contains two arrays: $preferredSnippetFields, an array of fields that are very likely to have good snippet data and should be checked first, and $forbiddenSnippetFields, an array of fields with bad or redundant data that should never be considered for use as a snippet. By default, $preferredSnippetFields contains subject headings and table of contents entries, since these tend to offer valuable information, while $forbiddenSnippetFields contains author and title fields (unnecessary for snippets since they are always displayed elsewhere in the template), ID values (obviously uninformative) and the spelling field (a jumble of data duplicated from other fields, necessary for spell checking but misleading as a snippet). The getHighlightedSnippet method uses these arrays to pick a single best snippet, first checking the preferred fields and then taking the first available non-forbidden field if necessary. Since the method and its related arrays are all protected, it is possible to extend the IndexRecord class and create custom behavior as needed on a driver-by-driver basis.
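The selection logic in getHighlightedSnippet can be approximated in Python like this. The concrete field names in the two lists are assumptions chosen to match the categories described above (subjects, contents, author, title, IDs, spelling). The function checks preferred fields first, then returns the first available field that is not forbidden.

```python
# Hypothetical field names standing in for the preferred/forbidden categories described above.
PREFERRED_SNIPPET_FIELDS = ["topic", "contents"]
FORBIDDEN_SNIPPET_FIELDS = ["author", "author2", "title", "title_short", "id", "spelling"]


def get_highlighted_snippet(highlighting):
    """Pick one snippet: preferred fields first, then any non-forbidden field."""
    for field in PREFERRED_SNIPPET_FIELDS:
        if highlighting.get(field):
            return {"field": field, "snippet": highlighting[field][0]}
    for field, values in highlighting.items():  # dicts keep insertion order
        if field not in FORBIDDEN_SNIPPET_FIELDS and values:
            return {"field": field, "snippet": values[0]}
    return None  # no usable snippet; show the result without one


print(get_highlighted_snippet({"title": ["x"], "notes": ["n1"], "contents": ["c1"]}))
```

The returned field name is what a caption lookup, such as the one described next, would key on.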

One further detail helps make things more clear: some snippets make little sense out of context, so searches.ini contains a [Snippet_Captions] section where Solr fields can be assigned labels that will be used as captions in front of snippets. Snippets for fields not listed in this section will display as stand-alone, uncaptioned lines in the search results.

Conclusions

Highlighting and snippets really aren’t too difficult to work with, but as with almost anything, they turn out to be a little more complicated than expected once you look at all of the details. I hope this post has helped point out the most obvious pitfalls and explain the reasoning behind VuFind’s implementation. There is still plenty more that could be done — some of the behavior could be made even smarter, and more of Solr’s power could be exposed through VuFind configuration settings. If you have ideas or questions, please feel free to share them as comments on this post or via the vufind-tech mailing list.


Welcome to the Villanova Library Technology Blog

  • Posted by: Demian Katz
  • Posted Date: March 22, 2011
  • Filed Under: Uncategorized

Since I often read and enjoy Jonathan Rochkind’s blog, where he goes into great detail about the complexities of life as a library programmer, I was pleased when he asked me to write a bit about some of the new features in VuFind 1.1.  That post will be coming up shortly.  In the meantime, thank you, Jonathan, for prompting the creation of this blog.  I hope this will become a useful resource for keeping up with the latest developments from Villanova’s library technology team and that the information here will be interesting and informative whether or not you use our software.  Stay tuned for periodic posts about how we have approached various problems during the course of our work on VuFind, the forthcoming VuDL digital library package, and other library-related technologies.




Last Modified: March 22, 2011