You are exploring: VU > Library > Blogs > Library Technology Development

Interactive Map Building using the Raphaël JavaScript Library


With the dual challenges of a complex building and constant resource and personnel movement due to construction, Falvey Memorial Library at Villanova University required an easy-to-update method to both guide our patrons around the building and help them find the resources they need. To do this, the Library Technology Team developed an interactive map system built using the Raphaël JavaScript libraries that allows for fast updates and easy map construction, while still allowing the map to be dynamic and interactive. JavaScript and the Raphaël libraries were chosen over other technologies to maximize accessibility by our community.


Nowadays, when people talk about challenges facing the modern library, they most refer to issues and challenges in the digital world. Here at Falvey Library though, one of our daily challenges is still in the brick-and-mortar realm – specifically, the actual brick and mortar that makes up our library building. Starting in summer of 2011 (Monday, August 22 to be exact), Falvey Library began renovations to the building to improve the working space of the library, with one of the goals to merge the different disparate sections of the library building into one (mostly) unified space. This work will go far in reducing the challenge of the physical building which, previous to the construction, has been a patchwork of different additions and extensions, resulting in the library building operating as if it were two separate spaces.

During this process, many of the stacks and collections are being moved around to make way for the construction teams, with many of our collections crossing over from one of the previously mentioned operating spaces into the other. As well, many members of our staff, including our library Director, have moved to temporary offices, with future plans for various staff members to move into newly constructed spaces. A new learning commons is also planned, so whole departments within the library will be moving to take advantage of this new space (including our campus writing center and campus math center, amongst others).

As you can see from above, our library is currently in a state of flux, and with a complex “gestalt” building and constantly moving locations of both collections and people, we have a real challenge on our hands making sure patrons can find the people and resources they need in a timely and efficient manner.

So how can the tech department help?

The Challenge

The quick answer of course is we need a map and directory that can keep up with the changes and can quickly inform patrons on where to find the person or resource they need. This system needs to be flexible enough to handle periodic changes both to the locations of people and resources as well as changes to the physical building itself. Furthermore, it needs to be accessible by the widest possible audience. Finally, since we’re taking the time to do this, why not make the map interactive and code it to work directly with our catalog, pointing patrons directly to the shelf on which a book or resource currently resides?

From above, it looks like we want more than a static map, and thus code is involved. But what method to use to implement our new interactive map?

The first and most obvious choice would be to use Adobe Flash, and a quick spin around the Internet confirms this as the majority of interactive maps are built using this technology. The reason for this is simple – Flash use is widespread in the web universe and Flash itself is relatively cheap to implement. On the other hand, the webverse is starting to turn on Flash as being a dated technology on its way out the door. Now, though I believe the rumors of the demise of Flash to be a bit premature, there’s no denying that the loss of accessibility to the map on the Apple mobile devices is a big loss to our patrons. As much as I don’t like my technology use dictated by the whims of a corporate giant, excising Apple users from the interactive map is too big a loss in community share for me to be comfortable with this choice. Besides, I do share some of Steve Jobs’ criticism of Flash in its bulkiness, incessant need for upgrades, and general sluggishness of pages running Flash applications. So, what else do we have?

Always out to clone and usurp hot technologies, Microsoft has its Flash competitor in Silverlight, our next contestant in the “Interactive Map Technology Showdown”. Now, having been a WPF programmer awhile back (Silverlight having started its life as WPF/Everywhere, so the technologies are very similar) I have to admit to being slightly partial to this choice. Personally, I find WPF/Silverlight coding the most fun of any language I’ve run into in a long time – unfortunately, I have to admit that not only are the issues from Flash above also prevalent in Silverlight, Silverlight has the added huge negative not being nearly as widely adopted as Flash. Therein, Silverlight also isn’t a great choice for the map with our requirement of maximum accessibility.

One more technology of note in this challenge is HTML5, and though HTML5 looks highly promising, but it technically doesn’t exist yet (as of this writing, it’s expected to publish in 2014).

So what’s the winner?

The Solution


More specifically, the backbone of the graphics is powered by the Raphaël JavaScript Library, a free, open source (we like those at Falvey Library) vector graphics library. This library allows me to draw and display any shape using the SVG standard and to add event handlers to these shapes (i.e., adding an event that turns the shape red when the mouse moves over the shape). It also includes a simple library to add animations to shapes (including motion and fading), as well as image manipulation and display.

So, why JavaScript (and the Raphaël library) over the competitors? For one, JavaScript and the Raphaël libraries don’t require a plug-in, whereas Flash and Silverlight do. In terms of accessibility, this puts JavaScript on top of the list (and no annoying software update or broken players). Furthermore, JavaScript is much more lightweight than Flash and Silverlight – for example, JavaScript loads much faster on page load that either Flash or Silverlight (both of which feel sluggish to me in general on page load). Finally, compared to Flash at least, I found that working with JavaScript and Raphaël made making animations much easier. Most of all, by using JavaScript, I am hoping to allow the widest range of users to access the interactive map, as JavaScript is the most universal and dynamic web language available today.

The Implementation

The interactive map is currently implemented on Falvey Library’s website here.

Like JavaScript itself, the whole interactive map system is very lightweight – the system consists of only a few JavaScript files, a JSON file to hold the map data, and a few image files. All the highlighted areas you see (front desk, shelves, etc) are shapes created via Raphaël, but hidden from view until highlighted. The “floors” as well are Raphaël image objects piled on top of each other — to switch pages I just change the order of the image objects in the Raphaël object stack, while pushing the shapes associated with that page on top of the image, so that mousing over them will trigger their event (and since other floors are piled under the image object moved to the top of the stack, you don’t get interference from these objects).

I used the free, open source SVG tool InkScape to determine the graphics coordinates of my shapes. To do this, I simply loaded a floor map into Inkscape, drew each shape using the SVG freedraw tool, saved the image, then finally copied the coordinates of the shape (pulled from the save file) into a JSON file. This process is a bit labor intense (especially since I did every stack row) though it didn’t take me more than a day since each successive shape gets easier once you get the rhythm of the process down. Once this is done, the JSON is updated to hold info on any person, department, or call number range held in each shape — this way, when the map info changes, all I have to do is change a setting in the JSON file and I can quickly manipulate the information on the map.

The final step is to add a set of JavaScript functions to the page header that read the JSON file and build the page. All the extra JavaScript functionality (determining which range an entered call number is in, etc.) is also stored in the header.

The Way Forward

Right now a lot of my JavaScript code is tied into our CMS (Concrete5). I hope to refactor the code soon to break this dependency and allow easier code adaptation by other institutions. As well, there are currently way too many hard coded items in the code – these need to be moved out into a config file or JSON file to allow for much greater flexibility in updating maps (I’ve begun this process but have a long way to go). Paradoxically, I hope in the near future to packagize and submit the map software to the Concrete5 marketplace (under the GPL or MIT licence) to allow the general Concrete5 community to take advantage of this code

If you’re interested in more technical details, or are considering adopting this system in your own website, please don’t hesitate to contact me at david.uspal@villanova.edu and I’d be happy to field your comments or questions.


Expanded ILS Functionality in VuFind

  • Posted by: Demian Katz
  • Posted Date: June 2, 2011
  • Filed Under: VuFind

VuFind uses simple PHP classes called ILS drivers to communicate with external integrated library systems in order to obtain information and perform actions that are outside the scope of its own index and database. This includes things like listing a patron’s checked out items or determining whether books are currently on the shelf. In the past, VuFind’s drivers have been fairly week with regard to important patron activities like placing holds and renewing books. Several libraries have implemented local customizations to support these features, but the native support involved, at best, linking off to a page in a third-party OPAC.

With the forthcoming VuFind 1.2 release (date not yet determined, but probably late summer or early fall), all that will change. The VuFind driver model has been updated with robust support for expanded patron functionality (thanks largely to the tireless efforts of Luke O’Sullivan, who has been collaborating with me for months on this problem). The ILS Driver Specification has already been updated to reflect the new features, but since this is somewhat complicated, I thought a more narrative explanation of how the new features work might be beneficial.

This article is designed to explain exactly what you need to do to add hold, recall and renewal functionality to your ILS driver. It will also touch on some of the infrastructure changes in VuFind needed to support these new features, and some general best practices for extending drivers. As always, if you want more detail on anything, you are free to contact me through comments on the blog or the VuFind mailing lists.

Basic Principles

One of the complicated things about implementing a generic system for dealing with things like holds and renewals is that different systems have different capabilities and rely on different data in order to achieve these actions. Our design tries to keep as much logic inside the ILS driver as possible. VuFind interacts with the driver in two key ways:

• It queries the driver (by checking for the existence of certain methods and/or using the getConfig method) to determine which features are available. Unsupported capabilities will simply be hidden from the end user.
• It tries to feed the driver with its own data as much as possible. In many cases, the inputs to some methods are outputs from other methods. VuFind makes no assumptions about the contents of the data — it just pushes it to the appropriate places. Associative arrays and delimited strings are the driver author’s friends — these can be used to encapsulate whatever data the driver needs, and VuFind will make sure they end up in the right places. This should all become more clear when you see some examples below!

The Least Common Denominator

As mentioned earlier, the simplest way to support advanced ILS features is to simply link to the ILS’ native OPAC. This does not generally provide a good user experience, but sometimes it is the only option. There are several methods you can implement if you want (or need) to settle for this minimal level of functionality:


While getHoldLink has been around for a long time, the other two methods are new… and both of them demonstrate the “driver using its own data” principle discussed above. getCancelHoldLink is fed with an entry from the array returned by getMyHolds, while getRenewLink is similarly fed from getMyTransactions. This is very convenient: when you’re retrieving information from the ILS about current holds or checkouts, it’s easy enough to pull whatever details are needed to link to the system’s OPAC… then you simply assemble it into a URL in the getLink method and you’re done!

Placing Holds Inside VuFind

Obviously, the ideal solution is not linking to a legacy system; it’s filling out a form within VuFind itself. Fortunately, this is now achievable. It requires a few methods to be implemented:

getConfig – Before offering holds functionality, VuFind will call the getConfig method with a parameter of “Holds”. As the driver spec describes in more detail, the method needs to return an associative array containing entries VuFind uses to render the hold form correctly. It is up to you whether to hard-code these values in your ILS driver or pass them along from the driver’s .ini file. The most critical key is the “HMACKeys” value, which tells VuFind which form fields to use in generating an HMAC message authentication code that helps prevent users from placing holds on items that they are not supposed to. If you omit HMACKeys, VuFind will assume that native holds are disabled and will fail over to the getHoldLink approach.
getHolding – Chances are you already have a getHolding method in your driver, but you may need to augment it with some extra fields in the return array if you need extra data to place holds (for example, a “hold” vs. “recall” status, or an item ID in place of a bib ID). Fortunately, you can include any field of the getHolding return array as part of getConfig’s HMACKeys list in order to ensure that it is passed along to the placeHold method below. This allows you to pass any or all necessary data without VuFind having to know exactly what is needed! If possible, you should also make sure that your getHolding array includes the “addlink” key indicating whether or not the current user is allowed to place a hold on the current item — this key makes it possible to use the “driver” option in config.ini’s Catalog:holds_mode setting, which is usually the smartest way for VuFind to present links.
placeHold – This method receives an associative array containing patron information from patronLogin along with whatever hold form fields were activated through the settings returned by getConfig. It is responsible for actually placing the hold and then returning a success or failure status.

There are quite a few small details to line up here, but the important thing is that the driver specifies what data is needed, provides all of that data, and then uses it to place the hold. All VuFind does is pass the messages from one place to another!

…and the rest

If you understand how holds work, the other new features are very similar, only slightly less complicated. A quick summary:

• To cancel holds, implement getCancelHoldDetails (which generates an identifier string using data passed to it from getMyHolds) and cancelHolds (which actually cancels holds based on patron data and an array of strings generated by getCancelHoldDetails).
• To renew items, implement getRenewDetails (which generates an identifier string using data passed to it from getMyTransactions) and renewMyItems (which actually renews items based on patron data and an array of strings generated by getRenewDetails). Also be sure that getMyTransactions includes an appropriate “renewable” key in its return array.

A Final Word on Object Orientation

That covers how to make holds work… but there’s one more detail that may affect driver authors. It is often the case that an ILS requires a version upgrade or a for-pay API plug-in to support these advanced features. In these situations, some users may want the full functionality, while others may require a more stripped-down version that only supports basic features. This is certainly the case for Voyager, where Voyager 6 users will have to settle for the old getHoldLink functionality while Voyager 7 users may have access to a RESTful API that allows every imaginable bell and whistle. Fortunately, PHP’s object-oriented model offers a simple solution: implement minimal functionality as a base class, then override and add methods in a child class to expand functionality.

The Voyager.php and VoyagerRestful.php drivers are an example of this technique in action. Similar work has been done for Horizon users with and without access to its XML API.

One useful design pattern you may notice if you look at the code for these existing drivers is that large chunks of key methods have been broken out into support methods: one that generates SQL in an abstracted associative array format and one that processes the database response. This makes it relatively easy for a child class to inject a couple of new fields into a query or process data slightly differently without having to copy and paste a large, complex method from the parent class. This design pattern is not only useful for implementing holds functionality; it’s also very handy for making minor local customizations to drivers.


Using Dismax for VuFind’s Advanced Search

  • Posted by: Demian Katz
  • Posted Date: April 15, 2011
  • Filed Under: VuFind

The Problem

One of the complexities of dealing with Solr searching is the fact that it has multiple query parsers with different strengths and weaknesses. The “Standard” query parser (sometimes referred to as the “Lucene” parser) offers traditional features like wildcards and boolean operators, but it doesn’t always do a good job when you need to search multiple index fields at the same time. The “Dismax” query parser uses fancy logic to do cross-fielded keyword searching that often seems to work like magic, but it lacks support for all the operators found in the Standard parser. VuFind currently uses a blend of these two mechanisms — most of the time, it relies on the Dismax handler, since that tends to yield the best results… but when a search contains features that Dismax can’t cope with (like a boolean AND or a * wildcard), it fails over to the Standard handler.

One of the big limitations of this situation was that VuFind’s advanced search screen always generated a Standard query, since the advanced search form forces the use of boolean operators, and Dismax doesn’t support booleans. This meant that advanced searches were often slightly inconsistent with basic searches, not to mention being slightly less effective in some cases. Fortunately, due to some little-known and little-documented Solr features, the next VuFind release will address this problem.

The Solution

As it turns out, the Standard query parser supports a pseudo-field called “_query_” which allows you to combine multiple non-Standard queries using Standard operators. You can specify the parser to use in each subquery through the {!parser} syntax. As a result, as long as each individual field of the advanced search form can be handled by Dismax, it is possible to use the Dismax parser for the separate chunks of the advanced search while still combining the chunks together using the Standard parser’s boolean capabilities!

For example, suppose you wanted to combine a Dismax author search with a Dismax title search. You could do it through this Standard search:

_query_:”{!dismax qf=\”author^100 author2^50\”}charles dickens” AND _query_:”{!dismax qf=\”title^100 alt_title^50\”}tale of two cities”

This will perform two Dismax searches (note that you can specify qf boosts inline) and then return only the results that match both of them. It’s not pretty thanks to the need to escape quotes inside the subquery, but it works… and attractiveness doesn’t really matter when it’s all generated automatically by code. Admittedly, VuFind’s search generation logic is fairly convoluted right now, but adding support for this capability only required the addition of a few more lines, as you can see from the patch posted in JIRA, and the benefits are significant.

The Future

Hopefully things can be improved even further in the near future. The latest release of Solr (version 3.1) adds an “extended Dismax” parser which combines many of the best features of the Standard and Dismax parsers. This should greatly reduce the number of situations in which we need to use Standard instead of Dismax, and it may even eliminate the need for the current nest of recursive code that builds cross-field-capable Standard queries. Once I find time to upgrade VuFind’s Solr instance to the new version, I will begin investigating how much of the search logic can be simplified through the use of this new feature.


Java Tuning Made Easier

  • Posted by: Demian Katz
  • Posted Date: March 31, 2011
  • Filed Under: VuFind

If you run a constantly-growing Solr index (as many VuFind users do), chances are that sooner or later, you will need to do some Java tuning in order to solve performance problems. There are some good resources already on the web about this topic (for example, Sun’s Java Tuning White Paper), but they tend to be somewhat dense and technical. This article is intended to give a shorter introduction to the problem and the most basic strategies for solving it. If you need more details, by all means refer to more technical sources; I just wanted to offer an easier starting point.

Why Java Needs Tuning

The main reason Java needs tuning has to do with how it handles memory management. I was studying computer science when my university switched its curriculum from C++ to Java, so I’m very familiar with Java’s distinctive approach to this subject. In C++, the programmer is responsible for all the fine details of memory management — you have to request all of the memory that you plan to use, then return it to the operating system when you are done with it. Failing to do this properly leads to the dreaded “memory leak.” Java relieves this burden by taking an entirely different approach: the programmer uses memory without worrying about where it came from, and Java uses something called a “garbage collector” to figure out which pieces of memory are no longer needed and free them up for others to use. The C++ to Java transition caused many lessons to abruptly change from “memory management is of vital importance to all of your work” to “don’t worry about memory management; the magic box will do it for you.”

Usually, sparing the programmer from worrying about memory is a great improvement — it removes a lot of tedium from the work of writing code, and most of the time, the garbage collector just does its job, and nobody has to think about it. The problem is that for complex, memory-hungry applications like Solr, the garbage collector sometimes can’t keep up. The longer the program runs, the more time Java spends on garbage collection and the less time it spends on actually running the program. In extreme situations, a Java program can become completely unresponsive, devoting all of its effort to cleaning up after itself. If you run into problems with VuFind searches becoming extremely slow and find that the problem goes away after you restart VuFind, the cause is almost certainly the garbage collector. A restart frees up all memory and gives Java a clean start, so it’s usually an easy fix to performance problems… but it’s only a matter of time before garbage once again accumulates to a critical level and the problem returns!

Possible Solutions

There are basically three answers to the Java garbage collection problem, and you don’t have to pick just one. Using multiple strategies at the same time often makes sense.

• Regularly restart your Java application — call it cheating or postponing the inevitable if you like, but it’s a very simple approach: if it takes several days for your application to start performing poorly, just schedule it to automatically restart in the middle of the night every night to get a clean slate and consistent stability.

• Give Java more memory — intense garbage collection is triggered by high memory use, so the more memory you have available, the longer it will take for a program to fill it all. Adding memory reduces (or at least postpones) the need for garbage collection, and it’s as simple as changing a couple of parameters (see details in the VuFind wiki). Generally, more memory is always better… but there is one important caveat: don’t give Java all of your system’s available memory, since that can crowd out your operating system and cause other performance problems — always leave a bit of a buffer.

• Change garbage collector behavior — Java has several different garbage collection strategies available, and some of them have additional tuneable parameters. This is where things start to get complicated, but Lucid Imagination’s Java Garbage Collection Boot Camp offers a good run-down of the available choices (not to mention providing some more detailed technical background). Even if you don’t understand all the gory details, knowing the available options means you can do some trial and error.

Testing Your Strategy

Trial and error is an inevitable part of solving Java tuning problems. The biggest shortcoming I found in other articles on the subject is that they don’t offer a simple strategy for doing this. Fortunately, it’s not too hard to test your progress using some simple tools.

Java is capable of recording a log of all of its garbage collection behavior, telling you how often it performs garbage collection and how long each collection takes to complete. While the exact parameter for generating a log may vary depending on the Java Virtual Machine that you are using, for VuFind’s preferred OpenJDK version, you can add something like this to your Java options:


As you can probably guess, this outputs the garbage collection data to a log file called /tmp/garbage.log. If you want something fancier, you could do this instead:

-Xloggc:$VUFIND_HOME/solr/jetty/logs/gc-`/bin/date +%F-%H-%M`.log

Through the magic of the Unix shell, this version stores logs inside VuFind’s solr/jetty/logs folder, naming each log file with the date and time that VuFind started up so that you can track behavior across multiple restarts.

So far, so good… except that these log files are really hard to read. Fortunately, an excellent tool exists to help visualize the data: gcviewer. With gcviewer, you can see a graph of your memory usage and the time spent on garbage collection, plus there are a number of handy statistics available (average collection time, total collection time, longest collection time, etc.). If gcviewer doesn’t meet your needs, there is also an IBM tool called PMAT which is slightly less convenient to download but which supports a broader range of log formats.

By logging data for several days between each tweak to your Java settings and using gcviewer or PMAT to analyze your logs, you can usually get a pretty good sense of whether you’ve made things better or worse… and how long it takes for your application to fall into the pit of inefficient garbage collection.


Java tuning is never going to be an easy subject to understand deeply, but that doesn’t mean you need to be afraid of it. There are several simple strategies available to help solve your problems even if you don’t know all the details of what is going on under the hood, and there are readily available tools to help you support your inevitable trial and error with empirical data. In fact, even if you are experiencing perfect performance today, it might not be a bad idea to examine garbage collection logs occasionally to see if you can prevent future problems before they become noticeable! Magic problem-solving boxes are great most of the time, but a bit of knowledge is always helpful for those times when they let you down.


Highlighting and Snippets in VuFind 1.1

  • Posted by: Demian Katz
  • Posted Date: March 23, 2011
  • Filed Under: VuFind

One of the perils of keyword-based searching is that sometimes it is not totally clear why certain results show up after performing a search. Fortunately, two common conventions help ease this problem: highlighting matching keywords and displaying snippets of text to show matches in context. The Solr index engine has supported both of these features for a long time, but VuFind has only provided robust support for them starting in version 1.1.

Activating Highlighting and Snippets in VuFind

As a VuFind administrator, if you want to take advantage of these new features, all you have to do is upgrade to VuFind 1.1 and they will be turned on by default. If you want to turn them off or adjust some of the behavior, you can make a few adjustments to your searches.ini file as described in the VuFind wiki. Unless you are interested in the technical workings behind the scenes, that is all you need to know. Have fun! Solr power users, VuFind developers and other interested techies, please read on….

Highlighting and Snippets at the Solr Level

Solr’s support for highlighting and snippets is straightforward. By means of some search parameters (set in the solrconfig.xml configuration file and/or as part of the search request), you tell Solr whether or not to apply highlighting, which fields to highlight, how to mark highlighted words, and so forth. When highlighting is requested, Solr adds a new section to its search response listing all of the highlighted phrases found in all of the documents in the search response. The highlighting information is completely separate from the main list of search results, so highlighting does not actually alter the main part of the Solr response — the details need to be merged in by the calling code.

Problem #1: Marking Highlighted Text

One of the first problems that needs to be addressed is how to mark highlighted words in the Solr response. Solr provides hl.simple.pre and hl.simple.post parameters which can be used to specify text to mark the beginning and ending of highlighted words. The obvious first temptation is to simply stick some HTML in here — "<em>" and "</em>", for example. This can lead to pitfalls, however — if you are escaping your output, the HTML won’t make it through, and the end user will actually see the HTML code. If you are not escaping your output, then text between or around the emphasis tags may get misinterpreted as HTML, leading to garbled displays (never assume you won’t have angle brackets somewhere in your records!).

VuFind’s solution to this problem is fairly obvious — it uses markers that are extremely unlikely to show up in record text (“{{{{START_HILITE}}}}” and “{{{{END_HILITE}}}}”) and defines a special escaping routine used only for highlighted text. When displaying something that it knows has been highlighted, it first escapes any possible HTML entities, and THEN it replaces the highlighting markers with HTML code that achieves the actual highlighting logic. You can see the Smarty modifier that achieves this work here. Note that the Smarty code contains some extra logic for finding and highlighting words, since it is also designed for use by other modules of VuFind that are unable to rely on Solr’s highlighting capabilities — this logic is ignored when Solr results are being displayed.

Problem #2: Merging Highlighting Data with Records

As mentioned above, Solr provides highlighting information completely separately from its search result list. This can be rather inconvenient since it requires code to look in two different places during record processing. The first temptation when encountering this problem is to write code that merges everything together, overwriting fields in the main response with highlighted versions found elsewhere in the response. However, as with many first temptations, that’s a bad idea. First of all, you will very likely lose data if you do this. In a multi-valued field, it is possible that only certain values will be highlighted and others omitted entirely. Also, unless the hl.fragsize parameter is set to 0, snippets will be truncated to only show a few words around the highlighted term. Additionally, data loss aside, it is often convenient to have both highlighted and non-highlighted versions of fields available; for example, if you want to create a link to a page about an author, you want to use the non-highlighted text for inclusion in the target URL, but you want to use the highlighted version to display the link text.

Again, VuFind works through these issues in a fairly straightforward way. For convenience, it does merge the highlighting data with the search results so that code doesn’t need to look in two completely separate arrays for information about each record. However, it doesn’t overwrite any fields; instead, it creates a fake “_highlighting” field within the body of the record and stores all of the highlighting details in there. Whenever VuFind displays a field that might be subject to highlighting, it looks in two places — first it checks the _highlighting array and displays properly processed, highlighted text if it finds any. If no highlighted version exists, it fails over to the standard, non-highlighted text. Admittedly, this adds a bit more complexity to the display templates, but it seems a reasonable price to pay to ensure data integrity. It also helps to remind template designers where they need to use the Smarty highlight modifier described above, greatly reducing the risk of any “{{{START_HILITE}}}” tags accidentally slipping through to the end user’s display.

Problem #3: Highlighted Text May Be Truncated

As discussed above, highlighted text may be truncated in some circumstances (by default, snippets are limited to about 100 characters). This is reasonable, since search results should be brief and easy to read. Indeed, even before it supported highlighting, VuFind already had code to trim down super-long titles in search results. The critical difference between the old title-trimming code and the new reliance on Solr snippets is that the old code always showed the beginning of a title, while Solr snippets occasionally come from the middle of a title, yielding strange-looking results. Setting the hl.fragsize parameter to 0 is an option, though that will lead to very long titles in search results. VuFind’s solution relies on another new Smarty modifier (modifier.addEllipsis.php) which compares highlighted text against non-highlighted text and adds periods of ellipsis on each end if truncation is detected. This may not be a perfect solution, but at least it adds a little more visual context to the truncated text.

There is one additional caveat that should be noted: multi-valued fields are still a problem. If a field contains five values and only two of them match search terms, then the highlighting data will only contain (at most) two values. VuFind does not currently contain any mechanisms for matching up partial highlighted results with longer lists of non-highlighted results. The problem is avoided in the simplest way possible: the highlighted fields currently used in VuFind’s search result templates (title and primary author) are single-valued. Multi-valued fields are only displayed as snippets (see below).

Problem #4: Displaying Snippets

As discussed above, there are certain Solr fields which VuFind will always display in search results: most importantly, title and author. However, keyword matches may fall outside of these displayed fields. For that reason, it is helpful to display snippets showing matches in other fields. Since there may be many snippets, and the search result listing should be kept reasonably brief, it makes sense to try to display just one snippet, preferably the most relevant one.

Snippet selection is handled by the IndexRecord record driver, the base class that handles display of all records retrieved from the Solr index. This class contains two arrays: $preferredSnippetFields, an array of fields that are very likely to have good snippet data and should be checked first, and $forbiddenSnippetFields, an array of fields with bad or redundant data that should never be considered for use as a snippet. By default, $preferredSnippetFields contains subject headings and table of contents entries, since these tend to offer valuable information, while $forbiddenSnippetFields contains author and title fields (unnecessary for snippets since they are always displayed elsewhere in the template), ID values (obviously uninformative) and the spelling field (a jumble of data duplicated from other fields, necessary for spell checking but misleading as a snippet). The getHighlightedSnippet method uses these arrays to pick a single best snippet, first checking the preferred fields and then taking the first available non-forbidden field if necessary. Since the method and its related arrays are all protected, it is possible to extend the IndexRecord class and create custom behavior as needed on a driver-by-driver basis.

One further detail helps make things more clear: some snippets make little sense out of context, so searches.ini contains a [Snippet_Captions] section where Solr fields can be assigned labels that will be used as captions in front of snippets. Snippets for fields not listed in this section will display as stand-alone, uncaptioned lines in the search results.


Highlighting and snippets really aren’t too difficult to work with, but as with almost anything, they turn out to be a little more complicated than expected once you look at all of the details. I hope this post has helped point out the most obvious pitfalls and explain the reasoning behind VuFind’s implementation. There is still plenty more that could be done — some of the behavior could be made even smarter, and more of Solr’s power could be exposed through VuFind configuration settings. If you have ideas or questions, please feel free to share them as comments on this post or via the vufind-tech mailing list.


Welcome to the Villanova Library Technology Blog

  • Posted by: Demian Katz
  • Posted Date: March 22, 2011
  • Filed Under: Uncategorized

Since I often read and enjoy Jonathan Rochkind’s blog, where he goes into great detail about the complexities of life as a library programmer, I was pleased when he asked me to write a bit about some of the new features in VuFind 1.1.  That post will be coming up shortly.  In the meantime, thank you, Jonathan, for prompting the creation of this blog.  I hope this will become a useful resource for keeping up with the latest developments from Villanova’s library technology team and that the information here will be interesting and informative whether or not you use our software.  Stay tuned for periodic posts about how we have approached various problems during the course of our work on VuFind, the forthcoming VuDL digital library package, and other library-related technologies.

« Previous Page


Last Modified: March 22, 2011