Digital Library upgrade provides enhanced discovery

Posted by: Joanne Quinn
Posted Date: February 18, 2014
Filed Under: Blue Electrode: Sparking between Silicon and Paper
Tags: Open Access Repository, Software, Villanova University, vufind

Villanova University’s Digital Library has recently upgraded its discovery interface, introducing a more detailed search experience. This is the first major upgrade to the application’s structure since it was introduced a year ago, when the Digital Library migrated to a Fedora-Commons repository and debuted a public interface built on the open source faceted search engine VuFind.

Part 1 – Modeling the Repository

First we will discuss the system’s architecture and components. Fedora (Flexible Extensible Digital Object Repository Architecture) provides the core architecture and services necessary for digital preservation, all accessible through a well-defined Application Programming Interface (API). It also provides numerous support services to facilitate harvesting, fixity checking, and messaging, and it supports the Resource Description Framework (RDF) by including the Mulgara triple store.

Figure 1

It is through these RDF semantic descriptions that Fedora models the relationships between the objects within the repository. An object’s RDF description contains declarative information about what kind of object it is. In our case we created one top-level model (CoreModel) that describes attributes common to all objects (thumbnails, metadata, licensing information) and two second-level models that represent the basic shapes in the repository (Collections and Data). Collections represent groups of objects, and Data objects represent the actual content being stored. (See Figure 1)

Figure 2

From here we further extrapolated these two models into specific types. Collections can be either Folders or Resources and Data objects can be Images, Audio files, Documents, etc. (See Figure 2)

Figure 3

Another important component found within the RDF description is the object’s relationship to other objects. It is this relationship that organizes Resources with their Parent Folder, and book pages within their parent Resource. (See Figure 3)

Look at the following RDF description for our Cuala Press Collection. It contains two “hasModel” relationships stating that it is both a Collection and a Folder (Fedora does not support inheritance, so we use a mixin approach instead). Note also the one “isMemberOf” relationship referencing vudl:3, the top-level collection of the Digital Library.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:fedora="info:fedora/fedora-system:def/model#" xmlns:rel="info:fedora/fedora-system:def/relations-external#">
  <rdf:Description rdf:about="info:fedora/vudl:2001">
    <fedora:hasModel rdf:resource="info:fedora/vudl-system:CollectionModel"/>
    <fedora:hasModel rdf:resource="info:fedora/vudl-system:FolderCollection"/>
    <rel:isMemberOf rdf:resource="info:fedora/vudl:3"/>
  </rdf:Description>
</rdf:RDF>
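
To make the structure of this description concrete, here is a minimal Python sketch that reads the Cuala Press RDF shown above and pulls out its models and parent. It uses only the standard library; the function names (strip_prefix, describe) are ours for illustration and are not part of Fedora or VuDL.

```python
import xml.etree.ElementTree as ET

# The Cuala Press description, copied from the post.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:fedora="info:fedora/fedora-system:def/model#" xmlns:rel="info:fedora/fedora-system:def/relations-external#">
  <rdf:Description rdf:about="info:fedora/vudl:2001">
    <fedora:hasModel rdf:resource="info:fedora/vudl-system:CollectionModel"/>
    <fedora:hasModel rdf:resource="info:fedora/vudl-system:FolderCollection"/>
    <rel:isMemberOf rdf:resource="info:fedora/vudl:3"/>
  </rdf:Description>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "fedora": "info:fedora/fedora-system:def/model#",
    "rel": "info:fedora/fedora-system:def/relations-external#",
}
ABOUT = "{%s}about" % NS["rdf"]
RESOURCE = "{%s}resource" % NS["rdf"]
PREFIX = "info:fedora/"


def strip_prefix(uri):
    """Turn 'info:fedora/vudl:3' into the shorter 'vudl:3'."""
    return uri[len(PREFIX):] if uri.startswith(PREFIX) else uri


def describe(rdf_xml):
    """Return the object's id, its models, and its parent collections."""
    root = ET.fromstring(rdf_xml)
    desc = root.find("rdf:Description", NS)
    return {
        "pid": strip_prefix(desc.get(ABOUT)),
        "models": [strip_prefix(e.get(RESOURCE))
                   for e in desc.findall("fedora:hasModel", NS)],
        "parents": [strip_prefix(e.get(RESOURCE))
                    for e in desc.findall("rel:isMemberOf", NS)],
    }


info = describe(RDF_XML)
print(info)
```

Because the models come back as a list, an object that mixes in several models (here, Collection and Folder) needs no special handling.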

A more detailed explanation of this data model was presented at Open Repositories 2013 (see the abstract).

Part 2 – The Discovery Layer

Villanova’s Falvey Library is the focal point and lead development partner for VuFind, an Open Source search engine designed specifically around the discovery of bibliographic content. Its recently redesigned core provides a flexible model for searching and displaying our Digital Library, making it the perfect match for the public interface.

The backbone of VuFind is Apache Solr, a Java-based search engine. A simple explanation of how it works is that you put “records” into the Solr search index, each containing predefined fields (title, author, description, etc), and then the application can search through the contents of the index with high speed and efficiency.
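
The “records with predefined fields” idea can be sketched with a toy in-memory index. This is not Solr and not how Solr is implemented internally; it is a simplified Python illustration, and the ToyIndex class, the field list, and the sample titles are all assumptions made for the example.

```python
from collections import defaultdict


class ToyIndex:
    """A tiny stand-in for a search index with predefined fields."""

    def __init__(self, fields):
        self.fields = fields
        self.records = {}
        # Maps (field, term) to the set of record ids containing that term.
        self.postings = defaultdict(set)

    def add(self, record_id, record):
        """Store a record and index the words of each predefined field."""
        self.records[record_id] = record
        for field in self.fields:
            for term in record.get(field, "").lower().split():
                self.postings[(field, term)].add(record_id)

    def search(self, field, term):
        """Look up a single word in a single field."""
        return sorted(self.postings.get((field, term.lower()), set()))


index = ToyIndex(fields=["title", "author", "description"])
# Illustrative records; only the ids and collection names come from the post.
index.add("vudl:2001", {"title": "Cuala Press Collection"})
index.add("vudl:279438", {"title": "Buffalo Bill"})

print(index.search("title", "press"))
print(index.search("title", "bill"))
```

The lookup is fast because the work of splitting text into terms happens once, when a record is added, rather than on every search.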

Our initial index contained all Resources and Folders from the repository, which allows us to browse through collections by hierarchy and to run searches that return both Resources and Folders in the results.

Figure 4

An early enhancement to the browse module made available Collections that reside in multiple locations. For example, our Dime Novel collection contains sub-collections whose resources can exist in two places. (See Figure 4)

Look at the Buffalo Bill collection and notice how its breadcrumb trail denotes residency in multiple places. This is achieved by adding an additional “isMemberOf” relationship in its RDF description:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:fedora="info:fedora/fedora-system:def/model#" xmlns:rel="info:fedora/fedora-system:def/relations-external#">
  <rdf:Description rdf:about="info:fedora/vudl:279438">
    <fedora:hasModel rdf:resource="info:fedora/vudl-system:CollectionModel"/>
    <fedora:hasModel rdf:resource="info:fedora/vudl-system:FolderCollection"/>
    <rel:isMemberOf rdf:resource="info:fedora/vudl:280419"/>
    <rel:isMemberOf rdf:resource="info:fedora/vudl:280425"/>
  </rdf:Description>
</rdf:RDF>
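
One way to picture how two “isMemberOf” relationships turn into two breadcrumb trails is the short Python sketch below. The two parents of vudl:279438 come from the RDF above; the assumption that both parents sit directly under vudl:3 is ours, purely for illustration, and the function name is hypothetical.

```python
# Child id -> list of parent ids.
PARENTS = {
    "vudl:279438": ["vudl:280419", "vudl:280425"],  # from the RDF above
    "vudl:280419": ["vudl:3"],  # assumed for illustration
    "vudl:280425": ["vudl:3"],  # assumed for illustration
    "vudl:3": [],               # top-level collection
}


def breadcrumb_trails(pid, parents):
    """Return every root-to-object path, one per chain of parents.

    Assumes the membership graph has no cycles.
    """
    ups = parents.get(pid, [])
    if not ups:
        return [[pid]]
    trails = []
    for parent in ups:
        for trail in breadcrumb_trails(parent, parents):
            trails.append(trail + [pid])
    return trails


for trail in breadcrumb_trails("vudl:279438", PARENTS):
    print(" > ".join(trail))
```

An object with a single parent yields a single trail, so the same routine covers the ordinary case as well.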

Part 3 – The Upgrade

The existing search interface supports “full text” searching. We routinely perform Optical Character Recognition (OCR) on all scanned Resources using Google’s Tesseract application, storing the resulting text derivative in the accompanying Data object. When the parent Resource is ingested into Solr, a loop runs over all of the associated child Data objects, grabbing their OCR files and stuffing the text into the full text field for the Resource. This works, in that a search will match text from a particular page of the book and direct the patron to the parent Resource, but from there it is often difficult to determine which page matched the query.
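
The ingest loop described above can be sketched roughly as follows. This is a simplified Python illustration, not the actual VuDL or VuFind indexing code; the record layout, the field name "fulltext", and the sample page text are assumptions.

```python
def build_resource_record(resource, pages):
    """Fold the OCR text of every child page into one record for the parent."""
    fulltext_parts = []
    for page in pages:  # the loop over the child Data objects
        ocr_text = page.get("ocr", "")
        if ocr_text:
            fulltext_parts.append(ocr_text)
    return {
        "id": resource["id"],
        "title": resource["title"],
        "fulltext": "\n".join(fulltext_parts),
    }


# Hypothetical sample data.
resource = {"id": "resource:1", "title": "Example scanned book"}
pages = [
    {"id": "page:1", "ocr": "It was a dark and stormy night"},
    {"id": "page:2", "ocr": "The riders crossed the prairie"},
]

record = build_resource_record(resource, pages)
print(record["fulltext"])
```

A search for “prairie” will match this record, but nothing in it says that the word came from page:2, which is exactly the limitation described above.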

Figure 5

We solved this dilemma by including all Data objects in the Solr index. This allows specific pages to be searched in the catalog, leading users to the individual pages that match the query. The first obvious problem with this idea is that the search results would then be cluttered with individual pages rather than the more useful Folders and Resources. We ultimately overcame this by taking advantage of a newer feature in Solr called Field Collapsing, which allows the result set to be grouped by a particular field. (See Figure 5) In our case we group on the parent Resource, which allows us to display both the Resource in the result set and the page that was matched. (See Figure 6) A live example of this can be seen here.
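
As a rough sketch of what grouping looks like from the client side, the Python below builds a query string using Solr’s result grouping parameters (group, group.field, group.limit) and then mimics the grouping locally on a handful of made-up hits. The field name parent_resource_id is an assumption for illustration and is not necessarily the field our index uses.

```python
from urllib.parse import urlencode


def grouped_query(user_query, parent_field="parent_resource_id"):
    """Build a Solr query string that groups hits by their parent Resource."""
    return urlencode({
        "q": user_query,
        "group": "true",
        "group.field": parent_field,  # hypothetical field name
        "group.limit": 1,             # show one matching page per Resource
    })


def collapse(hits, parent_field="parent_resource_id"):
    """Local imitation of grouping: bucket page hits under their parent."""
    groups = {}
    for hit in hits:
        groups.setdefault(hit[parent_field], []).append(hit)
    return groups


# Made-up page-level hits for illustration.
hits = [
    {"id": "page:1", "parent_resource_id": "resource:1"},
    {"id": "page:7", "parent_resource_id": "resource:1"},
    {"id": "page:3", "parent_resource_id": "resource:2"},
]

print(grouped_query("prairie"))
print(collapse(hits))
```

Three page hits collapse into two result entries, one per Resource, each still carrying the ids of the pages that matched.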

Figure 6

We are pleased to make this available to the world, with the hopes that it will be helpful.

Happy searching…

Useful Links

The components of our infrastructure are all Open Source, freely available applications.

Fedora-Commons Repository
The backbone of the system

VuFind
The public Discovery interface

VuDL
The administration application used to ingest objects into Fedora

File Information Tool Set (FITS)
A file metadata extraction tool

Tesseract
An OCR engine

Responsive?

Posted by: Christopher Hallberg
Posted Date: December 11, 2013
Filed Under: Blue Electrode: Sparking between Silicon and Paper
Tags: Blue Electrode, development, Social Media, Software, Villanova Digital Collection, vufind

If you follow Villanova’s Digital Library on Twitter, you may have seen this tweet recently:

Check out our new responsive design (thanks to @crhallberg!): http://t.co/VmWL30gonx Play around & let us know what you think! #webdesign

— VillanovaDigitalLib (@VillanovaDigLib) December 5, 2013

Proud to say that the shout-out refers to me, Chris Hallberg, and I’m going into my third year of working on the front end of the Digital Library. That probably doesn’t mean much to you though, so let’s cut to the chase.

“Responsive”?

aka. what is Chris’ job?

That’s a fancy way of saying that the design of the website adapts to any size screen that it’s viewed on. This is an evolution from the design model of having two completely different websites to handle desktop users (e.g. Wolfram Alpha) and mobile users (Wolfram Alpha again, mobile edition). There are two major problems with that model: developers have to design, build, test, deploy, host, and update two separate sites; and some functionality is lost.

Faster browsers, faster Internet speeds, and updated web technologies allow web builders to create more powerful web pages than ever before. Web users know this, and they don’t want to settle. They demand the complicated features of the “full” website on their phones and more and more users and mobile browsers try to use their ever increasing phone sizes to look at the desktop version. If you’ve ever tried this, you’ll know a lot of full websites look terrible on phones. This is where responsive design saves the day.

The biggest problem with smaller screens is that features normally laid out horizontally, like this text and the navigation on the left, clobber each other when the real estate vanishes, or become so tiny that the crushed text within looks like something out of House of Leaves. Worst, in my opinion, is the horizontal scroll bar that turns your browser into a periscope in a vast, hidden field of content.

Figure (normal): Normal

Figure (squish): One word per line necessary to fit in these tiny columns

Figure (overlap): Clobber-ation

Figure (scroll): Bum bum bum

Responsive design actively reorganizes the page so that this doesn’t happen.

Figure (responsive): That’s better

“Play around”?

Here’s how to properly play with this blog to enjoy responsive design.

If this window is full screen, click the resize button (next to the close button) on this window so that you can see all the edges of the window. Now, drag the right edge of the window to the left, squeeze the window if you will. Come on. It’s ok, no one’s watching. If you don’t do it, the rest of this blog won’t make any sense. Thank you.

The first thing that will happen is that the navigation buttons above (called “pills”) will jump below the search bar. Then, the menus on the very top and below the search bar will collapse into buttons.

Pause a moment. You are entering the land of the Mobily-Sized Browser Window. We designed the new library and digital library web sites to reorganize themselves when you look at them on any screen the size of an iPad or smaller. In this case, the menu on the left (and up) is going to move on top of this blog post and fill the available space. We spare no expense! Just keep an eye on the search bar as you squeeze the window down as far as you’d like.

That’s responsive design at work.

What’s going on here?

In order to make websites look beautiful, developers use a language of rules called CSS, short for Cascading Style Sheets. It looks like this:

button { /* What are we applying the “rules” below to? */
   background: #002663; /* Villanova blue in code */
   color: white; /* Color of the text */
   border: 1px solid black;
   border-radius: 4px; /* Rounded corners! */
   height: 45px;
   width: 90px;
   margin: 3px; /* Distance from border to the next element */
   padding: 14px 4px; /* Distance from border to content */
} /* That’s enough rules for our buttons */

That code is more or less how we made the four pills next to the search bar look so pretty.

A year and a half ago, the powers that be added a new feature to CSS: media queries. Media queries can tell us all kinds of things about how you’re looking at our web pages. We can tell whether you’re running your browser on a screen, mobile device, TV, projector, screen reader, or even braille reader. They can also tell us if you’re holding your phone sideways or vertically, what colors it can display, and (most importantly) what the dimensions of your screen are. By putting code like our button example inside these queries, we can apply rules, like fonts, colors, backgrounds, and borders, to elements of the page depending on the context of the browser.

@media print { /* If we’re printing something */
  /* Hide ads and colorful content */
}

@media (max-width: 768px) { /* Anything the width of a vertical iPad or thinner */
  /* Show a special menu for mobile users */
}

Responsive designs are built right on top of this technology.

Browsers are really good at stacking things on top of each other. This paragraph is under the previous one. This makes sense and it’s quite easy on your eyes, and your computer. With a few CSS rules, we can tell the browser to put things next to each other. The trick is to put things next to each other, until it’s impractical to do so. Being able to tell your web site where to put things and how they look depending on how and where your user is looking at it is what responsive design is all about.
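
The browser makes this decision for us, but the logic behind the max-width: 768px query shown earlier can be modeled in a few lines of Python. This is only a sketch of the rule, not of how any browser is implemented, and the function names and layout labels are made up for the example.

```python
def matches_max_width(viewport_width_px, max_width_px=768):
    """A max-width query matches when the viewport is that wide or narrower."""
    return viewport_width_px <= max_width_px


def layout_for(viewport_width_px):
    """Put things next to each other until it is impractical to do so."""
    if matches_max_width(viewport_width_px):
        return "stacked"       # menu moves on top of the content
    return "side-by-side"      # menu sits to the left of the content


for width in (1280, 768, 320):
    print(width, layout_for(width))
```

Note that a viewport exactly 768 pixels wide still matches, which is why a vertical iPad gets the mobile layout.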

The Magician’s Secret

Before media queries were invented, developers had to write some pretty serious code. This code had to constantly watch the size of the screen and then, basically, rewrite the files where the CSS rules are kept. It was very complicated, which is why it made much more sense to create two completely different sites and route users to each depending on their “user agent,” a small snippet of information that your browser sends to a server when you open a page. The problem is, these bits of information were made for people to read for statistical reasons, so they are complicated and change every time a browser updates to a new version. It was a digital guessing game.
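
To show why user agent routing was a guessing game, here is a deliberately naive Python sketch. The hint list and the shortened user agent strings are illustrative assumptions, not a real detection library and not code we ever ran.

```python
# Substrings that a naive router might look for (hypothetical list).
MOBILE_HINTS = ("iphone", "android", "mobile")


def pick_site(user_agent):
    """Guess which version of the site to serve from the user agent string."""
    ua = user_agent.lower()
    return "mobile" if any(hint in ua for hint in MOBILE_HINTS) else "desktop"


# Shortened, illustrative user agent strings.
phone = "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) Mobile Safari"
laptop = "Mozilla/5.0 (Windows NT 6.1; WOW64) Chrome/31.0 Safari/537.36"
tablet = "Mozilla/5.0 (Linux; Android 4.4; Nexus 10) Chrome/31.0 Safari/537.36"

print(pick_site(phone))
print(pick_site(laptop))
print(pick_site(tablet))
```

The large tablet is sent to the mobile site simply because its string contains “Android”, even though its screen could comfortably show the full site. Every new device or browser version risks another wrong guess, which is the problem media queries sidestep by asking about the screen itself.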

Some of the people behind Twitter decided to make a framework that web developers could build on. Instead of starting from scratch, developers could start with their collection of code and CSS that pre-made a lot of common elements of web sites like tabs, accordions, and toolbars, for them. They called it Bootstrap. In 2011, they added responsive design, making it easy for developers to create a site that looked good on any device. In 2012, a graduate assistant named Chris Hallberg was charged with rebuilding the Digital Library front end. In 2013, he, along with web developers all over campus, made Villanova’s web presence responsive. Without this framework, creating a responsive site would have taken much, much longer, and possibly wouldn’t have occurred at all. Not only was it an essential tool to the process, it is a broadcasting platform for the technology. Bootstrap makes responsive design possible and popular.

A Final Word

While I did the work you see over at the Digital Library, I did not create the page you are looking at. I can only take credit for the menu on the left, which I’m clearly very fond of. David Uspal was the magician who conjured this page’s design and David Lacy is the magician behind-the-scenes, organizing and delivering the thousands of books containing the tens of thousands of images we’ve scanned. We both received invaluable input from the Falvey Web Team and even viewers like you. Your feedback helped and continues to help us fix errors and typos, and (most importantly) pick the colors for our pretty new web site.

Enjoy!
– Chris Hallberg

PS. A fun example of the new power of the web is Google Gravity from the gallery of Chrome Experiments.
PPS. As a reward to offset the new habit you’ve developed of resizing every window you find, here’s an accordion to play.

New Digital Library Front End

Posted by: Joanne Quinn
Posted Date: March 7, 2013
Filed Under: Blue Electrode: Sparking between Silicon and Paper
Tags: Software

The Falvey Library is pleased to announce the launch of our new Digital Library interface.

The new interface features a JavaScript-only page zoom, faster hierarchical browsing, and enhanced searching that includes both item and collection descriptions in the results.

The public front end is built on VuFind 2.0, which has not yet been officially released, but is available for testing here. The backend is running the latest beta version of VuDL (release spring 2013), which has been re-architected to use a Fedora-Commons repository.

A more detailed article describing the new Fedora-Commons data model and Solr integration is forthcoming.

For now, we encourage you to explore this new site, and to provide any feedback to us directly.

VuFind tools and search transform the Digital Library experience

Posted by: Michael Foight
Posted Date: January 24, 2011
Filed Under: Blue Electrode: Sparking between Silicon and Paper
Tags: Software, vufind

Posted for Demian Katz, Library Tech Development Specialist:

As announced on January 20th, Villanova Digital Library content is now fully searchable through the library’s VuFind software. You can perform a search using the box at the top of the page at http://digital.library.villanova.edu/.

In addition to helping you find documents more easily, VuFind offers several other useful features to anyone with a Villanova University login:

* You can build lists of favorites and add tags in order to more easily revisit records in the future or share them with others.
* You can post comments if you want to add notes to a record.

Even without a VU account, you can still take advantage of some new features:

* You can text or email records to friends (or to yourself, for future reference).
* Suggested MLA and APA citations are provided (although you may need to make manual adjustments for some records).

This is only the beginning of the melding of VuFind and the Digital Library. Over the coming months, more features will be added and the experience will become even more seamless, allowing faster access to content with fewer clicks. Stay tuned, and feel free to ask questions and offer suggestions in the meantime. Feedback is always welcome as we work to continually improve our software.

New Digital Library Administration Software

Posted by: Joanne Quinn
Posted Date: December 9, 2010
Filed Under: Blue Electrode: Sparking between Silicon and Paper
Tags: Open Access Repository, Software, Transcription

Falvey’s Digital Library has just been upgraded with new backend software that will improve our ability to keep growing and improving the online collection. The Digital Library’s first incarnation was launched in August 2006. Over the course of four years, the DL’s collection grew to over 9,000 items, and we accumulated a substantial wish-list of software functionality:

* Add support for more file formats, so our collection can include a broader range of materials.
* Incorporate an OCR process to facilitate full-text searching of collection content.
* Add support for inclusion of transcriptions with hand-written materials.

Our initial software used a variety of technologies to achieve its goal of storing information about digital documents. Unfortunately, not all of these tools worked well together. While the new version of the software retains the METS metadata format and eXist-db XML database, it replaces nearly all of the other components with a suite of more closely-related technologies. The new, all-XML, all-Open-Source framework consists of the following components:

* METS XML schema – Library of Congress standard for describing digital objects
* eXist-db – Native XML Database
* Orbeon Forms – Java-based XForms engine
* Tesseract – OCR Engine
* VuFind – Online Public Access Catalog

New Key Features:

Root level Document Attachment

Catalogers now have the ability to add document-level items to each object. The most relevant use of this feature is to attach a hand-transcribed, fully annotated companion document to a digitally scanned book. More information on this feature can be found here and a live example can be found by viewing the Lane Manuscript.

AJAX-based metadata editor

The Orbeon Forms Java-based XForms engine integrates with the YUI JavaScript Library, providing a rich user interface for metadata editing.

Document layout and file attachment configurations

The system incorporates a batch-attach routine for adding multiple files (in our case the pages of a scanned book) to a digital object as a single operation. An interface is available to customize the arrangement and location of these files, as well as adding and deleting files when appropriate.

OAI harvestable

OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting) is a standard for serving and harvesting metadata. The Digital Library is now fully harvestable using this standard.
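
For readers who have not seen a harvesting request, the Python sketch below builds OAI-PMH request URLs from the protocol’s six standard verbs. The base URL is a placeholder, not the Digital Library’s actual endpoint, and the helper name is ours.

```python
from urllib.parse import urlencode

# The six request verbs defined by the OAI-PMH protocol.
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    if verb not in VERBS:
        raise ValueError("unknown OAI-PMH verb: %s" % verb)
    return base_url + "?" + urlencode({"verb": verb, **arguments})


BASE_URL = "https://example.org/oai"  # placeholder endpoint

# Ask for all records in unqualified Dublin Core (oai_dc).
print(oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
# Ask the repository to describe itself.
print(oai_request_url(BASE_URL, "Identify"))
```

A harvester would fetch these URLs and parse the XML responses; that part is left out here.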

In the coming months we will extend the software to include custom drivers for a VuFind front end and modularize the metadata editor to support a wide range of options, including Dublin Core, MODS, EAD, and PREMIS for preservation metadata.

Our plan is to launch the software as a simple, open-source platform for preservation and presentation of digital collections. So stay tuned! We are targeting April 2011 for the Beta Release.

We are always looking for development partners! If you are interested, please contact us at digitallibrary@villanova.edu

Transcriptions brought to life online

Posted by: Michael Foight
Posted Date: November 19, 2010
Filed Under: Blue Electrode: Sparking between Silicon and Paper
Tags: Software, Transcription

A newly released and long awaited feature in the digital library software enables the display of transcribed content. When transcribed content is present in a digital library object, a new tab, designated Docs, is displayed. This tab will present the transcribed content as readable and downloadable files. Just click on the thumbnail icon of the file type. Most transcribed content is available in both smart-PDF and Word formats. We are very proud to bring this new feature to you!

Handwritten content is often difficult to decipher and, when digitized, not conducive to OCR. In parallel to the scanning of heritage materials, individual volunteers, staff, and students have been transcribing the writings from the past. Individual transcribers’ names are included. Now these letters and diaries can be read easily online as part of Villanova University’s Digital Library.

The first transcription to be included in the online collection is the Lane Manuscript. This contains the autobiographical manuscript of Samuel Alanson Lane (1815-1905). From January until May of 1835, Lane traveled around the U.S., looking for work in numerous cities, including New Orleans, Cincinnati, Columbus, and Cleveland, until finally settling in what would become his hometown, Akron, OH, on June 29, 1835. S. A. Lane was a dedicated follower and professional lecturer of the American temperance movement as well as an avid supporter and political participant for the Republican Party, formed in 1854. Perhaps Lane’s most interesting and daring pursuit was his active participation, like many other easterners, in the mass emigration to California in search of fortune during the California Gold Rush, which kept Lane from his home and family in Akron for over two years. This manuscript covers his life and contains many depictions of 19th century American frontier life. An exhibit featuring the life and times of Samuel Lane is also available online.

While only a few transcriptions are online at present, over the coming weeks and months much new transcribed content will be available to delight and fascinate. On the technical front, we are quickly working to make these materials discoverable with keyword searching. Printed texts that have been OCRed will also have enhanced findability.

I would be remiss if I did not acknowledge all who have toiled many long hours over these often cryptic documents filled with fragmentary words and sentences. Thank you! In addition, much hard labor also went into the software enhancements that make such content available, so out of the many individuals involved, I would especially like to thank David Lacy for his hard work in bringing the best to our digital library software!

If you are interested in helping to bring historical materials alive, please consider volunteering. Just reach out and email us at: digitallibrary@villanova.edu
