FALVEY MEMORIAL LIBRARY



Digital Library upgrade provides enhanced discovery

Villanova University’s Digital Library has recently upgraded its discovery interface, introducing a more detailed search experience. This is the first major upgrade to the application’s structure since it was introduced a year ago, when the Digital Library migrated to a Fedora-Commons Repository and debuted a public interface built on the Open Source faceted search engine VuFind.

Part 1 – Modeling the Repository

First we will discuss the system architecture and components. Fedora (Flexible Extensible Digital Object Repository Architecture) provides the core architecture and services necessary for digital preservation, all accessible through a well-defined Application Programming Interface (API). It also provides numerous support services to facilitate harvesting, fixity, and messaging, and it supports the Resource Description Framework (RDF) by including the Mulgara triple store.

Figure 1

It is through these RDF semantic descriptions that Fedora models the relationships between the objects within the repository. An object’s RDF description contains declarative information regarding what kind of object it is. In our case we created one top-level model (CoreModel) that describes attributes common to all objects (thumbnails, metadata, licensing information) and two second-level models that represent the basic shapes in the repository (Collections and Data). Collections represent groups of objects, and Data objects represent the actual content being stored. (See Figure 1)

Figure 2

From here we further extrapolated these two models into specific types. Collections can be either Folders or Resources and Data objects can be Images, Audio files, Documents, etc. (See Figure 2)

Figure 3

Another important component found within the RDF description is the object’s relationship to other objects. It is this relationship that organizes Resources with their Parent Folder, and book pages within their parent Resource. (See Figure 3)

Look at the following RDF description for our Cuala Press Collection. You can see that it contains two “hasModel” relationships stating that it is both a Collection and a Folder (Fedora does not support inheritance, favoring a mixin approach instead). Note also the one “isMemberOf” relationship referencing vudl:3, the top-level collection of the Digital Library.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:fedora="info:fedora/fedora-system:def/model#" xmlns:rel="info:fedora/fedora-system:def/relations-external#">
  <rdf:Description rdf:about="info:fedora/vudl:2001">
    <fedora:hasModel rdf:resource="info:fedora/vudl-system:CollectionModel"/>
    <fedora:hasModel rdf:resource="info:fedora/vudl-system:FolderCollection"/>
    <rel:isMemberOf rdf:resource="info:fedora/vudl:3"/>
  </rdf:Description>
</rdf:RDF>
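
As an illustration of how an application might read these relationships, here is a minimal Python sketch that parses the RDF description above using only the standard library. The function and variable names (`read_relationships`, `strip_prefix`, `CUALA_RDF`) are invented for this example and are not part of Fedora or VuFind; the namespaces and identifiers are copied from the description itself.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the RDF description above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "fedora": "info:fedora/fedora-system:def/model#",
    "rel": "info:fedora/fedora-system:def/relations-external#",
}
ABOUT = "{%s}about" % NS["rdf"]
RESOURCE = "{%s}resource" % NS["rdf"]
PREFIX = "info:fedora/"

# The Cuala Press description, reproduced verbatim.
CUALA_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:fedora="info:fedora/fedora-system:def/model#" xmlns:rel="info:fedora/fedora-system:def/relations-external#">
  <rdf:Description rdf:about="info:fedora/vudl:2001">
    <fedora:hasModel rdf:resource="info:fedora/vudl-system:CollectionModel"/>
    <fedora:hasModel rdf:resource="info:fedora/vudl-system:FolderCollection"/>
    <rel:isMemberOf rdf:resource="info:fedora/vudl:3"/>
  </rdf:Description>
</rdf:RDF>
"""


def strip_prefix(uri):
    """Turn 'info:fedora/vudl:3' into the shorter 'vudl:3'."""
    return uri[len(PREFIX):] if uri.startswith(PREFIX) else uri


def read_relationships(rdf_xml):
    """Collect the object id, its models, and its parents from one description."""
    root = ET.fromstring(rdf_xml)
    desc = root.find("rdf:Description", NS)
    return {
        "pid": strip_prefix(desc.get(ABOUT)),
        "models": [strip_prefix(e.get(RESOURCE))
                   for e in desc.findall("fedora:hasModel", NS)],
        "parents": [strip_prefix(e.get(RESOURCE))
                    for e in desc.findall("rel:isMemberOf", NS)],
    }


info = read_relationships(CUALA_RDF)
print(info["pid"], info["models"], info["parents"])
```

Because the models come back as a plain list, an object that mixes in several models needs no special handling: the caller simply checks whether a given model is in the list.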

A more detailed explanation of this data model was presented at Open Repositories 2013.

Part 2 – The Discovery Layer

Villanova’s Falvey Library is the focal point and lead development partner for VuFind, an Open Source search engine designed specifically around the discovery of bibliographic content. Its recently redesigned core provides a flexible model for searching and displaying our Digital Library, making it the perfect match for the public interface.

The backbone of VuFind is Apache Solr, a Java-based search engine. A simple explanation of how it works is that you put “records” into the Solr search index, each containing predefined fields (title, author, description, etc), and then the application can search through the contents of the index with high speed and efficiency.

Our initial index contained all Resources and Folders from the repository, which allows us to browse through collections by hierarchy and to run searches that return both Resources and Folders in the results.

Figure 4

An early enhancement to the browse module added support for Collections that reside in multiple locations. For example, our Dime Novel collection contains sub-collections whose resources can exist in two places. (See Figure 4)
Look at the Buffalo Bill collection and notice how its breadcrumb trail denotes residency in multiple places. This is achieved by adding an additional “isMemberOf” relationship in its RDF description:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:fedora="info:fedora/fedora-system:def/model#" xmlns:rel="info:fedora/fedora-system:def/relations-external#">
  <rdf:Description rdf:about="info:fedora/vudl:279438">
    <fedora:hasModel rdf:resource="info:fedora/vudl-system:CollectionModel"/>
    <fedora:hasModel rdf:resource="info:fedora/vudl-system:FolderCollection"/>
    <rel:isMemberOf rdf:resource="info:fedora/vudl:280419"/>
    <rel:isMemberOf rdf:resource="info:fedora/vudl:280425"/>
  </rdf:Description>
</rdf:RDF>
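
To show how multiple “isMemberOf” relationships can yield multiple breadcrumb trails, here is a small Python sketch. It is not VuFind’s actual code. The two parents of vudl:279438 are taken from the description above, but the rest of the hierarchy (both parents sitting directly under vudl:3) is a placeholder assumption made only so the example can run.

```python
# Child -> parents map. Only the first entry comes from the RDF above;
# the parents of vudl:280419 and vudl:280425 are hypothetical placeholders.
MEMBER_OF = {
    "vudl:279438": ["vudl:280419", "vudl:280425"],
    "vudl:280419": ["vudl:3"],  # assumption for illustration
    "vudl:280425": ["vudl:3"],  # assumption for illustration
    "vudl:3": [],               # top-level collection has no parent
}


def breadcrumb_trails(pid, member_of):
    """Return every path from a top-level object down to pid.

    Assumes the membership graph has no cycles.
    """
    parents = member_of.get(pid, [])
    if not parents:
        return [[pid]]
    trails = []
    for parent in parents:
        for trail in breadcrumb_trails(parent, member_of):
            trails.append(trail + [pid])
    return trails


trails = breadcrumb_trails("vudl:279438", MEMBER_OF)
for trail in trails:
    print(" > ".join(trail))
```

Each extra “isMemberOf” relationship adds one more branch to follow, so an object with two parents gets two trails without any change to the object itself.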

Part 3 – The Upgrade

The existing search interface supports “full text” searching. We routinely perform Optical Character Recognition (OCR) on all scanned Resources using Google’s Tesseract application, storing this derivative in the accompanying Data object. When the parent Resource is ingested into Solr, a loop is performed over all of the associated child Data objects, grabbing each OCR file and adding its contents to the full text field for the Resource. This works, in that a search matching any page of the book directs the patron to the parent Resource, but from there it is often difficult to determine which page matched the query.
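
The indexing loop described above could be sketched roughly as follows. This is a simplified illustration, not the actual ingest code: the field names (`id`, `title`, `fulltext`), the `ocr_text` key, and the sample identifiers and page text are all invented for the example.

```python
def build_resource_document(resource, pages):
    """Build one index record for a Resource, folding in its pages' OCR text."""
    # Concatenate the OCR text of every child page that has any.
    fulltext = " ".join(p["ocr_text"] for p in pages if p.get("ocr_text"))
    return {
        "id": resource["pid"],
        "title": resource["title"],
        "fulltext": fulltext,
    }


# Hypothetical sample data standing in for a scanned book and its pages.
resource = {"pid": "example:1", "title": "Example Book"}
pages = [
    {"pid": "example:2", "ocr_text": "It was a dark and stormy night."},
    {"pid": "example:3", "ocr_text": "The rider crossed the river at dawn."},
    {"pid": "example:4", "ocr_text": ""},  # a page where OCR found nothing
]

doc = build_resource_document(resource, pages)
matches = "river" in doc["fulltext"].lower()
print(doc["id"], matches)
```

The record matches a search for “river”, but nothing in it says that the word came from the second page, which is exactly the limitation noted above.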

Figure 5

We solved this dilemma by including all Data objects in the Solr index. This allows specific pages to be searched in the catalog, leading users to the individual pages that match the query. The first obvious problem with this idea is that the search results would then be cluttered with individual pages rather than the more useful Folders and Resources. This was ultimately overcome by taking advantage of a newer feature in Solr called Field Collapsing, which allows the result set to be grouped by a particular field in Solr. (See Figure 5) In our case we group on the parent Resource, which allows us to display the Resource in the result set along with the page that was matched. (See Figure 6) A live example of this can be seen here.

Figure 6
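
For readers who want to experiment with Field Collapsing, the Python sketch below builds a grouped query string and then mimics the grouping locally on a few sample hits. The `group`, `group.field`, and `group.limit` parameters are Solr’s standard result grouping parameters; the field names `fulltext` and `parent_resource_id` and the sample identifiers are assumptions made for this example and may not match the fields in our index.

```python
from urllib.parse import urlencode


def grouped_query_params(user_query, group_field="parent_resource_id"):
    """Request parameters asking Solr to group hits by one field."""
    return {
        "q": user_query,
        "group": "true",             # turn result grouping on
        "group.field": group_field,  # hypothetical field holding the parent id
        "group.limit": 3,            # show at most 3 matching pages per group
    }


def collapse(hits, group_field="parent_resource_id"):
    """Local stand-in for what grouping does: bucket page hits by parent."""
    groups = {}
    for hit in hits:
        groups.setdefault(hit[group_field], []).append(hit["id"])
    return groups


query_string = urlencode(grouped_query_params("fulltext:river"))
print(query_string)

# Hypothetical page-level hits, each pointing at its parent Resource.
hits = [
    {"id": "example:2", "parent_resource_id": "example:1"},
    {"id": "example:3", "parent_resource_id": "example:1"},
    {"id": "example:9", "parent_resource_id": "example:8"},
]
groups = collapse(hits)
print(groups)
```

Three page-level hits collapse into two result entries, one per parent Resource, while the list under each entry still records which pages matched.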

We are pleased to make this available to the world, in the hope that it will be helpful.

Happy searching…

Useful Links

The components of our infrastructure are all Open Source, freely available applications.

Fedora-Commons Repository
The backbone of the system

VuFind
The public Discovery interface

VuDL
The admin interface used to ingest objects into Fedora

File Information Tool Set (FITS)
A file metadata extraction tool

Tesseract
An OCR engine

eBook available: A Book of Bryn Mawr Stories

Another title has been added to Project Gutenberg thanks to the efforts of Villanova’s Digital Library. The latest release is A Book of Bryn Mawr Stories, which, as the title suggests, features short fiction about our neighbor, Bryn Mawr College.

The collection was released in 1901, and it serves as an affectionate tribute to a beloved school, a record of traditions that still live on today, and a time capsule of attitudes about college-educated women.  It also contains a few brief references to “Villa Nova” and other spots in the region, just in case you are keeping score.

Putting aside local and historical significance, the book is more of a mixed bag — there aren’t a lot of stories in here that make a lasting impression on the reader who does not have a personal connection to Bryn Mawr, though there are at least a couple of tales that work reasonably well in their own right.  Given that most people aren’t approaching an obscure 1901 short story collection solely for its literary merit, these above-average tales serve as a nice little bonus.

The book can be read online or downloaded in a variety of popular eBook formats through Project Gutenberg.

Villanova-Inspired Novel Available for Proofreading

In 1907, Josephine Culpeper published Bolax, Imp or Angel–Which?, a novel set in part at a fictionalized version of Villanova. This book has become the latest title to be selected as part of the Digital Library’s collaboration with the Distributed Proofreaders project. Please visit the project page if you would like to help turn this bit of university history into a full-fledged eBook. If you are unfamiliar with the Distributed Proofreaders effort, see Proofreading the Digital Library for an introduction.

Villanova history comes alive in the pages of The Villanovan

Falvey Memorial Library recently completed a major digitization project to make available online all 1,713 issues of the campus newspaper, The Villanovan, published between 1893 and 1995. On Feb. 23, the Library hosted a program to celebrate this accomplishment. The celebration was dedicated to the memory of longtime Villanovan faculty adviser, June Lytel-Murphy.

The program began with introductory remarks by University Librarian Joseph Lucia and University President the Rev. Peter M. Donohue, OSA, PhD, ’75 A&S, who characterized the project as a history of “the voice of the student body.” Special Collections and Digital Library Coordinator Michael Foight, Library Technology Development Specialist Demian Katz and Research Support Librarian Susan Ottignon each addressed various aspects of the project.

Prior to 2011, The Villanovan was available only through bound volumes of issues or microfilm—neither providing an especially pleasurable experience for casual perusal….

The above paragraphs were excerpted from David Burke’s article about the event on the main library news blog. Click here to read his full article.

Since the event, we’ve seen a huge increase in use of this collection. Michael Foight reported that we had a record 1009 unique visitors to the Digital Library in the week following the event and most of those visitors were browsing the Villanovan collection.

We’ve written about the Villanovan digitization project previously. Michael Foight wrote about the initial phase of this digitization effort in December 2008. Cathleen Lu, Digital Library Intern in Fall 2010, wrote about some of the more eye-catching advertisements she found in the papers while working to improve the PDF files. And last year Laura Bang wrote about the 10,000th item to be added to the Digital Library, which happened to be the April 4, 1944 issue of the Villanovan.

These papers provide a fascinating look at not just the University’s history, but also the historical context around the University and how world events affected life at Villanova. Take a look and see what you discover!

Chaos Unveiled: New Exhibit on the Origins of Villanova University

Posted for: Karla Irwin, Villanova University.

When I was presented with the opportunity to curate an online exhibition as the Fall 2011 Digital Library Intern, I jumped at the chance. Through the course of my internship I had grown more familiar with the wealth of materials in the Digital Library and I was eager to explore one area in particular: materials related to rioting that occurred in Philadelphia in 1844. Before seeing the items I knew nothing about the riots, which was surprising to me because I had grown up in the area and lived in Philadelphia for a number of years. After conducting a little more research I was amazed at the history of the riots and wondered how many people in the area were, like me, unaware that the riots had happened. I thought the story of the riots was an important one to share, and now it is my pleasure to present to you Chaos in the Streets: The Philadelphia Riots of 1844.

Philadelphia in 1844 was a hotbed of religious and ethnic prejudice, most notably toward Catholics and the Irish. This was representative of a national sentiment, and the exhibition looks at a group called the Nativists, who later became the Know Nothing Party, and their role in the rioting. In May and July of 1844 these issues came to a breaking point and the city of Philadelphia saw some of the most violent days in its history. The riots would ultimately have many lasting effects and it can be said that the Philadelphia you see today is partially a result of those violent days.

The Digital Library provides access to quite a large collection relating to the riots, including a collection of letters from Morton McMichael, who was the sheriff at the time. His letters and personal journal provide a first-hand account of what it was like on the streets of Philadelphia in the mid-1840s. Only a small portion of his entire collection is utilized in the exhibit, so I recommend taking a longer look at the letters, as they offer a fascinating window into policing in Philadelphia during that time.

There was no shortage of interesting material on the riots, but one aspect that proved especially dramatic to me was the role the Catholic churches had in the rioting, particularly St. Augustine’s Church. I had visited the church many years ago in the Old City section of Philadelphia and walked by it countless times. What I did not know is that the St. Augustine’s I see today was rebuilt after the original burned down during the rioting. Sadly, along with the burning of the church, a library containing an invaluable collection of theological materials was also destroyed. Imagine my amazement when I found out some of the books from that library ended up in Special Collections in Falvey Library! You will find in the exhibition how the Augustinian community in Philadelphia put major roots down in both center city Philadelphia and, of course, Villanova University. I hope you find the connection, and how it relates to the riots, as interesting as I do.

Finally, I would like to thank Michael Foight and Laura Bang for their valuable guidance, Joanne Quinn for the graphics, Susan Connor, Susan Ottignon, and Chelsea Payne for their informative transcription work, and David Lacy for his work on technical details. Without them the exhibition would never have come to fruition.


Last Modified: December 8, 2011