Falvey Memorial Library

Changing the Subject with VuFind

Background

Recent discussions related to the documentary, Changing the Subject, raised an interesting technical question: what should you do if your local needs come into conflict with national practices for describing a particular subject?

While Changing the Subject specifically addresses the conflict between using the terms “Illegal aliens” vs. “Undocumented immigrants,” we took this conversation as an opportunity to reduce user confusion over a whole host of terms using the word “Alien” to mean something other than “Extraterrestrial.”

The challenge, of course, is that this is not a problem that can be easily solved by editing records in our Integrated Library System. Not only are the tools for changing headings fairly difficult to use, but new records are also constantly being loaded into the system as we acquire new items, making maintenance an ongoing headache.

Enter VuFind: since we have an open source discovery layer to allow users to search our collection, and all of the records in our Integrated Library System are automatically loaded into VuFind through a software process, this gives us an opportunity to introduce some data transformations. By solving the problem once, we can introduce a system that will automatically keep the problem solved over time, without any ongoing record-editing maintenance.

Solution: Part 1 – Indexing Rules

The first part of the solution is to introduce some mapping into our MARC record indexing rules. We ended up adding these lines to our marc_local.properties file in VuFind’s local import directory:

topic_facet = 600x:610x:611x:630x:648x:650a:650x:651x:655x, (pattern_map.aliens)
topic = custom, getAllSubfields(600:610:611:630:650:653:656, " "), (pattern_map.aliens2)
pattern_map.aliens.pattern_0 = ^Alien criminal(.*)=>Noncitizen criminal$1
pattern_map.aliens.pattern_1 = ^Alien detention centers(.*)=>Detention centers$1
pattern_map.aliens.pattern_2 = ^Alien labor(.*)=>Noncitizen labor$1
pattern_map.aliens.pattern_3 = ^Alien property(.*)=>Foreign-owned property$1
pattern_map.aliens.pattern_4 = ^Aliens(.*)=>Noncitizens$1
pattern_map.aliens.pattern_5 = ^Children of alien laborers(.*)=>Children of noncitizen laborers$1
pattern_map.aliens.pattern_6 = ^Illegal alien children(.*)=>Undocumented immigrant children$1
pattern_map.aliens.pattern_7 = ^Illegal aliens(.*)=>Undocumented immigrants$1
pattern_map.aliens.pattern_8 = ^Children of illegal aliens(.*)=>Children of undocumented immigrants$1
pattern_map.aliens.pattern_9 = ^Women illegal aliens(.*)=>Women undocumented immigrants$1
pattern_map.aliens.pattern_10 = keepRaw
pattern_map.aliens2.pattern_0 = ^Alien criminal(.*)=>Noncitizen criminal$1
pattern_map.aliens2.pattern_1 = ^Alien detention centers(.*)=>Detention centers$1
pattern_map.aliens2.pattern_2 = ^Alien labor(.*)=>Noncitizen labor$1
pattern_map.aliens2.pattern_3 = ^Alien property(.*)=>Foreign-owned property$1
pattern_map.aliens2.pattern_4 = ^Aliens(.*)=>Noncitizens$1
pattern_map.aliens2.pattern_5 = ^Children of alien laborers(.*)=>Children of noncitizen laborers$1
pattern_map.aliens2.pattern_6 = ^Illegal alien children(.*)=>Undocumented immigrant children$1
pattern_map.aliens2.pattern_7 = ^Illegal aliens(.*)=>Undocumented immigrants$1
pattern_map.aliens2.pattern_8 = ^Children of illegal aliens(.*)=>Children of undocumented immigrants$1
pattern_map.aliens2.pattern_9 = ^Women illegal aliens(.*)=>Women undocumented immigrants$1
pattern_map.aliens2.pattern_10 = (.*)=>$1

This uses two very similar, but subtly different, pattern maps to translate terminology going to the topic_facet and topic fields.

The key difference is in pattern_10 of the two maps. For the “aliens” map, we use the “keepRaw” rule. This means that we translate headings when they match one of the preceding patterns, and we keep them in an unmodified form when they don’t. Thus, with this map, none of the matching headings will ever get indexed in their original forms, only in the translated versions; headings that match no pattern pass through unchanged. This is because we do not want outdated terminology to display in search facet lists.

On the other hand, in the “aliens2” pattern, we use a regular expression to always index the original terminology IN ADDITION TO any translated terminology. Even though we have opinions about which terms should be displayed, users may still have expectations about the older terminology. By indexing both versions of the terms, we make sure that searches will work correctly no matter how the user formulates their query.
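
To make the difference between the two maps concrete, here is a minimal PHP sketch that models the behavior described above. It is not SolrMarc's implementation, only an approximation of the rules as we have explained them; the function name applyPatternMap() and the shortened pattern list are made up for illustration.

```php
<?php

/**
 * Simplified model of how a pattern map turns one heading into index values.
 * Hypothetical helper: it mirrors the behavior described in this post, not
 * SolrMarc's actual code.
 *
 * @param string $heading  Original subject heading
 * @param array  $patterns Regex => replacement pairs, tried in order
 * @param bool   $keepRaw  Keep the heading unchanged when nothing matches?
 *
 * @return string[] Distinct values that would be indexed
 */
function applyPatternMap(string $heading, array $patterns, bool $keepRaw): array
{
    $values = [];
    foreach ($patterns as $regex => $replacement) {
        // Every matching pattern contributes one translated value.
        if (preg_match($regex, $heading)) {
            $values[] = preg_replace($regex, $replacement, $heading, 1);
        }
    }
    // keepRaw only applies when no pattern matched at all.
    if ($keepRaw && empty($values)) {
        $values[] = $heading;
    }
    return array_values(array_unique($values));
}

// Shortened, illustrative subset of the rules shown above.
$translations = [
    '/^Illegal aliens(.*)/' => 'Undocumented immigrants$1',
    '/^Aliens(.*)/' => 'Noncitizens$1',
];

// Models "aliens": translations plus keepRaw.
$facetMap = $translations;
// Models "aliens2": translations plus a catch-all that re-adds the original.
$searchMap = $translations + ['/^(.*)$/' => '$1'];

$facetValues = applyPatternMap('Illegal aliens -- United States', $facetMap, true);
// ['Undocumented immigrants -- United States']

$searchValues = applyPatternMap('Illegal aliens -- United States', $searchMap, false);
// ['Undocumented immigrants -- United States', 'Illegal aliens -- United States']

$untouched = applyPatternMap('Immigration law', $facetMap, true);
// ['Immigration law']
```

In this model, the facet field only ever receives the translated form of a matching heading, while the search field receives both forms, which is the behavior we want from topic_facet and topic respectively.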

Solution: Part 2 – Custom Code

Unfortunately, index rules alone do not fully solve this problem. This is because when working with MARC records, VuFind displays subject headings extracted directly from the raw MARC data instead of the reformatted values stored in the Solr index. This allows VuFind to take advantage of some of the richer markup found in the MARC, but in this situation, it means that we need to do some extra work to ensure that our records display the way we want them to.

The solution is to create a custom record driver in your local VuFind installation and override the getAllSubjectHeadings() method to do some translation equivalent to the mappings in the import rules. Here is an example of what this might look like:

<?php

namespace MyVuFind\RecordDriver;

class SolrMarc extends \VuFind\RecordDriver\SolrMarc
{
    /**
     * Translate "alien" headings.
     *
     * @param string $heading Input string
     *
     * @return string
     */
    protected function dealienize($heading)
    {
        static $regexes = [
            '/^Alien criminal(.*)/' => 'Noncitizen criminal$1',
            '/^Alien detention centers(.*)/' => 'Detention centers$1',
            '/^Alien labor(.*)/' => 'Noncitizen labor$1',
            '/^Alien property(.*)/' => 'Foreign-owned property$1',
            '/^Aliens(.*)/' => 'Noncitizens$1',
            '/^Children of alien laborers(.*)/' => 'Children of noncitizen laborers$1',
            '/^Children of illegal aliens(.*)/' => 'Children of undocumented immigrants$1',
            '/^Illegal alien children(.*)/' => 'Undocumented immigrant children$1',
            '/^Illegal aliens(.*)/' => 'Undocumented immigrants$1',
            '/^Women illegal aliens(.*)/' => 'Women undocumented immigrants$1',
        ];
        foreach ($regexes as $in => $out) {
            $heading = preg_replace($in, $out, $heading);
        }
        return $heading;
    }

    /**
     * Get all subject headings associated with this record.  Each heading is
     * returned as an array of chunks, increasing from least specific to most
     * specific.
     *
     * @param bool $extended Whether to return a keyed array with the following
     * keys:
     * - heading: the actual subject heading chunks
     * - type: heading type
     * - source: source vocabulary
     *
     * @return array
     */
    public function getAllSubjectHeadings($extended = false)
    {
        // Get extended headings from the parent:
        $headings = parent::getAllSubjectHeadings(true);

        foreach ($headings as $i => $heading) {
            if (isset($headings[$i]['heading'][0])) {
                $headings[$i]['heading'][0]
                    = $this->dealienize($headings[$i]['heading'][0]);
            }
        }

        // Reduce to non-extended format if necessary:
        if (!$extended) {
            $reduce = function ($var) {
                return serialize($var['heading']);
            };
            return array_map(
                'unserialize', array_unique(array_map($reduce, $headings))
            );
        }
        return $headings;
    }
}

You can learn more about building custom record drivers in the VuFind wiki.
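
The reduction step at the end of getAllSubjectHeadings() may be the least obvious part of the driver, so here is a small standalone sketch of the same idea. The sample data is invented; it only assumes the extended format described in the docblock above (heading, type and source keys), and the type and source values are placeholders.

```php
<?php

// Hypothetical extended headings. After translation, two entries can end up
// with identical chunks (here they differ only in their source vocabulary).
$headings = [
    ['heading' => ['Noncitizens', 'United States'], 'type' => 'subject', 'source' => 'lcsh'],
    ['heading' => ['Noncitizens', 'United States'], 'type' => 'subject', 'source' => 'fast'],
    ['heading' => ['Emigration and immigration law'], 'type' => 'subject', 'source' => 'lcsh'],
];

// array_unique() compares values as strings, so each chunk array is
// serialized first to make whole headings comparable.
$serialized = array_map(
    function (array $entry): string {
        return serialize($entry['heading']);
    },
    $headings
);

// Drop duplicates (the first occurrence wins), then restore the arrays.
// array_values() reindexes the result; the driver method above skips that
// step and returns the original keys.
$plain = array_values(array_map('unserialize', array_unique($serialized)));
// [['Noncitizens', 'United States'], ['Emigration and immigration law']]
```

Deduplicating on the serialized chunks matters here because the translation can collapse headings that were distinct in the raw MARC into the same display string.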

Conclusions

While it requires a bit of redundancy, solving this problem with VuFind is still a great deal simpler and less painful than trying to maintain the records in a different way. If you would like to adopt a similar solution in your library and need more information beyond the code and configuration shared here, please feel free to reach out to the VuFind community through one of the methods listed on our support page.
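
One possible way to limit that redundancy, which we have not implemented, would be to derive the PHP rules from the pattern map lines instead of maintaining both lists by hand. The sketch below is only an illustration: patternMapToRegexes() is a hypothetical helper, not part of VuFind, and it assumes the patterns are simple enough to behave the same way in Java and PHP regular expressions and contain no "#" characters.

```php
<?php

/**
 * Build preg_replace()-style rules from pattern map lines.
 * Hypothetical helper for illustration; not part of VuFind or SolrMarc.
 *
 * @param string[] $lines   Lines from a properties file
 * @param string   $mapName Pattern map name, e.g. "aliens"
 *
 * @return array Regex => replacement pairs
 */
function patternMapToRegexes(array $lines, string $mapName): array
{
    $prefix = 'pattern_map.' . $mapName . '.pattern_';
    $regexes = [];
    foreach ($lines as $line) {
        // Split "key = value" on the first equals sign only.
        $parts = explode('=', $line, 2);
        if (count($parts) < 2 || strpos(trim($parts[0]), $prefix) !== 0) {
            continue; // not a rule of the requested map
        }
        // Split "regex=>replacement"; rules like keepRaw have no "=>".
        $rule = explode('=>', trim($parts[1]), 2);
        if (count($rule) < 2) {
            continue;
        }
        // Use "#" as the delimiter (assumes no "#" inside the pattern).
        $regexes['#' . $rule[0] . '#'] = $rule[1];
    }
    return $regexes;
}

$lines = [
    'pattern_map.aliens.pattern_4 = ^Aliens(.*)=>Noncitizens$1',
    'pattern_map.aliens.pattern_10 = keepRaw',
    'pattern_map.aliens2.pattern_10 = (.*)=>$1',
];
$regexes = patternMapToRegexes($lines, 'aliens');
// ['#^Aliens(.*)#' => 'Noncitizens$1']
```

The resulting array has the same shape as the $regexes list in dealienize(), so it could in principle be loaded once and reused there, at the cost of reading the properties file at runtime.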


Last Modified: January 13, 2020