Falvey Library

Automatically updating locally customized files with Git and diff3

The Problem

VuFind follows a fairly common software design pattern: it discourages users from making changes to core files, and instead encourages them to copy files out of the core and modify them in a local location. This has several advantages, including putting all of your changes in one place (very useful when a newcomer needs to learn how you have customized a project) and easing upgrades (you can update the core files without worrying about which ones you have changed).
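To make the pattern concrete, here is a minimal sketch of the lookup it implies: use the local copy of a file when one exists, otherwise fall back to the core copy. The `resolve_file` function and the demo directory layout are hypothetical illustrations, not VuFind's actual configuration loader.

```shell
#!/bin/bash
# Hypothetical helper (not part of VuFind): print the path to use for a
# relative file name, preferring a copy in the local directory over the core one.
resolve_file() {
  local localDir=$1 coreDir=$2 relPath=$3
  if [ -f "$localDir/$relPath" ]; then
    echo "$localDir/$relPath"
  elif [ -f "$coreDir/$relPath" ]; then
    echo "$coreDir/$relPath"
  else
    return 1   # found in neither location
  fi
}

# Demo with throwaway directories standing in for core and local settings.
demo=$(mktemp -d)
mkdir -p "$demo/config/vufind" "$demo/local/config/vufind"
echo "core settings"  > "$demo/config/vufind/config.ini"
echo "core facets"    > "$demo/config/vufind/facets.ini"
echo "local settings" > "$demo/local/config/vufind/config.ini"

configPath=$(resolve_file "$demo/local/config/vufind" "$demo/config/vufind" config.ini)
facetsPath=$(resolve_file "$demo/local/config/vufind" "$demo/config/vufind" facets.ini)
echo "$configPath"   # the local copy wins
echo "$facetsPath"   # no local copy, so the core file is used
rm -rf "$demo"
```

Because only files that were deliberately copied into the local directory override anything, that directory doubles as an inventory of every customization.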

There is one significant disadvantage, however: when the core files change, your copies get out of sync. Keeping your local copies synched up with the core files requires a lot of time-consuming, error-prone manual effort.

Or does it?

The Solution

One argument against modifying files in a local directory is that, if you use a version control tool like Git, the advantages of the “local customization directory” approach are diminished, since Git provides a different mechanism for reviewing all local changes to a code base and for handling software updates. If you modify files in place, then “git merge” will help you deal with updates to the core code.

Of course, the Git solution has its own drawbacks — and VuFind would lose some key functionality (the ability for a single instance to manage multiple configurations at different URLs) if we threw away our separation of local settings from core code.

Fortunately, you can have the best of both worlds. It’s just a matter of wrangling Git and a 3-way merge tool properly.

Three Way Merges

To understand the solution to the problem, you need to understand what a three-way merge is. Essentially, this is an algorithm that takes three files: an “old” file and two “new” files, each of which applies a different set of changes to the “old” file. The algorithm attempts to reconcile the changes in both “new” files so that they can be combined into a single output. Where the two “new” files make different changes in the same place, the algorithm inserts “conflict markers” so that a human can reconcile the situation manually.
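A quick way to watch a three-way merge happen is GNU diff3's merge mode, the same tool the script in this post relies on. This sketch (file names and contents are made up) merges two non-overlapping edits cleanly, then shows the conflict markers produced when both sides change the same line.

```shell
#!/bin/bash
# A common "old" version and two independently edited "new" versions of it.
work=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$work/old.txt"
printf 'ALPHA\nbeta\ngamma\n' > "$work/mine.txt"    # edits line 1
printf 'alpha\nbeta\nGAMMA\n' > "$work/yours.txt"   # edits line 3

# diff3 -m MINE OLDER YOURS writes the merged result to stdout.
# Exit status: 0 = clean merge, 1 = conflicts, 2 = trouble.
clean=$(diff3 -m "$work/mine.txt" "$work/old.txt" "$work/yours.txt")
cleanStatus=$?
echo "$clean"

# Now both sides change the same line, which cannot be reconciled automatically.
printf 'alpha\nBeta (mine)\ngamma\n'  > "$work/mine.txt"
printf 'alpha\nBeta (yours)\ngamma\n' > "$work/yours.txt"
conflicted=$(diff3 -m "$work/mine.txt" "$work/old.txt" "$work/yours.txt")
conflictStatus=$?
echo "$conflicted"   # contains <<<<<<<, |||||||, ======= and >>>>>>> lines
rm -rf "$work"
```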

Whenever Git merges two branches that have diverged (that is, anything other than a fast-forward), it performs a three-way merge. The “old” file is the version from the merge base, the nearest common ancestor of your branch and the branch being merged in. The “new” files are the versions of the same file at the tips of the two branches.
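The sketch below builds a throwaway repository to show where the “old” side of Git's merge comes from: `git merge-base` names the common ancestor, and `git merge` combines both branches' edits relative to it. It assumes `git` is installed; the branch name `other` and the file contents are hypothetical.

```shell
#!/bin/bash
# Identity for the demo commits (no global Git config is assumed).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git -C "$repo" init -q
printf 'alpha\nbeta\ngamma\n' > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m "common ancestor"
ancestor=$(git -C "$repo" rev-parse HEAD)

# Branch "other" changes the last line...
git -C "$repo" checkout -q -b other
printf 'alpha\nbeta\nGAMMA\n' > "$repo/file.txt"
git -C "$repo" commit -q -am "change line 3"

# ...while the original branch changes the first line.
git -C "$repo" checkout -q -
printf 'ALPHA\nbeta\ngamma\n' > "$repo/file.txt"
git -C "$repo" commit -q -am "change line 1"

# git merge-base identifies the "old" side of the upcoming three-way merge.
base=$(git -C "$repo" merge-base HEAD other)
baseFile=$(git -C "$repo" show "$base:file.txt")

# The merge combines both edits relative to that base.
git -C "$repo" merge -q --no-edit other
merged=$(cat "$repo/file.txt")
echo "$merged"
rm -rf "$repo"
```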

If we could run a custom three-way merge in which the “old” file is the core version our local copy was last synced with (the common ancestor of the local and core versions), and the “new” files are the local version and the current core version, then we could automate much of the work of updating our local files.
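Here is that custom merge for a single file, sketched with diff3 and hypothetical INI contents. The argument order is the important part: the local copy goes first as “mine”, the core file from before the update is the ancestor, and the updated core file goes last.

```shell
#!/bin/bash
# Hypothetical contents: core config when the local copy was made, the
# locally customized copy, and the core config after an upstream update.
work=$(mktemp -d)
printf '[Site]\nurl = http://localhost\ntitle = Library Catalog\n' > "$work/core-old.ini"
printf '[Site]\nurl = https://catalog.example.edu\ntitle = Library Catalog\n' > "$work/local.ini"
printf '[Site]\nurl = http://localhost\ntitle = Library Catalog\ntheme = bootstrap3\n' > "$work/core-new.ini"

# MINE = local copy, OLDER = core before the update, YOURS = core after it.
diff3 -m "$work/local.ini" "$work/core-old.ini" "$work/core-new.ini" > "$work/merged.ini"
status=$?
merged=$(cat "$work/merged.ini")
echo "$merged"
rm -rf "$work"
```

The local URL survives and the new upstream setting is picked up without any manual editing.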

Fortunately, we can.

Lining Up the Pieces

Solving this problem assumes a particular environment (which happens to be the environment we use at Villanova to manage our VuFind instances): a Git repository forked from the main VuFind public repository, with a custom theme and a local settings directory added.

Assume that we have this repository in a state where all of our local files are perfectly synched up with the core files, but that the upstream public repository has changed. Here’s what we need to do:

1.) Merge the upstream master code so that the core files are updated.

2.) For each of our locally customized files, perform a three-way merge. The old file is the core file prior to the merge; the new files are the core file after the merge and the local file.

3.) Manually resolve any conflicts caused by the merging, and commit the local changes.
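The three steps can be rehearsed end to end in a sandbox before trying them on a real instance. In this sketch, a throwaway `upstream` repository stands in for the public VuFind repository and `fork` for a local clone with one customized file; the repository names, the `config/app.ini` path and its settings are all hypothetical, and `git` plus `diff3` are assumed to be installed.

```shell
#!/bin/bash
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)
up="$sandbox/upstream"
fork="$sandbox/fork"

# Hypothetical upstream project with one core config file.
mkdir -p "$up/config"
git -C "$up" init -q
printf 'a = 1\nb = 2\nc = 3\n' > "$up/config/app.ini"
git -C "$up" add config/app.ini
git -C "$up" commit -q -m "initial core config"

# The fork copies the core file into local/ and customizes it.
git clone -q "$up" "$fork"
mkdir -p "$fork/local/config"
sed 's/^a = 1$/a = 100/' "$fork/config/app.ini" > "$fork/local/config/app.ini"
git -C "$fork" add local
git -C "$fork" commit -q -m "local customization"

# Upstream changes the core file independently.
printf 'a = 1\nb = 2\nc = 3\nd = 4\n' > "$up/config/app.ini"
git -C "$up" commit -q -am "add new setting"

# Step 1: merge upstream so the core file is updated (creates a merge commit).
git -C "$fork" fetch -q origin
git -C "$fork" merge -q --no-edit '@{u}'

# Step 2: three-way merge the local copy; HEAD~1 is the pre-merge branch.
git -C "$fork" show HEAD~1:config/app.ini > "$sandbox/old.ini"
diff3 -m "$fork/local/config/app.ini" "$sandbox/old.ini" "$fork/config/app.ini" > "$sandbox/new.ini"
mergeStatus=$?
cp "$sandbox/new.ini" "$fork/local/config/app.ini"

# Step 3: with no conflicts to resolve by hand, commit the local update.
if [ "$mergeStatus" -eq 0 ]; then
  git -C "$fork" commit -q -am "Sync local config with updated core"
fi
result=$(cat "$fork/local/config/app.ini")
pending=$(git -C "$fork" status --porcelain)
echo "$result"
rm -rf "$sandbox"
```

In a real run, step 2 is repeated for every customized file, which is the job of the merge_directory script in this post.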

Obviously step 2 is the hard part… but it’s not actually that hard. If you do the local updates immediately after the merge commit, you can easily retrieve pre-merge versions of files using the “git show HEAD~1:path/to/file” command (with the path given relative to the repository root). Because HEAD is then the merge commit, HEAD~1 is its first parent: your branch as it stood before the upstream code came in. That means you have ready access to all three pieces you need for three-way merging, and the rest is just a matter of automation.
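HEAD~1 is only the pre-merge state while HEAD is still the merge commit, and a core file that first appeared in the merge has no old version at all. One hedged way to guard against both cases is sketched here; the `pre_merge_version` helper is hypothetical. (Git also records the pre-merge tip as `ORIG_HEAD` during a merge, which could serve instead of HEAD~1, although later commands such as `git reset` or `git rebase` overwrite it.)

```shell
#!/bin/bash
# Hypothetical helper: print the pre-merge version of a file in a repository.
pre_merge_version() {
  local repo=$1 path=$2
  # HEAD^2 exists only when HEAD is a merge commit.
  if ! git -C "$repo" rev-parse -q --verify 'HEAD^2' > /dev/null; then
    echo "HEAD is not a merge commit" >&2
    return 2
  fi
  # A file that is new in this merge has no pre-merge version.
  if ! git -C "$repo" cat-file -e "HEAD~1:$path" 2> /dev/null; then
    echo "$path did not exist before the merge" >&2
    return 1
  fi
  git -C "$repo" show "HEAD~1:$path"
}

# Demo repository: "upstream" edits core.txt and adds added.txt,
# while the original branch adds local.txt.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git -C "$repo" init -q
echo "v1" > "$repo/core.txt"
git -C "$repo" add core.txt
git -C "$repo" commit -q -m "v1"
git -C "$repo" checkout -q -b upstream
echo "v2" > "$repo/core.txt"
echo "brand new" > "$repo/added.txt"
git -C "$repo" add core.txt added.txt
git -C "$repo" commit -q -m "upstream update"
git -C "$repo" checkout -q -
echo "custom" > "$repo/local.txt"
git -C "$repo" add local.txt
git -C "$repo" commit -q -m "local work"

# Before merging, HEAD is an ordinary commit, so the helper refuses.
pre_merge_version "$repo" core.txt > /dev/null 2>&1
beforeStatus=$?

git -C "$repo" merge -q --no-edit upstream
oldCore=$(pre_merge_version "$repo" core.txt)
oldCoreStatus=$?
pre_merge_version "$repo" added.txt > /dev/null 2>&1
addedStatus=$?
rm -rf "$repo"
```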

The Script

The following Bash script is the one we use for updating our local instance of VuFind. The key piece is the merge_directory function definition, which accepts a local directory and its core equivalent as parameters. We use this to sync up various configuration files, JavaScript code and templates. Note that for configurations, we merge local directories with core directories; for themes, we merge custom themes with their parents.

The actual logic is surprisingly simple. We use recursion to navigate through the local directory and look at all of the local files. For each file, we use string manipulation to figure out what the core version should be called. If the core version exists, we use the previously-mentioned Git magic to pull the old version into the /tmp directory. Then we use the diff3 three-way merge tool to do the heavy lifting, overwriting the local file with the new merged version. We echo out a few helpful messages along the way so users are aware of conflicts and skipped files.

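#!/bin/bash

# Three-way merge every file under a local directory ($1) with its core
# equivalent ($2). Run this from the repository root immediately after the
# merge commit that brought in the upstream changes, so that HEAD~1 is the
# pre-merge state of the branch.
function merge_directory
{
    echo "merge_directory $1 $2"
    local localDir=$1
    local localDirLength=${#localDir}
    local coreDir=$2
    local current coreEquivalent oldFile newFile status

    for current in "$localDir"/*
    do
        # Map e.g. local/config/vufind/x.ini onto config/vufind/x.ini.
        coreEquivalent=$coreDir${current:$localDirLength}
        if [ -d "$current" ]
        then
          merge_directory "$current" "$coreEquivalent"
        elif [ -f "$coreEquivalent" ]
        then
          oldFile=$(mktemp /tmp/tmp-merge-old.XXXXXX)
          newFile=$(mktemp /tmp/tmp-merge-new.XXXXXX)
          # Pull the pre-merge core file (the common ancestor) out of Git.
          if git show "HEAD~1:$coreEquivalent" > "$oldFile" 2> /dev/null
          then
            # diff3 -m MINE OLDER YOURS: local copy, old core, new core.
            # Exit status: 0 = clean, 1 = conflicts, 2 = trouble.
            diff3 -m "$current" "$oldFile" "$coreEquivalent" > "$newFile"
            status=$?
            if [ $status -eq 2 ]
            then
              echo "ERROR: diff3 failed on $current; file left unchanged."
            else
              if [ $status -eq 1 ]
              then
                echo "CONFLICT: $current"
              fi
              cp "$newFile" "$current"
            fi
          else
            echo "Skipping $current; $coreEquivalent did not exist before the merge."
          fi
          rm -f "$oldFile" "$newFile"
        elif [ -e "$current" ]
        then
          echo "Skipping $current; no equivalent in core code."
        fi
    done
}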

merge_directory local/harvest harvest
merge_directory local/import import
merge_directory local/config/vufind config/vufind
merge_directory themes/vuboot3/templates themes/bootstrap3/templates
merge_directory themes/villanova_mobile/templates themes/jquerymobile/templates
merge_directory themes/vuboot3/js themes/bootstrap3/js
merge_directory themes/villanova_mobile/js themes/jquerymobile/js
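Step 3 still needs a human, but finding the files that need attention can be automated. This hypothetical `list_conflicts` helper greps for the `<<<<<<< ` lines that diff3 -m writes at the start of each conflict; the demo uses made-up files.

```shell
#!/bin/bash
# Hypothetical follow-up check: list files under the given directories that
# still contain diff3 conflict markers. Prints nothing (and returns non-zero,
# as grep does) when no conflicts remain.
list_conflicts() {
  grep -rl -e '^<<<<<<< ' -- "$@"
}

# Demo: one clean file and one file left with a conflict.
demoDir=$(mktemp -d)
mkdir -p "$demoDir/local/config"
printf 'a = 1\n' > "$demoDir/local/config/clean.ini"
printf '<<<<<<< local\na = 2\n=======\na = 3\n>>>>>>> core\n' > "$demoDir/local/config/conflict.ini"
conflicts=$(list_conflicts "$demoDir/local")
echo "$conflicts"
rm -rf "$demoDir"
```

On a real checkout, passing it the same local directories given to merge_directory (for example `list_conflicts local themes/vuboot3 themes/villanova_mobile`) would show what still needs hand-editing before the commit.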

Conclusion

I’ve been frustrated by this problem for years, and yet the solution is surprisingly simple — I’m glad it finally came to me. Please feel free to use this for your own purposes, and let me know if you have any questions or problems!



2 Comments

  1. Comment by Andre — July 23, 2015 @ 12:06 PM

    Demian, thanks for sharing this script! I have been struggling with this issue too and am delighted that there might be a solution at hand.
    Does diff3 also work fine with heavily fragmented files? I.e. by using inheritance we usually leave all the untouched core-code from customized classes in the core and extend the core-class in our custom module: if a customized method in our custom module gets updated in the core, will this change also be recognized by diff3?
    I’ll definitely give your script a go in the next few days anyway, but I figured I would ask straight away before testing 🙂
    Best,
    André

  2. Comment by Demian Katz — July 23, 2015 @ 12:28 PM

    André,

    This solution is primarily intended for files that are copied in their entirety and then modified — i.e. configuration files and display templates.

    If you are extending classes in a custom module, that is much less of a copy-and-paste operation, and thus diff3 won’t be of much help. Well-designed subclasses shouldn’t actually need this kind of help, though. Obviously, there are sometimes reasons why a subclass does have to copy and paste a big chunk of code, and occasionally an internal API changes in a way that makes your subclass incompatible. In these cases, you’re still going to be stuck with manual updates. For now, the best solution for dealing with these types of code changes is to put a watch on the changelog page of the VuFind wiki so that you can be notified of breaking changes.

    I’m sorry that I don’t have more exciting news in that department, but that’s really an area where the human brain is necessary. Hopefully by automating template and configuration updates with this script, I can free up more brain-time for the more complex and interesting problems!

    – Demian




Last Modified: July 23, 2015
