Find All Directories That Have Been Deleted In Git

Had an interesting question posed to me the other day from David Ruttka: Did I have a favorite way to list all of the directories that have been deleted from a folder in git? Admittedly, this took a bit of thought. No arcane git command came to mind. Nothing did. Google wasn’t that much help either. Not only did I not have a favorite way, I didn’t have a way at all! Today we fix that.

We’ve got a few things going against us from the start. First, Git doesn’t actually track directories. It knows when we add files and delete them, but directories themselves aren’t tracked. Second, just because something has been deleted in the past, doesn’t mean it isn’t there now, just that there was a moment in time where it wasn’t. Still, provided that we keep that in mind, we should be able to find our first ‘favorite way’.
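The first point is easy to see in a scratch repository. The sketch below is only an illustration: the temporary repository and the directory name `assets` are made up for the demo, and it assumes `git` is on the PATH with default status settings.

```shell
# Throwaway repository in a temporary directory (hypothetical, for illustration only).
repo=$(mktemp -d)
git init --quiet "$repo"
mkdir "$repo/assets"

# An empty directory produces no status entry: Git has nothing to record for it.
status_before=$(git -C "$repo" status --porcelain)

# Once the directory holds a file, it appears as an untracked path.
touch "$repo/assets/file.txt"
status_after=$(git -C "$repo" status --porcelain)

echo "before: '$status_before'"
echo "after:  '$status_after'"
```

The directory only becomes visible to Git through the file inside it, which is why the rest of the approach has to work backwards from deleted files.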

Let’s start by seeing what info we actually can get[1]. A post on Stack Overflow starts us off in the right direction. We’ve got a one liner to find a list of files that have been removed. git log will show us a listing of all the files modified in commits by using --summary. We can further constrain that to only deletes with --diff-filter=D. Finally, we limit it to the current directory with the trailing period.

git log --diff-filter=D --summary .
commit 69191173ee35019b3180f5f763c07c6d496cbf2a
Author: Joshua Rogers <git@joshuarogers.net>
Date:   Sat Mar 1 17:49:55 2014 -0600

    Adds support for animations.

 delete mode 100644 Hemlock/blittablebase.cpp
 delete mode 100644 Nightshade/sfmlblittablebase.cpp
 delete mode 100644 Nightshade/sfmlblittablebase.h

It’s definitely a great starting point. The information that I want is there, but so is a lot of other data. Really, all I want is the listing of which files were removed. We can get this by piping our output to grep and searching for the lines that contain ‘delete mode’.

git log --diff-filter=D --summary . | \
    grep 'delete mode'
 delete mode 100644 Hemlock/blittablebase.cpp
 delete mode 100644 Nightshade/sfmlblittablebase.cpp
 delete mode 100644 Nightshade/sfmlblittablebase.h

Much better. Every line of our output represents a possible directory that has been removed, and can easily be consumed by a regex. At this point, the next obvious move is to rewrite each line with sed. Each line starts with ‘ delete mode ’, followed by a six-digit file mode, a space, the directory component of the path, a final slash, and then the filename. We can represent that with the regex delete mode [0-7]{6} .*/.*

Out of this line, the part that we want is the content after the final space but before the final slash, so we’ll need to modify the regex to capture that part of the string: delete mode [0-7]{6} (.*)/.* That should do it. Now that we have the part we care about, we’re going to replace the matched content with just the directory name. Let’s modify that line again: s/delete mode [0-7]{6} (.*)\/.*/\1/ The s/ tells sed to perform a substitution. The pattern and replacement are delimited with slashes, so we need to escape the slash inside our pattern. Finally, the \1 tells it to replace the entire match with the first (and in our case only) capture group.

One more change and we should be done with the replacement: sed uses basic regular expressions by default, where grouping and repetition counts are written with backslashes, so we need to escape (, ), {, and }. At this point, we should have a listing of directories that have had at least one file removed from them.

git log --diff-filter=D --summary . | \
    grep 'delete mode' | \
    sed 's/ delete mode [0-7]\{6\} \(.*\)\/.*/\1/'
Hemlock
Nightshade
Nightshade
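To try the substitution in isolation, it can help to feed a single sample line through sed without involving git at all. The sketch below does that twice: once with the basic-regex form used above, and once with `sed -E` (extended regular expressions, available in both GNU and BSD sed), which is an alternative spelling rather than part of the original one-liner. The variable names are placeholders.

```shell
# One sample line in the shape that `git log --summary` prints for a deletion.
line=' delete mode 100644 Nightshade/sfmlblittablebase.h'

# Basic regular expressions (sed's default): groups and counts are written \( \) and \{ \}.
bre=$(printf '%s\n' "$line" | sed 's/ delete mode [0-7]\{6\} \(.*\)\/.*/\1/')

# Extended regular expressions (-E): the same pattern without those backslashes.
# Using # as the delimiter also avoids having to escape the slash.
ere=$(printf '%s\n' "$line" | sed -E 's# delete mode [0-7]{6} (.*)/.*#\1#')

echo "$bre"
echo "$ere"
```

Both commands print the same directory component. Which spelling to prefer is mostly a matter of taste.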

Since the Nightshade folder had two files removed from it, it appears in our output twice. We can clean up the list of candidates a bit further by running them through sort and then uniq.[2]

git log --diff-filter=D --summary . | \
    grep 'delete mode' | \
    sed 's/ delete mode [0-7]\{6\} \(.*\)\/.*/\1/' | \
    sort | \
    uniq
Hemlock
Nightshade
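The reason for sorting first shows up when the duplicates are not adjacent. A small sketch with made-up input (the names reuse the example directories, in an order chosen to separate the duplicates):

```shell
# Sample input with a non-adjacent duplicate.
input=$(printf '%s\n' Nightshade Hemlock Nightshade)

# uniq alone only collapses neighbouring duplicates, so both Nightshade lines survive.
uniq_only=$(printf '%s\n' "$input" | uniq)

# Sorting first places duplicates next to each other so uniq can drop them.
sorted_uniq=$(printf '%s\n' "$input" | sort | uniq)

# sort -u produces the same result in a single step.
sort_u=$(printf '%s\n' "$input" | sort -u)

printf 'uniq only:\n%s\n\nsort | uniq:\n%s\n' "$uniq_only" "$sorted_uniq"
```

Either `sort | uniq` or `sort -u` works here. The two-command form is kept in the pipeline above because it makes each step explicit.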

At this point we have our list of all possible candidates. This is where our second problem from the beginning shows up: just because a folder has been deleted, doesn’t mean it hasn’t been recreated. Additionally, some files being deleted does not mean that the whole folder has been deleted.[3] The cleanest way to solve this is just to loop through our list and actually check that the folder does not exist on the filesystem.[4] If it actually has been deleted, we’ll echo its name, otherwise we’ll ignore it.

CANDIDATES=`git log --diff-filter=D --summary . | \
                grep 'delete mode' | \
                sed 's/ delete mode [0-7]\{6\} \(.*\)\/.*/\1/' | \
                sort | \
                uniq`

for DIRECTORY in $CANDIDATES; do
  if [ ! -d "$DIRECTORY" ]; then
    echo "$DIRECTORY"
  fi
done
Hemlock

While technically this is complete, I’d like to clean up the if statement a bit for style.

CANDIDATES=`git log --diff-filter=D --summary . | \
                grep 'delete mode' | \
                sed 's/ delete mode [0-7]\{6\} \(.*\)\/.*/\1/' | \
                sort | \
                uniq`

for DIRECTORY in $CANDIDATES; do
  [ ! -d "$DIRECTORY" ] && echo "$DIRECTORY"
done
Hemlock

There we go. This should now give us a listing of all of the folders that have been deleted over time. I believe I now have a favorite solution.
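One caveat with looping over an unquoted variable is that a directory name containing spaces gets split into several words. The sketch below is one possible variant that avoids this, not a replacement for the approach above. The function name `missing_dirs` is hypothetical. It reads file paths on stdin, so it can be fed from `git log --name-only` instead of the grep and sed steps, and it assumes it is run from the top of the working tree so that the paths resolve.

```shell
# missing_dirs: read file paths on stdin (one per line) and print each distinct
# parent directory that is not present on disk. The name is hypothetical.
missing_dirs() {
  while IFS= read -r path; do
    # Keep only paths that have a directory component; this also skips blank lines.
    case "$path" in
      */*) printf '%s\n' "${path%/*}" ;;
    esac
  done | sort -u | while IFS= read -r dir; do
    # Print the directory only if it no longer exists in the working tree.
    [ -d "$dir" ] || printf '%s\n' "$dir"
  done
}

# Usage, from the top of the working tree:
# git log --diff-filter=D --name-only --pretty=format: . | missing_dirs
```

Like the original loop, this only reports the immediate parent directory of each deleted file, so a removed `a/b` is listed but its parent `a` is not unless a file directly inside `a` was deleted as well.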

Footnotes


  1. I was really tempted to use ‘git’ instead of ‘get’, but I didn’t. There might still be hope.

  2. uniq only looks for contiguous duplicates, thus the call to sort before it. By sorting our data, we can ensure that all duplicates are grouped together.

  3. I forgot that little detail when I was initially conversing with David though. Sorry Ruttka!

  4. I’m assuming that the repo isn’t bare. If it is, rather than changing our solution, I would suggest simply checking out a copy of the repo without --bare.