Find All Directories That Have Been Deleted In Git

Had an interesting question posed to me the other day from David Ruttka: Did I have a favorite way to list all of the directories that have been deleted from a folder in git? Admittedly, this took a bit of thought. No arcane git command came to mind. Nothing did. Google wasn't that much help either. Not only did I not have a favorite way, I didn't have a way at all! Today we fix that.

We've got a few things going against us from the start. First, Git doesn't actually track directories. It knows when we add files and delete them, but directories themselves aren't tracked. Second, just because something has been deleted in the past, doesn't mean it isn't there now, just that there was a moment in time where it wasn't. Still, provided that we keep that in mind, we should be able to find our first 'favorite way'.

Let's start by seeing what info we actually can get1. A post on Stack Overflow starts us off in the right direction. We've got a one-liner to find a list of files that have been removed. With --summary, git log adds a condensed summary of file creations, deletions, renames, and mode changes to each commit it lists. We can further constrain that to only deletes with --diff-filter=D. Finally, we limit it to the current directory with the trailing period.

git log --diff-filter=D --summary .
commit 69191173ee35019b3180f5f763c07c6d496cbf2a
Author: Joshua Rogers <git@joshuarogers.net>
Date:   Sat Mar 1 17:49:55 2014 -0600
 
    Adds support for animations.
 
 delete mode 100644 Hemlock/blittablebase.cpp
 delete mode 100644 Nightshade/sfmlblittablebase.cpp
 delete mode 100644 Nightshade/sfmlblittablebase.h

It's definitely a great starting point. The information that I want is there, but so is a lot of other data. Really, all I want is the listing of which files were removed. We can get this by piping our output to grep and searching for the lines that contain 'delete mode'.

git log --diff-filter=D --summary . | \
  grep 'delete mode'
 delete mode 100644 Hemlock/blittablebase.cpp
 delete mode 100644 Nightshade/sfmlblittablebase.cpp
 delete mode 100644 Nightshade/sfmlblittablebase.h

Much better. Every line of our output represents a possible directory that has been removed, and can easily be consumed by a regex. At this point, the next obvious move is to rewrite each line with sed. Each line consists of ' delete mode ', a six-digit file mode, a space, the directory component of the path, a final slash, and then the filename. We can represent that with the regex delete mode [0-7]{6} .*/.*.

Out of this line, the part that we want is the content after the space that follows the mode, but before the final slash, so we'll need to modify the regex to capture that part of the string: delete mode [0-7]{6} (.*)/.*. That should do it. Now that we have the part we care about, we're going to replace the matched content with just the directory name. Let's modify that line again: s/delete mode [0-7]{6} (.*)\/.*/\1/. The leading s tells sed to perform a substitution. We delimit the pattern and replacement with slashes, so we needed to escape the slash in our pattern. Finally, the \1 tells it to replace the entire match with the first (and in our case only) capture group.

One more change and we should be done with the replacement: sed uses basic regular expressions by default, in which grouping parentheses and interval braces must be written with backslashes: \(, \), \{, and \}. At this point, we should have a listing of directories that have had at least one file removed from them.

git log --diff-filter=D --summary . | \
  grep 'delete mode' | \
  sed 's/ delete mode [0-7]\{6\} \(.*\)\/.*/\1/'
Hemlock
Nightshade
Nightshade
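
For readers who find the sed expression easier to follow in a language with a regex debugger, here is a rough Python sketch of the same capture. The names DELETE_LINE and deleted_parent are made up for illustration, and the third sample line is hypothetical. One difference from sed: a line that does not match comes back as None here, whereas sed would pass it through unchanged.

```python
import re

# Same shape as the sed pattern: " delete mode ", a six-digit mode, a space,
# then everything up to the final slash is captured as the directory.
DELETE_LINE = re.compile(r"^ delete mode [0-7]{6} (.*)/[^/]*$")


def deleted_parent(line):
    """Return the directory part of a 'delete mode' line, or None."""
    match = DELETE_LINE.match(line)
    return match.group(1) if match else None


sample = [
    " delete mode 100644 Hemlock/blittablebase.cpp",
    " delete mode 100644 Nightshade/sfmlblittablebase.cpp",
    " delete mode 100644 README",  # hypothetical: a file with no directory
]
print([deleted_parent(line) for line in sample])
```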

Since the Nightshade folder had two files removed from it, it appears in our output twice. We can clean up the list of candidates a bit further by running them through sort and then uniq.2

git log --diff-filter=D --summary . | \
  grep 'delete mode' | \
  sed 's/ delete mode [0-7]\{6\} \(.*\)\/.*/\1/' | \
  sort | \
  uniq
Hemlock
Nightshade
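
Why sort has to come before uniq is easier to see with a small model. The sketch below is only an approximation of uniq's behaviour, built on itertools.groupby, which also groups adjacent equal items and nothing else. The function name uniq is just a label chosen to mirror the command.

```python
from itertools import groupby


def uniq(lines):
    """Collapse runs of identical adjacent lines, roughly like the uniq command."""
    return [key for key, _ in groupby(lines)]


unsorted = ["Nightshade", "Hemlock", "Nightshade"]
print(uniq(unsorted))          # the repeat survives because it is not adjacent
print(uniq(sorted(unsorted)))  # sorting first groups the duplicates together
```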

At this point we have our list of all possible candidates. This is where the second problem from the beginning shows up: just because a folder has been deleted doesn't mean it hasn't been recreated. Additionally, some files being deleted does not mean that the whole folder has been deleted. 3 The cleanest way to solve this is just to loop through our list and actually check that the folder does not exist on the filesystem. 4 If it actually has been deleted, we'll echo its name, otherwise we'll ignore it.

# Paths in the log are relative to the repository root, so run this from there.
CANDIDATES=$(git log --diff-filter=D --summary . | \
  grep 'delete mode' | \
  sed 's/ delete mode [0-7]\{6\} \(.*\)\/.*/\1/' | \
  sort | \
  uniq)

for DIRECTORY in $CANDIDATES; do
  if [ ! -d "$DIRECTORY" ]; then
    echo "$DIRECTORY"
  fi
done
Hemlock

While technically this is complete, I'd like to clean up the if statement a bit for style.

# Paths in the log are relative to the repository root, so run this from there.
CANDIDATES=$(git log --diff-filter=D --summary . | \
  grep 'delete mode' | \
  sed 's/ delete mode [0-7]\{6\} \(.*\)\/.*/\1/' | \
  sort | \
  uniq)

for DIRECTORY in $CANDIDATES; do
  [ ! -d "$DIRECTORY" ] && echo "$DIRECTORY"
done
Hemlock
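
The existence check at the end can also be sketched in Python, which sidesteps the shell's word splitting if a directory name ever contains a space. This is a minimal illustration, not a replacement for the script: missing_directories and its root parameter are hypothetical names, and the demo builds a throwaway directory instead of touching a real repository.

```python
import os
import tempfile


def missing_directories(candidates, root="."):
    """Return the candidates that do not exist as directories under root."""
    return [name for name in candidates
            if not os.path.isdir(os.path.join(root, name))]


# Demo: only Nightshade exists, so only Hemlock is reported as deleted.
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "Nightshade"))
    print(missing_directories(["Hemlock", "Nightshade"], demo_root))
```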

There we go. This should now give us a listing of all of the folders that have been deleted over time. I believe I now have a favorite solution.

Footnotes

1 I was really tempted to use 'git' instead of 'get', but I didn't. There might still be hope.
2 uniq only looks for contiguous duplicates, thus the call to sort before it. By sorting our data, we can ensure that all duplicates are grouped together.
3 I forgot that little detail when I was initially conversing with David though. Sorry Ruttka!
4 I'm assuming that the repo isn't bare. If it is, rather than changing our solution, I would suggest simply checking out a copy of the repo without --bare.

Adding an Ubuntu Machine to a Windows Domain

If you run a Linux server alongside Windows servers long enough, you'll eventually have the need (or request) to add that machine to a Windows domain. Thankfully, it's a rather easy process. Assuming you run Ubuntu, you can simply run the following commands, substituting the domain and a domain admin in place of EXAMPLE.COM and jsmith, respectively.

sudo apt-get install likewise-open
sudo domainjoin-cli join EXAMPLE.COM jsmith
sudo lwconfig AssumeDefaultDomain true

Of the above lines, the first two are probably self-explanatory, but the third is likely a bit more opaque. After running the second line, our machine is on the domain, but domain logons need to be fully qualified (e.g. EXAMPLE.COM\jsmith). This last line allows Likewise to accept just the username by assuming that we are going to be using EXAMPLE.COM for domain logins.

With this complete, the only thing left is to give "Domain Admins" the ability to sudo. Otherwise, admins can log in but cannot actually administer the machine. On Windows, the name of the group to add would be "Domain Admins". To figure out the name for Linux, we convert to lower case and replace spaces with carets. Thus "Domain Admins" on Windows is "domain^admins" on Linux. With this knowledge in hand, we can allow sudo privileges.

With your favorite text editor1, add the following line to /etc/sudoers

%domain^admins   ALL=(ALL:ALL) ALL
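
If you have more than one group to translate, the naming rule described above (lower case, spaces become carets) is simple enough to script. The sketch below only encodes that rule as stated in this post. The function names are invented for illustration, and nothing here talks to Likewise or edits /etc/sudoers.

```python
def likewise_group_name(windows_group):
    """Apply the rule from the post: lower-case, spaces replaced by carets."""
    return windows_group.lower().replace(" ", "^")


def sudoers_entry(windows_group):
    """Build a sudoers line granting full sudo rights to the group."""
    return "%" + likewise_group_name(windows_group) + "   ALL=(ALL:ALL) ALL"


print(sudoers_entry("Domain Admins"))
```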

Footnotes

1 Vim. Your favorite text editor is Vim.


MCEdit Surface Circle Filter

Glowstone Perimeter

Just wanted to share a quick MCEdit filter that I put together. Given a selection, this replaces the top layer of the perimeter circle with glowstone. On our server we use this to mark off private land borders so that people don't accidentally interfere with one another or set up camp on top of one another. Feel free to use it as you like.

from pymclevel.materials import alphaMaterials
import math

displayName = "Player Boundary"

# This filter takes no user-configurable options.
inputs = (
)

# Only these natural surface blocks may be swapped for glowstone.
replacableblocks = [
    alphaMaterials.Grass,
    alphaMaterials.Dirt,
    alphaMaterials.Stone,
    alphaMaterials.Sand,
    alphaMaterials.Gravel
]
replacableIDs = [b.ID for b in replacableblocks]

def replacable(block):
    return block in replacableIDs

def borderblock(x, z, radius):
    # True when (x, z) lies within half a block of the circle's edge.
    distance = math.sqrt(math.pow(radius - x, 2) + math.pow(radius - z, 2))
    return math.fabs(radius - distance) < .5

def replaceblock(level, box, x, z):
    # Convert selection-relative coordinates to world coordinates.
    x += box.minx
    z += box.minz
    # Scan downward and replace only the topmost replaceable block.
    for y in xrange(box.maxy, box.miny, -1):
        block = level.blockAt(x, y, z)
        if replacable(block):
            level.setBlockAt(x, y, z, alphaMaterials.Glowstone.ID)
            return y

def perform(level, box, options):
    width = box.maxx - box.minx
    depth = box.maxz - box.minz
    radius = width / 2

    for x in xrange(width):
        for z in xrange(depth):
            if borderblock(x, z, radius):
                replaceblock(level, box, x, z)
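
To preview which columns the filter will touch without opening MCEdit, the border test can be run on its own. The sketch below is a standalone Python 3 rework of borderblock that draws the ring as text. The names is_border and ring are made up for this preview, and it uses no pymclevel calls at all.

```python
import math


def is_border(x, z, radius):
    """Same test as borderblock: within half a block of the circle's edge."""
    distance = math.hypot(radius - x, radius - z)
    return abs(radius - distance) < 0.5


def ring(width, depth):
    """Return one string per z row, '#' where the filter would place glowstone."""
    radius = width // 2  # integer division, matching width / 2 in Python 2
    return ["".join("#" if is_border(x, z, radius) else "." for x in range(width))
            for z in range(depth)]


print("\n".join(ring(11, 11)))
```

With an even width the circle's centre sits one block past the middle of the selection, so the far edge of the ring falls outside the loop range. An odd-sized square selection gives a complete circle.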


Find All Issues Referenced in Commit Messages

Software is constantly changing. Bugs are fixed, features are added, performance is increased, and finally binaries are built. With this constant cycle of change come people wanting to know exactly what the changes are. Between major and minor versions, we normally have release notes to give us this information. However, between release builds the list of items is in constant flux, with no letup in the people wanting to know what the changes are. So, how do we fill these requests? How can we tell what issues have been addressed between two arbitrary branches? 1

For our purposes, we'll try to see what changed between Release/2.1 and RC/2.1/Candidate-8. Let's start by examining the commit log.

git log RC/2.1/Candidate-8

commit 4e1a63c733507921ff7be480084106816a37c9fc
Author: John Doe <jdoe@example.com>
Date:   Fri May 17 13:49:14 2013 -0500
 
    OW-397: Added export functionality to the UI.
...

Now things are looking good. The log messages follow a defined format: "PROJECT-NUMBER: Brief description of the commit." If we modify the command slightly, we can filter the results to only show commits that are reachable from one branch but not the other. Git's three-dot range syntax does exactly this.

git log origin/Release/2.1...origin/RC/2.1/Candidate-8

commit 4e1a63c733507921ff7be480084106816a37c9fc
Author: John Doe <jdoe@example.com>
Date:   Fri May 17 13:49:14 2013 -0500
 
    OW-397: Added export functionality to the UI.
 
commit 779e41d3881703a1f8ada527de5fbcc29dd1bba0
Author: John Smith <jsmith@example.com>
Date:   Wed May 15 15:38:40 2013 -0500
 
    OW-404: Reflect changes to the model in the sidebar.
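
One way to picture the three-dot range is as a symmetric difference of two sets of commits: everything reachable from either branch, minus everything reachable from both. The toy model below assumes exactly that and nothing more. The two shared hashes are invented placeholders, and real git walks a commit graph instead of comparing flat sets.

```python
def symmetric_range(reachable_from_a, reachable_from_b):
    """Commits reachable from exactly one of the two branches."""
    return reachable_from_a ^ reachable_from_b


# Hypothetical history: two shared commits, plus two only on the candidate.
release = {"1111111", "2222222"}
candidate = {"1111111", "2222222", "779e41d", "4e1a63c"}
print(sorted(symmetric_range(release, candidate)))
```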

This is great information for a human and follows a well defined format, but it isn't really kind to parsers. We can fix this by adding the argument --oneline to git log.

git log origin/Release/2.1...origin/RC/2.1/Candidate-8 --oneline

4e1a63c OW-397: Added export functionality to the UI.
779e41d OW-404: Reflect changes to the model in the sidebar.

This new output always follows the format "hash, space, issue id, colon, message". Thus, the issue number will always be the first string matching "alphanum-num" after the hash. This can be expressed with the regex "^[A-F0-9]+ [A-Z0-9]+-[0-9]+".

At this point, we're going to call on another tool from our command line arsenal: grep. 2 Grep will take an input stream, allow us to search it with a pattern, and return matches to us. Normally grep prints the entire line whenever the pattern matches anywhere in it. We're going to modify this behavior a bit by passing grep -i -o -E. The arguments tell it to ignore case, print only the matched portion of each line, and interpret the pattern as an extended regular expression, which is what lets us write + without escaping it.

git log origin/Release/2.1...origin/RC/2.1/Candidate-8 --oneline | \
  grep -i -o -E "^[A-F0-9]+ [A-Z0-9]+-[0-9]+"

4e1a63c OW-397
779e41d OW-404
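
The three grep flags map fairly directly onto Python's re module, which can make the pattern easier to experiment with. In this rough equivalent, re.IGNORECASE stands in for -i and returning group(0) stands in for -o. The name matched_prefix is invented for the example, and the merge commit line is a hypothetical case of a message with no issue id.

```python
import re

# The same pattern handed to grep -E, with -i expressed as re.IGNORECASE.
ISSUE_LINE = re.compile(r"^[A-F0-9]+ [A-Z0-9]+-[0-9]+", re.IGNORECASE)


def matched_prefix(line):
    """Return only the matched part of the line (like grep -o), or None."""
    match = ISSUE_LINE.match(line)
    return match.group(0) if match else None


print(matched_prefix("4e1a63c OW-397: Added export functionality to the UI."))
print(matched_prefix("0a1b2c3 Merge branch 'master'"))  # hypothetical line
```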

Each line now follows the pattern "hash project-number". We can trim this down even more by passing each line to awk. We'll split each line on space and then take the second item by passing each line to awk -F " " '{print $2}'.

git log origin/Release/2.1...origin/RC/2.1/Candidate-8 --oneline | \
  grep -i -o -E "^[A-F0-9]+ [A-Z0-9]+-[0-9]+" | awk -F" " '{print $2}'

OW-397
OW-404

At this point it looks like we've got the results that we want, with one important exception: duplicates. Since we're looking at each line individually, we'll see one mention for an issue in every commit it appears in. We'll finish up by cleaning the list with sort and uniq. This will sort all of the issues and then prune any consecutive duplicates from our list.

At this point, we have a quick little script to list all of the issues addressed between two commits.

git log origin/Release/2.1...origin/RC/2.1/Candidate-8 --oneline | \
  grep -i -o -E "^[A-F0-9]+ [A-Z0-9]+-[0-9]+" | awk -F" " '{print $2}' | sort | uniq
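
If the list needs to feed into something larger than a terminal, the grep, awk, sort, and uniq stages can be approximated in a few lines of Python. This is a sketch under the same assumption the post makes, namely that every relevant message starts with "PROJECT-NUMBER". The function name issues_in_log is made up, and the third log line is a hypothetical extra commit added to show a duplicate being dropped.

```python
import re

# Capture just the issue id that follows the abbreviated hash.
ISSUE = re.compile(r"^[A-F0-9]+ ([A-Z0-9]+-[0-9]+)", re.IGNORECASE)


def issues_in_log(oneline_log):
    """Return the sorted, de-duplicated issue ids found in --oneline output."""
    found = set()
    for line in oneline_log.splitlines():
        match = ISSUE.match(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


log = """4e1a63c OW-397: Added export functionality to the UI.
779e41d OW-404: Reflect changes to the model in the sidebar.
1a2b3c4 OW-397: Fixed a typo in the export dialog."""
print(issues_in_log(log))
```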

Footnotes

1 Ideally the issue tracker would be fully authoritative, but sometimes that just isn't the case. Someone closes an issue prematurely or forgets to close it entirely. That, however, is an issue for another post.
2 All of the tools that we use in this post come preinstalled on Linux and Mac. On Windows they are provided with MSysGit, though they can be installed manually through Cygwin if desired.
