Thursday, December 15, 2011

De-Duping files on BTRFS.

Brave souls have been able to test BTRFS on Fedora for a couple of releases now.

Removing duplicate/redundant files from a filesystem is a common task, e.g. when creating regular backups. On ext4 this can be done using traditional hardlinks.
Hardlinks all point to the same blocks on the underlying logical drive. So if a write happens through one of the hardlinks, it also "appears" in all other hardlinks (which point to the same, now modified, blocks).
This is no problem in a backup scenario, as you normally don't modify backed-up files.
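
Just to illustrate the hardlink behaviour, a tiny sketch (the file names are made up):

# two names, one inode
ln backup/day1/report.txt backup/day2/report.txt
# a write through one name ...
echo "changed" >> backup/day1/report.txt
# ... is visible through the other name as well
cat backup/day2/report.txt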

In my case I wanted to remove redundant files that might get modified later, and those changes should not be reflected in all the other copies. So what I want to achieve is to let several links (files) point to the same blocks for reading, but if a write happens, it should only affect that one file (link). So, copy the file on write. Wait, don't we know that as CoW? Yep.

Luckily BTRFS allows CoW copies of files via the cp --reflink command.
The following snippet replaces all copies of a file with "lightweight" aka CoW copies.

#!/bin/bash
# Usage: dedup.sh PATH_TO_HIER_WITH_MANY_EXPECTED_DUPES
mkdir sums
find "$@" -type f -print0 | while read -d $'\0' -r F
do
  echo -n "$F : "
  FHASH=$(sha256sum "$F" | cut -d" " -f1);
  # If a file with this hash already exists, it's probably a dupe:
  # compare bytewise and replace the file with a reflink (so CoW)
  if [[ -f "sums/$FHASH" ]] && cmp -s "sums/$FHASH" "$F";
  then
    echo "Dup." ;
    rm "$F" ;
    cp --reflink "sums/$FHASH" "$F" ;

  # It's a new file, create a hash entry.
  else
    echo "New." ;
    cp --reflink "$F" "sums/$FHASH" ;
  fi
done
rm sums/*
rmdir sums
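
Usage is simple; a short sketch (the backup path and file name below are made up):

./dedup.sh /srv/backups/
# a later write to one of the deduped copies ...
echo "only here" >> /srv/backups/monday/report.txt
# ... does not show up in the other copies, only the changed
# blocks of that one file get their own space again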

And in general, btrfs hasn't eaten my data yet, it even survived two power losses ...
Update: Updated the script to handle files with special characters. The script also makes some assumptions, e.g. the files must not be modified while the script is running.

9 comments:

  1. Nice. Wonderfully short.

    But what I would like more is dedup on the block level. That is much more common than file-level dedup.

  2. Fred beat me to the question. So I take it that Btrfs doesn't do block level de-duplication. Is there any FS that does this on Linux? It would be really handy with virtualization.

  3. I'm no expert on btrfs internals - or even in userspace, but ... formats like qcow2 or qed, the container formats for virtual guests, already provide such a concept, known as a "backing image" in the qemu world: http://dummdida.blogspot.com/2011/09/qemu-and-backing-images.html .
    And back to btrfs, maybe I wasn't clear enough in the post, but if writes happen to the file, a reflink copy means cp will "perform a lightweight copy, where the data blocks are copied only when modified." [ref. man cp] - So only modified blocks are copied, not the whole file.
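
    For illustration, creating such a backing-image based guest looks roughly like this (the image names are made up):

    # base.qcow2 stays untouched, guest-overlay.qcow2 only stores the
    # blocks the guest actually changes
    qemu-img create -f qcow2 -b base.qcow2 guest-overlay.qcow2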

  4. I started to test btrfs on F16 but it did eat my FS, and since it doesn't have an fsck yet I had to reformat and reinstall. Good thing that I just use it for the system.

    Replies
    1. Who knows if there ever will be an fsck? :)
      You might be interested in the recovery mount-option:

      $ sudo mount -o remount,recovery,[...] /my/btrfs/mount

      "recovery - enable autorecovery upon mount; currently it scans list of several previous tree roots and tries to use the first readable" (http://btrfs.ipv5.de/index.php?title=Getting_started)

  5. To thee who doubt the awesome. Create a reference link.. append to the end of the new link.. remove.. the.. original.. file..

    Insert drama noises...

    It's not a backing file, it's shared file sectors. I do love block level dedupe but I'll take this in a heartbeat.

    I use a laptop and use btrfs as the root. This means I can safely dedupe files I know won't be accessed.. and for those I know will be, I can safely reboot into a rescue partition with the root fs as an unused mount and run a dedupe script.. think of the savings! For an added bonus, if you can identify a bunch of files that have a similar set of starting bytes, you can initially dedupe that and then append the unique bits afterwards.

    Thanks BTRFSMOFOS!

  6. One problem with the script: it cannot handle directory names with white spaces. For example a directory named "stochasticke modely" will act like this:

    luvarga@blackpc ~/documents/skola $ ./dedup.sh stochasticke\ modely/
    find: `stochasticke': Directory or file does not exist
    find: `modely/': Directory or file does not exist

    PS: Running it on a 6 GB directory containing only an svn repository reduced disk usage to 2.8 GB. Great. (svn stat and svn up work without any problem)

    Replies
    1. Also I have found that there are some limitations when working with subvolumes. If you have more subvolumes, you have to move the dedup.sh script to the particular subvolume and run it there, because the sums directory has to be on the same subvolume as the directory being optimized.
