You have to be a little careful when using the du command to compare output across systems. For example, a few weeks ago I backed up a bunch of files from our MacBook Pro to our NAS and then ran du commands to verify that the directories on both systems held the same amount of data. There is a pitfall here, though…
Looking at the Mac, I see:
marc@hyperion:~ 08:39:21 $ du -sh Pictures
50G Pictures
and then the NAS:
nas1:/c/media# du -sh Pictures
51G Pictures
Hmmmm. The NAS has more than the Mac? Why? Well, maybe because the NAS already had stuff in that directory from a prior backup? A fine theory, but not likely to be correct in this case, because I used rsync with the --delete option, so any files that were on the destination and not on the source should’ve been removed. Perhaps a bug in rsync? Or is it something else…? (Hint: the answer is “something else”.)
Drilling down, we eventually find a low-level discrepancy:
marc@hyperion:~ 08:48:47 $ du -k Pictures/iPhoto\ Library/AlbumData.xml
16372 Pictures/iPhoto Library/AlbumData.xml
nas1:/c/media# du -k Pictures/iPhoto\ Library/AlbumData.xml
16400 Pictures/iPhoto Library/AlbumData.xml
(Note that in this case the destination showed more disk usage than the source, but as you’ll understand after reading this post, the situation could be reversed if the filesystems involved happened to be configured differently.)
Ah, there we go. Somehow, this file is bigger on the NAS than it is on the Mac… Except, it isn’t…
marc@hyperion:~ 08:49:04 $ ls -l Pictures/iPhoto\ Library/AlbumData.xml
-rw-r--rw- 1 marc marc 16761978 Feb 12 00:02 Pictures/iPhoto Library/AlbumData.xml
nas1:/c/media# ls -l Pictures/iPhoto\ Library/AlbumData.xml
-rw-r--rw- 1 marc 501 16761978 2011-02-12 00:02 Pictures/iPhoto Library/AlbumData.xml
Same exact size in bytes, according to ls, and yet du indicates that the disk usage is different. What is going on here?
Well, the key thing to be aware of here is that du measures disk usage, whereas ls measures the size of the files. Same thing, right? Not quite.
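Disk usage here includes the overhead inherent in filesystems with fixed-size blocks: a file always occupies a whole number of blocks, so its footprint on disk is rounded up to a multiple of the block size.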
Said another way, ls reports the logical size of files, whereas du reports their physical disk usage.
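The two measurements can be put side by side with a couple of small shell functions. This is only a sketch: the names logical_bytes and physical_kb are made up for illustration, and it assumes a POSIX-style shell with wc, tr, du, and cut available.

```shell
# Hypothetical helpers; the names are illustrative, not part of any tool.

# Logical size: the number of bytes in the file, as ls -l would show.
logical_bytes() {
    wc -c < "$1" | tr -d ' '    # tr strips the padding some wc versions add
}

# Physical size: the space the file occupies on disk, in KB, as du -k reports.
physical_kb() {
    du -k "$1" | cut -f1        # du separates size and path with a tab
}
```

Running both on the same file on each system should give the same byte count everywhere, while the KB figure can vary with the filesystem underneath.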
To illustrate this even more clearly, we can create a file with exactly 1024 bytes and see what du reports for it on various filesystems.
Here’s our 1024-byte (1 KB) file:
marc@hyperion:/tmp 08:28:07 $ dd if=/dev/zero of=/tmp/1K bs=1 count=1024
1024+0 records in
1024+0 records out
1024 bytes transferred in 0.004783 secs (214095 bytes/sec)
marc@hyperion:/tmp 08:28:49 $ ls -l /tmp/1K
-rw-r--r-- 1 marc wheel 1024 Feb 13 08:28 /tmp/1K
OK, a 1 KB file. What does du have to say about it?
marc@hyperion:/tmp 08:29:08 $ du -k /tmp/1K
4 /tmp/1K
A 1 KB file is actually taking up 4 KB. That’s because the HFS+ filesystem that this file lives on uses 4 KB blocks.
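The rounding is easy to reproduce by hand. The sketch below assumes that rounding up to whole blocks is the only overhead (real filesystems can add more, and sparse or compressed files can use less); the function name on_disk_kb is made up for illustration.

```shell
# on_disk_kb SIZE_BYTES BLOCK_BYTES
# Round SIZE_BYTES up to a whole number of blocks and print the result in KB.
# Assumption: block rounding is the only overhead being modeled.
on_disk_kb() {
    size=$1
    block=$2
    blocks=$(( (size + block - 1) / block ))   # ceiling division
    echo $(( blocks * block / 1024 ))
}

on_disk_kb 1024 4096        # the 1 KB test file on 4 KB blocks
on_disk_kb 16761978 4096    # AlbumData.xml on 4 KB blocks
```

These print 4 and 16372, matching what du -k reported on the Mac for both files. The NAS figure of 16400 is not explained by 4 KB rounding alone, which suggests a different block size or some other allocation overhead on that filesystem.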
If you have the du from GNU coreutils installed (sometimes named gdu on a BSD-based system such as OS X, to distinguish it from the default BSD-derived du), then you have a nifty command-line option that reports the logical size rather than the physical size:
marc@hyperion:~ 07:41:31 $ gdu -k /tmp/1K
4 /tmp/1K
marc@hyperion:~ 07:41:33 $ gdu -k --apparent-size /tmp/1K
1 /tmp/1K
Now, this is on OS X, which is BSD-based, and I don’t know how to make the system’s BSD du do this, but it’s easy to install GNU coreutils on just about any system (e.g., brew install coreutils, port install coreutils, apt-get install coreutils, etc.). Linux systems probably already have it out of the box, and there the command is probably named du rather than gdu, because Linux (or as some would prefer, GNU/Linux) systems usually use the GNU tools by default.
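For the original task of comparing a directory across two systems, a small wrapper can pick whichever name GNU du goes by and ask for apparent sizes. This is a sketch under assumptions: the function name apparent_kb and the variable GNU_DU are made up, and when gdu is absent it simply assumes the plain du is the GNU one, which will not hold on a stock BSD or OS X system.

```shell
# Prefer gdu (GNU du installed alongside a BSD du); otherwise fall back to du.
if command -v gdu >/dev/null 2>&1; then
    GNU_DU=gdu
else
    GNU_DU=du   # assumption: the system du is GNU du, as is typical on Linux
fi

# apparent_kb PATH
# Print the total logical (apparent) size of PATH in KB.
apparent_kb() {
    "$GNU_DU" -sk --apparent-size "$1" | cut -f1
}
```

Running apparent_kb Pictures on both machines should give much closer totals than plain du -sk. Small differences may remain, since I believe the sizes of the directory entries themselves are included and those can also vary by filesystem.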