This article was first written in February 2005 for the BeezNest technical
website (http://glasnost.beeznest.org/articles/206).
GZIP="--rsyncable" tar zcvf toto.tar.gz /toto
Why do you need this special option?
Because if you compress your files before synchronising them with rsync, a very small change in one original file may force rsync to retransmit the whole compressed tar.gz file, instead of just the changed portion.
The basic reason is that rsync works at the byte level: very roughly, it compares the old copy of the file with the latest source, and transmits only the bytes that differ, so that the old copy becomes identical to the new one. rsync does these comparisons in a smart way, so that in most cases only a tiny portion of the file actually needs to be transmitted.
Unfortunately, file compression algorithms that use an adaptive compression method (as most do) defeat the rsync logic and can cause the whole file to be retransmitted, even if only one byte has changed.
Why is that so?
An adaptive compression method uses an analysis of the bytes already processed to determine how best to compress the following bytes of the file. For example, suppose the compression program starts at byte 0 with a certain compression method. After 1000 bytes have been compressed, the program calculates a new compression method, based on what it found in bytes 0-999. It then inserts a new compression table into the file, and uses this table to compress the next 1000 bytes. Then it recalculates its compression table based on bytes 0-1999, does the same again, and so on. This means that a change of one byte within bytes 0-999 can potentially change the compression method for the rest of the file, so that the rest of the output bytes will be totally different. And because rsync compares the files byte by byte, it will not find any similar block of bytes between the old and new files, and will be forced to resend the whole new compressed file.
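To see this effect on a real file, the following shell sketch compresses two inputs that differ in a single byte and counts how many bytes of the compressed output differ. The test data (the numbers 1 to 200000) and the file names are assumptions made for the illustration, and the exact counts depend on the gzip version. Note that cmp compares byte positions only, which is cruder than what rsync does, so the result is a rough indication and nothing more.

```shell
#!/bin/sh
# Sketch: how far does a one-byte input change spread through plain gzip output?
# File names and test data are illustrative only.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# About 1.3 MB of compressible text: the numbers 1..200000, one per line.
seq 1 200000 > "$work/old.txt"
# Same data with a single byte changed on the first line ("1" becomes "X").
sed '1s/1/X/' "$work/old.txt" > "$work/new.txt"

# -n leaves the file name and timestamp out of the gzip header, so any
# difference in the output comes from the data alone.
gzip -n -c "$work/old.txt" > "$work/old.gz"
gzip -n -c "$work/new.txt" > "$work/new.gz"

# cmp -l prints one line per differing byte position.
input_diff=$(cmp -l "$work/old.txt" "$work/new.txt" | wc -l | tr -d ' ')
output_diff=$(cmp -l "$work/old.gz" "$work/new.gz" 2>/dev/null | wc -l | tr -d ' ')
output_size=$(wc -c < "$work/new.gz" | tr -d ' ')

echo "input bytes changed:      $input_diff"
echo "compressed bytes changed: $output_diff of $output_size"
```

If the second number is close to the size of the compressed file, the single changed byte has disturbed nearly all of the output.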
The --rsyncable option above fixes this problem. With this option, gzip will regularly "reset" its compression algorithm to what it was at the beginning of the file. So if, for example, there was a change at byte 23, this change will only affect the output up to at most (for example) byte 9999. Then gzip will restart 'at zero', and the rest of the compressed output will be the same as it was without the change at byte 23. This means that rsync will now be able to resynchronise between the old and new compressed files, and can avoid sending the portions of the file that were unmodified.
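One way to check the "restart" behaviour is to look at the end of the compressed file, far away from the change. The sketch below compares the last 4096 bytes of compressed data (skipping the 8-byte gzip trailer, which holds a checksum of the whole input) with and without --rsyncable. The test data, the helper names tail_sum and compare_tails, and the 4096-byte window are assumptions chosen for the illustration. Not every gzip build accepts --rsyncable, so the script probes for it first. One would expect "different" for plain gzip on most inputs and "same" with --rsyncable, but the outcome depends on the data and the gzip version.

```shell
#!/bin/sh
# Sketch: does the compressed output resynchronise after an early change?
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

seq 1 200000 > "$work/old.txt"
sed '1s/1/X/' "$work/old.txt" > "$work/new.txt"   # one byte changed near the start

# Checksum of the last 4096 bytes of compressed data, skipping the 8-byte
# gzip trailer (CRC32 and input length), which depends on the whole input.
tail_sum() {
    tail -c 4104 "$1" | head -c 4096 | cksum
}

# Prints "same" or "different" for the given gzip options (may be empty).
compare_tails() {
    gzip -n $1 -c "$work/old.txt" > "$work/old.gz"
    gzip -n $1 -c "$work/new.txt" > "$work/new.gz"
    if [ "$(tail_sum "$work/old.gz")" = "$(tail_sum "$work/new.gz")" ]; then
        echo same
    else
        echo different
    fi
}

plain_result=$(compare_tails "")
echo "plain gzip:       compressed tails are $plain_result"

# Probe for --rsyncable support before relying on it.
if gzip --rsyncable -c < /dev/null > /dev/null 2>&1; then
    rsyncable_result=$(compare_tails "--rsyncable")
    echo "gzip --rsyncable: compressed tails are $rsyncable_result"
else
    rsyncable_result=unsupported
    echo "this gzip does not support --rsyncable"
fi
```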
Now, for the example above, suppose "/toto" is a directory with plenty of small files for a total of 50 MB, so the uncompressed tar file would be about 50 MB. By compressing it with gzip, we bring this down to 15 MB in the tar.gz file. Now we 'rsync' this file with a remote system.
If nothing has changed since yesterday in the /toto directory, the tar.gz file will be the same as yesterday, rsync will detect this and the file will not be transmitted.
On the other hand, if one single small file at the beginning of the 'tar' has changed, then without the --rsyncable option, most of the tar.gz file will be different, and rsync will have to transmit almost 15 MB to the remote rsync target system. In that case, it would have been better not to compress the tar file at all!
With the --rsyncable option, it is possible that only about 1000 bytes would be different in the tar.gz file, so only about 1000 bytes would be transmitted by rsync, for the same end result.
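The whole scenario can be tried locally with a small stand-in for /toto. The sketch below builds the archive through an explicit pipe (tar | gzip) instead of the GZIP environment variable, which newer gzip manuals describe as obsolescent, then lets rsync update "yesterday's" copy and prints how much data it had to send. The directory layout, the file contents and the function name make_archive are assumptions for the illustration; --rsyncable is used only if the local gzip accepts it, and the measurement is skipped if rsync is not installed.

```shell
#!/bin/sh
# Sketch: archive a directory, change one file, and measure what rsync sends.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Stand-in for /toto: 200 small text files.
mkdir "$work/toto" "$work/backup"
i=1
while [ "$i" -le 200 ]; do
    seq "$i" $((i + 2000)) > "$work/toto/file$i.txt"
    i=$((i + 1))
done

# Use --rsyncable only when the local gzip accepts it.
gzip_opts="-n"
if gzip --rsyncable -c < /dev/null > /dev/null 2>&1; then
    gzip_opts="-n --rsyncable"
fi

make_archive() {
    # Similar in spirit to: GZIP="--rsyncable" tar zcf toto.tar.gz /toto
    tar -cf - -C "$work" toto | gzip $gzip_opts > "$1"
}

make_archive "$work/toto.tar.gz"
cp "$work/toto.tar.gz" "$work/backup/toto.tar.gz"   # "yesterday's" copy

echo "one changed line" >> "$work/toto/file1.txt"   # today's small change
make_archive "$work/toto.tar.gz"

if command -v rsync > /dev/null 2>&1; then
    # --no-whole-file forces the delta algorithm even for a local copy;
    # --ignore-times stops rsync from skipping the file on size and mtime alone.
    rsync --no-whole-file --ignore-times --stats \
        "$work/toto.tar.gz" "$work/backup/toto.tar.gz" |
        grep -E '^(Literal|Matched) data' || true
else
    echo "rsync not installed; skipping the transfer measurement"
fi
```

In the rsync statistics, "Literal data" is what actually had to be sent and "Matched data" is what was reused from the old copy. Running the script once with and once without --rsyncable in gzip_opts gives a feel for the difference on your own system.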
References:
For an rsync intro, see here
For a full explanation (and only for Real Programmers), see here
There is also a good summary of the whole rsync/gzip/Debian situation here
Comments
thanks for explaining this. Very easy for me to understand now. i'll be using this GZIP feature for my transfers from now on. Cheers,
Felipe
I recently stumbled on that in the man page for gzip, and I also recently started rsyncing a few hundred gzips to a host, so it was quite useful. Glad to have an explanation.