I have recently released a new filesystem tool of mine: hsync.
When synchronizing two devices where some folders have been renamed,
synchronization tools like *rsync* do not see the renames and transfer
the full folder content instead.
I've always found it a big waste of time and resources!
Example: you have an old backup of a massive folder 'media' which you
recently renamed 'medianew' on your local drive.
If you synchronize /backup from /local with rsync, /backup/media will be
deleted and /local/medianew will be transferred in full to /backup/medianew.
However, if you rename '/backup/media -> /backup/medianew' beforehand, the
subsequent rsync run will be much faster.
Since I could not find any program that would detect renames, I decided to
write my own tool.
Your feedback is more than welcome!
From 'hsync -h':
Usage: hsync SOURCE TARGET
Filesystem hierarchy synchronizer
Rename files in TARGET so that identical files found in SOURCE and TARGET
have the same relative path.
The main goal of the program is to make folders synchronization faster by
sparing big file transfers when a simple rename suffices. It complements
synchronization programs that lack this capability.
hsync previews all changes by default. You can tweak each rename operation
to your liking.
If you want to mirror SOURCE to TARGET with rsync, running hsync beforehand
can dramatically speed up the process:
$ hsync SOURCE TARGET
$ rsync -livr --delete-excluded --progress -- SOURCE/ TARGET
hsync is written in Go and the binary is statically linked. The program
should be pretty lightweight.
Since the goal is to provide a significant speed-up to other
synchronization tools, performance is a critical part of the program.
I've tried hard to minimize I/O, since it is the main bottleneck.
A run over a few hundred thousand files (~100 GB) finishes in a few minutes
on a mildly slow hard drive with a cold cache. A second run typically
finishes within seconds with a hot cache.
I am currently using a rolling MD5 checksum to match files. An Adler-32
checksum could be a bit faster but would yield more collisions. I am not
sure which one is best.
Some parts could be parallelized, but a lot of thread synchronization would
be required. It is unclear to me if it would yield better performance in
all scenarios. I still need to work on it.
Memory-wise, hsync can use several hundred MB of RAM when working on
millions of files. This could probably be optimized, albeit not easily.
Comments are welcome.
By default, all changes are previewed and you can tweak the renaming
operations before proceeding.
Nevertheless, there are many edge cases with funky filesystem
configurations that may lead to unexpected results.
A typical example is bind mounts: most (all?) kernels do not support
renaming across mount points, so if the TARGET folder contains bind
mounts, some renames might fail.
If you experience any weird behaviour or if you are a filesystem guru and
can spot some mistakes in my (very short) code, you are more than welcome
to share your knowledge!
See the implementation details linked below.
Official web page <http://ambrevar.bitbucket.org/hsync>
Issue tracker <https://bitbucket.org/ambrevar/hsync/issues>: Please file
bug reports there, thanks!
Implementation details <https://godoc.org/bitbucket.org/ambrevar/hsync>