On 24/09/2013 00:17, David Christensen wrote:

I'm looking for a hash function and a related function or operator such that:

H(string1 . string2) = f(H(string1), H(string2))
H(string1 . string2) = H(string1) op H(string2)


H() is the hash function
string1 is a string
string2 is a string
. is the string concatenation operator
f() is a function
op is a binary operator
On 09/23/13 15:29, Rob Dixon wrote:
Could you explain the problem you're trying to solve?
Writing scripts that look for duplicate, similar, and/or
missing files.
I assume this is about paths and filenames. Have you considered an rsync

I also assume that you want to communicate as little as possible, so you
don't have supersets of all strings on all sides (otherwise it would become
a simple indexing problem).

I also assume that you are more interested in missing items, so
hash-value collisions are not a problem.

I also assume that the set of string1 is smaller than that of string2,
let's say 100 vs. 10000 different values.

For local deduplication, you would store paths as a directory name and a


And then keep a list of filenames and, for each filename, the paths in which it exists.
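
A minimal sketch of such an index in Python, assuming each path is split into a directory part and a filename with os.path.split. The names build_index, dirs and by_name are made up for illustration:

```python
import os
from collections import defaultdict

def build_index(paths):
    """Return (dirs, by_name): a list of unique directory names and a
    mapping from filename to the indexes of the directories holding it."""
    dirs = []                     # directory index -> directory name
    dir_index = {}                # directory name -> directory index
    by_name = defaultdict(list)   # filename -> list of directory indexes
    for path in paths:
        directory, filename = os.path.split(path)
        if directory not in dir_index:
            dir_index[directory] = len(dirs)
            dirs.append(directory)
        by_name[filename].append(dir_index[directory])
    return dirs, dict(by_name)
```

A filename that maps to more than one directory index is a candidate duplicate; a filename absent from one machine's index is a candidate missing file.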



For combining index values, use something like ( i1 << N ) | i2,
where N is the number of bits needed by i2.
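
As an illustration of that expression, here is a small Python sketch. The helper names combine and split_key are hypothetical, and the 10000 figure is only the example size assumed earlier for the larger set:

```python
def combine(i1, i2, n_bits):
    """Pack two non-negative indexes into one integer: (i1 << N) | i2."""
    if i1 < 0 or not 0 <= i2 < (1 << n_bits):
        raise ValueError("index out of range for the chosen bit width")
    return (i1 << n_bits) | i2

def split_key(key, n_bits):
    """Recover (i1, i2) from a key built by combine()."""
    return key >> n_bits, key & ((1 << n_bits) - 1)

# 10000 distinct values for i2 need 14 bits, since 2**14 = 16384.
N = (10000 - 1).bit_length()
key = combine(42, 9999, N)
```

Because i2 always fits in its N bits, the two indexes never overlap and split_key(key, N) gives back (42, 9999).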

I would not involve string concatenation: keep things separate once
separated. Use arrays.

Use (parts of) the MD5 digests of the strings if you need to compare against remote locations.
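
A minimal sketch of taking part of an MD5 digest, using Python's standard hashlib. The function name short_digest and the default of 8 hex digits are assumptions chosen for illustration:

```python
import hashlib

def short_digest(text, hex_chars=8):
    """Return the first hex_chars hex digits of the MD5 digest of text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:hex_chars]
```

Shorter prefixes mean less to transmit but more collisions, which is acceptable only under the assumption above that collisions are not a problem.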

So it would be best to first explain *more* about what you are trying to solve.
A single computer or multiple computers, connected or not?

Suppose one computer sends a concise email about what it has, such that
the other computer can reply with an even more concise email about what it
has, and what it needs. IOW: diff+patch.
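
A rough Python sketch of the replying side, assuming each machine summarizes what it has as a collection of keys (for example, truncated digests of filenames). The function name reply and the "need"/"extra" labels are hypothetical:

```python
def reply(local_keys, offered_keys):
    """Given the keys another machine says it has, report what we need
    from it and what we have that it lacks."""
    local, offered = set(local_keys), set(offered_keys)
    return {
        "need": sorted(offered - local),   # they have it, we do not
        "extra": sorted(local - offered),  # we have it, they do not
    }
```

Keys common to both sides never appear in the reply, which is what keeps it shorter than the first message.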

Greetings, Ruud

