Current Twt Hash spec and probability of hash collision:

The probability of a Twt Hash collision depends on the size of the hash and the number of possible values it can take. For the Twt Hash, which uses a Blake2b 256-bit hash, Base32 encoding, and takes the last 7 characters, the space of possible hash values is significantly reduced.

Breakdown:
  1. Base32 encoding: Each character in the Base32 encoding represents 5 bits of information (since ( 2^5 = 32 )).
  2. 7 characters: With 7 characters, the total number of possible hashes is:
    [
 32^7 = 3,518,437,208
 ]
 This gives about 3.5 billion possible hash values.
Probability of Collision:

The probability of a hash collision depends on the number of hashes generated and can be estimated using the Birthday Paradox. The paradox tells us that collisions are more likely than expected when hashing a large number of items.

The approximate formula for the probability of at least one collision after generating n hashes is:
[
P(\text{collision}) \approx 1 - e^{-\frac{n^2}{2M}}
]
Where:

  • (n) is the number of generated Twt Hashes.
  • (M = 32^7 = 3,518,437,208) is the total number of possible hash values.

For practical purposes, here are some example probabilities for different numbers of hashes (n):

  • For 1,000 hashes:
    [
 P(\text{collision}) \approx 1 - e^{-\frac{1000^2}{2 \cdot 3,518,437,208}} \approx 0.00014 \, \text{(0.014%)}
    ]
  • For 10,000 hashes:
    [
 P(\text{collision}) \approx 1 - e^{-\frac{10000^2}{2 \cdot 3,518,437,208}} \approx 0.14 \, \text{(14%)}
    ]
  • For 100,000 hashes:
    [
 P(\text{collision}) \approx 1 - e^{-\frac{100000^2}{2 \cdot 3,518,437,208}} \approx 0.999 \, \text{(99.9%)}
    ]
Conclusion:
  • For small to moderate numbers of hashes (up to around 1,000–10,000), the collision probability is quite low.
  • However, as the number of Twts grows (above 100,000), the likelihood of a collision increases significantly due to the relatively small hash space (3.5 billion).

⤋ Read More