We.Love.Privacy.Club @<prologic https://twtxt.net/user/prologic/twtxt.txt> "Current Twt Hash spec and probability of hash collision: The probability of a **Twt Hash collision** depends on the size of the hash and the nu ..."


Current Twt Hash spec and probability of hash collision:

The probability of a Twt Hash collision depends on the size of the hash and the number of possible values it can take. For the Twt Hash, which uses a Blake2b 256-bit hash, Base32 encoding, and takes the last 7 characters, the space of possible hash values is significantly reduced.
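The construction described above (Blake2b-256, Base32, last 7 characters) can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation. The function name `twt_hash`, the payload string, and the choice to strip padding and lowercase before slicing are assumptions made here for the example.

```python
import base64
import hashlib


def twt_hash(payload: str) -> str:
    """Blake2b-256 digest, Base32-encoded, truncated to the last 7 characters."""
    digest = hashlib.blake2b(payload.encode("utf-8"), digest_size=32).digest()
    # Assumption: '=' padding is stripped and the text lowercased before slicing.
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return encoded[-7:]


# Hypothetical payload; what exactly gets hashed is defined by the spec itself.
example = "https://example.com/twtxt.txt\n2024-01-01T00:00:00Z\nHello, world!"
print(twt_hash(example))
```

Whatever the input, the result is always a 7-character string over the 32-symbol Base32 alphabet, which is what bounds the number of distinct hashes.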

Breakdown:

Base32 encoding: Each character in the Base32 encoding represents 5 bits of information (since \( 2^5 = 32 \)).
7 characters: With 7 characters, the total number of possible hashes is:
\[ 32^7 = 2^{35} = 34,359,738,368 \]
This gives about 34.4 billion possible hash values.

Probability of Collision:

The probability of a hash collision depends on the number of hashes generated and can be estimated using the Birthday Paradox. The paradox tells us that collisions become likely much sooner than intuition suggests: the risk is already substantial once the number of hashed items approaches the square root of the number of possible hash values.

The approximate formula for the probability of at least one collision after generating n hashes is:
\[ P(\text{collision}) \approx 1 - e^{-\frac{n^2}{2M}} \]
Where:

\(n\) is the number of generated Twt Hashes.
\(M = 32^7 = 34,359,738,368\) is the total number of possible hash values.
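The formula above can be evaluated directly. The following is a minimal Python sketch. The function name `collision_probability` is hypothetical, and M is computed as `32 ** 7` rather than typed in by hand.

```python
import math

M = 32 ** 7  # 7 Base32 characters, 5 bits each


def collision_probability(n: int, space: int = M) -> float:
    """Birthday approximation: 1 - exp(-n^2 / (2 * space))."""
    # -expm1(-x) evaluates 1 - e^(-x) without losing precision for tiny x.
    return -math.expm1(-(n * n) / (2 * space))


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} hashes: {collision_probability(n):.4%}")
```

Using `math.expm1` instead of `1 - math.exp(...)` keeps the result accurate when the exponent is very close to zero, which is the case for small n.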

For practical purposes, here are some example probabilities for different numbers of hashes (n):

For 1,000 hashes:
\[ P(\text{collision}) \approx 1 - e^{-\frac{1000^2}{2 \cdot 34,359,738,368}} \approx 0.000015 \, \text{(0.0015\%)} \]
For 10,000 hashes:
\[ P(\text{collision}) \approx 1 - e^{-\frac{10000^2}{2 \cdot 34,359,738,368}} \approx 0.0015 \, \text{(0.15\%)} \]
For 100,000 hashes:
\[ P(\text{collision}) \approx 1 - e^{-\frac{100000^2}{2 \cdot 34,359,738,368}} \approx 0.135 \, \text{(13.5\%)} \]
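The birthday approximation can be sanity-checked against the exact expression, \( P = 1 - \prod_{k=0}^{n-1} (1 - k/M) \), which assumes hashes are uniformly distributed over M values. The sketch below is illustrative only. Both function names are hypothetical, and the product is summed in log space to avoid underflow.

```python
import math

M = 32 ** 7  # size of the 7-character Base32 hash space


def exact_collision_probability(n: int, space: int = M) -> float:
    """1 - prod_{k=0}^{n-1} (1 - k/space), computed via a sum of logs."""
    log_no_collision = sum(math.log1p(-k / space) for k in range(n))
    return -math.expm1(log_no_collision)


def approx_collision_probability(n: int, space: int = M) -> float:
    """Birthday approximation: 1 - exp(-n^2 / (2 * space))."""
    return -math.expm1(-(n * n) / (2 * space))


for n in (1_000, 10_000, 100_000):
    exact = exact_collision_probability(n)
    approx = approx_collision_probability(n)
    print(f"n={n:>7,}  exact={exact:.6f}  approx={approx:.6f}")
```

For these values of n, which are far smaller than M, the two results agree closely, so the simpler approximation is adequate for this kind of estimate.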

Conclusion:

For small to moderate numbers of hashes (up to around 10,000), the collision probability is low: below 0.15%.
However, as the number of Twts grows, the likelihood of a collision rises quickly due to the relatively small hash space (about 34.4 billion): it reaches roughly 13.5% at 100,000 hashes and passes 50% near 218,000.
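Inverting the approximation gives the number of hashes at which a chosen collision probability p is reached: \( n \approx \sqrt{2M \ln\frac{1}{1-p}} \). Below is a small Python sketch under the same uniform-hash assumption. The function name `hashes_for_probability` is hypothetical.

```python
import math

M = 32 ** 7  # size of the 7-character Base32 hash space


def hashes_for_probability(p: float, space: int = M) -> float:
    """Solve 1 - exp(-n^2 / (2 * space)) = p for n."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.sqrt(2 * space * math.log(1.0 / (1.0 - p)))


for p in (0.01, 0.10, 0.50):
    print(f"p={p:.0%}: about {hashes_for_probability(p):,.0f} hashes")
```

This makes the scaling explicit: the threshold grows with the square root of the hash space, so each additional Base32 character (a factor of 32 in M) raises it by a factor of about 5.7.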

