@zvava@twtxt.net I might misunderstand what you wrote, but only hashing the message once and storing the hash together with the message in the database seems a way better approach to me. It's fixed and doesn't change, so there's no need to recompute it during runtime over and over and over again. You just have it. And can easily look up other messages by hash.
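What I mean, as a minimal sketch (assuming a SQLite cache; the table and column names are made up):

import sqlite3

con = sqlite3.connect("twts.db")
con.execute("""CREATE TABLE IF NOT EXISTS twts (
    hash TEXT PRIMARY KEY,  -- computed exactly once, at insert time
    url  TEXT NOT NULL,
    ts   TEXT NOT NULL,
    text TEXT NOT NULL
)""")

def store(twt_hash, url, ts, text):
    # The hash is persisted alongside the message, never recomputed.
    con.execute("INSERT OR IGNORE INTO twts VALUES (?, ?, ?, ?)",
                (twt_hash, url, ts, text))
    con.commit()

def lookup(twt_hash):
    # Looking up other messages by hash is then a simple indexed query.
    return con.execute("SELECT url, ts, text FROM twts WHERE hash = ?",
                       (twt_hash,)).fetchone()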
@lyse@lyse.isobeef.org Damn. That was stupid of me. I should have posted examples using 2026-03-01 as cutoff date.
In my actual test suite, everything uses 2027-01-01 and then I have this, hoping that that's good enough.
import jenny  # the client under test; URL and TEXT are fixtures defined elsewhere in the suite
from datetime import timedelta

def test_rollover():
    # Timestamps before the cutoff keep the old 7-character hashes,
    # the cutoff itself and everything after get the new 12-character ones.
    d = jenny.HASHV2_CUTOFF_DATE
    assert len(jenny.make_twt_hash(URL, d - timedelta(days=7), TEXT)) == 7
    assert len(jenny.make_twt_hash(URL, d - timedelta(seconds=3), TEXT)) == 7
    assert len(jenny.make_twt_hash(URL, d - timedelta(seconds=2), TEXT)) == 7
    assert len(jenny.make_twt_hash(URL, d - timedelta(seconds=1), TEXT)) == 7
    assert len(jenny.make_twt_hash(URL, d, TEXT)) == 12
    assert len(jenny.make_twt_hash(URL, d + timedelta(seconds=1), TEXT)) == 12
    assert len(jenny.make_twt_hash(URL, d + timedelta(seconds=2), TEXT)) == 12
    assert len(jenny.make_twt_hash(URL, d + timedelta(seconds=3), TEXT)) == 12
    assert len(jenny.make_twt_hash(URL, d + timedelta(days=7), TEXT)) == 12
(In other words, I don't care as long as it's before 2027-01-01.)
Hm, so regarding the hash change:
https://git.mills.io/yarnsocial/twtxt.dev/pulls/28
How about 2026-03-01 00:00:00 UTC as the cut-off date?
All my newly added test cases failed, the ones that movq thankfully provided in https://git.mills.io/yarnsocial/twtxt.dev/pulls/28#issuecomment-20801 for the draft of the twt hash v2 extension. The first error was easy to see in the diff: the hashes were way too long. You've already guessed it, I had cut the hash from the twelfth character to the end instead of taking the first twelve characters: hash[12:] instead of hash[:12].
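For anyone skimming, the two slices side by side (plain Python, nothing jenny-specific):

h = "0123456789abcdefghij"
print(h[:12])  # 0123456789ab  <- the first twelve characters (correct)
print(h[12:])  # cdefghij      <- everything from index 12 onwards (my bug)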
After fixing this rookie mistake, the tests still all failed. Hmmm. Did I still cut the wrong twelve characters? :-? I even checked the Go reference implementation in the document itself. But it read basically the same as mine. Strange, what the heck is going on here?
Turns out that my vim replacements to transform the Python code into Go code butchered all the URLs. ;-) The order of operations matters: I first replaced the equals signs with colons for the subtest struct fields and then wanted to transform the RFC 3339 timestamp strings into time.Date(…) calls. So I replaced the colons in the timestamps with commas and spaces. Hence, my URLs then also all read https, //example.com/twtxt.txt.
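The same trap is easy to reproduce outside of vim; a small sketch of the idea in Python (the real edits were vim :s commands, and the struct field and URL below are made up):

import re

line = 'url: "https://example.com/twtxt.txt", ts: "2026-03-01T12:30:45Z",'
# Blindly turning every colon into ", " also hits the URL's scheme separator:
print(re.sub(":", ", ", line))
# -> url,  "https, //example.com/twtxt.txt", ts,  "2026-03-01T12, 30, 45Z",

# Only touching colons between digits leaves the URL intact:
print(re.sub(r"(?<=\d):(?=\d)", ", ", line))
# -> url: "https://example.com/twtxt.txt", ts: "2026-03-01T12, 30, 45Z",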
But that was it. All tests green. \o/
Linux Looks To Remove SHA1 Support For Signing Kernel Modules
Patches posted to the Linux kernel mailing list this week are seeking to remove SHA1 support for signing of kernel modules. This is part of the larger effort in the industry for moving away from SHA1 given its vulnerabilities to hash collisions and superior hashing algorithms being available… ⌘ Read more
No, I was using an empty hash URL when the feed didn't specify a url metadata field. Now I'm correctly falling back to the feed URL.
Hmmm, looks like my twt hash algorithm implementation calculates incorrect values. Might be the tilde in the URL that throws something off. :-? At least yarnd and jenny agree on a different hash.
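If it really is the tilde, one plausible mechanism (purely a guess, and illustrative code only, not jenny's or yarnd's actual implementation) is that the URL goes into the hash input verbatim, so percent-encoding ~ as %7E changes the hashed bytes:

import hashlib

def toy_hash(url):
    # Illustrative digest only; the real twt hash uses a different construction.
    return hashlib.sha256(url.encode()).hexdigest()[:12]

print(toy_hash("https://example.com/~user/twtxt.txt"))
print(toy_hash("https://example.com/%7Euser/twtxt.txt"))  # differs from the above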
Net zero Australia LIVE updates: Hastie, Cash, Henderson speak as Liberals hash out emissions policy
Follow along as we bring you the latest live news updates from Australia and around the world. ⌘ Read more
Net zero Australia LIVE updates: Liberals in party room meeting set to hash out emissions policy
Follow along as we bring you the latest live news updates from Australia and around the world. ⌘ Read more
Net zero Australia LIVE updates: Liberals in party room meeting set to hash out emissions policy; Meeting set to last hours; MPs to speak in alphabetical order
Follow along as we bring you the latest live news updates from Australia and around the world. ⌘ Read more
Net zero Australia LIVE updates: Liberals in party room meeting set to hash out net zero emissions policy; McCormack says Coalition should stick together regardless of decision
Follow along as we bring you the latest live news updates from Australia and around the world. ⌘ Read more
Just typing twts directly into my twtxt file.
Details:
- Opening my twtxt file remotely using
vim scp://user@remote:port//path/to/twtxt.txt
- Inserting the date, time and tab part of the twt with
:.!echo "$(date -Is)\t"
- In case I need to add a new line I just
Ctrl+Shift+u, type in 2028 (U+2028, the Unicode line separator) and hit Enter
- In order to reply, you just steal a twt hash from your favorite Yarn instance.
It looks tedious, but it's fun to know I can twt no matter where I am, as long as I can ssh in.
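For reference, the line those steps produce (an RFC 3339 timestamp, a literal tab, then the text) is just as easy to generate in a few lines of Python; a sketch, not part of the workflow above:

from datetime import datetime, timezone

text = "Hello from wherever I can ssh in from!"
ts = datetime.now(timezone.utc).astimezone().isoformat(timespec="seconds")
with open("twtxt.txt", "a", encoding="utf-8") as f:
    f.write(f"{ts}\t{text}\n")  # timestamp, tab, twt text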
Hash Me If You Can - How I Beat a 2-Second Hashing Challenge on RingZer0Team ⌘ Read more
@zvava@twtxt.net My client trusts the first url field it finds. If there is none, it uses the URL that I'm using for fetching the feed.
No validation, no logging.
In practice, I've not seen issues with people messing with this field. (What I do see, of course, is broken threads when people do legitimate edits that change the hash.)
I don't see how anyone can impersonate anybody else this way. Sure, you could use my URL in your url field, but then what? You will still show up as zvava in my client or, if you also change your nick field, as movq (zvava).
@zvava@twtxt.net Yes, the specification defines the first url to be used for hashing. No matter if it points to a different feed or whatever. Just unsubscribe from malicious feeds and you're done.
Since the first url is used for hashing, it must never change. Otherwise, it will break threading, as you already noticed. If your feed moves and you wanna keep the old messages in the new feed, you still have to point to the old url location and keep that forever. But you can add more urls. As I said several times in the past, in hindsight, using the first url was a big mistake. It would have been much better if the last encountered url were used for hashing onwards. This way, feed moves would be relatively straightforward. However, that ship has sailed. Luckily, feeds typically don't relocate.
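To make that concrete, a relocated feed's metadata could look like this (hypothetical URLs): the first url stays pinned to the old location so existing hashes keep working, while the second one advertises the new home.

# nick = example
# url = https://old.example.org/twtxt.txt
# url = https://new.example.org/twtxt.txt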
@movq@www.uninformativ.de You were seeing that many hash collisions for you to notice this?
The twtiverse appears to have shrunk. Among the 61 feeds that I follow, I don't see any hash collisions anymore.
Exactly, @zvava@twtxt.net, I agree. (Although, in my client at least, I wouldn't use hashes anywhere.)
@alexonit@twtxt.alessandrocutolo.it Yeah, I think we're overstating the UNIX principles a bit here. I get what you're trying to say though, @zvava@twtxt.net. If I could go back in time and do it all over again, I would have gotten the hash length correct and I would have used SHA-256 instead. But someone way smarter than me designed the Twt Hash spec, we adopted it, and well, here we are today, it works™.
@zvava@twtxt.net Going to have to hard disagree here, I'm sorry. a) No-one reads the raw/plain twtxt.txt files; the only time you do is to debug something, or to have a sticky beak at the comments, which most clients will strip out and ignore. And b) I'm sorry, you've completely lost me! I'm old enough to pre-date Linux becoming popular, so I'm not sure what UNIX principles you think are being broken or violated by having a Twt Subject whose content is a cryptographic content-addressable hash of the thing™ you're replying to, forming a chain of other replies (a thread).
I'm sorry, but the simplest thing to do is to make as few changes to the Spec as possible and all agree on a "Magic Date" from which our clients use the modified function(s).
@bender@twtxt.net Well honestly, this is just it. My strong position on this is quite simple:
Do the simplest thing that could work.
It's one of the age-old UNIX philosophies.
Therefore, the simplest thing™ to do here is to just increase the hash length, mark a magic™ date/time as @lyse@lyse.isobeef.org has indicated, and call it a day. We'll then be fine for a few hundred years, at which point there'll be no-one left alive to give a shit™ anyway.
@prologic@twtxt.net considering other alternatives we have seen (of which I have lost track already), yes. Why don't you guys (client makers) take a step at a time and, for now, increase the hash length to deal with the collisions. Then location-based addressing can be added… or not, you know.
Of course we still have to fix the hashing algorithm and length.
I finally resolved my issues with hashing twts… with REGEX!
Dates in JavaScript are truly strange creatures.
SHA2 Fatal Flaw? (Hash Length Extension Attack) - Computerphile ⌘ Read more
@lyse@lyse.isobeef.org I don't think there's any point in continuing the discussion of Location vs. Content based addressing.
I want us to preserve Content based addressing.
Let's improve the user experience and fix the hash collision problems.
@prologic@twtxt.net I know we won't ever convince each other of the other's favorite addressing scheme. :-D But I wanna address (haha) your concerns:
I don't see any difference between the two schemes regarding link rot and migration. If the URL changes, both approaches are equally terrible: the feed URL is part of the hashed value in content-based addressing and part of the reference in the location-based scheme. It doesn't matter.
The same is true for duplication and forks. Even today, the "canonical URL" has to be chosen to build the hash. That's exactly the same with location-based addressing. Why would a mirror only duplicate stuff with location- but not content-based addressing? I really fail to see that. Also, who is using mirrors or relays anyway? I don't know of any such software, to be honest.
If there is a spam feed, I just unfollow it. Done. Not a concern for me at all. Not the slightest bit. And the byte verification is THE source of all broken threads when the conversation start is edited. Yes, this can be viewed as a feature, but how many times was it actually a feature, and how often did it instead behave as an anti-feature in terms of user experience?
I don't get your argument. If the feed in question is offline, one can simply look in local caches and see if there is a message at that particular time, just like looking up a hash. Where's the difference? Except that the lookup key is longer or compound or whatever, depending on the cache format.
Even a new hashing algorithm requires work on clients etc. It's not that you get some backwards-compatibility for free. It just cannot be backwards-compatible in my opinion, no matter which approach we take. That's why I believe some magic time for the switch causes the least amount of trouble. You leave the old world untouched and working.
If these are general concerns, I'm completely with you. But I don't think that they only apply to location-based addressing. That's how I interpreted your message. I could be wrong. Happy to read your explanations. :-)
Here is just a small list of things™ that I'm aware will break, some quite badly, others in minor ways:
- Link rot & migrations: domain changes, path reshuffles, CDN/mirror use, or moving from txt → jsonfeed will orphan replies unless every reader implements perfect 301/410 history, which they won't.
- Duplication & forks: mirrors/relays produce multiple valid locations for the same post; readers see several "parents" and split the thread.
- Verification & spam-resistance: content addressing lets you dedupe and verify you're pointing at exactly the post you meant (hash matches bytes). Location anchors can be replayed or spoofed more easily unless you add signing and canonicalization.
- Offline/cached reading: without the original URL being reachable, readers can't resolve anchors; with hashes they can match against local caches/archives.
- Ecosystem churn: all existing clients, archives, and tools that assume content-derived IDs need migrations, mapping layers, and fallback logic. Expect long-lived threads to fracture across implementations.
We've been discussing the idea of changing the threading model from Content-based Addressing to Location-based addressing for years now. The problem is quite complex, but I feel I have to keep reminding y'all of the potential perils of changing this and the pros/cons of each model:
With content-addressed threading, a reply points at something that's intrinsically identified (hash of author/feed URI + timestamp + content). That ID never changes as long as the content doesn't. Switching to location-based anchors makes the reply target extrinsic: it now depends on where the post currently lives. In a pull-based, decentralised network, locations drift. The moment they do, thread identity fragments.
@zvava@twtxt.net There would be only one hash for a message. Some to-be-defined magic date selects which hash to use: if the message creation timestamp is before this epoch, hash it with v1, otherwise hammer it through v2. Eventually, support for v1 could be dropped as nobody interacts with the old stuff anymore. But I'd keep it around in my client, because why not.
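In code, the dispatch could be as small as this sketch (the cutoff value and the digest are placeholders; sha256/hex here is only to show the shape, not the real twt hash construction):

import hashlib
from datetime import datetime, timezone

HASHV2_CUTOFF_DATE = datetime(2026, 3, 1, tzinfo=timezone.utc)  # hypothetical

def _toy_digest(url, created, text, length):
    # Stand-in digest, NOT the actual twt hash algorithm.
    payload = f"{url}\n{created.isoformat()}\n{text}".encode()
    return hashlib.sha256(payload).hexdigest()[:length]

def make_twt_hash(url, created, text):
    # Messages from before the epoch keep their old short hashes forever;
    # everything at or after it gets the longer v2 hash.
    if created < HASHV2_CUTOFF_DATE:
        return _toy_digest(url, created, text, 7)   # "v1"
    return _toy_digest(url, created, text, 12)      # "v2"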
If users choose a client which supports the extensions, they don't have to mess around with v1 and v2 hashing, just like today.
As for the school of thought, personally, I'd prefer something else, too. I'm in camp location-based addressing, or whatever it is called. The more I think about it, a complete redesign of twtxt and its extensions would be necessary in my opinion. Retrofitting has its limits. Of course, this is much more work, though.
@zvava@twtxt.net I was about to suggest that you post some examples. By now, we're pretty good at debugging hashing issues, because that happens so often. But it looks like you figured it out on your own.
@zvava@twtxt.net we have to amend the spec and increase the hash length. We just haven't done so yet.
@zvava@twtxt.net may I recommend to change the mention format upon hitting reply to something similar to what is used in Yarn, and perhaps to hide the hash on the post too? Looking good!
@movq@www.uninformativ.de Yeah, we've seen how this plays out in practice. @dce@hashnix.club My advice: do what @movq@www.uninformativ.de has hinted at and don't change the 1st # url = field in your feed. I'm not sure if you had already, but the first url field is kind of important in your feed as it is used as the "Hashing URI" for threading.
@dce@hashnix.club Ah, oh, well then.
My client supports that, if you set multiple url = fields in your feed's metadata (the top-most one must be the "main" URL, that one is used for hashing).
But yeah, multi-protocol feeds can be problematic and some have considered it a mistake to support them.
huh.. so not even trying to be compatible with existing hashes?
I have zero mental energy for programming at the moment.
I'll try to implement the new hashing stuff in jenny before the "deadline". But I don't think you'll see any texudus development from me in the near future.
Hash Collisions & The Birthday Paradox - Computerphile ⌘ Read more
@ About the URL, since it is no longer used for hashing there might be no need to change it. I agree that we keep all the parts that are already out there for the most part. Instead of a contact field you could also just use links like: link = Email mailto:user@example.dk or link = Signal https://signal.me/sthF4raI5Lg_ybpJwB1sOptDla4oU7p[...]
@lyse@lyse.isobeef.org Yeah, to avoid cutting off bits at the end, making hashes end in either q or a.
if clauses to this. My point is: Every time I see a hash, I'd like to have a hint as to where to find the corresponding twt.
The reason I think this can work so well and I'm in full support of it is that it's the least disruptive way to resolve the issue of:
where did this hash come from?