@movq@www.uninformativ.de It’s formatted just fine
This is formatted poorly on twtxt.net, so here’s a plain text file: https://movq.de/v/971c5a125d/wall-of-text.txt
Magika 1.0 Goes Stable As Google Rebuilds Its File Detection Tool In Rust
BrianFagioli writes: Google has released Magika 1.0, a stable version of its AI-based file type detection tool, and rebuilt the entire engine in Rust for speed and memory safety. The system now recognizes more than 200 file types, up from about 100, and is better at distinguishing look-alike formats such as JSON vs JSONL, TS …
@zvava@twtxt.net the second format (the one you think it should be changed to), is it backwards compatible with what’s currently in place? I believe the first one would be.
@lyse@lyse.isobeef.org Hm, I couldn’t trick yt-dlp into downloading the correct format. Works in the browser, though.
@prologic@twtxt.net I know we won’t ever convince each other of the other’s favorite addressing scheme. :-D But I wanna address (haha) your concerns:
I don’t see any difference between the two schemes regarding link rot and migration. If the URL changes, both approaches are equally terrible, as the feed URL is part of the hashed value, and part of the reference in the location-based scheme. It doesn’t matter.
The same is true for duplication and forks. Even today, the “canonical URL” has to be chosen to build the hash. That’s exactly the same with location-based addressing. Why would a mirror only duplicate stuff with location- but not content-based addressing? I really fail to see that. Also, who is using mirrors or relays anyway? I don’t know of any such software to be honest.
If there is a spam feed, I just unfollow it. Done. Not a concern for me at all. Not the slightest bit. And the byte verification is THE source of all broken threads when the conversation start is edited. Yes, this can be viewed as a feature, but how many times was it actually a feature rather than an anti-feature in terms of user experience?
I don’t get your argument. If the feed in question is offline, one can simply look in local caches and see if there is a message at that particular time, just like looking up a hash. Where’s the difference? Except that the lookup key is longer or compound or whatever depending on the cache format.
Even a new hashing algorithm requires work on clients etc. It’s not that you get some backwards-compatibility for free. It just cannot be backwards-compatible in my opinion, no matter which approach we take. That’s why I believe some magic time for the switch causes the least amount of trouble. You leave the old world untouched and working.
If these are general concerns, I’m completely with you. But I don’t think that they only apply to location-based addressing. That’s how I interpreted your message. I could be wrong. Happy to read your explanations. :-)
@alexonit@twtxt.alessandrocutolo.it Personally, I find the reversed order of URL first and then timestamp more natural to reference something. Granted, URL last would be kinda consistent with the mention format. However, the timestamp doesn’t act as a link text or display text like in a mention, so it’s somewhat different in my opinion. But yeah.
@zvava@twtxt.net @movq@www.uninformativ.de I’m not entirely sure about the spaces, but maybe they were omitted to simplify parsing of mentions in the form of @<nick url>. If the next token after the @<nick does not look like a URL, it’s not a mention but regular text. This is just wild guessing, though.
Looking at the regex and tests in the original twtxt reference implementation seems to confirm that theory, in the sense that it relies on whitespace as the delimiter:
https://lyse.isobeef.org/tmp/screenshot-2025-09-17-21-30-25.png
Another thing about nicks is that the original twtxt reference implementation converts nicks to all lowercase:
https://lyse.isobeef.org/tmp/screenshot-2025-09-17-21-20-39.png
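To make the whitespace-delimiter guess concrete, here is a minimal sketch of a mention parser. The pattern and the function name `parse_mentions` are made up for illustration; this is not the regex from the reference implementation, only an assumption about how such a parser could behave, including the lowercasing of nicks.

```python
import re

# Hypothetical pattern (NOT copied from the reference implementation):
# an optional nick, whitespace as the delimiter, then something URL-like.
MENTION_RE = re.compile(r"@<(?:(?P<nick>[^\s>]+)\s+)?(?P<url>https?://[^>\s]+)>")


def parse_mentions(text):
    """Return (nick, url) pairs found in text; nick is lowercased or None."""
    mentions = []
    for match in MENTION_RE.finditer(text):
        nick = match.group("nick")
        mentions.append((nick.lower() if nick else None, match.group("url")))
    return mentions


if __name__ == "__main__":
    print(parse_mentions("Hi @<Lyse https://lyse.isobeef.org/twtxt.txt>!"))
```

If the token after the nick does not look like a URL, the pattern simply does not match and the text stays regular text.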
You probably know this already, the original twtxt file format specification can be found here: https://twtxt.readthedocs.io/en/latest/user/twtxtfile.html
As for extensions, I don’t know of anything outside of twtxt.dev that has actually been (partially) implemented. However, there is also the issue tracker of the official reference implementation. You might wanna dig through that. For example, there is an alternative suggestion for multiline messages: https://github.com/buckket/twtxt/issues/157
@kat@yarn.girlonthemoon.xyz all @prologic@twtxt.net has to do is to allow <details> and the subset under it. Granted, it could be implemented on the formatting toolbar too…
@zvava@twtxt.net may I recommend changing the mention format upon hitting reply to something similar to what’s used in Yarn, and perhaps hiding the hash on the post too? Looking good!
Haha, fun! I browsed your gopher hole a little bit. I noticed some entries are fully justified (formatting), while others are not. I didn’t notice a pattern, though it makes sense not to use justification on entries with code. Yet, some prose entries are, and some are not. A mystery. :-)
@lyse@lyse.isobeef.org @kat@yarn.girlonthemoon.xyz I spent so much time in the past figuring out if something is a dict or a list in YAML, for example.
What are the types in this example?
items:
  - part_no: A4786
    descrip: Water Bucket (Filled)
    price: 1.47
    quantity: 4
  - part_no: E1628
    descrip: High Heeled "Ruby" Slippers
    size: 8
    price: 133.7
    quantity: 1
items is a dict containing … a list of two other dicts? Right?
It is quite hard for me to grasp the structure of YAML docs.
The big advantage of YAML (and JSON and TOML) is that it’s much easier to write code for those formats than it is with XML. json.loads() and you’re done.
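As a small illustration of the “json.loads() and you’re done” point, the sketch below loads a JSON rendering of the items example and prints the resulting Python types. The JSON text is a hand-made translation of the YAML above (an assumption, since a YAML parser is not part of the standard library).

```python
import json

# Hand-written JSON equivalent of the YAML example (assumed translation).
SOURCE = r"""
{"items": [
  {"part_no": "A4786", "descrip": "Water Bucket (Filled)",
   "price": 1.47, "quantity": 4},
  {"part_no": "E1628", "descrip": "High Heeled \"Ruby\" Slippers",
   "size": 8, "price": 133.7, "quantity": 1}
]}
"""

doc = json.loads(SOURCE)

print(type(doc).__name__)                               # dict
print(type(doc["items"]).__name__)                      # list
print([type(item).__name__ for item in doc["items"]])   # ['dict', 'dict']
```

So the top level is a dict with a single key, whose value is a list of two dicts.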
@movq@www.uninformativ.de @kat@yarn.girlonthemoon.xyz Any text format beats a binary configuration format. However, YAML and XML are both terrible choices in my opinion. I’d prefer YAML over XML if I had to.
Xfce does one thing very right: It stores its settings in plain-text XML files. This allows me to easily read, track, and maybe even distribute these settings to other machines.
(Unlike GNOME’s dconf, which uses some binary file format. Fun fact: The older and now deprecated gconf also used XML files.)
@prologic@twtxt.net Yeah, this really could use a proper definition or a “manifest”. Many of these ideas are not very widespread. And I haven’t come across similar projects in all these years.
Let’s take the farbfeld image format as an example again. I think this captures the “spirit” quite well, because this isn’t even about code.
This is the entire farbfeld spec:
farbfeld is a lossless image format which is easy to parse, pipe and compress. It has the following format:
╔════════╤═════════════════════════════════════════════════════════╗
║ Bytes  │ Description                                             ║
╠════════╪═════════════════════════════════════════════════════════╣
║ 8      │ "farbfeld" magic value                                  ║
╟────────┼─────────────────────────────────────────────────────────╢
║ 4      │ 32-Bit BE unsigned integer (width)                      ║
╟────────┼─────────────────────────────────────────────────────────╢
║ 4      │ 32-Bit BE unsigned integer (height)                     ║
╟────────┼─────────────────────────────────────────────────────────╢
║ [2222] │ 4x16-Bit BE unsigned integers [RGBA] / pixel, row-major ║
╚════════╧═════════════════════════════════════════════════════════╝
The RGB-data should be sRGB for best interoperability and not alpha-premultiplied.
(Now, I don’t know if your screen reader can work with this. Let me know if it doesn’t.)
I think these are some of the properties worth mentioning:
- The spec is extremely short. You can read this in under a minute and fully understand it. That alone is gold.
- There are no “knobs”: It’s just a single version, it’s not like there’s also an 8-bit color depth version and one for 16-bit and one for extra large images and one that supports layers and so on. This makes it much easier to implement a fully compliant program.
- Despite being so simple, it’s useful. I’ve used it in various programs, like my window manager, my status bars, some toy programs like “tuxeyes” (an Xeyes variant), or Advent of Code.
- The format does not include compression because it doesn’t need to. Just use something like bzip2 to get file sizes similar to PNG.
- It doesn’t cover every use case under the sun, but it does cover the most important ones (imho). They have discussed using something other than RGBA and decided it’s not worth the trouble.
- They refrained from adding extra baggage like metadata. It would have needlessly complicated things.
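To show how little code the spec quoted above needs, here is a minimal sketch of an encoder and decoder. The function names `encode_farbfeld` and `decode_farbfeld` are made up for this example; only the byte layout comes from the spec.

```python
import struct

MAGIC = b"farbfeld"  # 8-byte magic value from the spec


def encode_farbfeld(width, height, pixels):
    """Encode row-major (r, g, b, a) tuples of 16-bit values."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    out = bytearray(MAGIC + struct.pack(">II", width, height))
    for rgba in pixels:
        out += struct.pack(">4H", *rgba)  # 4 x 16-bit big-endian unsigned
    return bytes(out)


def decode_farbfeld(data):
    """Return (width, height, pixels) from farbfeld bytes."""
    if data[:8] != MAGIC:
        raise ValueError("not a farbfeld image")
    width, height = struct.unpack(">II", data[8:16])
    if len(data) < 16 + width * height * 8:
        raise ValueError("truncated pixel data")
    pixels = [struct.unpack_from(">4H", data, 16 + 8 * i)
              for i in range(width * height)]
    return width, height, pixels


if __name__ == "__main__":
    red = (0xFFFF, 0, 0, 0xFFFF)
    blob = encode_farbfeld(2, 1, [red, red])
    print(len(blob), decode_farbfeld(blob))
```

The header is always 16 bytes and every pixel takes 8 bytes, so the file size follows directly from the dimensions.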
I did a “lecture”/“workshop” about this at work today. 16-bit DOS, real mode. Pretty cool and the audience (devs and sysadmins) seemed quite interested. 🥳
- People used the Intel docs to figure out the instruction encodings.
- Then they wrote a little DOS program that exits with a return code and they used uhex in DOSBox to do that. Yes, we wrote a COM file manually, no Assembler involved. (Many of them had never used DOS before.)
- DEBUG from FreeDOS was used to single-step through the program, showing what it does.
- This gets tedious rather quickly, so we switched to SVED from SvarDOS for writing the rest of the program in Assembly language. nasm worked great for us.
- At the end, we switched to BIOS calls instead of DOS syscalls to demonstrate that the same binary COM file works on another OS. Also a good opportunity to talk about bootloaders a little bit.
- (I think they even understood the basics of segmentation in the end.)
The 8086 / 16-bit real-mode DOS is a great platform to explain a lot of the fundamentals without having to deal with OS semantics or executable file formats.
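For readers who want to try the hand-assembled step, the sketch below builds a five-byte COM program that exits with a return code. The workshop’s actual bytes are not given above, so this is an assumed minimal equivalent: `mov ax, 4Cxxh` followed by `int 21h` (DOS function 4Ch, terminate with return code in AL). The helper name `build_exit_program` and the file name are made up.

```python
def build_exit_program(exit_code):
    """Return the bytes of a COM file that exits with exit_code (assumed example)."""
    return bytes([
        0xB8, exit_code & 0xFF, 0x4C,  # mov ax, 0x4Cxx (imm16 is little-endian)
        0xCD, 0x21,                    # int 0x21 (DOS syscall)
    ])


if __name__ == "__main__":
    program = build_exit_program(42)
    print(program.hex())  # b82a4ccd21
    with open("EXIT42.COM", "wb") as f:  # hypothetical file name
        f.write(program)
```

A COM file has no header at all, which is why these five bytes are the entire executable.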
Now that was a lot of fun. 🥳 It’s very rare that we do something like this, sadly. I love doing this kind of low-level stuff.
@sorenpeter@darch.dk No, because as the spec stated originally (and we didn’t change that syntax at all):
Mentions are embedded within the text in either @<source.nick source.url> or @<source.url> format
So the lextwt parser we use will simply call this an invalid mention, which it does.
@<nick url timestamp>) and having location-based threading this way might not break older clients, since they might just ignore the last value within the brackets.
@sorenpeter@darch.dk Unfortunately it does break all clients, because the original spec stated:
Mentions are embedded within the text in either @<source.nick source.url> or @<source.url> format
Z for UTC +00:00- is that allowed in your specs?
Regarding url = I would suggest to only allow one and then maybe add url_old = or url_alt = !?
I'm still not a fan of a DM feature, even though it helps that it has now been split out into a separate feed file. Instead I would suggest a contact = field where people can put an email or other id/link for an established chat protocol like Signal or Matrix.
Why are we testing, or playing with, an alternate non-fully-compatible feed format within the same feed that we use daily?
Nobody writes emails by hand using RFC 5322 anymore, nor do we manually send them through telnet and SMTP commands. The days of crafting emails in raw format and dialing into servers are long gone. Modern email clients and services handle it all seamlessly in the background, making email easier than ever to send and receive, without needing to understand the protocols or formats behind it! #Email #SMTP #RFC #Automation
@andros@twtxt.andros.dev Zulu said:
what is twtxt for you?
For me Twtxt is just the underlying format that Yarn.social builds upon
yarnd UI/UX experience (for those that use it) and as "client" features (not spec changes). The two ideas are quite simple:
@kate@yarn.girlonthemoon.xyz (as I was trying to say…), Glad you think so! My goal with Yarn.social has always been to provide the best (best that I can anyway) truly decentralised (slow) social experience that uses the Twtxt format under the hood
Holy hell?! When I post this:
@<kate https://yarn.girlonthemoon.xyz/user/kat/twtxt.txt> Glad you think so! My goal with Yarn.social has always been to provide the best (_best that I can anyway!_) truly decentralised (_slow_) social experience that uses the Twtxt format under the hood
Something is swallowing it.
@kate@yarn.girlonthemoon.xyz Glad you think so! My goal with Yarn.social has always been to provide the best (best that I can anyway!) truly decentralised (slow) social experience that uses the Twtxt format under the hood
Glad you think so! My goal with Yarn.social has always been to provide the best (best that I can anyway!) truly decentralised (slow) social experience that uses the Twtxt format under the hood
I asked ChatGPT what it knows about Twtxt. And surprisingly it’s rather accurate:
Twtxt is a minimalist, decentralized microblogging format introduced by John Downey in 2016. It uses plain text files served over HTTP: no accounts, databases, or APIs.
In 2020, James Mills (@prologic@twtxt.net) launched Yarn.social, an extended, federated implementation with user discovery, threads, mentions, and a full web UI.
Both share the same .twtxt.txt format but differ in complexity and social features.
@prologic@twtxt.net @bmallred@staystrong.run So is restic considered stable by now? “Stable” as in “stable data format”, like a future version will still be able to retrieve my current backups. I mean, it’s at version “0.18”, but they don’t specify which versioning scheme they use.
Registry format is its own thing. It takes the regular feed and prepends nick \t uri \t to each line. It’s something that existed before yarn got big. There is still a bit of work but I will put together a UI for it to make it easier to view and navigate.
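A rough sketch of what handling such lines could look like, assuming each registry line is nick, uri, timestamp and text separated by tabs (that field order is inferred from the description above, not taken from a registry spec). The names `RegistryEntry`, `to_registry_line` and `parse_registry_line` are hypothetical.

```python
from typing import NamedTuple


class RegistryEntry(NamedTuple):
    nick: str
    uri: str
    timestamp: str
    text: str


def to_registry_line(nick, uri, feed_line):
    """Put 'nick<TAB>uri<TAB>' in front of a regular 'timestamp<TAB>text' feed line."""
    return f"{nick}\t{uri}\t{feed_line}"


def parse_registry_line(line):
    """Split an (assumed) four-field registry line; text may itself contain tabs."""
    nick, uri, timestamp, text = line.rstrip("\n").split("\t", 3)
    return RegistryEntry(nick, uri, timestamp, text)


if __name__ == "__main__":
    line = to_registry_line("example", "https://example.com/twtxt.txt",
                            "2024-09-17T21:30:25Z\tHello registry!")
    print(parse_registry_line(line))
```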
I need to import my yarn cache. It’s sitting at about 1.5G in registry format. That should make things interesting…
True. Though if the idea turns out to be better.. then community will adopt it.
if you look at the subject for that twt you will see that it uses the extended hash format to include a URL address.
Some satisfying icicle-breaking in our backyard: photos.falsifian.org/video/sM7G3vfS6yuc/VID_20250217_203250.mp4
I couldnβt resist taking home a prize:
Itβs been snowy here in #Toronto.
(I tried formatting the images in markdown for the benefit of yarn and any other clients that understand it.)
trying to keep it simple but.. perhaps it can be extended to fix timestamp formats like using " " instead of "T"
Great to see another user @aelaraji@aelaraji.com - And I can confirm that my #webmentions works from your server
(I know, the formatting is messed up;) @eapl.me@eapl.me here are my replies (somewhat similar to Lyse’s and James’)
Metadata in twts: Key=value is too complicated for non-hackers and hard to write by hand. So if there is a need then we should just use #NSFW or the alt-text field in markdown image syntax if something is NSFW.
IDs besides datetime: When you edit a twt then you should preserve the datetime if location-based addressing should have any advantages over content-based addressing. If you change the timestamp then it’s a new post. Just like any other blog CMS.
Caching: Yes, all good ideas, but that is more a task for the clients, not for the serving of the twtxt.txt files.
Discovery: User-agent for discovery can become better. I’m working on a wrapper script in PHP, so you don’t need to go to Apache’s log files to see who fetches your feed. But for others like Gemini and Gopher you need to rely on something else. That could be using my webmentions for twtxt suggestion, or simply defining an email metadata field for letting a person know you follow their feed. Interesting read about why WebMentions might be a bad idea. Twtxt being much simpler than a full-featured IndieWeb site, a lot of the concerns do not apply here. But that’s the issue with any open inbox. This is hard to solve without some form of (centralized or community) spam moderation.
Support more protocols besides http/s: Yes, why not, if we can make clients that merge or differentiate between the same feed served by multiple URLs.
Languages: If the need is big then make a separate feed. I don’t mind seeing stuff in other languages as long as the volume is low. You’ve got translation tools if you need to know what’s going on. And again, when there is a need for easier switching between posting to several feeds, then it’s about building clients with a UI that makes it easy. Not something that should take up space in the format/protocol.
Emojis: I’m not sure what this is about. Do you want to use emojis as avatars in CLI clients, or is it just about rendering emojis?
description header. Or rather, how often it re-fetches it.
So, @prologic@twtxt.net, Yarn isn’t rendering the metadata as described on the format documentation. That is, ux2028 is ignored when Yarn renders the description metadata.
@Codebuzz@www.codebuzz.nl I use Jenny to add to a local copy of my twtxt.txt file, and then manually push it to my web servers. I prefer timestamps to end with “Z” rather than “+00:00” so I modified Jenny to use that format. I mostly follow conversations using Jenny, but sometimes I check twtxt.net, which could catch twts I missed.
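Since “Z” and “+00:00” denote the same instant, a client can accept both and emit whichever it prefers. The sketch below is not Jenny’s code; `parse_timestamp` and `format_utc_z` are names made up for this example.

```python
from datetime import datetime, timezone


def parse_timestamp(ts):
    """Parse an RFC 3339 style timestamp, accepting a trailing 'Z'."""
    # Older Python versions reject 'Z' in fromisoformat(), so rewrite it first.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts)


def format_utc_z(dt):
    """Format an aware datetime in UTC with a 'Z' suffix."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    a = parse_timestamp("2024-09-17T21:30:25Z")
    b = parse_timestamp("2024-09-17T21:30:25+00:00")
    print(a == b, format_utc_z(b))
```

Note that the two spellings are equal as instants but not as strings, which matters wherever the raw timestamp text is fed into a hash.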
@prologic@twtxt.net I’m not a yarnd user, so it doesn’t matter a whole lot to me, but FWIW I’m not especially keen on changing how I format my twts to work around yarnd’s quirks.
I wonder if this kind of postprocessing would fit better between composing (via yarnd’s UI) and publishing. So, if a yarnd user types 1/4, it could get changed to ¼ in the twtxt.txt file for everyone to see, not just people reading through yarnd. But when I type 1/4, meaning first out of four, as a non-yarnd user, the meaning wouldn’t get corrupted. I can always type ¼ directly if that’s what I really intend.
(This twt might be easier to understand if you read it without any transformations :-P)
Anyway, again, I’m not a yarnd user, so do what you will, just know you might not be seeing exactly what I meant.
@prologic@twtxt.net One could argue twtxt.net’s display formatting is a little over-eager here.
@Codebuzz@www.codebuzz.nl Speed is an issue for the client software, not the format itself, but yes I agree that it makes the most sense to append posts to the end of the file. I’m referring to the definition that it’s the first url = in the file that is the one that has to be used for the twthash computation, which is too arbitrary a way of defining something that breaks threading time and time again. And this is the case for not using url+date+message = twthash.
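To illustrate why the choice of url = matters, here is a sketch of a url+date+message hash as I understand the Twt Hash extension (BLAKE2b-256 over the three values joined by newlines, base32 without padding, lowercased, last seven characters). Treat those details as assumptions and check the spec before relying on them; the feed URLs are placeholders.

```python
import base64
import hashlib


def twt_hash(feed_url, timestamp, text):
    """Sketch of a url+date+message hash (details assumed, see lead-in)."""
    payload = f"{feed_url}\n{timestamp}\n{text}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return encoded[-7:]


if __name__ == "__main__":
    ts, msg = "2024-09-17T21:30:25Z", "Hello world"
    # Same twt, two feed URLs: the hashes differ, so threads break
    # whenever clients disagree on which url = line to use.
    print(twt_hash("https://example.com/twtxt.txt", ts, msg))
    print(twt_hash("https://www.example.com/twtxt.txt", ts, msg))
```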
why can’t we have both a format that you can write by hand and better clients?
Some more arguments for a location-based threading model over a content-based one:
The format:
(#<DATE URL>) or (@<DATE URL>) both make sense: # as prefix is for a hashtag like we already have with the (#twthash), and @ as prefix denotes that this is a mention of a specific post in a feed, and not just the feed in general. Using either can make implementation easier, since most clients already have this kind of filtering.
Having something like (#<DATE URL>) will also make mentions via webmentions for twtxt easier to implement, since there is no need for looking up the #twthash. This will also make it possible to make third-party twt-mention services.
Supporting twt/webmentions will also increase discoverability as a way to know about both replies and feed mentions from feeds that you don’t follow.
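A client could pick up the proposed subject with a small pattern like the one below. This is only a sketch of the (#<DATE URL>) / (@<DATE URL>) idea as described above; the regex and the name `parse_subject` are assumptions, and nothing here is part of an agreed spec.

```python
import re

# Hypothetical pattern for a location-based subject at the start of a twt.
SUBJECT_RE = re.compile(r"^\((?P<prefix>[#@])<(?P<date>\S+)\s+(?P<url>[^>\s]+)>\)")


def parse_subject(text):
    """Return (date, url) if the twt starts with (#<DATE URL>) or (@<DATE URL>)."""
    match = SUBJECT_RE.match(text)
    if match is None:
        return None  # e.g. an existing (#twthash) subject, or no subject at all
    return match.group("date"), match.group("url")


if __name__ == "__main__":
    print(parse_subject(
        "(#<2024-09-17T21:30:25Z https://example.com/twtxt.txt>) I agree!"))
    print(parse_subject("(#abcdefg) old style reply"))
```

Because the feed URL and timestamp are in the subject itself, no hash lookup is needed to find the twt being replied to.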
I wrote some code to try out non-hash reply subjects formatted as (replyto ), while keeping the ability to use the existing hash style.
I donβt think we need to decide all at once. If clients add support for a new method then people can use it if they like. The downside of course is that this costs developer time, so I decided to invest a few hours of my own time into a proof of concept.
With apologies to @movq@www.uninformativ.de for corrupting jenny’s beautiful code. I don’t write this expecting you to incorporate the patch, because it does complicate things and might not be a direction you want to go in. But if you like any part of this approach feel free to use bits of it; I release the patch under jenny’s current LICENCE.
Supporting both kinds of reply in jenny was complicated because each email can only have one Message-Id, and because it’s possible the target twt will not be seen until after the twt referencing it. The following patch uses an sqlite database to keep track of known (url, timestamp) pairs, as well as a separate table of (url, timestamp) pairs that haven’t been seen yet but are wanted. When one of those “wanted” twts is finally seen, the mail file gets rewritten to include the appropriate In-Reply-To header.
Patch based on jenny commit 73a5ea81.
https://www.falsifian.org/a/oDtr/patch0.txt
Not implemented:
- Composing twts using the (replyto …) format.
- Probably other important things I’m forgetting.
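The known/wanted bookkeeping described above can be sketched as follows. This is not the code from the patch: the table layout, the `wanted_by` column and the function names are assumptions made for illustration, and the actual rewriting of mail files is left out.

```python
import sqlite3


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS known "
               "(url TEXT, timestamp TEXT, PRIMARY KEY (url, timestamp))")
    db.execute("CREATE TABLE IF NOT EXISTS wanted "
               "(url TEXT, timestamp TEXT, wanted_by TEXT)")
    return db


def note_reply(db, target, mail_file):
    """Record that mail_file replies to target=(url, timestamp).

    Returns True if the target is already known, else queues it as wanted.
    """
    if db.execute("SELECT 1 FROM known WHERE url=? AND timestamp=?",
                  target).fetchone():
        return True
    db.execute("INSERT INTO wanted VALUES (?, ?, ?)", (*target, mail_file))
    return False


def note_seen(db, twt):
    """Mark twt=(url, timestamp) as seen; return mail files that waited for it."""
    db.execute("INSERT OR IGNORE INTO known VALUES (?, ?)", twt)
    rows = db.execute("SELECT wanted_by FROM wanted WHERE url=? AND timestamp=?",
                      twt).fetchall()
    db.execute("DELETE FROM wanted WHERE url=? AND timestamp=?", twt)
    return [row[0] for row in rows]  # these would get an In-Reply-To header


if __name__ == "__main__":
    db = open_db()
    target = ("https://example.com/twtxt.txt", "2024-09-17T21:30:25Z")
    print(note_reply(db, target, "reply.eml"))  # target not seen yet
    print(note_seen(db, target))                # now it is; reply.eml was waiting
```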
i feel like we should isolate a subset of markdown that makes sense and build it into lextwt. it already has support for links and images. maybe basic formatting bold, italic. possibly block quote and bullet lists. no tables or footnotes
@prologic@twtxt.net Wikipedia claims sha1 is vulnerable to a “chosen-prefix attack”, which I gather means I can write any two twts I like, and then cause them to have the exact same sha1 hash by appending something. I guess a twt ending in random junk might look suspicious, but perhaps the junk could be worked into an image URL. If that’s not possible now maybe it will be later.
git only uses sha1 because they’re stuck with it: migrating is very hard. There was an effort to move git to sha256 but I don’t know its status. I think there is progress being made with Game Of Trees, a git clone that uses the same on-disk format.
I can’t imagine any benefit to using sha1, except that maybe some very old software might support sha1 but not sha256.
Hey, @movq@www.uninformativ.de, a tiny thing to add to jenny, a -v switch. That way when you twtxt “That’s an older format that was used before jenny version v23.04”, I can go and run jenny -v, and “duh!” myself on the way to a git pull. :-D
@movq@www.uninformativ.de alright, fair, and interesting. I was expecting them to be all the same (format wise), but it doesn’t matter, for sure, as it works just fine. Thanks!
@prologic@twtxt.net I have some ideas:
- Add smartypants rendering, just like Yarn has.
- Add the ability to create individual twtxts, each named after their hash.
- Fix the formatting of the help. :-P
I was not suggesting that everyone needs to set up a working webfinger endpoint, but that we take the format of nick+(sub)domain as the base for generating the hash, together with the message date and content.
If we omit the protocol prefix from the way we do things now, will that not solve most of the problems? In the case of gemini://gemini.ctrl-c.club/~nristen/twtxt.txt they also have a working twtxt.txt at https://ctrl-c.club/~nristen/twtxt.txt … damn, I just noticed the gemini. subdomain.
Okay, what about defining a preferred protocol as part of the hash schema? So 1: https, 2: http, 3: gemini, 4: gopher?
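The preference order could be applied like this. It is a sketch of the idea only: the list simply mirrors the 1 to 4 ranking suggested above, and `preferred_url` is a made-up helper, not part of any client.

```python
from urllib.parse import urlsplit

# Ranking as suggested above: https first, then http, gemini, gopher.
PREFERENCE = ["https", "http", "gemini", "gopher"]


def preferred_url(urls):
    """Pick the feed URL whose scheme ranks highest; unknown schemes come last."""
    def rank(url):
        scheme = urlsplit(url).scheme.lower()
        return PREFERENCE.index(scheme) if scheme in PREFERENCE else len(PREFERENCE)
    return min(urls, key=rank)


if __name__ == "__main__":
    print(preferred_url([
        "gemini://gemini.ctrl-c.club/~nristen/twtxt.txt",
        "https://ctrl-c.club/~nristen/twtxt.txt",
    ]))
```

As the ctrl-c.club example shows, this only settles which protocol wins; the host names can still differ between the two URLs.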