Any idea what this "twtxtfeevalidator/0.0.1" UA is about? I thought I could ask before throwing a 1000GB file at it 🪤 Could it be the same "xt" thing @lyse@lyse.isobeef.org was talking about the other day?
On my blog: Free Culture Book Club – Trans Girl Project, part 1 https://john.colagioia.net/blog/2024/12/28/trans-girl-1.html #freeculture #bookclub
Yes, it works: 2024-12-01T19:38:35Z twtxt/1.2.3 (+https://eapl.mx/twtxt.txt; @eapl) :D
The .log is just a simple append of each request. The idea with the .csv is to have it tally up how many requests there have been from each client, as a way to avoid having the log file grow too big. And you can open the .csv as a spreadsheet and have an easy overview and filtering options.
Access to those files is closed to the public.
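For anyone curious, a rough sketch of that tally idea in Python (not my actual setup; the log format here is just an assumption):

```python
import csv
from collections import Counter

counts = Counter()
with open("twtxt-access.log") as log:
    for line in log:
        # Assumption: the user agent is the last double-quoted field on each
        # line, as in common Apache/nginx "combined" log formats.
        parts = line.rsplit('"', 2)
        ua = parts[-2] if len(parts) == 3 else "unknown"
        counts[ua] += 1

# One row per client, so the .csv stays small however big the traffic gets.
with open("clients.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["user_agent", "requests"])
    for ua, n in counts.most_common():
        writer.writerow([ua, n])
```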
@doesnm@doesnm.p.psf.lt Up to you. I have mine set to rotate at 1,000 twtxts. I have vomited over 400 so far, so I have some way to go till rotation. :-D
@eapl.me@eapl.me here are my replies (somewhat similar to Lyse's and James')
Metadata in twts: Key=value is too complicated for non-hackers and hard to write by hand. So if there is a need, then we should just use #NSFW or the alt-text in markdown image syntax if something is NSFW.
IDs besides datetime: When you edit a twt you should preserve the datetime, if location-based addressing is to have any advantages over content-based addressing. If you change the timestamp, it's a new post. Just like any other blog CMS.
Caching: Yes, all good ideas, but that is more a task for the clients, not the serving of the twtxt.txt files.
Discovery: User-agent-based discovery can become better. I'm working on a wrapper script in PHP, so you don't need to go to Apache's log files to see who fetches your feed. But for Gemini and Gopher you need to rely on something else. That could be using my webmentions-for-twtxt suggestion, or simply defining an email metadata field for letting a person know you follow their feed. There is an interesting read about why webmentions might be a bad idea; twtxt being much simpler than a full-featured IndieWeb site, a lot of those concerns do not apply here. But that's the issue with any open inbox. This is hard to solve without some form of (centralized or community) spam moderation.
Support more protocols besides http/s: Yes, why not, if we can make clients that merge, or differentiate between, the same feed served via multiple URLs.
Languages: If the need is big, then make a separate feed. I don't mind seeing stuff in other languages, as the volume is low, and you have translation tools if you need to know what's going on. And again, when there is a need for easier switching between posting to several feeds, it's about building clients with a UI that makes that easy. Not something that should take up space in the format/protocol.
Emojis: I'm not sure what this is about. Do you want to use emojis as avatars in CLI clients, or is it just about rendering emojis?
On my blog: Real Life in Star Trek, Time's Arrow, part 1 https://john.colagioia.net/blog/2024/11/07/time-s-arrow-1.html #scifi #startrek #closereading
1/4 to mean "first out of four".
@bender@twtxt.net I try to avoid editing. I guess I would write 5/4, 6/4, etc., and hopefully my audience would be sympathetic to my failing.
Anyway, I don't think my eccentric decision to number my twts in the style of other social media platforms is the only context where someone might write 1/4 not meaning a quarter. E.g. January 4, to Americans.
I'm happy to keep overthinking this for as long as you are :-P
@bender@twtxt.net @prologic@twtxt.net I'm not exactly asking yarnd to change. If you are okay with the way it displayed my twts, then by all means, leave it as is. I hope you won't mind if I continue to write things like 1/4 to mean "first out of four".
What has text/markdown got to do with this? I don't think Markdown says anything about replacing 1/4 with ¼, or other similar transformations. It's not needed, because ¼ is already a Unicode character that can simply be directly inserted into the text file.
What's wrong with my original suggestion of doing the transformation before the text hits the twtxt.txt file? @prologic@twtxt.net, I think it would achieve what you are trying to achieve with this content-type thing: if someone writes 1/4 on a yarnd instance or any other client that wants to do this, it would get transformed, and other clients simply wouldn't do the transformation. Every client that supports displaying Unicode characters, including Jenny, would then display ¼ as ¼.
Alternatively, if you prefer yarnd to pretty-print all twts nicely, even ones from simpler clients, that's fine too and you don't need to change anything. My 1/4 -> ¼ thing is nothing more than a minor irritation which probably isn't worth overthinking.
@prologic@twtxt.net I'm not a yarnd user, so it doesn't matter a whole lot to me, but FWIW I'm not especially keen on changing how I format my twts to work around yarnd's quirks.
I wonder if this kind of postprocessing would fit better between composing (via yarnd's UI) and publishing. So, if a yarnd user types 1/4, it could get changed to ¼ in the twtxt.txt file for everyone to see, not just people reading through yarnd. But when I type 1/4, meaning first out of four, as a non-yarnd user, the meaning wouldn't get corrupted. I can always type ¼ directly if that's what I really intend. (See the sketch below.)
(This twt might be easier to understand if you read it without any transformations :-P)
Anyway, again, I'm not a yarnd user, so do what you will, just know you might not be seeing exactly what I meant.
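To make the compose-time idea concrete, a minimal sketch (the mapping table and function are mine, purely illustrative, not yarnd's actual code): the substitution happens once, when the author posts, so the published twtxt.txt already contains the Unicode character:

```python
# Sketch only: a real client would need to be smarter, e.g. not touching
# dates like 1/4 or fractions embedded in longer numbers like 11/4.
FRACTIONS = {"1/2": "½", "1/4": "¼", "3/4": "¾"}

def prettify(text: str) -> str:
    """Replace plain fractions at compose time, before writing twtxt.txt."""
    for plain, pretty in FRACTIONS.items():
        text = text.replace(plain, pretty)
    return text

print(prettify("3/4 of the way there"))  # -> "¾ of the way there"
```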
@prologic@twtxt.net I wrote 1/4 (one slash four), by which I meant "the first out of four". twtxt.net is showing it as ¼, a single character that IMO doesn't have that same meaning (it means 0.25). Similarly, 3/4 got replaced with ¾ in another twt. It's not a big deal. It just looks a little wrong, especially beside the 2/4 and 4/4 in my other two twts.
Simplified twtxt: I want to suggest some dogmas or commandments for twtxt, from which we can work our way back to how to implement different features like replies/threads:
It's a text file, so you must be able to write it by hand (i.e. no app logic) and read it by eye. If you edit a post, you change the content, not the timestamp. Otherwise it will be considered a new post.
The order of lines in a twtxt.txt must not hold any significance. The file is a container and each line an atomic piece of information. You should be able to run `sort` on a twtxt.txt and it should still work (see the sketch after these commandments).
Transport protocol should not matter, as long as the file served is the same. HTTP and HTTPS are preferred, so it is suggested that feeds served via Gopher or Gemini also be provided over HTTP(S).
Do we need more commandments?
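To illustrate the sort commandment, a minimal sketch (file layout per the twtxt spec: one tab-separated `timestamp<TAB>content` line per twt) of a client that parses each line independently and is therefore unaffected by line order:

```python
# Each line is atomic and order-independent.
lines = [
    "2024-10-03T12:00:00Z\tSecond post",
    "2024-10-01T09:30:00Z\tFirst post",
]

def parse(line: str) -> tuple[str, str]:
    timestamp, content = line.split("\t", 1)
    return timestamp, content

# The client sorts by timestamp itself, so running `sort` on the file
# (or shuffling it) changes nothing for the reader.
for ts, content in sorted(parse(l) for l in lines):
    print(ts, content)
```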
On my blog: Free Culture Book Club – Restoration Day, part 1 https://john.colagioia.net/blog/2024/10/12/restoration-day-1.html #freeculture #bookclub
I swear I did write up an algorithm for it at some point. I think it is lost in a git comment someplace. I'll put together pseudo/Go code this week.
Super simple:
Making a reply:
- If yarn has one, use that. (Maybe do a collision check?)
- Make a hash of the raw twt, no truncation.
- Check local cache for shortest without collision
  - in SQL: `select len(subject) where head_full_hash like subject || '%'`
Threading:
- Get full hash of head twt
- Search for twts
  - in SQL: `head_full_hash like subject || '%' and created_on > head_timestamp`
The assumption being replies will be for the most recent head. If replying to an older one it will use a longer hash.
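As a sketch of the "shortest without collision" step in Python (the function name, cache shape, and the 7-character floor are my assumptions, not any client's actual code): grow the prefix of the full reply-target hash until it matches exactly one twt in the local cache:

```python
def shortest_unique_prefix(full_hash: str, known_hashes: set[str], min_len: int = 7) -> str:
    """Return the shortest prefix of full_hash that matches nothing else
    in the local cache; fall back to the full, untruncated hash."""
    for n in range(min_len, len(full_hash)):
        prefix = full_hash[:n]
        if [h for h in known_hashes if h.startswith(prefix)] == [full_hash]:
            return prefix
    return full_hash

cache = {"abcdefghij12345", "abcdexyz9876543"}
print(shortest_unique_prefix("abcdefghij12345", cache))  # -> "abcdefg"
```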
@prologic@twtxt.net Regarding the new way of generating twt hashes, to me it makes more sense to use tabs as separator instead of spaces, since then you can just copy/paste a line directly from a twtxt file that already has a tab between timestamp and message. But tabs might be hard to "type" when you are in a terminal, since they will trigger autocomplete… 🤔
Another thing: it seems that you suggest we only use the domain in the hash creation, and not the full path to the twtxt.txt:
$ echo -e "https://example.com 2024-09-29T13:30:00Z Hello World!" | sha256sum - | awk '{ print $1 }' | base64 | head -c 12
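If I understand the suggestion, in Python it would look something like the sketch below. (Whether the separator is a space or a tab, whether the encoding is base32 or base64, and how much to truncate are exactly the open questions; note also that the shell pipeline above base64-encodes the *hex text* that sha256sum prints, whereas this encodes the raw digest.)

```python
import base64
import hashlib

def proposed_hash(url: str, timestamp: str, content: str, length: int = 12) -> str:
    # Space-separated, per the shell example; tab is the other candidate.
    payload = f"{url} {timestamp} {content}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # base32 of the raw digest, lowercased, padding stripped, truncated.
    return base64.b32encode(digest).decode().rstrip("=").lower()[:length]

print(proposed_hash("https://example.com", "2024-09-29T13:30:00Z", "Hello World!"))
```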
More thoughts about changes to twtxt (as if we haven't had enough thoughts):
- There are lots of great ideas here! Is there a benefit to putting them all into one document? Seems to me this could more easily be a bunch of separate efforts that can progress at their own pace:
1a. Better and longer hashes.
1b. New possibly-controversial ideas like edit: and delete: and location-based references as an alternative to hashes.
1c. Best practices, e.g. Content-Type: text/plain; charset=utf-8
1d. Stuff already described at dev.twtxt.net that doesn't need any changes.
We won't know what will and won't work until we try them. So I'm inclined to think of this as a bunch of draft ideas. Maybe later, when we've seen it play out, it could make sense to define a group of recommended twtxt extensions and give them a name.
Another reason for 1 (above) is: I like the current situation where all you need to get started is these two short and simple documents:
https://twtxt.readthedocs.io/en/latest/user/twtxtfile.html
https://twtxt.readthedocs.io/en/latest/user/discoverability.html
and everything else is an extension for anyone interested. (Deprecating non-UTC times seems reasonable to me, though.) Having a big long "twtxt v2" document seems less inviting to people looking for something simple. (@prologic@twtxt.net you mentioned an anonymous comment "you've ruined twtxt", and while I don't completely agree with that commenter's sentiment, I would feel like twtxt had lost something if it moved away from having a super-simple core.)
All that being said, these are just my opinions, and I'm not doing the work of writing software or drafting proposals. Maybe I will at some point, but until then, if you're actually implementing things, you're in charge of what you decide to make, and I'm grateful for the work.
Some more arguments for a location-based threading model over a content-based one:
The format: `(#<DATE URL>)` or `(@<DATE URL>)` both make sense: `#` as prefix is for a hashtag, like we already have with `(#twthash)`, and `@` as prefix denotes that this is a mention of a specific post in a feed, and not just the feed in general. Using either can make implementation easier, since most clients already have this kind of filtering.
Having something like `(#<DATE URL>)` will also make mentions via webmentions for twtxt easier to implement, since there is no need for looking up the `#twthash`. This will also make it possible to build third-party twt-mention services.
Supporting twt/webmentions will also increase discoverability as a way to know about both replies and feed mentions from feeds that you don't follow.
@prologic@twtxt.net Thanks for writing that up!
I hope it can remain a living document (or sequence of draft revisions) for a good long time while we figure out how this stuff works in practice.
I am not sure how I feel about all this being done at once, vs. letting conventions arise.
For example, even today I could reply to twt abc1234 with "(#abc1234) Edit: …" and I think all you humans would understand it as an edit to (#abc1234). Maybe eventually it would become a common enough convention that clients would start to support it explicitly.
Similarly, we could just start using 11-digit hashes. We should iron out whether it's sha256 or whatever, but there's no need to get all the other stuff right at the same time.
I have similar thoughts about how some users could try out location-based replies in a backward-compatible way (append the replyto: stuff after the legacy (#hash) style).
However, I recognize that I'm not the one implementing this stuff, and it's less work to just have everything determined up front.
Misc comments (I haven't read the whole thing):
Did you mean to make hashes hexadecimal? You lose 11 bits that way compared to base32. I'd suggest gaining 11 bits with base64 instead.
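(Concretely, for an 11-character hash: hex carries 11 × 4 = 44 bits, base32 carries 11 × 5 = 55 bits, and base64 carries 11 × 6 = 66 bits.)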
"Clients MUST preserve the original hash": do you mean they MUST preserve the original twt?
Thanks for phrasing the bit about deletions so neutrally.
I don't like the MUST in "Clients MUST follow the chain of reply-to references…". If someone writes a client as a 40-line shell script that requires the user to piece together the threading themselves, IMO we shouldn't declare the client non-conforming just because they didn't get to all the bells and whistles.
Similarly, I don't like the MUST for user agents. For one thing, you might want to fetch a feed without revealing your identity. Also, it raises the bar for a minimal implementation (I'm again thinking of the 40-line shell script).
For āwho followsā lists: why must the long, random tokens be only valid for a limited time? Do you have a scenario in mind where they could leak?
Why can't feeds be served over HTTP/1.0? Again, thinking about simple software. I recently tried implementing HTTP/1.1 and it wasn't too bad, but 1.0 would have been slightly simpler.
Why get into the nitty-gritty about caching headers? This seems like generic advice for HTTP servers and clients.
I'm a little sad about other protocols being not recommended.
I don't know how I feel about including markdown. I don't mind too much that yarn users emit twts full of markdown, but I'm more of a plain-text kind of person. Also, it adds to the length. I wonder if putting it in a separate document would make more sense; that would also help with the length.
There's a simple reason all the current hashes end in a or q: the hash is 256 bits, the base32 encoding chops that into groups of 5 bits, and 256 isn't divisible by 5. The last character of the base32 encoding just has that left-over single bit (256 mod 5 = 1).
So I agree with #3 below, but do you have a source for #1, #2 or #4? I would expect any lack of variability in any part of a hash function's output would make it more vulnerable to attacks, so designers of hash functions would want to make the whole output vary as much as possible.
Other than the divisible-by-5 thing, my current intuition is it doesn't matter what part you take.
1. Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.
2. Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.
3. Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.
4. Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.
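The divisible-by-5 point is easy to check empirically; a throwaway sketch, not from any client:

```python
import base64
import hashlib
import os

# A 256-bit digest is 51 full 5-bit groups plus 1 leftover bit, so the
# final base32 character can only encode 0 ('a') or 16 ('q').
for _ in range(5):
    digest = hashlib.sha256(os.urandom(16)).digest()
    b32 = base64.b32encode(digest).decode().rstrip("=").lower()
    print(b32[-1])  # always 'a' or 'q'
```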
@prologic@twtxt.net I saw those, yes. I tried using yarnc, and it would work for a simple twtxt. Now, for a more convoluted one it truly becomes a nightmare using that tool for the job. I know there are talks about changing this hash, so this might be a moot point right now, but it would be nice to have a tool that:
- Would calculate the hash of a twtxt in a file.
- Would calculate all hashes on a twtxt.txt (local and remote).
Again, something lovely to have after any looming changes occur.
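For what it's worth, the current recipe is small enough that a throwaway script can stand in for such a tool; this sketch assumes the Twt Hash spec on dev.twtxt.net (blake2b-256, base32, lowercase, last 7 characters) and skips the timestamp-normalization corner cases:

```python
import base64
import hashlib

def twt_hash(feed_url: str, timestamp: str, content: str) -> str:
    payload = f"{feed_url}\n{timestamp}\n{content}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    return base64.b32encode(digest).decode().rstrip("=").lower()[-7:]

# One hash per line of a local twtxt.txt:
feed_url = "https://example.com/twtxt.txt"  # hypothetical feed URL
with open("twtxt.txt") as f:
    for line in f:
        if line.startswith("#") or "\t" not in line:
            continue  # skip metadata comments and malformed lines
        timestamp, content = line.rstrip("\n").split("\t", 1)
        print(twt_hash(feed_url, timestamp, content), content[:40])
```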
@movq@www.uninformativ.de I didn't run the command as you recommended, but I wiped things once more, and ran jenny -f, and this time got:
david@arrakis:~$ jenny -f
Fetching archived feed https://anthony.buc.ci/user/abucci/twtxt.txt/1 (configured as abucci, https://anthony.buc.ci/user/abucci/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2024-04.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://darch.dk/twtxt-archive.txt (configured as soren, https://darch.dk/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2024-04-21_6v47cua.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://twtxt.net/user/prologic/twtxt.txt/1 (configured as prologic, https://twtxt.net/user/prologic/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2024-03.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2022-12-21_2us6qbq.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://twtxt.net/user/prologic/twtxt.txt/2 (configured as prologic, https://twtxt.net/user/prologic/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2024-02.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2022-01-14_ew5gzca.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://twtxt.net/user/prologic/twtxt.txt/3 (configured as prologic, https://twtxt.net/user/prologic/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2024-01.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2021-12-23_f6y65bq.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://twtxt.net/user/prologic/twtxt.txt/4 (configured as prologic, https://twtxt.net/user/prologic/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2023-12.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2021-12-04_e4x7yba.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://twtxt.net/user/prologic/twtxt.txt/5 (configured as prologic, https://twtxt.net/user/prologic/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2023-11.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2021-11-18_42tjxba.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://twtxt.net/user/prologic/twtxt.txt/6 (configured as prologic, https://twtxt.net/user/prologic/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2023-10.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2021-11-08_i2wnvaa.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2023-09.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2021-10-23_kvwn5oa.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2023-08.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2021-10-11_mljudaa.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2023-07.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2021-09-22_5mkqwua.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2023-06.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2021-07-27_xcnzmlq.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2023-05.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2021-06-16_mtedqya.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2023-04.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2021-04-29_z7lvzja.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2023-03.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2021-03-19_xjabvhq.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2023-02.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2021-02-24_te4a6oa.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2023-01.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2021-01-26_qxgigma.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2022-12.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://www.uninformativ.de/twtxt-old_2020-12-13_igfnala.txt (configured as movq, https://www.uninformativ.de/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2022-11.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2022-10.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2022-09.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2022-08.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2022-07.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2022-06.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2022-05.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2022-04.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2022-03.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2022-02.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2022-01.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2021-12.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2021-11.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2021-10.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2021-09.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2021-08.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2021-07.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2021-06.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2021-05.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2021-04.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2021-03.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2021-02.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2021-01.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Fetching archived feed https://lyse.isobeef.org/twtxt-2020-12.txt (configured as lyse, https://lyse.isobeef.org/twtxt.txt)
Notice that @prologic@twtxt.net's /6 is there. I found the twtxt then. Kind of odd it didn't show before.
More:
Subject: The [tag URI scheme](https://en.wikipedia.org/wiki/Tag_URI_scheme) looks interesting. I like that it human read- and writable. And since we already got the timestamp in the twtxt.txt it would be
somewhat trivial to parse. But there are still the issue with what the name/id should be... Maybe it doesn't have to bee that stick? Instead of using `tag:` as the prefix/protocol, it would more it clear
what we are talking about by using `in-reply-to:` (https://indieweb.org/in-reply-to) or `replyto:` similar to `mailto:` 1. `(reply:sorenpeter@darch.dk,2024-09-15T12:06:27Z)' 2.
`(in-reply-to:darch.dk/twtxt.txt,2024-09-15T12:06:27Z)' 2. `(replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z)' I know it's longer that 7-11 characters, but it's self-explaining when looking at the
twtxt.txt in the raw, and the cases above can all be caught with this regex: `\([\w-]*reply[\w-]*\:` Is this something that would work?
Subject: The [tag URI scheme](https://en.wikipedia.org/wiki/Tag_URI_scheme) looks interesting. I like that it human read- and writable. And since we already got the timestamp in the twtxt.txt it would be
somewhat trivial to parse. But there are still the issue with what the name/id should be... Maybe it doesn't have to bee that stick? Instead of using `tag:` as the prefix/protocol, it would more it clear
what we are talking about by using `in-reply-to:` (https://indieweb.org/in-reply-to) or `replyto:` similar to `mailto:` 1. `(reply:sorenpeter@darch.dk,2024-09-15T12:06:27Z)` 2.
`(in-reply-to:darch.dk/twtxt.txt,2024-09-15T12:06:27Z)` 3. `(replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z)` I know it's longer that 7-11 characters, but it's self-explaining when looking at the
twtxt.txt in the raw, and the cases above can all be caught with this regex: `\([\w-]*reply[\w-]*\:` Is this something that would work?
Notice the difference? Soren edited, and broke everything.
@mckinley@twtxt.net Thanks for the feedback.
- Yeah, I agree that nick should not be part of the syntax. Any valid URL to a twtxt.txt file should be enough and is more clear, so it is not confused with an email (one of the issues with webfinger and fediverse handles)
- I think any valid URL would work, since we are not bound to look for exact matches. Accepting http and https, as well as gemini and gopher, could all work as long as the path to the twtxt.txt is the same.
- My idea is that you quote the timestamp as it is in the original twtxt.txt that you are referring to, so you can do it by simply copy/pasting. Also, what are the chances that the same human will make two different posts within the same second?!
Regarding the whole cryptographic-keys-for-identity thing, to me it seems like an unnecessary layer of complexity. If you move to a new house or city, you tell people that you moved; you can do the same in a twtxt.txt. Just post something like "I moved to this new URL, please follow me there!" I did that with my feeds at least twice, and you guys still seem to read my posts :)
The tag URI scheme looks interesting. I like that it is human read- and writable. And since we already got the timestamp in the twtxt.txt, it would be somewhat trivial to parse. But there is still the issue of what the name/id should be… Maybe it doesn't have to be that strict?
Instead of using tag: as the prefix/protocol, it would make it more clear what we are talking about by using in-reply-to: (https://indieweb.org/in-reply-to) or replyto:, similar to mailto:
(reply:sorenpeter@darch.dk,2024-09-15T12:06:27Z)
(in-reply-to:darch.dk/twtxt.txt,2024-09-15T12:06:27Z)
(replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z)
I know it's longer than 7-11 characters, but it's self-explanatory when looking at the twtxt.txt in the raw, and the cases above can all be caught with this regex: `\([\w-]*reply[\w-]*\:`
Is this something that would work?
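A quick check that the regex really does catch all three variants (Python, with the samples above):

```python
import re

pattern = re.compile(r"\([\w-]*reply[\w-]*\:")

samples = [
    "(reply:sorenpeter@darch.dk,2024-09-15T12:06:27Z) Cool idea!",
    "(in-reply-to:darch.dk/twtxt.txt,2024-09-15T12:06:27Z) Cool idea!",
    "(replyto:http://darch.dk/twtxt.txt,2024-09-15T12:06:27Z) Cool idea!",
]
for s in samples:
    print(bool(pattern.search(s)), s)  # True for all three
```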
I was not suggesting that everyone needs to set up a working webfinger endpoint, but that we take the format of nick+(sub)domain as the basis for generating the hash, together with the message date and content.
If we omit the protocol prefix from the way we do things now, will that not solve most of the problems? In the case of gemini://gemini.ctrl-c.club/~nristen/twtxt.txt they also have a working twtxt.txt at https://ctrl-c.club/~nristen/twtxt.txt … damn, I just noticed the gemini. subdomain.
Okay, what about defining a preferred protocol as part of the hash schema? So 1: https, 2: http, 3: gemini, 4: gopher?
@prologic@twtxt.net Some criticisms and a possible alternative direction:
Key rotation. I'm not a security person, but my understanding is that it's good to be able to give keys an expiry date and replace them with new ones periodically.
It makes maintaining a feed more complicated. Now instead of just needing to put a file on a web server (and scan the logs for user agents) I also need to do this. What brought me to twtxt was its radical simplicity.
Instead, maybe we should think about a way to allow old URLs to be rotated out? Like, my metadata could somehow say that X used to be my primary URL, but going forward from date D onward my primary URL is Y. (Or, if you really want to use public-key cryptography, maybe something similar could be used for key rotation there.)
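Purely as illustration, the metadata could look something like this (these field names are invented for the sketch, not part of any twtxt spec):

```
# nick = falsifian
# url = https://new.example.net/twtxt.txt
# prev_url = https://old.example.com/twtxt.txt
# prev_url_until = 2024-09-01
```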
It's nice that your scheme would add a way to verify the twts you download, but HTTPS is supposed to do that anyway. If you don't trust HTTPS to do that (maybe you don't like relying on root CAs?) then maybe your preferred solution should be reflected in your primary feed URL. E.g. if you prefer the security offered by IPFS, then maybe an IPNS URL would do the trick. The fact that feed locations are URLs gives some flexibility. (But then rotation is still an issue, if I understand IPNS right.)
On my blog: Free Culture Book Club – Aumyr, part 1 https://john.colagioia.net/blog/2024/09/07/aumyr-1.html #freeculture #bookclub
yarnd that's been around for awhile and is still present in the current version I'm running that lets a person hit a constructed URL like
@prologic@twtxt.net Here's a log entry:
Aug 27 15:59:43 buc yarnd[1200580]: [yarnd] 2024/08/27 15:59:43 (IP_REDACTED) "GET /external?nick=lovetocode999&uri=https://URL_REDACTED HTTP/1.1" 200 35442 14.554763ms
HTTP 200 status, not 404.
@prologic@twtxt.net This does not seem to fix the problem for me, or I've done something wrong. I did the following:
- Pull the latest version from git (I have commit 7ad848, same as on twtxt.net I believe).
- make build and make install
- Restart yarnd
- Refresh cache in Poderator Settings
Yet I still see these bogus /external things on my pod when I hit URLs like the one I sent you recently. When I hit such a URL with curl I think it's giving an error? But in a web browser, the (buggy) response is the same as it was before I updated.
So, this problem is not fixed for me.
HTTP/2 differs from 1.x by becoming a binary protocol; it also multiplexes multiple channels over the same connection and has the ability to prefetch related content to the browser to lower the perceived latency.
HTTP/3 moves the binary protocol from HTTP/2 over to QUIC, which is based on UDP instead of TCP. This makes it better suited to mobile or unstable networks, where transmission errors can be handled at a higher level.
@prologic@twtxt.net The headline is interesting and sent me down a rabbit hole understanding what the paper (https://aclanthology.org/2024.acl-long.279/) actually says.
The result is interesting, but the Neuroscience News headline greatly overstates it. If I've understood right, they are arguing (with strong evidence) that the simple technique of making neural nets bigger and bigger isn't quite as magically effective as people say, if you use it on its own. In particular, they evaluate LLMs without two common enhancements, in-context learning and instruction tuning. Both of those involve using a small number of examples of the particular task to improve the model's performance, and they turn them off because they are not part of what is called "emergence": "an ability to solve a task which is absent in smaller models, but present in LLMs".
They show that these restricted LLMs only outperform smaller models (i.e. demonstrate emergence) on certain tasks, and then (end of Section 4.1) discuss the nature of those few tasks that showed emergence.
I'd love to hear more from someone more familiar with this stuff. (I've done research that touches on ML, but neural nets and especially LLMs aren't my area at all.) In particular, how compelling is this finding that zero-shot learning (i.e. without in-context learning or instruction tuning) remains hard as model size grows?
@prologic@twtxt.net +1 for FrankenPHP. And it being built into Caddy is also swell.
@stigatle@yarn.stigatle.no @prologic@twtxt.net testing 1 2 3 can either of you see this?
I'm seeing GETs like this over and over again:
"GET /external?nick=lovetocode999&uri=https://vuf.minagricultura.gov.co/Lists/Informacin%20Servicios%20Web/DispForm.aspx?ID=8375144 HTTP/1.1" 200 35861 17.077914ms
always to nick=lovetocode999, but with different URIs. What are these calls?
@prologic@twtxt.net There are a lot of logs being generated by yarnd, which is something I haven't seen before:
Jul 25 14:32:42 buc yarnd[1911318]: [yarnd] 2024/07/25 14:32:42 (162.211.155.2) "GET /twt/ubhq33a HTTP/1.1" 404 29 643.251µs
Jul 25 14:32:43 buc yarnd[1911318]: [yarnd] 2024/07/25 14:32:43 (162.211.155.2) "GET /twt/112073211746755451 HTTP/1.1" 400 12 505.333µs
Jul 25 14:32:44 buc yarnd[1911318]: [yarnd] 2024/07/25 14:32:44 (111.119.213.103) "GET /twt/whau6pa HTTP/1.1" 200 37360 35.173255ms
Jul 25 14:32:44 buc yarnd[1911318]: [yarnd] 2024/07/25 14:32:44 (162.211.155.2) "GET /twt/112343305123858004 HTTP/1.1" 400 12 455.069µs
Jul 25 14:32:44 buc yarnd[1911318]: [yarnd] 2024/07/25 14:32:44 (168.199.225.19) "GET /external?nick=lovetocode999&uri=http%3A%2F%2Fwww.palapa.pl%2Fbaners.php%3Flink%3Dhttps%3A%2F%2Fwww.dwnewstoday.com HTTP/1.1" 200 36167 19.582077ms
Jul 25 14:32:44 buc yarnd[1911318]: [yarnd] 2024/07/25 14:32:44 (162.211.155.2) "GET /twt/112503061785024494 HTTP/1.1" 400 12 619.152µs
Jul 25 14:32:46 buc yarnd[1911318]: [yarnd] 2024/07/25 14:32:46 (162.211.155.2) "GET /twt/111863876118553837 HTTP/1.1" 400 12 817.678µs
Jul 25 14:32:46 buc yarnd[1911318]: [yarnd] 2024/07/25 14:32:46 (162.211.155.2) "GET /twt/112749994821704400 HTTP/1.1" 400 12 540.616µs
Jul 25 14:32:47 buc yarnd[1911318]: [yarnd] 2024/07/25 14:32:47 (103.204.109.150) "GET /external?nick=lovetocode999&uri=http%3A%2F%2Fampurify.com%2Fbbs%2Fboard.php%3Fbo_table%3Dfree%26wr_id%3D113858 HTTP/1.1" 200 36187 15.95329ms
I've seen that nick=lovetocode999 a bunch.
watch -n 60 rm -rf /tmp/yarn-avatar-* in a tmux because all of a sudden, without warning, yarnd started throwing hundreds of gigabytes of files with names like yarn-avatar-62582554 into /tmp, which filled up the entire disk and started crashing other services.
@prologic@twtxt.net I'm still getting this crap:
abucci@buc:~/yarnd/yarn$ ls -lh /tmp/yarnd-avatar-*
-rw------- 1 abucci abucci 863M Jul 25 14:19 /tmp/yarnd-avatar-1594499680
-rw------- 1 abucci abucci 7.8G Jul 25 14:19 /tmp/yarnd-avatar-2144295337
-rw------- 1 abucci abucci 9.8G Jul 25 14:19 /tmp/yarnd-avatar-2334738193
-rw------- 1 abucci abucci 10G Jul 25 14:14 /tmp/yarnd-avatar-2494107777
-rw------- 1 abucci abucci 9.5G Jul 25 13:59 /tmp/yarnd-avatar-2619243454
-rw------- 1 abucci abucci 11G Jul 25 14:04 /tmp/yarnd-avatar-2922187513
-rw------- 1 abucci abucci 7.5G Jul 25 14:14 /tmp/yarnd-avatar-349775570
-rw------- 1 abucci abucci 10G Jul 25 14:09 /tmp/yarnd-avatar-3640724243
-rw------- 1 abucci abucci 901M Jul 25 14:19 /tmp/yarnd-avatar-3921595598
-rw------- 1 abucci abucci 9.5G Jul 25 13:59 /tmp/yarnd-avatar-609094539
-rw------- 1 abucci abucci 9.3G Jul 25 14:04 /tmp/yarnd-avatar-755173392
-rw------- 1 abucci abucci 7.9G Jul 25 14:09 /tmp/yarnd-avatar-984061000
Something like 100 Gbytes of this junk has accumulated since I updated and re-started the server. I'm now running the latest version of yarnd, so the update did not fix the problem. Something else is going wrong.
How are temporary files growing to 10 Gbytes in size? The name of the file is "yarn-avatar", but why would avatars be so large?
@prologic@twtxt.net Alright, running yarnd 0.15.1 now. I stopped my hack, so we'll see if the VPS gets clogged with junk.
abucci@buc:~/yarnd/yarn$ make preflight
Checking Go version ... [ ERR ]
Go 1.16+ is required, found go1.22.5
FATAL: š preflight failed
make: *** [Makefile:33: preflight] Error 1
🤔
@prologic@twtxt.net 0.15.1, looks like.
There are also a bunch of log messages scrolling by. I've never seen this much activity in the log:
Jul 25 01:37:39 buc.ci yarnd[829]: [yarnd] 2024/07/25 01:37:39 (149.71.56.69) "GET /external?nick=lovetocode999&uri=https://pagez.co.uk/services/your-own-100-fully-owned-online-vi>
Jul 25 01:37:39 buc.ci yarnd[829]: [yarnd] 2024/07/25 01:37:39 (162.211.155.2) "GET /twt/112135496802692324 HTTP/1.1" 400 12 826.65µs
Jul 25 01:37:40 buc.ci yarnd[829]: [yarnd] 2024/07/25 01:37:40 (51.222.253.14) "GET /conv/muttriq HTTP/1.1" 200 36881 20.448309ms
Jul 25 01:37:40 buc.ci yarnd[829]: [yarnd] 2024/07/25 01:37:40 (162.211.155.2) "GET /twt/112730114943543514 HTTP/1.1" 400 12 663.493µs
Jul 25 01:37:40 buc.ci yarnd[829]: [yarnd] 2024/07/25 01:37:40 (27.75.213.253) "GET /external?nick=lovetocode999&uri=http%3A%2F%2Falfarah.jo%2FHome%2FChangeCulture%3FlangCode%3Den>
Jul 25 01:37:40 buc.ci yarnd[829]: time="2024-07-25T01:37:40Z" level=error msg="http://bynet.com.br/log_envio.asp?cod=335&email=%21%2AEMAIL%2A%21&url=https%3A%2F%2Fwww.almanacar.c>
Jul 25 01:37:40 buc.ci yarnd[829]: [yarnd] 2024/07/25 01:37:40 (162.211.155.2) "GET /twt/111674756400660911 HTTP/1.1" 400 12 545.106µs
Jul 25 01:37:40 buc.ci yarnd[829]: time="2024-07-25T01:37:40Z" level=warning msg="feed FetchFeedRequest: @<lovetocode999 http://alfarah.jo/Home/ChangeCulture?langCode=en&returnUrl>
Jul 25 01:37:41 buc.ci yarnd[829]: [yarnd] 2024/07/25 01:37:41 (162.211.155.2) "GET /twt/112507964696096567 HTTP/1.1" 400 12 838.946µs
Something really weird is going on?
I deleted them all right before I sent my previous message, and already, a few minutes later, there are two more:
abucci@buc:~$ du -sh /tmp/yarnd-avatar-3*
1.8G /tmp/yarnd-avatar-3122347915
2.4G /tmp/yarnd-avatar-3533381443
What is this?
On my blog: Free Culture Book Club – Aether Age Codex - Helios, part 1 https://john.colagioia.net/blog/2024/07/20/helios-1.html #freeculture #bookclub
On my blog: Real Life in Star Trek, Unification Part 1 https://john.colagioia.net/blog/2024/06/27/unification-part-1.html #scifi #startrek #closereading
@prologic@twtxt.net Firefox 126.0.1 is my primary
@prologic@twtxt.net I was wondering if my reverse proxy could cause something, but it's pretty standard…
server {
    listen 80;
    server_name we.loveprivacy.club;

    location / {
        return 301 https://$host$request_uri;
        #proxy_pass http://127.0.0.1:8000;
    }
}

server {
    listen 443 ssl http2;
    server_name we.loveprivacy.club;

    ssl_certificate /etc/letsencrypt/live/we.loveprivacy.club/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/we.loveprivacy.club/privkey.pem;
    client_max_body_size 8M;

    location / {
        proxy_pass http://127.0.0.1:8000;
    }
}
Hmm…
Jun 19 23:31:38 yarn_init.sh[61567]: [yarnd] 2024/06/19 23:31:38 (127.0.0.1:40254) "POST /post HTTP/1.0" 200 0 3.402208ms
[…]
Jun 19 23:31:39 yarn_init.sh[61567]: [yarnd] 2024/06/19 23:31:39 (127.0.0.1:40262) "GET /post HTTP/1.0" 404 729 123.474001ms
Unfortunately not on that front. Still the same 404 posting errors and oddly occasional login errors.
That's why I was wondering if using Go 1.22.4 could be an issue. I don't know how exactly. Only way to test is to rebuild it with an older version, I guess, which is why I did the make clean in the first place. Old habits die hard, lol.
@prologic@twtxt.net Righteo, so rookie error - I obviously had some untracked, rather important files for starting my pod, and I ran a make clean. Why I originally had them in the git directory is anyone's guess. Anyway, it blew away those files, including the database, so that's that. So your good self and @bender@twtxt.net etc. - apologies, but your profiles got nuked as well (as did my own, but easily recreated).
Another thing I noticed, which was the reason I ran make clean in the first place: my pod was being built with Go 1.22.4. Could this be a problem, @prologic? preflight.sh actually errors out about it…
On my blog: Free Culture Book Club – Nevada, part 1 https://john.colagioia.net/blog/2024/06/08/nevada-1.html #freeculture #bookclub
I think it is a good addition. Similar to how the Fraidycat RSS reader works. Fraidyc.at also supports twtxt, but I have not seen any updates since 2021…