@movq@www.uninformativ.de OK, to be more specific: it does, up to the point of adding twts to the correct file.
I've not checked actual file rotation. With max_twts_per_rotation set to 100 and me posting ~once a week, the first rotation will take place in two years ;-)
I feel like the README will need a rework soon. There are a lot of options now. Or maybe a manpage instead.
For example, that local_twtxt_dir MUST end in a path separator should be mentioned somewhere ;-)
@movq@www.uninformativ.de Works like a charm!
@movq@www.uninformativ.de Great work! I wish we could make all those BIG twtxt writers use it ;-)
I've a problem with local_twtxt_file not being supported any more. Being forced to use twtxt.txt as the file name breaks at least my URL.
@movq@www.uninformativ.de I always understood it as good practice to catch hardware errors early.
@movq@www.uninformativ.de Indeed! I'm sorry for that!
@movq@www.uninformativ.de Manpage says:
The user is supposed to run it manually or via a periodic system
service. The recommended period is a month but could be less.
So me doing it weekly is a bit overcautious. It's often overlooked by users that they are supposed to perform this task regularly.
Not that easy to decide when coming home from work: which site do I visit first?
@movq@www.uninformativ.de Don't forget to btrfs scrub, e.g. once a week.
I'm using btrfs scrub -B /dev/xyz and mail the result to myself.
@will@twtxt.net At work we are using KeePass with Multi Cert KeyProvider Plugin.
https://www.creative-webdesign.de/en/software/keepass-plugins/multi-cert-keyprovider
We leave the master password empty. Each person needs their own certificate to access the database file.
Not using a master password makes it easy to add or remove people with access without changing (and sharing) a master password.
@prologic@twtxt.net Very nice board and figures. Do they actually fit in the drawer?
@movq@www.uninformativ.de Thank you very much for implementing this! It's very useful (at least for me)!
@adi@f.adi.onl What about this one?
SRCFILES = $(wildcard *)
# strip the .gz suffix from existing *.gz files (this actually doubles some entries)
CLEANSRC = $(SRCFILES:.gz=)
DSTFILES = $(addsuffix .gz, $(CLEANSRC))

all: $(DSTFILES)

%.gz: %
	gzip -c $< > $@
You must not have subdirectories in that folder, though.
@xuu@txt.sour.is Well, the point is, things do not work like this.
Actually in nano you would have to ctrl-k ctrl-k ctrl-x y to discard your reply.
@movq@www.uninformativ.de I don't buy your example (rebasing behaviour), sorry.
Writing a twt is more similar to writing a commit message. Git does quite some checks to detect that nothing new was written and happily discards a commit if you just leave the editor. You don't need any special action, just quit your editor. Git will take care of the rest.
But it's OK as it is. I just didn't expect that I have to select and delete everything to discard a twt. So it's C-x h C-w C-x C-c for me.
@movq@www.uninformativ.de Yes, this may be enough to check.
I only know this "feature" from my revision control software, where I get "abort: empty commit message" or "Aborting commit due to empty commit message" when I do not change whatever is already in there. That can be quite some text about which files changed and so on.
@movq@www.uninformativ.de My workflow is as follows.
I hit the "reply" hotkey and my editor comes up.
With or without writing something I close my editor without saving the content.
Of course I close it by C-x C-c, not by :q! ;-)
Jenny finds the temp file unchanged, i.e. its content is the same as it was when my editor was started. I would like jenny to discard the reply then.
Autosaving is no problem either. Real editors do this to a temporary (kind of backup) file. Only in case of a crash is that file consulted, and the user is asked if she would like to continue with the stored content.
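The requested discard-on-unchanged behaviour could be a minimal byte comparison after the editor exits. A sketch (hypothetical helper, not jenny's actual code):

```python
def reply_unchanged(path, template):
    """True when the user saved nothing new, i.e. the temp file still
    holds exactly the prefilled template bytes. (Sketch of the
    wished-for behaviour, not jenny's actual implementation.)"""
    with open(path, "rb") as f:
        return f.read() == template

# Hypothetical workflow: write the template to `path`, spawn $EDITOR on
# it, then skip creating the twt when reply_unchanged(path, template)
# returns True.
```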
@movq@www.uninformativ.de Your scenario would produce the observed behaviour, agreed. On the other side I'm sure I've set every URL in lasttwts to > 1630000000.0 (manually, in my editor).
But I can't reproduce any weird behaviour right now. I've tried to "blackhole" twt.nfld.uk temporarily. That does not have any effect.
I've also tried to force twt.nfld.uk to deliver an empty twtxt. That does not have any effect either.
So I guess everything is fine with jenny.
I have wrapped jenny in a shell script to keep ~/.cache/jenny under version control. This way I have better data if anything unexpected shows up again.
@prologic@twtxt.net I've deleted eleven and utf8test, https://search.twtxt.net is the only follower. Maybe you can stop it from following those twtxts? They were meant for testing purposes only.
Funny bug in my LG TV: last Saturday I scheduled a film airing yesterday for recording. The actual recording yesterday started 1 hour late. Looks like although the TV knows the actual time perfectly well, it was not capable of translating the schedule from CEST to CET.
@movq@www.uninformativ.de Yes, it was exactly those twts. I don't think I've managed to "match" the downtime while fetching twts. But even if I had, how can this lead to inserting old twts?
@movq@www.uninformativ.de Another feature request: sometimes I start writing a twt but then would like to discard it. It would be great if jenny could detect that I did not write (or save) anything and then discard the twt instead of creating an "empty" one.
@movq@www.uninformativ.de Today I had unexpected old twts after jenny -f. I now have jenny's cache under revision control, automatically committing changes after each fetch. Let's see if this helps finding a (possible) bug.
jenny has never failed me. It is so neat, powerful, and streamlined, it's not even funny! Thank you very much, @movq, for it!
I want to second that!
@movq@www.uninformativ.de What do you think about this?
diff --git a/jenny b/jenny
index b47c78e..20cf659 100755
--- a/jenny
+++ b/jenny
@@ -278,7 +278,8 @@ def prefill_for(email, reply_to_this, self_mentions):

 def process_feed(config, nick, url, content, lasttwt):
     nick_address, nick_desc = decide_nick(content, nick)
     url_for_hash = decide_url_for_hash(content, url)
-    new_lasttwt = parse("1800-01-01T12:00:00+00:00").timestamp()
+    # new_lasttwt = parse("1800-01-01T12:00:00+00:00").timestamp()
+    new_lasttwt = None
     for line in twt_lines_from_content(content):
         res = twt_line_to_mail(
@@ -296,7 +297,7 @@ def process_feed(config, nick, url, content, lasttwt):

             twt_stamp = twt_date.timestamp()
             if lasttwt is not None and lasttwt >= twt_stamp:
                 continue
-            if twt_stamp > new_lasttwt:
+            if not new_lasttwt or twt_stamp > new_lasttwt:
                 new_lasttwt = twt_stamp
             mailname_new = join(config['maildir_target'], 'new', twt_hash)
@movq@www.uninformativ.de I just observed unexpected old twts coming back.
It looks like lasttwts is reset to -5364619200.0 every time no new content was fetched, for example if if-modified-since did not produce new twts?
@lyse@lyse.isobeef.org I'm seeing your response as a reply to #p522joq, where it doesn't seem to belong. Did this happen by accident or is there a bug hiding somewhere?
@prologic@twtxt.net I'm seeing your response as a reply to #p522joq, where it doesn't seem to belong. Did this happen by accident or is there a bug hiding somewhere?
@movq@www.uninformativ.de Ha, but when you control lastmods, lastseen and lasttwts it's easy to test.
Works like a charm!
@movq@www.uninformativ.de Not that easy to test when pods honor if-modified-since ;-)
I've almost only timestamps of -5364619200.0…
Diff looks good to me!
@movq@www.uninformativ.de
I'll test it tomorrow. Thanks for starting this feature!
I believe glob() is an O(n) algorithm
Yes, I see. But don't underestimate OS caching for files and directories!
If you look up files in the same directory many times, the OS may use cached results from earlier lookups.
I'm not totally sure, but I believe this is how things work for both Windows and Linux, at least.
@movq@www.uninformativ.de
When I look in my twtxt maildir for duplicated messages, they all have F in their name.
I see that in mail_file_exists jenny does not consider flagged messages when testing if a message already exists.
I understand that looking up only 12 combinations is faster than reading huge directories. I'm astonished that globbing would be slower. Learning something new every day…
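The fix would presumably be to include the F flag in the set of probed names. A sketch of the idea (flag list and helper names are made up, not jenny's actual code): probe a small, fixed list of Maildir filenames instead of globbing the whole directory.

```python
import os
from itertools import combinations

def candidate_names(twt_hash, flags="FRS"):
    """All filenames a message could have in cur/ for a given flag set
    (hypothetical flags; jenny's real list may differ). Maildir encodes
    flags after ':2,'."""
    names = []
    for r in range(len(flags) + 1):
        for combo in combinations(flags, r):
            names.append("%s:2,%s" % (twt_hash, "".join(combo)))
    return names

def mail_file_exists(maildir, twt_hash):
    # O(k) existence probes over a fixed name list instead of an
    # O(n) directory scan via glob().
    return any(
        os.path.exists(os.path.join(maildir, "cur", name))
        for name in candidate_names(twt_hash)
    )
```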
@movq@www.uninformativ.de
I just pulled it, works like a charm (as expected) ;-)
@movq@www.uninformativ.de
I'm not a Python programmer, so please bear with me.
The doc about encodings also mentions:
If you require a different encoding, you can manually set the Response.encoding property
Wouldn't that be a one-liner like this (Ruby example)?
'some text'.force_encoding('utf-8')
I understand that you do not want to interfere with requests. On the other hand, we know that received data must be UTF-8 (by twtxt spec), and it does burden "publishers" to somehow add the charset property to the content-type header. But again, I'm not sure what "the right thing to do"™ is.
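As a sketch (not jenny's actual code), the fallback could live in one small helper: honour a declared charset, otherwise assume UTF-8 as the twtxt spec mandates.

```python
def decode_twtxt(raw, header_charset=None):
    """Decode fetched feed bytes. When the Content-Type header carries
    no charset, fall back to UTF-8 instead of a library default such as
    latin-1. (Sketch; error handling deliberately minimal.)"""
    return raw.decode(header_charset or "utf-8")
```

With requests this would amount to setting Response.encoding before reading Response.text.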
@prologic@twtxt.net @movq@www.uninformativ.de
Exactly, you see the correct UTF-8 encoded version (even with content-type: text/plain leaving out the charset declaration).
After following the utf8test twtxt myself, I now see that jenny does not handle it as UTF-8 when the charset is missing from the HTTP header, just like @quark@ferengi.one has observed.
So should jenny always treat twtxt files as UTF-8 encoded? I'm not sure about this.
@lyse@lyse.isobeef.org
Sorry, I should have mentioned your twt #vjjdara where you already described the same idea.
@movq@www.uninformativ.de
Applause!
I believe Yarn assumes utf-8 anyway which is why we don't see encoding issues
Are you sure? I think in #kj2c5oa @quark@ferengi.one mentioned exactly that problem. My logs say "jenny/latest" was fetching my twtxt for quark.
All I did to fix this was adding AddCharset utf-8 .txt to .htaccess. In particular, I did not change the encoding of stackeffect.txt.
Don't miss step 0 (I should have made this a separate point): having a meta header promising that twts are appended with strictly monotonically increasing timestamps.
(Also, I'd first like to see the pagination thingy implemented.)
In jenny I would like to see the "don't process previously fetched twts" AKA "Allow the user to archive/delete old twts" feature implemented ;-)
What about a meta header for setting charset?
I myself stumbled upon .txt files not being delivered with charset utf-8 by default.
I had to set/modify .htaccess to correct that.
It would have been easier if there had been a charset header entry "overwriting" what the HTTP server delivers.
What do you think?
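For illustration, such a hypothetical metadata field (the name is made up, it is not part of any current spec) could sit at the top of the feed:

```
# charset = utf-8
```

Clients would then decode the file accordingly, regardless of what the HTTP server declares. (There is a bootstrapping wrinkle: a client has to tentatively decode the file before it can read this field.)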
My thoughts about range requests
In addition to pagination, range requests should also be used to reduce traffic.
I understand that there are corner cases making this a complicated matter.
I would like to see a meta header saying that the given twtxt is append only with increasing timestamps so that a simple strategy can detect valid content fetched per range request.
1. read the meta part per range request
2. read the last fetched twt at the expected range (as known from the last fetch)
3. if the fetched content starts with the expected twt, then process the rest of the data
4. if the fetched content doesn't start with the expected twt, discard everything and fall back to fetching the whole twtxt
Pagination (e.g. archiving old content in a different file) will lead to point 4.
Of course especially pods should support range requests, correct @prologic@twtxt.net?
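The validation part of this strategy could be sketched like so (hypothetical helper, assuming the append-only promise; `tail` is the body of a 206 partial response starting at the offset of the last twt we already have):

```python
def validate_range_tail(tail, expected_last_twt):
    """If the ranged response begins with the last twt we already know,
    everything after it is new. Otherwise the feed changed underneath
    us and the caller must fall back to fetching the whole twtxt.
    (Sketch; names are made up.)"""
    if not tail.startswith(expected_last_twt):
        return None  # discard and re-fetch everything
    new_part = tail[len(expected_last_twt):]
    return [line for line in new_part.splitlines() if line.strip()]
```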
My thoughts about pagination (paging)
Following the discussion about pagination (paging), I think that's the right thing to do.
Fetching the same content again and again, with only a marginal portion of actually new twts, is unbearable and does not scale in any way. It's not only a waste of bandwidth; with an increasing number of fetchers it will also become a problem for pods to serve all requests.
Because it's so easy to implement and simple to understand, splitting the twtxt file into parts with next and prev pointers seems a really amazing solution.
As in RFC 5005, there should also be a meta header pointing to the main URL, e.g. current or baseurl or something like that. This way hashes can be calculated correctly even for archived twts.
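For illustration, the meta headers of an archived part could then look like this (field names are made up here, mirroring the current/prev idea):

```
# current = https://example.com/twtxt.txt
# prev = https://example.com/twtxt-2021.txt
# next = https://example.com/twtxt-2023.txt
```

Hashes would always be computed against the current URL, so archived twts keep their identity.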
I'm curious, what is your use case for deleting twts?
Not just deleting, sorting into other folders is also impossible.
It also doesn't scale in the long term. When I cannot delete twts, I have a full copy of every twtxt I follow - forever. That's a waste of bandwidth and disk space.
@movq@www.uninformativ.de How is deletion supposed to work? In mutt I deleted with D~d>1m and then fetched with !jenny -f. This brings back all deleted twts. Isn't lastmods used to skip older twts?
No, it would be sufficient to skip avatar discovery when metadata does contain an avatar.
@prologic@twtxt.net
Thank you, that's the correct one.
Still I have this in my logs (first access of "eleven" by yarnd):
ip.ip.ip.ip - - [21/Oct/2021:20:05:36 +0000] "GET /eleven.txt HTTP/2.0" 200 344 "-" "yarnd/0.2.0@46bea3f (Pod: twtxt.net Support: https://twtxt.net/support)"
ip.ip.ip.ip - - [21/Oct/2021:20:05:36 +0000] "HEAD /avatar.png HTTP/2.0" 200 0 "-" "yarnd/0.2.0@46bea3f (Pod: twtxt.net Support: https://twtxt.net/support)"
And I guess without avatar.png sitting there I would have seen even more requests like /eleven.txt/avatar.png.
I've copied stackeffect.png to avatar.png to make yarnd happy when accessing stackeffect.txt.
So in this setup yarnd fetched eleven.txt along with an avatar.png that belongs to another twtxt. This feels buggy.