Searching We.Love.Privacy.Club

Twts matching #twtxt.
In-reply-to » @movq Today I had unexpected old twts after jenny -f. I now have jenny's cache under revision control, automatically committing changes after each fetch. Let's see if this helps find a (possible) bug.

@movq@www.uninformativ.de Yes, it was exactly those twts. I don’t think I’ve managed to “match” the downtime while fetching twts. But even if I had, how can this lead to inserting old twts?


@movq@www.uninformativ.de What do you think about this?

diff --git a/jenny b/jenny
index b47c78e..20cf659 100755
--- a/jenny
+++ b/jenny
@@ -278,7 +278,8 @@ def prefill_for(email, reply_to_this, self_mentions):
 def process_feed(config, nick, url, content, lasttwt):
     nick_address, nick_desc = decide_nick(content, nick)
     url_for_hash = decide_url_for_hash(content, url)
-    new_lasttwt = parse('1800-01-01T12:00:00+00:00').timestamp()
+    # new_lasttwt = parse('1800-01-01T12:00:00+00:00').timestamp()
+    new_lasttwt = None
 
     for line in twt_lines_from_content(content):
         res = twt_line_to_mail(
@@ -296,7 +297,7 @@ def process_feed(config, nick, url, content, lasttwt):
         twt_stamp = twt_date.timestamp()
         if lasttwt is not None and lasttwt >= twt_stamp:
             continue
-        if twt_stamp > new_lasttwt:
+        if not new_lasttwt or twt_stamp > new_lasttwt:
             new_lasttwt = twt_stamp
 
         mailname_new = join(config['maildir_target'], 'new', twt_hash)

In-reply-to » @movq When I look in my twtxt maildir for duplicated messages they all have F in their name.

@prologic@twtxt.net

I believe glob() is an O(n) algorithm
Yes, I see. But don’t underestimate OS caching for files and directories!
If you look up files in the same directory many times, the OS may use cached results from earlier lookups.
I’m not totally sure, but I believe this is how things work on both Windows and Linux, at least.


@movq@www.uninformativ.de
When I look in my twtxt maildir for duplicated messages they all have F in their name.

I see that in mail_file_exists jenny does not consider flagged messages when testing if a message already exists.

I understand that looking up only 12 combinations is faster than reading huge directories. I’m astonished that globbing would be slower. Learning something new every day…
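
For comparison, a glob-based existence test that accepts any flag combination could look roughly like this. It is a sketch, not jenny's mail_file_exists: the function name is made up, and it assumes standard Maildir naming (no suffix in new/, a ":2," suffix followed by flag letters in cur/) with the message name passed in by the caller.

```python
import glob
import os


def mail_exists_any_flags(maildir: str, name: str) -> bool:
    """Return True if message `name` exists in new/ or in cur/ with any flags."""
    # Messages in new/ carry no flag suffix.
    if os.path.exists(os.path.join(maildir, 'new', name)):
        return True
    # In cur/ the name is followed by ":2," and zero or more flag letters.
    # glob.escape() keeps special characters in the path from acting as wildcards.
    pattern = glob.escape(os.path.join(maildir, 'cur', name)) + ':2,*'
    return bool(glob.glob(pattern))
```

This would also catch messages carrying the F flag, at the price of glob listing the whole cur/ directory on every call, which is the O(n) cost mentioned in the reply.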


@movq@www.uninformativ.de
I’m not a Python programmer, so please bear with me.
The docs about encodings also mention:

If you require a different encoding, you can manually set the Response.encoding property

Wouldn’t that be a one-liner, like this Ruby example?

'some text'.force_encoding('utf-8')

I understand that you do not want to interfere with requests. On the other hand, we know that received data must be UTF-8 (per the twtxt spec), and it burdens “publishers” with somehow adding a charset property to the content-type header. But again, I’m not sure what “the right thing to do” ™ is.
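
In Python with requests, the rough equivalent of the Ruby line is an assignment to Response.encoding before reading .text. The following is only a sketch: both function names are made up, and fetch_twtxt assumes the third-party requests package is installed.

```python
def decode_as_utf8(response) -> str:
    """Decode a requests-style response body as UTF-8, ignoring the header charset."""
    # Assigning to `encoding` changes how `.text` decodes the raw bytes.
    response.encoding = 'utf-8'
    return response.text


def fetch_twtxt(url: str) -> str:
    """Fetch a feed and decode it as UTF-8 (needs the third-party `requests` package)."""
    import requests  # imported here so decode_as_utf8() works without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return decode_as_utf8(response)
```

Whether overriding the header like this is desirable is exactly the open question; the sketch only shows that it is a small change.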


@prologic@twtxt.net @movq@www.uninformativ.de
Exactly, you see the correctly UTF-8-encoded version (even with content-type: text/plain and no charset declaration).

After following the utf8test twtxt feed myself, I now see that jenny does not handle it as UTF-8 when the charset is missing from the HTTP header, just as @quark@ferengi.one observed.

So should jenny always treat twtxt files as UTF-8 encoded? I’m not sure about this.
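
One possible middle ground, sketched here with the standard library only: honour a charset when the server declares one, and assume UTF-8 when it is missing. The function name is made up and this is not how jenny currently behaves.

```python
from email.message import Message


def decode_feed(body: bytes, content_type: str = '') -> str:
    """Decode a twtxt body using the declared charset, falling back to UTF-8."""
    header = Message()
    header['Content-Type'] = content_type or 'text/plain'
    # get_content_charset() returns the fallback when no charset parameter is present.
    charset = header.get_content_charset('utf-8')
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: assume UTF-8 and replace undecodable bytes.
        return body.decode('utf-8', errors='replace')
```

A server that declares a wrong charset for a UTF-8 file would still produce garbled text with this approach; only always decoding as UTF-8 avoids that.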

In-reply-to » What about a meta header for setting charset?

@prologic@twtxt.net

I believe Yarn assumes utf-8 anyway which is why we don’t see encoding issues

Are you sure? I think in #kj2c5oa @quark@ferengi.one mentioned exactly that problem. My logs say “jenny/latest” was fetching my twtxt for quark.

All I did to fix this was add AddCharset utf-8 .txt to .htaccess. In particular, I did not change the encoding of stackeffect.txt.

In-reply-to » My thoughts about range requests

@movq@www.uninformativ.de

Don’t miss step 0 (I should have made this a separate point): having a meta header that promises twts are only ever appended, with strictly monotonically increasing timestamps.

(Also, I’d first like to see the pagination thingy implemented.)

In jenny I would like to see the “don’t process previously fetched twts” (AKA “Allow the user to archive/delete old twts”) feature implemented ;-)
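
A client could check the “step 0” promise on a fetched feed with something like the sketch below. The function names are made up, and it assumes one twt per line in the form timestamp, tab, text, with every timestamp carrying a UTC offset (mixing timestamps with and without an offset would raise a TypeError on comparison).

```python
from datetime import datetime
from typing import Iterable


def parse_stamp(line: str) -> datetime:
    """Parse the timestamp in front of the first tab of a twt line."""
    stamp = line.split('\t', 1)[0]
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    if stamp.endswith('Z'):
        stamp = stamp[:-1] + '+00:00'
    return datetime.fromisoformat(stamp)


def is_strictly_increasing(lines: Iterable[str]) -> bool:
    """Return True if every twt is strictly newer than the one before it."""
    previous = None
    for line in lines:
        # Skip blank lines and comment/metadata lines.
        if not line.strip() or line.startswith('#'):
            continue
        current = parse_stamp(line)
        if previous is not None and current <= previous:
            return False
        previous = current
    return True
```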

In-reply-to » Below a signed (https://keys.pub) message:

@prologic@twtxt.net
BEGIN SALTPACK ENCRYPTED MESSAGE. kiNJamlTJ29ZvW4 RHAOg9hm6h0OwKt iMGN9pY3oc5peJE UcRA8ysyQ7e8co9 shMfScCFgmQgU5Q 6w6XD2FT6szO1i1 N8qWqFRwJcHliqp hlaSvsTNhuwe1Fs KESywjL8ZvxNeyb ro0RVcRIip4Itpv NKvFZ822RoDR6pb hVvSqgubr3IanFT 6VAGQe2mYvErE7i G0O284HNvj0tcbC qzY0uB3ZFePu2fp l8nHOeEm9QLkH4Y PNKY2bXjqtblDGq 7pNiNHXtNJDjrpG nUoEXK9CaB6DGe7 oaF1P9sTz7fFrUo qwIgzw4Z1yqULQW 6dcFgsGwQEMc6bV mXuJHkrDWbfw35o 2Lpevp4PAVw884t 5Jf4cDLAe3QfRjG 4y6uwJg8BwIr2Lb 2pCX23ffwJ0yjGs Ptyzuaq2Alfl3QX AcMNGFzTNHjHfqY cvsoTrSMbyE3ssS A0k0zeRJQLoGOK4 DGkdltMXaQyXq9d zzbueCXCsIM1vYG vcy85vKuqM0ikoG caUNUuIVCc6FMs5 2JtadCtbVKyG8Wx Z4R672Fd71eDjCc lEtCdJlEAmEJePw ThkxVJutJt2R2Ce lKp9tEKmrx1jMWW V8hJNTaQGAfFDEB Unh8YasaV24NqAi GKSnstFWk3DYCxC lvws9js2jJ9OKeq 2mMgFmzEmCr99RW 2CrxZStPpB1iEDU d0Un7W7bnyo2KpV xqe8rCeHA6CUwVs 0XMmxPvU1Q0wp9A 0Jwxo5CY9QF5EJl yVwaXiVP2CKw2aH tqEE5yTp9OmpNF0 jFqgr8vHOjosPyL c3nke0S9QFjAxjt Dr6xwYpnASDr1l1 N96G3FB5iVYLFaz FkXGm7oQNTaDY8e OtHXQiXRhQY3PCi VIYYVhc9RExVnfX fvzgfgc5uSxUynD sPp4eq2rJXkX5. END SALTPACK ENCRYPTED MESSAGE.

Let’s see how resilient this is, or if it breaks.


My thoughts about range requests

In addition to pagination, range requests should also be used to reduce traffic.

I understand that there are corner cases making this a complicated matter.

I would like to see a meta header saying that the given twtxt is append-only with increasing timestamps, so that a simple strategy can detect valid content fetched via range request.

  1. read the meta part via range request
  2. read the last fetched twt at its expected range (as known from the last fetch)
  3. if the fetched content starts with the expected twt, process the rest of the data
  4. if the fetched content doesn’t start with the expected twt, discard everything and fall back to fetching the whole twtxt

Pagination (e.g. archiving old content in a different file) will lead to point 4.
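
Steps 2 to 4 can be sketched without any network code. The function names are made up; the sketch assumes the client stored the byte offset and the exact bytes (including the trailing newline) of the last twt from the previous fetch, and that the server really answered with 206 Partial Content. Step 1, reading the meta part, is left out.

```python
from typing import Dict, Optional


def range_header(offset: int) -> Dict[str, str]:
    """Build an HTTP Range header asking for everything from byte `offset` on."""
    return {'Range': f'bytes={offset}-'}


def new_twts_from_range(fetched: bytes, last_twt: bytes) -> Optional[bytes]:
    """Return the data after `last_twt`, or None if a full fetch is needed.

    `fetched` is the body of a range response that starts at the byte offset
    where `last_twt` began during the previous fetch. `last_twt` must include
    its trailing newline so that a longer line with the same prefix is rejected.
    """
    if fetched.startswith(last_twt):
        return fetched[len(last_twt):]   # step 3: only the appended twts
    return None                          # step 4: discard and refetch everything
```

An empty result means nothing was appended; None means the feed was edited, rotated, or archived and the whole file has to be fetched again.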

Of course, pods especially should support range requests, correct @prologic@twtxt.net?


My thoughts about pagination (paging)

Following the discussion about pagination (paging), I think that’s the right thing to do.

Fetching the same content again and again with only a marginal portion of actually new twts is unbearable and does not scale in any way. It’s not only a waste of bandwidth, but with an increasing number of fetchers it will also become a problem for pods to serve all requests.

Because it’s so easy to implement and simple to understand, splitting a twtxt file into parts with next and prev pointers seems like a really amazing solution.

As in RFC5005, there should also be a meta header pointing to the main URL, e.g. current or baseurl or something like that. This way hashes can be calculated correctly even for archived twts.
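
How a client might use such headers, as a sketch only: it assumes meta headers are written as "# key = value" comment lines, and the key names current and prev are just the suggestions from this post, not an agreed format. The function names are made up.

```python
from typing import Dict


def parse_meta(content: str) -> Dict[str, str]:
    """Collect `# key = value` comment lines from a feed into a dict."""
    meta = {}
    for line in content.splitlines():
        if not line.startswith('#') or '=' not in line:
            continue
        key, value = line[1:].split('=', 1)
        # Keep the first occurrence of each key.
        meta.setdefault(key.strip(), value.strip())
    return meta


def url_for_hash(meta: Dict[str, str], fetched_url: str) -> str:
    """Prefer the main feed URL from the meta header over the URL actually fetched."""
    # "current" is a hypothetical key name taken from the suggestion above.
    return meta.get('current', fetched_url)
```

A client walking back through archived parts would follow prev from file to file while always hashing against the URL returned by url_for_hash.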
