While Meta Crawls the Web for AI Training Data, Bruce Ediger Pranks Them with Endless Bad Data
From the personal blog of interface expert Bruce Ediger:
Early in March 2025, I noticed that a web crawler with a user
agent string of
meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
was hitting my blogās machine at an unreasonable rate.
⦠ā Read more
I used Gemini (the Google AI) twice at work today, asking about Google Workspace configuration and Google Cloud CLI usage (because we use those a lot). Youād think that itād be well-suited for those topics. It answered very confidently, yet completely wrong. Just wrong. Made-up CLI arguments, whatever. It took me a while to notice, though, because itās so convincing and, well, you implicitly and subconsciously trust the results of the Google AI when asking about Google topics, donāt you?
Will it get better over time? Maybe. But what I really want is this:
- Good, well-structured, easy-to-read, proper documentation. Google isnāt doing too bad in this regard, actually, itās just that they have so much stuff that itās hard to find what youāre looking for. Hence ā¦
- ⦠I want a good search function. Just give me a good fuzzy search for your docs. Thatās it.
I just donāt have the time or energy to constantly second-guess this stuff. Give me something reliable. Something that is designed to do the right thing, not toy around with probabilities. āAI for everythingā is just the wrong approach.
Jelly Slider
Based on : https://x.com/cerpow/status/1964953851603358112
@movq@www.uninformativ.de Yeah, give it a shot. At worst you know that you have to continue your quest. :-)
Fun fact, during a semester break I was actually a little bored, so I just started reading the Qt documentation. I didnāt plan on using Qt for anything, though. I only looked at the docs because they were on my bucket list for some reason. Qt was probably recommended to me and coming from KDE myself, that was motivation enough to look at the docs just for fun.
The more I read, the more hooked I got. The documentation was extremely well written, something Iāve never seen before. The structure was very well thought out and I got the impression that I understood what the people thought when they actually designed Qt.
A few days in I decided to actually give it a real try. Having never done anything in C++ before, I quickly realized that this endeavor wonāt succeed. I simply couldnāt get it going. But I found the Qt bindings for Python, so that was a new boost. And quickly after, I discovered that there were even KDE bindings for Python in my package manager, so I immediately switched to them as that integrated into my KDE desktop even nicer.
I used the Python KDE bindings for one larger project, a planning software for a summer camp that we used several years. Itās main feature was to see who is available to do an activity. In the past, that was done on a large sheet of paper, but people got assigned two activities at the same time or werenāt assigned at all. So, by showing people in yellow (free), green (one activity assigned) and red (overbooked), this sped up and improved the planning process.
Another core feature was to generate personalized time tables (just like back in school) and a dedicated view for the morning meeting on site.
It was extended over the years with all sorts of stuff. E.g. I then implemented a warning if all the custodians of an activitiy with kids were underage to satisfy new the guidelines that there should be somebody of age.
Just before the pandemic I started to even add support for personalized live views on phones or tablets during the planning process (with web sockets, though). This way, people could see their own schedule or independently check at which day an activity takes place etc. For these side quests, they donāt have to check the large matrix on the projector. But the project died there.
Hereās a screenshot from one of the main views: https://lyse.isobeef.org/tmp/k3man.png
This Python+Qt rewrite replaced and improved the Java+Swing predecessor.
Actually. Looking at the template and the BeerCSS docs, I think Iām just using the wrong elements and doing the wrong thing in the template/partial structure itself š¤ Probably need to wrap text in something else other than a plain āol <p>
@prologic@twtxt.net No, this is a Linux manpage from the man-pages project: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man/man7/ascii.7
I do have an idea whatās going on. Could be an unfortunate interaction between the table preprocessor tbl and the man macro package. š¤
Dear dev.alessandrocutolo.it, do you really need to fetch my twtxt feed every 20-30 seconds? š
Not that itās posing a problem, but I feel like this could be optimized. For example, how about using the if-modified-since request header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/If-Modified-Since
apt manpage of Ubuntu recently, which, for some reason, uses blue text in one place:
Ah, so apparently they donāt like writing manpages anymore and instead use XML:
https://salsa.debian.org/apt-team/apt/-/blob/main/doc/apt.8.xml
And then they use XSLT on top and what not:
https://salsa.debian.org/apt-team/apt/-/blob/main/doc/manpage-style.xsl.cmake.in
Itās not even explicitly blue:
https://salsa.debian.org/apt-team/apt/-/blob/main/doc/apt.ent?ref_type=heads#L17
Abstractions upon abstractions upon abstractions.
@kat@yarn.girlonthemoon.xyz I BELIEVE IN U!!! Making it fun helps! Maybe like put images in the docs so itās cuter to look at! I did that, but with physical journaling. Except instead of pics it was receipts & leaves & dried flowers lol
Look at that, a mate just told me: What if YAML had even more security issues!? YAMLScript! https://yamlscript.org/doc/cheat/
Hereās an example of X11/Xlib being old and archaic.
X11 knows the data type ācardinalā. For example, the window property _NET_WM_ICON (which holds image data for icons) is an array of ācardinalā. I am already not really familiar with that word and Iām assuming that it comes from mathematics:
https://en.wikipedia.org/wiki/Cardinal_number
(It could also be a bird, but probably not: https://en.wikipedia.org/wiki/Cardinalidae)
We would probably call this an āintegerā today.
EWMH says that icons are arrays of cardinals and that theyāre 32-bit numbers:
https://specifications.freedesktop.org/wm-spec/latest-single/#id-1.6.13
So itās something like 0x11223344 with 0x11 being the alpha channel, 0x22 is red, and so on.
You would assume that, when you retrieve such an array from the X11 server, youād get an array of uint32_t, right?
Nope.
Xlib is so old, they use char for 8-bit stuff, short int for 16-bit, and long int for 32-bit:
That is congruent with the general C data types, so it does make sense:
https://en.wikipedia.org/wiki/C_data_types
Now the funny thing is, on modern x86_64, the type long int is actually 64 bits wide.
The result is that every pixel in a Pixmap, for example, is twice as large in memory as it would need to be. Just because Xlib uses long int, because uint32_t didnāt exist, yet.
And this is something that I wouldnāt know how to fix without breaking clients.
@lyse@lyse.isobeef.org @kat@yarn.girlonthemoon.xyz I spent so much time in the past figuring out if something is a dict or a list in YAML, for example.
What are the types in this example?
items:
- part_no: A4786
descrip: Water Bucket (Filled)
price: 1.47
quantity: 4
- part_no: E1628
descrip: High Heeled "Ruby" Slippers
size: 8
price: 133.7
quantity: 1
items is a dict containing ⦠a list of two other dicts? Right?
It is quite hard for me to grasp the structure of YAML docs. š¢
The big advantage of YAML (and JSON and TOML) is that itās much easier to write code for those formats, than it is with XML. json.loads() and youāre done.
After many weeks and probably at least a hundred hours of research, discussions and in-person viewing, I think Iāve finally come up with my Final Choices (shortlist) of a Hybrid Camper / Caravan that I think will suit my family and that Iāll enjoy (far less work for me to setup and teardown). The one at the top of the list Iām leaning towards os the SWAG SCT16 Family 4B
#Camping #CampersHTTP referrers are quite broken, arenāt they?
Because of that recent storm on my blog, I had a peek at them. Thereās a lot of garbage in there. For example, https://docs.freebsd.org/en/books/handbook/disks-virtual.html is supposed to refer to one of my blog posts ā¦
Whatās going on here?
Been spending a lot of time researching campers as I want to / plan to upgrade our current Camper Trailoer (forward fold) Stoney Creek XL-FF6 to a slightly larger Hybrid Camper/Caravan with ensuite, internal kitchenette, external full hitchen, pop-top roof and twin bunks.
This is the summary and whittling down of my research so far: https://wiki.mills.io/s/1103bc9c-dd75-4a98-b64b-8dadc5b0e51f/doc/comparision-Ln03Moiibq
I did a ālectureā/āworkshopā about this at work today. 16-bit DOS, real mode. š¾ Pretty cool and the audience (devs and sysadmins) seemed quite interested. š„³
- People used the Intel docs to figure out the instruction encodings.
- Then they wrote a little DOS program that exits with a return code and they used uhex in DOSBox to do that. Yes, we wrote a COM file manually, no Assembler involved. (Many of them had never used DOS before.)
- DEBUG from FreeDOS was used to single-step through the program, showing what it does.
- This gets tedious rather quickly, so we switched to SVED from SvarDOS for writing the rest of the program in Assembly language. nasm worked great for us.
- At the end, we switched to BIOS calls instead of DOS syscalls to demonstrate that the same binary COM file works on another OS. Also a good opportunity to talk about bootloaders a little bit.
- (I think they even understood the basics of segmentation in the end.)
The 8086 / 16-bit real-mode DOS is a great platform to explain a lot of the fundamentals without having to deal with OS semantics or executable file formats.
Now that was a lot of fun. š„³ Itās very rare that we do something like this, sadly. I love doing this kind of low-level stuff.
Okay, hereās a thing I like about Rust: Returning things as Option and error handling. (Or the more complex Result, but itās easier to explain with Option.)
fn mydiv(num: f64, denom: f64) -> Option<f64> {
// (Letās ignore precision issues for a second.)
if denom == 0.0 {
return None;
} else {
return Some(num / denom);
}
}
fn main() {
// Explicit, verbose version:
let num: f64 = 123.0;
let denom: f64 = 456.0;
let wrapped_res = mydiv(num, denom);
if wrapped_res.is_some() {
println!("Unwrapped result: {}", wrapped_res.unwrap());
}
// Shorter version using "if let":
if let Some(res) = mydiv(123.0, 456.0) {
println!("Hereās a result: {}", res);
}
if let Some(res) = mydiv(123.0, 0.0) {
println!("Huh, we divided by zero? This never happens. {}", res);
}
}
You canāt divide by zero, so the function returns an āerrorā in that case. (Option isnāt really used for errors, IIUC, but the basic idea is the same for Result.)
Option is an enum. It can have the value Some or None. In the case of Some, you can attach additional data to the enum. In this case, we are attaching a floating point value.
The caller then has to decide: Is the value None or Some? Did the function succeed or not? If it is Some, the caller can do .unwrap() on this enum to get the inner value (the floating point value). If you do .unwrap() on a None value, the program will panic and die.
The if let version using destructuring is much shorter and, once you got used to it, actually quite nice.
Now the trick is that you must somehow handle these two cases. You must either call something like .unwrap() or do destructuring or something, otherwise you canāt access the attached value at all. As I understand it, it is impossible to just completely ignore error cases. And the compiler enforces it.
(In case of Result, the compiler would warn you if you ignore the return value entirely. So something like doing write() and then ignoring the return value would be caught as well.)
@prologic@twtxt.net Iām trying to call some libc functions (because the Rust stdlib does not have an equivalent for getpeername(), for example, so I donāt have a choice), so I have to do some FFI stuff and deal with raw pointers and all that, which is very gnarly in Rust ā because youāre not supposed to do this. Things like that are trivial in C or even Assembler, but I have not yet understood what Rust does under the hood. How and when does it allocate or free memory ⦠is the pointer that I get even still valid by the time I do the libc call? Stuff like that.
I hope that I eventually learn this over time ⦠but I get slapped in the face at every step. Itās very frustrating and Iām always this š¤ close to giving up (only to try again a year later).
Oh, yeah, yeah, I guess I could ājustā use some 3rd party library for this. socket2 gets mentioned a lot in this context. But I donāt want to. I literally need one getpeername() call during the lifetime of my program, I donāt even do the socket(), bind(), listen(), accept() dance, I already have a fully functional file descriptor. Using a library for that is total overkill and Iād rather do it myself. (And look at the version number: 0.5.10. The library is 6 years old but theyāre still saying: āNah, weāre not 1.0 yet, we reserve the right to make breaking changes with every new release.ā So many Rust libs are still unstable ā¦)
⦠and I could go on and on and on ⦠š¤£
fn sub(foo: &String) {
println!("We got this string: [{}]", foo);
}
fn main() {
// "Hello", 0x00, 0x00, "!"
let buf: [u8; 8] = [0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x00, 0x00, 0x21];
// Create a string from the byte array above, interpret as UTF-8, ignore decoding errors.
let lossy_unicode = String::from_utf8_lossy(&buf).to_string();
sub(&lossy_unicode);
}
Create a string from a byte array, but the result isnāt a string, itās a cow š®, so you need another to_string() to convert your āstringā into a string.
- https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf8_lossy
- https://doc.rust-lang.org/std/borrow/enum.Cow.html
I still have a lot to learn.
(into_owned() instead of to_string() also works and makes more sense to me, itās just that the compiler suggested to_string() first, which led to this funny example.)
@lyse@lyse.isobeef.org Rust is so different and, at the same time, so complex ā itās not far fetched to assume that I simply donāt understand whatās going on here. The docs appear to be clear, but alas ⦠is it a bugs in the docs? Is it a lack of experience on my part? Who knows.
By the way, looks like there was a bit of a discussion regarding that name:
So I was using this function in Rust:
https://doc.rust-lang.org/std/path/struct.Path.html#method.display
Note the little 1.0.0 in the top right corner, which means that this function has been āstable since Rust version 1.0.0ā. Weāre at 1.87 now, so weāre good.
Then I compiled my program on OpenBSD with Rust 1.86, i.e. just one version behind, but well ahead of 1.0.0.
The compiler said that I was using an unstable library feature.
Turns out, that function internally uses this:
https://doc.rust-lang.org/std/ffi/struct.OsStr.html#method.display
And that is only available since Rust 1.87.
How was I supposed to know this? š¤Øš«©
@prologic@twtxt.net Yeah, itās difficult, you often donāt get what youād expect. They also make heavy use of 3rd party libraries. IIUC, for random numbers, they refer to this library. Iāve read many times that the Rust stdlib is intentionally minimalistic (to make it easier to maintain and port and all that).
Iām struggling with this, using 3rd party libs for so many things isnāt really my cup of tea. Iāll probably make my own tiny little āstandard libraryā. Itās silly, but I donāt see any other options. š¤·
Documentation done right: A developerās guide
Learn why and how you should write docs for your project with the DiƔtaxis framework.
The post Documentation done right: A developerās guide appeared first on The GitHub Blog. ā Read more
@movq@www.uninformativ.de @kat@yarn.girlonthemoon.xyz @quark@ferengi.one In 2014 one person created protocol ii. Later it forked in IDEC. Why i said this? Because itās simple āfederatedā forum-like protocol where from your station fetch another every 5-10 minutes. Stations has topic-based channels like idec.talks, linux.16, haiku.os, zx.spectrum. In short itās FIDO but.. more modern? Documentation: https://github.com/idec-net/new-docs (mostly Russian, but you can use translator, also protocol already translated to english)
Confession:
Iāve never found microblogging like twtxt or the Fediverse or any other āmodernā social media to be truly fulfilling/satisfying.
The reason is that it is focused so much on people. You follow this or that person, everybody spends time making a nice profile page, the posts are all very āego-centricā. Seriously, it feels like everybody is on an ego-trip all the time (this is much worse on the Fediverse, not so much here on twtxt).
I miss the days of topic-based forums/groups. A Linux forum here, a forum about programming there, another one about a certain game. Stuff like that. That was really great ā and it didnāt even suffer from the need to federate.
Sadly, most of these forums are dead now. Especially the nerds spend a lot of time on the Fediverse now and have abandoned forums almost completely.
On Mastodon, you can follow hashtags, which somewhat emulates a topic-based experience. But itās not that great and the protocol isnāt meant to be used that way (just read the snac2 docs on this issue). And the concept of ālikesā has eliminated lots of the actual user interaction. ā¹ļø
Nobody writes emails by hand using RFC 5322 anymore, nor do we manually send them through telnet and SMTP commands. The days of crafting emails in raw format and dialing into servers are long gone. Modern email clients and services handle it all seamlessly in the background, making email easier than ever to send and receiveāwithout needing to understand the protocols or formats behind it! #Email #SMTP #RFC #Automation
Another war story: the hardest bug I ever debugged
I recently stumbled on Jacob Voytkoās Google Docs bug story and it reminded me of the weirdest bug I ever chased.
It started with a user reporting their webcam was rotated by 90° ā but only sometimes. This turned into a wild hunt across browsers, OS quirks, WebRTC, and even HTTP redirects.
@kat@yarn.girlonthemoon.xyz I skimmed through the gamja docs and they say you need an āIRC WebSocket serverā ā no idea what that is. Does gamja not speak IRC directly but essentially āIRC over HTTPā? Curious. š¤
7k words of docs on deploying a livejournal folk. you absolutely want to read 7 thousand words of me forcing dreamwidth into production shape in docker https://stash.4-walls.net/selfhostdw/
The Firefox āTerms of Useā Backlash Threatens to Destroy Whatās Left of Mozilla
Mozilla accidentally shares internal Google Doc with Lunduke (for one minute). ā Read more