I recently got an email with this byte sequence:
\xf0\x9f\x8e\x81\xf0\x9f\x95\xaf\xef\xb8\x8f
That’s U+1F381, U+1F56F, U+FE0F. The last one is a “variation selector”:
https://unicodeplus.com/U+FE0F
My toolkit renders this incorrectly – and so do tmux and GNU screen.
Unicode ain’t easy. 🥴
@bender@twtxt.net I’m already using it for tracktivity (meant for tracking activities and events, like weather, food consumption, stuff like that), which is basically a somewhat-fancy CSV editor:
https://movq.de/v/f26eb836ee/s.png
I have a couple of other projects where I could use it, because they are plain curses at the moment. Like, one of them has an “edit box”, but you can’t enter Unicode, because it was too complicated. That would benefit from the framework.
Either way, it’s the most satisfying project in a long time and I’m learning a ton of stuff.
tcell.Key constants and typing different key combinations in the terminal to see the generated tcell.EventKeys in the debug log. Until I pressed Ctrl+Alt+Backspace… :-D Yep, suddenly there went my X…
@movq@www.uninformativ.de Yeah, I know that terminals are super weird and messy. In both the KDE Konsole (identifying itself as TERM=xterm-256color) and xterm (TERM=xterm) it just works flawlessly. My urxvt (TERM=rxvt-unicode-256color) just doesn’t. I also tried messing with TERM in urxvt, but no luck so far.
More widget system progress:
https://movq.de/v/87e2bce376/vid-1767467193.mp4
I like the oldschool shadow effect. 😅 Not sure if I’ll keep it, but it’s neat.
The menu bar is still fake.
Had to spend quite a bit of time optimizing the rendering today. This can get really slow really quickly.
Unicode is Pain.
I might be able to start porting my first program (currently uses urwid) soon. 🤔
Why have these Unicode smilies never caught on, I wonder? 🤪
Well, you girls and guys are making cool things, and I have some progress to show as well. 😅
https://movq.de/v/c0408a80b1/movwin.mp4
Scrolling widgets appears to work now. This is (mostly) Unicode-aware: Note how emojis like “😅” are double-width “characters” and the widget system knows this. It doesn’t try to place a “😅” in a location where there’s only one cell available.
Same goes for that weird “ä” thingie, which is actually “a” followed by U+0308 (a combining diacritic). Python itself thinks of this as two “characters”, but they only occupy one cell on the screen. (Assuming your terminal supports this …)
This library does the heavy Unicode lifting: https://github.com/jquast/wcwidth (Take a look at its implementation to learn how horrible Unicode and human languages are.)
The program itself looks like this, it’s a proper widget hierarchy:
https://movq.de/v/1d155106e2/s.png
(There is no input handling yet, hence some things are hardwired for the moment.)
@movq@www.uninformativ.de I see. Yeah, all the Unicode stuff certainly doesn’t help here, that’s for sure.
Maybe “speedcurses” could be a name. Or just select any Palatinate curse. ;-)
@lyse@lyse.isobeef.org I’m toying with the idea of making a widget/window system on top of Python’s ncurses. I’ve never really been happy with the existing ones (like urwid, textual, pytermgui, …). I mean, they’re not horrible, it’s mostly the performance that’s bugging me – I don’t want to wait an entire second for a terminal program to start up.
Not sure if I’ll actually see it through, though. Unicode makes this kind of thing extremely hard. 🫤
Awk to take lines from Plan 9’s /lib/unicode and prepend the actual glyph and a tab: awk ‘{cmd=sprintf(“unicode %s”, $1); cmd | getline c; printf(“%s %s\n”, c, $0)}’
Unicode characters in iskeyword ⌘ Read more
Tag proposal: typesetting
For stories relating to: how text is laid out an
Discussion: should it include e.g. stories about Pango? Unicode (partially relevant to typesetting/typography I suppose)? These are a bit further from “typsetting” but is relevant to typography and encoding? Should it include stories about markup languages?
See searches for e.g. [LaTeX](https://lobste.rs/search … ⌘ Read more
How to display non-printable unicode characters? ⌘ Read more
How to display non-printable unicode characters? ⌘ Read more
@lyse@lyse.isobeef.org The underlines are a bit much, yes. It appears to be related to my font (Helvetica) … Maybe they do some Unicode trickery these days, I don’t know. 🫤
fn sub(foo: &String) {
println!("We got this string: [{}]", foo);
}
fn main() {
// "Hello", 0x00, 0x00, "!"
let buf: [u8; 8] = [0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x00, 0x00, 0x21];
// Create a string from the byte array above, interpret as UTF-8, ignore decoding errors.
let lossy_unicode = String::from_utf8_lossy(&buf).to_string();
sub(&lossy_unicode);
}
Create a string from a byte array, but the result isn’t a string, it’s a cow 🐮, so you need another to_string() to convert your “string” into a string.
- https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf8_lossy
- https://doc.rust-lang.org/std/borrow/enum.Cow.html
I still have a lot to learn.
(into_owned() instead of to_string() also works and makes more sense to me, it’s just that the compiler suggested to_string() first, which led to this funny example.)
(Where is there no bass emoji in Unicode? Pah.)
Trinity Desktop Environment R14.1.4 released
The Trinity Desktop Environment, the modern-day continuation of the KDE 3.x series, has released version R14.1.4. This maintenance release brings new vector wallpapers and colour schemes, support for Unicode surrogate characters and planes above zero (for emoji, among other things), tabs in kpdf, transparency and other new visual effects for Dekorator, and much more. TDE R14.1.4 is already available for a variety of Linux distributions, and c … ⌘ Read more
Understanding surrogate pairs: why some Windows filenames can’t be read
Windows was an early adopter of Unicode, and its file APIs use UTF‑16 internally since Windows 2000-used to be UCS-2 in Windows 95 era, when Unicode standard was only a draft on paper, but that’s another topic. Using UTF-16 means that filenames, text strings, and other data are stored as sequences of 16‑bit units. For Windows, a properly formed surrogate pair is perfectly acceptable. However … ⌘ Read more
@bender@twtxt.net @prologic@twtxt.net I’m not exactly asking yarnd to change. If you are okay with the way it displayed my twts, then by all means, leave it as is. I hope you won’t mind if I continue to write things like 1/4 to mean “first out of four”.
What has text/markdown got to do with this? I don’t think Markdown says anything about replacing 1/4 with ¼, or other similar transformations. It’s not needed, because ¼ is already a unicode character that can simply be directly inserted into the text file.
What’s wrong with my original suggestion of doing the transformation before the text hits the twtxt.txt file? @prologic@twtxt.net, I think it would achieve what you are trying to achieve with this content-type thing: if someone writes 1/4 on a yarnd instance or any other client that wants to do this, it would get transformed, and other clients simply wouldn’t do the transformation. Every client that supports displaying unicode characters, including Jenny, would then display ¼ as ¼.
Alternatively, if you prefer yarnd to pretty-print all twts nicely, even ones from simpler clients, that’s fine too and you don’t need to change anything. My 1/4 -> ¼ thing is nothing more than a minor irritation which probably isn’t worth overthinking.
Unicode doesn’t distinguish between a dollar sign with one and a dollar sign with two strokes, which makes me sad.
Account Problems
⌘ Read more
[Emacs] 替數學符號設定專用字型 ⌘ Read more
Weird Unicode Math Symbols
⌘ Read more
someday i will descend upon the unicode consortium and add sub/superscript version of the whole latin alphabet
@lyse@lyse.isobeef.org What the heck? no emoji? do you even Unicode!
@prologic@twtxt.net Yeah like normally I’m just a little annoyed and just say “whatever” and shrug it off, but come on I am searching for emojis here. Do you really need to harvest my user data for what is essentially a fuzzy search in the Unicode table?
@prologic@twtxt.net lol. just testing some Unicode.
On the blog: Where Have All the Emoji Gone? https://john.colagioia.net/blog/2021/09/29/emoji.html #programming #techtips #unicode #blog
https://metacpan.org/release/WOLFSAGE/perl-5.35.4/changes#Unicode-14.0-is-supported Perl 5.35.4 版之後所對應的 Unicode 版本已經推進到 14.0.0 了。
@prologic@twtxt.net should we enable all unicode glyphs for tags? https://txt.sour.is/conv/55yrura
I wrote a ‘banner’-like program for Plan 9 (and p9p) that uses the Unicode box drawing characters: http://txtpunk.com/banner/index.html
https://www.materialui.co/unicode-characters design unicode web
huh, txtnish seems to have problems with linebreaks & unicode;.
Accented and other unicode characters in groff/troff ⌘
Teletext graphics characters among those added to Unicode – Teletext Art http://teletextart.co.uk/teletext-graphics-characters-among-those-added-to-unicode
Because of the use of ‘rune’ to refer to unicode codepoints in go, a fulthark transliteration program might have somewhat confusing source…
A Spectre is Haunting Unicode https://www.dampfkraft.com/ghost-characters.html
unum - Interconvert numbers, Unicode, and HTML/XHTML characters http://www.fourmilab.ch/webtools/unum/
Does unicode also work? 💚☎
All fonts I tried are either ugly or the unicode glyphs are so small that they become unreadable. #fonts
The pretty format is very similar to twtxt without the unicode glyphs and the relative date.