Unicode bugs #15

epitron · 2013-04-06T14:10:49Z

I just tested out editing UTF-8 in coolline, and it doesn't seem to work properly.

@pos appears to be counting bytes, not chars, which puts the cursor way off in space. Predictably, backspace removes a byte at a time. I assume every other editing operation will exhibit this behaviour. :)

Here's a good piece of Unicode for testing: ┻━┻ ︵ヽ(`Д´)ﾉ︵ ┻━┻
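To make the symptom concrete, here is a minimal sketch in plain Ruby (no coolline involved) of what byte-based counting does to a single multibyte character. The variable names are made up for illustration, and forcing the string to binary is only an assumption about how the line might end up being treated, not a claim about coolline's internals.

```ruby
# One character, "︵" (U+FE35), which is three bytes in UTF-8.
s = "\uFE35"

chars = s.size       # counted in characters while the string is tagged UTF-8
bytes = s.bytesize   # counted in bytes regardless of encoding

# Assumption for illustration: the same bytes, but tagged as binary.
raw = s.dup.force_encoding(Encoding::BINARY)
raw_len = raw.size   # now every byte counts as one "character"

# A backspace implemented as "drop the last character" removes one byte here,
# leaving a truncated sequence that is no longer valid UTF-8.
after_backspace = raw.chop
still_valid = after_backspace.dup.force_encoding(Encoding::UTF_8).valid_encoding?

p [chars, bytes, raw_len, after_backspace.bytesize, still_valid]
```

If a cursor position is derived from raw_len rather than chars, it ends up two positions too far for every three-byte character on the line.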

Mon-Ouie · 2013-04-06T22:06:40Z

I'm not entirely sure, because it does work with certain characters (for example été, π, λ). I'm not sure what makes your string different.

All the indices should be in characters provided Ruby knows the proper encoding of the string, since string manipulation functions are character-based as of Ruby 1.9.
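As a quick illustration of that point, the sketch below indexes the same bytes twice: once tagged as UTF-8 and once tagged as binary. It uses only core String methods. The sample text is the "été π λ" example from above, written with escapes so that the source file's own encoding doesn't matter.

```ruby
# "été π λ" as a UTF-8 string.
line = "\u00E9t\u00E9 \u03C0 \u03BB"

char_count = line.size             # characters
byte_count = line.bytesize         # bytes
fifth_char = line[4]               # character-based indexing: "π"
lambda_pos = line.index("\u03BB")  # character offset of "λ"

# Same bytes, but Ruby no longer knows they are UTF-8.
binary = line.dup.force_encoding(Encoding::BINARY)
binary_count = binary.size         # now equal to the byte count
fifth_byte   = binary[4]           # a lone byte from the middle of the second "é"

p [char_count, byte_count, fifth_char, lambda_pos, binary_count, fifth_byte.bytes]
```

So the indices are only in characters for as long as the string keeps its UTF-8 tag. Once it is tagged binary, the same calls return byte offsets.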

In your case, the problematic characters seem to be ︵ヽﾉ︵.

epitron · 2013-04-07T02:32:52Z

Hmm! Okay, thanks for testing. I've narrowed the problem down -- for some reason, when I run a script as an executable using #!/usr/bin/env ruby, default_line="" becomes ASCII-8BIT encoded. If I run ruby scriptname, it uses UTF-8.

This is a bit weird -- I'm not really sure whose fault this is. :)

epitron · 2013-04-07T02:50:37Z

Wait.. OMG, this is so weird.

Okay, so, it has nothing to do with the script being executed like a binary.

When I first run readline, the encoding is UTF-8. When I paste " ︵ヽﾉ︵", the encoding becomes ASCII-8BIT.

Something stinks here. :)

Here's my test script (Alt-E prints encoding/length):

#!/usr/bin/env ruby
require 'coolline'

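cool = Coolline.new do |c|
  # "\ee" is ESC followed by "e", which is what most terminals send for Alt-E.
  c.bind "\ee" do |c2|
    # Print the line's length in characters and the encoding Ruby has tagged it with.
    p [c2.line.size, c2.line.encoding]
  end
end

cool.readline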

Mon-Ouie · 2013-04-07T07:56:17Z

I suspect the problems happen at insertion time. For example, maybe the character doesn't get inserted in one go, and when we insert part of it, the string becomes invalid as UTF-8 and the encoding gets changed.
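Here is one way that could happen, sketched in plain Ruby. This is only a guess at the mechanism and is not confirmed to be what coolline does: it assumes the pasted bytes arrive tagged as binary (for example from a raw read) and get concatenated onto an empty or ASCII-only UTF-8 line. The names line and pasted are hypothetical.

```ruby
# The line buffer starts out empty and tagged UTF-8.
line = "".dup.force_encoding(Encoding::UTF_8)

# Assumption: input bytes for "︵" (U+FE35) arrive tagged as binary.
pasted = "\uFE35".dup.force_encoding(Encoding::BINARY)

# When one side is empty or ASCII-only, Ruby lets the other side's encoding win,
# so the result silently becomes binary instead of raising an error.
combined = line + pasted
combined_info = [combined.size, combined.encoding]

# Re-tagging the input as UTF-8 before inserting keeps the line in UTF-8.
fixed = line + pasted.dup.force_encoding(Encoding::UTF_8)
fixed_info = [fixed.size, fixed.encoding]

p combined_info, fixed_info
```

That would also fit the observation that the encoding is UTF-8 at first and only flips once non-ASCII text is pasted in.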

Oddly enough, here, after pasting the same string, I get the right position and UTF-8 as an encoding, editing works, but the cursor is definitely not rendered at the right position (it appears one line below, one character to the left).

epitron · 2013-04-07T10:13:45Z

Oh man, that's weird. Now I'm getting your behaviour. Everything stays UTF-8, but I get new lines.

epitron · 2013-04-07T10:14:50Z

It only happens with double-wide characters, it seems. ﾉ is fine, ︵ prints a new line.

epitron · 2013-04-07T19:21:43Z

After poking around with ANSI cursor positioning and double-wide UTF-8 characters, it appears that they actually take up 2 columns on the display.

For example, if you print "ab︵c", then position the cursor on the screen using ANSI codes, the column of each character is as follows:

a = 1
b = 2
︵ = 3-4
c = 5

I'm still stumped as to why it's adding a linefeed.
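For reference, the column layout above can be reproduced with a rough sketch like the following. The WIDE_RANGES table and the helper names display_width and column_spans are made up for illustration. The table is a hand-picked approximation of the East Asian wide ranges, not a complete or authoritative width table, and real terminals may disagree on some characters.

```ruby
# Approximate code point ranges that terminals usually draw two columns wide.
# This is an illustrative subset, not the full Unicode East Asian Width data.
WIDE_RANGES = [
  0x1100..0x115F,  # Hangul Jamo
  0x2E80..0xA4CF,  # CJK radicals through Yi (includes Katakana, e.g. U+30FD)
  0xAC00..0xD7A3,  # Hangul syllables
  0xF900..0xFAFF,  # CJK compatibility ideographs
  0xFE30..0xFE4F,  # CJK compatibility forms (includes U+FE35 "︵")
  0xFF00..0xFF60,  # fullwidth forms
  0xFFE0..0xFFE6   # fullwidth signs
].freeze

# Number of terminal columns a single character is assumed to occupy.
def display_width(char)
  cp = char.ord
  WIDE_RANGES.any? { |range| range.cover?(cp) } ? 2 : 1
end

# Returns [char, first_column, last_column] for each character, 1-based.
def column_spans(str)
  col = 1
  str.each_char.map do |ch|
    width = display_width(ch)
    span = [ch, col, col + width - 1]
    col += width
    span
  end
end

# "ab︵c", with the wide character written as an escape.
column_spans("ab\uFE35c").each do |ch, first, last|
  puts(first == last ? "#{ch} = #{first}" : "#{ch} = #{first}-#{last}")
end
```

The upshot is that the cursor column has to be computed from the sum of display widths of the characters before it, not from the character count. With this table, ﾉ (U+FF89, a halfwidth form) counts as one column and ︵ as two, which matches which characters misbehave.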
