
Unicode bugs #15

Open
epitron opened this issue Apr 6, 2013 · 7 comments

@epitron
Contributor

epitron commented Apr 6, 2013

I just tested out editing UTF-8 in coolline, and it doesn't seem to work properly.

@pos appears to be counting bytes, not characters, which puts the cursor way off in space. Predictably, backspace removes one byte at a time. I assume every editing operation will exhibit this behaviour. :)

Here's a good piece of Unicode for testing: ┻━┻ ︵ヽ(`Д´)ノ︵ ┻━┻
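For comparison, here is a minimal sketch of the behaviour one would expect, using part of the test string above. It assumes a hypothetical LineBuffer class (not Coolline's actual code) whose cursor position is a character index. On a correctly tagged UTF-8 string, Ruby's String#insert and String#slice! take character indices, so backspace removes a whole character rather than a single byte.

```ruby
# Hypothetical line buffer, for illustration only; not Coolline's internals.
# The cursor position is counted in characters, not bytes.
class LineBuffer
  attr_reader :line, :pos

  def initialize
    @line = String.new("", encoding: Encoding::UTF_8)
    @pos = 0
  end

  # Insert text at the cursor and advance by the number of characters added.
  def insert(text)
    @line.insert(@pos, text)
    @pos += text.size
  end

  # Delete the whole character before the cursor, however many bytes it uses.
  def backspace
    return if @pos.zero?
    @line.slice!(@pos - 1)
    @pos -= 1
  end
end

buf = LineBuffer.new
buf.insert("┻━┻ ︵")
buf.pos        # => 5 (characters; the line itself is 13 bytes)
buf.backspace
buf.line       # => "┻━┻ "
```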

@Mon-Ouie
Owner

Mon-Ouie commented Apr 6, 2013

I'm not entirely sure: it does work with certain characters (for example été, π, λ), so I don't know what's different about your string.

All the indices should be in characters, provided Ruby knows the proper encoding of the string, since string manipulation methods are character-based as of Ruby 1.9.
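A quick illustration of that point, using only core String methods: the same bytes count as 3 characters when the string is tagged UTF-8, but as 5 when it is tagged ASCII-8BIT, which is exactly the byte-wise behaviour reported in the first comment.

```ruby
s = "été"       # UTF-8 literal (the default source encoding since Ruby 2.0)
s.size          # => 3 characters
s.bytesize      # => 5 bytes, since each "é" is encoded as two bytes

raw = s.dup.force_encoding(Encoding::ASCII_8BIT)
raw.size        # => 5: every byte now counts as one "character"
raw[0]          # => "\xC3", the first half of "é"

# Re-tagging the same bytes as UTF-8 restores character-based indexing.
raw.force_encoding(Encoding::UTF_8).size  # => 3
```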

In your case, the problematic characters seem to be ︵ヽノ︵.

@epitron
Contributor Author

epitron commented Apr 7, 2013

Hmm! Okay, thanks for testing. I've narrowed the problem down: for some reason, when I run a script as an executable using #!/usr/bin/env ruby, default_line="" becomes US-ASCII8BIT encoded. If I run ruby scriptname, it uses UTF-8.

This is a bit weird -- I'm not really sure whose fault this is. :)

@epitron
Contributor Author

epitron commented Apr 7, 2013

Wait... OMG, this is so weird.

Okay, so, it has nothing to do with the script being executed like a binary.

When I first run readline, the encoding is UTF-8. When I paste " ︵ヽノ︵", the encoding becomes ASCII-8BIT.

Something stinks here. :)

Here's my test script (Alt-E prints the line's length and encoding):

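```ruby
#!/usr/bin/env ruby
require 'coolline'

cool = Coolline.new do |c|
  # Alt-E arrives as ESC followed by "e"; print the current line's
  # length in characters and its encoding.
  c.bind "\ee" do |c2|
    p [c2.line.size, c2.line.encoding]
  end
end

cool.readline
```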

@Mon-Ouie
Owner

Mon-Ouie commented Apr 7, 2013

I suspect the problems happen at insertion time. For example, maybe the character doesn't get inserted in one go, and when we insert part of it, the string becomes invalid as UTF-8 and the encoding gets changed.
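This hypothesis is consistent with how Ruby's String#<< chooses an encoding. Assuming the raw input arrives as ASCII-8BIT strings (an assumption about the input path, not something confirmed for Coolline), appending just the first byte of a multibyte character to an ASCII-only UTF-8 line silently re-tags the whole line as ASCII-8BIT. If the line already contained non-ASCII text, the same append would raise Encoding::CompatibilityError instead. The second half is a sketch of one possible fix, using a hypothetical Utf8Assembler class that holds bytes back until they form a complete, valid UTF-8 character.

```ruby
# 1. The suspected failure: inserting part of a character changes the encoding.
line = String.new("ab", encoding: Encoding::UTF_8)
line << "︵".b[0]   # first byte of U+FE35 ("\xEF"), tagged ASCII-8BIT
line.encoding       # => #<Encoding:ASCII-8BIT>

# 2. Hypothetical fix: buffer raw bytes until they decode as one UTF-8 character.
class Utf8Assembler
  MAX_BYTES = 4 # longest valid UTF-8 sequence

  def initialize
    @pending = String.new("", encoding: Encoding::ASCII_8BIT)
  end

  # Takes one byte (Integer 0..255). Returns a complete UTF-8 character,
  # or nil while more bytes are still needed.
  def feed(byte)
    @pending << byte
    candidate = @pending.dup.force_encoding(Encoding::UTF_8)
    if candidate.valid_encoding?
      @pending.clear
      return candidate
    end
    return nil if @pending.bytesize < MAX_BYTES

    # Four bytes and still invalid: discard them and emit U+FFFD.
    @pending.clear
    "\uFFFD"
  end
end

fixed = String.new("ab", encoding: Encoding::UTF_8)
assembler = Utf8Assembler.new
"︵c".b.each_byte do |byte|
  ch = assembler.feed(byte)
  fixed << ch if ch
end
fixed.encoding      # => #<Encoding:UTF-8>
fixed.size          # => 4
```

The assembler only ever appends complete UTF-8 characters, so the line keeps its UTF-8 tag. It deliberately keeps malformed-input handling crude; a real input loop would want to resynchronise sooner after a bad byte.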

Oddly enough, when I paste the same string here, I get the right position and UTF-8 as the encoding, and editing works, but the cursor is definitely not rendered in the right position (it appears one line below and one character to the left).

@epitron
Contributor Author

epitron commented Apr 7, 2013

Oh man, that's weird. Now I'm getting your behaviour: everything stays UTF-8, but I get extra line breaks.

@epitron
Contributor Author

epitron commented Apr 7, 2013

It only happens with double-wide characters, it seems. ノ is fine, but ︵ produces a line break.

@epitron
Contributor Author

epitron commented Apr 7, 2013

After poking around with ANSI cursor positioning and double-wide UTF-8 characters, it appears that they actually take up two columns on the display.

For example, if you print "ab︵c" and then position the cursor on the screen using ANSI codes, the column of each character is as follows:

a = 1
b = 2
︵ = 3-4
c = 5
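Those column numbers can be computed rather than assumed. The sketch below uses a deliberately simplified width table modeled on the wide ranges in Markus Kuhn's public-domain wcwidth(). It ignores combining and zero-width characters, and it treats East Asian Ambiguous characters such as the box-drawing ┻ as one column, which not every terminal does. The helper names char_width, column_spans and cursor_to are hypothetical. It reproduces the columns listed above and builds the ANSI Cursor Horizontal Absolute sequence (ESC [ n G, 1-based) for a given character position.

```ruby
# Simplified subset of the double-width ranges from Kuhn's wcwidth().
WIDE_RANGES = [
  0x1100..0x115F,   # Hangul Jamo initial consonants
  0x2E80..0xA4CF,   # CJK radicals ... Yi (U+303F is excluded below)
  0xAC00..0xD7A3,   # Hangul syllables
  0xF900..0xFAFF,   # CJK compatibility ideographs
  0xFE30..0xFE6F,   # CJK compatibility forms (includes ︵ U+FE35)
  0xFF00..0xFF60,   # Fullwidth forms
  0xFFE0..0xFFE6,   # Fullwidth signs
  0x20000..0x2FFFD, # CJK extension planes
  0x30000..0x3FFFD
].freeze

# Number of terminal columns a single character occupies (1 or 2 here).
def char_width(ch)
  cp = ch.ord
  return 1 if cp == 0x303F
  WIDE_RANGES.any? { |r| r.cover?(cp) } ? 2 : 1
end

# [char, first_column, last_column] for each character, 1-based like ANSI codes.
def column_spans(line)
  col = 1
  line.each_char.map do |ch|
    w = char_width(ch)
    span = [ch, col, col + w - 1]
    col += w
    span
  end
end

# Escape sequence that moves the cursor to just after the first `pos` characters.
def cursor_to(line, pos)
  col = 1 + line[0, pos].each_char.sum { |ch| char_width(ch) }
  "\e[#{col}G"
end

column_spans("ab︵c")
# => [["a", 1, 1], ["b", 2, 2], ["︵", 3, 4], ["c", 5, 5]]
cursor_to("ab︵c", 3)  # => "\e[5G", i.e. just before "c"
```

Counting characters instead of columns would place the cursor one column too far left for every wide character before it.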

I'm still stumped as to why it's adding a linefeed.


2 participants