Try it out! Live at https://noguera.dev/unicode-string-shortener.
The idea of this project is to compress text in terms of bytes or characters used while maintaining its human readability. Using this program you can enter more text than intended into a limited-size form field.
For the purposes of this project, anything that looks close enough to a Latin letter to be readable is considered acceptable, even if the character has an entirely different meaning.
Here is an example of program functionality.
Enter string to shorten: aether
Input: aether (6)
Shortest in bytes: æther (6)
Shortest in characters used: æᵺer (4)
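The two counts in the example measure different things, which is why the two results differ. The sketch below is not part of the library (measure is a hypothetical helper); it assumes that "bytes" means UTF-8 encoded length and "characters" means Unicode scalar values, and applies both measures to the three strings shown above.

```rust
/// Returns (UTF-8 byte length, number of Unicode scalar values) for a string.
/// Hypothetical helper for illustration; not part of unishorten.
fn measure(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    for s in ["aether", "æther", "æᵺer"] {
        let (bytes, chars) = measure(s);
        println!("{}: {} bytes, {} chars", s, bytes, chars);
    }
}
```

Under those assumptions, æᵺer uses the fewest characters but ᵺ takes three bytes in UTF-8, so it is longer in bytes than æther.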
A human-readable list of the shortenings used is available in map.tsv. The columns in that file are: Unicode codepoint, Unicode character, ASCII equivalent strings. A single Unicode character can have more than one ASCII equivalent string; a column is added for each additional one. However, no two Unicode characters can translate to the same string (the program checks for this and will error).
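As a rough illustration of that column layout and the uniqueness check, here is a minimal sketch using only the standard library. It is not the project's actual loader: parse_map is a hypothetical name, the "U+00E6" codepoint notation in the sample is an assumption, and the sketch assumes the file has no header row.

```rust
use std::collections::HashMap;

/// Parses map.tsv-style text into a lookup from ASCII string to Unicode character.
/// Returns an error if two different characters claim the same ASCII string.
/// Hypothetical sketch; the real program may parse the file differently.
fn parse_map(tsv: &str) -> Result<HashMap<String, char>, String> {
    let mut map = HashMap::new();
    for (idx, line) in tsv.lines().enumerate() {
        if line.trim().is_empty() {
            continue;
        }
        let mut cols = line.split('\t');
        // Column 1: codepoint. Unused here because column 2 already holds the character.
        let _codepoint = cols.next();
        // Column 2: the Unicode character itself.
        let ch = cols
            .next()
            .and_then(|c| c.chars().next())
            .ok_or_else(|| format!("line {}: missing character column", idx + 1))?;
        // Columns 3 and onward: one ASCII equivalent string per column.
        for ascii in cols.filter(|s| !s.is_empty()) {
            if let Some(prev) = map.insert(ascii.to_string(), ch) {
                if prev != ch {
                    return Err(format!(
                        "line {}: {:?} is already mapped to {}",
                        idx + 1,
                        ascii,
                        prev
                    ));
                }
            }
        }
    }
    Ok(map)
}

fn main() {
    // Sample rows in the assumed format: codepoint, character, ASCII equivalent.
    let sample = "U+00E6\tæ\tae\nU+1D7A\tᵺ\tth\n";
    match parse_map(sample) {
        Ok(map) => println!("loaded {} mappings", map.len()),
        Err(e) => eprintln!("invalid map: {}", e),
    }
}
```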
To update the computer-readable list in map.bincode, delete the map.bincode file and run the program. A new bincode file will be produced from map.tsv.
Available online at https://noguera.dev/unicode-string-shortener. The program is compiled to WebAssembly and runs entirely client-side in your browser.
The program member of this workspace compiles to an executable program, unishorten, which by default takes a single argument that is the string to shorten. Alternatively, unishorten -i will run in interactive mode and prompt for input.
Shortens ascii strings by substituting unicode characters that look like more than one ascii character
Usage: unishorten [OPTIONS] [input]
Arguments:
[input] string to shorten
Options:
-i, --interactive interactive mode
-h, --help Print help information
-V, --version Print version information
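The substitution described in the help text can be pictured as a small dynamic program. The sketch below is an assumption about one way to do it, not the algorithm unishorten actually uses: fewest_chars and SAMPLE are hypothetical names, and the two-entry table stands in for the real list in map.tsv.

```rust
/// Hypothetical sample mappings; the real list lives in map.tsv.
const SAMPLE: &[(&str, char)] = &[("ae", 'æ'), ("th", 'ᵺ')];

/// Rewrites `input` using the fewest characters, given (ASCII string, replacement) pairs.
/// Illustrative sketch only; not the library's implementation.
fn fewest_chars(input: &str, table: &[(&str, char)]) -> String {
    let n = input.len();
    // best[i] = (characters needed for input[i..], bytes consumed at i, replacement if any)
    let mut best: Vec<(usize, usize, Option<char>)> = vec![(0, 0, None); n + 1];
    for i in (0..n).rev() {
        if !input.is_char_boundary(i) {
            continue;
        }
        // Default choice: keep the original character at position i.
        let step = input[i..].chars().next().unwrap().len_utf8();
        best[i] = (best[i + step].0 + 1, step, None);
        // Try every mapping that matches at position i and keep the cheapest.
        for &(from, to) in table {
            if !from.is_empty() && input[i..].starts_with(from) {
                let cost = best[i + from.len()].0 + 1;
                if cost < best[i].0 {
                    best[i] = (cost, from.len(), Some(to));
                }
            }
        }
    }
    // Walk forward through the recorded choices to build the output.
    let mut out = String::new();
    let mut i = 0;
    while i < n {
        let (_, step, replacement) = best[i];
        match replacement {
            Some(c) => out.push(c),
            None => out.push_str(&input[i..i + step]),
        }
        i += step;
    }
    out
}

fn main() {
    println!("{}", fewest_chars("aether", SAMPLE));
}
```

Minimizing bytes instead would only change the cost of each step, for example charging to.len_utf8() for a replacement rather than 1.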
You can also import the program folder of this repo as a library. This can be done with the command:
cargo add --git https://github.com/michaelnoguera/unicode-string-shortener unishorten
After running the command, your Cargo.toml file should contain:
[dependencies]
unishorten = { git = "https://github.com/michaelnoguera/unicode-string-shortener", version = "0.1.0" }
Most users will want to instantiate a StringShortener rather than interacting with the various utility functions in the library.
use unishorten::StringShortener;
let shortener = StringShortener::new();
let out = shortener.shorten_by_chars(/* reference to input string */);
To customize the list of mappings, clone this repository, edit map.tsv, then delete map.bincode and run the command line program again. The bincode file will be regenerated from whatever the tsv file contains.
This assumes you have Rust installed.
- Clone this repo.
git clone https://github.com/michaelnoguera/unicode-string-shortener
- Run cargo build to install dependencies and compile. This will take a long time the first time you run it on this project.
- Run cargo run to run the program.
If you use VS Code and have Docker installed, you can clone this repo into a devcontainer that reproduces my build.
- Install the Dev Containers extension, if not already installed.
- Then paste this address into your browser's address bar (GitHub disables vscode:// links):
vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/michaelnoguera/unicode-string-shortener
If you don't feel like clicking my strange-looking links, you can alternatively open VS Code, open the Command Palette (Ctrl/Cmd + Shift + P), select "Dev Containers: Clone Repository in Container Volume", and specify this repo:
https://github.com/michaelnoguera/unicode-string-shortener
- Wait for the container to set itself up. cargo build will run automatically to configure dependencies.
- Run the program with cargo run.
To open this repository in Codespaces, use the button below.
Warning
Bad news: This could cost money.
Good news: You get 60 hours free per month (on the least powerful option), so if you are just poking around and want to check out this project for a minute or two, it will almost certainly be free.