Character encoding detection for Ruby using chardetng

careport/chardetng-ruby

CharDetNg

Character encoding detection for Ruby, built on the chardetng Rust crate.

Installation

Add to your application's Gemfile:

gem "char_det_ng", git: "https://github.com/careport/chardetng-ruby"

And install with:

bundle install

Usage

Create a detector, feed it data, and guess at the encoding:

require "char_det_ng"

# Instantiate an EncodingDetector
detector = CharDetNg::EncodingDetector.new

# Feed it data
detector << File.read("/path/to/some/text/file")
# => true

# Guess the encoding, along with a boolean indicating
# whether the result is more likely than other encodings
detector.guess_and_assess
# => ["UTF-8", true]

# Or, if you just want the encoding:
detector.guess
# => "UTF-8"
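Once you have a guess, a common next step is to transcode the raw bytes to UTF-8. The following is a minimal sketch using only Ruby's standard `Encoding` and `String` APIs. `transcode_to_utf8` is a hypothetical helper, not part of this gem, and it assumes that the name returned by the detector is one Ruby's `Encoding.find` recognizes. That holds for names such as "UTF-8" and "WINDOWS-1252", but it is not guaranteed for every name the detector can return.

```ruby
# Hypothetical helper (not part of char_det_ng): transcode raw bytes to UTF-8
# using an encoding name such as the one returned by detector.guess.
def transcode_to_utf8(bytes, encoding_name)
  # Encoding.find raises ArgumentError if Ruby does not know this name.
  source = Encoding.find(encoding_name)

  # Work on a copy so the caller's string is left untouched, label the bytes
  # with the guessed encoding, then convert. Bytes that cannot be converted
  # are replaced rather than raising.
  bytes.dup
       .force_encoding(source)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Usage (assumes the gem is installed):
#   require "char_det_ng"
#   data = File.binread("/path/to/some/text/file")
#   detector = CharDetNg::EncodingDetector.new
#   detector << data
#   text = transcode_to_utf8(data, detector.guess)
```

When the boolean from `guess_and_assess` is `false`, the guess is less certain, so you may prefer to fall back to a default encoding of your choosing instead of transcoding blindly.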

There are also simple APIs for dealing with entire files:

require "char_det_ng"

CharDetNg::EncodingDetector.guess_and_assess_file("/path/to/file")
# => ["WINDOWS-1252", true]

and IO objects:

File.open("/path/to/file", mode: "rb") do |f|
  CharDetNg::EncodingDetector.guess_and_assess_io(f)
end
# => ["WINDOWS-1252", true]

Both convenience methods read until EOF. To guess without reading the entire input, instantiate a detector object and feed it only as much data as you want it to see, as in the first example.
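Feeding a bounded prefix of a large file could look like the sketch below. `guess_from_prefix`, `max_bytes`, and `chunk_size` are hypothetical names chosen for illustration. The sketch relies only on the `<<` and `guess_and_assess` methods shown earlier, and it assumes the detector accepts several successive `<<` calls before guessing.

```ruby
# Hypothetical helper (not part of char_det_ng): feed at most max_bytes from
# io into detector, chunk_size bytes at a time, then return the guess.
# Assumes the detector can be fed repeatedly with << before guessing.
def guess_from_prefix(io, detector, max_bytes: 64 * 1024, chunk_size: 8 * 1024)
  remaining = max_bytes

  while remaining > 0
    # Never read past the byte budget.
    chunk = io.read([chunk_size, remaining].min)
    break if chunk.nil? || chunk.empty? # reached EOF before the budget ran out

    detector << chunk
    remaining -= chunk.bytesize
  end

  detector.guess_and_assess
end

# Usage (assumes the gem is installed):
#   require "char_det_ng"
#   File.open("/path/to/file", mode: "rb") do |f|
#     guess_from_prefix(f, CharDetNg::EncodingDetector.new)
#   end
```

A smaller byte budget means less I/O but gives the detector less evidence, so the right limit depends on your data.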
