Merge pull request #27 from runk/typescript
BREAKING CHANGE: Repo overhaul

- Deprecation of callbacks in favour of promises
- Deprecation of `detectAll`, `detectFileAll` and `detectFileAllSync` - use `analyse` fn instead.
- Typescript typings now included as part of distribution
- Modules support
- Travis CI org => com migration
- Lazy loading of `fs` module to enable usage in browser
Dmitry Shirokov committed Mar 31, 2020
2 parents c06f4dc + 64f6ad2 commit a6fca62
Showing 75 changed files with 12,997 additions and 4,117 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
testing.js
.DS_Store
node_modules
coverage
npm-debug.log
lib
3 changes: 3 additions & 0 deletions .prettierrc.json
@@ -0,0 +1,3 @@
{
"singleQuote": true
}
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
Copyright (C) 2019 Dmitry Shirokov
Copyright (C) 2020 Dmitry Shirokov

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
98 changes: 53 additions & 45 deletions README.md
@@ -1,6 +1,4 @@

chardet [![Build Status](https://travis-ci.org/runk/node-chardet.png)](https://travis-ci.org/runk/node-chardet)
=====
# chardet [![Build Status](https://travis-ci.org/runk/node-chardet.png)](https://travis-ci.org/runk/node-chardet)

Chardet is a character encoding detection module for Node.js, written in pure JavaScript.
The module is based on the ICU project http://site.icu-project.org/, which uses character
@@ -15,27 +13,31 @@ npm i chardet
## Usage

To return the encoding with the highest confidence:

```javascript
var chardet = require('chardet');
const chardet = require('chardet');

chardet.detect(Buffer.from('hello there!'));
// or
chardet.detectFile('/path/to/file', function(err, encoding) {});
chardet.detectFile('/path/to/file').then(encoding => console.log(encoding));
// or
chardet.detectFileSync('/path/to/file');
```
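
For callers that still depend on the old callback style, one possible bridge is Node's built-in `util.callbackify`. This is a sketch, not part of chardet: `detectFile` below is a stand-in that always resolves to `'UTF-8'` so the snippet runs without the package installed, and `detectFileLegacy` is a hypothetical name.

```javascript
const { callbackify } = require('util');

// Stand-in for the promise-based chardet.detectFile (assumption for this
// sketch only); swap in the real function in practice.
const detectFile = async (filepath) => 'UTF-8';

// Wrap the promise-returning function so legacy callers can keep
// the (err, encoding) callback signature.
const detectFileLegacy = callbackify(detectFile);

detectFileLegacy('/path/to/file', (err, encoding) => {
  if (err) throw err;
  console.log(encoding);
});
```

New code is probably better off using the promise directly; a wrapper like this mainly helps when migrating incrementally.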

To return the full list of possible encodings, use the `analyse` method:

To return the full list of possible encodings:
```javascript
var chardet = require('chardet');
chardet.detectAll(Buffer.from('hello there!'));
// or
chardet.detectFileAll('/path/to/file', function(err, encoding) {});
// or
chardet.detectFileAllSync('/path/to/file');
const chardet = require('chardet');
chardet.analyse(Buffer.from('hello there!'));
```

//Returned value is an array of objects sorted by confidence value in descending order
//e.g. [{ confidence: 90, name: 'UTF-8'}, {confidence: 20, name: 'windows-1252', lang: 'fr'}]
The returned value is an array of objects sorted by confidence value in descending order:

```javascript
[
{ confidence: 90, name: 'UTF-8' },
{ confidence: 20, name: 'windows-1252', lang: 'fr' }
];
```
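
A result in this shape can be post-processed by the caller, for example to accept a match only above some confidence threshold. The helper below is a minimal sketch that works on plain arrays shaped like the example above; `pickEncoding`, `minConfidence` and `fallback` are hypothetical names, not part of chardet.

```javascript
// Hypothetical helper: returns the name of the highest-confidence match,
// or `fallback` when no match reaches `minConfidence`.
function pickEncoding(matches, minConfidence = 50, fallback = 'UTF-8') {
  // Find the top entry explicitly rather than relying on input ordering.
  const best = matches.reduce(
    (top, m) => (top === null || m.confidence > top.confidence ? m : top),
    null
  );
  return best !== null && best.confidence >= minConfidence
    ? best.name
    : fallback;
}

// Sample data copied from the example result above.
const sample = [
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' },
];

console.log(pickEncoding(sample)); // UTF-8
```

The threshold of 50 is an arbitrary choice for illustration; a suitable value depends on the data being analysed.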

## Working with large data sets
@@ -44,38 +46,44 @@ Sometimes, when data set is huge and you want to optimize performance (in tradeof
you can sample only first N bytes of the buffer:

```javascript
chardet.detectFile('/path/to/file', { sampleSize: 32 }, function(err, encoding) {});
chardet
.detectFile('/path/to/file', { sampleSize: 32 })
.then(encoding => console.log(encoding));
```
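
For data that is already in memory, the same idea can be approximated by slicing the buffer before detection. The snippet below only illustrates what "first N bytes" means and is not a description of how the library implements `sampleSize`; `takeSample` is a hypothetical helper.

```javascript
// Hypothetical helper: returns a view over the first `sampleSize` bytes.
// Buffer#subarray does not copy, so this stays cheap for large inputs.
function takeSample(buffer, sampleSize) {
  return buffer.subarray(0, sampleSize);
}

const data = Buffer.from('hello there! '.repeat(1000));
const head = takeSample(data, 32);

console.log(head.length); // 32
```

The resulting `head` buffer could then be passed to `chardet.detect`. A smaller sample is faster to analyse but gives the detector less evidence, so accuracy may drop.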

## Supported Encodings:

* UTF-8
* UTF-16 LE
* UTF-16 BE
* UTF-32 LE
* UTF-32 BE
* ISO-2022-JP
* ISO-2022-KR
* ISO-2022-CN
* Shift-JIS
* Big5
* EUC-JP
* EUC-KR
* GB18030
* ISO-8859-1
* ISO-8859-2
* ISO-8859-5
* ISO-8859-6
* ISO-8859-7
* ISO-8859-8
* ISO-8859-9
* windows-1250
* windows-1251
* windows-1252
* windows-1253
* windows-1254
* windows-1255
* windows-1256
* KOI8-R

Currently only these encodings are supported, more will be added soon.
- UTF-8
- UTF-16 LE
- UTF-16 BE
- UTF-32 LE
- UTF-32 BE
- ISO-2022-JP
- ISO-2022-KR
- ISO-2022-CN
- Shift-JIS
- Big5
- EUC-JP
- EUC-KR
- GB18030
- ISO-8859-1
- ISO-8859-2
- ISO-8859-5
- ISO-8859-6
- ISO-8859-7
- ISO-8859-8
- ISO-8859-9
- windows-1250
- windows-1251
- windows-1252
- windows-1253
- windows-1254
- windows-1255
- windows-1256
- KOI8-R

Currently only these encodings are supported.

## Typescript?

Yes. Type definitions are included.
141 changes: 0 additions & 141 deletions encoding/iso2022.js

This file was deleted.
