Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Nodejieba | 2,785 | 351 | 98 | 4 months ago | 62 | January 09, 2022 | 83 | mit | TypeScript | |
Node.js version of the "Jieba" Chinese word segmenter | ||||||||||
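NodeJieba
is the Node.js version of the "Jieba" Chinese word segmenter.
The underlying segmentation algorithms are provided by CppJieba,
making it a Node.js Chinese word segmentation component that combines high performance with ease of use.
For implementation details, see the following blog posts: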
npm install nodejieba
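Because the default registry can be slow and access to GitHub can be unreliable, you can install from a mirror in China with the following command: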
npm install nodejieba --registry=https://registry.npmmirror.com --nodejieba_binary_host_mirror=https://registry.npmmirror.com/-/binary/nodejieba/
import { cut } from "nodejieba";
const result = cut("南京市长江大桥");
console.log(result);
//["南京市","长江大桥"]
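For more detailed usage, see the test cases.
If you do not call the dictionary-loading function yourself, the default dictionaries are loaded automatically the first time you call cut
or another segmentation function.
To trigger dictionary loading explicitly, call the following function.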
import { load } from "nodejieba";
load();
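The call above loads all of the default dictionaries.
To load your own dictionary instead of a default one, pass an options argument.
For example, to load your own user dictionary: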
import { load } from "nodejieba";
load({
userDict: "./test/testdata/userdict.utf8",
});
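Every option of the dictionary-loading function load is optional; any option that is omitted is filled in with its default value. The code above is therefore equivalent to the code below.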
import {
DEFAULT_DICT,
DEFAULT_HMM_DICT,
DEFAULT_IDF_DICT,
DEFAULT_STOP_WORD_DICT,
load,
} from "nodejieba";
load({
dict: DEFAULT_DICT,
hmmDict: DEFAULT_HMM_DICT,
userDict: "./test/testdata/userdict.utf8",
idfDict: DEFAULT_IDF_DICT,
stopWordDict: DEFAULT_STOP_WORD_DICT,
});
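The two behaviors described above (default-filling of omitted options, and automatic loading on first use) can be sketched as follows. This is only an illustration of the pattern, not NodeJieba's actual implementation: `LoadOptions`, `resolveOptions`, `loadSketch`, `cutSketch`, and the placeholder default strings are all hypothetical names introduced here.

```typescript
// Hypothetical option shape mirroring the five dictionary keys shown above.
interface LoadOptions {
  dict?: string;
  hmmDict?: string;
  userDict?: string;
  idfDict?: string;
  stopWordDict?: string;
}

// Placeholder defaults, assumed for illustration only. The package exposes
// its real defaults through the DEFAULT_* constants imported above.
const SKETCH_DEFAULTS: Required<LoadOptions> = {
  dict: "<default dict>",
  hmmDict: "<default hmm dict>",
  userDict: "<default user dict>",
  idfDict: "<default idf dict>",
  stopWordDict: "<default stop word dict>",
};

// Fill every omitted option with its default; caller-supplied values win.
function resolveOptions(options: LoadOptions = {}): Required<LoadOptions> {
  const resolved = { ...SKETCH_DEFAULTS };
  for (const key of Object.keys(SKETCH_DEFAULTS) as (keyof LoadOptions)[]) {
    const value = options[key];
    if (value !== undefined) resolved[key] = value;
  }
  return resolved;
}

let loadedOptions: Required<LoadOptions> | null = null;
let loadCount = 0;

// Explicit loading: resolve the options and record them
// (a stand-in for actually reading dictionary files).
function loadSketch(options?: LoadOptions): void {
  loadedOptions = resolveOptions(options);
  loadCount += 1;
}

// Lazy loading: the first call triggers a default load, later calls do not.
function cutSketch(text: string): string[] {
  if (loadedOptions === null) loadSketch();
  return [text]; // placeholder: no real segmentation is performed
}

cutSketch("南京市长江大桥");
cutSketch("南京市长江大桥");
console.log(loadCount); // 1: the dictionaries were loaded once, on first use
console.log(resolveOptions({ userDict: "./test/testdata/userdict.utf8" }));
```

The explicit `undefined` check means that passing `{ userDict: undefined }` still falls back to the default, which a plain object spread would not guarantee.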
import { tag } from "nodejieba";
console.log(tag("红掌拨清波"));
//[ { word: '红掌', tag: 'n' },
// { word: '拨', tag: 'v' },
// { word: '清波', tag: 'n' } ]
For more detailed usage, see the test cases.
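As a small sketch of how the `{ word, tag }` entries returned by tag might be consumed, the snippet below groups words by their tag. The `TaggedWord` type and `groupByTag` helper are hypothetical names, and the sample array is the output shown above, hardcoded so the snippet runs without the native module.

```typescript
// Shape of one entry in the tag() output shown above (type name is hypothetical).
interface TaggedWord {
  word: string;
  tag: string;
}

// Sample data copied from the example output above.
const tagged: TaggedWord[] = [
  { word: "红掌", tag: "n" },
  { word: "拨", tag: "v" },
  { word: "清波", tag: "n" },
];

// Collect words under their tag, preserving the original word order.
function groupByTag(items: TaggedWord[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const { word, tag } of items) {
    const bucket = groups.get(tag) ?? [];
    bucket.push(word);
    groups.set(tag, bucket);
  }
  return groups;
}

const nouns = groupByTag(tagged).get("n") ?? [];
console.log(nouns);
// [ '红掌', '清波' ]
```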
import { extract, textRankExtract } from "nodejieba";
const topN = 4;
console.log(extract("升职加薪,当上CEO,走上人生巅峰。", topN));
//[ { word: 'CEO', weight: 11.739204307083542 },
// { word: '升职', weight: 10.8561552143 },
// { word: '加薪', weight: 10.642581114 },
// { word: '巅峰', weight: 9.49395840471 } ]
console.log(textRankExtract("升职加薪,当上CEO,走上人生巅峰。", topN));
// Prints the top keywords as { word, weight } entries, ranked by TextRank weight.
For more detailed usage, see test/demo.js
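The weights returned by extract look like TF-IDF scores, which the idfDict option also suggests. Assuming that is the scheme, the ranking step can be sketched as below. This is not the library's code: `scoreKeywords` is a hypothetical helper, the IDF values are made-up toy numbers, and stop-word filtering is left out.

```typescript
interface Keyword {
  word: string;
  weight: number;
}

// Score each distinct token as (term count) * (IDF), then keep the topN highest.
// Words missing from the IDF table fall back to defaultIdf.
function scoreKeywords(
  tokens: string[],
  idf: Map<string, number>,
  defaultIdf: number,
  topN: number,
): Keyword[] {
  const counts = new Map<string, number>();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  const scored: Keyword[] = [];
  counts.forEach((count, word) => {
    scored.push({ word, weight: count * (idf.get(word) ?? defaultIdf) });
  });
  scored.sort((a, b) => b.weight - a.weight);
  return scored.slice(0, topN);
}

// Toy IDF table: illustrative values only, not taken from the real dictionary.
const toyIdf = new Map<string, number>([
  ["升职", 10.9],
  ["加薪", 10.6],
  ["巅峰", 9.5],
  ["当上", 8.0],
  ["走上", 7.0],
  ["人生", 6.0],
]);

// Tokens as a segmenter might produce them for the sentence above (assumed).
const sampleTokens = ["升职", "加薪", "当上", "CEO", "走上", "人生", "巅峰"];
const keywords = scoreKeywords(sampleTokens, toyIdf, 11.7, 4);
console.log(keywords.map((k) => k.word).join(", "));
// CEO, 升职, 加薪, 巅峰
```

In this sketch a word absent from the IDF table receives the fallback value, so an out-of-dictionary token such as CEO can outrank common words.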
Node.js versions: v16, v18, v20
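Performance is excellent: this is probably the best-performing Chinese word segmentation library for Node.js at present. For details, see: Jieba Chinese Word Segmentation Series Performance Evaluation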
http://cppjieba-webdemo.herokuapp.com/ (Chrome is recommended)
This project exists thanks to all the people who contribute.
Become a financial contributor and help us sustain our community.
Support this project with your organization. Your logo will show up here with a link to your website.