Awesome Open Source
Awesome Open Source

Chinese Open Information Extraction (Zhopenie)

Installation

This module makes heavily use of pyltp

  1. Install pyltp
    pip install pyltp
    
  2. Download NLP model from 百度雲

Why use LTP?

LTP has an excellent semantic parsing module shown below: Alt text

Also, in general, LTP performs better than other open-source Chinese NLP libraries,like Jieba ,here's the comparison on word tokenization for SIGHAN Bakeoff 2005 PKU, 510KB dataset: Alt text

Usage

The extractor module tries to break down a Chinese sentence into a Triple relation (e1, e2, r), which can be understood by computer
e.g. 星展集团是亚洲最大的金融服务集团之一, 拥有约3千5百亿美元资产和超过280间分行, 业务遍及18个市场。
are parsed as follows:

e1:星展集团, e2:亚洲最大的金融服务集团之一, r:是
e1:星展集团, e2:约3千5百亿美元资产, r:拥有
e1:业务, e2:18个市场, r:遍及

However, this extractor is about ~70% accurate and is still under improvement at this moment. Feel free to comment and make pull request.

Credits

哈工大社会计算与信息检索研究中心研制的语言技术平台 LTP


Get A Weekly Email With Trending Projects For These Topics
No Spam. Unsubscribe easily at any time.
python (51,910
nlp (1,060
chinese (194
relation-extraction (61
semantic-web (40
chinese-nlp (32

Find Open Source By Browsing 7,000 Topics Across 59 Categories