Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Cluedatasetsearch | 2,778 | | | | 7 months ago | | | 6 | | Python |
Search across all Chinese NLP datasets, with commonly used English NLP datasets included ||||||||||
Harvesttext | 1,803 | | | | 5 months ago | 36 | July 02, 2022 | 12 | mit | Python |
Text mining and preprocessing tools (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.) using unsupervised or weakly supervised methods ||||||||||
Awesome Text Summarization | 1,179 | | | | 6 months ago | | | 2 | mit | |
A guide to tackling text summarization ||||||||||
Text Analytics With Python | 1,073 | | | | 2 years ago | | | | apache-2.0 | Jupyter Notebook |
Learn how to process, classify, cluster, summarize, and understand the syntax, semantics, and sentiment of text data with the power of Python! This repository contains the code and datasets used in the book "Text Analytics with Python", published by Apress/Springer. ||||||||||
Summarization Papers | 898 | | | | a month ago | | | | | TeX |
Summarization Papers ||||||||||
Textrank | 836 | | 18 | 5 | 4 years ago | 10 | January 16, 2019 | 15 | mit | Python |
TextRank implementation for Python 3. ||||||||||
Kr Wordrank | 318 | | 3 | 5 | a year ago | 11 | August 14, 2020 | | other | Python |
A library for automatically extracting words and keywords from Korean text with unsupervised learning ||||||||||
Text_summurization_abstractive_methods | 310 | | | | 3 years ago | | | 12 | | Jupyter Notebook |
Multiple implementations of abstractive text summarization, using Google Colab ||||||||||
Brio | 260 | | | | 16 days ago | | | 19 | | Python |
ACL 2022: BRIO: Bringing Order to Abstractive Summarization ||||||||||
Macropodus | 256 | | 1 | | 2 years ago | 7 | December 25, 2020 | 1 | mit | Python |
Macropodus, an NLP toolkit based on an Albert + BiLSTM + CRF deep-learning architecture: Chinese word segmentation (CWS), part-of-speech tagging (POS), named entity recognition (NER), new word discovery, keyword extraction, text summarization, text similarity, a scientific calculator, Chinese/Arabic (and Roman) numeral conversion, traditional/simplified Chinese conversion, and pinyin conversion ||||||||||
To build the substring graph, set the minimum frequency (min count) and the maximum length (max length) of the substrings.

```python
from krwordrank.word import KRWordRank

min_count = 5    # minimum frequency of a word (used when building the graph)
max_length = 10  # maximum length of a word
wordrank_extractor = KRWordRank(min_count=min_count, max_length=max_length)
```
KR-WordRank extracts keywords with a PageRank-like graph-ranking algorithm (specifically, the HITS algorithm). Ranking the nodes (substrings) of the substring graph requires the graph-ranking parameters below.

```python
beta = 0.85  # PageRank decaying factor beta
max_iter = 10
texts = ['example document 1', 'example document 2', ...]  # list of str
keywords, rank, graph = wordrank_extractor.extract(texts, beta, max_iter)
```
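The ranking step that `extract` performs can be pictured as iterating a PageRank-style update over the substring graph. The sketch below only illustrates the general idea — the node names and edge weights are invented, and KR-WordRank itself uses a HITS-style ranking over subword co-occurrence statistics rather than this exact update:

```python
def graph_rank(graph, beta=0.85, max_iter=10):
    """Rank nodes of a weighted directed graph, PageRank-style.

    graph: {node: {neighbor: weight}} with edges node -> neighbor.
    Returns {node: rank}. Illustrative only, not KR-WordRank internals.
    """
    rank = {n: 1.0 for n in graph}
    for _ in range(max_iter):
        new_rank = {}
        for n in graph:
            # Rank flowing into n from every node m that links to n,
            # proportional to the weight of the edge m -> n.
            inbound = sum(
                rank[m] * out[n] / sum(out.values())
                for m, out in graph.items() if n in out
            )
            new_rank[n] = (1 - beta) + beta * inbound
        rank = new_rank
    return rank

# Toy substring graph with hypothetical co-occurrence weights
toy_graph = {
    'movie': {'music': 2, 'love': 1},
    'music': {'movie': 2},
    'love':  {'movie': 1},
}
ranks = graph_rank(toy_graph)
```

Nodes with many strongly weighted inbound edges ('movie' above) accumulate the highest rank, which is why frequent, well-connected substrings surface as keywords.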
Words (substrings) with high graph-ranking scores are returned as keywords. An example applied to the 'La La Land' movie reviews can be found in the tutorials.
```python
for word, r in sorted(keywords.items(), key=lambda x: x[1], reverse=True)[:30]:
    print('%8s:\t%.4f' % (word, r))
```

```
      영화:   229.7889
     관람객:   112.3404
      너무:   78.4055
      음악:   37.6247
      정말:   37.2504
      ....
```
With Python's wordcloud package you can draw a word-cloud figure.

Build `passwords` by removing the common words that should not appear in the figure (stopwords). It must be a dict of {word: score}.

```python
stopwords = {'영화', '관람객', '너무', '정말', '보고'}
passwords = {word: score for word, score in sorted(
    keywords.items(), key=lambda x: -x[1])[:300] if not (word in stopwords)}
```
Alternatively, use the `summarize_with_keywords` function.

```python
from krwordrank.word import summarize_with_keywords

keywords = summarize_with_keywords(texts, min_count=5, max_length=10,
    beta=0.85, max_iter=10, stopwords=stopwords, verbose=True)
keywords = summarize_with_keywords(texts)  # with default arguments
```
Install the wordcloud package if it is not installed yet.

```
pip install wordcloud
```
Draw the figure with wordcloud: set a `font_path` that supports Korean, choose the figure size (width, height) and background color (background_color), then draw the cloud with `generate_from_frequencies()`.

```python
from wordcloud import WordCloud

# Set your font path
font_path = 'YOUR_FONT_DIR/truetype/nanum/NanumBarunGothic.ttf'
krwordrank_cloud = WordCloud(
    font_path=font_path,
    width=800,
    height=800,
    background_color="white"
)

krwordrank_cloud = krwordrank_cloud.generate_from_frequencies(passwords)
```
To draw the figure inside a Jupyter notebook, run `%matplotlib inline` first; in a .py script this line is not needed.

```python
%matplotlib inline
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 10))
plt.imshow(krwordrank_cloud, interpolation="bilinear")
plt.show()
```

The figure can be saved to a file:

```python
fig.savefig('./lalaland_wordcloud.png')
```
Key sentence extraction is available from KR-WordRank >= 1.0.0. Unlike TextRank, KR-WordRank does not compute similarity between sentences. Instead, it selects as key sentences the sentences that contain many of the keywords extracted by KR-WordRank, i.e. the sentences most similar to the keyword vector.
Passing `texts` to `summarize_with_sentences` trains KR-WordRank and returns the keywords together with the key sentences selected from them.

```python
from krwordrank.sentence import summarize_with_sentences

texts = []  # e.g. the La La Land movie reviews
keywords, sents = summarize_with_sentences(texts, num_keywords=100, num_keysents=10)
```
`keywords` holds the num_keywords keywords learned by KR-WordRank as a dict {str: float}:

```
{'영화': 201.02402099523516,
 '너무': 81.53699026386887,
 '정말': 40.53709233921311,
 '음악': 40.43446188536923,
 '마지막': 38.598509495213484,
 '뮤지컬': 23.198810378709844,
 '최고': 21.810147306627464,
 '사랑': 20.638511587426862,
 '꿈을': 20.43744237599688,
 '아름': 20.324710458174806,
 '영상': 20.283994278960186,
 '여운': 19.471356929084546,
 '보고': 19.06433920013137,
 '노래': 18.732801785265316,
 ...
}
```
`sents` holds the num_keysents key sentences as a list of str:

(output: ten Korean key sentences selected from the La La Land movie reviews)
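Conceptually, the sentences chosen are those that best cover the high-ranked keywords. The hypothetical helper below sketches that idea in miniature — the English words, scores, and sentences are invented, and the library itself compares sentences against a keyword vector rather than summing scores like this:

```python
def score_sentence(sentence, keywords):
    """Sum the ranks of the keywords that appear in a sentence.

    Illustrative only: KR-WordRank measures similarity to a keyword
    vector, but the intuition (reward keyword coverage) is the same.
    """
    return sum(score for word, score in keywords.items() if word in sentence)

# Hypothetical keyword scores and candidate sentences (English stand-ins)
keywords = {'movie': 201.0, 'music': 40.4, 'musical': 23.2}
sentences = [
    'the music in this movie is wonderful',
    'i went to the theater yesterday',
    'a musical movie with great music',
]
best = max(sentences, key=lambda s: score_sentence(s, keywords))
```

A sentence containing none of the keywords scores zero and is never selected, while the sentence covering the most high-ranked keywords wins.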
You can express a preference for sentence length via `penalty`; the example below prefers sentences of 25 to 80 characters. `stopwords` are words that are neither used as keywords nor counted when comparing sentences with the keyword vector. To avoid selecting several near-identical sentences, set `diversity`: it is the minimum cosine distance required between the keyword vectors of selected key sentences, so the larger `diversity` is, the more varied the selected sentences are.

```python
penalty = lambda x: 0 if (25 <= len(x) <= 80) else 1
stopwords = {'영화', '관람객', '너무', '정말', '보고'}

keywords, sents = summarize_with_sentences(
    texts,
    penalty=penalty,
    stopwords=stopwords,
    diversity=0.5,
    num_keywords=100,
    num_keysents=10,
    verbose=False
)
```
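The effect of `diversity` can be sketched as a greedy selection that skips candidates too similar to sentences already picked. This is an illustration only — the data is invented, and KR-WordRank measures cosine distance between keyword vectors, whereas the sketch uses simple word overlap:

```python
def select_diverse(scored_sents, num_keysents=10, diversity=0.5):
    """Greedily pick top-scored sentences, skipping near-duplicates.

    A candidate is skipped when its word-overlap similarity (Jaccard)
    to an already selected sentence reaches 1 - diversity.
    Illustrative only, not KR-WordRank's implementation.
    """
    def jaccard(a, b):
        a, b = set(a.split()), set(b.split())
        return len(a & b) / len(a | b)

    selected = []
    for sent, _ in sorted(scored_sents, key=lambda x: -x[1]):
        if all(jaccard(sent, s) < 1 - diversity for s in selected):
            selected.append(sent)
        if len(selected) == num_keysents:
            break
    return selected

# Hypothetical (sentence, score) pairs: the second is a near-copy of the first
picked = select_diverse(
    [('great movie great music', 3.0),
     ('great movie great music indeed', 2.9),
     ('the plot was thin', 1.0)],
    num_keysents=2,
)
```

With `diversity=0.5` the near-duplicate second sentence is skipped in favor of the dissimilar third one, which is exactly the behavior the parameter exists to produce.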
The keywords are learned as before, except that the words listed in `stopwords` no longer appear:

```
{'음악': 40.43446188536923,
 '마지막': 38.598509495213484,
 '뮤지컬': 23.198810378709844,
 '최고': 21.810147306627464,
 '사랑': 20.638511587426862,
 '꿈을': 20.43744237599688,
 '아름': 20.324710458174806,
 '영상': 20.283994278960186,
 '여운': 19.471356929084546,
 '노래': 18.732801785265316,
 ...
}
```
All of the selected sentences are now between 25 and 80 characters long:

(output: ten Korean key sentences of 25–80 characters from the movie reviews)
The `penalty` function can also be used to exclude sentences containing a particular word, by adding that condition to the length check:

```python
# example: additionally exclude sentences that contain a given word
penalty = lambda x: 0 if (25 <= len(x) <= 80 and '영화' not in x) else 1

keywords, sents = summarize_with_sentences(
    texts,
    penalty=penalty,
    stopwords=stopwords,
    diversity=0.5,
    num_keywords=100,
    num_keysents=10,
    verbose=False
)

print(sents)
```
(output: ten Korean key sentences satisfying the new penalty)
A tutorial on key sentence extraction is available in the tutorials: `krwordrank_keysentence.ipynb`.

KR-WordRank can be installed with pip:

```
pip install krwordrank
```

Tested in Python 3.