Kr Wordrank

A library that automatically extracts words and keywords from Korean text using unsupervised learning.

KR-WordRank: Unsupervised Korean Word & Keyword Extractor

Keyword extraction

To build the substring graph, specify the minimum frequency of a substring (min count) and the maximum length of a substring (max length).

from krwordrank.word import KRWordRank

min_count = 5   # minimum frequency of a substring (used when building the graph)
max_length = 10 # maximum length of a substring
wordrank_extractor = KRWordRank(min_count=min_count, max_length=max_length)
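
To illustrate how min count and max length limit the candidate substrings, here is a minimal sketch. It is not KR-WordRank's internal code: the helper `count_substrings`, the choice to count only prefixes of whitespace-separated tokens, and the toy sentences are all assumptions made for this example.

```python
from collections import Counter

def count_substrings(texts, min_count=5, max_length=10):
    """Count prefixes of whitespace-separated tokens (hypothetical helper)."""
    counter = Counter()
    for text in texts:
        for token in text.split():
            # Prefixes of length 1..max_length of each token
            for end in range(1, min(len(token), max_length) + 1):
                counter[token[:end]] += 1
    # Keep only substrings that occur at least min_count times
    return {sub: n for sub, n in counter.items() if n >= min_count}

# Toy input, not taken from the tutorials
toy_texts = ['영화 음악', '영화 좋다', '영화음악 좋다']
toy_counts = count_substrings(toy_texts, min_count=2, max_length=3)
print(toy_counts)
```

A larger min_count drops rare substrings, and a smaller max_length stops long tokens from producing many candidates, so both keep the graph small.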

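KR-WordRank extracts words with a graph ranking algorithm similar to PageRank (it uses the HITS algorithm). To rank the nodes (substrings) of the substring graph, the parameters of the graph ranking algorithm must be supplied.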

beta = 0.85    # PageRank decaying factor beta
max_iter = 10
texts = ['an example sentence', 'the input is a list of str with several sentences', ... ]
keywords, rank, graph = wordrank_extractor.extract(texts, beta, max_iter)
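
The roles of beta and max_iter can be seen in a generic PageRank-style power iteration. This is only a sketch under assumptions: the function `rank_nodes`, the adjacency-list graph format, and the toy graph are hypothetical, and the update rule KR-WordRank actually uses may differ.

```python
def rank_nodes(graph, beta=0.85, max_iter=10):
    """Damped power iteration on {node: [out-neighbours]} (illustrative only)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Every node keeps a base score of (1 - beta) / n
        new_rank = {node: (1.0 - beta) / n for node in nodes}
        for node, neighbours in graph.items():
            # A node without out-links spreads its rank over all nodes
            targets = neighbours if neighbours else nodes
            share = beta * rank[node] / len(targets)
            for other in targets:
                new_rank[other] += share
        rank = new_rank
    return rank

# Toy graph: every neighbour must also appear as a key
toy_graph = {'a': ['b', 'c'], 'b': ['c'], 'c': ['a'], 'd': ['c']}
toy_rank = rank_nodes(toy_graph, beta=0.85, max_iter=10)
for node, r in sorted(toy_rank.items(), key=lambda x: -x[1]):
    print('%s: %.4f' % (node, r))
```

beta is the share of rank passed along edges at each step, and max_iter bounds the number of update rounds.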

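Nodes (substrings) with a high graph ranking are post-processed and returned as words. An example of keyword (word) extraction from review data for the movie 'La La Land' is in tutorials.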

for word, r in sorted(keywords.items(), key=lambda x:x[1], reverse=True)[:30]:
        print('%8s:\t%.4f' % (word, r))
  :    229.7889
 :   112.3404
  :    78.4055
  :    37.6247
  :    37.2504
        ....

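With the Python wordcloud package, you can draw a word cloud figure of the keywords.

Remove common words that should not appear in the figure (stopwords) to build passwords. It must be a dict of the form {word: score}.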

stopwords = {'', '', '', '', ''}
passwords = {word:score for word, score in sorted(
    keywords.items(), key=lambda x:-x[1])[:300] if not (word in stopwords)}

Alternatively, the steps above can be done with the summarize_with_keywords function.

from krwordrank.word import summarize_with_keywords

keywords = summarize_with_keywords(texts, min_count=5, max_length=10,
    beta=0.85, max_iter=10, stopwords=stopwords, verbose=True)
keywords = summarize_with_keywords(texts) # with default arguments

If wordcloud is not installed, install it with the command below.

pip install wordcloud

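The default font used by wordcloud does not support Korean, so find a font that does and set font_path to it. After choosing the figure size (width, height), the background color (background_color), and so on, draw the figure with generate_from_frequencies().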

from wordcloud import WordCloud

# Set your font path
font_path = 'YOUR_FONT_DIR/truetype/nanum/NanumBarunGothic.ttf'

krwordrank_cloud = WordCloud(
    font_path = font_path,
    width = 800,
    height = 800,
    background_color="white"
)

krwordrank_cloud = krwordrank_cloud.generate_from_frequencies(passwords)

When drawing the figure in a Jupyter notebook, add %matplotlib inline as below. It is not needed in a .py file.

%matplotlib inline
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 10))
plt.imshow(krwordrank_cloud, interpolation="bilinear")
plt.show()

The figure can be saved to a file.

fig.savefig('./lalaland_wordcloud.png')

Key-sentence extraction

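KR-WordRank >= 1.0.0 provides key sentence extraction. KR-WordRank does not follow the TextRank approach of ranking sentences by their similarity to each other. Instead, it selects sentences that contain many keywords as key sentences. Keywords are extracted first, and the key sentences are then chosen using them.

Passing texts to summarize_with_sentences trains KR-WordRank, extracts the keywords, and selects the key sentences using them.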

from krwordrank.sentence import summarize_with_sentences

texts = [] # list of str: fill with the input sentences
keywords, sents = summarize_with_sentences(texts, num_keywords=100, num_keysents=10)

keywords holds the num_keywords keywords learned by KR-WordRank together with their rank values, as a dict{str:float}.

{'': 201.02402099523516,
 '': 81.53699026386887,
 '': 40.53709233921311,
 '': 40.43446188536923,
 '': 38.598509495213484,
 '': 23.198810378709844,
 '': 21.810147306627464,
 '': 20.638511587426862,
 '': 20.43744237599688,
 '': 20.324710458174806,
 '': 20.283994278960186,
 '': 19.471356929084546,
 '': 19.06433920013137,
 '': 18.732801785265316,
 ...
}

sents holds the num_keysents key sentences as a list of str.

['                    30      ',
 '                            ',
 '                                ',
 '                     ost',
 '                          ',
 '                          ',
 '                    6 7        ',
 '                       ',
 '                                   ',
 '              ']
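
Several options can be added. Define a penalty function to filter out sentences that are too long or too short; the one below prefers sentences of 25 to 80 characters. The stopwords are removed from the keywords, and they are also left out when the keyword vector is built. Because sentences similar to the keyword vector are selected first, sentences that overlap with ones already chosen can also be selected. This is controlled with `diversity`. `diversity` is the minimum distance between key sentences in terms of cosine similarity. The larger the value, the more varied the selected sentences.
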
penalty = lambda x:0 if (25 <= len(x) <= 80) else 1
stopwords = {'', '', '', '', ''}

keywords, sents = summarize_with_sentences(
    texts,
    penalty=penalty,
    stopwords = stopwords,
    diversity=0.5,
    num_keywords=100,
    num_keysents=10,
    verbose=False
)

The keywords extracted this time no longer include the words given as stopwords.

{'': 40.43446188536923,
 '': 38.598509495213484,
 '': 23.198810378709844,
 '': 21.810147306627464,
 '': 20.638511587426862,
 '': 20.43744237599688,
 '': 20.324710458174806,
 '': 20.283994278960186,
 '': 19.471356929084546,
 '': 18.732801785265316,
 ...
}

The sentences are also between 25 and 80 characters long.

['        10   ',
 '               ',
 '              ',
 '                ',
 '               ',
 '               ',
 '               ',
 '               ',
 '                 ',
 '           ']
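
How penalty and diversity interact can be sketched in plain Python. This is an illustration under assumptions, not the code of summarize_with_sentences: the helpers `cosine` and `select_key_sentences`, the substring-based sentence vectors, and the toy English data are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def select_key_sentences(texts, keywords, penalty, diversity=0.5, topk=2):
    # Represent each sentence by the keywords it contains, weighted by rank
    vectors = [{w: s for w, s in keywords.items() if w in text} for text in texts]
    # Distance to the keyword vector plus the user-defined penalty
    dist = [1.0 - cosine(vec, keywords) + penalty(text)
            for text, vec in zip(texts, vectors)]
    selected = []
    for i in sorted(range(len(texts)), key=lambda idx: dist[idx]):
        # Skip a sentence that is closer than `diversity` to a chosen one
        if all(1.0 - cosine(vectors[i], vectors[j]) >= diversity for j in selected):
            selected.append(i)
        if len(selected) == topk:
            break
    return [texts[i] for i in selected]

# Toy data, not taken from the tutorials
toy_keywords = {'music': 3.0, 'story': 2.0, 'ending': 1.0}
toy_sentences = ['the music is great', 'the music is really great',
                 'the story and the ending', 'ok']
toy_penalty = lambda x: 0 if 10 <= len(x) <= 30 else 1
key_sents = select_key_sentences(toy_sentences, toy_keywords, toy_penalty,
                                 diversity=0.5, topk=2)
print(key_sents)
```

In this toy run the second sentence is skipped because its keyword vector is identical to that of the first, which is the kind of duplicate that diversity is meant to suppress.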

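To also exclude sentences that contain a particular word from the key sentences, change the penalty function as below.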

# no trailing comma here: it would turn penalty into a tuple instead of a function
penalty = lambda x: 0 if (25 <= len(x) <= 80 and '' not in x) else 1
keywords, sents = summarize_with_sentences(
    texts,
    penalty=penalty,
    stopwords = stopwords,
    diversity=0.5,
    num_keywords=100,
    num_keysents=10,
    verbose=False
)

print(sents)
['               ',
 '               ',
 '               ',
 '                 ',
 '                ',
 '               ',
 '                ',
 '16               ',
 '                 ',
 '               ']

More detailed key sentence extraction tutorials are in krwordrank_keysentence.ipynb in the tutorials folder.

Setup

pip install krwordrank

tested in

  • python 3.5.9
  • python 3.7.7

Requirements

  • Python >= 3.5
  • numpy
  • scipy
