pke - python keyphrase extraction

pke is an open source Python-based keyphrase extraction toolkit. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new models. pke also allows for easy benchmarking of state-of-the-art keyphrase extraction models, and ships with supervised models trained on the SemEval-2010 dataset.

To install pke from GitHub using pip:

```bash
pip install git+https://github.com/boudinfl/pke.git
```

pke relies on spaCy (>= 3.2.3) for text processing and requires models to be installed:

```bash
# download the english model
python -m spacy download en_core_web_sm
```
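
Before running an extractor, it can be useful to check that the model package is actually importable. The sketch below assumes that a model fetched with `python -m spacy download` is installed as a regular Python package named after the model; `is_model_installed` is a hypothetical helper, not part of pke or spaCy.

```python
import importlib.util


def is_model_installed(model_name: str) -> bool:
    """Return True if a top-level package called `model_name` can be found."""
    return importlib.util.find_spec(model_name) is not None


# Hypothetical pre-flight check before creating a pke extractor.
if not is_model_installed("en_core_web_sm"):
    print("Model missing; run: python -m spacy download en_core_web_sm")
```

The check only looks the package up and does not load the model, so it stays cheap.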

pke provides a standardized API for extracting keyphrases from a document. Start by typing the 5 lines below. To use another model, simply replace pke.unsupervised.TopicRank with another model (see the list of implemented models).

```python
import pke

# initialize keyphrase extraction model, here TopicRank
extractor = pke.unsupervised.TopicRank()

# load the content of the document, here document is expected to be a simple
# test string and preprocessing is carried out using spacy
extractor.load_document(input='text', language='en')

# keyphrase candidate selection, in the case of TopicRank: sequences of nouns
# and adjectives (i.e. `(Noun|Adj)*`)
extractor.candidate_selection()

# candidate weighting, in the case of TopicRank: using a random walk algorithm
extractor.candidate_weighting()

# N-best selection, keyphrases contains the 10 highest scored candidates as
# (keyphrase, score) tuples
keyphrases = extractor.get_n_best(n=10)
```
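
To make the three stages (candidate selection, candidate weighting, N-best selection) more concrete, here is a dependency-free sketch of the same idea. It is not pke's implementation and it is not TopicRank: it skips topic clustering and ranks candidates directly with a PageRank-style random walk. The pre-tagged input, the reciprocal-distance edge weights, and all function names are assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical pre-tagged input: (word, POS tag) pairs. pke obtains tags from
# spaCy; they are hard-coded here to keep the sketch self-contained.
TAGGED = [
    ("Keyphrase", "NOUN"), ("extraction", "NOUN"), ("finds", "VERB"),
    ("important", "ADJ"), ("phrases", "NOUN"), ("in", "ADP"),
    ("a", "DET"), ("document", "NOUN"), (".", "PUNCT"),
    ("Keyphrase", "NOUN"), ("extraction", "NOUN"), ("is", "AUX"),
    ("useful", "ADJ"), (".", "PUNCT"),
]


def select_candidates(tagged, valid_pos=("NOUN", "PROPN", "ADJ")):
    """Map each maximal run of nouns/adjectives to its start offsets."""
    candidates = {}
    offset = 0
    for is_valid, run in groupby(tagged, key=lambda t: t[1] in valid_pos):
        run = list(run)
        if is_valid:
            phrase = " ".join(word.lower() for word, _ in run)
            candidates.setdefault(phrase, []).append(offset)
        offset += len(run)
    return candidates


def weight_candidates(candidates, damping=0.85, iterations=50):
    """Score candidates with a random walk over a complete weighted graph."""
    names = list(candidates)
    # Edge weight: sum of reciprocal distances between occurrence offsets.
    edge = {a: {} for a in names}
    for a in names:
        for b in names:
            if a != b:
                edge[a][b] = sum(1.0 / abs(i - j)
                                 for i in candidates[a]
                                 for j in candidates[b])
    scores = {a: 1.0 / len(names) for a in names}
    for _ in range(iterations):
        scores = {
            a: (1 - damping) / len(names) + damping * sum(
                scores[b] * edge[b][a] / sum(edge[b].values())
                for b in names if b != a)
            for a in names
        }
    return scores


def get_n_best(scores, n=10):
    """Return the n highest scored candidates as (keyphrase, score) tuples."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


candidates = select_candidates(TAGGED)
scores = weight_candidates(candidates)
keyphrases = get_n_best(scores, n=3)
print(keyphrases)
```

Each stage is a separate function, mirroring how the pipeline keeps candidate selection and weighting independently replaceable.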

A detailed example is provided in the examples/ directory.

To get your hands dirty with pke, we invite you to try out our tutorials.

Name | Link |
---|---|
Getting started with pke and keyphrase extraction | |
Model parameterization | |
Benchmarking models | |
pke
currently implements the following keyphrase extraction models:

For comparison purposes, overall results of the implemented models on commonly used benchmark datasets are available in results. Code for reproducing these experiments is in the benchmarking notebook.
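
Keyphrase extraction benchmarks are commonly reported as precision, recall, and F1 over the top N predictions. The sketch below shows one way to compute these with exact string matching. It is an assumption about the general metric, not the code from the benchmarking notebook, which may normalize phrases differently (for example by stemming). `prf_at_n` is a hypothetical helper name.

```python
def prf_at_n(predicted, gold, n=10):
    """Exact-match precision, recall and F1 over the top-n predictions.

    `predicted` is a ranked list of keyphrase strings (assumed to contain no
    duplicates); `gold` is the list of reference keyphrases.
    """
    top = predicted[:n]
    gold_set = set(gold)
    matches = sum(1 for phrase in top if phrase in gold_set)
    precision = matches / len(top) if top else 0.0
    recall = matches / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, made up for illustration only.
predicted = ["keyphrase extraction", "random walk", "graph", "topic"]
gold = ["keyphrase extraction", "graph", "benchmark"]
precision, recall, f1 = prf_at_n(predicted, gold, n=3)
print(f"P@3={precision:.3f} R@3={recall:.3f} F1@3={f1:.3f}")
```

With pke, the ranked list could be built from the (keyphrase, score) tuples returned by `get_n_best`.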

If you use pke, please cite the following paper:

```bibtex
@InProceedings{boudin:2016:COLINGDEMO,
  author    = {Boudin, Florian},
  title     = {pke: an open source python-based keyphrase extraction toolkit},
  booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations},
  month     = {December},
  year      = {2016},
  address   = {Osaka, Japan},
  pages     = {69--73},
  url       = {http://aclweb.org/anthology/C16-2015}
}
```