RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing how frequently each word appears and how often it co-occurs with other words in the text.
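To make the frequency and co-occurrence idea concrete, here is a minimal pure-Python sketch of RAKE-style scoring. It is an illustration under stated assumptions, not the code used by rake-nltk: the tiny stopword list, the regex-based splitting, and the helper names `candidate_phrases` and `rake_scores` are all made up for this example. Each word is scored as degree divided by frequency, where degree is the total length of the candidate phrases the word appears in, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption for this sketch only).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at stopwords and punctuation."""
    phrases = []
    # Punctuation ends a phrase, so first cut the text into punctuation-free fragments.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text, stopwords=STOPWORDS):
    """Return (score, phrase) pairs sorted from highest to lowest score."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)    # how many times each word occurs
    degree = defaultdict(int)  # summed length of the phrases each word occurs in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {word: degree[word] / freq[word] for word in freq}
    # A phrase's score is the sum of its word scores; repeated phrases are scored once.
    scored = {phrase: sum(word_score[w] for w in phrase) for phrase in set(phrases)}
    return sorted(((s, " ".join(p)) for p, s in scored.items()), reverse=True)


text = "Keyword extraction is hard. Rapid keyword extraction of documents is useful."
for score, phrase in rake_scores(text):
    print(f"{score:.1f}  {phrase}")
```

Words that tend to appear inside longer phrases get a higher degree relative to their frequency, which is why multi-word phrases usually rank above isolated words.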
pip install rake-nltk
git clone https://github.com/csurfer/rake-nltk.git
python rake-nltk/setup.py install
from rake_nltk import Rake

# Uses stopwords for English from NLTK, and all punctuation characters by
# default.
r = Rake()

# Extraction given the text.
r.extract_keywords_from_text(<text to process>)

# Extraction given the list of strings where each string is a sentence.
r.extract_keywords_from_sentences(<list of sentences>)

# To get keyword phrases ranked highest to lowest.
r.get_ranked_phrases()

# To get keyword phrases ranked highest to lowest with scores.
r.get_ranked_phrases_with_scores()
If you see a stopwords error, it means that you do not have the stopwords
corpus downloaded from NLTK. You can download it using the command below.
python -c "import nltk; nltk.download('stopwords')"
This is a Python implementation of the algorithm described in the paper "Automatic keyword extraction from individual documents" by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley.
Please use the issue tracker for reporting bugs or feature requests.
Install Poetry: pip install poetry
Run poetry install to create the project's virtual environment.
Run the tests with poetry run tox (any Python version you don't have installed will fail this). Fix failing tests and repeat.
Install pre-commit with pip install pre-commit and run pre-commit run --all-files to do lint checks.
Build the HTML docs: poetry run sphinx-build -b html docs/ docs/_build/html
Generate requirements.txt for automated testing: poetry export --dev --without-hashes -f requirements.txt > requirements.txt
If you found the utility helpful you can buy me a cup of coffee using