About this project

This project extracts the text of an article using Newspaper3k, a Python library for scraping articles, and uses NLTK (the Natural Language Toolkit) to preprocess the text and find the most common words in the article.

Tools

Newspaper3k: tool to scrape articles
NLTK: tool to process text

Steps

Scrape articles with newspaper3k

from newspaper import Article

url = 'https://mystudentvoices.com/it-took-me-2-years-to-get-1000-followers-life-lessons-ive-learned-throughout-the-journey-9bc44f2959f0'
article = Article(url)

# Fetch the HTML, then parse it so that fields such as publish_date are populated
article.download()
article.parse()

Find the publish date

article.publish_date

Extract image
Find the author
Find the keywords
Find the summary
Preprocessing with NLTK
- Tokenize text
- Lowercase and remove stopwords
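
The preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions, not necessarily the notebook's exact code: it uses NLTK's `RegexpTokenizer`, which needs no extra data download, in place of whichever tokenizer the notebook uses, and it takes the stopword list as an argument. The function name `most_common_words` is hypothetical.

```python
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer


def most_common_words(text, stop_words, n=10):
    """Return the n most frequent non-stopword tokens as (word, count) pairs.

    Hypothetical helper for this sketch, not an NLTK function.
    """
    # Tokenize: keep runs of word characters, dropping punctuation
    tokens = RegexpTokenizer(r"\w+").tokenize(text)
    # Lowercase so that "The" and "the" count as the same word
    lowered = [token.lower() for token in tokens]
    # Remove stopwords
    kept = [token for token in lowered if token not in stop_words]
    # Count the remaining words
    return FreqDist(kept).most_common(n)


# A tiny stopword set, assumed here only to keep the example self-contained
sample_stop_words = {"the", "on", "a", "and"}
sample_text = "The cat sat on the mat. The cat slept."
top_words = most_common_words(sample_text, sample_stop_words, n=3)
```

For real articles, a fuller English list is available from `nltk.corpus.stopwords.words("english")` after running `nltk.download("stopwords")`, and the article body comes from `article.text` once the article has been parsed.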
Visualize the frequency of words with Matplotlib
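
One way to draw the word frequencies is a bar chart. The sketch below assumes the counts arrive as a list of `(word, count)` pairs, which is the shape returned by `FreqDist.most_common`. The function name `plot_word_frequencies` and the sample data are hypothetical.

```python
import matplotlib.pyplot as plt


def plot_word_frequencies(word_counts, title="Most common words"):
    """Build a bar chart from (word, count) pairs and return the Figure.

    Hypothetical helper for this sketch, not a Matplotlib function.
    """
    words = [word for word, _ in word_counts]
    counts = [count for _, count in word_counts]

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(words, counts)
    ax.set_xlabel("Word")
    ax.set_ylabel("Frequency")
    ax.set_title(title)
    # Tilt the labels so longer words do not overlap
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return fig


# Made-up sample data, only to show the expected input shape
sample_counts = [("followers", 5), ("writing", 3), ("journey", 2)]
figure = plot_word_frequencies(sample_counts)
```

In a notebook the figure renders inline. In a script, `figure.savefig("word_frequencies.png")` or `plt.show()` would display it.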

Tutorial blog

Find the Medium article for this repository here

khuyentran1401/Extract-text-from-article