Awesome Open Source
Search results for article corpus
16 search results found
Fakenewscorpus
⭐
184
A dataset of millions of news articles scraped from a curated list of data sources.
Sentence Compression
⭐
90
Large corpus of uncompressed and compressed sentences from news articles.
Socc
⭐
79
SFU Opinion and Comments Corpus
Curation Corpus
⭐
77
Code for obtaining the Curation Corpus abstractive text summarisation dataset
Russian_news_corpus
⭐
76
Corpus of lemmatized (morphologically normalized) texts from Russian mass media.
Nlp For Hindi
⭐
59
State-of-the-art language models and classifier for the Hindi language (spoken in the Indian subcontinent)
Allofplos
⭐
53
Repository for the allofplos project.
Oa Stm Corpus
⭐
44
Corpus of Open Access articles from multiple fields in Science, Technology, and Medicine.
News Analyze
⭐
40
Analyze topics and trends in news with NLP
News Media Reliability
⭐
32
Kptimes
⭐
25
Repository for KPTimes corpus
Transfer Learning Bner Bioinformatics 2018
⭐
25
This repository contains supplementary data, and links to the model and corpora used for the paper: Transfer learning for biomedical named entity recognition with neural networks.
Media_frames_corpus
⭐
20
A set of media framing annotations, along with scripts for obtaining the corresponding news articles
Topic Modelling On Wiki Corpus
⭐
19
Uses the Latent Dirichlet Allocation algorithm to discover hidden topics in articles. It is trained on 60,000 articles from the Simple English Wikipedia corpus and can extract the topic of a given input article.
Text Scraping Document Clustering Topic Modeling
⭐
19
The objective of this project is to scrape a corpus of news articles from a set of web pages, pre-process the corpus, and then apply unsupervised clustering algorithms to explore and summarise the contents of the corpus. Part 1: Text Data Scraping. This part of the project should be implemented as a Python script. (1) Identify the URLs for all news articles listed on the website: http://mlg.ucd.ie/modules/COMP41680/news/index.htm (2) Retrieve all web pages corresponding to these article URLs.
Wikicorpusextractor
⭐
19
Extracts text from WikiMedia XML Dump files
Malayalam Newspaper Article Dataset
⭐
18
The project scrapes articles from a Malayalam newspaper website to create a corpus. A set of queries is created and the corresponding ground-truth answers are retrieved. The result can serve as a dataset for evaluating future tools such as Malayalam stemmers, stopword removers, and lemmatizers.
Quanteda.corpora
⭐
15
A collection of corpora for quanteda
Prachathai 67k
⭐
13
News Article Corpus from Prachathai.com
Elang
⭐
13
Word Embedding utilities for Language Models (English & Indonesian)
Opiec
⭐
12
Reading the data from OPIEC - an Open Information Extraction corpus
Uscorpus
⭐
12
Urdu Summary Corpus and Software Tools Version 1.0
Textsummarizer
⭐
11
A text summarization tool for Marathi, implemented as a project for the course Advanced NLP (CSCI 544)
Similar Posts
⭐
11
Pelican plugin to list similar posts to articles, based on a vector space model.
Texttiling
⭐
10
Implementation of the TextTiling algorithm for CS187
Codenames
⭐
10
AI for CodeNames
Emnlp2016 Empirical Convincingness
⭐
9
Code and data for EMNLP2016 article "What makes a convincing argument? Empirical analysis and detecting attributes of convincingness in Web argumentation" by Ivan Habernal and Iryna Gurevych
Biomedical Corpora
⭐
9
A collection of annotated biomedical corpora, which can be used for training supervised machine learning methods for various tasks in biomedical text-mining and information extraction.
Corpus_golunov_articles
⭐
9
Freedom for Ivan Golunov! http://gg.gg/golunov-petition
Newsgen
⭐
7
Newsgen: The Fake News Generator
Hacker_news_article_topics
⭐
7
Generate topic models, keywords, and word clouds for HN articles
Oa_nlp
⭐
7
Tools to access peer reviewed research published under the Creative Commons License.
Polnear
⭐
7
Corpus of Attribution-Annotated news articles covering the campaigns during the year leading up to the 2016 US Presidential election.
Financial News Data
⭐
6
Construct a structured DataFrame from the Reuters news corpus
Covid19 Corpus
⭐
6
COVID-19 corpus with annotated biomedical entities.
Kowikitext
⭐
6
Towards Reliable Bioner
⭐
5
This repository contains the corpora and supplementary data, along with instructions for recreating the experiments, for our paper: "Towards reliable named entity recognition in the biomedical domain".
Kloop Corpus
⭐
5
Synthesis Database Public
⭐
5
Codebase for compiling a database of materials syntheses
Copyright 2018-2024 Awesome Open Source. All rights reserved.