Awesome Open Source
Search results for dataset wikipedia
44 search results found
Gensim Data (⭐ 492): Data repository for pretrained NLP models and NLP corpora.
Narrativeqa (⭐ 362): The NarrativeQA dataset, including the list of documents with Wikipedia summaries, links to full stories, and questions and answers.
Chakin (⭐ 313): Simple downloader for pre-trained word vectors.
Wikitables (⭐ 279): Import tables from any Wikipedia article as a dataset in Python.
Datasets (⭐ 192): Interesting datasets you could use with Algolia.
Qb (⭐ 160): QANTA Quiz Bowl AI.
Legislator (⭐ 90): Interface to the Comparative Legislators Database.
Ambigqa (⭐ 86): An original implementation of "AmbigQA: Answering Ambiguous Open-domain Questions" (EMNLP 2020).
Awesome Wikipedia (⭐ 76): A curated list of awesome Wikipedia-related frameworks, libraries, software, datasets, and references.
Text Segmentation (⭐ 73): Implementation of the paper "Text Segmentation as a Supervised Learning Task".
Wiki Split (⭐ 72): One million English sentences, each split into two sentences that together preserve the original meaning, extracted from Wikipedia edits.
Toolbox (⭐ 67): A collection of tools, APIs, and other resources for creative coding web projects.
Wikipedia_ner (⭐ 56): 📖 Labeled examples from wiki dumps in Python.
Tagme Reproducibility (⭐ 55): Reproducibility of the TAGME entity linking system.
Deep Exponential Families (⭐ 53): Deep exponential families (DEFs).
Wiki Atomic Edits (⭐ 47): A dataset of atomic Wikipedia edits, each an insertion or deletion of a contiguous chunk of text in a sentence; contains ~43 million edits across 8 languages.
Twitter100k (⭐ 43)
Describing_a_knowledge_base (⭐ 42): Code for "Describing a Knowledge Base".
Historical Populations (⭐ 39): Historical US city populations.
Small Open Datasets (⭐ 35): A collection of automatically updated, ready-to-use, open-licensed datasets.
Web Scraping Using Python (⭐ 33): Scrapes Wikipedia articles using BeautifulSoup to create a dataset, then analyzes the collected data.
Wiki Cs Dataset (⭐ 32): Wiki-CS, a Wikipedia-based benchmark for graph neural networks.
Wikipedia Summary Dataset (⭐ 27): All titles and summaries (introductions) of English Wikipedia articles, extracted in September 2017. Useful for research on the smaller, more concise, and more definitional summaries, or as a smaller but still diverse dataset for efficient training under resource constraints.
Wikir (⭐ 25): A Python tool for building large-scale Wikipedia-based information retrieval datasets.
Opencompare (⭐ 25)
Re Nlg Dataset (⭐ 25): T-REx, a large-scale alignment of natural language with knowledge-base triples.
Chimpmark (⭐ 16): ChimpMARK-2010, a collection of massive real-world datasets, interesting real-world problems, and simple example code to solve them. Learn big-data processing, benchmark your cluster, or compete on implementation.
Head Qa (⭐ 15): HEAD-QA, a healthcare dataset for complex reasoning.
Documentclip (⭐ 14)
Cite Classifications Wiki (⭐ 14): Citation classification for Wikipedia references using a hybrid neural network model.
Infotabs Code (⭐ 14): Implementation of the semi-structured inference model from the ACL 2020 paper "INFOTABS: Inference on Tables as Semi-structured Data".
Selqa (⭐ 13): Selection-based question answering.
Textent (⭐ 13): Representation learning of entities and documents from knowledge-base descriptions.
Salie (⭐ 13): Salient open information extraction.
Opiec (⭐ 12): Reading the data from OPIEC, an open information extraction corpus.
Word2vec (⭐ 11): Re-implementation of word2vec using TensorFlow v2 Estimators and Datasets.
Wikibrain (⭐ 11): Wikipedia graph mining and the dynamic structure of collective memory.
Wikipedia Title Dataset (⭐ 10): Dataset used for "Learning Character-level Compositionality with Visual Features" (ACL 2017).
Pywikimm (⭐ 9): Collects a multimodal dataset of Wikipedia articles and their images.
Extract Wec (⭐ 8): Extracts links from Wikipedia pages to create a cross-document coreference dataset (multilingual support).
Aspect Based Summarization (⭐ 8): "Summarizing Text on Any Aspects" (EMNLP 2020).
Expanda (⭐ 8): The universal integrated corpus-building environment.
Are The Bots Really Fighting (⭐ 8): A research project exploring revert patterns between bots on Wikipedia.
Wikisection (⭐ 7): Dataset for coherent topic segmentation and classification.
Wikipedia Categories (⭐ 7): Cleansing Wikipedia categories using centrality.
Datasets2sqlite (⭐ 7): Scripts to convert some datasets to SQLite format.
Tarantool Wiki Lookup (⭐ 7): Wikipedia category graph lookup.
Pioner (⭐ 6): Named-entity datasets and GloVe models for the Armenian language.
Wikiloop Datasets (⭐ 6)
Java Datasets (⭐ 5): Java library for parsing various datasets: the ENRON email dataset, Wikipedia web pages, DBLP papers, Reuters news, and more.
Toxic Comments Classification (⭐ 5): Wikipedia comments labeled by human raters for toxic behavior.
Nationalitylist (⭐ 5): List of nationalities in English and French.
Susamuru (⭐ 5)
Aligning Reddit And Wikipedia (⭐ 5)
Ufc Data (⭐ 5)
Copyright 2018-2024 Awesome Open Source. All rights reserved.