Awesome Open Source
Search results for dataset wikipedia
44 search results found
Gensim Data (⭐ 492): Data repository for pretrained NLP models and NLP corpora.
Narrativeqa (⭐ 362): The NarrativeQA dataset, including the list of documents with Wikipedia summaries, links to full stories, and questions and answers.
Chakin (⭐ 313): Simple downloader for pre-trained word vectors.
Wikitables (⭐ 279): Import tables from any Wikipedia article as a dataset in Python.
Datasets (⭐ 192): Interesting datasets you could use with Algolia.
Qb (⭐ 160): QANTA Quiz Bowl AI.
Legislator (⭐ 90): Interface to the Comparative Legislators Database.
Ambigqa (⭐ 86): An original implementation of "AmbigQA: Answering Ambiguous Open-domain Questions" (EMNLP 2020).
Awesome Wikipedia (⭐ 76): A curated list of awesome Wikipedia-related frameworks, libraries, software, datasets, and references.
Text Segmentation (⭐ 73): Implementation of the paper "Text Segmentation as a Supervised Learning Task".
Wiki Split (⭐ 72): One million English sentences, each split into two sentences that together preserve the original meaning, extracted from Wikipedia edits.
Toolbox (⭐ 67): A collection of tools, APIs, and other resources for creative coding web projects.
Wikipedia_ner (⭐ 56): 📖 Labeled examples from wiki dumps in Python.
Tagme Reproducibility (⭐ 55): Reproducibility of the TAGME entity linking system.
Deep Exponential Families (⭐ 53): Deep exponential families (DEFs).
Wiki Atomic Edits (⭐ 47): A dataset of atomic Wikipedia edits, each an insertion or deletion of a contiguous chunk of text in a sentence; contains ~43 million edits across 8 languages.
Twitter100k (⭐ 43)
Describing_a_knowledge_base (⭐ 42): Code for "Describing a Knowledge Base".
Historical Populations (⭐ 39): Historical US city populations.
Small Open Datasets (⭐ 35): A collection of automatically updated, ready-to-use, open-licensed datasets.
Web Scraping Using Python (⭐ 33): Scrapes Wikipedia articles using BeautifulSoup to create a dataset, then analyzes the collected data.
Wiki Cs Dataset (⭐ 32): Wiki-CS, a Wikipedia-based benchmark for graph neural networks.
Wikipedia Summary Dataset (⭐ 27): All titles and summaries (introductions) of English Wikipedia articles, extracted in September 2017. Useful for research on the smaller, more concise, and more definitional summaries, or as a smaller but still diverse dataset for efficient training under resource constraints.
Wikir (⭐ 25): A Python tool for building large-scale Wikipedia-based information retrieval datasets.
Opencompare (⭐ 25)
Re Nlg Dataset (⭐ 25): T-REx, a large-scale alignment of natural language with knowledge-base triples.
Chimpmark (⭐ 16): ChimpMARK-2010, a collection of massive real-world datasets, interesting real-world problems, and simple example code to solve them. Learn big-data processing, benchmark your cluster, or compete on implementation.
Head Qa (⭐ 15): HEAD-QA, a healthcare dataset for complex reasoning.
Documentclip (⭐ 14)
Cite Classifications Wiki (⭐ 14): Citation classification for Wikipedia references using a hybrid neural network model.
Infotabs Code (⭐ 14): Implementation of the semi-structured inference model from the ACL 2020 paper "INFOTABS: Inference on Tables as Semi-structured Data".
Selqa (⭐ 13): Selection-based question answering.
Textent (⭐ 13): Representation learning of entities and documents from knowledge-base descriptions.
Salie (⭐ 13): Salient open information extraction.
Opiec (⭐ 12): Reading the data from OPIEC, an open information extraction corpus.
Word2vec (⭐ 11): Re-implementation of word2vec using TensorFlow v2 Estimators and Datasets.
Wikibrain (⭐ 11): Wikipedia graph mining and the dynamic structure of collective memory.
Wikipedia Title Dataset (⭐ 10): Dataset used for "Learning Character-level Compositionality with Visual Features" (ACL 2017).
Pywikimm (⭐ 9): Collects a multimodal dataset of Wikipedia articles and their images.
Extract Wec (⭐ 8): Extracts links from Wikipedia pages to create a cross-document coreference dataset (multilingual support).
Aspect Based Summarization (⭐ 8): "Summarizing Text on Any Aspects" (EMNLP 2020).
Expanda (⭐ 8): The universal integrated corpus-building environment.
Are The Bots Really Fighting (⭐ 8): A research project exploring revert patterns between bots on Wikipedia.
Wikisection (⭐ 7): Dataset for coherent topic segmentation and classification.
Wikipedia Categories (⭐ 7): Cleansing Wikipedia categories using centrality.
Datasets2sqlite (⭐ 7): Scripts to convert some datasets to SQLite format.
Tarantool Wiki Lookup (⭐ 7): Wikipedia category graph lookup.
Pioner (⭐ 6): Named-entity datasets and GloVe models for the Armenian language.
Wikiloop Datasets (⭐ 6)
Java Datasets (⭐ 5): Java library for parsing various datasets: the ENRON email dataset, Wikipedia web pages, DBLP papers, Reuters news, and more.
Toxic Comments Classification (⭐ 5): Wikipedia comments labeled by human raters for toxic behavior.
Nationalitylist (⭐ 5): List of nationalities in English and French.
Susamuru (⭐ 5)
Aligning Reddit And Wikipedia (⭐ 5)
Ufc Data (⭐ 5)
Copyright 2018-2024 Awesome Open Source. All rights reserved.