Product Title Summarization (PTS) Corpus

Dataset for CIKM 2018 paper "Multi-Source Pointer Network for Product Title Summarization"

Description

Each line in corpus.txt contains a pair of titles (original title, short title) together with the product's brand and commodity name. The fields are separated by two consecutive tab characters, in the following format:

<original title>\t\t<short title>\t\t<brand>\t\t<commodity name>
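
The line format above can be read with a few lines of Python. This is a minimal sketch rather than an official loader: the names `TitlePair`, `parse_line`, and `read_corpus` are hypothetical, and UTF-8 encoding is an assumption that the README does not state.

```python
from typing import Iterator, NamedTuple, Optional

# Fields are separated by two consecutive tabs, as described above.
FIELD_SEP = "\t\t"


class TitlePair(NamedTuple):
    original_title: str
    short_title: str
    brand: str
    commodity_name: str


def parse_line(line: str) -> Optional[TitlePair]:
    """Parse one corpus line; return None unless it has exactly four fields."""
    fields = line.rstrip("\r\n").split(FIELD_SEP)
    if len(fields) != 4:
        return None
    return TitlePair(*fields)


def read_corpus(path: str, encoding: str = "utf-8") -> Iterator[TitlePair]:
    """Yield one TitlePair per well-formed line (the encoding is an assumption)."""
    with open(path, encoding=encoding) as handle:
        for line in handle:
            record = parse_line(line)
            if record is not None:
                yield record
```

Calling `read_corpus("corpus.txt")` would then iterate over the records. Malformed lines are skipped silently here, so count them instead if you need to audit the file.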

Files

  • corpus: the dataset used in the CIKM 2018 paper; the length of every short title is < 11.

  • big_corpus: a much larger dataset; the length of every short title is < 13.

    Because github.com limits each file to less than 100 MB, the archive is split into 5 files with the prefix big_corpus.tar.gz_.

    To reconstruct the big_corpus file:

    cd big_corpus
    cat big_corpus.tar.gz_* > big_corpus.tar.gz
    tar zxvf big_corpus.tar.gz
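
Where `cat` and `tar` are not available (for example on Windows), the same reconstruction can be sketched in Python. The function names `rebuild_archive` and `extract_archive` are hypothetical, and the sketch assumes that sorting the part file names alphabetically gives the correct order, which is the order the shell glob above uses.

```python
import glob
import os
import shutil
import tarfile


def rebuild_archive(parts_dir: str,
                    prefix: str = "big_corpus.tar.gz_",
                    archive_name: str = "big_corpus.tar.gz") -> str:
    """Concatenate the split parts, in sorted name order, into one archive."""
    parts = sorted(glob.glob(os.path.join(parts_dir, prefix + "*")))
    if not parts:
        raise FileNotFoundError(f"no files starting with {prefix!r} in {parts_dir!r}")
    archive_path = os.path.join(parts_dir, archive_name)
    with open(archive_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streams, so parts are not loaded whole
    return archive_path


def extract_archive(archive_path: str, dest_dir: str) -> None:
    """Unpack the gzip-compressed tar archive into dest_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=dest_dir)
```

For example, `extract_archive(rebuild_archive("big_corpus"), "big_corpus")` mirrors the three shell commands above.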
    

Note:

For some products, the brand field contains the brand name in several languages, separated by “/”, e.g., Nintendo/任天堂.
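
A multi-language brand value can be split into its variants as sketched below. The helper name `split_brand` is hypothetical, and the sketch assumes that “/” never occurs inside a single brand name, which the README does not guarantee.

```python
from typing import List


def split_brand(brand: str) -> List[str]:
    """Split a brand field on "/" and drop empty or whitespace-only pieces."""
    # Assumption: "/" is used only as the separator between language variants.
    return [part.strip() for part in brand.split("/") if part.strip()]
```

With the example from the note, `split_brand("Nintendo/任天堂")` gives `["Nintendo", "任天堂"]`, while a brand without a separator comes back as a one-element list.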

Citation

@inproceedings{Sun:CIKM2018,
author = {Fei Sun and Peng Jiang and Hanxiao Sun and Changhua Pei and Wenwu Ou and Xiaobo Wang},
title = {{Multi-Source Pointer Network for Product Title Summarization}},
booktitle = {CIKM},
year = 2018
}