Domain Adaptive Semi-supervised Learning (DAS)

Code and dataset for the EMNLP 2018 paper "Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification" (pdf).

Datasets & pre-trained word embeddings

You can download the datasets (small-scale, large-scale, and amazon-benchmark) at [Download]. Decompress the zip file and put its contents in the root directory.

Download the pre-trained GloVe vectors [glove.840B.300d.zip]. Decompress the zip file and put the txt file in the root directory.
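
For example, from the repository root (dataset.zip below is only a placeholder name; substitute the archive actually obtained from the [Download] link):

# placeholder archive name; use the file obtained from the [Download] link
unzip dataset.zip -d .
# the GloVe archive unpacks to glove.840B.300d.txt in the root directory
unzip glove.840B.300d.zip -d .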

Training & evaluation

All arguments and hyper-parameters are defined in train_batch.py together with their default values.

Under code/, use the following command to train on any source-target pair from the small-scale dataset:

CUDA_VISIBLE_DEVICES="0" python train_batch.py \
--emb ../glove.840B.300d.txt \
--dataset $dataset \
--source $source \
--target $target

where --emb is the path to the pre-trained word embeddings. $dataset in ['small_1', 'small_2'] denotes experimental settings 1 and 2, respectively, on the small-scale dataset. $source and $target are domains from the small-scale dataset, both in ['book', 'electronics', 'beauty', 'music']. All other hyper-parameters are left at their defaults.
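
For example, a simple shell loop (a usage sketch that only repeats the command above) trains all source-target pairs of setting 1 on the small-scale dataset:

for source in book electronics beauty music; do
  for target in book electronics beauty music; do
    if [ "$source" != "$target" ]; then
      CUDA_VISIBLE_DEVICES="0" python train_batch.py \
        --emb ../glove.840B.300d.txt \
        --dataset small_1 \
        --source $source \
        --target $target
    fi
  done
done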

To train on any source-target pair from the large-scale dataset, use:

CUDA_VISIBLE_DEVICES="0" python train_batch.py \
--emb ../glove.840B.300d.txt \
--dataset large \
--source $source \
--target $target \
-b 250 \
--weight-entropy 0.2 \
--weight-discrepancy 500

where $source and $target are domains from the large-scale dataset, both in ['imdb', 'yelp2014', 'cell_phone', 'baby']. The batch size -b is set to 250. The weights of the target entropy loss and the discrepancy loss are set to 0.2 and 500, respectively. All other hyper-parameters are left at their defaults.

To train on any source-target pair from the amazon benchmark, use:

CUDA_VISIBLE_DEVICES="0" python train_batch.py \
--emb ../glove.840B.300d.txt \
--dataset amazon \
--source $source \
--target $target \
--n-class 2

where $source and $target are domains from the amazon benchmark, both in ['book', 'dvd', 'electronics', 'kitchen']. --n-class, which denotes the number of output classes, is set to 2, since only binary classification (positive or negative) is considered on this dataset. All other hyper-parameters are left at their defaults.

During training, the model's performance is evaluated on the development set at the end of each epoch. Accuracy and macro-F1 score on the test set are recorded at the epoch where the model achieves its best classification accuracy on the development set.

About the adaptation results

You can find the numerical results in Table 3 and Table 4 of the Appendix. The current version of the code improves the batch sampling for the large-scale dataset. Running this code yields an average macro-F1 improvement of about 2% across all source-target pairs on the large-scale dataset compared to the results in Table 4 (c). The results on the small-scale dataset and the amazon benchmark are not affected.

Dependencies

The code was only tested in the environment below (an example install command follows the list):

  • Python 2.7
  • Keras 2.1.2
  • tensorflow 1.4.1
  • numpy 1.13.3
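
Assuming a working Python 2.7 interpreter with pip, the pinned versions above can be installed with the command below (package names as published on PyPI; using a virtualenv is optional):

# install the pinned versions listed above; use tensorflow-gpu==1.4.1 instead if training on a GPU
pip install Keras==2.1.2 tensorflow==1.4.1 numpy==1.13.3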

Cite

If you use the code, please cite the following paper:

@InProceedings{he-EtAl:2018,
  author    = {He, Ruidan  and  Lee, Wee Sun  and  Ng, Hwee Tou  and  Dahlmeier, Daniel},
  title     = {Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
  year      = {2018},
  publisher = {Association for Computational Linguistics}
}