Code and dataset for the EMNLP 2018 paper "Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification". (pdf)
You can download the datasets (small-scale, large-scale, and amazon-benchmark) at [Download]. The zip file should be decompressed and put in the root directory.
Download the pre-trained GloVe vectors [glove.840B.300d.zip]. Decompress the zip file and put the txt file in the root directory.
All arguments and hyper-parameters, together with their default values, are defined in train_batch.py.
Under code/, use the following command to train on any source-target pair from the small-scale dataset:

```
CUDA_VISIBLE_DEVICES="0" python train_batch.py \
    --emb ../glove.840B.300d.txt \
    --dataset $dataset \
    --source $source \
    --target $target
```
where --emb is the path to the pre-trained word embeddings. $dataset in ['small_1', 'small_2'] denotes experimental settings 1 and 2, respectively, on the small-scale dataset. $source and $target are domains from the small-scale dataset, both in ['book', 'electronics', 'beauty', 'music']. All other hyper-parameters are left at their defaults.
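To reproduce the full grid of experiments, every ordered source-target pair must be trained. The driver sketch below is not part of this repository; it only assembles the command lines shown above (one per pair) rather than launching them, so you can inspect or adapt them before running.

```python
import itertools
import subprocess  # only needed if you uncomment the launch loop below

DOMAINS = ["book", "electronics", "beauty", "music"]

def build_commands(dataset):
    """Build one train_batch.py invocation per ordered (source, target) pair."""
    commands = []
    for source, target in itertools.permutations(DOMAINS, 2):
        commands.append([
            "python", "train_batch.py",
            "--emb", "../glove.840B.300d.txt",
            "--dataset", dataset,
            "--source", source,
            "--target", target,
        ])
    return commands

cmds = build_commands("small_1")
print(len(cmds))  # 4 domains -> 12 ordered pairs
# for cmd in cmds:
#     subprocess.run(cmd, check=True)
```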
To train on any source-target pair from the large-scale dataset, use:
```
CUDA_VISIBLE_DEVICES="0" python train_batch.py \
    --emb ../glove.840B.300d.txt \
    --dataset large \
    --source $source \
    --target $target \
    -b 250 \
    --weight-entropy 0.2 \
    --weight-discrepancy 500
```
where $source and $target are domains from the large-scale dataset, both in ['imdb', 'yelp2014', 'cell_phone', 'baby']. The batch size -b is set to 250. The weights of the target entropy loss and the discrepancy loss are set to 0.2 and 500, respectively. All other hyper-parameters are left at their defaults.
To train on any source-target pair from the amazon benchmark, use:
```
CUDA_VISIBLE_DEVICES="0" python train_batch.py \
    --emb ../glove.840B.300d.txt \
    --dataset amazon \
    --source $source \
    --target $target \
    --n-class 2
```
where $source and $target are domains from the amazon benchmark, both in ['book', 'dvd', 'electronics', 'kitchen']. --n-class, the number of output classes, is set to 2 since we only consider binary classification (positive or negative) on this dataset. All other hyper-parameters are left at their defaults.
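The three invocations above differ only in a few flags. A small helper can capture the per-dataset overrides so they are stated once; the helper itself (train_args and the OVERRIDES table) is hypothetical and simply restates the flag values given in the commands above.

```python
# Per-dataset flag overrides, exactly as given in the commands above.
OVERRIDES = {
    "small_1": [],
    "small_2": [],
    "large": ["-b", "250", "--weight-entropy", "0.2", "--weight-discrepancy", "500"],
    "amazon": ["--n-class", "2"],
}

def train_args(dataset, source, target, emb="../glove.840B.300d.txt"):
    """Assemble the argument list for one train_batch.py run."""
    return (["--emb", emb, "--dataset", dataset,
             "--source", source, "--target", target]
            + OVERRIDES[dataset])

print(train_args("amazon", "book", "dvd")[-2:])  # ['--n-class', '2']
```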
During training, the model is evaluated on the development set at the end of each epoch. Accuracy and macro-F1 score on the test set are recorded at the epoch where the model achieves its best classification accuracy on the development set.
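The selection rule above can be illustrated with a short sketch. The macro_f1 helper and the per-epoch records below are illustrative only, not code from this repository; the actual evaluation lives in train_batch.py.

```python
def macro_f1(y_true, y_pred, classes):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# One (illustrative) record per epoch: (dev accuracy, test accuracy, test macro-F1).
epochs = [(0.78, 0.75, 0.73), (0.84, 0.80, 0.79), (0.81, 0.82, 0.80)]
best = max(epochs, key=lambda e: e[0])  # pick the epoch with the best dev accuracy
print(best[1], best[2])  # test metrics reported at that epoch -> 0.8 0.79
```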
You can find the numerical results in Appendix Table 3 and Table 4. The current version of the code improves batch sampling for the large-scale dataset. Running this code yields an average macro-F1 improvement of about 2% across all source-target pairs on the large-scale dataset compared to the results in Table 4 (c). The results on the small-scale dataset and the amazon benchmark are not affected.
The code was only tested under the environment below:
If you use the code, please cite the following paper:
```
@InProceedings{he-EtAl:2018,
  author    = {He, Ruidan and Lee, Wee Sun and Ng, Hwee Tou and Dahlmeier, Daniel},
  title     = {Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
  year      = {2018},
  publisher = {Association for Computational Linguistics}
}
```