This repository contains:
Data: Named Entity Recognition (NER) for Chinese Social Media (Weibo). This dataset contains messages selected from Weibo and annotated according to the DEFT ERE annotation guidelines. Annotations include both name and nominal mentions. The corpus contains 1,890 messages sampled from Weibo between November 2013 and December 2014.
golden-horse: A neural-based NER tool for Chinese social media.
We fixed some inconsistencies in the data, especially in the annotations for the nominal mentions. We thank Hangfeng He for his contribution to the major cleanup and revision of the annotations.
The original and revised annotated data are both made available in the data/ directory, with prefixes weiboNER.conll and weiboNER_2nd_conll, respectively.
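For reference, a file in this CoNLL-style layout can be loaded with a few lines of Python. This is a minimal sketch that assumes whitespace-separated token/tag columns with blank lines between messages; verify the separator and tag inventory against the actual files in data/:

```python
def read_conll(path):
    """Read a CoNLL-style file: one whitespace-separated `token tag`
    pair per line, blank lines separating sentences.
    Returns a list of (tokens, tags) pairs."""
    sentences, tokens, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                 # blank line = sentence boundary
                if tokens:
                    sentences.append((tokens, tags))
                    tokens, tags = [], []
            else:
                tok, tag = line.split()[:2]
                tokens.append(tok)
                tags.append(tag)
    if tokens:                           # file may not end with a blank line
        sentences.append((tokens, tags))
    return sentences
```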
We include updated results of our models on the revised version of the data in the supplementary material golden_horse_supplement.pdf. If you want to compare with our models on the revised data, please refer to this supplementary material. Thanks!
If you use the revised dataset, please cite the following BibTeX entry in addition to our papers:
@article{HeS16,
author={Hangfeng He and Xu Sun},
title={F-Score Driven Max Margin Neural Network for Named Entity Recognition in Chinese Social Media},
journal={CoRR},
volume={abs/1611.04234},
year={2016}
}
This code is the implementation of the papers:
Named Entity Recognition for Chinese Social Media with Jointly Trained Embeddings
Nanyun Peng and Mark Dredze
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015
and
Improving Named Entity Recognition for Chinese Social Media
with Word Segmentation Representation Learning
Nanyun Peng and Mark Dredze
Annual Meeting of the Association for Computational Linguistics (ACL), 2016
If you use the code, please cite the following BibTeX entries:
@inproceedings{peng2015ner,
title={Named Entity Recognition for Chinese Social Media with Jointly Trained Embeddings},
author={Peng, Nanyun and Dredze, Mark},
booktitle={Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
pages={548--554},
year={2015},
File={https://www.aclweb.org/anthology/D15-1064/},
}
@inproceedings{peng2016improving,
title={Improving named entity recognition for Chinese social media with word segmentation representation learning},
author={Peng, Nanyun and Dredze, Mark},
booktitle={Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL)},
volume={2},
pages={149--155},
year={2016},
File={https://www.aclweb.org/anthology/P16-2025/},
}
This is a Theano implementation; it requires the following Python modules:
Theano
jieba (a Chinese word segmenter)
Both can be installed with pip install moduleName.
The LSTM layer was adapted from http://deeplearning.net/tutorial/lstm.html, and the feature extraction part was adapted from CRFsuite: http://www.chokkan.org/software/crfsuite/
To train on the original annotations:
python theano_src/crf_ner.py --nepochs 30 --neval_epochs 1 --training_data data/weiboNER.conll.train --valid_data data/weiboNER.conll.dev --test_data data/weiboNER.conll.test --emb_file embeddings/weibo_charpos_vectors --emb_type charpos --save_model_param weibo_best_parameters --emb_init true --eval_test False
To train on the revised annotations:
python theano_src/crf_ner.py --nepochs 30 --neval_epochs 1 --training_data data/weiboNER_2nd_conll.train --valid_data data/weiboNER_2nd_conll.dev --test_data data/weiboNER_2nd_conll.test --emb_file embeddings/weibo_charpos_vectors --emb_type char --save_model_param weibo_best_parameters --emb_init true --eval_test False
To tag a test file with a saved model:
python theano_src/crf_ner.py --test_data data/weiboNER.conll.test --only_test true --output_dir data/ --save_model_param weibo_best_parameters
In this example, the predictions will be written to output_dir/weiboNER.conll.test.prediction. If you also want to see the evaluation (you must have labeled test data), add the flag --eval_test True.
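If you want to score a prediction file yourself, a span-level precision/recall/F1 can be computed from BIO tags. The helper below is a generic sketch, not the repository's own scorer, and assumes you have already paired up gold and predicted tag sequences (stray I- tags that do not continue an open span are ignored):

```python
def bio_spans(tags):
    """Extract (start, end, type) spans from a BIO tag sequence.
    A stray I- tag that does not continue the current span is ignored."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):   # "O" sentinel flushes the last span
        boundary = (tag == "O" or tag.startswith("B-")
                    or (tag.startswith("I-") and tag[2:] != etype))
        if boundary:
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans

def span_f1(gold_seqs, pred_seqs):
    """Micro-averaged span-level precision, recall, and F1."""
    tp = ngold = npred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)
        ngold += len(g)
        npred += len(p)
    prec = tp / npred if npred else 0.0
    rec = tp / ngold if ngold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

An exact span match requires boundaries and type to agree, which is the usual convention for NER evaluation.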
To jointly train NER with Chinese word segmentation (the ACL 2016 model):
python theano_src/jointSegNER.py --cws_train_path data/pku_training.utf8 --cws_valid_path data/pku_test_gold.utf8 --cws_test_path data/pku_test_gold.utf8 --ner_train_path data/weiboNER_2nd_conll.train --ner_valid_path data/weiboNER_2nd_conll.dev --ner_test_path data/weiboNER_2nd_conll.test --emb_init file --emb_file embeddings/weibo_charpos_vectors --lr 0.05 --nepochs 30 --train_mode joint --cws_joint_weight 0.7 --m1_wemb1_dropout_rate 0.1
The last three parameters and the learning rate can be tuned. In our experiments, we found that for named mentions the best combination is (joint, 0.7, 0.1); for nominal mentions, the best combination is (alternative, 1.0, 0.1).
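One way to sweep the three tunable parameters and the learning rate is to generate one command line per combination and run them in turn. This is a hedged sketch: the flag names are copied from the jointSegNER.py command above, but the value grids are illustrative, not the grids used in the papers:

```python
from itertools import product

# Illustrative grids; --train_mode, --cws_joint_weight, and
# --m1_wemb1_dropout_rate are the three tunable flags, plus --lr.
train_modes = ["joint", "alternative"]
joint_weights = [0.5, 0.7, 1.0]
dropout_rates = [0.1, 0.2]
lrs = [0.05]

base = ("python theano_src/jointSegNER.py "
        "--ner_train_path data/weiboNER_2nd_conll.train "
        "--ner_valid_path data/weiboNER_2nd_conll.dev "
        "--ner_test_path data/weiboNER_2nd_conll.test")

commands = [
    f"{base} --lr {lr} --train_mode {mode} "
    f"--cws_joint_weight {w} --m1_wemb1_dropout_rate {d}"
    for mode, w, d, lr in product(train_modes, joint_weights, dropout_rates, lrs)
]
# 2 modes x 3 weights x 2 dropout rates x 1 lr = 12 runs
```

Each command can then be launched with subprocess.run, keeping the best combination according to the dev-set score.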
We noticed that several factors could affect the replicability of the experiments.
Note: the data we provide contains both named and nominal mentions; you can obtain a dataset with only named entities by simply filtering out the nominal mentions.
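The filtering step in the note above can be done mechanically. The sketch below assumes the tag scheme marks nominal mentions with a ".NOM" suffix (e.g. B-PER.NOM as opposed to B-PER.NAM), which is worth verifying against the files; it rewrites every nominal-mention tag to O:

```python
def drop_nominal(in_path, out_path):
    """Copy a two-column CoNLL-style file, replacing the tag of every
    nominal mention (assumed to carry a ".NOM" suffix) with "O" so
    that only named entities remain annotated."""
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            line = line.rstrip("\n")
            if not line:                # keep sentence boundaries
                fout.write("\n")
                continue
            tok, tag = line.split()[:2]
            if tag.endswith(".NOM"):    # nominal mention -> unlabeled
                tag = "O"
            fout.write(f"{tok} {tag}\n")
```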
The annotations in this repository are released according to the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC BY-SA 3.0). The messages themselves are selected from Weibo and follow Weibo's terms of service.