Project Name | Stars | Most Recent Commit | Open Issues | License | Language | Description |
---|---|---|---|---|---|---|
Nlp_chinese_corpus | 8,344 | 4 months ago | 20 | mit | | Large Scale Chinese Corpus for NLP |
Bert Ner | 1,000 | 3 years ago | 71 | mit | Python | Use Google's BERT for named entity recognition (CoNLL-2003 as the dataset). |
Cluecorpus2020 | 517 | a year ago | 8 | mit | | Large-scale Pre-training Corpus for Chinese (100G) |
Patents Public Data | 441 | 3 months ago | 36 | apache-2.0 | Jupyter Notebook | Patent analysis using the Google Patents Public Datasets on BigQuery |
Mseg Semantic | 366 | 2 years ago | 9 | mit | Python | An Official Repo of CVPR '20 "MSeg: A Composite Dataset for Multi-Domain Segmentation" |
Attention_is_all_you_need | 293 | 6 years ago | 1 | bsd-3-clause | Jupyter Notebook | Transformer of "Attention Is All You Need" (Vaswani et al. 2017) by Chainer. |
Opentopodata | 211 | 6 months ago | 1 | mit | Python | Open alternative to the Google Elevation API! |
Landmark2019 1st And 3rd Place Solution | 198 | 3 years ago | 5 | apache-2.0 | Python | The 1st Place Solution of the Google Landmark 2019 Retrieval Challenge and the 3rd Place Solution of the Recognition Challenge. |
Quickdraw | 192 | a year ago | 10 | mit | Python | A simple implementation of Google's Quick, Draw Project for humans. 🖌️ 🖼️ |
Frvsr | 141 | 4 years ago | | | | Frame-Recurrent Video Super-Resolution (official repository) |

A comparison of 5 different packages:
For a more detailed description of the process and results, please refer to the following blog post.
The benchmark was run on a Google Compute Engine n1-standard-16 instance (16 vCPUs, Haswell 2.3 GHz, 60 GB memory).
Each algorithm was run 100 times on the Amazon and Google datasets and 10 times on the Pokec dataset, with the exception of NetworkX.
The median run time is shown in the table below. Due to differences in profiling techniques and code implementation, the results may differ. Please refer to the respective code bases for implementation details.
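As a rough illustration of how a median run time over repeated runs can be measured, here is a minimal Python sketch. The helper name `median_runtime` and the stand-in workload are hypothetical and are not taken from the benchmark's code base, which may use a different profiling technique.

```python
import statistics
import time


def median_runtime(func, repetitions):
    """Call func() `repetitions` times and return the median wall-clock time in seconds.

    Hypothetical helper for illustration; not the repository's profiler.
    """
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would call a graph algorithm here.
    def workload():
        sum(i * i for i in range(10_000))

    print(f"median run time: {median_runtime(workload, 100):.6f} s")
```

The median is less sensitive than the mean to occasional slow runs, which is one plausible reason to report it for repeated timings.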
Setup and installation instructions can be found in `setup.md`.
Datasets are downloaded from https://snap.stanford.edu/data/ and stored in the `data` folder. Amazon refers to `amazon0302`, Google to `web-Google`, and Pokec to `soc-Pokec`. A `download_data.sh` script is provided in the `data` folder to automate the download and pre-processing of the SNAP datasets.
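For readers who want to inspect a dataset by hand, the sketch below parses an edge list in plain Python. It assumes the usual SNAP text layout: optional comment lines starting with `#`, followed by one whitespace-separated pair of integer node IDs per line. The function name `read_edge_list` is hypothetical, and the files produced by the pre-processing step may be laid out differently.

```python
def read_edge_list(lines):
    """Parse edge-list lines into (source, target) integer pairs.

    Assumes '#' comment lines and whitespace-separated node IDs;
    hypothetical helper, not part of the benchmark code.
    """
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        source, target = line.split()[:2]
        edges.append((int(source), int(target)))
    return edges


if __name__ == "__main__":
    sample = ["# FromNodeId\tToNodeId", "0\t1", "0\t2", "1\t2"]
    print(read_edge_list(sample))  # [(0, 1), (0, 2), (1, 2)]
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, for example `read_edge_list(open("data/amazon0302.txt"))`.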
Profiling code is located in the `code` folder. A particular benchmark can be run using the helper bash script `run_profiler.sh [profiling code] [dataset path] [number of repetitions] [output path]`. For example, to replicate the igraph benchmark on the Amazon dataset with 100 repetitions, run `run_profiler.sh code/igraph_profile.py data/amazon0302.txt 100 output/igraph_amazon.txt`.
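When scripting several runs, the four positional arguments described above can be assembled programmatically. The following is a minimal sketch under two assumptions: that the script is invoked through `bash` from the repository root, and that the helper name `profiler_command` (which is hypothetical) suits your workflow. It only builds and prints the command; it does not execute it.

```python
import shlex


def profiler_command(profile_script, dataset_path, repetitions, output_path):
    """Build the argument list for run_profiler.sh in its documented order.

    Hypothetical helper; invoking the script via `bash` is an assumption.
    """
    return [
        "bash",
        "run_profiler.sh",
        profile_script,
        dataset_path,
        str(repetitions),
        output_path,
    ]


if __name__ == "__main__":
    command = profiler_command(
        "code/igraph_profile.py",
        "data/amazon0302.txt",
        100,
        "output/igraph_amazon.txt",
    )
    # Print a shell-quoted form; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```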