Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Corus | 254 | 9 months ago | 10 | July 24, 2023 | 66 | mit | Jupyter Notebook | |||
Links to Russian corpora + Python functions for loading and parsing | ||||||||||
Ted Multilingual Parallel Corpus | 152 | 8 years ago | 6 | |||||||
TED Parallel Corpora is a growing collection of bilingual parallel corpora, multilingual parallel corpora, and monolingual corpora extracted from TED talks (www.ted.com) for 109 world languages. | ||||||||||
Ud_russian Syntagrus | 77 | 5 months ago | 16 | other | Perl | |||||
Russian data from the SynTagRus corpus. | ||||||||||
Russian_news_corpus | 76 | 7 years ago | 1 | apache-2.0 | ||||||
Corpus of stemmed Russian mass media texts (lemmatized, i.e. morphologically normalized) | ||||||||||
Gpt 2 Training | 65 | 3 years ago | 7 | Python | ||||||
Training GPT-2 on a Russian language corpus | ||||||||||
Taiga_site | 54 | 4 years ago | 6 | CSS | ||||||
Nerus | 51 | 9 months ago | 7 | April 09, 2020 | mit | Python | ||||
Large silver-standard Russian corpus with NER, morphology, and syntax markup | ||||||||||
Morphorueval 2017 | 41 | 6 years ago | 13 | other | Python | |||||
Russian Ulmfit | 27 | 4 years ago | Jupyter Notebook | |||||||
AWD-LSTM language model trained on newspaper corpora with fast.ai | ||||||||||
Spacy_russian_tokenizer | 26 | 5 years ago | 1 | Python | ||||||
Custom Russian tokenizer for spaCy |