Awesome Open Source
Search results for shell corpus
52 search results found
Tools_for_corpus_of_people_daily
⭐
200
People's Daily corpus processing toolkit | Tools for Corpus of People's Daily
Kaldi Tuda De
⭐
165
Scripts for training general-purpose large vocabulary German acoustic models for ASR with Kaldi.
Sejong Corpus
⭐
103
Korean sejong corpus download and simple analysis
Gwordlist
⭐
68
All the words from Google Books, sorted by frequency
Finbert
⭐
61
BERT model trained from scratch on Finnish
Jparacrawl Finetune
⭐
57
An example usage of JParaCrawl pre-trained Neural Machine Translation (NMT) models.
Odsqa
⭐
41
ODSQA: Open-Domain Spoken Question Answering Dataset
Laborotvspeech
⭐
37
Voxceleb
⭐
34
mirror of VoxCeleb dataset - a large-scale speaker identification dataset
Mecab Ko Dic Msvc
⭐
28
Moved to https://github.com/Pusnow/mecab-ko-msvc
Speech.ko
⭐
26
Korean read speech corpus (about 120 hours, 17GB) from National Institute of Korean Language
Ldc_downloader
⭐
25
Script to download corpora from the Linguistic Data Consortium (LDC)
Awesome Azeri Nlp
⭐
24
Azerbaijani language processing software, models and datasets.
Easy Kaldi
⭐
23
Use your data to create a speech recognition system in Kaldi. Fast.
Easyme
⭐
16
An implementation of Maximum Entropy model
Clarinstudiokaldi
⭐
15
A baseline Automatic Speech Recognition system for Polish based on Kaldi.
Bitmab2 Tutorials
⭐
14
Workshop materials for the Second Benthic Invertebrates, Metagenomics, and Bioinformatics Workshop at the TAMUCC Harte Institute in Corpus Christi, TX (January 15-19, 2018)
Fade
⭐
12
A Simulation Framework for Auditory Discrimination Experiments
Arabic Speech Recognition
⭐
12
This repository contains my attempt to use two well-known speech recognition frameworks (Kaldi, CMU Sphinx4) for the Arabic language, using the publicly available dataset "Arabic Corpus of Isolated Words"
Arabert
⭐
12
Arabic Language Model based on Bert
Google Ngrams
⭐
12
Shell scripts to assist downloading & processing the Google n-grams corpora
Awesome Kyrgyz Nlp
⭐
12
Kyrgyz language processing software, models and datasets.
Kaldifordummies
⭐
11
Simple automatic speech recognition system based on a digits corpus (Polish language), created with the Kaldi toolkit. Despite the language difference, it is a result of following the 'Kaldi for dummies' tutorial published in the kaldi-help discussion group. No audio data is included; this is just an example.
Torgo_asr
⭐
10
A Kaldi recipe for training automatic speech recognition systems on the Torgo corpus of dysarthric speech
Chinese Asr
⭐
10
Chinese ASR built on Kaldi
Uaspeech
⭐
10
Baseline Kaldi script for the UA-SPEECH corpus
Wat2017
⭐
10
Scripts for re-building NTT neural machine translation systems for WAT 2017
Tedlium
⭐
9
Vagrant VM with full Kaldi TEDLIUM corpus
Docs
⭐
9
UCCA Documentation
Habeas Corpus
⭐
8
Command-line corpus tools
Kaldi Avsr
⭐
8
Kaldi-based audio-visual speech recognition
Asr Recipes
⭐
8
Mtaac_gold_corpus
⭐
8
Creating Enron Spam Corpus From Raw Data
⭐
7
Uses the raw Enron spam datasets to build a corpus with Python, NLTK, and shell scripts.
Mtaac_work
⭐
7
MTAAC work packages
Keyakitreebank
⭐
7
Keyaki Treebank Parsed Corpus
Sentiment Classifier Service
⭐
7
Self-contained service utilizing the NLTK for sentiment classification.
Akec
⭐
6
Arabic Keyphrase Extraction Corpus
Boyd Wnut2018
⭐
6
Code and data for: Low Resource Grammatical Error Correction Using Wikipedia Edits (WNUT 2018)
Pyshellitems
⭐
6
Python library and tools for handling shell items / property lists and stores / and extension blocks
Wasmtime Libfuzzer Corpus
⭐
6
libFuzzer corpus for our wasmtime fuzz targets
Tlingit Corpus
⭐
6
Text corpus of the Tlingit language for linguistic research.
Conll2012 Preprocess Parsing
⭐
6
Scripts for pre-processing the CoNLL-2012 dataset for syntactic dependency parsing.
Latvian Twitter Eater Corpus
⭐
6
Contains the Latvian tweet eater corpus.
Fuzz Nix
⭐
6
Fuzzing the Nix interpreter with afl-fuzz
Ceph Erasure Code Corpus
⭐
5
Objects erasure encoded by Ceph
Cluster Preprocessing
⭐
5
Preprocessing of large corpora to induce various cluster types
Iban
⭐
5
Moore_and_lewis_data_selection
⭐
5
Eark Ip Test Corpus
⭐
5
Test corpus of E-ARK information packages to test validator functionality against the specification.
Substring
⭐
5
The SubString package is an open-source set of Unix Shell scripts used for substring reduction and frequency consolidation of word n-grams of different length. In the process, the frequencies of substrings are reduced by the frequencies of their superstrings and a consolidated list with n-grams of different lengths is produced without an inflation of the overall word count. The functions performed by SubString will primarily be of interest to linguists working on formulaic language, multi-word s
Alvis Docker
⭐
5
Dockerizing Alvis and its components