Awesome Open Source

Search results for shell corpus

52 search results found

Tools_for_corpus_of_people_daily ⭐ 200

Toolkit for processing the People's Daily corpus | Tools for Corpus of People's Daily

Kaldi Tuda De ⭐ 165

Scripts for training general-purpose large vocabulary German acoustic models for ASR with Kaldi.

Sejong Corpus ⭐ 103

Korean Sejong corpus download and simple analysis

Gwordlist ⭐ 68

All the words from Google Books, sorted by frequency

BERT model trained from scratch on Finnish

Jparacrawl Finetune ⭐ 57

An example usage of JParaCrawl pre-trained Neural Machine Translation (NMT) models.

ODSQA: OPEN-DOMAIN SPOKEN QUESTION ANSWERING DATASET

Laborotvspeech ⭐ 37

Voxceleb ⭐ 34

mirror of VoxCeleb dataset - a large-scale speaker identification dataset

Mecab Ko Dic Msvc ⭐ 28

Moved to https://github.com/Pusnow/mecab-ko-msvc

Speech.ko ⭐ 26

Korean read speech corpus (about 120 hours, 17GB) from National Institute of Korean Language

Ldc_downloader ⭐ 25

Script to download corpora from the Linguistic Data Consortium (LDC)

Awesome Azeri Nlp ⭐ 24

Azerbaijani language processing software, models and datasets.

Easy Kaldi ⭐ 23

Use your data to create a speech recognition system in Kaldi. Fast.

An implementation of Maximum Entropy model

Clarinstudiokaldi ⭐ 15

A baseline Automatic Speech Recognition system for Polish based on Kaldi.

Bitmab2 Tutorials ⭐ 14

Workshop materials for the Second Benthic Invertebrates, Metagenomics, and Bioinformatics Workshop at the TAMUCC Harte Institute in Corpus Christi, TX (January 15-19, 2018)

A Simulation Framework for Auditory Discrimination Experiments

Arabic Speech Recognition ⭐ 12

This repository contains my attempt to use two famous speech recognition frameworks (Kaldi and CMU Sphinx4) for the Arabic language, using the publicly available dataset "Arabic Corpus of Isolated Words"

Arabic Language Model based on Bert

Google Ngrams ⭐ 12

Shell scripts to assist downloading & processing the Google n-grams corpora

Awesome Kyrgyz Nlp ⭐ 12

Kyrgyz language processing software, models and datasets.

Kaldifordummies ⭐ 11

A simple automatic speech recognition system based on a digits corpus (Polish language), created with the Kaldi toolkit. Despite the language difference, it is a result of the 'Kaldi for dummies' tutorial published in the kaldi-help discussion group. No audio data is included; this is just an example.

Torgo_asr ⭐ 10

A Kaldi recipe for training automatic speech recognition systems on the Torgo corpus of dysarthric speech

Chinese Asr ⭐ 10

Chinese ASR built on Kaldi

Uaspeech ⭐ 10

Baseline Kaldi script for the UA-SPEECH corpus

Scripts for re-building NTT neural machine translation systems for WAT 2017

Vagrant VM with full Kaldi TEDLIUM corpus

UCCA Documentation

Habeas Corpus ⭐ 8

Command-line corpus tools

Kaldi Avsr ⭐ 8

Kaldi-based audio-visual speech recognition

Asr Recipes ⭐ 8

Mtaac_gold_corpus ⭐ 8

Creating Enron Spam Corpus From Raw Data ⭐ 7

Uses the raw data of the Enron spam datasets to create a corpus with Python, NLTK, and shell scripts.

Mtaac_work ⭐ 7

MTAAC work packages

Keyakitreebank ⭐ 7

Keyaki Treebank Parsed Corpus

Sentiment Classifier Service ⭐ 7

Self-contained service utilizing the NLTK for sentiment classification.

Arabic Keyphrase Extraction Corpus

Boyd Wnut2018 ⭐ 6

Code and data for: Low Resource Grammatical Error Correction Using Wikipedia Edits (WNUT 2018)

Pyshellitems ⭐ 6

Python library and tools for handling shell items / property lists and stores / and extension blocks

Wasmtime Libfuzzer Corpus ⭐ 6

libFuzzer corpus for our wasmtime fuzz targets

Tlingit Corpus ⭐ 6

Text corpus of the Tlingit language for linguistic research.

Conll2012 Preprocess Parsing ⭐ 6

Scripts for pre-processing the CoNLL-2012 dataset for syntactic dependency parsing.

Latvian Twitter Eater Corpus ⭐ 6

Contains the Latvian Twitter Eater Corpus.

Fuzzing the Nix interpreter with afl-fuzz

Ceph Erasure Code Corpus ⭐ 5

Objects erasure encoded by Ceph

Cluster Preprocessing ⭐ 5

Preprocessing of large corpora to induce various cluster types

Moore_and_lewis_data_selection ⭐ 5

Eark Ip Test Corpus ⭐ 5

Test corpus of E-ARK information packages to test validator functionality against the specification.

Substring ⭐ 5

The SubString package is an open-source set of Unix Shell scripts used for substring reduction and frequency consolidation of word n-grams of different length. In the process, the frequencies of substrings are reduced by the frequencies of their superstrings and a consolidated list with n-grams of different lengths is produced without an inflation of the overall word count. The functions performed by SubString will primarily be of interest to linguists working on formulaic language, multi-word s

Alvis Docker ⭐ 5

Dockerizing Alvis and its components

Copyright 2018-2024 Awesome Open Source. All rights reserved.