Search results for c plus plus corpus
54 search results found
Vespa
⭐
5,115
AI + Data, online. https://vespa.ai
Autophrase
⭐
978
AutoPhrase: Automated Phrase Mining from Massive Text Corpora
Pisa
⭐
820
PISA: Performant Indexes and Search for Academia
Commoncrawl
⭐
466
Common Crawl support library to access 2008-2012 crawl archives (ARC files)
Fast_align
⭐
377
Simple, fast unsupervised word aligner
Lava
⭐
338
LAVA: Large-scale Automated Vulnerability Addition
Node Spellchecker
⭐
263
SpellChecker Node Module
Vfuzz
⭐
170
vfuzz
Colibri Core
⭐
122
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
Preprocess
⭐
82
Corpus preprocessing
Word2vec
⭐
81
word2vec++ is a library and set of tools implementing Distributed Representations of Words (word2vec), written in C++11 from scratch
Flucoma Core
⭐
64
Core algorithms and objects for the Fluid Corpus Manipulation Library
Flucoma Sc
⭐
60
Fluid Corpus Manipulation plugins for Supercollider
Tree2seq
⭐
50
C++ code of "Tree-to-Sequence Attentional Neural Machine Translation (tree2seq ANMT)"
Citations
⭐
45
Most cited papers by keyword
Bicvm
⭐
42
BiCVM Code
Dpe
⭐
40
Autocorpus
⭐
38
AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. Autocorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
Medianofninthers
⭐
27
Zipporah
⭐
26
Dnn
⭐
22
Cmultivec
⭐
22
Fast C++ implementation of multiple prototype word representation training based on Huang Socher 2012
Snowball
⭐
22
Snowball Poem Generator - Generates snowball poems from raw English text, using Markov chains and dictionary lookups to validate input words
Fuzzcover
⭐
19
test suite generation for C++
Ticcl
⭐
19
Text-Induced Corpus Clean-up
Singular
⭐
18
Enhanced Suffix Arrays
⭐
17
A flexible implementation of enhanced suffix arrays in template-based C++. Supports single- and multi-position wildcards. Fast queries thanks to the implicit suffix tree structure.
Jointreps
⭐
16
Learning word representation jointly using a corpus and a knowledge base (KB)
Symgiza Pp
⭐
15
Symmetrized word alignment models, based on mgizapp and GIZA++
Poemy2
⭐
15
poemy (a poetry generator) rewritten in C++
Dact
⭐
13
Decaffeinated Alpino Corpus Tool
Ticcltools
⭐
13
Tools for TICCL
Query Suggestion
⭐
12
A query auto-completion system that can be used in any search scenario
Distributed Translation Infrastructure
⭐
11
A distributed statistical machine translation infrastructure consisting of load balancing, text pre/post-processing, and translation services. Written in C++11, it utilises multicore CPUs through multi-threading and allows for secure SSL/TLS communications.
Salm
⭐
10
SALM: Suffix Array and its Applications in Empirical Language Processing by Joy
Pcfg Em
⭐
9
Implementation of the EM algorithm for PCFGs (inside-outside algorithm)
Vecdcs
⭐
9
Gargantua
⭐
9
Tensorflow Swivel
⭐
9
Catecophony
⭐
8
Real-time concatenative re-synthesis VST. Very WIP. Much broken.
High Dimensional Explorer
⭐
8
An implementation of a high-dimensional model of lexical co-occurrence.
Cdswordseg
⭐
7
Investigation into word segmentation for child- versus adult-directed speech transcriptions
Glovecpp
⭐
7
Modern C++ Implementation of the GloVe Natural Language Processing algorithm
Defminer
⭐
7
Automatic Definition Extraction System
Python Unitex
⭐
7
Python bindings for the Unitex/GramLab corpus processor
Jointreps
⭐
7
Joint Word Representation Learning using a Corpus and a Semantic Lexicon
Corpusharvester
⭐
6
CorpusHarvester is a tool suite used to easily and quickly retrieve and create corpora of data from different websites.
Presage
⭐
6
Fork of Presage (http://presage.sourceforge.net/)
Elan2split
⭐
6
Split ELAN Annotation Files and corresponding speech files into a corpus format for common ASR and Forced Aligners
Fast_umorph
⭐
6
Unsupervised morphology induction with OpenFst
Ner_tsd2016
⭐
5
Software and data accompanying the paper "Neural Networks for Featureless Named Entity Recognition in Czech"
Collocations Benchmark
⭐
5
Counting word collocations in natural language corpora. This project benchmarks naive implementations of a collocation counter in C++ and Haskell, compiled with G++ and GHC, respectively.
Exchangeandbrown
⭐
5
Fmindex
⭐
5
Efficient substring searches on text corpora using a compressed index
Mwe Tools
⭐
5
A set of useful tools for multiword expression extraction from parallel corpora for the Moses statistical machine translation system
Simulsig
⭐
5
Probabilistic Synthesis Engine
Bn256 Fuzzing
⭐
5
Compare output of operations on Barreto-Naehrig curves in the Go, Rust and CPP implementations of Ethereum using fuzzing
Copyright 2018-2024 Awesome Open Source. All rights reserved.