Awesome Open Source

Search results for python computational linguistics

99 search results found

Python Keyphrase Extraction module

Arguman.org ⭐ 1,349

Argument mapping and analysis platform

Python Implementations of Word Sense Disambiguation (WSD) Technologies.

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build a simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP spec

🥛 Turkic morphology project

Acl Anthology ⭐ 304

Data and software for building the ACL Anthology.

Pycantonese ⭐ 290

Cantonese Linguistics and NLP

Wikipron ⭐ 256

Massively multilingual pronunciation mining

Statistical NLG for spoken dialogue systems

Datastories Semeval2017 Task4 ⭐ 171

Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".

Compling_nlp_hse_course ⭐ 157

Materials for the computational linguistics course at the School of Linguistics, HSE University (NRU HSE)

🙊 software for creating speech recognition models.

Colibri Core ⭐ 122

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.

FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.

Robotreviewer ⭐ 97

Automatic synthesis of RCTs

Python Tutorial Notebooks ⭐ 97

Python tutorials as Jupyter Notebooks for NLP, ML, AI

A library for extracting statistics from Russian-language texts.

Transition-based tree-to-graph AMR Parser

Lamachine ⭐ 66

LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilation/installation script

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this contains higher-level tools that use the library as well as the full documentation, validation schemas,

Doing things with embeddings

Emnlp 2023 Papers ⭐ 54

EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. ⭐ support NLP!

Segmentation.evaluation ⭐ 47

SegEval Segmentation Evaluation Package

Python_nlp_tutorial ⭐ 47

This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

Phonemes ⭐ 42

Jason Riggle's chart of phonological features in JSON format + extras

A set of workflows for corpus building through OCR, post-correction and normalisation

takahe is a multi-sentence compression module

Negation Detection ⭐ 38

Negation detection NLP tool. If you use the code, please cite George Gkotsis, Sumithra Velupillai, Anika Oellrich, Harry Dean, Maria Liakata and Rina Dutta. Don't Let Notes Be Misunderstood: A Negation Detection Method for Assessing Risk of Suicide in Mental Health Records, Computational Linguistics and Clinical Psychology 2016

Pylangacq ⭐ 36

Language Acquisition Research Tools

Stemmer for German

Python Arpa ⭐ 32

🐍 Python library for n-gram models in ARPA format

Mingpipe ⭐ 30

A Chinese name matcher written in Python. Described in: Nanyun Peng, Mo Yu, Mark Dredze. An Empirical Study of Chinese Name Matching and Applications. Association for Computational Linguistics (ACL) (short paper), 2015.

Python Ucto ⭐ 29

This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).

A Python package for learning, evaluating, annotating, and extracting vector representations of construction grammars

Chinese_ner_with_attention ⭐ 27

Calamancy ⭐ 26

NLP pipelines for Tagalog using spaCy

Embedding_evaluation ⭐ 25

Evaluate your word embeddings

Sentiment Analysis Imdb ⭐ 23

Example project of sentiment analysis using LSTM NN on IMDB reviews database

Sentimentanalysis ⭐ 22

Sentiment Analysis: Deep Bi-LSTM+attention model

Linguistica 5: Unsupervised Learning of Linguistic Structure

Charscnn Theano ⭐ 22

Implementation of CharSCNN and SCNN.

Emotional Dialogue Acts corpus contains dialogue act labels for the multimodal conversational emotion datasets IEMOCAP and MELD. https://www.aclweb.org/anthology/2020.lrec-1.78/

Linguistics_problems ⭐ 22

Natural language processing in examples and games

Streaming_lsh ⭐ 21

A project for clustering text streams using locality-sensitive hashing (LSH) in Python

Pytorch Rnng ⭐ 21

Neural Abstract Anaphora ⭐ 20

A Mention-Ranking Model for Abstract Anaphora Resolution

Repository for the CLiPS HAte speech DEtection System [HADES].

Acl22 Identifying The Human Values Behind Arguments ⭐ 19

Machine Learning scripts for the identification of human values behind arguments.

Datastories Semeval2017 Task6 ⭐ 19

Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".

An Ancient Greek Morphology Tagger

Novoic's linguistic feature extraction library

Textexpansion ⭐ 17

Abuseeval ⭐ 15

Data set for LREC 2020 paper "I Feel Offended, Don't Be Abusive!"

Babyberta ⭐ 15

Source code for CoNLL 2021 paper by Huebner et al. 2021

Arabicprocessingcog ⭐ 15

A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.

Uncertainty ⭐ 14

A Python implementation of the uncertainty classifier, based on the work of Veronika Vincze.

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.

The Code & Paper for ACL 2023 paper "Enhancing Language Representation with Constructional Information for Natural Language Understanding"

Python library for vector space models

Multilingual Joint Embeddings ⭐ 13

Russian_soundex ⭐ 13

Russian/English/Estonian/Finnish/Swedish phonetic algorithm based on Soundex and Metaphone

Cross Distill ⭐ 13

"Cross-lingual Distillation for Text Classification" 55th annual meeting of the Association for Computational Linguistics (ACL 2017)

Kaldi_helpers ⭐ 12

🙊 A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.

Arguminsci ⭐ 11

Analyze Argumentation and Rhetorical Aspects in Scientific Writing.

Python package for calculating famous measures in computational linguistics

Jgtextrank ⭐ 10

jgtextrank: Yet another Python implementation of TextRank

Kurdishhunspell ⭐ 10

A morphological analyzer and spell checker for Kurdish in Hunspell

Opinionspam ⭐ 9

Research code for opinion spam detection

A Natural Language Processing toolkit for sequence labeling in its simplest form.

Project by an ETH "Projects in Machine Learning" team using Mechanical Turk and active learning to solve a word-sense disambiguation task

Morfessor FlatCat

Discoursesegmenter ⭐ 8

A collection of various discourse segmenters

Foliatools ⭐ 8

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.

Fado Python3 ⭐ 8

A quick Python 3 port of the FAdo Project

Paraphrase_identification ⭐ 7

Paraphrase Identification with Deep Learning using Keras

Wlp Parser ⭐ 7

This repository contains a collection of neural network models that we used to demonstrate the utility of our dataset.

Latent Aspect Detection ⭐ 7

Code and models for the paper "Latent Aspect Detection from Online Unsolicited Customer Reviews"

Multiword Idiom Collocation Extractor

Diachrony_for_russian ⭐ 7

Code and dataset for tracing semantic changes in Russian adjectives

Phrasal Composition In Transformers ⭐ 7

This repo contains datasets and code for Assessing Phrasal Representation and Composition in Transformers, by Lang Yu and Allyson Ettinger.

Structscribe ⭐ 7

Resources and code for: Scalable Micro-planned Generation of Discourse from Structured Data

Correction_detector ⭐ 7

Correction Detector. A JSON-RPC server that compares an input sentence with its revision and summarizes the errors that have been corrected.

LAWLIA is an open-source computational legal framework designed to revolutionize legal reasoning and analysis. It combines the power of large language models with a structured syntactical grammar to facilitate precise legal assessments, truth values, and verdicts. LAWLIA is the future of computational jurisprudence.

Pun Model ⭐ 7

Clean Python implementation of the paper "Computational Model for Linguistic Humor in Puns"

Veʹrdd is an open-source dictionary editing framework with the focus on low-resourced and endangered languages. The framework is mainly built to facilitate collecting, importing, editing and exporting dictionaries while allowing the involvement of the native speakers to contribute easily to the preservation of the language and construction of the dictionary.

Wikinflection ⭐ 6

Generating an inflectional corpus out of Wiktionary.

Contextualised Word Representations for Lexical Semantic Change Analysis

Separation Rank For Nlp ⭐ 5

Explainable Mechanism for Modeling Contextual Dependency in Neural Language Model

💬 Cross-platform application for the creation of language resources from ELAN linguistic analysis files, or from scratch.

Semeval2022 Task8 Tonyx ⭐ 5

Deep-learning system proposed by HFL for SemEval-2022 Task 8: Multilingual News Similarity

Text_analysis_technobabble ⭐ 5

NLP Using Star Trek scripts as training data.

Naacl Mpqa Srl4orl ⭐ 5

SRL4ORL: Improving Opinion Role Labeling Using Multi-Task Learning With Semantic Role Labeling

Wordembedding ⭐ 5

Bilingual Word Embedding CCA model

Optimalnumberoftopics ⭐ 5

A set of methods for finding an appropriate number of topics in a text collection

Memorable Quotes ⭐ 5

The repository for memorable quotes project.

Ud_hindi_english Hiencs ⭐ 5

Copyright 2018-2024 Awesome Open Source. All rights reserved.