Awesome Open Source

Search results for natural language processing computational linguistics

48 search results found

Python Keyphrase Extraction module

Nlp With Ruby ⭐ 1,002

Curated List: Practical Natural Language Processing done in Ruby

Python Implementations of Word Sense Disambiguation (WSD) Technologies.

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP spec

Nlp Papers With Arxiv ⭐ 363

Statistics and accepted paper list of NLP conferences with arXiv link

German Nlp ⭐ 360

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

Language modeling and instruction tuning for Russian

Acl Anthology ⭐ 304

Data and software for building the ACL Anthology.

Pycantonese ⭐ 290

Cantonese Linguistics and NLP

Nlp Conference Compendium ⭐ 285

Compendium of the resources available from top NLP conferences.

Wikipron ⭐ 256

Massively multilingual pronunciation mining

Bllip Parser ⭐ 207

BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.

Awesome Hungarian Nlp ⭐ 192

A curated list of NLP resources for Hungarian

Acl Papers ⭐ 178

Summaries of papers from the Association for Computational Linguistics

Datastories Semeval2017 Task4 ⭐ 171

Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".

An Efficient Chinese Text Classifier

Compling_nlp_hse_course ⭐ 157

Materials for the computational linguistics course at the School of Linguistics, HSE University (НИУ ВШЭ)

Colibri Core ⭐ 122

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.

FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.

Amr Tutorial ⭐ 100

Abstract Meaning Representation (AMR) tutorial slides

Python Tutorial Notebooks ⭐ 97

Python tutorials as Jupyter Notebooks for NLP, ML, AI

Datalinguist ⭐ 87

Stanford CoreNLP in idiomatic Clojure.

A library for extracting statistics from Russian-language texts.

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

Lamachine ⭐ 66

LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilation/installation script

Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-s

Doing things with embeddings

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this contains higher-level tools that use the library as well as the full documentation, validation schemas,

Emnlp 2023 Papers ⭐ 54

EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. ⭐ support NLP!

Python_nlp_tutorial ⭐ 47

This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)

Sentiment Analysis Of Tweets In Russian ⭐ 46

Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

A set of workflows for corpus building through OCR, post-correction and normalisation

Yet Another (natural language) Parser

Pylangacq ⭐ 36

Language Acquisition Research Tools

Word2vec Tsne ⭐ 35

Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.

Amr Bibliography ⭐ 34

Organized inventory of research using the Abstract Meaning Representation

Stemmer for German

Python Arpa ⭐ 32

🐍 Python library for n-gram models in ARPA format

A Python package for learning, evaluating, annotating, and extracting vector representations of construction grammars

Python Ucto ⭐ 29

This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).

Calamancy ⭐ 26

NLP pipelines for Tagalog using spaCy

DELPH-IN Documentation

Elixir Nlp ⭐ 25

A (hopefully helpful) collection of resources for Elixir NLP devs

Linguistics_problems ⭐ 22

Natural language processing in examples and games

Linguistica 5: Unsupervised Learning of Linguistic Structure

Sentimentanalysis ⭐ 22

Sentiment Analysis: Deep Bi-LSTM+attention model

Mystem Scala ⭐ 21

Morphological analyzer `mystem` wrapper for JVM languages

Repository for the CLiPS HAte speech DEtection System [HADES].

🦜 NLP for Tibetan, in Python.

An Ancient Greek Morphology Tagger

Ruby based API for the project Wortschatz Leipzig.

Datastories Semeval2017 Task6 ⭐ 19

Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".

Tools for the 3rd edition of the Constraint Grammar formalism.

Novoic's linguistic feature extraction library

New York Times Word Innovation Types dataset

Arabicprocessingcog ⭐ 15

A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.

Uncertainty ⭐ 14

A Python implementation of the uncertainty classifier, based on the work of Veronika Vincze.

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.

Survey on machine learning.

🍘 Word embeddings without word segmentation 🍘

An unsupervised Chinese word segmentation tool.

Arguminsci ⭐ 11

Analyze Argumentation and Rhetorical Aspects in Scientific Writing.

Kurdishhunspell ⭐ 10

A morphological analyzer and spell checker for Kurdish in Hunspell

Gsoc2019 Text Extraction ⭐ 10

GSoC 2019: Development of a Tool for Extracting Quantitative Text Profiles

Jgtextrank ⭐ 10

jgtextrank: Yet another Python implementation of TextRank

A Natural Language Processing toolkit for sequence labeling in its simplest form.

Emosense Semeval2019 Task3 Emocontext ⭐ 9

Deep-learning system presented in "EmoSense at SemEval-2019 Task 3: Bidirectional LSTM Network for Contextual Emotion Detection in Textual Conversations" at SemEval-2019.

Computationally Modelling Resisting Strategies in Persuasive Conversations

An ETH "Projects in Machine Learning" team's attempt to use Mechanical Turk and active learning to solve a word-sense disambiguation task

Nlp Learning Notes ⭐ 8

🧠 NLP notes: introductory concepts, fundamentals, research methods, and close readings of top-conference papers

Discoursesegmenter ⭐ 8

A collection of various discourse segmenters

Foliatools ⭐ 8

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.

Phrasal Composition In Transformers ⭐ 7

This repo contains datasets and code for Assessing Phrasal Representation and Composition in Transformers, by Lang Yu and Allyson Ettinger.

Structscribe ⭐ 7

Resources and code for: Scalable Micro-planned Generation of Discourse from Structured Data

Kurdishcl ⭐ 7

The Computational Linguistics course in Kurdish

Wlp Parser ⭐ 7

This repository contains a collection of neural network models that we used to demonstrate the utility of our dataset.

Pun Model ⭐ 7

Clean Python implementation of the paper "Computational Model for Linguistic Humor in Puns"

Diachrony_for_russian ⭐ 7

Code and dataset for tracing semantic changes in Russian adjectives

Rl3stdlib ⭐ 7

The RL3 Standard Library is a collection of modules accessible to an RL3 program to simplify the programming process and remove the need to rewrite commonly used RL3 patterns and predicates.

Latent Aspect Detection ⭐ 7

Code and models for the paper "Latent Aspect Detection from Online Unsolicited Customer Reviews"

Semi-structured Document Model (Next-generation)

Genomenlp ⭐ 5

Purplemonkeydishwasher ⭐ 5

A public git version of my research projects, i.e. articles and all that

Optimalnumberoftopics ⭐ 5

A set of methods for finding an appropriate number of topics in a text collection

Semeval2022 Task8 Tonyx ⭐ 5

Deep-learning system proposed by HFL for SemEval-2022 Task 8: Multilingual News Similarity

UkrVectōrēs (formerly docsim) – an NLU-powered tool for knowledge discovery, classification, diagnostics and prediction. Entity similarity tool. An NLU-based "cognitive-semantic calculator" for discovering, classifying, diagnosing and predicting knowledge.

Naacl Mpqa Srl4orl ⭐ 5

SRL4ORL: Improving Opinion Role Labeling Using Multi-Task Learning With Semantic Role Labeling

A collaborative online computational linguistics development environment.

Dependencytrees.jl ⭐ 5

Dependency parsing in Julia

Text_analysis_technobabble ⭐ 5

NLP Using Star Trek scripts as training data.

Memorable Quotes ⭐ 5

The repository for memorable quotes project.

Copyright 2018-2024 Awesome Open Source. All rights reserved.