Awesome Open Source

Search results for dataset bert

58 search results found

Nlp_chinese_corpus ⭐ 8,344

Large Scale Chinese Corpus for NLP

Transformers Tutorials ⭐ 6,731

This repository contains demos I made with the Transformers library by HuggingFace.

Awesome Pretrained Chinese Nlp Models ⭐ 3,738

Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pre-trained models, large models, multimodal models, and large language models

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard

Codesearchnet ⭐ 2,054

Datasets, tools, and benchmarks for representation learning of code.

Chineseglue ⭐ 1,765

Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard

Cluener2020 ⭐ 1,384

CLUENER2020: Fine-Grained Named Entity Recognition for Chinese

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.

Bert Ner ⭐ 1,000

Use Google's BERT for named entity recognition (CoNLL-2003 as the dataset).

Source code of K-BERT (AAAI2020)

Cluepretrainedmodels ⭐ 536

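A collection of high-quality Chinese pre-trained models: state-of-the-art large models, fastest small models, and specialized similarity models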

Fastbert ⭐ 527

The source code of FastBERT (ACL 2020)

Cluecorpus2020 ⭐ 517

Large-scale Pre-training Corpus for Chinese (100 GB)

Indonlu ⭐ 494

The first-ever vast natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models, and starter code! (AACL-IJCNLP 2020)

A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. Accepted by ACL 2020.

Openai Clip ⭐ 404

Simple implementation of OpenAI CLIP model in PyTorch.

Arabert ⭐ 372

Pre-trained Transformers for the Arabic Language Understanding and Generation (Arabic BERT, Arabic GPT2, Arabic Electra)

Cmrc2018 ⭐ 313

A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)

Bert Attributeextraction ⭐ 185

Using BERT for attribute extraction in knowledge graphs: BERT-based fine-tuning and feature-extraction methods for extracting attributes from Baidu Baike person entries.

Awesome Llm Eval ⭐ 183

Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs and models, mainly for evaluation of LLMs.

Robbert ⭐ 180

A Dutch RoBERTa-based language model

Tabformer ⭐ 144

Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)

French Sentiment Analysis With Bert ⭐ 124

How good is BERT? Comparing BERT to other state-of-the-art approaches on a French sentiment analysis dataset

BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision

Bertqa Attention On Steroids ⭐ 105

BertQA - Attention on Steroids

Prosody ⭐ 104

Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text

Scientificsummarizationdatasets ⭐ 88

Datasets I have created for scientific summarization, and a trained BertSum model

Dialogue Understanding ⭐ 82

This repository contains a PyTorch implementation of the baseline models from the paper "Utterance-level Dialogue Understanding: An Empirical Study"

Marathinlp ⭐ 80

Marathi NLP is a repository dedicated to the development of tools and resources for the Marathi language.

Code for the NAACL2022 paper "Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction"

✅ CODAR is a framework built using PyTorch to analyze posts (text + media) and predict cyberbullying and offensive content. 💬📷

Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and prompting mass-media news into datasets for ML-model training

Xpersona ⭐ 51

XPersona: Evaluating Multilingual Personalized Chatbot

Transformer Srl ⭐ 45

Reimplementation of a BERT-based model (Shi et al., 2019), currently the state of the art for English SRL. This model also implements predicate disambiguation.

Colbert Using Bert Sentence Embedding For Humor Detection ⭐ 44

Novel model and dataset for the task of humor detection

Finbert Qa ⭐ 39

Financial Domain Question Answering with pre-trained BERT Language Model

Pn Summary ⭐ 37

A well-structured summarization dataset for the Persian language!

Sentilare ⭐ 34

Code for our paper "SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge" (EMNLP 2020)

Easy multi-task learning with HuggingFace Datasets and Trainer

Sohu2019 ⭐ 24

2019 Sohu Campus Algorithm Competition

Tradetheevent ⭐ 20

Implementation of "Trade the Event: Corporate Events Detection for News-Based Event-Driven Trading", in Findings of ACL 2021

Squad2.q Augmented Dataset ⭐ 19

Augmented version of SQuAD 2.0 for questions

Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis

Quasi Attention Absa ⭐ 18

The codebase for a new quasi-attention BERT model for TABSA tasks

Pragmeval ⭐ 18

Discourse Based Evaluation of Language Understanding

Representation Engineering with Natural Language Explanations

COVID-19 Question Dataset from the paper "What Are People Asking About COVID-19? A Question Classification Dataset"

Protonet Bert Text Classification ⭐ 16

Fine-tune BERT for small-dataset text classification in a few-shot learning manner using ProtoNet

Berserker ⭐ 16

Berserker - BERt chineSE woRd toKenizER

Filipino Text Benchmarks ⭐ 13

Open-source benchmark datasets and pretrained transformer models in the Filipino language.

Ec Darkpattern ⭐ 13

[IEEE BigData 2022, 5th Workshop on Big Data for CyberSecurity (BigCyber-2022)] Dark patterns in e-commerce: a dataset and its baseline evaluations

🔮 Answering multiple choice questions with Language Models.

Jd2skills Bert Xmlc ⭐ 12

Code and Dataset for the Bhola et al. (2020) Retrieving Skills from Job Descriptions: A Language Model Based Extreme Multi-label Classification Framework

Bert Disambiguation ⭐ 12

Code and CoarseWSD-20 datasets for "Language Models and Word Sense Disambiguation: An Overview and Analysis"

Offenseval2020 Code ⭐ 11

Malay Fake News Classification ⭐ 10

Malay Fake News Classification using CNN, BiLSTM, C-LSTM, RCNN, FT-BERT and BERTCNN.

Personality_detection ⭐ 9

BB-SVM model for automatic personality detection of the essays dataset (Big-Five personality labeled traits)

Meddistant19 ⭐ 8

MedDistant19: Towards an Accurate Benchmark for Broad-Coverage Biomedical Relation Extraction (COLING 2022)

The universal integrated corpus-building environment.

Orangesum ⭐ 8

The French summarization dataset introduced in "BARThez: a Skilled Pretrained French Sequence-to-Sequence Model".

Sum_liputan6 ⭐ 8

The first large-scale summarization corpus for the Indonesian language. AACL 2020.

BERT-For-NLU-Tasks

Generate a SQuAD-style dataset from a raw text file and train a transformer-based question answering model. This repo has code from https://github.com/facebookresearch/UnsupervisedQA and https://github.com/deepset-ai/haystack

Multi Label Text Classification For Chinese ⭐ 6

PyTorch implementation of multi-label text classification, including several kinds of models and pretrained models, with a focus on Chinese preprocessing.

An open-source Kazakh named entity recognition dataset (KazNERD), annotation guidelines, and baseline NER models.

Natural Language Processing of academic papers for dataset indexing

KR3: Korean Restaurant Review with Ratings / Experiments on Parameter-efficient Tuning and Task-adaptive Pre-training
