This is a list of BERT-related papers. Any feedback is welcome.
- Downstream task
- Generation
- Modification (multi-task, masking strategy, etc.)
- Probe
- Inside BERT
- Multi-lingual
  - Non-English models
- Domain specific
- Multi-modal
- Model compression
- Misc.
- A BERT Baseline for the Natural Questions
- MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension (ACL2019)
- Unsupervised Domain Adaptation on Reading Comprehension
- BERTQA -- Attention on Steroids
- A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning (EMNLP2019)
- SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering
- Multi-hop Question Answering via Reasoning Chains
- Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents
- Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering (EMNLP2019 WS)
- End-to-End Open-Domain Question Answering with BERTserini (NAACL2019)
- Latent Retrieval for Weakly Supervised Open Domain Question Answering (ACL2019)
- Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering (EMNLP2019)
- Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering (ICLR2020)
- Learning to Ask Unanswerable Questions for Machine Reading Comprehension (ACL2019)
- Unsupervised Question Answering by Cloze Translation (ACL2019)
- Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation
- A Recurrent BERT-based Model for Question Generation (EMNLP2019 WS)
- Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds
- Enhancing Pre-Trained Language Representations with Rich Knowledge for Machine Reading Comprehension (ACL2019)
- Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning (CIKM2019)
- SG-Net: Syntax-Guided Machine Reading Comprehension
- MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension
- Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning (EMNLP2019)
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning (ICLR2020)
- Robust Reading Comprehension with Linguistic Constraints via Posterior Regularization
- BAS: An Answer Selection Method Using BERT Language Model
- Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension
- A Simple but Effective Method to Incorporate Multi-turn Context with BERT for Conversational Machine Comprehension (ACL2019 WS)
- FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension (ACL2019 WS)
- BERT with History Answer Embedding for Conversational Question Answering (SIGIR2019)
- GraphFlow: Exploiting Conversation Flow with Graph Neural Networks for Conversational Machine Comprehension (ICML2019 WS)
- Beyond English-only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian (RANLP2019)
- XQA: A Cross-lingual Open-domain Question Answering Dataset (ACL2019)
- Cross-Lingual Machine Reading Comprehension (EMNLP2019)
- Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model
- Multilingual Question Answering from Formatted Text applied to Conversational Agents
- BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels (EMNLP2019)
- MLQA: Evaluating Cross-lingual Extractive Question Answering
- Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension (TACL)
- SberQuAD - Russian Reading Comprehension Dataset: Description and Analysis
- Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension (EMNLP2019)
- BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer (Interspeech2019)
- Dialog State Tracking: A Neural Reading Comprehension Approach
- A Simple but Effective BERT Model for Dialog State Tracking on Resource-Limited Systems
- Fine-Tuning BERT for Schema-Guided Zero-Shot Dialogue State Tracking
- Goal-Oriented Multi-Task BERT-Based Dialogue State Tracker
- Domain Adaptive Training BERT for Response Selection
- BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding
- BERT for Joint Intent Classification and Slot Filling
- Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model
- A Comparison of Deep Learning Methods for Language Understanding (Interspeech2019)
- Fine-grained Information Status Classification Using Discourse Context-Aware Self-Attention
- GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge (EMNLP2019)
- Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations (EMNLP2019)
- Using BERT for Word Sense Disambiguation
- Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation (ACL2019)
- Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision (ACL2019)
- Assessing BERT’s Syntactic Abilities
- Does BERT agree? Evaluating knowledge of structure dependence through agreement relations
- Simple BERT Models for Relation Extraction and Semantic Role Labeling
- LIMIT-BERT: Linguistically Informed Multi-Task BERT
- A Simple BERT-Based Approach for Lexical Simplification
- Multi-headed Architecture Based on BERT for Grammatical Errors Correction (ACL2019 WS)
- Towards Minimal Supervision BERT-based Grammar Error Correction
- BERT-Based Arabic Social Media Author Profiling
- Sentence-Level BERT and Multi-Task Learning of Age and Gender in Social Media
- Evaluating the Factual Consistency of Abstractive Text Summarization
- NegBERT: A Transfer Learning Approach for Negation Detection and Scope Resolution
- xSLUE: A Benchmark and Analysis Platform for Cross-Style Language Understanding and Evaluation
- TabFact: A Large-scale Dataset for Table-based Fact Verification
- Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents
- LAMBERT: Layout-Aware language Modeling using BERT for information extraction
- BERT Meets Chinese Word Segmentation
- Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning
- Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT
- Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing
- NEZHA: Neural Contextualized Representation for Chinese Language Understanding
- Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing -- A Tale of Two Parsers Revisited (EMNLP2019)
- Parsing as Pretraining (AAAI2020)
- Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing
- Named Entity Recognition -- Is there a glass ceiling? (CoNLL2019)
- A Unified MRC Framework for Named Entity Recognition
- Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models
- Robust Named Entity Recognition with Truecasing Pretraining (AAAI2020)
- LTP: A New Active Learning Strategy for Bert-CRF Based Named Entity Recognition
- MT-BioNER: Multi-task Learning for Biomedical Named Entity Recognition using Deep Bidirectional Transformers
- Portuguese Named Entity Recognition using BERT-CRF
- Towards Lingua Franca Named Entity Recognition with BERT
- Resolving Gendered Ambiguous Pronouns with BERT (ACL2019 WS)
- Anonymized BERT: An Augmentation Approach to the Gendered Pronoun Resolution Challenge (ACL2019 WS)
- Gendered Pronoun Resolution using BERT and an extractive question answering formulation (ACL2019 WS)
- MSnet: A BERT-based Network for Gendered Pronoun Resolution (ACL2019 WS)
- Fill the GAP: Exploiting BERT for Pronoun Resolution (ACL2019 WS)
- On GAP Coreference Resolution Shared Task: Insights from the 3rd Place Solution (ACL2019 WS)
- Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution (ACL2019 WS)
- BERT Masked Language Modeling for Co-reference Resolution (ACL2019 WS)
- Coreference Resolution with Entity Equalization (ACL2019)
- BERT for Coreference Resolution: Baselines and Analysis (EMNLP2019) [github]
- WikiCREM: A Large Unsupervised Corpus for Coreference Resolution (EMNLP2019)
- Ellipsis and Coreference Resolution as Question Answering
- Coreference Resolution as Query-based Span Prediction
- Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence (NAACL2019)
- BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis (NAACL2019)
- Exploiting BERT for End-to-End Aspect-based Sentiment Analysis (EMNLP2019 WS)
- Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification
- An Investigation of Transfer Learning-Based Sentiment Analysis in Japanese (ACL2019)
- "Mask and Infill" : Applying Masked Language Model to Sentiment Transfer
- Adversarial Training for Aspect-Based Sentiment Analysis with BERT
- Utilizing BERT Intermediate Layers for Aspect Based Sentiment Analysis and Natural Language Inference
- Matching the Blanks: Distributional Similarity for Relation Learning (ACL2019)
- BERT-Based Multi-Head Selection for Joint Entity-Relation Extraction (NLPCC2019)
- Enriching Pre-trained Language Model with Entity Information for Relation Classification
- Span-based Joint Entity and Relation Extraction with Transformer Pre-training
- Fine-tune Bert for DocRED with Two-step Process
- Entity, Relation, and Event Extraction with Contextualized Span Representations (EMNLP2019)
- Fine-tuning BERT for Joint Entity and Relation Extraction in Chinese Medical Text
- KG-BERT: BERT for Knowledge Graph Completion
- Language Models as Knowledge Bases? (EMNLP2019) [github]
- BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA
- Inducing Relational Knowledge from BERT (AAAI2020)
- Latent Relation Language Models (AAAI2020)
- Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model (ICLR2020)
- Zero-shot Entity Linking with Dense Entity Retrieval
- Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking (CoNLL2019)
- Improving Entity Linking by Modeling Latent Entity Type Information (AAAI2020)
- How Can We Know What Language Models Know?
- REALM: Retrieval-Augmented Language Model Pre-Training
- How to Fine-Tune BERT for Text Classification?
- X-BERT: eXtreme Multi-label Text Classification with BERT
- DocBERT: BERT for Document Classification
- Enriching BERT with Knowledge Graph Embeddings for Document Classification
- Classification and Clustering of Arguments with Contextualized Word Embeddings (ACL2019)
- BERT for Evidence Retrieval and Claim Verification
- Conditional BERT Contextual Augmentation
- Stacked DeBERT: All Attention in Incomplete Data for Text Classification
- Exploring Unsupervised Pretraining and Sentence Structure Modelling for Winograd Schema Challenge
- A Surprisingly Robust Trick for the Winograd Schema Challenge
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale (AAAI2020)
- Improving Natural Language Inference with a Pretrained Parser
- Adversarial NLI: A New Benchmark for Natural Language Understanding
- Adversarial Analysis of Natural Language Inference Systems (ICSC2020)
- Evaluating BERT for natural language inference: A case study on the CommitmentBank (EMNLP2019)
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge (NAACL2019)
- HellaSwag: Can a Machine Really Finish Your Sentence? (ACL2019) [website]
- Story Ending Prediction by Transferable BERT (IJCAI2019)
- Explain Yourself! Leveraging Language Models for Commonsense Reasoning (ACL2019)
- Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models
- Informing Unsupervised Pretraining with External Linguistic Knowledge
- Commonsense Knowledge + BERT for Level 2 Reading Comprehension Ability Test
- BIG MOOD: Relating Transformers to Explicit Commonsense Knowledge
- Commonsense Knowledge Mining from Pretrained Models (EMNLP2019)
- Do Massively Pretrained Language Models Make Better Storytellers? (CoNLL2019)
- PIQA: Reasoning about Physical Commonsense in Natural Language (AAAI2020)
- Why Do Masked Neural Language Models Still Need Common Sense Knowledge?
- HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization (ACL2019)
- Deleter: Leveraging BERT to Perform Unsupervised Successive Text Compression
- Discourse-Aware Neural Extractive Model for Text Summarization
- Passage Re-ranking with BERT
- Investigating the Successes and Failures of BERT for Passage Re-Ranking
- Understanding the Behaviors of BERT in Ranking
- Document Expansion by Query Prediction
- CEDR: Contextualized Embeddings for Document Ranking (SIGIR2019)
- Deeper Text Understanding for IR with Contextual Neural Language Modeling (SIGIR2019)
- FAQ Retrieval using Query-Question Similarity and BERT-Based Query-Answer Relevance (SIGIR2019)
- Multi-Stage Document Ranking with BERT
- BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model (NAACL2019 WS)
- Pretraining-Based Natural Language Generation for Text Summarization
- Text Summarization with Pretrained Encoders (EMNLP2019) [github (original)] [github (huggingface)]
- Multi-stage Pretraining for Abstractive Summarization
- PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
- MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML2019) [github], [github]
- Unified Language Model Pre-training for Natural Language Understanding and Generation (NeurIPS2019)
- ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
- Towards Making the Most of BERT in Neural Machine Translation
- Improving Neural Machine Translation with Pre-trained Representation
- On the use of BERT for Neural Machine Translation
- Incorporating BERT into Neural Machine Translation (ICLR2020)
- Recycling a Pre-trained BERT Encoder for Neural Machine Translation
- Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
- Mask-Predict: Parallel Decoding of Conditional Masked Language Models (EMNLP2019)
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
- Cross-Lingual Natural Language Generation via Pre-Training (AAAI2020) [github]
- Multilingual Denoising Pre-training for Neural Machine Translation
- PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable
- Unsupervised Pre-training for Natural Language Generation: A Literature Review
- Multi-Task Deep Neural Networks for Natural Language Understanding (ACL2019)
- The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
- BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning (ICML2019)
- Unifying Question Answering and Text Classification via Span Extraction
- ERNIE: Enhanced Language Representation with Informative Entities (ACL2019)
- ERNIE: Enhanced Representation through Knowledge Integration
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (AAAI2020)
- Pre-Training with Whole Word Masking for Chinese BERT
- SpanBERT: Improving Pre-training by Representing and Predicting Spans [github]
- Blank Language Models
- RoBERTa: A Robustly Optimized BERT Pretraining Approach [github]
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (ICLR2020)
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (ICLR2020)
- FreeLB: Enhanced Adversarial Training for Language Understanding (ICLR2020)
- KERMIT: Generative Insertion-Based Modeling for Sequences
- DisSent: Sentence Representation Learning from Explicit Discourse Relations (ACL2019)
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (ICLR2020)
- Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding
- SenseBERT: Driving Some Sense into BERT
- Semantics-aware BERT for Language Understanding (AAAI2020)
- K-BERT: Enabling Language Representation with Knowledge Graph
- Knowledge Enhanced Contextual Word Representations (EMNLP2019)
- KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (EMNLP2019)
- SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models
- Universal Text Representation from BERT: An Empirical Study
- Symmetric Regularization based BERT for Pair-wise Semantic Reasoning
- Transfer Fine-Tuning: A BERT Case Study (EMNLP2019)
- Improving Pre-Trained Multilingual Models with Vocabulary Expansion (CoNLL2019)
- SesameBERT: Attention for Anywhere
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [github]
- SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (ACL2019) [github]
- The Evolved Transformer (ICML2019)
- Reformer: The Efficient Transformer (ICLR2020) [github]
- Transformer on a Diet
- A Structural Probe for Finding Syntax in Word Representations (NAACL2019)
- Linguistic Knowledge and Transferability of Contextual Representations (NAACL2019) [github]
- Probing What Different NLP Tasks Teach Machines about Function Word Comprehension (*SEM2019)
- BERT Rediscovers the Classical NLP Pipeline (ACL2019)
- Probing Neural Network Comprehension of Natural Language Arguments (ACL2019)
- Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations (EMNLP2019 WS)
- What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
- Quantity doesn't buy quality syntax with neural language models (EMNLP2019)
- Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction (ICLR2020)
- oLMpics -- On what Language Model Pre-training Captures
- How Much Knowledge Can You Pack Into the Parameters of a Language Model?
- What does BERT learn about the structure of language? (ACL2019)
- Open Sesame: Getting Inside BERT's Linguistic Knowledge (ACL2019 WS)
- Analyzing the Structure of Attention in a Transformer Language Model (ACL2019 WS)
- What Does BERT Look At? An Analysis of BERT's Attention (ACL2019 WS)
- Do Attention Heads in BERT Track Syntactic Dependencies?
- Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains (ACL2019 WS)
- Inducing Syntactic Trees from BERT Representations (ACL2019 WS)
- A Multiscale Visualization of Attention in the Transformer Model (ACL2019 Demo)
- Visualizing and Measuring the Geometry of BERT
- How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings (EMNLP2019)
- Are Sixteen Heads Really Better than One? (NeurIPS2019)
- On the Validity of Self-Attention as Explanation in Transformer Models
- Visualizing and Understanding the Effectiveness of BERT (EMNLP2019)
- Attention Interpretability Across NLP Tasks
- Revealing the Dark Secrets of BERT (EMNLP2019)
- Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs (EMNLP2019)
- The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives (EMNLP2019)
- A Primer in BERTology: What we know about how BERT works
- Do NLP Models Know Numbers? Probing Numeracy in Embeddings (EMNLP2019)
- How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations (CIKM2019)
- Whatcha lookin' at? DeepLIFTing BERT's Attention in Question Answering
- What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?
- exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models [github]
- Multilingual Constituency Parsing with Self-Attention and Pre-Training (ACL2019)
- Cross-lingual Language Model Pretraining (NeurIPS2019) [github]
- 75 Languages, 1 Model: Parsing Universal Dependencies Universally (EMNLP2019) [github]
- Zero-shot Dependency Parsing with Pre-trained Multilingual Sentence Representations (EMNLP2019 WS)
- Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT (EMNLP2019)
- How multilingual is Multilingual BERT? (ACL2019)
- How Language-Neutral is Multilingual BERT?
- Is Multilingual BERT Fluent in Language Generation?
- BERT is Not an Interlingua and the Bias of Tokenization (EMNLP2019 WS)
- Cross-Lingual Ability of Multilingual BERT: An Empirical Study (ICLR2020)
- Multilingual Alignment of Contextual Word Representations (ICLR2020)
- On the Cross-lingual Transferability of Monolingual Representations
- Unsupervised Cross-lingual Representation Learning at Scale
- Emerging Cross-lingual Structure in Pretrained Language Models
- Can Monolingual Pretrained Models Help Cross-Lingual Classification?
- Fully Unsupervised Crosslingual Semantic Textual Similarity Metric Based on BERT for Identifying Parallel Data (CoNLL2019)
- CamemBERT: a Tasty French Language Model
- FlauBERT: Unsupervised Language Model Pre-training for French
- Multilingual is not enough: BERT for Finnish
- BERTje: A Dutch BERT Model
- RobBERT: a Dutch RoBERTa-based Language Model
- Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining
- Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets (ACL2019 WS)
- BERT-based Ranking for Biomedical Entity Normalization
- PubMedQA: A Dataset for Biomedical Research Question Answering (EMNLP2019)
- Pre-trained Language Model for Biomedical Question Answering
- How to Pre-Train Your Model? Comparison of Different Pre-Training Models for Biomedical Question Answering
- ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission
- Publicly Available Clinical BERT Embeddings (NAACL2019 WS)
- Progress Notes Classification and Keyword Extraction using Attention-based Deep Learning Models with BERT
- SciBERT: Pretrained Contextualized Embeddings for Scientific Text [github]
- PatentBERT: Patent Classification with Fine-Tuning a pre-trained BERT Model
- VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV2019)
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS2019)
- VisualBERT: A Simple and Performant Baseline for Vision and Language
- Selfie: Self-supervised Pretraining for Image Embedding
- ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
- Contrastive Bidirectional Transformer for Temporal Representation Learning
- M-BERT: Injecting Multimodal Information in the BERT Structure
- LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP2019)
- Fusion of Detected Objects in Text for Visual Question Answering (EMNLP2019)
- Unified Vision-Language Pre-Training for Image Captioning and VQA [github]
- Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations (ICLR2020)
- Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
- UNITER: Learning UNiversal Image-TExt Representations
- Supervised Multimodal Bitransformers for Classifying Images and Text
- Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
- BERT Can See Out of the Box: On the Cross-modal Transferability of Text Representations
- BERT for Large-scale Video Segment Classification with Test-time Augmentation (ICCV2019WS)
- SpeechBERT: Cross-Modal Pre-trained Language Model for End-to-end Spoken Question Answering
- vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
- Effectiveness of self-supervised pre-training for speech recognition
- Understanding Semantics from Speech Through Pre-training
- Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
- Patient Knowledge Distillation for BERT Model Compression (EMNLP2019)
- Small and Practical BERT Models for Sequence Labeling (EMNLP2019)
- Pruning a BERT-based Question Answering Model
- TinyBERT: Distilling BERT for Natural Language Understanding [github]
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (NeurIPS2019 WS) [github]
- PoWER-BERT: Accelerating BERT inference for Classification Tasks
- WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
- Extreme Language Model Compression with Optimal Subwords and Shared Projections
- BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
- Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
- Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
- Q8BERT: Quantized 8Bit BERT (NeurIPS2019 WS)
- Cloze-driven Pretraining of Self-attention Networks
- Learning and Evaluating General Linguistic Intelligence
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (ACL2019 WS)
- BERTScore: Evaluating Text Generation with BERT (ICLR2020)
- Machine Translation Evaluation with BERT Regressor
- SumQE: a BERT-based Summary Quality Estimation Model (EMNLP2019)
- Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (ICLR2020)
- Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models (ICLR2020)
- A Mutual Information Maximization Perspective of Language Representation Learning (ICLR2020)
- Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment (AAAI2020)
- Thieves on Sesame Street! Model Extraction of BERT-based APIs (ICLR2020)
- Graph-Bert: Only Attention is Needed for Learning Graph Representations
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
- Extending Machine Language Models toward Human-Level Language Understanding
- Glyce: Glyph-vectors for Chinese Character Representations
- Back to the Future -- Sequential Alignment of Text Representations
- Improving Cuneiform Language Identification with BERT (NAACL2019 WS)
- BERT has a Moral Compass: Improvements of ethical and moral values of machines
- SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction (ACM-BCB2019)
- On the comparability of Pre-trained Language Models
- Transformers: State-of-the-art Natural Language Processing
- Evolution of transfer learning in natural language processing
- arXiv:1810.04805, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
- arXiv:1812.06705, Conditional BERT Contextual Augmentation, Authors: Xing Wu, Shangwen Lv, Liangjun Zang, Jizhong Han, Songlin Hu
- arXiv:1812.03593, SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering, Authors: Chenguang Zhu, Michael Zeng, Xuedong Huang
- arXiv:1901.02860, Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, Authors: Zihang Dai, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le and Ruslan Salakhutdinov
- arXiv:1901.04085, Passage Re-ranking with BERT, Authors: Rodrigo Nogueira, Kyunghyun Cho
- arXiv:1902.02671, BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning, Authors: Asa Cooper Stickland, Iain Murray
- arXiv:1904.02232, BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis, Authors: Hu Xu, Bing Liu, Lei Shu, Philip S. Yu, [code]
- google-research/bert, official TensorFlow code and pre-trained models for BERT,
- codertimo/BERT-pytorch, Google AI 2018 BERT pytorch implementation,
- huggingface/pytorch-pretrained-BERT, A PyTorch implementation of Google AI's BERT model with script to load Google's pre-trained models (a minimal usage sketch appears after this repository list),
- Separius/BERT-keras, Keras implementation of BERT with pre-trained weights,
- soskek/bert-chainer, Chainer implementation of "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding",
- innodatalabs/tbert, PyTorch port of BERT ML model
- guotong1988/BERT-tensorflow, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- dreamgonfly/BERT-pytorch, PyTorch implementation of BERT in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"
- CyberZHG/keras-bert, Implementation of BERT that could load official pre-trained models for feature extraction and prediction
- MaZhiyuanBUAA/bert-tf1.4.0, bert-tf1.4.0
- dhlee347/pytorchic-bert, Pytorch Implementation of Google BERT,
- kpot/keras-transformer, Keras library for building (Universal) Transformers, facilitating BERT and GPT models,
- miroozyx/BERT_with_keras, A Keras version of Google's BERT model,
- conda-forge/pytorch-pretrained-bert-feedstock, A conda-smithy repository for pytorch-pretrained-bert,
- Rshcaroline/BERT_Pytorch_fastNLP, A PyTorch & fastNLP implementation of Google AI's BERT model.
- nghuyong/ERNIE-Pytorch, ERNIE Pytorch Version,
- dmlc/gluon-nlp, Gluon + MXNet implementation that reproduces BERT pretraining and finetuning on the GLUE benchmark, SQuAD, etc.,
- dbiir/UER-py, UER-py is a toolkit for pre-training on general-domain corpora and fine-tuning on downstream tasks. UER-py maintains model modularity and supports research extensibility. It facilitates the use of different pre-training models (e.g. BERT) and provides interfaces for users to further extend upon.
- thunlp/ERNIE, Source code and dataset for the ACL 2019 paper "ERNIE: Enhanced Language Representation with Informative Entities", which improves BERT with heterogeneous information fusion.
- PaddlePaddle/LARK, LAnguage Representations Kit, a PaddlePaddle implementation of BERT. It also contains an improved version of BERT, ERNIE, for Chinese NLP tasks.
- ymcui/Chinese-BERT-wwm, Pre-Training with Whole Word Masking for Chinese BERT https://arxiv.org/abs/1906.08101,
- zihangdai/xlnet, XLNet: Generalized Autoregressive Pretraining for Language Understanding,
- kimiyoung/transformer-xl, Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, containing the code in both PyTorch and TensorFlow for the paper.
- GaoPeng97/transformer-xl-chinese, transformer-xl for Chinese text generation,
- brightmart/bert_language_understanding, Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN,
- JayYip/bert-multiple-gpu, A multiple GPU support version of BERT,
- HighCWu/keras-bert-tpu, Implementation of BERT that could load official pre-trained models for feature extraction and prediction on TPU,
- Willyoung2017/Bert_Attempt, PyTorch Pretrained Bert,
- Pydataman/bert_examples, some examples of BERT, e.g. run_classifier.py
- guotong1988/BERT-chinese, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- zhongyunuestc/bert_multitask, multi-task BERT
- Microsoft/AzureML-BERT, End-to-end walkthrough for fine-tuning BERT using Azure Machine Learning,
- bigboNed3/bert_serving, export BERT model for serving,
- yoheikikuta/bert-japanese, BERT with SentencePiece for Japanese text.
- whqwill/seq2seq-keyphrase-bert, add BERT to encoder part for https://github.com/memray/seq2seq-keyphrase-pytorch,
- algteam/bert-examples, bert-demo,
- cedrickchee/awesome-bert-nlp, A curated list of NLP resources focused on BERT, attention mechanism, Transformer networks, and transfer learning.
- brightmart/bert_customized, BERT with customized features,
- JayYip/bert-multitask-learning, BERT for Multitask Learning,
- yuanxiaosc/BERT_Paper_Chinese_Translation, Chinese translation of "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" https://yuanxiaosc.github.io/2018/12/…,
- yaserkl/BERTvsULMFIT, Comparing Text Classification results using BERT embedding and ULMFIT embedding,
- cdathuraliya/bert-inference, A helper class for Google BERT (Devlin et al., 2018) to support online prediction and model pipelining.
- gameofdimension/java-bert-predict, turn a BERT pretrain checkpoint into a saved model for a feature-extraction demo in Java
- allenai/scibert, A BERT model for scientific text. https://arxiv.org/abs/1903.10676,
- MeRajat/SolvingAlmostAnythingWithBert, BioBert Pytorch
- kexinhuang12345/clinicalBERT, ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission https://arxiv.org/abs/1904.05342
- EmilyAlsentzer/clinicalBERT, repository for Publicly Available Clinical BERT Embeddings
- zhihu/cuBERT, Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL
- xmxoxo/BERT-train2deploy, BERT model training and deployment,
- sogou/SMRCToolkit, This toolkit was designed for the fast and efficient development of modern machine comprehension models, including both published models and original prototypes,
- benywon/ChineseBert, a Chinese BERT model specifically for question answering,
- matthew-z/R-net, R-net in PyTorch, with BERT and ELMo,
- nyu-dl/dl4marco-bert, Passage Re-ranking with BERT,
- chiayewken/bert-qa, BERT for question answering starting with HotpotQA,
- ankit-ai/BertQA-Attention-on-Steroids, BertQA - Attention on Steroids,
- NoviScl/BERT-RACE, This work is based on the Pytorch implementation of BERT (https://github.com/huggingface/pytorch-pretrained-BERT), adapting the original BERT model to multiple-choice machine comprehension.
- allenai/allennlp-bert-qa-wrapper, This is a simple wrapper on top of pretrained BERT-based QA models from pytorch-pretrained-bert to make AllenNLP model archives, so that you can serve demos from AllenNLP.
- edmondchensj/ChineseQA-with-BERT, EECS 496: Advanced Topics in Deep Learning Final Project: Chinese Question Answering with BERT (Baidu DuReader Dataset)
- graykode/toeicbert, TOEIC (Test of English for International Communication) solving using the pytorch-pretrained-BERT model,
- graykode/KorQuAD-beginner, https://github.com/graykode/KorQuAD-beginner
- krishna-sharma19/SBU-QA, This repository uses pretrained BERT embeddings for transfer learning in the QA domain
- maksna/bert-fine-tuning-for-chinese-multiclass-classification, use the Google pre-trained BERT model for fine-tuning on Chinese multiclass classification
- fooSynaptic/BERT_classifer_trial, BERT trial for Chinese corpus classification
- xieyufei1993/Bert-Pytorch-Chinese-TextClassification, PyTorch BERT fine-tuning for Chinese text classification,
- liyibo/text-classification-demos, Neural models for text classification in TensorFlow, such as cnn, dpcnn, fasttext, bert ...,
- circlePi/BERT_Chinese_Text_Class_By_pytorch, A Pytorch implementation of Chinese text classification based on BERT_Pretrained_Model,
- kaushaltrivedi/bert-toxic-comments-multilabel, Multilabel classification for the Toxic Comments challenge using BERT,
- lonePatient/BERT-chinese-text-classification-pytorch, This repo contains a PyTorch implementation of a pretrained BERT model for text classification.
- Chung-I/Douban-Sentiment-Analysis, Sentiment Analysis on the Douban Movie Short Comments Dataset using BERT.
- lynnna-xu/bert_sa, BERT sentiment analysis, TensorFlow Serving with a RESTful API
- HSLCY/ABSA-BERT-pair, Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence (NAACL 2019) https://arxiv.org/abs/1903.09588,
- songyouwei/ABSA-PyTorch, Aspect Based Sentiment Analysis, PyTorch implementations,
- howardhsu/BERT-for-RRC-ABSA, code for the NAACL 2019 paper "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
- brightmart/sentiment_analysis_fine_grain, Multi-label classification with BERT; fine-grained sentiment analysis from AI Challenger,
- kyzhouhzau/BERT-NER, Use Google BERT to do CoNLL-2003 NER,
- king-menin/ner-bert, NER task solution (BERT-Bi-LSTM-CRF) with Google BERT https://github.com/google-research
- macanv/BERT-BiLSMT-CRF-NER, TensorFlow solution of the NER task using a BiLSTM-CRF model with Google BERT fine-tuning,
- FuYanzhe2/Name-Entity-Recognition, Lstm-crf, Lattice-CRF, bert-ner
- mhcao916/NER_Based_on_BERT, this project is based on the Google BERT model and performs Chinese NER
- sberbank-ai/ner-bert, BERT-NER (nert-bert) with Google BERT,
- kyzhouhzau/Bert-BiLSTM-CRF, This model is based on bert-as-service. Model structure: bert-embedding, bilstm, crf,
- Hoiy/berserker, Berserker (BERt chineSE woRd toKenizER), a Chinese tokenizer built on top of Google's BERT model,
- Kyubyong/bert_ner, NER with BERT,
- jiangpinglei/BERT_ChineseWordSegment, A Chinese word segmentation model based on BERT, F1-Score 97%,
- lemonhu/NER-BERT-pytorch, PyTorch solution of the NER task using Google AI's pre-trained BERT model.
- nlpyang/BertSum, Code for the paper "Fine-tune BERT for Extractive Summarization",
- santhoshkolloju/Abstractive-Summarization-With-Transfer-Learning, Abstractive summarization using BERT as the encoder and a Transformer decoder,
- nayeon7lee/bert-summarization, Implementation of "Pretraining-Based Natural Language Generation for Text Summarization", Paper: https://arxiv.org/pdf/1902.09243.pdf
- dmmiller612/lecture-summarizer, Lecture summarizer with BERT
- asyml/texar, Toolkit for Text Generation and Beyond https://texar.io, Texar is a general-purpose text generation toolkit that also implements BERT for classification and for text generation applications by combining it with Texar's other modules.
- voidful/BertGenerate, Fine-tuning BERT for text generation,
- Tiiiger/bert_score, BERT score for language generation,
- sakuranew/BERT-AttributeExtraction, Using BERT for attribute extraction in knowledge graphs, with fine-tuning and feature extraction,
- jkszw2014/bert-kbqa-NLPCC2017, A trial of KBQA based on BERT for NLPCC2016/2017 Task 5, https://blog.csdn.net/ai_1046067944/article/details/86707784,
- yuanxiaosc/Schema-based-Knowledge-Extraction, Code for http://lic2019.ccf.org.cn/kg,
- yuanxiaosc/Entity-Relation-Extraction, Entity and relation extraction based on TensorFlow; schema-based knowledge extraction, SKE 2019 http://lic2019.ccf.org.cn,
- WenRichard/KBQA-BERT, https://zhuanlan.zhihu.com/p/62946533,
- ianycxu/RGCN-with-BERT, Gated Relational Graph Convolutional Networks (RGCN) with BERT for the coreference resolution task
- isabellebouchard/BERT_for_GAP-coreference, BERT finetuning for GAP unbiased pronoun resolution
- jessevig/bertviz, Tool for visualizing BERT's attention,
- GaoQ1/rasa_nlu_gq, turn natural language into structured data,
- yuanxiaosc/BERT-for-Sequence-Labeling-and-Text-Classification, This is the template code to use BERT for sequence labeling and text classification, in order to facilitate BERT for more tasks. Currently, the template code covers CoNLL-2003 named entity recognition, Snips slot filling and intent prediction.
- guillaume-chevalier/ReuBERT, A question-answering chatbot, simply.
- hanxiao/bert-as-service, Mapping a variable-length sentence to a fixed-length vector using a pretrained BERT model (a minimal client sketch appears after this repository list),
- Kyubyong/bert-token-embeddings, BERT pretrained token embeddings,
- xu-song/bert_as_language_model, BERT as a language model, forked from https://github.com/google-research/bert,
- yuanxiaosc/Deep_dynamic_word_representation, TensorFlow code and pre-trained models for deep dynamic word representation (DDWR); it combines the BERT model and ELMo's deep contextual word representation,
- imgarylai/bert-embedding, Token-level embeddings from the BERT model on mxnet and gluonnlp http://bert-embedding.readthedocs.io/,
- charles9n/bert-sklearn, a sklearn wrapper for Google's BERT model,
- NVIDIA/Megatron-LM, Ongoing research training transformer language models at scale, including BERT,
- hankcs/BERT-token-level-embedding, Generate BERT token-level embeddings without pain
- pengming617/bert_textMatching, a BERT-based semantic matching model built on a pre-trained Chinese model; the dataset is the official LCQMC data
- Brokenwind/BertSimilarity, Computing the similarity of two sentences with Google's BERT algorithm
- policeme/chinese_bert_similarity, BERT Chinese sentence similarity
- lonePatient/bert-sentence-similarity-pytorch, This repo contains a PyTorch implementation of a pretrained BERT model for the sentence similarity task.
- nouhadziri/DialogEntailment, The implementation of the paper "Evaluating Coherence in Dialogue Systems using Entailment" https://arxiv.org/abs/1904.03371
- jeongukjae/KR-BERT-SimCSE, https://github.com/jeongukjae/KR-BERT-SimCSE
- graykode/nlp-tutorial, Natural Language Processing Tutorial for Deep Learning Researchers https://www.reddit.com/r/MachineLearn…,
- dragen1860/TensorFlow-2.x-Tutorials, TensorFlow 2.x Tutorials and Examples, including CNN, RNN, GAN, Auto-Encoders, FasterRCNN, GPT, BERT examples, etc.,
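
For reference, here is a minimal sketch of extracting contextual embeddings with the Hugging Face library listed above (pytorch-pretrained-BERT, since renamed to transformers). The model name and the mean-pooling step are illustrative assumptions, not a prescription:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load a pretrained BERT encoder and its tokenizer (checkpoint name is an example).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT maps a sentence to contextual token vectors.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

token_embeddings = outputs.last_hidden_state       # shape (1, seq_len, 768)
sentence_embedding = token_embeddings.mean(dim=1)  # naive mean pooling over tokens
print(token_embeddings.shape, sentence_embedding.shape)
```

And a minimal client-side sketch for hanxiao/bert-as-service, assuming a server has already been started with that project's `bert-serving-start` command:

```python
from bert_serving.client import BertClient

# Connects to a locally running server, e.g. started with:
#   bert-serving-start -model_dir /path/to/bert_checkpoint -num_worker=1
bc = BertClient()
vectors = bc.encode(["First sentence.", "Second sentence."])
print(vectors.shape)  # one fixed-length vector per input sentence
```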
- (A benchmark comparison against the multilingual sentence-transformers models is provided on GitHub.) If you install the ko-sentence-transformers library, the model can be downloaded and used directly from the Hugging Face Hub (a minimal usage sketch follows this list).
- Hugging Face model: https://huggingface.co/jhgan/ko-sbert-multitask
- GitHub repository: https://github.com/jhgan00/ko-sentence-transformers
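
A minimal usage sketch for the ko-sbert-multitask model above, assuming the sentence-transformers package is installed and the checkpoint is pulled from the Hugging Face Hub as described:

```python
from sentence_transformers import SentenceTransformer, util

# Download the Korean SBERT model from the Hugging Face Hub.
model = SentenceTransformer("jhgan/ko-sbert-multitask")

sentences = ["안녕하세요?", "한국어 문장 임베딩을 위한 모델입니다."]
embeddings = model.encode(sentences)               # one vector per sentence
print(embeddings.shape)
print(util.cos_sim(embeddings[0], embeddings[1]))  # cosine similarity of the two sentences
```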
- Paper: https://arxiv.org/pdf/2207.07116v1.pdf GitHub: https://github.com/lightdxy/bootmae
- Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects, in arXiv 2023. [paper] [Website]
- A Survey on Graph Neural Networks for Time Series: Forecasting, Classification, Imputation, and Anomaly Detection, in arXiv 2023. [paper] [Website]
- Time series data augmentation for deep learning: a survey, in IJCAI 2021. [paper]
- Neural temporal point processes: a review, in IJCAI 2021. [paper]
- Time-series forecasting with deep learning: a survey, in Philosophical Transactions of the Royal Society A 2021. [paper]
- Deep learning for time series forecasting: a survey, in Big Data 2021. [paper]
- Neural forecasting: Introduction and literature overview, in arXiv 2020. [paper]
- Deep learning for anomaly detection in time-series data: review, analysis, and guidelines, in Access 2021. [paper]
- A review on outlier/anomaly detection in time series data, in ACM Computing Surveys 2021. [paper]
- A unifying review of deep and shallow anomaly detection, in Proceedings of the IEEE 2021. [paper]
- Deep learning for time series classification: a review, in Data Mining and Knowledge Discovery 2019. [paper]
- More related time series surveys, tutorials, and papers can be found at this repo.
- Make Transformer Great Again for Time Series Forecasting: Channel Aligned Robust Dual Transformer, in arXiv 2023. [paper]
- A Time Series is Worth 64 Words: Long-term Forecasting with Transformers, in ICLR 2023. [paper] [code]
- Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting, in ICLR 2023. [paper]
- Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting, in ICLR 2023. [paper]
- Non-stationary Transformers: Rethinking the Stationarity in Time Series Forecasting, in NeurIPS 2022. [paper]
- Learning to Rotate: Quaternion Transformer for Complicated Periodical Time Series Forecasting, in KDD 2022. [paper]
- FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting, in ICML 2022. [paper] [official code]
- TACTiS: Transformer-Attentional Copulas for Time Series, in ICML 2022. [paper]
- Pyraformer: Low-Complexity Pyramidal Attention for Long-Range Time Series Modeling and Forecasting, in ICLR 2022. [paper] [official code]
- Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting, in NeurIPS 2021. [paper] [official code]
- Informer: Beyond efficient transformer for long sequence time-series forecasting, in AAAI 2021. [paper] [official code] [dataset]
- Temporal fusion transformers for interpretable multi-horizon time series forecasting, in International Journal of Forecasting 2021. [paper] [code]
- Probabilistic Transformer For Time Series Analysis, in NeurIPS 2021. [paper]
- Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case, in arXiv 2020. [paper]
- Adversarial sparse transformer for time series forecasting, in NeurIPS 2020. [paper] [code]
- Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting, in NeurIPS 2019. [paper] [code]
- SSDNet: State Space Decomposition Neural Network for Time Series Forecasting, in ICDM 2021, [paper]
- From Known to Unknown: Knowledge-guided Transformer for Time-Series Sales Forecasting in Alibaba, in arXiv 2021. [paper]
- TCCT: Tightly-coupled convolutional transformer on time series forecasting, in Neurocomputing 2022. [paper]
- Triformer: Triangular, Variable-Specific Attentions for Long Sequence Multivariate Time Series Forecasting, in IJCAI 2022. [paper]
- AirFormer: Predicting Nationwide Air Quality in China with Transformers, in AAAI 2023. [paper] [official code]
- Earthformer: Exploring Space-Time Transformers for Earth System Forecasting, in NeurIPS 2022. [paper] [official code]
- Bidirectional Spatial-Temporal Adaptive Transformer for Urban Traffic Flow Forecasting, in TNNLS 2022. [paper]
- Spatio-temporal graph transformer networks for pedestrian trajectory prediction, in ECCV 2020. [paper] [official code]
- Spatial-temporal transformer networks for traffic flow forecasting, in arXiv 2020. [paper] [official code]
- Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting, in Transactions in GIS 2022. [paper]
- HYPRO: A Hybridly Normalized Probabilistic Model for Long-Horizon Prediction of Event Sequences,in NeurIPS 2022. [paper] [official code]
- Transformer Embeddings of Irregularly Spaced Events and Their Participants, in ICLR 2022. [paper] [official code]
- Self-attentive Hawkes process, in ICML 2020. [paper] [official code]
- Transformer Hawkes process, in ICML 2020. [paper] [official code]
- CAT: Beyond Efficient Transformer for Content-Aware Anomaly Detection in Event Sequences, in KDD 2022. [paper] [official code]
- DCT-GAN: Dilated Convolutional Transformer-based GAN for Time Series Anomaly Detection, in TKDE 2022. [paper]
- Concept Drift Adaptation for Time Series Anomaly Detection via Transformer, in Neural Processing Letters 2022. [paper]
- Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy, in ICLR 2022. [paper] [official code]
- TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data, in VLDB 2022. [paper] [official code]
- Learning graph structures with transformer for multivariate time series anomaly detection in IoT, in IEEE Internet of Things Journal 2021. [paper] [official code]
- Spacecraft Anomaly Detection via Transformer Reconstruction Error, in ICASSE 2019. [paper]
- Unsupervised Anomaly Detection in Multivariate Time Series through Transformer-based Variational Autoencoder, in CCDC 2021. [paper]
- Variational Transformer-based anomaly detection approach for multivariate time series, in Measurement 2022. [paper]
- TrajFormer: Efficient Trajectory Classification with Transformers, in CIKM 2022. [paper]
- TARNet : Task-Aware Reconstruction for Time-Series Transformer, in KDD 2022. [paper] [official code]
- A transformer-based framework for multivariate time series representation learning, in KDD 2021. [paper] [official code]
- Voice2series: Reprogramming acoustic models for time series classification, in ICML 2021. [paper] [official code]
- Gated Transformer Networks for Multivariate Time Series Classification, in arXiv 2021. [paper] [official code]
- Self-attention for raw optical satellite time series classification, in ISPRS Journal of Photogrammetry and Remote Sensing 2020. [paper] [official code]
- Self-supervised pretraining of transformers for satellite image time series classification, in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2020. [paper]
- Self-Supervised Transformer for Sparse and Irregularly Sampled Multivariate Clinical Time-Series, in ACM TKDD 2022. [paper] [official code]
The model is tuned on a BART base with 8.4 million translation pairs. If you have a GPU, you can use it with flash attention 2 in transformers, and there is also a ctranslate2 version, so the model runs fast enough on CPU as well. Our internal strategy is not to fine-tune English models for Korean, but to follow up quickly on English models as a base and wrap them with translation on the input and output sides; among the models I have uploaded for maintenance, the translation models keep getting downloaded steadily. https://huggingface.co/circulus/canvers-ko2en-v2 https://huggingface.co/circulus/canvers-en2ko-v2 https://huggingface.co/circulus/canvers-ko2en-ct2-v2 https://huggingface.co/circulus/canvers-en2ko-ct2-v2
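
A minimal sketch of the "wrap an English model with translation" strategy described above, assuming the circulus/canvers-ko2en-v2 checkpoint behaves as a standard BART-style seq2seq model in transformers (generation settings are illustrative):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "circulus/canvers-ko2en-v2"  # Korean -> English translation checkpoint listed above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "한국어 문장을 영어로 번역합니다."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```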