Deep Learning for Anomaly Detection: A Survey

Anomaly detection is an important problem that has been well-studied within diverse research areas and application domains. The aim of this survey is two-fold. Firstly, we present a structured and comprehensive overview of research methods in deep learning-based anomaly detection. Furthermore, we review the adoption of these methods for anomaly detection across various application domains and assess their effectiveness. We have grouped state-of-the-art research techniques into different categories based on the underlying assumptions and approach adopted. Within each category we outline the basic anomaly detection technique, along with its variants, and present the key assumptions used to differentiate between normal and anomalous behavior. For each category, we also present the advantages and limitations and discuss the computational complexity of the techniques in real application domains. Finally, we outline open issues in research and challenges faced while adopting these techniques.

Do CIFAR-10 Classifiers Generalize to CIFAR-10?

Machine learning is currently dominated by largely experimental work focused on improvements in a few key tasks. However, the impressive accuracy numbers of the best performing models are questionable because the same test sets have been used to select these models for multiple years now. To understand the danger of overfitting, we measure the accuracy of CIFAR-10 classifiers by creating a new test set of truly unseen images. Although we ensure that the new test set is as close to the original data distribution as possible, we find a large drop in accuracy (4% to 10%) for a broad range of deep learning models. Yet more recent models with higher original accuracy show a smaller drop and better overall performance, indicating that this drop is likely not due to overfitting based on adaptivity. Instead, we view our results as evidence that current accuracy numbers are brittle and susceptible to even minute natural variations in the data distribution.

Semantics vs. Syntax

Syntax is the grammar. It describes how to construct a well-formed sentence. For example, "this water is triangular" is syntactically correct.
Semantics relates to the meaning. "This water is triangular" does not mean anything, though the grammar is fine.

BERT for text classification

BERT can achieve high accuracy with a small sample size (e.g. 1000):

To simply get features (embeddings) from BERT, this Keras package is an easy starting point:

For fine-tuning with GPUs, this PyTorch version is handy (gradient accumulation is implemented):

For people who have access to TPUs:

Combine different word embeddings

How can you take advantage of different word embeddings in a text classification task? Please check my Kaggle post:

A subset of this field is called meta-embedding. Here is a list of papers:

I found that just taking the average of different embeddings is already powerful enough.
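
To make the averaging idea concrete, here is a minimal pure-Python sketch. It is not the code from the Kaggle post. The function name `average_embeddings` and the toy tables `glove_like` and `fasttext_like` are made up for illustration, and it assumes all embeddings share the same dimension (for example, several 300-dimensional pretrained sets). A word missing from one table is averaged over the tables that do contain it.

```python
from typing import Dict, List, Optional

Vector = List[float]
EmbeddingTable = Dict[str, Vector]


def average_embeddings(word: str, tables: List[EmbeddingTable]) -> Optional[Vector]:
    """Average the vectors that `word` has across several embedding tables.

    Tables that do not contain the word are skipped. Returns None if no
    table contains the word. All tables are assumed to share one dimension.
    """
    vectors = [table[word] for table in tables if word in table]
    if not vectors:
        return None
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must have the same dimension")
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(components) / len(vectors) for components in zip(*vectors)]


# Toy 3-dimensional tables standing in for real pretrained embeddings.
glove_like = {"water": [1.0, 0.0, 2.0], "triangular": [0.0, 4.0, 0.0]}
fasttext_like = {"water": [3.0, 2.0, 0.0]}

print(average_embeddings("water", [glove_like, fasttext_like]))       # [2.0, 1.0, 1.0]
print(average_embeddings("triangular", [glove_like, fasttext_like]))  # [0.0, 4.0, 0.0]
print(average_embeddings("unknown", [glove_like, fasttext_like]))     # None
```

In practice the averaged vectors would fill the embedding matrix of the classifier; with real pretrained files a NumPy array per table would be the more usual representation.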

One thing to try is BERT: