- Poor sample efficiency: models typically need about 6,000 samples per digit to recognize handwritten digits.
- Poor transferability: models don't build on previous experience or previously learned knowledge.
Meta-learning is a solution to the two problems above, and it is often defined as "learning how to learn". Our dream is:
- Learn to learn
- Train a system on many tasks
- Resulting system can solve new tasks quickly
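The three bullets above can be sketched as an inner loop that adapts to one task and an outer loop that improves a shared initialization across many tasks. This is a minimal, hypothetical sketch (a Reptile-style simplification with a one-parameter toy task, not any specific published method): each task asks the learner to regress a different unknown target.

```python
import random

def adapt(w, target, steps, lr=0.1):
    """Inner loop: a few gradient steps on one task's loss (w - target)^2."""
    for _ in range(steps):
        w -= lr * 2.0 * (w - target)  # gradient of (w - target)^2
    return w

rng = random.Random(0)
sample_task = lambda: rng.uniform(-1.0, 1.0)  # a "task" = an unknown target

# Outer loop: train on many tasks by nudging the shared initialization
# toward each task's adapted solution.
meta_init = 5.0  # deliberately start far from every task
for _ in range(500):
    meta_init += 0.1 * (adapt(meta_init, sample_task(), steps=5) - meta_init)

# The resulting system solves a brand-new task quickly: after only 2 inner
# steps, the meta-learned init is closer to the target than the raw one.
new_task = sample_task()
meta_err = abs(adapt(meta_init, new_task, steps=2) - new_task)
raw_err = abs(adapt(5.0, new_task, steps=2) - new_task)
print(meta_err < raw_err)
```

The point is only the structure: many tasks in the outer loop, fast adaptation in the inner loop.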
In deep learning, we use regularization to make sure we are not overfitting our model on a small dataset, but we are still overfitting to the task: what we learned cannot be generalized to other tasks.
We often get stuck on test samples that are uncommon in the dataset.
In one-shot learning, we provide only one training sample per category. Here is an example:
In this one-shot setting, we often train an RNN on the training data and labels. When it is presented with a test input, it should predict the label correctly.
In meta-testing, we again provide many datasets, with classes that were never seen during training. After learning from hundreds of tasks, the model should discover general patterns for classifying objects.
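The meta-train/meta-test split above can be sketched as episodic sampling: classes are partitioned so that meta-test episodes draw only from classes never seen during meta-training. The class names and pool sizes below are hypothetical placeholders.

```python
import random

# Hypothetical pool of 20 classes, split so meta-test classes are disjoint
# from the classes used during meta-training.
all_classes = [f"class_{i}" for i in range(20)]
meta_train_classes, meta_test_classes = all_classes[:15], all_classes[15:]

def sample_episode(class_pool, n_way=5, rng=None):
    """Build one N-way 1-shot task: one labeled support example per class,
    plus one query example whose label the learner must predict."""
    rng = rng or random.Random(0)
    classes = rng.sample(class_pool, n_way)
    support = [(f"{c}_img_0", c) for c in classes]  # one example per class
    query_class = rng.choice(classes)
    query = (f"{query_class}_img_1", query_class)
    return support, query

# A meta-test episode: every class here was unseen during meta-training.
support, query = sample_episode(meta_test_classes)
print(len(support))  # → 5
```

Meta-training repeats this sampling over `meta_train_classes` for many episodes; only the general classification skill, not any specific class, transfers to meta-test.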
One meta-learning method uses an external memory network with an RNN. Note that in supervised learning, we provide both the input and the label in the same time step $t$. In this model, however, the label is not provided until the next time step $t+1$ (shown below).
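The one-step label delay can be sketched as a simple sequence transform: at time $t$ the network receives the current input paired with the previous step's label (and a null label at $t=0$). This is just the data-preparation step, with placeholder inputs and labels.

```python
def delayed_label_sequence(inputs, labels, null_label=None):
    """Pair each input x_t with label y_{t-1}; the true label y_t only
    arrives at the next time step, forcing the network to hold x_t in
    memory until its label shows up."""
    shifted = [null_label] + labels[:-1]
    return list(zip(inputs, shifted))

xs = ["x0", "x1", "x2"]
ys = ["y0", "y1", "y2"]
print(delayed_label_sequence(xs, ys))
# → [('x0', None), ('x1', 'y0'), ('x2', 'y1')]
```

Because the label for `x0` only appears alongside `x1`, the model cannot simply memorize input-label pairs within one step; it must bind them across time, which is what the external memory is for.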
When updating the model, instead of applying an update immediately after each task, we wait until a batch of tasks is completed. We then merge everything learned from these tasks into a single update.
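A minimal sketch of this batched update, assuming a toy scalar parameter and a made-up per-task loss $(w - \text{target})^2$: gradients are collected from every task in the batch and applied together in one step.

```python
def task_gradient(w, target):
    """Gradient of the toy per-task loss (w - target)^2."""
    return 2.0 * (w - target)

w, lr = 0.0, 0.1
task_batch = [1.0, 2.0, 3.0]  # hypothetical per-task targets

# Collect a gradient per task, then merge them into ONE update,
# rather than stepping after each individual task.
grads = [task_gradient(w, t) for t in task_batch]
w -= lr * sum(grads) / len(grads)
print(w)  # → 0.4
```

Averaging the per-task gradients keeps any single task from dominating the step, which is the point of waiting for the whole batch.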