Few-shot learning is transforming AI and machine learning. With few-shot learning, algorithms are trained on small datasets instead of the massive ones traditional models require. If you need to train an AI model on limited data, few-shot learning might just be the solution you’ve been searching for.
Where is Few-Shot Learning Used?
Few-shot learning is applied in the real world almost everywhere. Whether you need a true general-purpose LLM or just some AI-powered scraping, few-shot learning is going to be used on your model to at least some degree.
- Healthcare: When using radiology for diagnosis, or dealing with rare diseases and conditions, large datasets simply aren’t readily available. You can explore our healthcare datasets here.
- Robotics: When learning to pick things up, autonomous robots don’t need large datasets. They experience the task a handful of times and then generalize from it.
- Personalized Tech: Phone keyboards and fitness watches use few-shot learning very effectively.
- Pharmaceuticals: When discovering new drugs, scientists are often using very limited datasets. Few-shot learning can be leveraged in early trials to speed things up.
- Language Processing: Linguists and archaeologists often need to deal with unused, dead languages. First-hand sources of these writings are scarce. AI can use few-shot learning to help decipher these languages.
- Image Recognition: Facial recognition requires few-shot learning. Most people aren’t going to train an AI using thousands of pictures of a single person. This same concept applies to endangered and rare species as well.
Few-Shot Learning vs. Other Paradigms
Few-shot learning is part of a broader family of machine learning techniques called n-shot learning. With n-shot learning, n represents the number of labeled examples per class that a model is trained on.
Here are some other examples of n-shot learning.
- Zero-Shot: A model uses prior knowledge to guess a class it hasn’t seen. Imagine a model trained on horses and tigers. This model has never seen a zebra before, but it can infer that a horse with stripes is a zebra.
- One-Shot: A model is trained on only one example per class. When a smartphone learns your face from a single picture, and allows you to unlock the screen, this is one-shot learning.
How Does Few-Shot Learning Work?
Few-shot learning uses more examples than zero-shot and one-shot learning, but still relies on very limited datasets. With the right training data, models can quickly generalize to identify patterns and trends.
Similar to its zero-shot and one-shot relatives, few-shot learning is built atop the following principles.
- Leveraging Prior Knowledge: Models use knowledge and training from previous tasks to identify patterns in new, unseen data.
- Task-Specific Adaptation: Models adjust their internal representations (classes) and decision-making process to properly handle new data with limited examples.
- Generalization from Small Datasets: With carefully selected training data, models can generalize efficiently after seeing just a few samples.
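To make these principles concrete, here is a minimal sketch of a prototype-style few-shot classifier. The class names and feature vectors are made up for illustration; in a real system, the vectors would come from a pretrained embedding network (the "prior knowledge"), and the averaging and nearest-prototype steps would play the roles of adaptation and generalization.

```python
import numpy as np

# A made-up "3-way, 2-shot" episode: two labeled feature vectors per class.
# In practice these vectors would come from a pretrained embedding network.
support = {
    "cat":   np.array([[0.9, 0.1, 0.2, 0.0], [0.8, 0.2, 0.1, 0.1]]),
    "dog":   np.array([[0.1, 0.9, 0.1, 0.2], [0.2, 0.8, 0.0, 0.1]]),
    "horse": np.array([[0.1, 0.2, 0.9, 0.8], [0.0, 0.1, 0.8, 0.9]]),
}

# Task-specific adaptation: average the few labeled examples into one
# "prototype" per class.
prototypes = {label: vecs.mean(axis=0) for label, vecs in support.items()}

def classify(query):
    # Generalization: assign the query to the nearest prototype.
    distances = {label: np.linalg.norm(query - proto)
                 for label, proto in prototypes.items()}
    return min(distances, key=distances.get)

print(classify(np.array([0.85, 0.15, 0.15, 0.05])))  # -> "cat"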
Types of Few-Shot Learning
Few-shot learning is not set in stone. It’s part of the larger, constantly evolving AI industry. However, the industry has reached a general consensus on the techniques listed in these next few sections.
Transfer Learning
When a model learns from one task and uses this knowledge in a new task, this is called “Transfer Learning.” Just like people, AI can use past experience to adapt to new situations. The model’s knowledge transfers and becomes relevant when trying to accomplish the new task.
Imagine you teach an AI to play Call of Duty. You then need this model to play Fortnite. It already knows how to aim, move and use combat strategy. The model’s previous Call of Duty experience gives it a better chance of success when playing Fortnite.
Transfer learning is not limited to AI or machine learning. Humans utilize transfer learning every single day. Transfer learning was the primary driver behind the agricultural revolution. Prehistoric humans learned how to grow certain types of food. Our ancestors then transferred these skills to every other plant-based food they could find. Eventually, they applied these same principles to the domestication of livestock.
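Here is a hedged sketch of what transfer learning often looks like in practice, assuming PyTorch and torchvision are available. The pretrained ResNet stands in for the knowledge gained on the "previous game," and only the final layer is retrained on a tiny, made-up dataset; the class count and training loop are placeholders, not a definitive recipe.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a network that already learned general visual features on
# ImageNet (the prior experience we want to transfer).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the transferred knowledge...
for param in model.parameters():
    param.requires_grad = False

# ...and replace only the final layer for the new, small task
# (a hypothetical 2-class problem).
model.fc = nn.Linear(model.fc.in_features, 2)

# Stand-in few-shot dataset: 8 random "images" with 2 labels.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(5):  # a few quick fine-tuning steps
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```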
Data Augmentation
To expand a few-shot dataset, we can use data augmentation. In practice, we generate synthetic data that shares the important characteristics of the real data, often by adding randomness and noise to it.
Interpolation and extrapolation make data augmentation easier to understand. Look at the graph below. As you can see, we only have four actual pieces of data. Our dotted line uses these points to create a pattern. Using these plot points, we can extrapolate that if X = 5, Y = 10, and if X = 0, Y = 0. We can interpolate that when X = 1.5, Y = 3.
We can identify trends in our limited data. Once we understand the trends, we can generate infinite additional data by following the rules put forth by the original data. A formula (Y = 2X) augments our 4-shot dataset into an infinite set of points.
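If you want to see the Y = 2X example in code, here is a small sketch, assuming the four real points are (1, 2), (2, 4), (3, 6), and (4, 8) and that NumPy is available:

```python
import numpy as np

# The four real data points from the example: Y = 2X.
x_real = np.array([1.0, 2.0, 3.0, 4.0])
y_real = np.array([2.0, 4.0, 6.0, 8.0])

# Fit a straight line to the few real points we have.
slope, intercept = np.polyfit(x_real, y_real, deg=1)

# Interpolate and extrapolate new points from that trend.
print(np.polyval([slope, intercept], 1.5))  # -> 3.0 (interpolation)
print(np.polyval([slope, intercept], 5.0))  # -> 10.0 (extrapolation)

# Augment: generate extra synthetic points, with a little noise so the
# new data isn't a perfect copy of the rule.
x_new = np.random.uniform(0, 5, size=20)
y_new = np.polyval([slope, intercept], x_new) + np.random.normal(0, 0.1, 20)
```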
In the real world, nothing is perfect and examples like the one above often don’t exist. Imagine training an AI on horses. You have one real photo of a brown horse. You use some clever editing and now you’ve got photos of a red horse, a black horse, a white horse, and a zebra. From your single horse photo, you generated a much larger dataset.
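Here is a minimal sketch of how that kind of image augmentation might look in code, assuming torchvision is available; the solid-color image below is just a stand-in for your one real horse photo.

```python
from PIL import Image
from torchvision import transforms

# Stand-in for the single real horse photo; in practice you would load
# your actual picture, e.g. with Image.open("horse.jpg").
horse = Image.new("RGB", (224, 224), color=(139, 90, 43))

# Random recoloring, flips, and rotations turn one photo into many
# slightly different ones.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, hue=0.3),
    transforms.RandomRotation(degrees=15),
])

augmented_dataset = [augment(horse) for _ in range(10)]
```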
You can learn more about data augmentation here.
Meta Learning
Meta learning is more about problem solving than it is about the actual data. With meta learning, a model is taught to break larger problems down into smaller ones and to use different strategies for different types of problems. Think back to the order of operations you learned in elementary school.
Look at the following problem: (2 + (2 * 3)) / 4 = ?

To solve this problem, we need to break it down.

- 2 * 3 = 6. We can now simplify the problem to (2 + 6) / 4.
- 2 + 6 = 8. Our problem is now 8 / 4.
- 8 / 4 = 2.

The chain of smaller problems connects to prove that (2 + (2 * 3)) / 4 = 2. By breaking the larger problem into smaller steps, we come to the conclusion that our answer is 2.

When we teach machines to solve large problems by breaking them into smaller ones, and to apply an appropriate strategy to each of those smaller problems, this is called meta learning. The machine learns problem-solving strategies that it can apply to a myriad of different scenarios.
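The decomposition idea from the arithmetic example can be shown in a few lines of code. This is only an illustration of breaking a problem into smaller sub-problems and choosing a strategy for each one, not a full meta learning algorithm such as MAML.

```python
import ast
import operator

# Map each operator type to the "small problem" strategy it needs.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(node):
    # Base case: a bare number is already solved.
    if isinstance(node, ast.Constant):
        return node.value
    # Recursive case: solve the two smaller problems, then combine them
    # with the strategy that matches this operator.
    if isinstance(node, ast.BinOp):
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    raise ValueError("Unsupported expression")

expr = "(2 + (2 * 3)) / 4"
print(evaluate(ast.parse(expr, mode="eval").body))  # -> 2.0
```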
Let’s look at another example you learned early on. A sentence needs to start with a capital letter and end with a punctuation mark. When a model learns this, it doesn’t just learn how to write a sentence. The model learns how to effectively communicate all of its ideas in a way that humans can read and understand.
Like the other examples above, meta learning was used by people for hundreds of thousands of years before it was adapted to machine learning.
Metric Learning
With metric learning, a model is taught to compare similarities between pieces of data rather than simply assigning labels to them. It then uses a distance function to measure how close new data is to previously seen data. Teachable Machine allows us to experiment with images and see how metric learning really works.
Imagine we train a model using a set of cat pictures. The model analyzes these images and learns to compare different features like fur, whiskers, and ear shape.
Once the model is done training, we give it a new cat picture. It compares the data from this new picture to its training data. After looking at datapoints in the fur, whiskers, and ear shape, it will calculate a similarity score. If the new picture is 98% similar to previous data, the model will determine that it’s highly likely that the image is a cat.
If the model has been trained using other methods to say “cats are cute”, after being 98% sure that this new image is a cat, it might execute additional logic from other training types and say, “your cat picture is cute!”
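Here is a minimal sketch of the similarity comparison described above, using cosine similarity over made-up feature vectors. A real model would produce these vectors from an embedding network, and the 0.95 cutoff is just an assumed threshold, not a fixed rule.

```python
import numpy as np

def cosine_similarity(a, b):
    # Similarity score in [-1, 1]; closer to 1 means "more alike".
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up feature vectors standing in for fur, whisker, and ear-shape
# measurements that a real embedding network would produce.
training_cats = np.array([
    [0.92, 0.88, 0.81],
    [0.95, 0.85, 0.78],
    [0.90, 0.91, 0.84],
])

new_image = np.array([0.93, 0.87, 0.80])

# Compare the new picture against every training example and average.
scores = [cosine_similarity(new_image, cat) for cat in training_cats]
similarity = sum(scores) / len(scores)

if similarity > 0.95:  # threshold is a design choice, not a fixed rule
    print(f"Probably a cat (similarity {similarity:.2f})")
```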
Inherent Problems With Few-Shot Learning
When dealing with few-shot learning, the small datasets are both a strength and a weakness. Machine learning comes with a variety of pitfalls. To avoid the problems below, smaller models need to be trained using the concepts we went through in the previous sections.
Generalization
Few-shot models can do well at tasks like facial recognition, but they often fail in entirely new scenarios where the data isn’t similar enough to what they’ve already seen.
A traditional LLM is given millions, sometimes even billions or trillions of data points. This allows the model to effectively handle outliers and make decent predictions when dealing with data it hasn’t seen before.
If a model has only seen a handful of pencil drawn cat images, it very well might fail to recognize a picture of a real world cat. Without a robust dataset, the model can’t always make robust generalizations.
Data Diversity
Small sets of data often fail to capture the true diversity present in larger datasets. Imagine that a model has been trained on a small set of people and all of their addresses are in the US. This model will likely become biased and assume that all people are from the US. You can mitigate this issue with broad, diverse datasets. Our datasets can help you improve your model’s performance.
In the late 2010s, this problem plagued AI models all across the world. This issue still sometimes rears its ugly head in modern AI. When a model is trained on social media, the model often picks up the biases that it sees on social media. We’ve all heard stories about the racist AI bots of the late 2010s. This is how it happens.
Feature Representation
The data diversity problem is a double-edged sword. Think back to our hypothetical model that recognizes animals. If this model only learns that all cats have four legs, it will see a picture of a horse and decide that it’s incredibly similar to a cat.
If a facial recognition model learns that a face has eyes, ears, a mouth and a nose, but it hasn’t been taught to properly compare these features against its training data, the model will give incorrect (and sometimes dangerous) results. If anyone with these features can unlock your phone, this creates a massive security problem.
Conclusion
Few-shot learning can reduce your need for large datasets. Humans have been using few-shot learning since the very beginning. We’ve only recently adapted it for AI. There are some hurdles. Generalization, data diversity and feature representation present major obstacles when creating small models. Transfer learning, data augmentation, meta learning and metric learning give us great tools to overcome these challenges not just in large models, but small models as well.