State of Transfer Learning in NLP

Abhinav Sharma
3 min read · Jan 7, 2022


First we’ll start with NLP. What is NLP?

NLP stands for natural language processing, a branch of computer science (more specifically, of artificial intelligence) that aims to give computers the ability to understand text in much the same way humans do.

Now what is transfer learning?

In recent years, we have become better at predicting outcomes with well-trained models. But many machine learning tasks are domain specific, and a model trained for one domain usually fails in another. In the real world, the data often looks different from the training set, so the model is not able to make accurate predictions on it. Transfer learning is the ability to carry the knowledge of a pre-trained model over to a new condition.

Computer vision makes heavy use of transfer learning because of the availability of pre-trained models that have been trained on very large amounts of data.

Deep learning is training-data intensive: as a rough rule of thumb, with fewer than about 10,000 examples a deep model is unlikely to work accurately. The same holds in NLP. Deep learning is not always the best approach for a data set. Extreme training requirements, time and, most importantly, expense put deep learning out of reach in many contexts.

How Does Transfer Learning address the above problem?

When it comes to data, big data is less of an issue than small data. Transfer learning is the application of knowledge gained in one context to another context. Reusing some of the parameters of an existing model reduces training time and eases the deep learning issues that come with small data.

For example, training a deep model from scratch for a small task like recognizing a lion is far too intensive. Instead, we can transfer the parts of a model that already capture high-level concepts of the input, such as the size and color of an object. These give high activations, since each concept corresponds to something in the image of a lion. The relationship between the input features and the target becomes more straightforward, so less training data and less computing power are needed.

Advantages of Transfer Learning:

Transfer learning addresses the deep learning issues above in three separate ways:

· Using pretrained data

· Small memory requirement

· Short target model training

Some methods for Transfer Learning:

Parameter initialization (INIT).

This approach first trains the network on the source task S, which has a large dataset, and then uses the tuned parameters to initialize the network for the target task T, which has a small dataset. After the transfer, we may fix (freeze) the transferred parameters in the target domain rather than updating them further.
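To make INIT concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: a one-feature linear model `y = w*x + b` stands in for a neural network, the two toy datasets and every name (`train`, `freeze_w`, `source`, `target`) are hypothetical, and the choice to freeze the slope while still adjusting the bias is just one possible way to "fix the parameters".

```python
import random

def train(data, w, b, lr=0.1, epochs=500, freeze_w=False):
    """Fit y = w*x + b by full-batch gradient descent on mean squared error."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        if not freeze_w:  # a frozen parameter is simply never updated
            w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = random.Random(0)
# Hypothetical toy tasks: same slope, different offset.
# Source task S is large (200 samples), target task T is small (5 samples).
source = [(x, 2.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(200))]
target = [(x, 2.0 * x + 1.5) for x in (rng.uniform(-1, 1) for _ in range(5))]

# Step 1: train on S from scratch.
w_s, b_s = train(source, w=0.0, b=0.0)

# Step 2: initialize T with the parameters tuned on S,
# keep the transferred slope fixed, and adapt only the bias.
w_t, b_t = train(target, w=w_s, b=b_s, freeze_w=True)

print(f"source: w={w_s:.3f}, b={b_s:.3f}")
print(f"target: w={w_t:.3f}, b={b_t:.3f}")
```

Passing `freeze_w=False` in step 2 would instead fine-tune every transferred parameter on the target data.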

Multi-task learning (MULT)

MULT trains on samples from both domains simultaneously.
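A rough sketch of MULT on a toy problem could look like the following. The assumptions are mine, not the article's: the model is a one-feature linear model with a slope `w` shared by both tasks and one bias per task, the joint loss is a weighted sum controlled by a hypothetical weight `lam`, and all names and datasets are made up for illustration.

```python
import random

def mse_grads(data, w, b):
    """Gradients of mean squared error for the model y = w*x + b."""
    n = len(data)
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2 * (w * x + b - y) for x, y in data) / n
    return gw, gb

def train_mult(source, target, lam=0.5, lr=0.1, epochs=1000):
    """Minimize (1 - lam) * loss_S + lam * loss_T with a shared slope."""
    w, b_s, b_t = 0.0, 0.0, 0.0
    for _ in range(epochs):
        gw_s, gb_s = mse_grads(source, w, b_s)
        gw_t, gb_t = mse_grads(target, w, b_t)
        # The shared parameter receives gradient from both tasks in every step.
        w -= lr * ((1 - lam) * gw_s + lam * gw_t)
        # Each task-specific bias only sees its own task.
        b_s -= lr * (1 - lam) * gb_s
        b_t -= lr * lam * gb_t
    return w, b_s, b_t

rng = random.Random(0)
# Hypothetical toy tasks: large source set, small target set.
source = [(x, 2.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(200))]
target = [(x, 2.0 * x + 1.5) for x in (rng.uniform(-1, 1) for _ in range(5))]

w, b_s, b_t = train_mult(source, target)
print(f"shared w={w:.3f}, b_s={b_s:.3f}, b_t={b_t:.3f}")
```

The weight `lam` sets how much the small target task counts against the large source task. 0.5 is an arbitrary choice here.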

Combination (MULT+INIT)

Here we first pretrain on the source domain S for parameter initialization, then train on both S and T simultaneously.
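The two stages of the combination can be sketched on the same kind of toy problem. As before, this is an assumption-laden illustration: a linear model with a shared slope and per-task biases stands in for a real network, and the names, the datasets and the loss weight `lam` are hypothetical.

```python
import random

def mse_grads(data, w, b):
    """Gradients of mean squared error for the model y = w*x + b."""
    n = len(data)
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2 * (w * x + b - y) for x, y in data) / n
    return gw, gb

rng = random.Random(0)
# Hypothetical toy tasks: large source set, small target set.
source = [(x, 2.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(200))]
target = [(x, 2.0 * x + 1.5) for x in (rng.uniform(-1, 1) for _ in range(5))]

lr, lam = 0.1, 0.5

# Stage 1 (INIT): pretrain on the source domain S only.
w, b_s = 0.0, 0.0
for _ in range(500):
    gw, gb = mse_grads(source, w, b_s)
    w, b_s = w - lr * gw, b_s - lr * gb
w_pretrained = w

# Stage 2 (MULT): start from the pretrained parameters
# and train on S and T simultaneously.
b_t = b_s  # the target bias is initialized from the source bias
for _ in range(500):
    gw_s, gb_s = mse_grads(source, w, b_s)
    gw_t, gb_t = mse_grads(target, w, b_t)
    w -= lr * ((1 - lam) * gw_s + lam * gw_t)
    b_s -= lr * (1 - lam) * gb_s
    b_t -= lr * lam * gb_t

print(f"pretrained w={w_pretrained:.3f}")
print(f"after joint training: w={w:.3f}, b_s={b_s:.3f}, b_t={b_t:.3f}")
```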