Pre-training
Pre-training is the process of training a machine learning model on a large, general-purpose dataset before fine-tuning it for specific tasks. During this stage, the model is not optimized for a single goal but instead learns broad patterns, structures, and relationships in the data. This training often relies on unlabeled or weakly labeled data and general learning objectives, allowing the model to develop foundational representations rather than task-specific skills.
Common types of pre-training include:
- self-supervised pre-training, where the model learns by predicting parts of the data from other parts (for example, predicting missing words in text);
- supervised pre-training, where the model is first trained on a large labeled dataset for a general task;
- unsupervised pre-training, where the model learns patterns and structures directly from unlabeled data; and
- contrastive pre-training, where the model learns by comparing data examples, pulling similar examples together and pushing dissimilar examples apart in representation space.
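As a concrete illustration of the self-supervised case, unlabeled text can be turned into training pairs by masking a word and asking the model to predict it from the remaining context. The sketch below (a toy illustration, not tied to any particular library; the function name and mask token are illustrative assumptions) shows how such pairs can be generated:

```python
# Toy sketch of self-supervised data creation: turn unlabeled sentences
# into (masked_input, target_word) prediction pairs. The mask token and
# helper name are illustrative assumptions, not from a specific framework.

def make_masked_examples(sentence, mask_token="[MASK]"):
    """Yield (masked_input, target_word) pairs from one unlabeled sentence."""
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        # Replace the i-th word with the mask; the model must recover it
        # from the surrounding context alone -- no human labels required.
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), target))
    return examples

corpus = ["the cat sat on the mat"]  # stands in for a large unlabeled corpus
pairs = make_masked_examples(corpus[0])
for masked, target in pairs[:2]:
    print(masked, "->", target)
```

Each sentence of length n yields n training pairs for free, which is why self-supervised objectives scale so well to large unlabeled datasets.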
Pre-training provides the model with a strong starting point instead of randomly initialized parameters. As a result, during fine-tuning the model can learn faster, require less task-specific data, and achieve better final performance.
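The benefit of a good starting point can be sketched with a deliberately tiny model. In the toy example below (an illustrative assumption, not a real training pipeline), gradient descent on a one-parameter linear model reaches a lower loss within the same step budget when initialized near a good general solution than when started from an arbitrary value:

```python
# Toy sketch: fine-tuning from a "pre-trained" weight vs. a random weight.
# The data, weights, and learning rate are illustrative assumptions.

def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def fine_tune(w, data, lr=0.01, steps=20):
    """Plain gradient descent on the MSE for a fixed number of steps."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

task_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # target slope is roughly 2

w_pretrained = 1.8   # already near the task solution, as pre-training provides
w_random = -5.0      # arbitrary starting value

loss_pre = mse(fine_tune(w_pretrained, task_data), task_data)
loss_rand = mse(fine_tune(w_random, task_data), task_data)
print(loss_pre < loss_rand)  # the pre-trained start ends up closer
```

With the same number of update steps and the same data, the run that starts near a sensible solution finishes with the lower loss, which mirrors why fine-tuning a pre-trained model needs fewer steps and less data than training from scratch.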