Pre-training
Pre-training is the process of training a machine learning model on a large, general-purpose dataset before fine-tuning it for specific tasks. During this stage, the model is not optimized for a single goal but instead learns broad patterns, structures, and relationships in the data. This training often relies on unlabeled or weakly labeled data and general learning objectives, allowing the model to develop foundational representations rather than task-specific skills.
Common types of pre-training include:
- self-supervised pre-training, where the model learns by predicting parts of the data from other parts (for example, predicting missing words in text);
- supervised pre-training, where the model is first trained on a large labeled dataset for a general task;
- unsupervised pre-training, where the model learns patterns and structures directly from unlabeled data; and
- contrastive pre-training, where the model learns by comparing data examples, pulling similar examples together and pushing dissimilar examples apart in representation space.
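As a concrete illustration of the self-supervised case, unlabeled text can be turned into training pairs by masking a word and asking the model to predict it from the remaining context. The sketch below (a toy illustration, not tied to any particular library; the function name and mask token are illustrative assumptions) shows how such pairs can be generated:

```python
# Toy sketch of self-supervised data creation: turn unlabeled sentences
# into (masked_input, target_word) prediction pairs. The mask token and
# helper name are illustrative assumptions, not from a specific framework.

def make_masked_examples(sentence, mask_token="[MASK]"):
    """Yield (masked_input, target_word) pairs from one unlabeled sentence."""
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        # Replace the i-th word with the mask; the model must recover it
        # from the surrounding context alone -- no human labels required.
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), target))
    return examples

corpus = ["the cat sat on the mat"]  # stands in for a large unlabeled corpus
pairs = make_masked_examples(corpus[0])
for masked, target in pairs[:2]:
    print(masked, "->", target)
```

Each sentence of length n yields n training pairs for free, which is why self-supervised objectives scale so well to large unlabeled datasets.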
Pre-training provides the model with a strong starting point instead of randomly initialized parameters. As a result, during fine-tuning the model can learn faster, require less task-specific data, and achieve better final performance.
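The benefit of a good starting point can be sketched with a deliberately tiny model. In the toy example below (an illustrative assumption, not a real training pipeline), gradient descent on a one-parameter linear model reaches a lower loss within the same step budget when initialized near a good general solution than when started from an arbitrary value:

```python
# Toy sketch: fine-tuning from a "pre-trained" weight vs. a random weight.
# The data, weights, and learning rate are illustrative assumptions.

def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def fine_tune(w, data, lr=0.01, steps=20):
    """Plain gradient descent on the MSE for a fixed number of steps."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

task_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # target slope is roughly 2

w_pretrained = 1.8   # already near the task solution, as pre-training provides
w_random = -5.0      # arbitrary starting value

loss_pre = mse(fine_tune(w_pretrained, task_data), task_data)
loss_rand = mse(fine_tune(w_random, task_data), task_data)
print(loss_pre < loss_rand)  # the pre-trained start ends up closer
```

With the same number of update steps and the same data, the run that starts near a sensible solution finishes with the lower loss, which mirrors why fine-tuning a pre-trained model needs fewer steps and less data than training from scratch.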