As Conneau et al. (2018) state, "a probing task is a classification problem that focuses on simple linguistic properties of sentences". The main assumption behind probing is that a model needs the linguistic knowledge it is tested on in order to perform well on natural language generation and the other tasks the model can be used for (Saphra, 2021).
Belinkov (2020) classifies existing probing methods as structural and behavioural. A structural probe takes a sentence vector from a large language model and feeds it as input to a probing classifier, for example, logistic regression. The task of this diagnostic classifier is to assign a linguistic-feature label to each sentence vector. Behavioural probes do not require any classifier on top of the vector representations from a model. An example of a behavioural probe is a masking task, in which the probed language model has to fill in a masked token, for example, to put the correct verb form in a sentence.
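A structural probe of the kind described above can be sketched as follows; note that the sentence vectors and the binary label here are synthetic stand-ins (in practice the vectors would be extracted from a large language model and the label would be a real linguistic property):

```python
# Minimal sketch of a structural probe: a logistic-regression classifier
# trained to predict a linguistic label from sentence vectors.
# Assumption: the vectors below are random stand-ins for model representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "sentence vectors": 500 sentences, 64 dimensions each.
X = rng.normal(size=(500, 64))
# Synthetic binary label (e.g. past vs. present tense), constructed to be
# linearly recoverable from one direction of the vector space.
y = (X @ rng.normal(size=64) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
accuracy = probe.score(X_te, y_te)
# High held-out accuracy is taken as evidence that the property
# is (linearly) encoded in the representations.
```

The diagnostic classifier is deliberately simple: if a linear model can read the property off the vectors, the representation itself, not the probe, is credited with encoding it.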
Probing methods have been criticized for relying on the results of logistic regression, which might be biased by the data distribution. For this reason, other probing techniques are used, such as control tasks with selectivity (Hewitt and Liang, 2019) and Minimum Description Length (MDL) (Voita and Titov, 2020). For more information about these methods, see the original papers and the Probing Pipeline documentation.