One type of machine learning (ML) is ‘unsupervised learning’, which identifies hidden patterns, such as groupings or anomalies, in unlabeled visitor data. Because it does not rely on labels, it is not affected by the problems posed by GIGO (‘garbage in, garbage out’; in this case, the lack of definitive bot signature data) or by the mutation of bot characteristics. Instead, unsupervised learning helps identify bots in two ways: anomaly detection surfaces bots with anomalous characteristics, and clustering surfaces groups of bots that share similar characteristics. However, certain human visitors can also show anomalous characteristics or fall into distinct groupings. For example, some users of a website have very high levels of engagement. Such users could be flagged as anomalies or suspicious clusters by conventional bot detection systems, so applying unsupervised learning directly to bot detection can produce false positives (i.e., humans being mistaken for bots).
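The false-positive risk can be illustrated with a minimal sketch of label-free anomaly detection. It assumes a single feature (requests per minute) and a modified z-score based on the median and the median absolute deviation. The visitor names, the traffic figures, and the 3.5 cutoff are hypothetical choices made for illustration and do not describe any particular bot detection product.

```python
from statistics import median

# Hypothetical requests-per-minute figures per visitor (illustrative only).
requests_per_minute = {
    "v1": 4, "v2": 5, "v3": 6, "v4": 5, "v5": 4, "v6": 6, "v7": 5,
    "power_user": 30,   # a highly engaged human
    "scraper_bot": 60,  # an automated client
}

def flag_anomalies(values, threshold=3.5):
    """Return the keys whose modified z-score exceeds the threshold.

    No labels are used: a visitor is flagged purely because it sits far
    from the median of the observed traffic.
    """
    data = list(values.values())
    med = median(data)
    mad = median([abs(v - med) for v in data])  # median absolute deviation
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    return [
        name for name, v in values.items()
        if 0.6745 * abs(v - med) / mad > threshold
    ]

flagged = flag_anomalies(requests_per_minute)
print(flagged)  # ['power_user', 'scraper_bot']
```

Both the bot and the engaged human are flagged, because the detector only knows that they are unusual and has no way to tell which one is malicious.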
Therefore, for effective bot detection, a combination of supervised and unsupervised learning approaches, known as ‘semi-supervised’ learning, is applied. It works at a higher level of abstraction to discern a visitor’s intent, going beyond simple interaction-based behavior analysis, even in the absence of definitive bot signatures.
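As a rough illustration of the semi-supervised idea, the sketch below uses self-training with a nearest-labeled-neighbor rule: a handful of labeled visitors seed the process, and labels spread step by step to the unlabeled visitors. This is one simple semi-supervised technique chosen for brevity and is not necessarily the method the text has in mind. The two features (request rate and dwell time, both pre-scaled to the range 0 to 1), the visitor names, and all values are hypothetical, and dwell time stands in only as a crude proxy for intent.

```python
import math

# A few labeled seeds: name -> ((request_rate, dwell_time), label).
# All names and values are hypothetical.
labeled = {
    "known_human": ((0.10, 0.60), "human"),
    "known_bot": ((0.90, 0.05), "bot"),
}

# Unlabeled visitors: name -> (request_rate, dwell_time).
unlabeled = {
    "casual": (0.15, 0.50),
    "power_user": (0.60, 0.80),   # busy, but lingers on pages
    "scraper": (0.95, 0.10),
    "stealth_bot": (0.60, 0.10),  # throttled rate, still no dwell
}

def self_train(labeled, unlabeled):
    """Label every unlabeled visitor by repeated nearest-neighbor steps.

    On each pass, the unlabeled visitor closest to any already-labeled
    visitor inherits that neighbor's label and joins the labeled pool.
    """
    known = dict(labeled)
    pending = dict(unlabeled)
    assigned = {}
    while pending:
        best = None  # (distance, visitor name, label to assign)
        for name, features in pending.items():
            for known_features, label in known.values():
                d = math.dist(features, known_features)
                if best is None or d < best[0]:
                    best = (d, name, label)
        _, name, label = best
        known[name] = (pending.pop(name), label)
        assigned[name] = label
    return assigned

result = self_train(labeled, unlabeled)
print(result["power_user"], result["stealth_bot"])  # human bot
```

With these illustrative values, the engaged human ends up on the human side and the throttled bot on the bot side, even though the two have the same request rate. A production system would use far richer features and a calibrated model; the point here is only that a few labels can steer how otherwise unlabeled structure is interpreted.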