Face detection is among the most widely used computer vision technologies. Face-based authentication and recognition can be applied in many situations. The first stage of face recognition usually consists of detecting and locating faces in photographs or videos. An effective detection algorithm substantially improves the performance of the whole system, and a poor one degrades it. Face detection is therefore one of the key steps in any face recognition system. Collecting photographs and videos is becoming increasingly easy thanks to the proliferation of mobile devices and smart cameras. The computational power of such devices, however, is relatively limited, so the practical answer is to find faster and more efficient algorithms.
A major problem in face detection is that, for the same detector on a single image, detection accuracy varies significantly across face scales. To address this, many face detection methods handle different scales with multiple network architectures. Another approach is to use features from different levels, taken from the last few layers of the network.
YOLOv3 is a natural starting point because it can detect objects of multiple sizes, and it achieved state-of-the-art performance on the COCO dataset. However, its results were not as good as expected when it was applied to face detection. On the one hand, the anchor box dimensions that YOLOv3 uses for the COCO dataset are usually not appropriate for faces. On the other hand, face detection only needs to detect and locate faces and does not need to identify the 80 object categories of the COCO dataset. The YOLOv3-based approach to face detection therefore focuses on selecting a set of more appropriate anchor boxes and on using a novel loss function. The proposed face detector, YOLO-face, is based on YOLOv3 and performs much better than YOLOv3 when trained on the WIDER FACE dataset. Its main changes are:
1. The backbone of the new architecture is a deeper darknet that outperforms darknet-53, particularly when detecting small faces.
2. A new regression loss is proposed that mixes MSE loss and GIoU loss.
3. Anchor boxes that are more appropriate for face detection are learned by k-means clustering.
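To make the second point more concrete, here is a minimal sketch of how a GIoU term can be computed and mixed with an MSE term. It is an illustration only, not the authors' implementation: the function names, the `(x1, y1, x2, y2)` box format, and the `mse_weight` mixing factor are assumptions, since this post does not give the exact formula used in YOLO-face.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box that encloses both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU penalizes the empty part of the enclosing box.
    return iou - (enclose - union) / enclose


def mixed_regression_loss(pred, target, mse_weight=0.5):
    """Hypothetical mix of MSE and GIoU loss; mse_weight is an assumption."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    giou_loss = 1.0 - giou(pred, target)
    return mse_weight * mse + (1.0 - mse_weight) * giou_loss
```

Unlike plain IoU, GIoU still changes when two boxes do not overlap at all, so it can give a useful training signal for poorly placed predictions. In practice the coordinates would likely be normalized so that the MSE and GIoU terms have comparable magnitudes.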
Fig. 1 - Architecture of YOLO-face. The feature extraction network has 71 convolution layers and reduces the size of the feature maps through successive stride-2 layers. The detection network has an FPN-like structure for extracting features from feature maps at different scales.
Anchor Boxes
The anchor boxes are obtained by k-means clustering. All face labels are split into k groups, using IoU as the distance measure. The mean box size within each of the k groups then becomes the new cluster center. These two steps are repeated until convergence. In our experiments the number of clusters k is set to 9. To suit face detection, the horizontal anchor boxes are transposed into vertical ones.
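The clustering procedure described above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the authors' code: boxes are `(width, height)` pairs, IoU is computed as if both boxes shared a corner (the usual convention for anchor clustering), initial centers are sampled at random, and `to_vertical` is one possible reading of the transposition step. All function names are hypothetical.

```python
import random


def wh_iou(wh, center):
    """IoU of two (width, height) boxes aligned at the same corner."""
    inter = min(wh[0], center[0]) * min(wh[1], center[1])
    union = wh[0] * wh[1] + center[0] * center[1] - inter
    return inter / union


def kmeans_anchors(boxes, k=9, max_iter=100, seed=0):
    """Cluster (width, height) pairs into k anchors using IoU similarity."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # assumption: random initial centers
    assignment = None
    for _ in range(max_iter):
        # Assign each box to the center with the highest IoU.
        new_assignment = [
            max(range(k), key=lambda j: wh_iou(box, centers[j]))
            for box in boxes
        ]
        if new_assignment == assignment:
            break  # converged: no box changed cluster
        assignment = new_assignment
        # Move each center to the mean size of its members.
        for j in range(k):
            members = [b for b, a in zip(boxes, assignment) if a == j]
            if members:
                centers[j] = (
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                )
    return sorted(centers, key=lambda c: c[0] * c[1])


def to_vertical(anchors):
    """Swap width and height of any anchor that is wider than it is tall."""
    return [(min(w, h), max(w, h)) for w, h in anchors]
```

A call such as `to_vertical(kmeans_anchors(face_sizes, k=9))`, where `face_sizes` is a list of labeled face widths and heights, would then give nine portrait-shaped anchors sorted by area.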
Loss Function
During training, YOLO optimizes a multi-part loss function with four parts: regression loss, confidence loss, classification loss, and no-object loss. Face detection, however, is a binary classification problem (face or not a face). To make the total loss function more appropriate for face detection, we revise the weights of the four parts empirically to 2:1:0.5:0.5.
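As a small worked example of the weighting, the total loss is a weighted sum of the four parts. The sketch below assumes that the ratio 2:1:0.5:0.5 follows the order in which the parts are listed (regression, confidence, classification, no-object); the dictionary keys and the function name are hypothetical.

```python
# Assumed mapping of the 2:1:0.5:0.5 ratio onto the four loss parts.
LOSS_WEIGHTS = {
    "regression": 2.0,
    "confidence": 1.0,
    "classification": 0.5,
    "no_object": 0.5,
}


def total_loss(parts, weights=LOSS_WEIGHTS):
    """Weighted sum of the four loss parts, given as a dict of floats."""
    return sum(weights[name] * parts[name] for name in weights)
```

With this weighting, a localization error counts four times as much as a classification error of the same size, which fits a task where there is only one object class.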
YOLOv3 vs YOLOv2 vs YOLO-face
Fig. 2 - Examples of face detection results. Left: YOLOv2 results. Middle: YOLOv3 results. Right: YOLO-face results.
Conclusion
Compared with the other detection models, YOLO-face performs better and achieves high accuracy on faces at long distance.
Reference: Chen, W., Huang, H., Peng, S., et al. YOLO-face: a real-time face detector. Vis Comput (2020)
Thanks for sharing. Could you share source code or the weights of the model?