OpenFilter Beautified Datasets

Augmented Reality (AR) filters on selfies have become very popular on social media platforms for a variety of applications, including marketing, entertainment and aesthetics. Given the wide adoption of AR face filters and the importance of faces in our social structures and relations, there is growing interest in the scientific community in analyzing the impact of such filters from a psychological, artistic and sociological perspective. However, there are few quantitative analyses in this area, mainly due to a lack of publicly available datasets of facial images with applied AR filters. The proprietary, closed nature of most social media platforms does not allow users, scientists and practitioners to access the code and the details of the available AR face filters. Scraping faces from these platforms to collect data is ethically unacceptable and should, therefore, be avoided in research. We developed OpenFilter, a flexible framework to apply AR filters available on social media platforms to existing large collections of human faces. Here, we share FairBeauty and B-LFW, two beautified versions of the publicly available FairFace and LFW datasets, and we outline insights derived from the analysis of these beautified datasets.

OpenFilter pipeline


Most of the AR filters available on social media platforms can only be applied in real time to selfie images captured from the camera. Hence, it is challenging to carry out quantitative and systematic research on such filters. OpenFilter addresses this limitation by enabling the application of AR filters to publicly available datasets of faces.

More information is available in the GitHub folder for OpenFilter.

Beautified datasets

Using OpenFilter, we have created two beautified datasets, FairBeauty and B-LFW, both available online.


FairBeauty is a beautified version of the FairFace dataset, a dataset promoting algorithmic fairness in Computer Vision systems. We chose this dataset for its focus on diversity and because we aim to be representative of the population of Instagram users without biasing the results. Eight popular AR beauty filters are applied to equal portions of the original dataset.
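As an illustration of applying eight filters to equal portions of a dataset, the following minimal sketch shuffles a list of image paths and assigns filter indices in round-robin order. The function name `assign_filters`, the seed, and the toy file names are hypothetical. The README does not state how the portions were actually drawn, so this is an assumption rather than the procedure used to build FairBeauty.

```python
import random

NUM_FILTERS = 8  # the eight beauty filters, indexed 0..7 as in the figure caption


def assign_filters(image_paths, num_filters=NUM_FILTERS, seed=0):
    """Map each image path to a filter index so that portions are near-equal.

    Hypothetical helper: shuffles deterministically, then assigns round-robin.
    """
    paths = sorted(image_paths)          # fixed starting order for reproducibility
    random.Random(seed).shuffle(paths)   # seeded shuffle, independent of global state
    return {path: i % num_filters for i, path in enumerate(paths)}


# Toy file names standing in for dataset images (not real FairFace paths).
paths = [f"img_{i:04d}.jpg" for i in range(80)]
assignment = assign_filters(paths)
counts = [list(assignment.values()).count(k) for k in range(NUM_FILTERS)]
```

With 80 toy images, each filter index receives exactly 10 images. When the dataset size is not a multiple of eight, the portion sizes differ by at most one.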

Filters example

The image above is an example of the eight different beauty filters applied to the left-most image from the FairFace dataset. From left to right and top to bottom: filter 0 “pretty” by herusugiarta; filter 1 “hari beauty” by hariani; filter 2 “Just Baby” by blondinochkavika; filter 3 “Shiny Foxy”; filter 4 “Caramel Macchiato” and filter 5 “Cute baby face” by sasha_soul_art; filter 6 “Baby_cute_face_” by anya__ilicheva; filter 7 “big city life” by triutra.

The AR beauty filters detect the position of the faces in an original image and superimpose digital content to modify the original facial features. As these filters apply the same transformation to the facial features of all faces, we hypothesize that they homogenize facial aesthetics, making the beautified faces more similar to each other. To measure the homogenization of the filtered faces, we consider different face verification models, namely DeepFace, VGG-Face, Facenet, CurricularFace, MagFace and ElasticFace. The code for these experiments is available in the folders deepface_vggface_facenet, curricularface, elasticface and magface.
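One way to make the homogenization hypothesis concrete is to compare the average pairwise distance between face embeddings before and after filtering: a smaller average distance after filtering would indicate more similar faces. The sketch below assumes cosine distance and uses toy three-dimensional vectors in place of real model embeddings. The metric, the function names and the data are illustrative assumptions, not the measure implemented in the experiment folders.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_pairwise_distance(embeddings):
    """Average cosine distance over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Toy embeddings (assumed): the "beautified" vectors are pulled toward each other.
original = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
beautified = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]

spread_original = mean_pairwise_distance(original)
spread_beautified = mean_pairwise_distance(beautified)
```

In this toy case `spread_beautified` is smaller than `spread_original`, which is the pattern the hypothesis predicts. In practice, the embeddings would come from one of the face verification models listed above.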


B-LFW is a beautified version of the LFW (Labeled Faces in the Wild) dataset, a public benchmark dataset for face verification, designed for studying and evaluating unconstrained face recognition systems. We have beautified the version of LFW aligned at 112x112 pixels with the same eight popular Instagram beauty filters, applying different filters to different images of the same individual.
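The idea of giving different images of the same individual different filters can be sketched as follows. The example groups file names by identity and walks through a shuffled filter order for each person. It assumes file names of the form `<identity>_<number>.jpg`. The helper `assign_per_identity` and the toy names are hypothetical, and the actual assignment used for B-LFW may differ.

```python
import random
from collections import defaultdict


def assign_per_identity(image_names, num_filters=8, seed=0):
    """Assign filter indices so that images of one identity get distinct filters.

    Filters repeat only when an identity has more images than filters.
    """
    rng = random.Random(seed)
    by_identity = defaultdict(list)
    for name in sorted(image_names):
        identity = name.rsplit("_", 1)[0]  # strip the trailing "_<number>.jpg"
        by_identity[identity].append(name)

    assignment = {}
    for names in by_identity.values():
        order = list(range(num_filters))
        rng.shuffle(order)                 # fresh filter order per identity
        for i, name in enumerate(names):
            assignment[name] = order[i % num_filters]
    return assignment


# Toy file names (assumed naming scheme, not taken from the dataset).
names = ["Person_A_0001.jpg", "Person_A_0002.jpg", "Person_A_0003.jpg",
         "Person_B_0001.jpg"]
assignment = assign_per_identity(names)
filters_for_a = [assignment[n] for n in names[:3]]
```

Here the three images of `Person_A` receive three different filter indices.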

The analysis of the B-LFW dataset may lead to new insights into the impact of such filters on face recognition, particularly when no explicit occlusion is applied. We evaluate the face recognition performance of three state-of-the-art models (CurricularFace, ElasticFace and MagFace) in two settings: on LFW with a single beauty filter applied, once for each filter, and on the B-LFW dataset, in which different beauty filters are applied to different images of the same individual. The code for these experiments is available in the folders curricularface, elasticface and magface.
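To illustrate how a filter could affect face verification, the sketch below computes verification accuracy at a fixed distance threshold for a handful of image pairs. The distances, labels and threshold are made-up toy values, and the function name is hypothetical. The evaluation in the experiment folders may follow a different protocol, for example one that selects the threshold by cross-validation.

```python
def verification_accuracy(distances, same_identity, threshold):
    """Fraction of pairs classified correctly when 'distance < threshold' means 'same person'."""
    predictions = [d < threshold for d in distances]
    correct = sum(p == y for p, y in zip(predictions, same_identity))
    return correct / len(distances)


# Toy pair labels: True means both images show the same individual.
labels = [True, True, False, False]

# Toy embedding distances for the same four pairs, without and with filters (assumed values).
distances_original = [0.20, 0.30, 0.80, 0.90]
distances_filtered = [0.20, 0.55, 0.80, 0.45]

acc_original = verification_accuracy(distances_original, labels, threshold=0.5)
acc_filtered = verification_accuracy(distances_filtered, labels, threshold=0.5)
```

In this toy case the filtered pairs yield a lower accuracy because one genuine pair drifts apart and one impostor pair moves closer. This only illustrates the kind of effect the evaluation looks for and is not a reported result.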

License and attribution

The framework and the datasets are part of a scientific paper currently under review for the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks, under the title “OpenFilter: A Framework to Democratize Research Access to Social Media AR Filters”, by Piera Riccio, Bill Psomas, Francesco Galati, Francisco Escolano, Thomas Hofmann and Nuria Oliver.

The datasets are shared under the CC BY-NC 4.0 license. The code is shared under the GNU General Public License, version 2.