Published an article
https://ieeexplore.ieee.org/document/9656762
Pre-trained deep learning models underpin many public-facing applications, and their propensity to reproduce implicit racial and gender stereotypes is an increasing source of concern. The risk of large-scale, unfair outcomes resulting from their use thus raises the need for technical tools to test and audit these systems. In this work, a dataset of 10,000 portrait photographs was generated and classified, using CLIP (Contrastive Language–Image Pretraining), according to six pairs of opposing labels describing a subject’s gender, ethnicity, attractiveness, friendliness, wealth, and intelligence. Label correlation was analyzed, and significant associations, corresponding to common implicit stereotypes in culture and society, were found at the 99% confidence level. Notably, strong positive correlations were found between the labels Female and Attractive, Male and Rich, and White Person and Attractive. These results are used to highlight the risk of more innocuous labels being used as partial euphemisms for protected attributes. Moreover, some limitations of common definitions of algorithmic fairness as they apply to general-purpose, pre-trained systems are analyzed, and controlling for bias at the point of deployment of these systems, rather than during data collection and training, is put forward as a possible way to circumvent these limitations.
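To give a sense of the kind of audit described in the abstract, the sketch below runs CLIP as a zero-shot classifier over opposing label pairs and then tests each pair of binary attributes for association. The model checkpoint, prompt wording, and choice of a chi-square test are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch: zero-shot labelling of portraits with CLIP over opposing
# label pairs, then pairwise association tests on the resulting binary labels.
# Checkpoint, prompts, and test choice are assumptions for illustration only.
import itertools

import torch
from PIL import Image
from scipy.stats import chi2_contingency
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch32"  # assumed checkpoint
model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

# Six pairs of opposing labels; the prompt phrasing is hypothetical.
LABEL_PAIRS = {
    "gender": ("a photo of a man", "a photo of a woman"),
    "ethnicity": ("a photo of a white person", "a photo of a person of color"),
    "attractiveness": ("a photo of an attractive person",
                       "a photo of an unattractive person"),
    "friendliness": ("a photo of a friendly person",
                     "a photo of an unfriendly person"),
    "wealth": ("a photo of a rich person", "a photo of a poor person"),
    "intelligence": ("a photo of an intelligent person",
                     "a photo of an unintelligent person"),
}

@torch.no_grad()
def classify(image: Image.Image) -> dict[str, int]:
    """Assign 0 or 1 per attribute: 0 = first prompt wins, 1 = second."""
    labels = {}
    for attr, prompts in LABEL_PAIRS.items():
        inputs = processor(text=list(prompts), images=image,
                           return_tensors="pt", padding=True)
        logits = model(**inputs).logits_per_image  # shape (1, 2)
        labels[attr] = int(logits.softmax(dim=-1).argmax(dim=-1))
    return labels

def association_tests(rows: list[dict[str, int]], alpha: float = 0.01) -> None:
    """Chi-square test of independence for every pair of binary attributes."""
    attrs = list(LABEL_PAIRS)
    for a, b in itertools.combinations(attrs, 2):
        table = [[0, 0], [0, 0]]  # 2x2 contingency table
        for r in rows:
            table[r[a]][r[b]] += 1
        chi2, p, _, _ = chi2_contingency(table)
        if p < alpha:
            print(f"{a} vs {b}: chi2={chi2:.1f}, p={p:.2e} "
                  f"(significant at the 99% confidence level)")
```

Applied to a set of portrait images, `classify` produces one binary label per attribute per image, and `association_tests` flags attribute pairs whose labels co-occur more often than independence would predict, which is the kind of signal the paper reports for pairs such as Female/Attractive or Male/Rich.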