School of Human Sciences
Face and Voice Recognition Lab
Institute of Lifecourse Development
University of Greenwich
29 June 2023
University of Greenwich Face and Voice Recognition Lab: Volunteer Research Participant Pool.
How do your scores compare to those of our volunteer pool?
The information in this blog was correct on 6 July 2023
Thank you to members of our volunteer pool who contribute to our research. We have been asked many times by volunteers, “what is a good score on these tests?” and “am I a super-recogniser?” and we hope this blog will help to partly answer these questions. The first question is far easier to answer than the second.
Establishing the volunteer participant pool
The volunteer participant pool of the Face and Voice Recognition Lab of the University of Greenwich was established in 2015, when the Could you be a Super-Recogniser Test (see Davis, 2019), and the Glasgow Face Matching Test (Version 1) (GFMT; Burton et al., 2010) were first uploaded to our website. Participants could leave their e-mail address if interested in taking part in future research and if they provided consent for us to retain test scores in our database.
Later, the Cambridge Face Memory Test: Extended (CFMT+; Russell et al., 2009) was added to the test battery, followed by the Short-Term Face Memory Test 30-60 (STFMT3060), which at one time was described as the Adult Face Memory Test (Robertson et al., 2019), and then the Kent Face Matching Test (KFMT; Fysh & Bindemann, 2018) in 2021. In 2022, volunteers were also asked to complete the Cambridge Cars Memory Test (CCarMT) (Dennett et al., 2011), the Cambridge Female Face Memory Test (CFemFMT), and the Glasgow Face Matching Test: Version 2 (High 40) (GFMT2HI40).
We have additionally asked volunteers if we could retain scores on other tests (e.g., Voice Recognition Tests) for the PhD research of Ryan Jenkins. The voice tests are reported in a different blog.
Over a million participants from around the world completed the Could you be a Super-Recogniser Test within a few months of publication (7 million+ by 2023, see Appendix A). We soon found that those who left their contact details for future invites to research tended, as a group, to score far higher on our tests than would be expected of a representative sample randomly drawn from members of the public. There is clearly a self-selection bias at work.
It is most likely that super recognisers (even though rare in the population) are far more likely than non-super-recognisers to volunteer to take part in our research.
For this reason, please do not be deterred from contributing to our research if your test results appear disappointing in comparison to some others. If we invite you to contribute, we are always very grateful for your contribution. We always need volunteers to contribute who do not score in the super-recogniser range. We will never invite someone who is ineligible for a specific project. We do not want to waste anyone’s time.
But who contributes to our research?
At the time of writing (6 July 2023), we hold data from 53,066 participants in the volunteer pool. At one time there were over 100,000, but when GDPR was introduced in 2018, all were e-mailed to check they wished to remain on the database. If no reply was received, their record was deleted. The pool was reduced to about 38,000 and it has been growing ever since.
Not all volunteers have, however, provided a full set of demographic data, partly because when the database was created, we did not always ask if we could indefinitely store these data. The reliability of these data is important, as a large body of research suggests that demographic factors such as ethnicity, age, and gender can impact face recognition ability, and scores on different face recognition tests. Most people are aware of the cross-ethnicity effect (e.g., see Meissner & Brigham, 2001 for a review), in that people tend to be better at recognising faces from their own ethnicity than other ethnicities. Some research has found similar cross-age (e.g., Anastasi & Rhodes, 2005), and cross-gender effects (e.g., Herlitz & Lovén, 2013). It is possible these effects are driven by levels of contact with those from specific out-groups (Meissner & Brigham, 2001). For instance, someone whose ethnicity is East Asian, but who lives in Europe, may be more likely to score higher on a test comprising White faces only, than a gender- and age-matched individual who lives in East Asia.
With collaborators we have investigated the cross-age (baby faces) (Belanova et al., 2018), and cross-ethnicity effects (Correll et al., 2020; Robertson et al., 2020) in super-recognisers. In general, super-recognisers were significantly superior on the own-group and other-group tests in comparison to those of typical ability from the same demographic group as the super-recognisers.
Table 1 provides an indication of the countries of residence of most members of the volunteer pool. Not surprisingly as the University of Greenwich is in London, over 20% of the volunteer pool come from the UK. The proportion of the pool located in other countries is probably dependent on the impact of media articles about super-recognition. Many media articles contain a link to the Could you be a Super-Recogniser Test.
Table 1: The 30 most common countries represented by the volunteer pool
What has been lost is that, prior to the 2018 clear-out following the introduction of GDPR, the fourth highest proportion came from China, which included Hong Kong. Hardly any members of the volunteer pool claim to be from China now, possibly a consequence of being unable to access western websites. At the opposite extreme, when the pool was first established, 17 claimed Antarctica as their temporary home. One e-mailed to say they were working on a scientific base at the South Pole, while another participant once e-mailed from the ice cap in the far north. They took the tests while sheltering from a storm. When e-mailing, they commented they could hear a polar bear outside.
More than 60% of respondents describe themselves as female. We also find that females are proportionally more likely to respond to e-mail invites to research. Mostly the gender imbalance is not a problem, but when gender has been a key variable in some projects, we sometimes struggled to recruit enough males. Please help guys!!
The mean age of the pool, at about 40 years, is far higher than that of the typical student participants recruited to a high proportion of psychology research projects (Figure 1). This is probably one of the strongest features of the pool in terms of representativeness. As a group, the pool is also aging, which is not surprising as new members constitute only a small proportion of the database. Face recognition ability appears to peak in the early 30s, followed by a slow decline. We hope to publish recent data on this soon.
Figure 1: Age distribution of volunteer pool
More worrying to us, as it demonstrates a lack of representativeness, is that the vast majority of members of the pool describe their ethnicity as White (82.0%). Although this figure is roughly the same as the proportion of White people in England and Wales (86%; Gov.uk, 2020), our participants come from around the world. This will limit some of the conclusions we can draw from our findings, as the proportion of White people in the world population is substantially lower than this.
One reason may be that the tests described in this blog only contain White faces. We may be inadvertently deterring people from other ethnicities from contributing. We do employ tests containing faces of other ethnicities in our research and in our tests with police forces and businesses for job deployment purposes (see examples here and here). However, we restrict access to these tests, so as to ensure that no one taking them has an advantage from having practiced them.
Why do these tests contain White faces only?
The tests we create contain facial images provided by volunteers. White participants are far more likely to contribute. This is perhaps understandable given the potential bias from the dominance of White faces in our tests and regular negative media reports about errors associated with computerised (artificial intelligence) face technology. Our research is sometimes linked to this issue.
It has also come to light that some computerised face recognition research (not our research) has used facial stimuli of people who never provided consent for their images to be used in this way. For our research, we want to assure participants that everyone depicted in our tests has given their informed consent for image use. So much so, that we prefer participants who have taken our tests to provide facial images, as they will be the best informed as to the likely use of their images.
Nevertheless, the lack of suitable images has meant that we have not, so far, been able to create freely available tests on the internet containing non-White faces. We are aiming to do so with the London Face Memory Test, which should roughly represent the diverse population of London, one of the most multi-cultural cities in the world. However, we are still short of images of people from some ethnicities. You are welcome to help us construct the London Face Memory Test by uploading images of yourself HERE, using an updated, quick and easy photo upload system. You will receive a £5 Amazon voucher for 8 images. Instructions are in the link.
Alternatively, for another project, we are keen to develop tests measuring how accurate we are at recognising faces when photos of those faces were taken at different stages in life. To help us construct the Age Distanced Face Test, please donate 8-12 photos of yourself, taken in different years, HERE. In return, we will send a £5 Amazon voucher.
Please do not donate photos to both projects. And the images must be of you.
Test results (see Appendix B for test descriptions, Appendix C for links)
We often receive emails from participants wondering how their scores compare to those of other participants. The distributions of the test scores for each of the main tests taken by our participant pool are shown in Figures 2-9. You should be able to compare your own scores with these to get a good idea of how you may rate your face recognition skills.
Far more reliably, on some of the tests, you can also compare your scores with those expected from typical members of the public, as we have superimposed lines (in red) representing the approximate mean score achieved by what we believe is probably the most representative sample drawn from members of the public on some of the tests.
We have also included three lines representing the values two standard deviations (SD) above, one SD above, and one SD below the estimated population mean from those samples. A score at least 2 SD above the mean has often been the criterion for super-recognition on a single test, as this is likely to be achieved by about 2% of the population. Approximately 68% of the population would be expected to generate a score that falls within the +1 SD and -1 SD boundaries, whereas about 14% would be expected to score in the +1 SD to +2 SD range.
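The percentages quoted above can be reproduced from the normal distribution. The short Python sketch below is an illustration only: it assumes test scores in the general population are approximately normally distributed, which is a simplification (scores on tests with a maximum possible score can be skewed).

```python
from statistics import NormalDist

# Standard normal curve: mean 0 and SD 1, so all values are in SD units.
# Assumption: population test scores are roughly normally distributed.
z = NormalDist()

above_2sd = 1 - z.cdf(2)                 # at least 2 SD above the mean
within_1sd = z.cdf(1) - z.cdf(-1)        # between -1 SD and +1 SD
between_1_and_2sd = z.cdf(2) - z.cdf(1)  # between +1 SD and +2 SD

print(f"At least 2 SD above the mean: {above_2sd:.1%}")    # 2.3%
print(f"Within 1 SD of the mean: {within_1sd:.1%}")        # 68.3%
print(f"Between +1 SD and +2 SD: {between_1_and_2sd:.1%}") # 13.6%
```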
If the > 2 SD standard were to be applied to two tests, fewer participants would be expected to achieve these criteria. Even fewer will achieve the criteria if three tests are used, and so on. Therefore, if multiple tests are used, some researchers may employ somewhat lower thresholds on each individual test, in order to maintain a super-recognition eligibility level of approximately 2%.
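As a rough illustration of this point, the sketch below works out how the per-test threshold would have to drop to keep about 2% of people eligible. It rests on two loudly stated assumptions that are not part of our procedures: scores are normally distributed, and scores on different tests are statistically independent. Real face recognition tests are positively correlated, so the true figures will lie somewhere between the independent case shown here and the single-test case. The function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def joint_rate_at_2sd(n_tests: int) -> float:
    """Proportion expected to exceed +2 SD on every one of n_tests.

    Assumption: scores on the tests are independent of each other.
    """
    single_test_rate = 1 - STANDARD_NORMAL.cdf(2)
    return single_test_rate ** n_tests


def per_test_cutoff_sd(n_tests: int, overall_rate: float = 0.02) -> float:
    """Per-test cut-off (in SD units) that keeps the overall pass rate
    at overall_rate when all n_tests must be passed.

    Assumption: scores on the tests are independent of each other.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    per_test_rate = overall_rate ** (1 / n_tests)
    return STANDARD_NORMAL.inv_cdf(1 - per_test_rate)


for n in (1, 2, 3):
    print(
        f"{n} test(s): {joint_rate_at_2sd(n):.4%} exceed +2 SD on all; "
        f"cut-off for 2% overall = +{per_test_cutoff_sd(n):.2f} SD"
    )
```

Under these assumptions the required per-test cut-off falls quickly as tests are added, which is why a strict 2 SD rule on every test would exclude almost everyone.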
Could you be a Super-Recogniser Test (CYBSRT)
Figure 2: Could you be a Super-Recogniser Test scores (2 SD = 13.44)
Readers might notice that we only report data from just over 21,000 participants when it is probable all volunteers will have taken this test. The reason is that the test is anonymous to take, and we only have data if participants manually enter their score into the system when they take the next three tests. About 75% do this. However, when we first created the volunteer pool, this manual entry system was not set up.
Cambridge (Male) Face Memory Test: Extended (CFMT+)
Figure 3: Cambridge (Male) Face Memory Test: Extended (CFMT+) scores (2 SD = 95.3)
Note: CFMT+ “population norms” (n = 254) were reported in Bobak et al. (2016)
Glasgow Face Matching Test (GFMT)
Figure 4: Glasgow Face Matching Test (GFMT) scores (2 SD = 40.3)
Note: GFMT “population norms” (n = 194) were reported in Burton et al. (2010)
Kent Face Matching Test (KFMT)
Figure 5: Kent Face Matching Test (KFMT) scores (2 SD = 33.4)
Note: KFMT “population norms” were compiled from three articles (Fysh, 2018; Fysh & Bindemann, 2018; Gentry & Bindemann, 2019).
Short-Term Face Memory Test 30-60 (STFMT3060)
Figure 6: Short-Term Face Memory Test 30-60 (STFMT3060) scores (2 SD = 55.6)
The following three tests are new to the battery, and no projects have yet been published with unambiguously representative data from which population norms could be derived.
Cambridge Cars Memory Test (CCarMT)
Figure 7: Cambridge Cars Memory Test (CCarMT) scores
Glasgow Face Matching Test: Version 2 (High 40) (GFMT2HI40)
Figure 8: Glasgow Face Matching Test: Version 2 (High 40) (GFMT2HI40) scores
Cambridge (Female) Face Memory Test (CFemFMT)
Figure 9: Cambridge (Female) Face Memory Test (CFemFMT) scores
Defining super-recognisers for research projects
We are certain that many people reading this blog will be hoping to find out if they can be defined as a super-recogniser based on their scores on these tests. Our answer is that if your scores are in the highest range on all the face identity processing tests, then there is an extremely good chance that you may be a super-recogniser. However, there is no agreed definition for super-recognition in the academic literature and recently, there has been a tendency for researchers to create their own tests in order to measure the skill. No test will ever perfectly measure a cognitive skill such as face recognition (a colloquial term is that all tests will contain a “margin of error”), and participants, including super-recognisers, tend to generate a varied range of scores across a battery of tests.
Depending on the specific research project, we have tended to define someone as a super-recogniser if they have achieved scores on 2-3 tests that only approximately 2% of the population are capable of. However, we normally use different minimum standards for super-recognition in research and for testing individuals applying for jobs as a super-recogniser.
Increasing the number of tests used in a face recognition test battery increases the accuracy of face recognition ability estimates. We are able to use more tests for job applicants, mostly because more participants are willing to take multiple tests to secure those jobs. Quite naturally, volunteers from our database are far less motivated to contribute to research as there is rarely a job on offer.
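One standard way to illustrate why more tests give a more accurate estimate is the Spearman-Brown formula from psychometrics, which predicts the reliability of a combined score from the reliability of a single test. This is a textbook approximation and not a description of our own analyses. It assumes the tests are "parallel" (equally reliable and measuring the same ability), and the single-test reliability of 0.7 used below is a hypothetical value chosen for the example.

```python
def spearman_brown(single_test_reliability: float, n_tests: int) -> float:
    """Predicted reliability of a composite of n_tests parallel tests.

    Assumption: all tests are equally reliable and measure the same ability.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    r = single_test_reliability
    return n_tests * r / (1 + (n_tests - 1) * r)


# Hypothetical single-test reliability of 0.7, for illustration only.
for n in (1, 2, 3, 4):
    print(f"{n} test(s): predicted reliability = {spearman_brown(0.7, n):.3f}")
```

The predicted gain from each additional test gets smaller, which is one reason a battery of a few well-designed tests is often a reasonable compromise.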
We are always very grateful when so many participants do contribute to our research, but as is clear from Figures 2-9, those participants tend to be the highest scorers. For sufficient statistical power and to make generalisations about the research results, we also need as many volunteers as possible from lower down the face recognition ability spectrum to take part. One of the limitations of our research is that far fewer lower-range scorers have taken all of the tests, meaning we are restricted to accessing scores from fewer tests than is ideal, when we classify participants as super-recognisers and/or typical-ability controls.
However, we are currently reviewing our classification procedures. We are aiming to use scores on all the tests listed in this blog (except the Could you be a Super-Recogniser Test and the Cambridge Cars Memory Test) to help us with that face recognition ability classification process. As noted, an increase in the number of the tests taken is associated with a more accurate estimate of true face recognition ability, but only as long as those tests were taken in reliable circumstances that provide equal opportunities for all to achieve their highest scores. We will never use some published tests as we know some participants would complain that they are unfairly designed.
According to participants from the volunteer pool, key features of a reliable test are:
Instructions (implicit and explicit) should be bias free
There should be practice trials as they reduce stress
There should be no distractions or internet problems
Each trial should be displayed with a number, and some indication of progress should be provided, to reduce distractions from wondering how much longer the test will take
Tests should not contain too many trials as this induces boredom or distractions
If a test has large numbers of trials, there should be opportunities to take a break
Stimuli should not be forcibly displayed for too long on the screen as this induces boredom
On the other hand, stimuli should be viewable for a sufficient amount of time for everyone to learn those faces if a memory test and/or to make a decision
Stimuli should be scrutinised, and pilot tested to ensure they do not contain inappropriate but salient cues that bias responses. Specific faces or specific trials should not stand out from the others in any way. These trials may encourage second guessing and induce doubt
Many participants seek to maximise performances by attempting to deduce the researcher’s purpose in including each stimulus in a test. For this reason, tests should not contain trick trials. As an extreme example, identical twins wearing similar clothes might be included by researchers to encourage incorrect “same person” responses. However, participants might quickly deduce that this is a trick and that they are twins by the unusual symmetry and similarity of appearance. Similar but opposite effects may occur if a pair of images are very different in appearance (e.g., one has a very different hairstyle or facial expression from the second), or if a larger proportion of trials than normal are perceived as harder. Participants may then claim to distrust the researcher’s motivations and complain of not being able to self-monitor their performance. This decreases motivation and induces guessing, even on the easier trials
Participants should be fully debriefed after taking a test and provided with a score. For instance, if a matching test contains disproportionately large numbers of “same face” or “different face” trials, participants should be informed that this pattern may not occur in subsequent tests
Participants should not be tired
Screen size should be sufficient for all and ideally of a similar size to the system used most regularly at work or home
Note that some of these recommendations directly contradict others, so test construction may need to be a compromise. However, the issues to avoid described above become more likely if everyone takes the full battery of six tests. Furthermore, even if no problems are reported, participants sometimes misunderstand instructions, or attempt to use inappropriate strategies to generate high scores. Although these have no impact on face recognition ability, they may impact performances on face recognition tests.
Therefore, for our research classification process, we are intending to exclude anomalous test performances, particularly if the volunteer reports a problem at the time, which is easy to do at the end of our tests. However, if a volunteer produces scores in the super-recogniser range on some tests, and scores in the typical ability control range on others (without an explanation), we do not believe it possible to classify their ability without asking them to retake at least some of the tests, or to take new tests.
If you have been recently invited to retake tests, or, indeed, to take the new tests, you will be contributing to the research into how best to classify super-recognition ability. We hope we will have resolved our classification procedures in time to update this blog before the end of 2023.
Defining super-recognisers for police and business projects
With police and business projects we use additional tests with different designs that contain faces of different ethnicities and ages. We believe that to be offered employment drawing on their skills, super-recognisers need to be able to generate exceptional scores on a wide range of different tests. The tests we use measure four main components (see Davis, 2019; Davis, 2020). Three of these components are short-term face memory (as also measured using the CFMT+ and STFMT3060), simultaneous face matching (as also measured using the GFMT and KFMT), and spotting faces in a crowd (see Davis et al., 2018, for a description of an early version of this test). The fourth component, however, long-term face memory ability, may best represent how super-recognition is perceived in the minds of super-recognisers themselves. Definitions of super-recognition in the media and in research articles often refer to super-recognisers’ superior ability to recognise people spontaneously and reliably after delays of months or even decades. In that time the appearance of those people will have changed.
On this basis, it would be hard to argue against the view that an accurate definition of super-recognition should be that “super-recognisers are individuals who possess extraordinarily accurate perceptual and long-term face identity processing skills” and that “highly superior long-term face memory is the hallmark of super-recognition”.
We have published one two-experiment paper examining face memory retention, for up to two months only (Davis et al., 2020). A substantial proportion of participants classified as super-recognisers based on their CFMT+ and GFMT scores only, performed quite poorly on these tests of long-term face memory.
As noted, there may be many reasons for poor performance on any face recognition test that bear no relationship with true ability (e.g., distractions, illness, lack of sleep, internet disruptions). These factors may have a greater impact as the gap between learning and test phases widens. None of the tests described in this blog measure this skill. Therefore, we are naturally wary of certifying someone as a super-recogniser unless they have completed the full set of tests we use for police, some of which should be taken in invigilated examination conditions, to ensure the integrity of the process.
To the best of our knowledge, no other research group or police organisation in the world includes face memory tests for super-recognisers that measure memory for faces for more than a few minutes. It is our contention that without tests of this type, it is impossible to describe someone as a super-recogniser.
Nevertheless, we have also compiled a battery of rapid-response simultaneous face matching tests for job roles in organisations that only require superior face comparison/matching skills. Memory for faces is not required. Someone scoring in the top 2% of the population on tests measuring this skill might best be described as a “super-matcher”.
The method by which we decide whether someone is a super-recogniser or not when using multiple tests is provided in Appendix B.
Can members of the public take the face recognition tests we use with police?
It is possible for members of the public to take the full set of tests (see link below).
The University of Greenwich has a research consultancy contract with Super-Recognisers International (https://superrecognisersinternational.com/), who can arrange for administration of the tests. Those who achieve our super-recogniser criteria across all four components (scores expected by approximately the top 2% of the population), or our super-matcher criteria (scores expected by approximately the top 2% of the population on the simultaneous face matching tests only) can additionally become a licensee of the Association of Super-Recognisers (https://www.associationofsuperrecognisers.org/). Certificates are issued for those achieving standards.
However, Super-Recognisers International will charge for this service (volunteer pool participants = £30).
Why do Super-Recognisers International charge for this service?
This charge funds the consultancy contract with the University of Greenwich and pays for the University of Greenwich student staff who set up and monitor the tests and analyse the data. They also deal with the large numbers of e-mails we receive every week. It would be impossible to provide a free test service, as, for instance, e-mail security systems regularly treat our invite e-mails containing links to the second part of the Long-Term Face Memory Test as junk. This is more common if participants have set up highly secure systems, and do not follow our instructions to load our e-mail address into their contact lists. If participants do not check their junk mail, the delay between the two parts could be substantial. We then need to organise the opportunity to retake these tests, or alternatives of the same type. We cannot send e-mail reminders, as each reminder will be slightly more likely to end up as junk.
We have set up reliable systems with police to stop this from happening. It is much harder to operationalise, when participants may be from anywhere around the world.
There are two stages.
1. Online tests: The link to the online tests and up to date information about the costs of completing these tests can be found HERE. These measure Short-Term Face Memory, Simultaneous Face Matching, and Long-Term Face Memory. Some of the tests reported in this blog are included in the battery for research purposes. They do not need to be retaken. Previous scores can be manually entered as long as participants use the same e-mail address as on the volunteer pool database. If someone chooses to take the tests, their scores will be sent by the University of Greenwich to Super-Recognisers International. The scores will be released only if that participant has paid the required funds. The University of Greenwich is not involved in any payment or any agreements between the participant and Super-Recognisers International.
2. The examination-administered invigilated tests: These tests are normally incorporated into online or live training courses that provide a wider insight into legal and technical issues associated with jobs in which superior face recognition skills are important. The tests include a Spotting Faces in a Crowd Test, and because of the invigilated procedures, we also request participants allow us to confirm the possession of superior Short-Term Face Memory and Simultaneous Face Matching skills. Some licensees have secured jobs based on their test results, and therefore all involved must be assured that high scores have been achieved in reliable conditions. There are, however, substantial costs involved in administering tests that are conducted in examination conditions with an invigilator either present in the room or remotely monitoring progress. In advance, we also need to ensure that the videos can be downloaded and played on participants’ laptops.
Because of the costs involved, we recommend that participants are very confident that they do possess superior skills before taking them. Those whose scores are in the highest range in comparison to others as depicted on the histograms in Figures 2 to 9 will be the most likely to achieve the required standards. However, it cannot be guaranteed that even the highest scorers on the tests available to the public will pass the final examination phase. Indeed, about 10% of those who take the examinations fail to achieve criteria, even though their scores will be higher than the vast majority of the population (we always offer one opportunity to retake the exams, if participants report problems), while another 10%-20% achieve the status of super-matcher. The University of Greenwich is unable to provide predictions as to how anybody will perform in future tests based on their past performances.
A paper describing the results of this collaboration is currently being prepared for publication. It has taken longer than expected as the lab was exceptionally busy with police projects in 2021 and 2022. We hope it will be published in 2023.
Ethics associated with the volunteer research participant pool
The research associated with creating the Greenwich Face and Voice Recognition Lab research participant pool database was approved by the University of Greenwich Research Ethics Committee in 2015. It has been updated several times since. All participants give their consent for their data to be stored. In a second repository the participants' email addresses are separately stored so that they can be invited to participate in future research. When the EU General Data Protection Regulation (GDPR) went into effect in 2018, all participants were reminded by email that we had saved their data. If there was no response, the corresponding data was deleted. We have been conducting a similar exercise in 2023, and we expect the database of e-mail addresses to be reduced to about 30,000. Those participants who have contributed to the tests since 2018 immediately receive information about GDPR and how to withdraw their consent to receive e-mail invites at any time.
More information about our ethical and data protection procedures can be found here.
References
Anastasi, J. S., & Rhodes, M. G. (2005). An own-age bias in face recognition for children and older adults. Psychonomic Bulletin & Review, 12, 1043-1047. https://doi.org/10.3758/BF03206441
Arrington, M., Elbich, D., Dai, J., Duchaine, B., & Scherf, K. S. (2022). Introducing the female Cambridge face memory test – long form (F-CFMT+). Behavior Research Methods, 54, 3071–3084. https://doi.org/10.3758/s13428-022-01805-8
Belanova, E., Davis, J. P., & Thompson, T. (2018). Cognitive and neural markers of super-recognisers' face processing superiority and enhanced cross-age effect. Cortex, 98, 91-101. https://doi.org/10.1016/j.cortex.2018.07.008 (Download pre-print here: https://bit.ly/bdtb2018)
Bobak, A. K., Pampoulov, P., & Bate, S. (2016). Detecting superior face recognition skills in a large sample of young British adults. Frontiers in Psychology, 7(1378). https://doi.org/10.3389/fpsyg.2016.01378
Burton, A. M., White, D., & McNeill, A. (2010). The Glasgow face matching test. Behavior Research Methods, 42(1), 286-291. https://doi.org/10.3758/BRM.42.1.286
Correll, J., Ma, D. S., & Davis, J. P. (2020). Perceptual tuning through contact? Contact interacts with perceptual (not memory-based) face-processing ability to predict cross-race recognition. Journal of Experimental Social Psychology, 92, 104058. https://doi.org/10.1016/j.jesp.2020.104058
Davis, J. P. (2020). CCTV and the super-recognisers. In C. Stott, B. Bradford, M. Radburn, and L. Savigar-Shaw (Eds.), Making an Impact on Policing and Crime: Psychological Research, Policy and Practice (pp 34-67). London: Routledge. ISBN 9780815353577. https://doi.org/10.4324/9780429326592 (Download free pre-print here: https://bit.ly/34Phwjm)
Davis, J. P., Bretfelean, D., Belanova, E., & Thompson, T. (2020). Super-recognisers: face recognition performance after variable delay intervals. Applied Cognitive Psychology, 34(6), 1350-1368. https://doi.org/10.1002/acp.3712 (Download free pre-print here: https://bit.ly/3slkg0m)
Davis, J. P., Treml, F., Forrest, C., & Jansari, A. (2018). Identification from CCTV: Assessing police super recognisers ability to spot faces in a crowd and susceptibility to change blindness. Applied Cognitive Psychology, 32(3), 337-353. https://doi.org/10.1002/acp.3405 (Download pre-print here: https://bit.ly/3dtfj2018)
Dennett, H. W., McKone, E., Tavashmi, R., Hall, A., Pidcock, M., Edwards, M., & Duchaine, B. (2011). The Cambridge Car Memory Test: A task matched in format to the Cambridge Face Memory Test, with norms, reliability, sex differences, dissociations from face memory, and expertise effects. Behavior Research Methods, 44(2), 587–605. https://doi.org/10.3758/s13428-011-0160-2
Fysh, M. C. (2018). Individual differences in the detection, matching and memory of faces. Cognitive Research: Principles and Implications, 3(20), 1-12. https://doi.org/10.1186/s41235-018-0111-x
Fysh, M. C., & Bindemann, M. (2018). The Kent face matching test. British Journal of Psychology, 109(2), 219-231. https://doi.org/10.1111/bjop.12260
Gentry, N. W., & Bindemann, M. (2019). Examples improve facial identity comparison. Journal of Applied Research in Memory and Cognition, 8(3), 376-385. https://doi.org/10.1016/j.jarmac.2019.06.002
Gov.uk (2020). Population of England and Wales. Downloaded 20 November 2021 from https://www.ethnicity-facts-figures.service.gov.uk/uk-population-by-ethnicity/national-and-regional-populations/population-of-england-and-wales/latest
Herlitz, A., & Lovén, J. (2013). Sex differences and the own-gender bias in face recognition: A meta-analytic review. Visual Cognition, 21(9-10), 1306-1336. https://doi.org/10.1080/13506285.2013.823140
Meissner, C. A., & Brigham, J. C. (2001). Thirty years of investigating the own-race bias in memory for faces: A meta-analytic review. Psychology, Public Policy, and Law, 7(1), 3-35. https://doi.org/10.1037/1076-8971.7.1.3
Robertson, D., Black, J., Chamberlain, B., Megreya, A. M., & Davis, J. P. (2020). Super-recognisers show an advantage for other race face identification. Applied Cognitive Psychology, 34(1), 205-216. https://doi.org/10.1002/acp.3608 (Download free pre-print here: https://bit.ly/rbcmd2020)
Russell, R., Duchaine, B., & Nakayama, K. (2009). Super-recognizers: People with extraordinary face recognition ability. Psychonomic Bulletin & Review, 16(2), 252–257. https://doi.org/10.3758/PBR.16.2.252
White, D., Guilbert, D., Varela, V. P. L., Jenkins, R., & Burton, A. M. (2022). GFMT2: A psychometric measure of face matching ability. Behavior Research Methods, 54, 252–260. https://doi.org/10.3758/s13428-021-01638-x
Appendix A: Tests described in this blog
Could You be a Super-Recogniser Test (CYBSRT). This test is mostly provided for fun, as no 14-trial test can reliably measure the entire spectrum of human face recognition ability. Nevertheless, moderate positive correlations are normally found between scores on this test and other short-term face memory tests, so it can be used as a reasonable predictor of whether someone could possess good, typical-range, or poor face recognition ability. We mostly include it at the start of police projects, as it provides an excellent introduction to what to expect, and participants are more practiced and reportedly less anxious when they take the more reliable tests later.
It is almost certain that someone scoring below 10 out of 14 is not a super-recogniser. However, a high proportion of participants who take the test use a mobile phone despite our advice not to. As the faces in the arrays may be too small on some phones to view key facial features properly, we have retained a message on completion stating, “if you scored 10 or above, you could be a super-recogniser”. Every other test in our battery requires participants to select an option stating that they are not using a mobile phone, as we want to discourage their use. Compared to laptops or PCs, mobile phone use is reliably associated with lower scores. If you did use a mobile phone when taking our tests, you are welcome to take all of them again. All we ask is that you please respond when prompted that this will be your second (or later) attempt, as you used a mobile phone the first time(s).
Cambridge Face Memory Test: Extended (CFMT+) (Russell et al., 2009). This 102-trial standardised short-term face memory test is probably the most commonly used test of face recognition worldwide, although it was not originally designed to measure super-recognition. Because of anomalies, such as the faces mainly being depicted with no hair and the test having been available on the internet for participants to practice over a number of years, it would be unwise to define anyone as a police super-recogniser based on the results of this test alone. The mean score on this test in most research is about 70 out of 102 (SD = 10). Scores of 90-95 out of 102 have been used to diagnose super-recognition in previous research, as they represent 2 SD above control means, a score likely to be achieved by about 2% of the population.
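As a rough check on the figures quoted above, the short sketch below works out the score that sits 2 SD above the mean and the share of people expected to reach it. It assumes that scores are approximately normally distributed; that assumption is made here purely for illustration and is not a claim from the published research.

```python
from statistics import NormalDist

# Mean and SD as quoted in the text above.
MEAN_SCORE = 70
SD = 10

# Score that sits 2 standard deviations above the mean.
threshold = MEAN_SCORE + 2 * SD

# Share of a normally distributed population expected to reach that score.
# The normal model is an illustrative assumption, not a finding.
share_above = 1 - NormalDist(MEAN_SCORE, SD).cdf(threshold)

print(f"Threshold: {threshold} out of 102")
print(f"Expected share at or above threshold: {share_above:.1%}")
```

Under these assumptions the threshold is 90 and the expected share is a little over 2%, in line with the "about 2%" figure given above.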
Glasgow Face Matching Test (GFMT) (Burton et al., 2010). This 40-trial test measures the ability to distinguish between two highly similar appearing white-ethnic facial images. It does not rely on memory. Participants decide whether 40 pairs of high-quality facial photographs depict the same person or not. Half of the trials are ‘matched’ (i.e., the same person is depicted in the pair), half are mismatch trials. Participants are warned in advance of the randomly ordered but equal match-to-mismatch trial ratio. This test may be unsuitable for classifying someone as a super-recogniser, as many participants score 100%. On the other hand, if a participant scores well under 40, they are very unlikely to be a super-recogniser.
Short-Term Face Memory Test 30-60 (STFMT3060). This 60-trial test measures short-term memory for unfamiliar faces of black and white ethnicity. In the learning phase of this test, 30 male faces are sequentially presented in identical purple sweatshirts for 10 sec. In the test phase, new photos of the 30 ‘old’ faces are randomly intermixed with 30 ‘new’ faces, wearing a variety of different sweatshirts. Participants respond as to whether faces are ‘old’ or ‘new’.
The Kent Face Matching Test (KFMT) (Fysh & Bindemann, 2018). This 40-trial test measures the ability to distinguish between two highly similar appearing white-ethnic facial images. It does not rely on memory. Participants decide whether 40 pairs of high-quality facial photographs depict the same person or not. Half of the trials are ‘matched’ (i.e., the same person is depicted in the pair), half are mismatch trials. Participants are warned in advance of the randomly ordered but equal match-to-mismatch trial ratio.
Cambridge Cars Memory Test (CCarMT) (Dennett et al., 2011). This 72-trial standardised short-term memory test has an identical structure to the short version of the Cambridge Face Memory Test. It allows us to extract cars from face scores to generate an estimate of Non-Face Object Memory Ability. However, the ability to recognise cars is partly based on exposure to and interest in cars. We ask a question about this, to ensure we can interpret the results appropriately. An aim for 2022-2023 is to make available a series of three very different object memory tests. We think it unlikely anyone will be an expert on all four tests (including cars).
Cambridge (Female) Face Memory Test (CFemFMT) (Arrington et al., 2021). This 102-trial test was designed to act as a female equivalent for the Cambridge (Male) Face Memory Test: Extended (CFMT+). The authors demonstrated that mean scores on this test are approximately 13 out of 102 higher than mean scores on the male version of the test.
Glasgow Face Matching Test: Version 2 (High 40) (GFMT2HI40) (White et al., 2022). This 40-trial test is an update of Version 1 of the Glasgow Face Matching Test, using images taken from the same database. The authors produced different versions of the test to measure participants of different abilities. We use the High 40 version, which was designed for use with participants possessing superior ability.
Appendix B: The reliability of first attempts on a test and calculating z-scores for workplace projects
We always prioritise first attempt scores on a test when deciding if a participant has achieved our super-recognition standards. Everyone takes the test under more or less the same conditions the first time, meaning these scores will normally be the most reliable. On their first attempt, participants do not really know what to expect, and they will all be unfamiliar with all the faces. Nearly everyone improves on their second attempt on the same test, and yet their face recognition ability has not improved. Scores improve because participants are more familiar with the test design and requirements. Super-recognisers may gain an additional advantage, as they are more likely to recognise the faces. For these reasons, we normally only retain second (or subsequent) attempt scores if someone reports a problem with their first attempt. In our workplace testing regimes, we might ask them to wait several months before making that second attempt, so they are less influenced by the first.
When we calculate minimum criteria for classification as a super-recogniser for police and business projects, we amalgamate the first attempt scores of at least 100 super-recognisers who have piloted that test in the past. See Davis (2019) for an explanation of how we decide if someone is eligible to be a super-recogniser pilot tester.
The mean score achieved by the group of super-recogniser pilot testers is our standard for that test.
We convert this mean score by super-recognisers into a z-score of 0. Individual participant scores (e.g., x) are also converted into z-scores using the equation z = (x - Mean)/Standard Deviation.
This means that someone whose score on the test is exactly at the level of 1 standard deviation above the mean will be given the z-score of 1; someone scoring 1 standard deviation below the mean is given a z-score of -1.
Super-recognisers, like everyone, sometimes misunderstand instructions, press the wrong response, are tired, or get distracted, and so there is often substantial variability in first attempt scores.
For each of our four key categories (short-term face memory, simultaneous face matching, long-term face memory, spotting faces in a crowd), we calculate a mean z-score for each participant. We then amalgamate the z-scores from the four categories into an overall mean z-score, which determines whether someone is a super-recogniser or not. The final super-recognition threshold is always slightly below z-score = 0, as approximately 50% of super-recognisers will score above zero and 50% below. The exact value normally depends on the number of tests and the policies of the client.
Other categories may also be employed (e.g., confidence, response times).
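The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only and not the lab's actual procedure or software: the test names, the pilot-group means and standard deviations, the participant's scores and the threshold value are all hypothetical numbers invented for the example, and only two categories are shown to keep it short.

```python
from statistics import mean

def z_score(x, ref_mean, ref_sd):
    """z = (x - Mean) / Standard Deviation, relative to the reference group."""
    return (x - ref_mean) / ref_sd

def overall_z(scores, norms, categories):
    """Average z-scores within each category, then average the category means."""
    category_means = []
    for tests in categories.values():
        zs = [z_score(scores[t], *norms[t]) for t in tests]
        category_means.append(mean(zs))
    return mean(category_means)

# Hypothetical (mean, SD) of first attempt scores by super-recogniser pilot testers.
norms = {
    "memory_a": (90.0, 5.0),
    "memory_b": (50.0, 4.0),
    "matching_a": (38.0, 2.0),
}

# Hypothetical grouping of tests into categories.
categories = {
    "short_term_face_memory": ["memory_a", "memory_b"],
    "simultaneous_face_matching": ["matching_a"],
}

# Hypothetical first attempt scores for one participant.
scores = {"memory_a": 95.0, "memory_b": 46.0, "matching_a": 38.0}

# Hypothetical threshold slightly below zero; real values depend on the project.
THRESHOLD = -0.25

result = overall_z(scores, norms, categories)
print(f"Overall mean z-score: {result:.2f}")
print("Meets threshold" if result >= THRESHOLD else "Below threshold")
```

In this made-up case the participant scores 1 SD above the pilot mean on one memory test and 1 SD below on the other, so the memory category averages to zero. Averaging within categories first means that a category containing many tests does not outweigh a category containing only one.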
One of the reasons for encouraging volunteer participants to take the tests more than once in the summer of 2023 is to see whether we could employ an alternative strategy by incorporating second or subsequent attempt scores in z-score calculations. We also want to determine whether there should be a minimum time interval between test attempts.
Because of our emphasis on retaining first attempt scores, we have never compiled a database of second or subsequent attempt scores to determine whether we could calculate reliable z-scores with these. The exceptions are the Glasgow Face Matching Test and the Cambridge Face Memory Test: Extended, as we asked participants to take these tests for a second time in 2022. The results were clear. If we used super-recognisers' second attempt scores to calculate z-scores on these tests, it would be very hard for subsequent participants to achieve the super-recogniser standard, as most super-recognisers achieved close to perfect scores. The same may not be true of the other tests.
Appendix C: Links to tests for new participants
Fun test: Could you be a Super-Recogniser Test
For participants who wish to sign up for future research as a volunteer, or just to find out how good they are at face recognition based on more reliable tests, the link to take the three follow-up tests is here (Cambridge Face Memory Test: Extended (CFMT+), Glasgow Face Matching Test (GFMT), Short-Term Face Memory Test 30-60 (STFMT3060)). They are available in the following languages (as are all the tests described in this blog).
We always invite participants to take the other tests described in this blog a few days after completing the first three-test battery.