Face recognition has become a commodity: you can now get it off the shelf. If your app needs to find faces, identify people, or detect age, gender, or other facial features, there’s an API you can subscribe to and start using right away.

Off The Shelf

Face recognition is a classic case of technology as an enabler. Ten years ago, computers finding and recognizing faces was pure science fiction. Today – it’s just a library you include in your code, a Docker container you run, or an API you call from your application. The tech opens new doors: enhance your existing applications, or create entirely new ones.
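To make “just a library” concrete, here’s a minimal sketch using the open-source face_recognition Python package – one off-the-shelf option among many (the image file name is made up):

```python
# Minimal face finding with an off-the-shelf library
# (pip install face_recognition; "group_photo.jpg" is a hypothetical file).
import face_recognition

image = face_recognition.load_image_file("group_photo.jpg")
face_locations = face_recognition.face_locations(image)

# Each face comes back as a (top, right, bottom, left) pixel box.
for top, right, bottom, left in face_locations:
    print(f"Found a face at top={top}, right={right}, bottom={bottom}, left={left}")
```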

The first wave of face recognition appeared in camera and image gallery applications. Cameras learned how to automatically focus on the faces found in the frame. Next, image gallery apps started asking you to name the faces found in your pictures. After a few examples, the app started suggesting who’s who in the images. As a result, you can now search your gallery by the people appearing in your photos.
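Under the hood, the “who’s who” suggestion boils down to comparing numerical face encodings. A sketch with the same face_recognition package (the file names and the name Alice are hypothetical):

```python
import face_recognition

# Enroll: compute a 128-number encoding for one known, named face.
known_image = face_recognition.load_image_file("alice.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Recognize: compare every face in a new gallery photo to the known encoding.
gallery_image = face_recognition.load_image_file("party.jpg")
for encoding in face_recognition.face_encodings(gallery_image):
    if face_recognition.compare_faces([known_encoding], encoding)[0]:
        print("Looks like Alice is in this photo.")
```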

The next trend was kids’ social media apps (Snapchat filters were first?). You can add bunny ears, funny glasses, or makeup to your selfies. The feature is based on detecting facial landmarks in the image (like eye and ear positions) and then superimposing the funny features over the original image. Popular face swap apps work on the same landmark detection principle.
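A rough sketch of the landmark detection step, again with the face_recognition package (the actual bunny ears are left as an exercise; the file names are made up):

```python
import face_recognition
from PIL import Image, ImageDraw

image = face_recognition.load_image_file("selfie.jpg")
pil_image = Image.fromarray(image)
draw = ImageDraw.Draw(pil_image)

# Each face yields named landmark groups ("left_eye", "nose_tip", ...) as
# lists of (x, y) points; a filter app anchors its overlays to these points.
for landmarks in face_recognition.face_landmarks(image):
    for x, y in landmarks["left_eye"] + landmarks["right_eye"]:
        draw.ellipse([x - 2, y - 2, x + 2, y + 2], fill="red")

pil_image.save("selfie_landmarks.jpg")
```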

Next, it seems we’ll be getting somewhat more intelligent advertising screens in shopping malls. Equipped with a camera and a tiny AI, the ad screen categorizes passers-by by age and gender, and then shows the ads targeted just for you. Women between 25 and 30 years of age will get shampoo or hair color ads. Teen boys will get Cola Zero ads. Take the AI a bit further: “Hey dude, you seem to need a haircut!”
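As a sketch of how such a screen might classify a face crop: OpenCV’s DNN module can run the publicly available Levi–Hassner age and gender Caffe models (the local file names here are assumptions; any equivalent classifier would do):

```python
import cv2

AGE_BUCKETS = ["(0-2)", "(4-6)", "(8-12)", "(15-20)", "(25-32)",
               "(38-43)", "(48-53)", "(60-100)"]
GENDERS = ["Male", "Female"]

# Hypothetical local file names for the pretrained models.
age_net = cv2.dnn.readNetFromCaffe("age_deploy.prototxt", "age_net.caffemodel")
gender_net = cv2.dnn.readNetFromCaffe("gender_deploy.prototxt", "gender_net.caffemodel")

def classify(face_bgr):
    # Both nets expect a 227x227 BGR crop, mean-subtracted with training means.
    blob = cv2.dnn.blobFromImage(face_bgr, 1.0, (227, 227),
                                 (78.43, 87.77, 114.90), swapRB=False)
    gender_net.setInput(blob)
    gender = GENDERS[gender_net.forward()[0].argmax()]
    age_net.setInput(blob)
    age = AGE_BUCKETS[age_net.forward()[0].argmax()]
    return gender, age
```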

Some use face recognition for access control: unlocking your phone, logging in to a computer, or authorizing access to restricted areas. Yes, it can be done, but I never recommend using your face as the only authentication factor. Use it as one factor in multi-factor authentication, but never alone.
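To make the point concrete, here’s a hypothetical sketch where the face match is only one of two required factors (the helper names and the PIN check are illustrative, not any product’s real API):

```python
import hmac
import face_recognition

def face_matches(reference_encoding, login_image_path, tolerance=0.5):
    """Factor one: does the login photo match the enrolled face?"""
    image = face_recognition.load_image_file(login_image_path)
    encodings = face_recognition.face_encodings(image)
    if not encodings:
        return False  # no face found in the shot at all
    return bool(face_recognition.compare_faces(
        [reference_encoding], encodings[0], tolerance=tolerance)[0])

def authenticate(reference_encoding, login_image_path, pin_entered, pin_expected):
    # The face match alone never opens the door; a second, independent
    # factor (here a PIN, compared in constant time) is always required.
    return (face_matches(reference_encoding, login_image_path)
            and hmac.compare_digest(pin_entered, pin_expected))
```

Next, let’s look at a few reasons why the face alone isn’t enough.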

When (and how) does face detection fail?

Keep in mind: Technology is never perfect. By their very nature, machine learning tools are not even meant to give absolute truths; they only give you good statistical averages and estimates. If you treat your AI helpers this way, you’re on safe ground. If not – you’re about to get seriously disappointed.

Face finding algorithms can find a face in an image where there actually is none (this is called a false positive). Or, in reverse, they can fail to find a face where there actually is one (a false negative). Recognition may tell you with good confidence that the face in the image belongs to person X, even though you have an image of person Y (a false positive). Again in reverse, recognition may fail to recognize a person who is included in your training material (a false negative). Why does this happen?
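One way to see these failure modes concretely: a recognizer typically reduces “is this the same person?” to a distance between face encodings plus a threshold you choose, and that threshold trades false positives against false negatives. A sketch with the face_recognition package (file names are made up):

```python
import face_recognition

enc_x = face_recognition.face_encodings(
    face_recognition.load_image_file("person_x.jpg"))[0]
enc_unknown = face_recognition.face_encodings(
    face_recognition.load_image_file("unknown.jpg"))[0]

distance = face_recognition.face_distance([enc_x], enc_unknown)[0]

# A loose threshold accepts impostors (false positives); a strict one
# rejects genuine matches (false negatives). There is no perfect setting.
for threshold in (0.4, 0.6, 0.8):
    verdict = "same person" if distance < threshold else "different person"
    print(f"threshold={threshold}: {verdict} (distance={distance:.2f})")
```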

Images are just images

Try it yourself: Can you be sure you’d recognize the person in an image under the following conditions:

  • Different lighting conditions: Daylight, dusk/dawn, or night. Spotlight, direct light, or backlighting. Natural light, shadow, fog. Indoor lighting. Strong over- or underexposure in the image.
  • Different angles: From left to right profile, bottom to top, or anything in between.
  • The same person 10 years older or younger, with a different hairstyle or makeup, a mustache, a hat, a cap, or a scarf?
  • A low-resolution image, like from a typical surveillance camera: The person’s face may be just a few pixels in size, and black & white?

I sure can’t. There’s no magic: if I can’t, most likely the computer can’t do it either. This is called human parity in AI testing. An AI is said to be on par with humans if it gets results as good as real people doing the same task. The machine does have one advantage over me: it can be taught to recognize far more people than I could ever meet in my life. In theory, there’s no upper limit. The lesson? Have as many good-quality training images of each person as you can (good lighting, all facial features showing, multiple angles – think of a passport photo, but taken from multiple angles). And make sure the images to be recognized are of the same good quality as the training material.
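One way to apply that lesson in code is to enroll each person from several good shots and average the encodings into a single template – a common simple approach, though not the only one (file names are hypothetical):

```python
import numpy as np
import face_recognition

def enroll(image_paths):
    # Build one face template from several good-quality, multi-angle shots.
    encodings = []
    for path in image_paths:
        image = face_recognition.load_image_file(path)
        found = face_recognition.face_encodings(image)
        if found:  # skip shots where no face was detected
            encodings.append(found[0])
    return np.mean(encodings, axis=0)

alice_template = enroll(["alice_front.jpg", "alice_left.jpg", "alice_right.jpg"])
```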

Weighted training material

Most widely used algorithms favor white males. This means that face detection and recognition accuracy is lower for women, Asians, people with dark skin, and other ethnic groups. The current leading explanation is that the neural nets doing the face featurization are trained with skewed training material. Think of it: The most accurate publicly available algorithms come from giants like Facebook. They use the Facebook profile, WhatsApp, Instagram, and other images you uploaded and tagged to train their AI models. And if you think about the diversity distribution there – you’ve got your weighted material. It does not equally cover the population of the Earth.

Fooling the AI

Fooling the AI is quite simple. Basic face recognition can be fooled most simply by showing the device a photo of the person. More advanced recognizers, which use depth information and 3D models, can be fooled with fake heads. Face detection can be obscured with face paint, hats, scarves, and so on. Data scientists are naturally trying their best to develop better algorithms. It’s a constant cat-and-mouse game. Expect to be fooled.

If it’s so bad, why use it?

Because in most use cases, the imperfections just don’t matter. If your face recognizer validates the person passing an airport gate correctly 95% of the time, it still removes a lot of manual work. If your face recognition authentication says you are not you on the first try, you can take another shot in better lighting or from a better camera angle (like moving the device a little). If your camera app picks the wrong focus point in bad lighting for a second, and then corrects it when the lighting improves – it doesn’t really matter. And if the intelligent ad fails once in a while to detect your gender or age correctly, you’ll get an ad meant for some other target group. Which is exactly what radio and TV do all the time.

To simplify, the methods are: try again with another (hopefully better) image, or use a series of images. If still uncertain, include a human in the loop to make the final decision.
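In code, that decision logic might look like the following sketch (capture_image, recognize, and ask_human are hypothetical stand-ins for your camera, recognizer, and manual review queue):

```python
MAX_ATTEMPTS = 3
CONFIDENCE_THRESHOLD = 0.9

def identify_with_fallback(capture_image, recognize, ask_human):
    best_guess = None
    for _ in range(MAX_ATTEMPTS):
        image = capture_image()            # try again with a fresh image
        person, confidence = recognize(image)
        if confidence >= CONFIDENCE_THRESHOLD:
            return person                  # confident enough, we're done
        if best_guess is None or confidence > best_guess[1]:
            best_guess = (person, confidence)
    # Still uncertain after several images: include a human in the loop.
    return ask_human(best_guess)
```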

What to expect next?

All of the above extends to video analysis as well. Video is just a bunch of images in a time series: Who appeared in this video clip, and in which time frames? What are the people’s ages and genders? Who was talking, and when? Add speech-to-text analysis, and you’ll have a transcript of the conversations. Analyzing video takes a bit more compute power, but hey – that’s what the cloud is for.
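A minimal sketch of the “video is just images in a time series” idea with OpenCV: sample roughly one frame per second and run face detection on each (“sample.mp4” is a made-up file):

```python
import cv2
import face_recognition

video = cv2.VideoCapture("sample.mp4")
fps = video.get(cv2.CAP_PROP_FPS) or 25  # fall back if FPS metadata is missing
frame_index = 0

while True:
    ok, frame_bgr = video.read()
    if not ok:
        break
    if frame_index % int(fps) == 0:  # roughly one frame per second
        rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        faces = face_recognition.face_locations(rgb)
        print(f"t={frame_index / fps:.0f}s: {len(faces)} face(s)")
    frame_index += 1

video.release()
```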

We are also on the verge of “try before you buy” apps. Without visiting a brick-and-mortar store, you can check with an app how those eyeglass frames would look on you, or which lipstick or eyelash color would be perfect – all from a selfie.

Face recognition can also be used to enhance the machine-to-person experience. An automatic reception terminal (or a receptionist robot) can identify you from your face, help you register more quickly, and guide you to your host.

The future is now. Enjoy!