
iOS implementation of Vision Framework’s new face capture quality request
During WWDC 2019, Apple announced several exciting developments in its Vision framework. Not only has it improved face tracking and image classification, but it has also introduced interesting new features such as image saliency, a built-in animal classification model, and enhanced APIs for working with Core ML classification models. Among the new releases, the ability to compare the face capture quality of a set of images is one of the most promising features to surface this year.
The introduction of face capture quality has given Vision’s face technology a big boost. It shows how much Apple is investing in computer vision to make photo capturing and processing smarter and easier than ever.
The face capture quality metric uses a model that has been trained on a wide range of images (different exposures, lighting, facial expressions, etc.). The Vision request analyzes the face in an image and assigns it a metric score, which depends on the facial expression (negative expressions get a lower score), lighting, focus, and blurring of the image.
Using these metric scores, we can compare different images to find the image in which the face looks best. This is something that will soon be coming to many custom selfie-based applications.
Face capture quality not only helps build smart camera-based applications, as shown in Apple's documentation, but it also helps bring machine learning intelligence to video processing. The goal of this article is to make Live Photos smarter (more on this later) by taking advantage of face capture quality in our iOS application.
Live Photos was introduced in iOS with the iPhone 6S and is one of the most preferred camera modes. It redefined the way we view still images by providing live motion effects.
The idea is to find the best frame containing a human face in a Live Photo. We will use the new VNDetectFaceCaptureQualityRequest class to run our Vision requests on a number of Live Photos that were deliberately captured in poor or blurry conditions, and pick the best frame out of each one.
You can extend the same code and concept to videos as well, since Live Photos essentially contain a video, as we'll see next.

A Live Photo is made up of an image and a short video clip that captures the moments around the time the photo was taken. This gives a feeling of being there in the moment when looking at it.
Under the hood, a Live Photo consists of a key photo paired with a video resource asset file. We can change the key photo by selecting any video frame in the preview-edit mode of the Photos app.
To access the key photo or the video in code, you use the PHAssetResourceManager class, which provides access to the underlying asset resources. We will use it in our implementation in the next few sections.
Before we delve deeper into the implementation, let’s lay out the blueprint. We will be using the following classes and components at different stages of our application:
- A UIImagePickerController to select a Live Photo from the camera or photo library.
- A PHAssetResource to retrieve the video resource and store it in a temporary FileManager directory.
- A UICollectionView to display the video frames along with the face quality metric values returned by the Vision request.
- Lastly, a UIImageView to display the frame with the highest face capture quality.
The following illustration gives a high-level overview of how the implementation fits together, from Live Photo capture to video extraction to the Vision face capture quality request:

Now that we've laid out our plan of action, let's kick-start the implementation by setting up the user interface.
The following code sets up the buttons and the image view in our ViewController.swift file:
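Here's a minimal sketch of how that programmatic setup might look. The property names (imageView, pickButton, visionButton) and the onVisionButtonClick selector are illustrative rather than the exact names used in the original project:

import UIKit

class ViewController: UIViewController {

    // Displays the best frame once the Vision request has run.
    let imageView: UIImageView = {
        let imageView = UIImageView()
        imageView.contentMode = .scaleAspectFit
        imageView.translatesAutoresizingMaskIntoConstraints = false
        return imageView
    }()

    // Opens the image picker so the user can choose a Live Photo.
    let pickButton: UIButton = {
        let button = UIButton(type: .system)
        button.setTitle("Pick Live Photo", for: .normal)
        button.translatesAutoresizingMaskIntoConstraints = false
        return button
    }()

    // Runs the face capture quality request on the extracted frames.
    let visionButton: UIButton = {
        let button = UIButton(type: .system)
        button.setTitle("Find Best Face", for: .normal)
        button.translatesAutoresizingMaskIntoConstraints = false
        return button
    }()

    override func viewDidLoad() {
        super.viewDidLoad()
        view.backgroundColor = .white
        [imageView, pickButton, visionButton].forEach { view.addSubview($0) }

        NSLayoutConstraint.activate([
            imageView.topAnchor.constraint(equalTo: view.safeAreaLayoutGuide.topAnchor, constant: 16),
            imageView.centerXAnchor.constraint(equalTo: view.centerXAnchor),
            imageView.widthAnchor.constraint(equalTo: view.widthAnchor, multiplier: 0.9),
            imageView.heightAnchor.constraint(equalTo: view.heightAnchor, multiplier: 0.45),

            pickButton.topAnchor.constraint(equalTo: imageView.bottomAnchor, constant: 24),
            pickButton.centerXAnchor.constraint(equalTo: view.centerXAnchor),

            visionButton.topAnchor.constraint(equalTo: pickButton.bottomAnchor, constant: 16),
            visionButton.centerXAnchor.constraint(equalTo: view.centerXAnchor)
        ])

        pickButton.addTarget(self, action: #selector(onButtonClick(sender:)), for: .touchUpInside)
        visionButton.addTarget(self, action: #selector(onVisionButtonClick(sender:)), for: .touchUpInside)

        setupCollectionView()   // implemented later in this article
    }
}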
If the above code looks big, it's because I created the UI programmatically instead of using a storyboard.
One of the buttons above is responsible for starting the image picker, while the other handles the vision request, which we’ll see later.
@objc func onButtonClick(sender: UIButton) {
    let imagePicker = UIImagePickerController()
    imagePicker.sourceType = .photoLibrary
    imagePicker.mediaTypes = [kUTTypeImage, kUTTypeLivePhoto] as [String]
    imagePicker.delegate = self
    present(imagePicker, animated: true, completion: nil)
}
In the above code, we set up the UIImagePickerController to access Live Photos from the photo library. For the image picker to work correctly, make sure you have added the photo library privacy usage description (NSPhotoLibraryUsageDescription) to your Info.plist file.
Live Photos are represented by the PHLivePhoto class. The following code is used to handle the Live Photo selected from the image picker:
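A minimal sketch of the delegate method, assuming ViewController also adopts UIImagePickerControllerDelegate and UINavigationControllerDelegate:

import Photos

extension ViewController: UIImagePickerControllerDelegate, UINavigationControllerDelegate {

    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        picker.dismiss(animated: true, completion: nil)

        // Proceed only if the picked result actually contains a PHLivePhoto.
        guard let livePhoto = info[.livePhoto] as? PHLivePhoto else { return }
        processLivePhoto(livePhoto: livePhoto)
    }
}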
In the above code, we filter the image picker results to Live Photos by checking whether the result contains a PHLivePhoto instance in the info dictionary.
Inside the processLivePhoto function, we extract the video resource from the Live Photo, save it to a temporary URL using FileManager, and then extract image frames from the video.
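A sketch of what processLivePhoto might look like; the temporary file naming and the request options are illustrative details:

func processLivePhoto(livePhoto: PHLivePhoto) {
    // A Live Photo is backed by a still image plus a paired video resource.
    let resources = PHAssetResource.assetResources(for: livePhoto)
    guard let videoResource = resources.first(where: { $0.type == .pairedVideo }) else { return }

    // Write the video resource to a temporary file so AVFoundation can read it.
    let fileUrl = FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString)
        .appendingPathExtension("mov")

    let options = PHAssetResourceRequestOptions()
    options.isNetworkAccessAllowed = true

    PHAssetResourceManager.default().writeData(for: videoResource,
                                               toFile: fileUrl,
                                               options: options) { error in
        guard error == nil else { return }
        // Setting videoUrl kicks off frame extraction via its didSet observer.
        self.videoUrl = fileUrl
    }
}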
PHAssetResourceManager.default().writeData is responsible for writing the video buffers to the URL. Once the resource has been written, setting videoUrl triggers the imagesFromVideo function through the property observer:
var videoUrl : URL? {
didSet{
DispatchQueue.global(qos: .background).async {
guard let videoURL = self.videoUrl else{ return }
self.imagesFromVideo(url: videoURL)
}
}
}
The following code extracts a certain number of frames (based on video duration) and puts them in an array:
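A sketch of how imagesFromVideo could be written with AVAssetImageGenerator; the videoFrames property and the setCustomData(with:) signature are assumptions made for illustration:

// Requires AVFoundation for AVAsset and AVAssetImageGenerator.

// Properties referenced by the frame-extraction code.
var numberOfFrames = 12          // how many frames to pull from the Live Photo's video
var videoFrames: [UIImage] = []  // the extracted frames, in order

func imagesFromVideo(url: URL) {
    let asset = AVAsset(url: url)
    let duration = CMTimeGetSeconds(asset.duration)

    let generator = AVAssetImageGenerator(asset: asset)
    generator.appliesPreferredTrackTransform = true
    // Tight tolerances so the generator returns frames close to the requested times.
    generator.requestedTimeToleranceBefore = .zero
    generator.requestedTimeToleranceAfter = .zero

    // Spread the requested times evenly across the clip.
    let interval = duration / Double(numberOfFrames)
    let times: [NSValue] = (0..<numberOfFrames).map {
        NSValue(time: CMTime(seconds: Double($0) * interval, preferredTimescale: 600))
    }

    var frames: [UIImage] = []
    var processedCount = 0

    generator.generateCGImagesAsynchronously(forTimes: times) { _, cgImage, _, result, _ in
        processedCount += 1
        if result == .succeeded, let cgImage = cgImage {
            frames.append(UIImage(cgImage: cgImage))
        }
        // The handler runs once per requested time; after the last one, update the UI.
        if processedCount == times.count {
            DispatchQueue.main.async {
                self.videoFrames = frames
                self.setCustomData(with: frames)   // populates the collection view's data source
            }
        }
    }
}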
generateCGImagesAsynchronously is responsible for extracting multiple frames from the video asynchronously at the NSValue (time) positions specified.
Using asset.duration and numberOfFrames, we set the time interval between the extracted frames. At present, numberOfFrames is set to 12 to limit the number of Vision requests we execute. That works fine for Live Photos, which aren't longer than three seconds, though you can play with this number if you're processing longer videos.
There are a few properties that we defined at the beginning of the above code snippet. setCustomData is used to populate our CollectionView. For that, we need to set up the CollectionView first.
Before we build the CollectionView, here is a glimpse of the application at the halfway stage:

Of course, the horizontal collection view seen in the above screen recording has not yet been implemented.
We skipped the setupCollectionView function when initially setting up the other UI components. It's time to implement it.
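A minimal sketch of setupCollectionView; the cell size, reuse identifier, and constraints are illustrative:

var collectionView: UICollectionView!

func setupCollectionView() {
    let layout = UICollectionViewFlowLayout()
    layout.scrollDirection = .horizontal
    layout.itemSize = CGSize(width: 120, height: 160)

    collectionView = UICollectionView(frame: .zero, collectionViewLayout: layout)
    collectionView.translatesAutoresizingMaskIntoConstraints = false
    collectionView.dataSource = self
    collectionView.delegate = self
    // CustomCell holds the frame image and its face quality label.
    collectionView.register(CustomCell.self, forCellWithReuseIdentifier: "CustomCell")

    view.addSubview(collectionView)
    NSLayoutConstraint.activate([
        collectionView.leadingAnchor.constraint(equalTo: view.leadingAnchor),
        collectionView.trailingAnchor.constraint(equalTo: view.trailingAnchor),
        collectionView.bottomAnchor.constraint(equalTo: view.safeAreaLayoutGuide.bottomAnchor),
        collectionView.heightAnchor.constraint(equalToConstant: 180)
    ])
}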
In the above code, we have set up a horizontal collection view and registered a CustomCell class on it, which holds the layout of each UICollectionViewCell.
Collection view cells, data source, and delegate methods
The following code sets up the collection view cell by adding a UIImageView and a Label to it.
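A sketch of what CustomCell could look like; the exact layout is illustrative:

class CustomCell: UICollectionViewCell {

    // Shows one extracted video frame.
    let frameImageView: UIImageView = {
        let imageView = UIImageView()
        imageView.contentMode = .scaleAspectFill
        imageView.clipsToBounds = true
        imageView.translatesAutoresizingMaskIntoConstraints = false
        return imageView
    }()

    // Shows the face capture quality score for that frame.
    let qualityLabel: UILabel = {
        let label = UILabel()
        label.font = .systemFont(ofSize: 12)
        label.textAlignment = .center
        label.translatesAutoresizingMaskIntoConstraints = false
        return label
    }()

    // Assigning the cell's data updates both subviews.
    var data: CustomData? {
        didSet {
            frameImageView.image = data?.frameImage
            qualityLabel.text = data?.faceQualityValue
        }
    }

    override init(frame: CGRect) {
        super.init(frame: frame)
        contentView.addSubview(frameImageView)
        contentView.addSubview(qualityLabel)

        NSLayoutConstraint.activate([
            frameImageView.topAnchor.constraint(equalTo: contentView.topAnchor),
            frameImageView.leadingAnchor.constraint(equalTo: contentView.leadingAnchor),
            frameImageView.trailingAnchor.constraint(equalTo: contentView.trailingAnchor),
            frameImageView.bottomAnchor.constraint(equalTo: qualityLabel.topAnchor),

            qualityLabel.leadingAnchor.constraint(equalTo: contentView.leadingAnchor),
            qualityLabel.trailingAnchor.constraint(equalTo: contentView.trailingAnchor),
            qualityLabel.bottomAnchor.constraint(equalTo: contentView.bottomAnchor),
            qualityLabel.heightAnchor.constraint(equalToConstant: 20)
        ])
    }

    required init?(coder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }
}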
The CustomData struct holds the data for each cell and serves as the data source for our CollectionView. The following code defines it:
public struct CustomData {
    var faceQualityValue: String = ""
    var frameImage: UIImage
}
Next, we need to define the data source and delegate methods for our CollectionView:
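A sketch of those methods, assuming the view controller keeps a customData array that setCustomData fills with one entry per extracted frame:

// Assumes the view controller declares: var customData: [CustomData] = []
extension ViewController: UICollectionViewDataSource, UICollectionViewDelegate {

    // One cell per extracted video frame.
    func collectionView(_ collectionView: UICollectionView,
                        numberOfItemsInSection section: Int) -> Int {
        return customData.count
    }

    func collectionView(_ collectionView: UICollectionView,
                        cellForItemAt indexPath: IndexPath) -> UICollectionViewCell {
        let cell = collectionView.dequeueReusableCell(withReuseIdentifier: "CustomCell",
                                                      for: indexPath) as! CustomCell
        cell.data = customData[indexPath.item]
        return cell
    }
}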
Now, it’s time to handle the vision request.
Our Vision handler will take each image from the CollectionView and run VNDetectFaceCaptureQualityRequest on it to get its faceCaptureQuality score. We will then display only the image with the highest face capture quality in the UIImageView.
The following code runs the Vision request when the button (the one with the eye icon) is pressed, triggering the selector method:
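A sketch of that selector, reusing the illustrative property names from the earlier sketches (customData, collectionView, imageView):

// Requires: import Vision
@objc func onVisionButtonClick(sender: UIButton) {
    DispatchQueue.global(qos: .userInitiated).async {
        var updated = self.customData
        var bestImage: UIImage?
        var bestQuality: Float = 0

        for (index, item) in updated.enumerated() {
            guard let cgImage = item.frameImage.cgImage else { continue }

            // Run the face capture quality request on this frame.
            let request = VNDetectFaceCaptureQualityRequest()
            let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
            try? handler.perform([request])

            guard let face = request.results?.first as? VNFaceObservation,
                  let quality = face.faceCaptureQuality else { continue }

            // Store the score so the cell's label can display it.
            updated[index].faceQualityValue = String(format: "%.2f", quality)

            // Track the frame with the best-looking face.
            if quality > bestQuality {
                bestQuality = quality
                bestImage = item.frameImage
            }
        }

        DispatchQueue.main.async {
            self.customData = updated
            self.collectionView.reloadData()
            // Show the frame with the highest face capture quality.
            self.imageView.image = bestImage
        }
    }
}

Running the loop on a background queue keeps the UI responsive; only the final updates to the collection view and image view touch the main thread.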
I ran the above Vision request on a few Live Photo selfies (intentionally blurry, with awkward poses and expressions) to determine the best frame. Here are the results:

The above results show how Vision helps automatically determine the best face capture from a given set of images (in this case, video frames). The face capture quality Vision request is fast and accurate for short videos like the ones backing Live Photos.
To wrap up, we've discussed the new changes introduced to Vision's face technology with iOS 13 and macOS 10.15 (specifically face capture quality) and built a full-fledged iOS application from scratch that uses this new feature on Live Photos. The full source code is available in the GitHub repository.
Face capture quality is an exciting feature with a wide variety of use cases – from photo editing to anomaly detection (detect if a video/Live Photo contains a human face).
Only time will tell whether Apple decides to bring this capability to its built-in Live Photos editing for smart edits. Until then, you can try to improve on the above application, perhaps by storing the best frame as the Live Photo's key photo (the one displayed in the photo library).
That wraps up this piece. I hope you enjoyed reading it.