Facial Recognition with Python and Elasticsearch: A Quick Tutorial
In this quick tutorial I’ll show how to set up a facial recognition pipeline based on Python and Elasticsearch. In short, the tool lets you store “facial vectors” (computed with Python’s OpenCV library and dlib) in your Elasticsearch instance and then query the database for likely matches starting from an input image.
First, download the latest version (7.6.2) of Elasticsearch and Kibana. Once you’ve downloaded the files, run both Elasticsearch and Kibana (which you will use to set up the database) and wait for them to be ready. Kibana should be reachable at localhost:5601, though depending on your machine’s settings it may use another port.
Once Kibana and Elasticsearch are running, open Kibana and go to the Dev Tools tab (bottom left). Here you need to create a mapping with the dense vector datatype, in the exact format shown in the image:
“face_recognition” is the name I chose for my index; you can use any name you want. It’s important to leave the “dims” key as it is (128), since the face encodings produced by dlib are 128-dimensional vectors.
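Since the screenshot is not reproduced here, this is a minimal sketch of what such a request looks like in the Dev Tools console. The field names (`face_encoding`, `image_path`) are my own choices for illustration, not necessarily those used in the original script:

```
PUT face_recognition
{
  "mappings": {
    "properties": {
      "face_encoding": { "type": "dense_vector", "dims": 128 },
      "image_path":    { "type": "keyword" }
    }
  }
}
```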
Now click the small arrow at the top right of the console. Your index will be created and the right-hand panel will show the response, starting with “acknowledged”: true.
Note that even though the index now exists, you won’t be able to browse it in Kibana yet, since it contains no data.
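You can still confirm from the Dev Tools console that the (empty) index was created, for example with:

```
GET _cat/indices?v
```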
Since you will probably store many images over time, you need to edit the index’s settings so that each query can cover the whole database. Open the Kibana dashboard and go to Management > Index Management. Select your index, open the “Edit settings” tab, and add “index.max_result_window”: 1000000, as in the picture:
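If you prefer to do this from the Dev Tools console instead of the Index Management UI, the equivalent request should look roughly like this (assuming the index name `face_recognition`):

```
PUT face_recognition/_settings
{
  "index.max_result_window": 1000000
}
```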
Now we’ll use a Python script and a set of images to upload some facial vectors into the face_recognition index (source code here):
Before launching the above code, make sure you have configured the correct port for your Elasticsearch instance (the default is 9200), that you have all the necessary Python libraries installed (elasticsearch, dlib, imutils, face_recognition, cv2), and that the correct index name is set in the last line of code (face_recognition in my case). If you have a GPU, you can change the face detection model to “cnn” for better results. Since I have an ordinary CPU, I left it as “hog”, which is much faster but far less accurate.
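Since the linked source code is not embedded in this post, here is a hedged sketch of what such an indexing script can look like. The index name (`face_recognition`), field names (`face_encoding`, `image_path`), and port 9200 are assumptions matching the setup above, not the author’s exact code:

```python
# index_faces.py -- sketch of a script that encodes faces and bulk-loads
# the 128-d vectors into Elasticsearch. Names are illustrative assumptions.
import os

INDEX_NAME = "face_recognition"

def build_doc(image_path, encoding):
    """Turn one (path, 128-d vector) pair into an Elasticsearch bulk action."""
    return {
        "_index": INDEX_NAME,
        "_source": {
            "image_path": image_path,
            # Elasticsearch expects a plain list of floats for dense_vector.
            "face_encoding": list(encoding),
        },
    }

def main(dataset_dir, detection_model="hog"):
    # Heavy dependencies imported lazily so build_doc stays importable
    # even without dlib/elasticsearch installed.
    import face_recognition
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch("http://localhost:9200")
    actions = []
    for name in os.listdir(dataset_dir):
        path = os.path.join(dataset_dir, name)
        image = face_recognition.load_image_file(path)
        # "hog" is fast on CPU; switch to "cnn" if you have a GPU.
        boxes = face_recognition.face_locations(image, model=detection_model)
        for encoding in face_recognition.face_encodings(image, boxes):
            actions.append(build_doc(path, encoding))
    helpers.bulk(es, actions)
    print(f"[*] Indexed {len(actions)} face vectors")

# Usage (assumption): main("dataset", detection_model="hog")
```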
Now you’re ready to encode some images and automatically store the facial vectors into Elasticsearch!
Run the code passing as parameter a folder containing some images:
Python will encode all the faces in your dataset into facial vectors and automatically load them into Elasticsearch. In my example I saved only 4 images in the “dataset” folder, but you can have literally hundreds of thousands! If you need to inspect the data manually in Elasticsearch, you will have to create an Index Pattern in Kibana; check this guide to see how. I won’t cover that here, as it is beyond the scope of this tutorial.
After you have loaded some data into Elasticsearch, you can use the following script to check whether an input person of your choice matches one of the “vectorialized” faces in your database (source code here):
To run the script, pass as a parameter the path of a folder of images portraying the same subject. The code will generate the mean facial vector of all the input images and iterate through the database, matching the vectorialized input against the face encodings you stored earlier. It will then output a probability score (from 0 to 1) for each stored image. You can change the “size” value in the script to display more or fewer results than the top 50 (if you have millions of stored faces, you’ll probably want to view only the top 50 or 100 scores):
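Again, as the linked script is not embedded here, this is a sketch of how the search side can work. The mean-vector step follows the description above; scoring via Elasticsearch’s `cosineSimilarity` in a `script_score` query, rescaled from [-1, 1] to [0, 1], is my assumption about how a 0-to-1 probability score is produced, and the index/field names are the same illustrative ones as before:

```python
# search.py -- sketch of the matching script. Index and field names,
# and the cosine-similarity scoring formula, are assumptions.
import os

INDEX_NAME = "face_recognition"

def mean_encoding(encodings):
    """Average several 128-d face vectors into one query vector."""
    n = len(encodings)
    return [sum(dim) / n for dim in zip(*encodings)]

def build_query(query_vector, size=50):
    """script_score query: cosine similarity rescaled to a 0..1 score."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "(cosineSimilarity(params.qv, 'face_encoding') + 1.0) / 2",
                    "params": {"qv": query_vector},
                },
            }
        },
    }

def main(input_dir, size=50):
    # Heavy dependencies imported lazily, as in the indexing sketch.
    import face_recognition
    from elasticsearch import Elasticsearch

    encodings = []
    for name in os.listdir(input_dir):
        image = face_recognition.load_image_file(os.path.join(input_dir, name))
        encodings.extend(face_recognition.face_encodings(image))
    query = build_query(mean_encoding(encodings), size=size)
    es = Elasticsearch("http://localhost:9200")
    for hit in es.search(index=INDEX_NAME, body=query)["hits"]["hits"]:
        print(f"[*] Possible match: {hit['_source']['image_path']} "
              f">> score: {hit['_score']:.7f}")
```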
For example, I chose a set of pictures of Keanu Reeves, stored them in a folder, and ran them against my Elasticsearch database to search for other images in which Reeves appears. The script prints a confidence score to the terminal; the higher, the better:
C:\Users\Hp>python3 "C:\Users\Hp\Desktop\Face Recognition Elasticsearch\search.py" -i C:\Users\Hp\Desktop\dataset\reeves
[*] Possible match: C:\Users\Hp\Desktop\Face Recognition\dataset\1810749.jpg >> score: 0.7381948
[*] Possible match: C:\Users\Hp\Desktop\Face Recognition\dataset\actor-0273.jpg >> score: 0.5912682
This output means that Keanu Reeves appears in the image 1810749.jpg with 73% probability; the chances he’s in actor-0273.jpg are lower (59%). In my experience, a probability score above 70% is a good sign that you’ve got a match. However, the reliability of this approach depends on several variables, such as the quality of the images and the power of your CPU/GPU. Since the quality of the sources you gather through OSINT is out of your control, it is important to generate the best possible mean facial encoding to pass through the database. So, if you have more than 10–20 pictures of your person of interest, store them all in the dataset to generate a better vector.
That’s all. This method can be useful if you have scraped many pictures from social media, or if you have converted videos to frames and need to check whether a person of interest is present. The same can be done with frames from security-camera video streams, and so on. It is also possible to adapt the scripts for use as local transforms in Maltego for graph analysis. The limit is your imagination.
If you want to deepen your knowledge of facial recognition with Python, check out Adrian Rosebrock’s PyImageSearch website. He is one of the most respected computer vision experts around, and his site offers plenty of free material that you can easily customize for your own purposes.
In the next tutorial, I’ll show (full code included) how to cluster unique faces in order to analyze large numbers of images while building a graph for OSINT analysis.