Instance recognition, image classification, object detection, semantic segmentation, and face recognition.
Recognition is where computer vision meets understanding. Given an image, can we determine what's in it? This seemingly simple question has driven decades of research.
What is this chapter about? We explore the major recognition tasks: classifying whole images, detecting and localizing objects, segmenting images pixel-by-pixel, and recognizing faces. Each builds on the deep learning foundations from Chapter 5.
Why does this matter? Recognition powers a wide range of real-world applications.
How the topics connect: We start with image classification—the simplest task where we assign one label to an entire image. Then object detection adds localization—where are objects? Semantic segmentation provides pixel-level understanding. Finally, face recognition shows how these ideas apply to a specific, important domain.
Image classification: feature extraction, softmax, and cross-entropy for assigning a single category label to an entire image.
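To make the classification objective concrete, here is a minimal sketch in plain Python (no deep learning framework assumed). It shows how a classifier's raw class scores (logits) become probabilities via softmax, and how cross-entropy penalizes a low probability on the true class. The three scores are made-up values for illustration, not the output of a real model.

```python
import math

def softmax(logits):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max logit for numerical stability; result is unchanged
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability that softmax assigns to the true class index `target`."""
    return -math.log(softmax(logits)[target])

logits = [2.0, 1.0, 0.1]  # hypothetical scores for, e.g., [cat, dog, car]
probs = softmax(logits)
predicted = max(range(len(probs)), key=lambda i: probs[i])  # argmax = predicted label
loss = cross_entropy(logits, target=0)  # assume the true class is index 0
print(probs, predicted, loss)
```

Training lowers this loss by pushing the true class's probability toward 1; at prediction time only the argmax matters.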
Where are objects? What label per pixel?
Object detection: IoU, anchor boxes, NMS, and mAP for localizing and classifying multiple objects with bounding boxes.
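Two of the detection building blocks are short enough to sketch directly: intersection over union (IoU) between two boxes, and greedy non-maximum suppression (NMS), which uses IoU to discard duplicate detections of the same object. This is a simplified single-class sketch that assumes boxes in `(x1, y1, x2, y2)` corner format and a hypothetical IoU threshold of 0.5; real detectors typically run NMS per class, and anchor boxes and mAP are not shown here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop every remaining box that
    overlaps it by more than iou_threshold, then repeat on what is left."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(iou(boxes[0], boxes[1]))  # 81 / 119
print(nms(boxes, scores))
```

Box 1 overlaps box 0 with an IoU of 81/119 (about 0.68), above the threshold, so it is suppressed as a duplicate; box 2 overlaps neither and is kept, giving `[0, 2]`.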
Semantic segmentation: pixel-wise classification with encoder-decoder networks, assigning a class label to every pixel in the image.
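Pixel-wise classification means the ordinary classification recipe is applied independently at every pixel. The sketch below assumes a hypothetical decoder output laid out as one H x W score map per class (`scores[c][y][x]`) and shows the two steps that follow it: taking the per-pixel argmax to get a label map, and averaging softmax cross-entropy over all pixels as a training loss. The encoder-decoder network itself is not shown, and plain Python lists stand in for tensors.

```python
import math

def pixelwise_predict(scores):
    """scores[c][y][x] is the score for class c at pixel (y, x).
    Returns an H x W label map holding the highest-scoring class at each pixel."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [[max(range(num_classes), key=lambda c: scores[c][y][x])
             for x in range(width)]
            for y in range(height)]

def pixelwise_cross_entropy(scores, target):
    """Softmax cross-entropy computed at every pixel, then averaged.
    target[y][x] is the ground-truth class index of pixel (y, x)."""
    num_classes = len(scores)
    height, width = len(target), len(target[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            logits = [scores[c][y][x] for c in range(num_classes)]
            m = max(logits)  # subtract the max for numerical stability
            log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
            total += log_sum_exp - logits[target[y][x]]  # -log p(true class)
    return total / (height * width)

# Toy scores for 2 classes (0 = background, 1 = object) on a 2 x 2 image.
scores = [
    [[2.0, 0.0], [0.0, 2.0]],  # class 0 score map
    [[0.0, 2.0], [2.0, 0.0]],  # class 1 score map
]
target = [[0, 1], [1, 0]]
labels = pixelwise_predict(scores)
loss = pixelwise_cross_entropy(scores, target)
print(labels, loss)
```

Every pixel here favors its true class by the same score gap of 2, so each contributes the same loss, log(1 + e^-2), and the mean equals that value.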
Face recognition: embeddings, triplet loss, and ArcFace, used to map faces into a metric space for verification and identification.
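As a rough illustration of the metric-space idea, the sketch below implements the triplet loss in its common FaceNet-style form, max(0, ||a - p||^2 - ||a - n||^2 + margin), on toy 2-D embeddings, plus a threshold-based verification check. The margin (0.2) and the verification threshold (1.0) are hypothetical values chosen for the example. The embeddings are hand-picked rather than produced by a network, and ArcFace's angular-margin loss is not shown.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def l2_normalize(v):
    """Scale an embedding to unit length before comparing faces."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is at least `margin` farther (in squared distance)
    from the anchor than the positive is; otherwise grows linearly."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)

def same_identity(emb1, emb2, threshold=1.0):
    """Verification: accept the pair as the same person if the embeddings are close."""
    return squared_distance(emb1, emb2) < threshold

anchor = l2_normalize([1.0, 0.0])          # toy embedding of person A
positive = l2_normalize([4.0, 3.0])        # another photo of person A -> [0.8, 0.6]
easy_negative = l2_normalize([0.0, 1.0])   # person B, far from the anchor
hard_negative = l2_normalize([4.0, -3.0])  # person B, as close to the anchor as the positive

print(triplet_loss(anchor, positive, easy_negative))
print(triplet_loss(anchor, positive, hard_negative))
print(same_identity(anchor, positive), same_identity(anchor, easy_negative))
```

The easy negative already sits well beyond the margin, so its loss is 0. The hard negative is exactly as close to the anchor as the positive, so its loss equals the margin (0.2), and training would push it farther away.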