Computer Vision Study Guide
📖 Core Concepts
Computer Vision (CV) – The study of how computers can obtain high‑level understanding from images or video (e.g., recognize objects, infer 3‑D structure, make decisions).
Image Understanding – Converts raw pixel data into symbolic descriptions (objects, actions, scene layout) using geometry, physics, statistics, and learning.
Scope of CV Tasks – Acquire → process → analyze → understand → produce numerical or symbolic output (classification, pose, decision).
Hierarchy of Abstraction
Low‑level: edges, textures, regions.
Mid‑level: boundaries, surfaces, volumes.
High‑level: objects, scenes, events.
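A concrete taste of the low level: gradient magnitude via the standard 3×3 Sobel kernels, sketched in plain NumPy (the explicit loop is for clarity, not speed):

```python
import numpy as np

def sobel_magnitude(img):
    """Low-level processing sketch: gradient magnitude via 3x3 Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx[y, x] = np.sum(kx * patch)
            gy[y, x] = np.sum(ky * patch)
    return np.hypot(gx, gy)

# A vertical step edge: strong response at the boundary, zero in flat regions.
img = np.zeros((8, 8)); img[:, 4:] = 1.0
mag = sobel_magnitude(img)
```

Edges, the lowest rung of the hierarchy, fall out of nothing more than local intensity differences.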
Distinctions
Image Processing: input → transformed image (e.g., filtering).
Computer Vision: input → analysis/decision (may output a description, not an image).
Machine Vision: CV in controlled, real‑time industrial settings (fixed lighting, actuator integration).
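The image-processing vs. computer-vision distinction is visible in the return types: one gives back another image, the other a symbolic decision. Both functions below are toy sketches, not library APIs:

```python
import numpy as np

def box_blur(img):
    """Image processing: image in, transformed image out."""
    out = img.astype(float).copy()
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = img[y - 1:y + 2, x - 1:x + 2].mean()
    return out  # output is still an image

def classify_brightness(img, thresh=0.5):
    """Computer vision (toy): image in, symbolic label out."""
    return "bright" if img.mean() > thresh else "dark"

img = np.full((8, 8), 0.8)
```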
---
📌 Must Remember
Recognition vs. Identification – Recognition = class label (e.g., “car”). Identification = specific instance (e.g., “my red Toyota”).
Detection – Scan whole image, return locations (bounding boxes) of objects of interest.
Pose Estimation – Find 3‑D position + orientation of an object relative to the camera.
Optical Flow – Apparent 2‑D motion field of each pixel between consecutive frames.
Egomotion – Rigid 3‑D motion of the camera itself (rotation + translation).
SLAM (Simultaneous Localization & Mapping) – Builds a metric map while estimating the robot/vehicle’s pose.
Segmentation – Partition image into meaningful regions (foreground/background, object parts).
Co‑segmentation – Jointly segment the same object across multiple images/videos.
Scale‑Space – Multi‑scale representation that reveals structures at appropriate spatial scales (e.g., Gaussian pyramid).
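The Gaussian pyramid, the simplest scale-space, can be sketched as repeated blur-then-subsample. The separable [1 2 1]/4 binomial kernel below is a common small-Gaussian approximation:

```python
import numpy as np

def binomial_blur(img):
    """Separable [1 2 1]/4 blur, a cheap approximation to a small Gaussian."""
    k = np.array([0.25, 0.5, 0.25])
    # pad with edge values so the output keeps the input shape
    p = np.pad(img, 1, mode="edge")
    rows = k[0] * p[:-2, 1:-1] + k[1] * p[1:-1, 1:-1] + k[2] * p[2:, 1:-1]
    p = np.pad(rows, 1, mode="edge")
    return k[0] * p[1:-1, :-2] + k[1] * p[1:-1, 1:-1] + k[2] * p[1:-1, 2:]

def gaussian_pyramid(img, levels=3):
    """Blur then subsample by 2 at each level: coarser levels keep coarser structure."""
    pyr = [img]
    for _ in range(levels - 1):
        img = binomial_blur(img)[::2, ::2]
        pyr.append(img)
    return pyr

pyr = gaussian_pyramid(np.random.default_rng(0).random((16, 16)), levels=3)
```

Each level halves the resolution, so structures smaller than the blur radius vanish while large-scale layout survives.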
---
🔄 Key Processes
Pre‑processing – Build a scale‑space, normalize illumination, denoise.
Feature Extraction – Detect edges, corners, blobs, or dense descriptors (SIFT, ORB).
Detection / Segmentation
Detection: slide‑window or region‑proposal → confidence scores → bounding boxes.
Segmentation: assign each pixel to a region (thresholding, graph‑cut, CNN‑based masks).
Higher‑Level Processing (application‑specific)
Recognition: classification of detected regions (CNN, SVM).
Parameter Estimation: pose, size, shape fitting (PnP, ICP).
Motion Analysis: tracking, optical flow, egomotion estimation.
Scene Reconstruction: triangulate points → point cloud → surface mesh.
Decision Making – Pass/fail, match/no‑match, alert generation.
Typical recognition pipeline:
$$\text{Image} \;\xrightarrow{\text{pre‑process}}\; \text{Clean image} \;\xrightarrow{\text{extract}}\; \text{Features} \;\xrightarrow{\text{detect/segment}}\; \text{Regions} \;\xrightarrow{\text{classify \& estimate pose}}\; \text{Labels/pose} \;\xrightarrow{\text{decide}}\; \text{Decision}$$
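A toy end-to-end version of the pipeline above, using a global threshold as the "detector" and a foreground-area check as the "decision" (the threshold and minimum area are arbitrary illustrative values):

```python
import numpy as np

def pipeline(img, thresh=0.5, min_area=4):
    """Toy pipeline: normalize -> segment -> bounding box -> pass/fail decision."""
    # pre-process: scale intensities to [0, 1]
    img = (img - img.min()) / (img.max() - img.min() + 1e-9)
    # detect/segment: foreground mask by global threshold
    mask = img > thresh
    if not mask.any():
        return None, "no object"
    # localize: tight bounding box around foreground pixels
    ys, xs = np.nonzero(mask)
    box = (ys.min(), xs.min(), ys.max(), xs.max())
    # decide: pass/fail based on foreground area
    decision = "pass" if mask.sum() >= min_area else "fail"
    return box, decision

img = np.zeros((10, 10)); img[3:6, 4:8] = 1.0
box, decision = pipeline(img)
```

Real systems replace each stage with something far stronger (CNN detectors, graph-cut masks), but the stage boundaries stay the same.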
---
🔍 Key Comparisons
Computer Vision vs. Image Processing
CV → analysis → symbolic output.
Image Processing → transformation → another image.
Computer Vision vs. Computer Graphics
Graphics: model → image.
CV: image → model.
Machine Vision vs. General Computer Vision
Machine Vision: real‑time, controlled lighting, actuator feedback.
CV: broader research scope, may tolerate offline processing.
Recognition vs. Detection vs. Identification
Detection: “where is an object?” (bounding box).
Recognition: “what is it?” (class label).
Identification: “which specific instance?” (ID).
Optical Flow vs. Egomotion
Optical flow: pixel‑wise apparent motion.
Egomotion: 3‑D rigid motion of the camera; derived from flow + depth.
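The brightness-constancy idea behind optical flow can be sketched as a single-patch Lucas-Kanade solve: least squares on Ix·u + Iy·v + It = 0, one (u, v) for the whole patch. The Gaussian blob and its 0.5 px shift are synthetic test data:

```python
import numpy as np

def lucas_kanade_patch(I1, I2):
    """Estimate one (u, v) for a whole patch from the brightness-constancy
    equations Ix*u + Iy*v + It = 0, solved in least squares."""
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic smooth blob shifted 0.5 px to the right between frames.
ys, xs = np.mgrid[0:32, 0:32]
blob = lambda cx: np.exp(-(((xs - cx) ** 2 + (ys - 16) ** 2) / 40.0))
u, v = lucas_kanade_patch(blob(16.0), blob(16.5))
```

Note what this gives you: pixel motion only. Turning (u, v) into camera egomotion or object speed needs depth or extra geometric constraints.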
---
⚠️ Common Misunderstandings
“CV = Deep Learning” – Classic geometry‑based methods (e.g., PnP, SLAM) remain essential, especially where data is scarce.
“Image processing yields decisions” – Pure image processing stops at an enhanced image; decisions require higher‑level interpretation.
“Optical flow gives absolute object speed” – It provides relative pixel motion; depth is needed for real‑world speed.
“Segmentation always produces perfect object masks” – Occlusions, similar textures, and lighting can cause leakage or missing parts.
“Medical imaging is a separate field” – It heavily relies on CV techniques (CNNs for disease detection, registration for multimodal scans).
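The "flow is not speed" point reduces to one pinhole-camera formula: a flow of Δx pixels at depth Z with focal length f (in pixels) corresponds to Z·Δx/f metres of lateral motion. All numbers below are illustrative:

```python
def metric_speed(flow_px, depth_m, focal_px, dt_s):
    """Pinhole-camera conversion of image-plane flow to lateral metric speed:
    flow_px pixels of displacement at depth depth_m corresponds to
    depth_m * flow_px / focal_px metres of lateral motion per frame."""
    return depth_m * flow_px / focal_px / dt_s

# 10 px of flow per frame at 30 fps, object 5 m away, 500 px focal length:
speed = metric_speed(flow_px=10, depth_m=5.0, focal_px=500, dt_s=1 / 30)
```

The same 10 px flow at 50 m depth would mean ten times the speed, which is exactly why flow alone cannot give absolute velocity.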
---
🧠 Mental Models / Intuition
“Seeing → Understanding → Acting” – Imagine a human looking at a scene: first low‑level edges appear, then parts are grouped, then objects are recognized, finally actions are decided.
Scale‑Space as “Zoom Levels” – Small Gaussian blur = fine details; large blur = coarse structures. Different tasks (edge detection vs. object detection) operate at different “zoom levels”.
Feature Hierarchy – Corners are interest points where two edges meet; clusters of corners form keypoints that survive across scale/rotation, serving as reliable anchors.
---
🚩 Exceptions & Edge Cases
Real‑time constraints – In machine vision, algorithmic complexity must be bounded; lightweight descriptors (ORB) may replace heavy CNNs.
Extreme lighting / motion blur – Standard edge detectors fail; need robust preprocessing (de‑blurring, HDR techniques).
Textureless regions – Optical flow becomes ambiguous; incorporate global priors or use feature tracking instead.
Occlusion in Pose Estimation – If keypoints are hidden, pose may be under‑determined; use model‑based fitting or multiple views.
---
📍 When to Use Which
| Situation | Preferred Method |
|-----------|------------------|
| Fast, controlled industrial inspection | Machine‑vision pipeline → simple filters + fixed‑pattern detectors |
| Variable illumination, complex objects | Deep‑learning based detection/segmentation (e.g., Mask R‑CNN) |
| Need exact 3‑D geometry from few images | Classical multi‑view stereo + bundle adjustment |
| Tracking many points in texture‑rich video | Sparse optical flow (Lucas‑Kanade) or dense flow if GPU available |
| Estimating camera motion (SLAM) in unknown environment | Visual‑odometry + loop‑closure (feature‑based SLAM) |
| Identifying a specific individual (face ID) | Face embedding + nearest‑neighbor search (recognition + identification) |
| Restoring heavily degraded images | Model‑based restoration (e.g., non‑local means, deep de‑blurring) |
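The face-ID row combines recognition-style embeddings with nearest-neighbour search over cosine similarity. The 4‑D vectors below are made-up stand-ins for real face-network outputs:

```python
import numpy as np

def identify(query, gallery):
    """Nearest-neighbour identification: return the gallery ID whose embedding
    has the highest cosine similarity to the query embedding."""
    def unit(v):
        return v / np.linalg.norm(v)
    q = unit(query)
    best_id, best_sim = None, -1.0
    for person_id, emb in gallery.items():
        sim = float(q @ unit(emb))
        if sim > best_sim:
            best_id, best_sim = person_id, sim
    return best_id, best_sim

# Hypothetical 4-D embeddings standing in for real face-network outputs.
gallery = {"alice": np.array([1.0, 0.1, 0.0, 0.0]),
           "bob":   np.array([0.0, 1.0, 0.2, 0.0])}
who, sim = identify(np.array([0.9, 0.15, 0.05, 0.0]), gallery)
```

This is identification (which instance?) layered on top of recognition (a face-shaped embedding), matching the distinction from the Must Remember section.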
---
👀 Patterns to Recognize
Edge → Corner → Keypoint – A chain that often indicates a robust feature for matching.
Repeating texture + uniform color – Signals potential failure of pure intensity‑based flow; consider gradient‑based or feature‑based tracking.
Sharp intensity gradient + high curvature – Likely object boundary → good seed for segmentation.
Temporal consistency of masks – When masks change slowly across frames, co‑segmentation can exploit this for better stability.
Large motion vectors + blurred edges – Indicates motion blur → image restoration needed before reliable feature extraction.
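The edge → corner pattern is exactly what the Harris response measures: two strong gradient directions in one window. A minimal NumPy sketch with 3×3 window sums (k = 0.05 is the usual empirical constant):

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris response R = det(M) - k*trace(M)^2 of the structure tensor M,
    accumulated over a 3x3 window. R > 0 at corners, R < 0 on edges, R ~ 0 in flats."""
    Ix = np.gradient(img, axis=1)
    Iy = np.gradient(img, axis=0)
    def window_sum(a):
        p = np.pad(a, 1)
        n, m = a.shape
        return sum(p[1 + dy:n + 1 + dy, 1 + dx:m + 1 + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    Sxx, Syy, Sxy = window_sum(Ix * Ix), window_sum(Iy * Iy), window_sum(Ix * Iy)
    return (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2

# Bright square on dark background: corners score higher than edge midpoints.
img = np.zeros((16, 16)); img[4:12, 4:12] = 1.0
R = harris_response(img)
```

An edge has one large eigenvalue of M (det ≈ 0, so R goes negative); a corner has two (det large, R positive), which is why corners survive as matching anchors.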
---
🗂️ Exam Traps
“All computer‑vision systems output a class label.” – Many output continuous values (pose, depth, flow) or binary decisions (pass/fail).
“Optical flow directly yields depth.” – Depth requires additional constraints (stereo baseline, known motion).
“Machine vision is just a hardware problem.” – Algorithmic design (real‑time detection, lighting normalization) is equally critical.
“Segmentation always precedes detection.” – In many pipelines, detection (region proposals) comes first, then segmentation refines the region.
“Deep learning eliminates the need for preprocessing.” – Pre‑processing (normalization, scale‑space) still improves robustness and training stability.
---