
    Computer Vision in Deep Learning: An Introductory Guide

    on December 6, 2021

    Computer vision is one of the most advanced and fascinating fields of data science, allowing computer technology to see as humans do and to represent data back to humans through visual imagery. While advances in computer vision have exploded just in the past couple of years, its roots and its first breakthroughs go back nearly 60 years. 

    What Is Computer Vision?

    Computer vision, sometimes referred to as CV, is a field that uses deep learning to enable computers to model the human visual system. With CV, computers can gain insight and understanding from digital images and video. Drawing on an understanding of biology, CV uses machine learning (ML) models to replicate the human visual system, creating autonomous systems that can perform some of the same tasks the human visual system performs, and in some cases improve on that performance. At the same time, advances in computer vision algorithms have been providing scientific insights into how the human visual system works. 

    The History of Computer Vision

    Computer vision has its roots in the work of Larry Roberts, who explored the possibility of extracting 3D geometric information from 2D images of blocks back in the 1960s. This was followed by low-level tasks like edge detection and segmentation. Another major breakthrough was made in the 1970s by David Marr, who applied low-level image-processing algorithms to 2D images to create a 2.5D sketch using binocular stereo, then applied high-level techniques to turn these into 3D model representations. Both represented bottom-up approaches, creating 3D models from 2D images.

    Vision and imaging technology has advanced considerably since those pioneering days. In many cases there is no need to create 3D object models, and a top-down approach to visual problems is more efficient. Purposive vision, for example — used in areas like vehicle navigation — doesn’t need a full 3D model of an object, merely the position and movement of objects in the vehicle’s vicinity. Computer vision is often integrated with other related fields, such as:

    • Image processing: Raw images need to be processed before they can be analyzed.
    • Photogrammetry: Cameras need to be calibrated before they can begin measuring distances between points.
    • Computer graphics: 3D models of images need to be rendered.

    Compared to human vision, computer vision tends to be brittle: an algorithm that works in some scenarios will fail in others. Algorithms have also struggled to capture all of the nuances the human visual system handles easily, particularly when it comes to recognizing human faces across different environments and lighting conditions, and while allowing for changes in features over the years. 

    However, data scientists continue to improve their models. In 2020, for example, the Chinese facial-recognition company Hanwang achieved 99.5% recognition accuracy for full-face images. When the population began wearing masks due to the coronavirus, that accuracy roughly halved. The company retrained its models, using vector analysis to predict what the people in its database of 1.2 billion would look like masked, and now reports 95% accuracy even when people are wearing masks, winter hats and scarves.

    How Is Deep Learning Used in Computer Vision?

    Many of the advances in CV over the past few years are largely due to the growing adoption of deep learning in CV modeling. Traditional approaches to CV rely on extracting features from images, such as edges and corners. This “feature engineering” demands human intervention during development, and the labor and trial and error grow as more and more features are needed. Deep learning removes that requirement, because deep learning models can learn to distinguish features autonomously.
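    To make the contrast concrete, here is a minimal sketch of hand-crafted feature extraction: a Sobel edge detector written in plain NumPy. The kernels are the standard Sobel filters, but the function name and the toy image are purely illustrative, not taken from any particular library.

```python
import numpy as np

def sobel_edges(image):
    """Detect edges in a 2D grayscale image with hand-crafted Sobel kernels.

    This is classic "feature engineering": the kernels below are fixed,
    human-designed filters, not learned from data.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = image.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = image[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)   # horizontal intensity change
            gy[i, j] = np.sum(patch * ky)   # vertical intensity change
    return np.hypot(gx, gy)  # gradient magnitude: large only at edges

# A toy 8x8 image: dark left half, bright right half -> one vertical edge.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
```

    The edge map responds only along the brightness boundary; a second algorithm (corner detection, blob detection and so on) would have to be designed by hand for every additional feature — exactly the labor deep learning sidesteps.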

    Deep learning models are provided with a dataset of images in which classes of objects are labeled where they appear (such as a bird, a child or a cloud). Inside the machine, neural networks discover patterns in the images and automatically determine which features identify each class. This requires less time from data engineers, but far more CPU and GPU resources. Deep learning models are also much more problem-specific: they cannot be transferred to problems that are not closely related to the training dataset.
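    A rough sketch of that idea, in plain NumPy rather than any deep learning framework: the tiny network below is trained on synthetic labeled “images” (vertical stripes vs. horizontal stripes). The dataset, architecture and hyperparameters are all invented for illustration, but the point stands — the feature-detecting weights start random and are learned from the labels, with no hand-designed detectors anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled dataset: 4x4 "images", flattened to 16 values.
# Class 0 = a bright vertical stripe, class 1 = a bright horizontal stripe.
def make_image(horizontal):
    img = np.zeros((4, 4))
    idx = rng.integers(0, 4)
    if horizontal:
        img[idx, :] = 1.0          # one bright row
    else:
        img[:, idx] = 1.0          # one bright column
    img += rng.normal(0.0, 0.05, size=(4, 4))  # mild sensor noise
    return img.ravel()

X = np.array([make_image(h) for h in [0] * 50 + [1] * 50])
y = np.array([0.0] * 50 + [1.0] * 50)

# One hidden layer; its weights start random and are *learned*.
W1 = rng.normal(0.0, 0.5, size=(16, 16))
W2 = rng.normal(0.0, 0.5, size=(16,))

def predict_proba(X):
    hidden = np.tanh(X @ W1)                      # learned feature detectors
    return 1.0 / (1.0 + np.exp(-(hidden @ W2)))   # probability of class 1

# Plain full-batch gradient descent on the cross-entropy loss.
for _ in range(2000):
    hidden = np.tanh(X @ W1)
    p = 1.0 / (1.0 + np.exp(-(hidden @ W2)))
    grad_out = (p - y) / len(y)
    grad_hidden = np.outer(grad_out, W2) * (1.0 - hidden ** 2)
    W2 -= 0.5 * (hidden.T @ grad_out)
    W1 -= 0.5 * (X.T @ grad_hidden)

accuracy = float(np.mean((predict_proba(X) > 0.5) == y))
```

    Note the trade-off described above: nothing here was engineered by hand, but the training loop burns compute, and the learned weights are useless for any task other than telling these two stripe classes apart.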

    Traditional CV modeling techniques are still used in many circumstances where specific class knowledge isn’t required, like image stitching or 3D mesh reconstruction. When image distinctions aren’t complex, or when images need to be sorted according to basic shape or color, traditional CV modeling is more efficient and less resource-intensive than deep learning.
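    Sorting by basic color, for instance, needs nothing more than a fixed rule on raw pixel statistics — no labels, no training, almost no compute. The sketch below is a hypothetical example in plain NumPy; the helper name and sample images are invented for illustration.

```python
import numpy as np

def dominant_channel(image):
    """Label a small RGB image "red", "green" or "blue" by mean channel
    intensity -- a fixed, hand-written rule with no training involved."""
    means = image.reshape(-1, 3).mean(axis=0)
    return ("red", "green", "blue")[int(np.argmax(means))]

# Synthetic 2x2 RGB images, each dominated by one channel.
reddish = np.array([[[200, 30, 30], [180, 40, 20]],
                    [[210, 20, 50], [190, 35, 25]]], dtype=float)
bluish = reddish[..., ::-1]          # swap channels: red <-> blue

labels = [dominant_channel(img) for img in (reddish, bluish)]
```

    A deep learning model could learn the same rule, but it would need a labeled dataset and far more resources to recover a distinction this simple.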

    5 Computer Vision Tools

    Before you can begin working on a CV application, you will need access to the right CV libraries. If you have already explored ML, you’re likely to find that there are resources already available to you. For those who are just embarking on their journey in ML, some of the most popular tools for computer vision today include:


    OpenCV

    An open-source library of ML and CV software, OpenCV includes over 2,500 algorithms, ranging from red-eye removal to face detection. OpenCV interfaces include C++, Python, Java and MATLAB.


    TensorFlow

    Another open-source tool, also compatible with multiple languages, TensorFlow is a popular ML platform that lets users develop computer vision models for a variety of common tasks. For those who find it too resource-intensive, TensorFlow Lite is a good alternative for working on smaller models.


    CUDA

    CUDA is a parallel computing platform and API developed by NVIDIA. It offers GPU-accelerated image, video and signal processing functions ideal for computer vision. It supports multiple languages and comes with over 5,000 primitives for image and signal processing.


    MATLAB

    MATLAB is a popular paid programming platform that includes a computer vision toolbox with multiple functions, algorithms and apps.


    YOLO

    Specifically designed for real-time object detection, YOLO is exceptionally fast and highly accurate. Because it is more specific in scope, YOLO doesn’t have as large a community as the other tools.

    Computer Vision in Action

    In just the past few years, society has been inundated with a growing number of computer vision applications. You need only look at your phone to see amazing examples, like the viral face filters that turn people into cartoon characters on Instagram, with real-time features that stay glued to your face in precisely the right places even as you turn your head and move. 

    Another amazing example of CV technology is FaceApp, which can transform images of human faces with filters that include aging and changing genders. Algorithms within the app also remove wrinkles and blemishes and even show users what they would look like with facial hair. 

    PoseNet, which can determine a person’s 3D posture from only a 2D image or video, also uses computer vision. By tracking key points of the human skeleton, it can render poses in a variety of other applications, including augmented-reality apps, games, animation and robot training, or support activity recognition in sports and surveillance-system analysis.

    Explore Computer Vision

    Whether you are developing CV models using traditional ML approaches and labeled datasets or using the latest deep learning algorithms based on neural networks, you’ll need to ensure you have the right tools and resources. Model-driven organizations rely on state-of-the-art enterprise MLOps platforms. 

    David Weedmark is a published author who has worked as a project manager, software developer and network security consultant.
