The renaissance of Artificial Intelligence (AI) is underway, and so is the race for data, processing power, intelligent algorithms and the brightest minds. Simply put, AI solutions are algorithms that process and analyze large datasets to identify patterns. These algorithms are often built by leading researchers in mathematics and computer science, and they require extensive IT infrastructure and processing power.

Data-rich tech giants such as Google, Amazon and Microsoft are well positioned to develop the infrastructure and the intelligent algorithms that will form the backbone of the AI renaissance. To expedite this renaissance, the tech giants and the open source community are democratizing AI by making their breakthroughs publicly available. The cloud providers are starting to offer comprehensive AI solutions as a service, while the open source community is continuously developing and publishing AI frameworks and models for public use.

The increasing availability of the infrastructure and the building blocks necessary for the development of AI applications has empowered a new generation of entrepreneurs and visionaries. This empowerment on a global scale has created the next technological frontier of our time, leading to widespread implementation of intelligent systems across industries. The regulatory community also plays an important role in preparing the ground for the adoption and the regulation of these technologies in the coming years.

In this post, we explore a number of AI cloud services in the field of computer vision, as well as some of the open source libraries that are publicly available.


Computer Vision in the Cloud

Computer vision is the discipline of AI concerned with applications that interpret images and video. These applications analyze, detect and extract meaningful insights from images and video segments, much as humans do. Computer vision algorithms are often built with multi-layer neural networks and deep learning frameworks that require extensive IT infrastructure and computing power. As a result, the major cloud providers have leveraged their existing computing infrastructure to build computer vision services that are accessible to the public via cloud APIs. These APIs substantially reduce the cost and time of deployment for application developers and create a new revenue stream for the cloud providers. The cloud providers offer a number of computer vision services, and all of them provide four core functions: insight analysis, text extraction, object localization, and brand and landmark detection.

Insight analysis extracts valuable information from an image, such as the objects present, the activities being performed, the color scheme and the overall mood of the image. To see insight analysis in action, upload an image into the demo of Google’s Vision API.

Text extraction pulls both handwritten and typed text from images. A good example of text extraction is in the field of autonomous vehicles, where cars read speed limits and other information from road signs and adjust their speed accordingly.
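As a rough illustration of the road sign example, the sketch below takes text that a text extraction API might have already returned and pulls a speed limit out of it. The function name, the regular expression and the sample strings are assumptions made for this post, not part of any provider's API, and a real system would need to handle many more sign formats.

```python
import re

# Assumed sign wording ("SPEED LIMIT <number>"); real signs vary by country.
SPEED_LIMIT_PATTERN = re.compile(r"SPEED\s+LIMIT\s+(\d{1,3})", re.IGNORECASE)


def parse_speed_limit(extracted_text):
    """Return the speed limit found in OCR output, or None if there is none."""
    match = SPEED_LIMIT_PATTERN.search(extracted_text)
    return int(match.group(1)) if match else None


# Hypothetical OCR output; extracted text often arrives split across lines.
print(parse_speed_limit("SPEED\nLIMIT\n55"))
print(parse_speed_limit("STOP"))
```

Returning None when nothing matches lets the caller decide what to do with signs that carry no speed limit.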

Object localization APIs detect specific objects, such as people, animals and cars, and return the coordinates of each object within the image.
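To make the idea of coordinates concrete, here is a minimal sketch that converts a detected object's corner points into a pixel bounding box. It assumes the API returns vertices normalized to the range 0 to 1, which some services do; the `PixelBox` type, the `to_pixel_box` helper and the sample detection are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PixelBox:
    left: int
    top: int
    right: int
    bottom: int


def to_pixel_box(normalized_vertices, image_width, image_height):
    """Convert normalized (x, y) vertices to a pixel bounding box."""
    # Clamp to [0, 1] in case a vertex falls slightly outside the image.
    xs = [min(max(x, 0.0), 1.0) for x, _ in normalized_vertices]
    ys = [min(max(y, 0.0), 1.0) for _, y in normalized_vertices]
    return PixelBox(
        left=round(min(xs) * image_width),
        top=round(min(ys) * image_height),
        right=round(max(xs) * image_width),
        bottom=round(max(ys) * image_height),
    )


# Hypothetical detection in a 640x480 image (values invented for illustration).
detection = {
    "name": "dog",
    "vertices": [(0.25, 0.5), (0.75, 0.5), (0.75, 1.0), (0.25, 1.0)],
}
box = to_pixel_box(detection["vertices"], image_width=640, image_height=480)
print(detection["name"], box)
```

A box in pixel units can then be used to crop the object or draw an outline on the original image.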

Brand and landmark detection refers to the detection of logos, landmarks and even celebrities within images.

Though the cloud providers have much in common in their AI service offerings, they differ in their application-specific solutions, confidence scores and model accuracy.
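Confidence scores are usually what an application acts on, so a common first step is to discard low-confidence results. The sketch below shows one way to do that. The `filter_by_confidence` helper, the threshold and the sample labels are assumptions for illustration, and it assumes scores on a 0 to 1 scale.

```python
def filter_by_confidence(labels, threshold):
    """Keep (name, score) pairs at or above the threshold, best first."""
    kept = [(name, score) for name, score in labels if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical labels for one image (names and scores invented for illustration).
labels = [("dog", 0.97), ("grass", 0.81), ("frisbee", 0.42), ("park", 0.66)]
confident = filter_by_confidence(labels, threshold=0.6)
print(confident)
```

Because providers may report scores on different scales (for example 0 to 1 versus 0 to 100), it is worth normalizing scores before comparing results across services, and a threshold tuned for one provider may not transfer to another.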

Google’s Vision API offers topical entity search. Leveraging Google’s image search algorithm, the API discovers topical entities relevant to the image, such as news events. Google claims to support millions of entities as part of its image search.

Amazon Rekognition provides APIs for the analysis of human facial features in photos. These features include moods, emotions and facial expressions, such as smiles and frowns.

Microsoft’s Vision API can analyze videos in near real time. It provides scene-specific insights continuously over the course of the video.

A growing number of organizations are leveraging intelligent cloud services to incorporate AI into their product offerings. Some examples of these emerging products are:

Marinus Analytics uses Amazon Rekognition to provide agencies and investigators with the tools to identify and locate the victims of human trafficking.

Prism Skylabs uses Microsoft’s Computer Vision API to help organizations search through their camera networks for particular events, items and people.

ZSL, an international animal conservation charity, uses Google’s Vision API to search, analyze and annotate millions of images captured by its camera traps in the wild. These efforts will help us understand how to conserve the world’s wildlife effectively.


Open Source Frameworks

The open source community has also made great contributions to the advancement of computer vision and AI. A number of open source libraries and frameworks have made computer vision and AI accessible to hobbyists and entrepreneurs alike. These libraries and frameworks greatly shorten the development cycle of new AI applications and empower smaller organizations to leverage the expertise of the larger open source community. Some of the more established open source libraries are OpenCV, SimpleCV, CCV, SOD and PoseNet. At Optima AI, we have evaluated PoseNet, a human posture detection library, in one of our case studies.

At Optima AI, we are continuously exploring opportunities where advancements in the field of artificial intelligence can be applied to real-world challenges. We welcome partnerships with experts who would like to push the boundaries of science and technology within their industry.