
AIScale Machine Learning Acceleration

Smallest and most efficient FPGA based Neural Network Engine IP Core

The scalable Solution for Low Cost Edge Machine Learning Inference for Embedded Vision


Your algorithms provide the RAZOR-SHARP view, and we help you recognize WHAT you see

 
 

We are committed to providing our clients and partners with universal, easy-to-use, efficient, scalable, flexible and lowest-power FPGA-based machine learning inference platforms. Our AIScale architecture, in combination with our DeepCompressor, serves clients in the fields of computer vision, robotics, speech recognition and surveillance systems, as well as data centers: neural network acceleration from edge devices to servers.

Kortiq's novel way of mapping calculations to hardware resources, combined with highly advanced compression methods that significantly reduce the required external memory transfer size and power, enables our clients in the above industries to move quickly from idea to product with an efficient and economical solution.

AIScale in a nutshell

Apples or Pears?

SOLVE THIS CLASSIFICATION CHALLENGE WITH KORTIQ AISCALE

1

The Challenge

Detecting and recognizing objects might be a simple task for a human, but for a high-end embedded vision system, solving this problem automatically and efficiently can be very challenging.

2

Capturing

Razor-sharp images delivered by high-end cameras are the solid basis to master the challenge. But traditional image processing algorithms and pattern recognition might not be up to solving that task. It becomes very complicated.

3

Classification

What makes the pear different from the apple? Training a Convolutional Neural Network (CNN) on many different pictures of apples and pears makes it much easier to classify the pear and can increase detection accuracy. TensorFlow and CPU clusters or GPUs help speed up the training phase.


Apples or Pears?

CONVOLUTIONAL NEURAL NETWORK CAN HELP TO MASTER THE CHALLENGE

[Diagram: INPUT IMAGE, CONVOLUTION LAYERS, POOLING LAYERS, FEATURE MAPS, FULLY-CONNECTED LAYERS, output = "PEAR"]
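To make the stages in the diagram concrete, here is a minimal pure-Python sketch of one pass from input image to label. It is only an illustration: the kernel, the fully-connected weights and the tiny 5x5 "image" are made-up values, not a trained network and not anything AIScale-specific.

```python
# Illustrative CNN forward pass: convolution -> ReLU -> pooling -> fully-connected.
# All weights below are hypothetical, hand-picked values (no training involved).

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation) of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Activation: clamp negative values to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

def fully_connected(features, weights, biases):
    """One score per output class: dot(weights_row, features) + bias."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, biases)]

def classify(image, kernel, fc_weights, fc_biases, labels):
    fmap = max_pool2x2(relu(conv2d(image, kernel)))   # feature map
    features = [v for row in fmap for v in row]       # flatten
    scores = fully_connected(features, fc_weights, fc_biases)
    return labels[scores.index(max(scores))]

# Hypothetical example data.
IMAGE = [[1] * 5 for _ in range(5)]            # 5x5 input image
KERNEL = [[1, 0], [0, 1]]                      # one 2x2 convolution kernel
FC_WEIGHTS = [[1, 0, 0, 0], [1, 1, 0, 0]]      # rows: "APPLE", "PEAR"
FC_BIASES = [0, 0]
LABELS = ["APPLE", "PEAR"]

print(classify(IMAGE, KERNEL, FC_WEIGHTS, FC_BIASES, LABELS))
```

A real network stacks many such layers with many kernels per layer, and its weights come from training rather than from hand-picked values.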


Integrate KORTIQ AIScale

With KORTIQ AIScale CNN Hardware Accelerator IP we can help you to recognize WHAT you see

Integration of AIScale CNN with ZYNQ SOC


Integrate KORTIQ

KORTIQ AIScale DeepCompressor will massively shrink your selected network

Trained Network

Compression of trained network with KORTIQ AIScale DeepCompressor

Compressed Network


Integrate KORTIQ

With KORTIQ AIScale we help you to recognize WHAT you see

Compressed Network

Translate to FPGA with TensorFlow2AIScale translator

= "PEAR"


Apples, Pears or even Persons?

MEET THE CHALLENGE AND SOLVE IT WITH KORTIQ


AIScale Advantages

APPLICATIONS

EMBEDDED VISION AND ROBOTICS IN INDUSTRIAL MARKETS

As a first step, we are focusing on embedded and computer vision and robotics in industrial markets (Industry 4.0, IoT), supporting new features such as Image Classification, Object Recognition, Object Tracking, Face Recognition and others that deep learning neural networks can bring to many manufacturing, automation control and robotics applications. For example, a cost-optimized Xilinx Zynq device and a pre-trained CNN running on our AIScale Neural Network Engine IP, all integrated in a high-quality Smart Camera, can help improve reliability and lead to higher quality and yield.

OUR CUSTOMERS

KNOW CAMERA SYSTEMS AND IMAGE PROCESSING OR SIMPLY LOOK FOR AN OPTIMIZED ENGINE FOR THEIR NEURAL NETWORKS

Our clients know how to build a machine vision or computer vision system. They work with image processing algorithms and video analytics, and they know how to create a high-end camera system by choosing the right software and components such as lenses, image sensors, housing, semiconductor components and more. Now they are looking for a technology enabler to add machine learning tasks: a partner who focuses on exactly this piece of CNN hardware IP and enables them to get started immediately with, for example, an Image Recognition feature by integrating one truly re-configurable, easy-to-use hardware block with a small footprint.

AISCALE CNN ACCELERATOR

SMART AND EASY TECHNOLOGY ENABLER

Designed by our team with 10+ years of experience in Machine Learning algorithms and FPGA design, our hard-wired, easy-to-use and very small AIScale CNN Accelerator supports all types of CNN, from state-of-the-art networks to the ones you design yourself. Simply initialize and run your pre-trained network with two functions. There is no need to generate different hardware architectures or to write special software. The AIScale CNN accelerator has a very small footprint, based on the coarse-grained, re-configurable computing principle, for cost-optimized, highly efficient, flexible and scalable FPGA-based solutions.

Comparison @AIScale V1.0 (May 2018)


CNNs: AlexNet, VGG-16, YOLO-Tiny and KortiqY3 (KY3)

KY3 CNN Total # of Parameters: 3,946,416

KY3 CNN Total # of Operations/Input Image: 428,603,392

AlexNet CNN Total # of Parameters: 60,963,848

AlexNet CNN Total # of Operations/Input Image: 725,508,992

VGG-16 CNN Total # of Parameters: 138,353,320

VGG-16 CNN Total # of Operations/Input Image: 15,476,385,792

YOLO-Tiny CNN Total # of Parameters: 15,855,536

YOLO-Tiny CNN Total # of Operations/Input Image: 3,491,231,744
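To put these counts in perspective, the short sketch below turns them into two derived figures: the memory needed to store the weights and the number of operations per parameter. The bit widths used (32 and 8 bits per weight) are illustrative assumptions only; this page does not state which numeric precision AIScale uses, and the figures do not account for any compression.

```python
# Parameter and operation counts as listed in the comparison above.
NETWORKS = {
    "KY3": (3_946_416, 428_603_392),
    "AlexNet": (60_963_848, 725_508_992),
    "VGG-16": (138_353_320, 15_476_385_792),
    "YOLO-Tiny": (15_855_536, 3_491_231_744),
}

def weight_megabytes(num_params, bits_per_weight):
    """Uncompressed weight storage in MB (1 MB = 10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6

def ops_per_param(num_params, num_ops):
    """How many operations each stored parameter takes part in per image."""
    return num_ops / num_params

for name, (params, ops) in NETWORKS.items():
    print(f"{name:10s} "
          f"{weight_megabytes(params, 32):8.1f} MB @ 32 bit  "   # assumed width
          f"{weight_megabytes(params, 8):8.1f} MB @ 8 bit  "     # assumed width
          f"{ops_per_param(params, ops):6.1f} ops/param")
```

Networks with a low operations-per-parameter ratio, such as AlexNet, move more weight data per unit of computation, which is where a smaller weight footprint matters most for external memory traffic.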


Smart and easy: a Two-Function-Interface is all you need

CNN.INIT

FIRST: INITIALIZE RECONFIGURABLE STRUCTURE

Use a dedicated interface (IF) function to initialize the network.

Your network can be any CNN, e.g. ResNet, AlexNet, Tiny YOLO, VGG16 …

AIScale will be configured based on pre-trained network models using TensorFlow, AIScale DeepCompressor and AIScale TF2AIScale Translator.

No need to generate different hardware architectures per CNN

No need for SW programming (C, C++, OpenCL)

No need to learn how to use specific libraries

No need to learn which functions to use with what parameters

CNN.RUN

SECOND: RUN THE NETWORK

Once configured and initialized, the AIScale accelerator will act as one of the following layer types, based on the chosen network structure:

3D CONVOLUTION

DEPTHWISE CONVOLUTION

POOLING

ADDING

FULLY-CONNECTED

CONCATENATION

Activation functions are executed as a post-processing step of each layer.
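The two-function idea can be sketched in software as follows. This is a hypothetical mock, not the AIScale API: the class name, method names and the layer description format are all assumptions made for illustration, and only two of the listed layer types are mocked to keep the sketch short. It shows one engine that is configured once, then takes on a different layer role at each step, with the activation applied as a post-processing step.

```python
class MockAccelerator:
    """Software stand-in for a two-function (init/run) interface. Hypothetical."""

    def __init__(self):
        self._layers = None
        # One engine, several roles: each layer type maps to a handler.
        self._handlers = {
            "fully_connected": self._fully_connected,
            "pooling": self._pooling,
        }

    def init(self, layers):
        """FIRST: load the network description (assumed format: list of dicts)."""
        for layer in layers:
            if layer["type"] not in self._handlers:
                raise ValueError(f"unsupported layer type: {layer['type']}")
        self._layers = list(layers)

    def run(self, data):
        """SECOND: push one input through every configured layer in order."""
        if self._layers is None:
            raise RuntimeError("call init() before run()")
        for layer in self._layers:
            data = self._handlers[layer["type"]](layer, data)
            if layer.get("activation") == "relu":      # post-processing step
                data = [max(0.0, v) for v in data]
        return data

    @staticmethod
    def _fully_connected(layer, data):
        return [sum(w * x for w, x in zip(row, data)) + b
                for row, b in zip(layer["weights"], layer["biases"])]

    @staticmethod
    def _pooling(layer, data):
        size = layer["size"]                            # 1D max pooling window
        return [max(data[i:i + size]) for i in range(0, len(data), size)]


# Hypothetical network description with made-up weights.
NETWORK = [
    {"type": "fully_connected", "activation": "relu",
     "weights": [[1, -1], [-1, 1], [2, 0], [0, 2]], "biases": [0, 0, 0, 0]},
    {"type": "pooling", "size": 2},
]

accelerator = MockAccelerator()
accelerator.init(NETWORK)
print(accelerator.run([3, 1]))
```

Switching to a different network in this sketch means calling `init` with another description; the dispatching code itself stays unchanged.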

Video - People Detection with Zynq 7020

VIDEO – Kortiq Small and Efficient CNN Accelerator: Powered by Xilinx

Kortiq provides an easy to use, scalable and small form factor CNN accelerator. The device supports all types of CNN and dynamically accelerates different layer types found in the network. The Xilinx Zynq family of SoCs and MPSoCs help Kortiq devices achieve targeted performance levels and flexibility, while being cost-effective.

All Programmable @AIScale V1.0 (May 2018)


The AIScale Compute Core (MAC)

AIScale CC (MAC)


The Re-configurable Compute Core (CC) is the heart of our AIScale accelerator and provides exceptional flexibility and scalability. Its small footprint is based on a coarse-grained, truly re-configurable computing principle and architecture.

AIScale CC supports and processes Convolutional, Pooling, Adding and Fully-Connected layers. Depending on your needs in size, frames per second or accuracy, the accelerator can be parameterized from very few CCs to several hundred CCs.

Take advantage of a hardwired, optimized network with the opportunity to switch between different CNN solutions, based on customer needs, using pre-trained network parameters. It can be structured for low latency and custom memory allocations.

AIScale Application Example

Colleague Classification @ 27fps with AIScale Hardware Accelerator IP using 32 Compute Cores @ 120 MHz with our KortiqY3 network.

This can be implemented, for example, in a cost-optimized Zynq device.
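As a rough back-of-the-envelope check, the example figures can be combined with the KortiqY3 operation count from the comparison section. The sketch assumes that the demo runs the KortiqY3 network exactly as counted there and that every counted operation is actually executed; if compression removes work, the real per-core rate would be lower than this "effective" value.

```python
# Figures quoted on this page.
FPS = 27                      # frames per second in the example
OPS_PER_IMAGE = 428_603_392   # KortiqY3 operations per input image
COMPUTE_CORES = 32
CLOCK_HZ = 120_000_000        # 120 MHz

def ops_per_second(fps, ops_per_image):
    """Total operations the network requires per second at this frame rate."""
    return fps * ops_per_image

def ops_per_core_per_cycle(fps, ops_per_image, cores, clock_hz):
    """Effective operations delivered by one Compute Core in one clock cycle."""
    return ops_per_second(fps, ops_per_image) / (cores * clock_hz)

total = ops_per_second(FPS, OPS_PER_IMAGE)
per_core = ops_per_core_per_cycle(FPS, OPS_PER_IMAGE, COMPUTE_CORES, CLOCK_HZ)
print(f"{total / 1e9:.2f} G operations per second")
print(f"{per_core:.2f} effective operations per Compute Core per cycle")
```

Under these assumptions the example works out to about 11.57 billion operations per second, or roughly 3 effective operations per Compute Core per clock cycle.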

AIScale Product Package

AIScale DeepCompressor

TensorFlow2AIScale Translator

AIScale CNN Hardware Accelerator IP

AIScaleCDP2 IP Core Preliminary Datasheet