AIScale Machine Learning Acceleration
Smallest and most efficient FPGA-based Neural Network Engine IP Core
The scalable Solution for Low-Cost Edge Machine Learning Inference for Embedded Vision
Your algorithms provide the RAZOR-SHARP view, and we help you recognize WHAT you see
We are committed to providing our clients and partners with universal, easy-to-use, efficient, scalable, flexible and lowest-power FPGA-based machine learning inference platforms. Our AIScale architecture, in combination with our DeepCompressor, serves clients in the fields of computer vision, robotics, speech recognition, surveillance systems and data centers: neural network acceleration from edge to server devices.
Kortiq's novel way of mapping calculations to hardware resources, combined with highly advanced compression methods that significantly reduce the size of external memory transfers and the power they require, enables our clients in these industries to move quickly from idea to product with an efficient and economical solution.
AIScale in a nutshell
Detecting and recognizing objects may be a simple task for a human, but solving the same problem efficiently with automatic detection and recognition in a high-end embedded vision system can be very challenging.
Razor-sharp images delivered by high-end cameras are a solid basis for mastering this challenge. But traditional image processing and pattern recognition algorithms may not be up to the task, and the solutions quickly become very complicated.
What makes a pear different from an apple? Training a Convolutional Neural Network (CNN) on many different pictures of apples and pears makes it much easier to classify the pear and can increase detection accuracy. TensorFlow running on CPU clusters or GPUs helps speed up the training phase.
EMBEDDED VISION AND ROBOTICS IN INDUSTRIAL MARKETS
As a first step, we are focusing on embedded and computer vision and robotics in industrial markets (Industry 4.0, IoT) to support new features such as Image Classification, Object Recognition, Object Tracking, Face Recognition and others that deep learning neural networks can bring to many manufacturing, automation control and robotics applications. For example, a cost-optimized Xilinx Zynq device running a pre-trained CNN on our AIScale Neural Network Engine IP, all integrated in a high-quality smart camera, can help improve reliability, quality and yield.
FOR CLIENTS WHO KNOW CAMERA SYSTEMS AND IMAGE PROCESSING OR SIMPLY LOOK FOR AN OPTIMIZED ENGINE FOR THEIR NEURAL NETWORKS
Our clients know how to build a machine or computer vision system. They are experts in image processing algorithms and video analytics, and they know how to create a high-end camera system by choosing the right software and components such as lenses, image sensors, housings, semiconductor components and more. Now they are looking for a technology enabler to add machine learning tasks: a partner that focuses on exactly this piece, the CNN hardware IP, so they can get started immediately with, for example, an image recognition feature by integrating truly reconfigurable, easy-to-use hardware with a small footprint.
AISCALE CNN ACCELERATOR
SMART AND EASY TECHNOLOGY ENABLER
Designed by our team with 10+ years of experience in machine learning algorithms and FPGA design, our hard-wired, easy-to-use and very small AIScale CNN Accelerator supports all types of CNNs, from state-of-the-art networks to the ones you design yourself. Simply initialize and run your pre-trained network with two functions. There is no need to generate different hardware architectures or write special software. The AIScale CNN Accelerator has a very small footprint based on a coarse-grained, reconfigurable computing principle, for cost-optimized, highly efficient, flexible and scalable FPGA-based solutions.
Comparison @AIScale V1.0 (May 2018)
CNNs compared: AlexNet, VGG-16, YOLO-Tiny and KortiqY3 (KY3)
KY3: 3,946,416 parameters; 428,603,392 operations per input image
AlexNet: 60,963,848 parameters; 725,508,992 operations per input image
VGG-16: 138,353,320 parameters; 15,476,385,792 operations per input image
YOLO-Tiny: 15,855,536 parameters; 3,491,231,744 operations per input image
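To put these figures into perspective, the short Python sketch below converts the parameter counts into raw weight storage and computes operations per parameter. The weight widths (32-bit floating point and 8-bit) are assumptions chosen for illustration only: this page does not state AIScale's number format or the compression achieved by DeepCompressor, so the results are uncompressed sizes, not measured AIScale figures.

```python
# Parameter and operation counts from the comparison (AIScale V1.0, May 2018).
NETWORKS = {
    "KY3":       {"params": 3_946_416,   "ops": 428_603_392},
    "AlexNet":   {"params": 60_963_848,  "ops": 725_508_992},
    "VGG-16":    {"params": 138_353_320, "ops": 15_476_385_792},
    "YOLO-Tiny": {"params": 15_855_536,  "ops": 3_491_231_744},
}


def weight_megabytes(params: int, bits_per_weight: int) -> float:
    """Raw, uncompressed weight storage in MB (10**6 bytes) for an ASSUMED weight width."""
    return params * bits_per_weight / 8 / 1e6


def summary(bits_per_weight: int = 32) -> dict:
    """Per network: raw weight size (MB) and operations per parameter."""
    rows = {}
    for name, n in NETWORKS.items():
        rows[name] = {
            "weights_MB": round(weight_megabytes(n["params"], bits_per_weight), 2),
            "ops_per_param": round(n["ops"] / n["params"], 1),
        }
    return rows


if __name__ == "__main__":
    for bits in (32, 8):  # assumed widths, not AIScale's documented format
        print(f"{bits}-bit weights:")
        for name, row in summary(bits).items():
            print(f"  {name:10s} {row['weights_MB']:8.2f} MB  {row['ops_per_param']:7.1f} ops/param")
```

Under these assumptions, KY3's raw 32-bit weights come to about 15.8 MB, compared with about 553 MB for VGG-16.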
Smart and easy: a two-function interface is all you need
FIRST: INITIALIZE RECONFIGURABLE STRUCTURE
Use a dedicated interface (IF) function to initialize the network.
Your network can be any CNN, e.g. ResNet, AlexNet, Tiny YOLO, VGG-16 …
AIScale is configured from pre-trained network models using TensorFlow, the AIScale DeepCompressor and the AIScale TF2AIScale Translator.
No need to generate different hardware architectures per CNN
No need for SW programming (C, C++, OpenCL)
No need to learn how to use specific libraries
No need to learn which functions to use with what parameters
SECOND: RUN THE NETWORK
Once configured and initialized, the AIScale accelerator will act as … based on the chosen network structure. Activation functions are executed as a post-processing step of each layer.
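The initialize-then-run pattern described above can be illustrated with a small host-side mock in Python. Everything in it is hypothetical: the class `ToyAccelerator`, its methods `initialize()` and `run()` and the `DenseLayer` type are illustrative names, not the AIScale interface functions (which this page does not name), and the mock only implements fully connected layers in software. It shows the two ideas from the text: the network is loaded once, and each layer's activation is applied as a post-processing step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


def relu(x: Vector) -> Vector:
    """Example activation function, applied after a layer's main computation."""
    return [max(0.0, v) for v in x]


@dataclass
class DenseLayer:
    """Hypothetical fully connected layer with pre-trained weights and an optional activation."""
    weights: List[Vector]  # one row of weights per output
    bias: Vector
    activation: Optional[Callable[[Vector], Vector]] = None


class ToyAccelerator:
    """Hypothetical software mock of a two-function (initialize/run) interface.

    Not the AIScale API: names and data layout are illustrative only.
    """

    def __init__(self) -> None:
        self._layers: List[DenseLayer] = []

    def initialize(self, layers: List[DenseLayer]) -> None:
        # Function 1: load the network structure and pre-trained parameters once.
        self._layers = list(layers)

    def run(self, x: Vector) -> Vector:
        # Function 2: process every layer in order for one input.
        if not self._layers:
            raise RuntimeError("initialize() must be called before run()")
        for layer in self._layers:
            x = [sum(w * v for w, v in zip(row, x)) + b
                 for row, b in zip(layer.weights, layer.bias)]
            # Activation as a post-processing step of this layer.
            if layer.activation is not None:
                x = layer.activation(x)
        return x


if __name__ == "__main__":
    acc = ToyAccelerator()
    acc.initialize([
        DenseLayer([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),
        DenseLayer([[1.0, 1.0]], [0.0]),
    ])
    print(acc.run([2.0, 1.0]))  # [1.5]
```

Separating initialization from execution means the loading of structure and weights happens once, while `run()` can then be called for every incoming frame.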
Video: People Detection with Zynq 7020
Kortiq provides an easy-to-use, scalable and small-form-factor CNN accelerator. The device supports all types of CNNs and dynamically accelerates the different layer types found in the network. The Xilinx Zynq family of SoCs and MPSoCs helps Kortiq devices achieve targeted performance levels and flexibility while remaining cost-effective.
All Programmable @AIScale V1.0 (May 2018)
The AIScale Compute Core (CC, MAC)
The reconfigurable Compute Core is the heart of our AIScale accelerator and provides exceptional flexibility and scalability. Its small footprint is based on a coarse-grained, truly reconfigurable computing principle and architecture.
The AIScale CC supports and processes convolutional, pooling, add and fully connected layers. Depending on your requirements for size, frame rate or accuracy, the accelerator can be parameterized from a few CCs up to several hundred CCs.
Take advantage of a hardwired, optimized network with the option to switch between different CNN solutions, according to customer needs, using pre-trained network parameters. It can be structured for low latency and custom memory allocation.
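For readers less familiar with the four layer types the Compute Core handles, the following Python sketch gives textbook reference definitions on tiny single-channel inputs. It says nothing about how the AIScale CC actually computes them, and interpreting "add" layers as element-wise addition (as in ResNet shortcut connections) is our assumption. Note that convolution and fully connected layers both reduce to sums of products, i.e. multiply-accumulate (MAC) operations.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(x: Matrix, k: Matrix) -> Matrix:
    """Single-channel 'valid' 2-D convolution (cross-correlation, as in most CNN frameworks)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]


def max_pool2x2(x: Matrix) -> Matrix:
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]) - 1, 2)]
            for i in range(0, len(x) - 1, 2)]


def add(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise addition of two equally shaped feature maps (assumed meaning of 'add')."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def fully_connected(v: List[float], w: Matrix, bias: List[float]) -> List[float]:
    """Dense layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + b for row, b in zip(w, bias)]


if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    c = conv2d(x, [[1, 0], [0, 1]])  # [[6, 8], [12, 14]]
    print(c, max_pool2x2(c), add(c, c))
    print(fully_connected([v for row in c for v in row], [[1, 1, 1, 1]], [0]))  # [40]
```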
AIScale Application Example
Colleague classification at 27 fps with the AIScale Hardware Accelerator IP, using 32 Compute Cores at 120 MHz and our KortiqY3 network.
This can be implemented, for example, in a cost-optimized Zynq device.
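The application example also allows a back-of-the-envelope throughput check using the KY3 operation count from the comparison list. The Python sketch below derives sustained operations per second and operations per Compute Core per clock cycle. Two assumptions apply: operations are counted the same way as in the comparison list (the page does not say whether a multiply-accumulate counts as one operation or two), and the scaling helper assumes ideal linear scaling with core count and clock, which the page does not claim.

```python
# Figures quoted in the application example and the comparison list.
OPS_PER_IMAGE_KY3 = 428_603_392  # KY3 operations per input image
FPS = 27                          # reported frame rate
COMPUTE_CORES = 32                # number of AIScale Compute Cores (CC)
CLOCK_HZ = 120e6                  # 120 MHz


def effective_ops_per_second(ops_per_image: float, fps: float) -> float:
    """Sustained operations per second implied by the frame rate."""
    return ops_per_image * fps


def ops_per_core_per_cycle(ops_per_image: float, fps: float,
                           cores: int, clock_hz: float) -> float:
    """Average number of operations each CC completes per clock cycle."""
    return effective_ops_per_second(ops_per_image, fps) / (cores * clock_hz)


def estimated_fps(ops_per_image: float, cores: int, clock_hz: float,
                  ops_per_core_cycle: float) -> float:
    """Frame rate under an ASSUMED ideal linear scaling with core count and clock."""
    return cores * clock_hz * ops_per_core_cycle / ops_per_image


if __name__ == "__main__":
    eff = ops_per_core_per_cycle(OPS_PER_IMAGE_KY3, FPS, COMPUTE_CORES, CLOCK_HZ)
    print(f"{effective_ops_per_second(OPS_PER_IMAGE_KY3, FPS) / 1e9:.2f} GOPS sustained")
    print(f"{eff:.2f} ops per CC per cycle")
    print(f"64 CC (ideal scaling): {estimated_fps(OPS_PER_IMAGE_KY3, 64, CLOCK_HZ, eff):.1f} fps")
```

The figures work out to roughly 11.57 GOPS sustained and about 3 operations per CC per cycle. The 64-CC estimate of 54 fps is only an idealized upper bound, since real scaling may be limited by factors such as external memory bandwidth.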
AIScale Product Package
AIScale CNN Hardware Accelerator IP
AIScaleCDP2 IP Core Preliminary Datasheet
85659 Forstern, Germany
Phone: +49 8124 91890 03
Fax: +49 8124 91890 55
Managing Directors: Ullrich Nake, Harald Weiss
Commercial Register B München: HRB 226267