OUR TECHNOLOGY

Targeted Industry Sectors

Our Spatial Processing Unit (SPU) is built to supercharge any vision-related problem

Commercial Robots

Localizing real-time computing on the robot itself provides more usable computing power, improves security, and adds a layer of cost savings.

Commercial Drones

With a faster, more efficient processor, a drone can dramatically extend its flight time and battery life.

Surveillance & Tracking

Localized, on-device processing greatly reduces cloud costs and gets results to the user faster and more efficiently.

The Spatial Processing Unit

The world's first processor designed from the ground up to accelerate the entire robotics pipeline end-to-end. The SPU delivers jaw-dropping performance out of the box, without sacrificing precision, number of cameras, camera resolution, frame rate, or workload. When building with the SPU, engineers no longer need to compromise.

Diagram: The Lemurian Labs Spatial Processing Unit

1. Sensor Cluster

The sensor cluster processes sensor streams on-chip, dynamically configuring streams as required.

2. Deep Learning Cluster

The deep learning cluster is a matrix math engine capable of running any kind of deep neural network model.

3. Decision Making Cluster

The decision making cluster is a parallel dataflow engine for classical and reinforcement learning, making it ideal for path planning and navigation.

4. Actuation Cluster

The actuation cluster drives actuators directly from the chip for lower latency and is tailored for adaptive control and safety.
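Taken together, the four clusters cover the flow from raw sensor data all the way to actuator commands. As a purely illustrative sketch (not the SPU's actual API; every function name and data shape below is a hypothetical stand-in for the stages named above), the end-to-end flow looks roughly like this:

```python
import numpy as np

# Purely illustrative sketch of the end-to-end flow the four clusters cover.
# None of these functions are a real SPU API; they are hypothetical stand-ins
# for the stages named above, running on dummy data.

def sensor_cluster(raw_frames):
    """Stage 1: ingest and normalize incoming sensor streams (e.g. camera frames)."""
    return [frame.astype(np.float32) / 255.0 for frame in raw_frames]

def deep_learning_cluster(frames, weights):
    """Stage 2: a neural network forward pass (here just one dense layer of matrix math)."""
    x = np.stack([f.ravel() for f in frames])
    return x @ weights

def decision_making_cluster(features, goal):
    """Stage 3: stand-in for path planning / navigation built on the extracted features."""
    confidence = float(features.mean())
    return goal if confidence > 0.0 else np.zeros_like(goal)

def actuation_cluster(target):
    """Stage 4: turn the chosen target into bounded low-level actuator commands."""
    return np.clip(target, -1.0, 1.0)

# One pass through the pipeline on random stand-in data.
frames = [np.random.randint(0, 256, (64, 64), dtype=np.uint8) for _ in range(4)]
weights = np.random.randn(64 * 64, 16).astype(np.float32) * 0.01
commands = actuation_cluster(
    decision_making_cluster(
        deep_learning_cluster(sensor_cluster(frames), weights),
        goal=np.array([0.2, 0.0, -0.1]),
    )
)
print(commands)
```

In the sketch these stages are just Python functions calling each other; on the SPU each stage maps onto its own cluster, so the hand-offs between stages can stay on-chip.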

TOPS/W Is a Misleading Metric

Many companies talk about TOPS/W, the theoretical maximum number of operations a processor can perform per watt, but this figure doesn't give software engineers much to go on. What matters is how much of that available performance can actually be reached from software, and how productive engineers are when using the processor.

We have taken a software-first approach to architecting our processor, optimizing it for insanely high inferences/second/watt without imposing restrictions on the kinds of workloads engineers want to run. This way engineers don't have to waste precious time on performance engineering and can instead focus on the more important task of improving their models.
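As a rough back-of-the-envelope illustration of the difference between the two metrics (the figures below are invented for the example, not measurements of any real processor), a chip with a bigger peak TOPS number can still deliver fewer inferences per second per watt if software can only reach a small fraction of that peak:

```python
# Back-of-the-envelope comparison: all figures are invented for illustration only.
# Peak TOPS/W says nothing about how much of that peak a real model can reach.

def inferences_per_second_per_watt(peak_tops, utilization, ops_per_inference_g, watts):
    """Delivered inferences/s/W given peak throughput, achieved utilization,
    and the cost of one inference in giga-operations."""
    delivered_tops = peak_tops * utilization          # what software actually gets
    inferences_per_s = delivered_tops * 1e12 / (ops_per_inference_g * 1e9)
    return inferences_per_s / watts

# Hypothetical chip A: big peak number, low real-world utilization.
chip_a = inferences_per_second_per_watt(peak_tops=100, utilization=0.10,
                                         ops_per_inference_g=10, watts=20)
# Hypothetical chip B: smaller peak, but most of it reachable from software.
chip_b = inferences_per_second_per_watt(peak_tops=40, utilization=0.60,
                                         ops_per_inference_g=10, watts=20)

print(f"chip A: {chip_a:,.0f} inferences/s/W")   # 50 despite 100 peak TOPS
print(f"chip B: {chip_b:,.0f} inferences/s/W")   # 120 despite only 40 peak TOPS
```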

We Custom Built New Math For AI

Deep learning and reinforcement learning are dominated by matrix math and are therefore very computationally intensive. Most hardware performance gains have come from optimizing the hardware to run only very specific workloads and from reducing the precision of the number format (INT8, INT4, analog, etc.). But as the industry is now learning, this is the wrong approach.

Safe, full autonomy requires floating point-like precision to properly cover the distribution of weights in neural networks. However, >16-bit floating-point formats are over-precise and oversized, not to mention costly in silicon. At the same time, we have found that quantizing to INT8 hurts neural network stability, leading to misclassifications in the real world.
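As a hedged, generic illustration of that precision argument (plain NumPy on synthetic weights, not a measurement of any particular network), the round-trip error of linear INT8 quantization on a typical bell-shaped weight distribution is far larger than FP16's:

```python
import numpy as np

# Illustrative only: compare round-trip error of INT8 quantization vs FP16
# on a synthetic, roughly Gaussian weight distribution.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=100_000).astype(np.float32)

# Symmetric linear INT8 quantization: one scale for the whole tensor.
scale = np.abs(weights).max() / 127.0
int8_roundtrip = np.clip(np.round(weights / scale), -127, 127) * scale

# FP16 keeps a floating exponent, so small weights retain relative precision.
fp16_roundtrip = weights.astype(np.float16).astype(np.float32)

print("INT8 mean abs error:", np.abs(weights - int8_roundtrip).mean())
print("FP16 mean abs error:", np.abs(weights - fp16_roundtrip).mean())
# On data like this the INT8 error comes out well over an order of magnitude
# larger, which is the kind of rounding that can flip borderline classifications.
```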

The number format is a centrally important parameter in processor design: it affects overall performance and hardware complexity, with direct impacts on storage requirements, processing throughput, and power dissipation. We set out to create a number format that would best meet the needs of AI while delivering performance and efficiency gains.

We call it PAL8 (parallel adaptive logs). This new format frees us from the constraints everyone else is forced to work within, and it lets us design the right processor to enable the era of autonomous things.
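The details of PAL8 aren't spelled out here, but the general appeal of log-domain formats can be sketched generically: store each value as a sign plus a quantized base-2 logarithm, and the multiplications that dominate matrix math collapse into cheap additions. The following is a minimal, generic logarithmic-number-system illustration, not PAL8 itself:

```python
import numpy as np

# Generic logarithmic-number-system (LNS) sketch -- not PAL8, whose details are
# proprietary. A value is stored as (sign, log2|x| quantized to a fixed grid);
# multiplying two values then only requires adding their logs.
# Zero and other special values are ignored for brevity.

LOG_STEP = 1 / 16  # hypothetical quantization step for the log (fractional bits)

def encode(x):
    sign = np.sign(x)
    log = np.round(np.log2(np.abs(x)) / LOG_STEP) * LOG_STEP
    return sign, log

def decode(sign, log):
    return sign * np.exp2(log)

def lns_multiply(a, b):
    """Multiplication in the log domain is just an add (plus a sign flip)."""
    sa, la = encode(a)
    sb, lb = encode(b)
    return decode(sa * sb, la + lb)

x, y = 0.375, -1.7
print(lns_multiply(x, y), "vs exact", x * y)  # close, within the log-grid error
```

In hardware, that add replaces a full multiplier array, which is where log-domain formats get their area and power advantage for matrix-heavy workloads.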

The Goodness of PAL8

6.2x less area than INT8

3x faster than INT8

42% of the power consumption of INT8

100% of the precision of FP16