Deep learning was enabled by hardware, and its progress is now limited by software

Model and dataset growth is increasingly outpacing hardware progress, making the job of the AI developer more complicated than a multidimensional game of Tetris. And even when you do find that elusive best fit, the cost and power consumption are simply unsustainable.

At Lemurian, we work in service of YOU, the AI Developer. You already have to think about dataset construction, neural network architecture, how to set up training runs, and how to serve models. These are all complicated tasks. You shouldn't also have to deal with the brittleness of existing software stacks that break whenever you push their boundaries.

We have a visceral understanding of this problem because we have walked miles in the shoes of the AI developer. We know that the problem is multidimensional, and that one or two improvements will not be enough. We have to reimagine computing through the lens of the AI developer. That means a truly unified platform that shields you from hardware complexity and creates a step-function improvement in performance, efficiency, and developer productivity.

Our tech means you never need to pore over reams of hardware documentation again

01

Our Full Stack Solution



Lemurian Labs delivers a full software stack that ingests PyTorch models and runs them on any hardware at any scale, letting you move seamlessly from training to inference, with execution designed to improve model performance and developer productivity. Unlike other stacks that must be custom built by stitching together 4-10 different tools through their APIs, this is the only stack developers need to target and optimize their models and workloads. Our software also manages execution at runtime, making it the only truly unified stack.

We have taken a first-principles approach to building, from the ground up, the software stack AI developers need, so they never have to deal with brittle APIs again and can get their models running performantly on any hardware, anywhere.
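To make the idea concrete, here is a minimal sketch of what a compile-once, run-anywhere flow could look like from the developer's side. Only the PyTorch parts are real; the `compile_for_target` helper and the target names are hypothetical placeholders standing in for a unified stack, not the actual Lemurian API.

```python
import torch
import torch.nn as nn
import torch.fx as fx


class TinyNet(nn.Module):
    """Ordinary PyTorch model -- nothing hardware-specific in it."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

    def forward(self, x):
        return self.net(x)


def compile_for_target(model: nn.Module, target: str) -> fx.GraphModule:
    """Hypothetical stand-in for a unified compile step.

    Here we only capture the computation graph with torch.fx; a real stack
    would lower this graph to optimized kernels for the named target.
    """
    graph_module = fx.symbolic_trace(model)
    print(f"captured {len(list(graph_module.graph.nodes))} graph nodes for target={target!r}")
    return graph_module


model = TinyNet()

# Same model definition, one compile call per deployment target.
gpu_program = compile_for_target(model, target="gpu")
npu_program = compile_for_target(model, target="npu")

# Inference goes through the compiled artifact, not the raw model.
batch = torch.randn(32, 128)
logits = gpu_program(batch)
```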


Ingest PyTorch models and execute on any hardware

We created a compiler so developers never have to redo the tedious work of optimizing the same model for different hardware.

02

Hardware-Aware Performant Portability

Our compiler is hardware agnostic, supporting all types of compute (GPUs, CPUs, NPUs, and more). It performs "hardware-aware compilation," surfacing hardware details at the top of the stack to make more intelligent task-mapping decisions. We decompose the model into task graphs and decompose the target hardware into basic compute blocks based on their individual ISAs. Mapping task graphs to ISAs is tedious and slow when done manually, but it is a perfect use case for automation through our software.
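A rough sketch of the shape of that problem, assuming torch.fx as a stand-in for graph extraction: the `COMPUTE_BLOCKS` table and the greedy first-fit mapping below are illustrative assumptions, not Lemurian's actual hardware description or mapper.

```python
import torch
import torch.nn as nn
import torch.fx as fx

# Illustrative only: a toy "hardware description" listing which op kinds each
# compute block supports. A real description would be derived from each block's ISA.
COMPUTE_BLOCKS = {
    "matrix_engine": {"linear", "conv2d"},
    "vector_unit":   {"relu", "add", "mul"},
    "scalar_core":   set(),  # fallback: can run anything, slowly
}


def map_task_graph(model: nn.Module) -> dict[str, str]:
    """Trace the model into a task graph and map each task to a compute block.

    The greedy first-fit assignment here is a stand-in for the automated
    mapping described above; it only shows the shape of the problem.
    """
    graph_module = fx.symbolic_trace(model)
    modules = dict(graph_module.named_modules())
    assignment = {}
    for node in graph_module.graph.nodes:
        if node.op != "call_module":
            continue
        op_kind = type(modules[node.target]).__name__.lower()  # e.g. "linear", "relu"
        block = next(
            (name for name, ops in COMPUTE_BLOCKS.items() if op_kind in ops),
            "scalar_core",
        )
        assignment[node.name] = block
    return assignment


model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
print(map_task_graph(model))
# e.g. {'_0': 'matrix_engine', '_1': 'vector_unit', '_2': 'matrix_engine'}
```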

Getting a model to meet performance and cost targets requires a specialized skillset, from a deep understanding of the model architecture to knowledge of the target hardware's memory structure and ISA. If a different hardware platform is targeted, that entire workflow starts over, beginning with the optimization engineer learning another architecture and ISA.

Our stack delivers model portability that can take any previously optimized model and retarget it to new or different hardware with no human intervention, delivering the same or better performance. No tool or stack of tools today can do this.


Port optimized models to any hardware and maintain performance

We solved a 250-year-old math problem

03

New Data Type for More Efficient and Reliable AI

We solved a 250-year-old math problem to create a breakthrough in computing efficiency. Our logarithmic data type not only represents numbers better than floating point, it enables a dramatic increase in compute efficiency. The smaller representation can also serve as a compression technique, significantly improving memory efficiency while retaining accuracy and precision, which makes memory-constrained hardware architectures more effective.
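The intuition behind logarithmic data types can be shown with a few lines of plain math. The sketch below illustrates a generic log number system, where a value is stored as a sign and a log magnitude, so multiplication becomes a cheap addition; it is not Lemurian's PAL format, whose encoding is not specified here.

```python
import math

# Generic log-number-system (LNS) sketch: store (sign, log2|x|) instead of x.
# Illustrates the general idea behind logarithmic data types only.

def lns_encode(x: float) -> tuple[int, float]:
    """Encode a nonzero value as (sign, log2 of its magnitude)."""
    return (1 if x >= 0 else -1, math.log2(abs(x)))


def lns_decode(sign: int, log_mag: float) -> float:
    return sign * 2.0 ** log_mag


def lns_multiply(a, b):
    """Multiplication in the log domain is just an add of the log magnitudes."""
    (sa, la), (sb, lb) = a, b
    return (sa * sb, la + lb)


x, y = 3.5, -0.125
prod = lns_decode(*lns_multiply(lns_encode(x), lns_encode(y)))
print(prod, x * y)  # both -0.4375 (up to rounding): a multiply replaced by an add
```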

This boost gives us the potential to break free from legacy approaches to parallel computing. By revolutionizing the way you compute, we empower you to achieve more with less.

8-bit number line spectra for floating point (FP), PAL, INT, and traditional log number systems (LNS)
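For reference, a small sketch of how two of these 8-bit number lines can be generated: INT8 values are uniformly spaced, while a toy log number system with a signed 7-bit fixed-point exponent is geometrically spaced, trading absolute precision for dynamic range. The bit layout below is an assumption chosen for illustration; the FP8 and PAL lines are omitted, since PAL's encoding is not given here.

```python
import numpy as np

# INT8: 256 evenly spaced integer values.
int8_values = np.arange(-128, 128, dtype=np.float64)

# Toy 8-bit LNS: 1 sign bit + 7-bit exponent with 3 fractional bits,
# i.e. log2 magnitudes from -8.0 to +7.875 in steps of 0.125 (assumed layout).
sign = np.array([1.0, -1.0])
exponents = (np.arange(128) - 64) / 8.0
lns8_values = (sign[:, None] * 2.0 ** exponents[None, :]).ravel()

print("INT8 range:", int8_values.min(), "to", int8_values.max(),
      "| smallest step:", 1.0)
print("LNS8 range:", lns8_values.min(), "to", lns8_values.max(),
      "| smallest magnitude:", 2.0 ** exponents.min())
```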
