ZSE — Zero-dependency Server Engine for LLM Inference
A production LLM inference engine that owns the full stack — no PyTorch, no Triton, no transformers. Models load in seconds, not minutes, and serve in a fraction of the memory other engines need.
From Nagercoil, we build low-level systems for AI — GPU kernel compilers and inference engines that run anywhere a GPU runs. Open source, reproducible, and pushed to production.
Write a GPU kernel once as a plain Python function. locomp compiles it through an SSA intermediate representation into native Metal, CUDA, HIP or RISC-V vector code — one kernel runs everywhere, no rewrites.
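To make the "write once, compile everywhere" idea concrete, here is a toy sketch of that pipeline shape: one Python function is parsed, lowered into a small SSA-style instruction list, and emitted as CUDA and Metal source text. This is not locomp's API or implementation. The names `KERNEL_SRC`, `lower_to_ssa`, `emit` and the `mul_add` kernel are hypothetical, and the sketch assumes an elementwise kernel whose body is a single arithmetic `return` over float buffers.

```python
import ast

# Hypothetical elementwise kernel, given as source text. Not locomp's real API.
KERNEL_SRC = """
def mul_add(a, x, y):
    return a * x + y
"""

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def lower_to_ssa(src):
    """Lower a single-expression function to (name, params, instrs, result)."""
    fn = ast.parse(src).body[0]
    params = [arg.arg for arg in fn.args.args]
    instrs = []  # (dest, op, lhs, rhs); each dest is assigned exactly once

    def visit(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return f"{float(node.value)!r}f"  # float literal, e.g. 2.0f
        if isinstance(node, ast.BinOp):
            lhs, rhs = visit(node.left), visit(node.right)
            dest = f"v{len(instrs)}"  # fresh temporary name
            instrs.append((dest, OPS[type(node.op)], lhs, rhs))
            return dest
        raise NotImplementedError(type(node).__name__)

    result = visit(fn.body[0].value)  # expression of the `return` statement
    return fn.name, params, instrs, result


def emit(name, params, instrs, result, backend):
    """Emit kernel source text for one backend from the shared SSA list."""
    # Parameters are buffers read at index i; temporaries are plain scalars.
    ref = lambda v: f"{v}[i]" if v in params else v
    body = [f"    float {d} = {ref(l)} {op} {ref(r)};" for d, op, l, r in instrs]
    body.append(f"    out[i] = {ref(result)};")
    if backend == "cuda":
        args = ", ".join(f"const float* {p}" for p in params)
        head = [
            f'extern "C" __global__ void {name}({args}, float* out, int n) {{',
            "    int i = blockIdx.x * blockDim.x + threadIdx.x;",
            "    if (i >= n) return;",
        ]
    elif backend == "metal":
        args = ", ".join(
            f"device const float* {p} [[buffer({k})]]" for k, p in enumerate(params)
        )
        k = len(params)
        head = [
            f"kernel void {name}({args}, device float* out [[buffer({k})]], "
            f"constant uint& n [[buffer({k + 1})]], "
            "uint i [[thread_position_in_grid]]) {",
            "    if (i >= n) return;",
        ]
    else:
        raise ValueError(f"unknown backend: {backend}")
    return "\n".join(head + body + ["}"])


ir = lower_to_ssa(KERNEL_SRC)
for backend in ("cuda", "metal"):
    print(emit(*ir, backend))
    print()
```

Both backends are generated from the same instruction list, so only the kernel signature and thread-index preamble differ per target. A real compiler would also handle types, control flow, memory spaces and optimization passes, none of which this sketch attempts.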
We rebuild from first principles — kernel compilers, inference engines, formats — instead of stacking dependencies. Fewer layers, more control.
Every claim is reproducible on real hardware. We publish the benchmarks, the scripts and the honest limitations alongside the wins.
Our research ships as Apache-2.0 software you can read, run and build on — from a laptop GPU to a data-center cluster.
We work with researchers, hardware vendors and open-source backers pushing efficient AI infrastructure forward.