Research

Research that ships as software.

From Nagercoil, we build low-level systems for AI — GPU kernel compilers and inference engines that run anywhere a GPU runs. Open source, reproducible, and pushed to production.

Systems · Inference · 2026
v2.0 · Open source

ZSE: Zero-dependency Server Engine for LLM Inference

A production LLM inference engine that owns the full stack — no PyTorch, no Triton, no transformers. Models load in seconds, not minutes, and serve in a fraction of the memory other engines need.

LLM serving · INT4 quantization · CUDA · ROCm · Metal · RAG
30.2×: Faster cold start vs vLLM (T4)
5.79 GB: VRAM for Qwen2.5-7B on a T4
~5 MB: Pip install · zero ML deps
40.6k: Lines of code · 0 dependencies
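
To put the VRAM figure in context, here is a back-of-envelope sketch of weight memory at different precisions. It assumes roughly 7.6 billion parameters for a 7B-class model and counts weights only. Quantization scales, KV cache and activations add to the total, and ZSE's actual memory breakdown is not stated here.

```python
def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: approximate parameter count for a 7B-class model.
PARAMS = 7.6e9

fp16 = weight_gb(PARAMS, 16)  # 2 bytes per weight
int4 = weight_gb(PARAMS, 4)   # half a byte per weight

print(f"FP16 weights: {fp16:.1f} GB")
print(f"INT4 weights: {int4:.1f} GB")
```

Under these assumptions, FP16 weights alone nearly fill a 16 GB T4, while INT4 weights take about a quarter of that, which leaves room for everything else the engine needs at serving time.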
Compilers · GPU · 2026
v1.0 · Open source

locomp: A Python GPU Kernel Compiler for Apple Silicon, NVIDIA, AMD & RISC-V

Write a GPU kernel once as a plain Python function. locomp compiles it through an SSA intermediate representation into native Metal, CUDA, HIP or RISC-V vector code — one kernel runs everywhere, no rewrites.

Compiler · SSA IR · Metal · CUDA · ROCm · RISC-V RVV
4: Hardware backends from one source
85×: Faster GELU vs PyTorch (A100)
2.6×: Faster GELU vs MLX (Apple M1)
227+: Tests across M1, A100, RISC-V
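
For readers new to the idea of compiling a plain Python function through an SSA intermediate representation, here is a toy sketch of the first step. It is not locomp's API or its IR: `lower_to_ssa` and the opcode names are hypothetical, and it handles only a function whose body ends in a single arithmetic `return`. It shows the general shape of the technique, where every intermediate result gets a fresh name that is assigned exactly once.

```python
import ast

# Hypothetical opcode names for this sketch; not locomp's actual IR.
_BINOPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div"}


def lower_to_ssa(source: str) -> list[str]:
    """Lower a Python function ending in `return <expr>` to SSA-style instructions."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a function definition")
    ret = func.body[-1]
    if not isinstance(ret, ast.Return) or ret.value is None:
        raise ValueError("expected the function to end in `return <expr>`")

    instructions: list[str] = []

    def emit(node: ast.expr) -> str:
        # Arguments and constants are already values, so no instruction is needed.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            lhs, rhs = emit(node.left), emit(node.right)
            # Each result gets a fresh name that is assigned exactly once.
            dest = f"%{len(instructions)}"
            instructions.append(f"{dest} = {_BINOPS[type(node.op)]} {lhs}, {rhs}")
            return dest
        raise NotImplementedError(ast.dump(node))

    instructions.append(f"ret {emit(ret.value)}")
    return instructions


kernel_source = "def scale_add(x, y):\n    return x * y + 1.0"
for line in lower_to_ssa(kernel_source):
    print(line)
# %0 = mul x, y
# %1 = add %0, 1.0
# ret %1
```

A real compiler would go on to type the values, map the function over GPU threads and emit backend code for each target. The sketch stops at the IR, since that is the part a single-source design shares across backends.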
How we work

Our research principles

01

Own the full stack

We rebuild from first principles — kernel compilers, inference engines, formats — instead of stacking dependencies. Fewer layers, more control.

02

Numbers, not marketing

Every claim is reproducible on real hardware. We publish the benchmarks, the scripts and the honest limitations alongside the wins.

03

Open by default

Our research ships as Apache-2.0 software you can read, run and build on — from a laptop GPU to a data-center cluster.

Want to collaborate or sponsor?

We work with researchers, hardware vendors and open-source backers pushing efficient AI infrastructure forward.