LLM engineer

Vacancy posted: 15.01.2026
Employer: Hi, Rockits!
Salary:
not specified
City:
Moscow
Required experience:
3 to 6 years

We are looking for an LLM / ML Infrastructure engineer experienced with Rust/C++ and CUDA for remote work.

Our client is building a decentralized AI infrastructure focused on running and serving ML models directly on user-owned hardware (on-prem / edge environments).

A core component of the product is a proprietary “capsule” runtime for deploying and running ML models. Some components currently rely on popular open-source solutions (e.g., llama.cpp), but the strategic goal is to replace community-driven components with in-house ML infrastructure to gain full control over performance, optimization, and long-term evolution.

In parallel, the company is developing:

  • its own network for generating high-quality, domain-specific datasets,

  • fine-tuned compact models for specialized use cases,

  • a research track focused on ranking, aggregation, accuracy improvements, and latency reduction.

The primary target audience is B2B IT companies.

The long-term product vision is to move beyond generic code generation and focus on high-performance, hardware-aware, and efficiency-optimized code generation.

ML Direction

1. Applied ML Track (Primary focus for this role)

  • Development of ML inference infrastructure

  • Building and evolving proprietary runtime capsules

  • Porting and implementing ML algorithms on a custom architecture

  • Low-level performance optimization across hardware platforms

2. Research Track

  • ML research with published papers

  • Improvements in answer quality and inference efficiency

  • Experiments with aggregation, ranking, and latency reduction

👉 This position is primarily focused on the applied ML / engineering track.

Role

This is a strongly engineering-oriented ML role focused on inference, performance, and systems-level implementation rather than model experimentation.

📌 Approximately 90% of the work is hands-on coding and optimization.

You will

  • Implement ML algorithms from research papers into production-ready code

  • Port existing ML inference algorithms to the company’s proprietary architecture

  • Develop and optimize the inference engine

  • Optimize performance, memory usage, and latency

  • Integrate and adapt open-source ML solutions (LLaMA, VLMs, llama.cpp, etc.)

  • Contribute to the foundational architecture of the ML platform

Key Responsibilities

Inference Infrastructure Development:

○ Design and implementation of a cross-platform engine for ML model inference

○ Development of low-level components in Rust and C++ with a focus on maximum performance

○ Creation and integration of APIs for interaction with the inference engine
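
To illustrate the kind of engine-facing API this responsibility implies, here is a minimal Rust sketch. All names (`InferenceEngine`, `GenerateRequest`, the `EchoBackend` stub) are hypothetical, not the company's actual interface; the point is only that a trait boundary lets CUDA, Metal, and CPU backends be swapped without touching callers.

```rust
/// Parameters for one generation call (illustrative, not the real API).
pub struct GenerateRequest {
    pub prompt: String,
    pub max_tokens: usize,
}

/// Result of a generation call.
pub struct GenerateResponse {
    pub text: String,
    pub tokens_generated: usize,
}

/// The cross-platform engine hides backend details behind a trait, so a
/// CUDA, Metal, or CPU implementation can be selected at runtime.
pub trait InferenceEngine {
    fn generate(&self, req: &GenerateRequest) -> GenerateResponse;
}

/// A stub backend that just echoes the prompt; it exists only to show the
/// shape of the API. A real backend would run the model.
pub struct EchoBackend;

impl InferenceEngine for EchoBackend {
    fn generate(&self, req: &GenerateRequest) -> GenerateResponse {
        let text = format!("echo: {}", req.prompt);
        GenerateResponse {
            tokens_generated: text.split_whitespace().count().min(req.max_tokens),
            text,
        }
    }
}

fn main() {
    let engine = EchoBackend;
    let resp = engine.generate(&GenerateRequest {
        prompt: "hello".into(),
        max_tokens: 16,
    });
    println!("{} ({} tokens)", resp.text, resp.tokens_generated);
}
```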

Performance Optimization:

○ Implementation of modern optimization algorithms: Flash Attention, PagedAttention, continuous batching

○ Development and optimization of CUDA kernels for GPU-accelerated computations

○ Profiling and performance tuning across various GPU architectures

○ Optimization of memory usage and model throughput
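
As a rough sketch of the block-table idea behind PagedAttention (greatly simplified; a real implementation manages GPU memory, copy-on-write sharing, and eviction), the KV cache can be carved into fixed-size blocks, with each sequence holding a table of block indices instead of one contiguous slab. All names below are illustrative.

```rust
const BLOCK_SIZE: usize = 16; // tokens per KV-cache block

/// A free-list allocator over a fixed pool of KV-cache blocks.
struct BlockAllocator {
    free: Vec<usize>,
}

impl BlockAllocator {
    fn new(num_blocks: usize) -> Self {
        Self { free: (0..num_blocks).rev().collect() }
    }
    fn alloc(&mut self) -> Option<usize> {
        self.free.pop()
    }
    fn free_block(&mut self, id: usize) {
        self.free.push(id);
    }
}

/// Per-sequence block table: memory is committed one block at a time as
/// tokens arrive, instead of reserving the maximum length up front.
struct Sequence {
    len: usize,
    block_table: Vec<usize>,
}

impl Sequence {
    fn new() -> Self {
        Self { len: 0, block_table: Vec::new() }
    }
    /// Append one token; a new block is allocated only on a block boundary.
    fn append_token(&mut self, alloc: &mut BlockAllocator) -> Result<(), &'static str> {
        if self.len % BLOCK_SIZE == 0 {
            let b = alloc.alloc().ok_or("out of KV blocks")?;
            self.block_table.push(b);
        }
        self.len += 1;
        Ok(())
    }
}

fn main() {
    let mut alloc = BlockAllocator::new(4); // tiny pool: 4 blocks = 64 tokens
    let mut seq = Sequence::new();
    for _ in 0..40 {
        seq.append_token(&mut alloc).unwrap();
    }
    // 40 tokens at 16 tokens/block -> 3 blocks committed, 1 still free
    println!("blocks used: {}", seq.block_table.len());
}
```

Because unused tail capacity is never committed, many more concurrent sequences fit in the same pool, which is what enables continuous batching to keep the GPU saturated.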

Model Operations:

○ Implementation of efficient model quantization methods (GPTQ, AWQ, GGUF)

○ Development of memory management system for working with large language models

○ Integration of support for various model architectures (LLaMA, Mistral, Qwen, and others)
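
The quantization work can be pictured with a toy example. The sketch below shows only blockwise symmetric int8 quantization, i.e. the scale-per-block storage layout that formats like GGUF are built around; real methods such as GPTQ and AWQ are considerably more sophisticated (error-compensating weight updates, activation-aware scaling). `QBLOCK` and all names are illustrative.

```rust
const QBLOCK: usize = 32; // weights per quantization block

/// One quantized block: a single f32 scale plus int8 codes.
struct QuantBlock {
    scale: f32,
    codes: Vec<i8>,
}

/// Quantize: scale = max|w| / 127, code = round(w / scale).
fn quantize_block(weights: &[f32]) -> QuantBlock {
    let max_abs = weights.iter().fold(0f32, |m, w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let codes = weights.iter().map(|w| (w / scale).round() as i8).collect();
    QuantBlock { scale, codes }
}

/// Dequantize back to f32: w ≈ code * scale.
fn dequantize_block(b: &QuantBlock) -> Vec<f32> {
    b.codes.iter().map(|&c| c as f32 * b.scale).collect()
}

fn main() {
    let w: Vec<f32> = (0..QBLOCK).map(|i| (i as f32 - 16.0) * 0.01).collect();
    let q = quantize_block(&w);
    let back = dequantize_block(&q);
    let max_err = w
        .iter()
        .zip(&back)
        .map(|(a, b)| (a - b).abs())
        .fold(0f32, f32::max);
    // Rounding guarantees the per-weight error stays within half a scale step.
    println!("max reconstruction error: {max_err}");
}
```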

What we expect from you

  • Strong proficiency in Rust or C++

  • Hands-on experience with GPU / hardware acceleration, including:

    • CUDA, AMD (ROCm), or Metal (Apple Silicon)

  • Solid understanding of:

    • LLM principles

    • core ML algorithms

    • modern ML approaches used in production systems

  • Ability to read ML research papers and implement them in code

  • Ability to write clean, efficient, highly optimized code

  • Interest in systems-level ML and low-level performance optimization

  • High level of autonomy:

    • take existing algorithms from research or open-source,

    • understand them deeply,

    • adapt and integrate them into a new architecture

  • Fluent English

What The Company Offers

  • Remote-first setup (work from anywhere)

  • Dubai working hours

  • High level of ownership and autonomy

  • Flat structure

  • Salary in cryptocurrency

  • An opportunity to create a great product that will disrupt the AI market