Research

Active research areas and the projects that explore them. Each interest is backed by hands-on implementation and experimentation.

LLM Architecture

Designing more efficient, capable, and interpretable language model architectures. Exploring alternatives to the standard transformer stack, including state-space models, linear attention, and mixture-of-experts systems.

Key Papers

Attention Is All You Need
Mamba: Linear-Time Sequence Modeling
Mixtral of Experts

Related Projects

#1 Build a Tokenizer from Scratch
#2 One-Hot Vectors, Learned Embeddings, and Semantic Geometry
#3 Sinusoidal, Learned, RoPE, and ALiBi Positional Methods
#4 Scaled Dot-Product Attention
#5 Multi-Head Attention
#6 Build a Transformer Decoder Block
#10 Speculative Decoding
#11 KV Cache

Efficient Inference

Optimizing LLM inference through algorithmic improvements (FlashAttention, speculative decoding), system optimizations (continuous batching, KV cache management), and hardware-aware design.
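As a rough illustration of why KV cache management matters, the sketch below estimates cache size from model shape. The layer count, head counts, and sequence length are hypothetical values chosen for the example, not measurements from any project listed here, and the helper name `kv_cache_bytes` is made up for illustration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Approximate KV cache size in bytes.

    One key vector and one value vector are stored per layer,
    per KV head, per cached token, per sequence in the batch.
    """
    # The leading factor of 2 accounts for keys plus values.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical 32-layer model, 128-dim heads, 4096 cached tokens, fp16 values.
mha = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
# Same model with grouped-query attention sharing 8 KV heads (assumed).
gqa = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=4096)

print(f"MHA cache: {mha / 2**30:.2f} GiB, GQA cache: {gqa / 2**30:.2f} GiB")
```

Under these assumptions, sharing KV heads shrinks the cache in direct proportion to the number of KV heads, which is part of the motivation for MQA and GQA.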

Key Papers

FlashAttention-2
Speculative Decoding
vLLM: Easy, Fast, and Cheap LLM Serving

Related Projects

#9 Sampling Dashboard
#10 Speculative Decoding
#11 KV Cache
#12 MQA, GQA, and MLA
#15 FlashAttention Comparison
#16 FLOPs, Memory, Bandwidth, Precision

Model Compression

Reducing model size and inference cost through quantization, pruning, knowledge distillation, and architecture search. The focus is on preserving model quality while cutting memory use and latency.
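To make the quantization idea concrete, here is a minimal sketch of symmetric absmax quantization to 8-bit integers, with the round trip back to floats. This is the simplest baseline scheme, not GPTQ or AWQ; the function names and the sample weights are assumptions made for illustration.

```python
def quantize_absmax(weights, num_bits=8):
    """Map floats to signed integers using a single per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the symmetric range.
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from integer levels."""
    return [v * scale for v in quantized]


weights = [0.12, -0.5, 0.33, 1.0, -0.07]    # made-up example values
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q, scale, max_error)
```

The round-trip error per weight is bounded by half the scale, so a single large outlier inflates the error for every other weight. That sensitivity is one reason more careful methods treat outliers or important channels separately.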

Key Papers

GPTQ: Accurate Post-Training Quantization
AWQ: Activation-aware Weight Quantization
The Case for 4-bit Precision

Related Projects

#15 FlashAttention Comparison
#26 Quantize a Model

Long Context Systems

Extending transformer context windows to millions of tokens through architectural innovations, memory systems, and position encoding advances. Applications in document analysis, code understanding, and multi-turn conversation.
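One common building block for long-context attention is a sliding window, where each token attends only to a fixed number of recent positions. The sketch below builds such a mask in plain Python. It assumes a causal (decoder-style) window, and the function name and sizes are illustrative only.

```python
def sliding_window_mask(seq_len, window):
    """Return mask[i][j] == True when query i may attend to key j.

    Causal variant: position i sees itself and the previous
    (window - 1) positions, and nothing in the future.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

# Number of allowed (query, key) pairs grows linearly with seq_len
# once seq_len exceeds the window, instead of quadratically.
attended = sum(sum(row) for row in mask)
```

With a fixed window, attention cost per token stays constant as the sequence grows, at the price of losing direct access to distant tokens within a single layer.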

Key Papers

Longformer
Ring Attention
YaRN: Efficient Context Window Extension

Related Projects

#13 Sliding-Window Attention
#14 RoPE Scaling and Memory Systems

Agentic AI

Building autonomous systems that can plan, reason, use tools, and interact with environments. Focus on reliability, safety, and capability in multi-step reasoning tasks.

Key Papers

ReAct: Synergizing Reasoning and Acting
Reflexion: Self-Reflective Agents
Toolformer

Related Projects

#29 RAG from Scratch
#30 Agent Loops and Tool Use

Multimodal Models

Extending language models to understand and generate across vision, audio, and other modalities. Focus on efficient alignment and unified representation learning.

Key Papers

CLIP
LLaVA
Flamingo

Related Projects

#31 Tiny Vision-Language Adapter

Interpretability

Understanding the internal mechanisms of language models through mechanistic interpretability, feature visualization, and circuit tracing. Goal: making AI systems understandable and auditable.

Key Papers

A Mathematical Framework for Transformer Circuits
Sparse Autoencoders Find Highly Interpretable Features
Mechanistic Interpretability

Related Projects

#32 Sparse Autoencoders and Circuits
#33 Red-Team and Safety Evaluation Suite

Open Source AI

Contributing to open, reproducible, and accessible AI research. Building tools, datasets, and models that democratize access to state-of-the-art AI capabilities.

Key Papers

The Pile
OpenLM
OLMo

Related Projects

#21 Pretraining Data Pipeline
#22 Synthetic Data Generation
#34 Train, Tune, Quantize, Serve, Evaluate, and Document a Complete Language Model System

Research Philosophy

I believe the most impactful research in AI comes from deep, end-to-end understanding of systems. Rather than optimizing isolated metrics, I focus on understanding the fundamental mechanisms that make language models work, and where they break.

My approach combines rigorous engineering with scientific curiosity. Every project starts with a question, proceeds through systematic experimentation, and ends with documented insights that inform the next question.

I am particularly interested in making AI systems more efficient, interpretable, and safe. The ultimate goal is not just to build bigger models, but to build better systems that we can understand, trust, and deploy responsibly.