DeepSeek-V3-Lite
Full reimplementation of the DeepSeek-V3 architecture from scratch — a 27-layer transformer with 1 dense block and 26 Mixture-of-Experts blocks (2B effective parameters; 64 routed + 2 shared experts, top-6 activated per token). Implements Multi-Head Latent Attention (MLA) with 10–20× KV-cache compression, custom FP8 Triton kernels, Multi-Token Prediction, and a complete post-training pipeline.
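The MoE routing above can be sketched in plain Python — a minimal top-k gating function (function name and details are illustrative, not the repo's actual API): softmax over the routed-expert logits, keep the top 6, renormalize their weights, and always activate the shared experts.

```python
import math

def route_token(logits, top_k=6, n_shared=2):
    """Illustrative top-k gating for one token: softmax over routed-expert
    logits, keep the top_k experts, renormalize; shared experts bypass routing."""
    # numerically stable softmax over the 64 routed-expert logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # indices of the top_k highest-probability experts
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    gates = {i: probs[i] / norm for i in top}  # renormalized gate weights
    shared = list(range(n_shared))             # shared experts are always on
    return gates, shared

gates, shared = route_token([0.1 * i for i in range(64)])
assert len(gates) == 6 and abs(sum(gates.values()) - 1.0) < 1e-9
```

In the real model the gate weights scale each selected expert's FFN output before summation; the shared experts contribute unconditionally.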
- MLA with decoupled RoPE (YaRN), weight-absorption trick, 10–20× KV-cache compression
- FP8 E4M3FN training — custom Triton kernels for block-wise quant, GEMM, dequant
- Full post-training: SFT, GRPO (group_size=8), R1 distillation, speculative decoding
- FSDP distributed training across 8× RTX 5090 GPUs with safetensors checkpointing
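The 10–20× KV-cache figure follows from the MLA layout: standard attention caches full per-head keys and values, while MLA caches one compressed latent plus the small decoupled RoPE key. A back-of-envelope calculation (the dimensions below are illustrative, not this repo's actual config):

```python
def mla_cache_ratio(n_heads, head_dim, d_latent, d_rope):
    """KV-cache compression from MLA: standard attention stores full K and V
    (2 * n_heads * head_dim values per token per layer); MLA stores a single
    compressed KV latent plus the decoupled RoPE key (d_latent + d_rope)."""
    full = 2 * n_heads * head_dim
    compressed = d_latent + d_rope
    return full / compressed

# Hypothetical dimensions chosen only to illustrate the arithmetic:
ratio = mla_cache_ratio(n_heads=16, head_dim=128, d_latent=256, d_rope=64)
assert 10 <= ratio <= 20  # lands in the 10-20x range cited above
```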
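Block-wise FP8 quantization works by scaling each block of values so its absolute maximum fits the E4M3FN dynamic range (max ≈ 448), storing one scale per block for dequantization. A minimal sketch of the scaling step in pure Python (the actual Triton kernels also perform the 8-bit rounding, which is omitted here; function names are illustrative):

```python
def quantize_blockwise(x, block=128, fp8_max=448.0):
    """Per-block absmax scaling toward the FP8 E4M3FN range (max ~448).
    Returns scaled values and one scale per block for dequantization.
    Note: real kernels also round to 8-bit; this sketch only scales."""
    scales, q = [], []
    for start in range(0, len(x), block):
        blk = x[start:start + block]
        amax = max(abs(v) for v in blk) or 1.0  # avoid divide-by-zero
        s = amax / fp8_max                      # scale so the block fits FP8
        scales.append(s)
        q.extend(v / s for v in blk)            # values now lie in [-448, 448]
    return q, scales

def dequantize_blockwise(q, scales, block=128):
    """Invert the per-block scaling using the stored scales."""
    return [v * scales[i // block] for i, v in enumerate(q)]
```

Keeping one scale per block (rather than per tensor) limits the damage from outlier values, which is why block-wise schemes are standard for FP8 training.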
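GRPO with group_size=8 samples 8 completions per prompt and replaces a learned value baseline with group-relative advantages: each reward is normalized by its group's mean and standard deviation. A minimal sketch of that advantage computation (function name is illustrative):

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as used in GRPO: normalize each completion's
    reward by the mean and std of its sampling group (here, len(rewards)=8)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# One group of 8 sampled completions with binary rewards:
adv = grpo_advantages([1, 0, 1, 1, 0, 0, 1, 1])
assert abs(sum(adv)) < 1e-6  # advantages are zero-mean within the group
```

The resulting advantages weight the policy-gradient update, so no separate critic network is needed.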