⚡ Transformer From Scratch
A complete Transformer language model built from scratch in PyTorch
📊 Model Overview
Parameters · Vocab Size · d_model · Heads · Layers · Device
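The Parameters figure follows from the other overview values. The sketch below gives a rough count for the layout in the Architecture section. It rests on assumptions the page does not confirm: the feed-forward width defaults to 4 × d_model, every linear layer has a bias, each LayerNorm has a gain and a bias, positional encoding is sinusoidal (no weights), and the output projection is not tied to the embedding. `count_params` is a hypothetical helper, and the vocabulary size of 65 in the example is illustrative only.

```python
from typing import Optional


def count_params(vocab_size: int, d_model: int, n_layers: int,
                 d_ff: Optional[int] = None) -> int:
    """Rough trainable-parameter count for a post-norm decoder-only Transformer.

    Assumptions (not confirmed by the source): d_ff defaults to 4 * d_model,
    all Linear layers have biases, LayerNorm has gain + bias, positional
    encoding is sinusoidal (no weights), and the output head is untied.
    """
    d_ff = 4 * d_model if d_ff is None else d_ff
    embedding = vocab_size * d_model
    attention = 4 * (d_model * d_model + d_model)                # W_Q, W_K, W_V, W_O + biases
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)   # two Linear layers
    layer_norms = 2 * (2 * d_model)                              # two Add&Norm steps per layer
    per_layer = attention + ffn + layer_norms
    final_norm = 2 * d_model                                     # final Layer Norm
    head = d_model * vocab_size + vocab_size                     # Linear -> Logits
    return embedding + n_layers * per_layer + final_norm + head


print(count_params(vocab_size=65, d_model=128, n_layers=4))  # 810049
```

The number of heads does not appear in the formula: splitting d_model across more heads changes the per-head width, not the size of the Q, K, V, and O projections.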
✍️ Generate Text
Prompt: "To be or not"
Decoding Method: Greedy, Top-k, or Temperature
Temperature: 0.8
Top-k: 10
Max Tokens: 100
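The three decoding methods differ only in how the next token is picked from the model's logits. Here is a minimal sketch in plain Python. `next_logits` is a hypothetical stand-in for the model (any callable mapping token ids to one score per vocabulary entry). Whether the page's Top-k mode also applies the temperature is an assumption; here it does.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits: Sequence[float]) -> int:
    # Greedy: always take the highest-scoring token (deterministic).
    return max(range(len(logits)), key=lambda i: logits[i])


def sample(logits: Sequence[float], temperature: float = 0.8,
           top_k: Optional[int] = None, rng: Optional[random.Random] = None) -> int:
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()
    scaled = [x / temperature for x in logits]  # T < 1 sharpens, T > 1 flattens
    candidates = list(range(len(scaled)))
    if top_k is not None:
        # Top-k: keep only the k highest-scoring token ids, then renormalize.
        candidates = sorted(candidates, key=lambda i: scaled[i], reverse=True)[:top_k]
    probs = softmax([scaled[i] for i in candidates])
    return rng.choices(candidates, weights=probs, k=1)[0]


def generate(next_logits: Callable[[List[int]], Sequence[float]], prompt_ids: List[int],
             method: str = "greedy", temperature: float = 0.8, top_k: int = 10,
             max_tokens: int = 100, seed: Optional[int] = None) -> List[int]:
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(max_tokens):  # append one token per step, up to Max Tokens
        logits = next_logits(ids)
        if method == "greedy":
            ids.append(greedy(logits))
        elif method == "top-k":
            ids.append(sample(logits, temperature, top_k, rng))
        else:  # "temperature": sample from the full distribution
            ids.append(sample(logits, temperature, None, rng))
    return ids


# Toy "model" over 5 tokens that strongly prefers (last token + 1) % 5.
toy = lambda ids: [5.0 if i == (ids[-1] + 1) % 5 else 0.0 for i in range(5)]
print(generate(toy, [0], max_tokens=4))  # [0, 1, 2, 3, 4]
```

Greedy output is repeatable but tends to loop; temperature and top-k trade some of that determinism for variety, with top-k additionally ruling out low-probability tokens entirely.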
🏋️ Train Model
Epochs
d_model: 64, 128, or 256
Heads: 2, 4, or 8
Layers: 2, 4, or 6
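The page does not state the training objective. For a language model like this one the usual choice is next-token cross-entropy, where the logits at position t are scored against token t + 1. The sketch below assumes that objective and uses NumPy in place of PyTorch. `head_dim` and `next_token_loss` are hypothetical helpers; the first one also checks that each d_model option splits evenly across each Heads option.

```python
import math

import numpy as np


def head_dim(d_model: int, n_heads: int) -> int:
    # Multi-head attention splits d_model evenly across the heads.
    if d_model % n_heads != 0:
        raise ValueError(f"d_model={d_model} is not divisible by n_heads={n_heads}")
    return d_model // n_heads


def next_token_loss(logits: np.ndarray, token_ids: np.ndarray) -> float:
    """Mean cross-entropy for next-token prediction (assumed objective).

    logits:    (seq_len, vocab_size) scores produced at positions 0..seq_len-1.
    token_ids: (seq_len + 1,) the input window plus one extra token;
               position t is trained to predict token_ids[t + 1].
    """
    targets = token_ids[1:]
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


# Every d_model / Heads combination offered by the form divides evenly.
for d in (64, 128, 256):
    print(d, [head_dim(d, h) for h in (2, 4, 8)])

# Uniform logits over a 4-token vocabulary give a loss of ln(4).
print(next_token_loss(np.zeros((3, 4)), np.array([0, 1, 2, 3])), math.log(4))
```

A loss near ln(vocab size) means the model is still guessing uniformly, which is a useful sanity check for the first epoch.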
🏗️ Architecture
Input Tokens
↓
Embedding × √d_model
↓
+ Positional Encoding
↓
Decoder Layer × N
Self-Attn → Add&Norm → FFN → Add&Norm
↓
Layer Norm
↓
Linear → Logits
↓
Softmax → Next Token
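The pipeline above can be sketched as one forward pass. This version uses NumPy rather than PyTorch, and several details are assumptions the diagram does not pin down: sinusoidal positional encoding, a causal mask in self-attention (so each position only sees earlier tokens, which a next-token model needs), a ReLU feed-forward network, no biases, and LayerNorm without learnable gain or bias. `init_params` and `next_token_probs` are hypothetical names with random weights, so the output is a valid distribution but not meaningful text.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Normalize each position's vector; learnable gain/bias omitted (assumption).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def positional_encoding(seq_len, d_model):
    # Sinusoidal encoding (assumed); requires an even d_model.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def causal_self_attention(x, p, n_heads):
    seq_len, d_model = x.shape
    d_k = d_model // n_heads

    def split(t):  # (seq, d_model) -> (heads, seq, d_k)
        return t.reshape(seq_len, n_heads, d_k).transpose(1, 0, 2)

    q, k, v = split(x @ p["W_q"]), split(x @ p["W_k"]), split(x @ p["W_v"])
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)  # (heads, seq, seq)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)  # block attention to later tokens
    out = softmax(scores) @ v  # (heads, seq, d_k)
    return out.transpose(1, 0, 2).reshape(seq_len, d_model) @ p["W_o"]


def decoder_layer(x, p, n_heads):
    x = layer_norm(x + causal_self_attention(x, p, n_heads))  # Self-Attn -> Add&Norm
    ffn = np.maximum(0.0, x @ p["W_1"]) @ p["W_2"]  # position-wise FFN with ReLU
    return layer_norm(x + ffn)  # FFN -> Add&Norm


def init_params(vocab_size, d_model, n_layers, d_ff, seed=0):
    # Hypothetical helper: small random weights, just enough to run the pass.
    rng = np.random.default_rng(seed)

    def w(*shape):
        return rng.normal(0.0, 0.02, shape)

    def layer():
        return {"W_q": w(d_model, d_model), "W_k": w(d_model, d_model),
                "W_v": w(d_model, d_model), "W_o": w(d_model, d_model),
                "W_1": w(d_model, d_ff), "W_2": w(d_ff, d_model)}

    return {"embedding": w(vocab_size, d_model),
            "layers": [layer() for _ in range(n_layers)],
            "W_out": w(d_model, vocab_size)}


def next_token_probs(token_ids, params, n_heads):
    d_model = params["embedding"].shape[1]
    x = params["embedding"][token_ids] * np.sqrt(d_model)  # Embedding × √d_model
    x = x + positional_encoding(len(token_ids), d_model)   # + Positional Encoding
    for p in params["layers"]:                             # Decoder Layer × N
        x = decoder_layer(x, p, n_heads)
    logits = layer_norm(x) @ params["W_out"]               # Layer Norm, Linear -> Logits
    return softmax(logits[-1])                             # Softmax over the next token


params = init_params(vocab_size=65, d_model=64, n_layers=2, d_ff=256)
probs = next_token_probs(np.array([1, 2, 3]), params, n_heads=4)
print(probs.shape, int(probs.argmax()))
```

Only the last row of logits is needed at generation time; during training every row is scored against the following token, which is why the causal mask matters.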