One-Hot Vectors, Learned Embeddings, and Semantic Geometry

Explored the progression from sparse one-hot representations to dense learned embeddings. Implemented embedding layers, analyzed semantic geometry through cosine similarity and analogical reasoning, and visualized high-dimensional embedding spaces using t-SNE and UMAP.
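The step from one-hot vectors to dense embeddings can be illustrated with a small sketch: multiplying a one-hot vector by an embedding table selects exactly one row, so an embedding layer can skip the multiplication and index the row directly. The vocabulary, table values, and function names below are hypothetical toy choices, written with plain lists to stay dependency-free; this is not the project's actual code.

```python
# One-hot lookup vs. direct row indexing, using plain Python lists.
vocab = ["king", "queen", "man", "woman"]  # hypothetical toy vocabulary
token_to_id = {tok: i for i, tok in enumerate(vocab)}

# Hypothetical 4 x 3 embedding table: one dense row per token.
embedding_table = [
    [0.9, 0.8, 0.1],  # king
    [0.9, 0.1, 0.8],  # queen
    [0.1, 0.9, 0.1],  # man
    [0.1, 0.2, 0.8],  # woman
]


def one_hot(index, size):
    """Sparse representation: all zeros except a single 1.0."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def one_hot_times_table(vec, table):
    """Row vector times matrix, written out explicitly."""
    dim = len(table[0])
    return [sum(vec[r] * table[r][c] for r in range(len(table))) for c in range(dim)]


def embed(token):
    """Dense lookup: index the row directly, no multiplication needed."""
    return embedding_table[token_to_id[token]]


queen_id = token_to_id["queen"]
queen_via_matmul = one_hot_times_table(one_hot(queen_id, len(vocab)), embedding_table)
queen_via_lookup = embed("queen")
```

Both paths return the same vector; the lookup simply avoids touching the rows that the zeros would cancel out anyway.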


What I Built

Key Concepts

One-Hot Encoding
Dense Embeddings
Semantic Geometry
Cosine Similarity
Analogical Reasoning
t-SNE
UMAP

Architecture

Embedding Layer

Similarity Engine

Visualization Pipeline

Analogy Solver
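As a rough illustration of how a similarity engine and an analogy solver might fit together, the sketch below scores candidates by cosine similarity against the offset vector b - a + c and excludes the three query words from the candidates. The names `cosine`, `solve_analogy`, and `toy_vectors` are assumptions made for this example, and the vectors are hand-picked toy values rather than trained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors):
    """Return the word d that best completes a : b :: c : d."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the query words, otherwise one of them often wins trivially.
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# Hypothetical 3-dimensional toy vectors, not trained embeddings.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.0, 0.1, 0.1],
}

answer = solve_analogy("man", "king", "woman", toy_vectors)
```

With these toy values, `answer` is `"queen"`. On real embeddings the same search would typically run over the full vocabulary with a vectorized library instead of a Python loop.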

Results

Trained embeddings on 100M tokens. The analogy king - man + woman ≈ queen is recovered with 0.89 cosine similarity, and the embedding space shows clear semantic clustering.

Key Learnings

Embedding dimension is a trade-off between expressiveness and efficiency
Semantic geometry emerges naturally from co-occurrence statistics
Initialization scale matters significantly for embedding training stability
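One way to see why initialization scale can matter: for Gaussian initialization with standard deviation std, the norm of each embedding row is roughly std * sqrt(dim), so the size of the activations fed to the next layer depends directly on that choice. The sketch below measures this on a randomly initialized table; the function names and the specific sizes are assumptions for illustration, not the settings used in this project.

```python
import math
import random


def init_embedding_table(vocab_size, dim, std, seed=0):
    """Gaussian-initialized table with zero mean and the given std."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(vocab_size)]


def mean_row_norm(table):
    """Average Euclidean norm across all embedding rows."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in table) / len(table)


dim = 256  # hypothetical embedding dimension
small = init_embedding_table(vocab_size=50, dim=dim, std=0.01)
large = init_embedding_table(vocab_size=50, dim=dim, std=1.0)

small_norm = mean_row_norm(small)   # close to 0.01 * sqrt(256)
large_norm = mean_row_norm(large)   # close to 1.0 * sqrt(256)
expected_small = 0.01 * math.sqrt(dim)
expected_large = 1.0 * math.sqrt(dim)
```

A hundredfold change in std produces a hundredfold change in the typical row norm, which is the kind of difference that can push early training toward vanishing or exploding updates.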

Challenges

High-dimensional visualization without losing structure
Handling rare tokens with few training examples
Debiasing embedding spaces
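For the debiasing challenge, one commonly described building block is to remove the component of a word vector that lies along an estimated bias direction. The sketch below shows only that projection step; the bias direction, the word vector, and the function names are hypothetical, and the source does not say which debiasing method, if any, was applied here.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def remove_direction(vec, direction):
    """Subtract the projection of vec onto direction."""
    unit = normalize(direction)
    scale = dot(vec, unit)
    return [x - scale * u for x, u in zip(vec, unit)]


# Hypothetical toy vectors: the bias direction is estimated as man - woman.
man = [0.1, 0.9, 0.1]
woman = [0.1, 0.2, 0.8]
bias_direction = [m - w for m, w in zip(man, woman)]

engineer = [0.5, 0.7, 0.2]  # hypothetical word vector with a gendered component
engineer_neutral = remove_direction(engineer, bias_direction)

# After projection, the vector has no component left along the bias direction.
residual = dot(engineer_neutral, bias_direction)
```

The projection guarantees a zero component along the chosen direction, but bias can also be encoded in other directions, so this step alone should not be read as a complete fix.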
