Part 1
Completed

One-Hot Vectors, Learned Embeddings, and Semantic Geometry

What I Built

Explored the progression from sparse one-hot representations to dense learned embeddings. Implemented embedding layers, analyzed semantic geometry through cosine similarity and analogical reasoning, and visualized high-dimensional embedding spaces using t-SNE and UMAP.
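
The step from one-hot vectors to an embedding layer can be sketched in a few lines: multiplying a one-hot vector by a weight matrix selects exactly one row, so an embedding layer amounts to a table lookup. The class name EmbeddingTable, the initialization scale of 0.1, and the four-word vocabulary are assumptions made for illustration, not the project's actual code.

```python
import random


def one_hot(index, vocab_size):
    """Sparse representation: all zeros except a 1.0 at the token's index."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec


def vec_times_matrix(vec, matrix):
    """Multiply a length-V vector by a V x D matrix, giving a length-D vector."""
    dim = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(dim)]


class EmbeddingTable:
    """Hypothetical minimal embedding layer: one dense row per token."""

    def __init__(self, vocab_size, dim, scale=0.1, seed=0):
        rng = random.Random(seed)
        # Small uniform initialization; the scale value is an assumption.
        self.weights = [
            [rng.uniform(-scale, scale) for _ in range(dim)]
            for _ in range(vocab_size)
        ]

    def lookup(self, index):
        return self.weights[index]


vocab = ["king", "queen", "man", "woman"]
table = EmbeddingTable(vocab_size=len(vocab), dim=3)

idx = vocab.index("queen")
via_one_hot = vec_times_matrix(one_hot(idx, len(vocab)), table.weights)
via_lookup = table.lookup(idx)
```

Both paths return the same dense vector; the lookup simply skips the multiplications by zero, which is why embedding layers index into the table rather than materializing one-hot vectors.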

Key Concepts

One-Hot Encoding, Dense Embeddings, Semantic Geometry, Cosine Similarity, Analogical Reasoning, t-SNE, UMAP

Architecture

1. Embedding Layer
2. Similarity Engine
3. Visualization Pipeline
4. Analogy Solver

Results

Trained embeddings on 100M tokens. The learned vectors capture the analogy king - man + woman ≈ queen at 0.89 cosine similarity, and the embedding space shows clear semantic clustering.
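
As a rough illustration of how an analogy such as king - man + woman ≈ queen can be scored, the sketch below combines cosine similarity with a nearest-neighbour search. The three-dimensional vectors are hand-picked toy values, not the trained embeddings, and the function names cosine and solve_analogy are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' by searching near b - a + c."""
    target = [
        y - x + z
        for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # Exclude the three query words so they cannot be returned as the answer.
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    best = max(candidates, key=lambda w: cosine(target, candidates[w]))
    return best, cosine(target, candidates[best])


# Toy vectors chosen by hand for illustration only.
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.5, 0.5],
}

answer, score = solve_analogy("man", "king", "woman", toy_embeddings)
```

Excluding the query words from the candidate set is a common convention in analogy evaluation; without it, the nearest neighbour of the target vector is often one of the inputs.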

Key Learnings

  • Embedding dimension is a trade-off between expressiveness and efficiency
  • Semantic geometry emerges naturally from co-occurrence statistics
  • Initialization scale matters significantly for embedding training stability
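
The link between co-occurrence statistics and semantic geometry can be seen even without training. The sketch below builds raw co-occurrence count vectors from a six-sentence toy corpus (invented for this example) and compares them with cosine similarity. Learned embeddings are dense and trained rather than counted, so this is only an analogy for the effect, not the method used in the project.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=3):
    """Count, for each word, how often every other word appears within `window` positions."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine_sparse(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


# Invented toy corpus: "cat" and "dog" share contexts, "car" mostly does not.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases mice",
    "the dog chases cats",
    "the car needs fuel",
    "the truck needs fuel",
]

vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine_sparse(vectors["cat"], vectors["dog"])
sim_cat_car = cosine_sparse(vectors["cat"], vectors["car"])
```

Words that appear in similar contexts end up with similar count vectors, so sim_cat_dog comes out higher than sim_cat_car even though no notion of meaning was supplied.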

Challenges

  • High-dimensional visualization without losing structure
  • Handling rare tokens with few training examples
  • Debiasing embedding spaces
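
For the debiasing challenge, one commonly described approach is to estimate a bias direction from a word pair and project it out of other vectors. The source does not say which method was tried, so the sketch below is an assumption: the helper names, the he/she pair, and the toy vectors are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def remove_direction(vec, direction):
    """Subtract the component of `vec` that lies along `direction`."""
    g = normalize(direction)
    projection = dot(vec, g)
    return [x - projection * gi for x, gi in zip(vec, g)]


# Toy vectors chosen by hand; the first axis plays the role of the bias axis.
he = [1.0, 0.2, 0.0]
she = [-1.0, 0.2, 0.0]
doctor = [0.4, 0.9, 0.3]

# Estimate the bias direction as the difference of a definitional pair.
bias_direction = [a - b for a, b in zip(he, she)]
doctor_debiased = remove_direction(doctor, bias_direction)
```

After the projection, doctor_debiased has no component along the estimated direction while its other coordinates are unchanged. A single direction estimated from one word pair is a simplification; removing it does not guarantee that related associations elsewhere in the space disappear.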