Part 1
Completed
One-Hot Vectors, Learned Embeddings, and Semantic Geometry
What I Built
Explored the progression from sparse one-hot representations to dense learned embeddings. Implemented embedding layers, analyzed semantic geometry through cosine similarity and analogical reasoning, and visualized high-dimensional embedding spaces using t-SNE and UMAP.
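A minimal sketch of why a dense embedding layer is the natural successor to one-hot vectors: multiplying a one-hot vector by an embedding table selects a single row, so the layer can skip the multiplication and index the table directly. The vocabulary, dimensions, and variable names below are hypothetical, chosen only for illustration, and the table is randomly initialized rather than trained.

```python
import numpy as np

# Hypothetical toy vocabulary and sizes, for illustration only.
vocab = ["king", "queen", "man", "woman"]
vocab_size, embed_dim = len(vocab), 3

rng = np.random.default_rng(0)
# Embedding table: one dense row per token (random here, learned in practice).
E = rng.normal(scale=0.1, size=(vocab_size, embed_dim))

def one_hot(index, size):
    """Return a sparse one-hot vector with a single 1 at `index`."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

token_id = vocab.index("queen")

# Multiplying a one-hot vector by the table selects exactly one row...
via_matmul = one_hot(token_id, vocab_size) @ E
# ...so an embedding layer can be implemented as a plain row lookup.
via_lookup = E[token_id]

lookup_matches_matmul = np.allclose(via_matmul, via_lookup)
```

The lookup avoids materializing a vector as long as the vocabulary for every token, which is where the efficiency gain over one-hot inputs comes from.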
Key Concepts
One-Hot Encoding, Dense Embeddings, Semantic Geometry, Cosine Similarity, Analogical Reasoning, t-SNE, UMAP
Architecture
1. Embedding Layer
2. Similarity Engine
3. Visualization Pipeline
4. Analogy Solver
Results
Trained embeddings on 100M tokens. The learned vectors capture the analogy king - man + woman ≈ queen with 0.89 cosine similarity, and the embedding space shows clear semantic clustering.
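The analogy test can be sketched as vector arithmetic followed by a nearest-neighbor search under cosine similarity. The vectors below are hypothetical and hand-built so the arithmetic is easy to follow; they are not the trained embeddings, and `solve_analogy` is an illustrative helper, not the project's actual Analogy Solver.

```python
import numpy as np

# Hypothetical hand-built vectors (axes: royalty, male, female, fruit).
# They stand in for trained embeddings purely to make the arithmetic visible.
toy_embeddings = {
    "king":  np.array([1.0, 1.0, 0.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0, 0.0]),
    "man":   np.array([0.0, 1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0, 0.0]),
    "apple": np.array([0.0, 0.0, 0.0, 1.0]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def solve_analogy(a, b, c, embeddings):
    """Return the word closest to a - b + c, excluding the three query words."""
    target = embeddings[a] - embeddings[b] + embeddings[c]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))

answer = solve_analogy("king", "man", "woman", toy_embeddings)
target = toy_embeddings["king"] - toy_embeddings["man"] + toy_embeddings["woman"]
score = cosine_similarity(target, toy_embeddings["queen"])
```

In this toy space the target lands exactly on queen, so the score is 1.0. With trained embeddings the match is only approximate, which is why a similarity below 1 is reported. Excluding the query words from the candidates is a common convention, since the nearest neighbor of the target is often one of the inputs.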
Key Learnings
- Embedding dimension is a trade-off between expressiveness and efficiency
- Semantic geometry emerges naturally from co-occurrence statistics
- Initialization scale matters significantly for embedding training stability
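To illustrate the co-occurrence point, here is a small count-based sketch: build a word-by-word co-occurrence matrix from a tiny corpus and factor it with a truncated SVD. This is a stand-in technique chosen for brevity, not the training method used in this project, and the corpus, window size, and dimension are hypothetical.

```python
import numpy as np

# Hypothetical two-sentence corpus: "cat" and "dog" share identical contexts.
corpus = ["the cat sleeps", "the dog sleeps"]
sentences = [s.split() for s in corpus]
vocab = sorted({w for s in sentences for w in s})
index = {w: i for i, w in enumerate(vocab)}

# Count how often each word appears within `window` positions of another.
window = 1
counts = np.zeros((len(vocab), len(vocab)))
for sent in sentences:
    for i, word in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[index[word], index[sent[j]]] += 1

# Truncated SVD turns the sparse count rows into dense low-dimensional vectors.
U, S, _ = np.linalg.svd(counts)
dim = 2
vectors = U[:, :dim] * S[:dim]

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat_dog = cosine_similarity(vectors[index["cat"]], vectors[index["dog"]])
cat_the = cosine_similarity(vectors[index["cat"]], vectors[index["the"]])
```

Because "cat" and "dog" occur in the same contexts, their vectors coincide, while "cat" and "the" share no context words and end up orthogonal. No semantic labels were supplied; the geometry comes entirely from the counts.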
Challenges
- High-dimensional visualization without losing structure
- Handling rare tokens with few training examples
- Debiasing embedding spaces
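For the debiasing challenge, one commonly used step is to project out a bias direction from words that should be neutral with respect to it. The sketch below shows only that projection, under the assumption that a single direction (here the difference of two hypothetical vectors) captures the bias. It is not presented as the approach taken in this project, and all vectors are made up for illustration.

```python
import numpy as np

def neutralize(vector, bias_direction):
    """Remove the component of `vector` that lies along `bias_direction`."""
    g = bias_direction / np.linalg.norm(bias_direction)
    return vector - np.dot(vector, g) * g

# Hypothetical vectors; a real bias direction would be estimated from many pairs.
he = np.array([0.9, 0.1, 0.3])
she = np.array([-0.9, 0.1, 0.3])
doctor = np.array([0.4, 0.8, 0.2])

direction = he - she                      # points along the first axis only
debiased = neutralize(doctor, direction)  # first component is zeroed out
```

After the projection, the debiased vector is orthogonal to the bias direction, so it is equally similar to both ends of that axis. A single projection like this is a simplification of the broader debiasing problem.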