AI, Bias, and Vectors
Why language isn't just words: how biased patterns in training data become embedded in AI models through mathematical representations
Modern AI models like ChatGPT are built on a mathematical trick: turning words into vectors. Words live in a high-dimensional space, and their relationships are captured through distance and direction. Sounds clean, right?
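"Distance and direction" is easy to see in a toy sketch. The vectors below are invented for illustration (real models learn hundreds of dimensions from co-occurrence statistics), but they show how cosine similarity stands in for "relatedness":

```python
import math

# Toy 3-dimensional embeddings. Values are invented for illustration;
# a real model would learn them from massive text corpora.
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "apple": [-0.5, 0.2, 0.1],
}

def cosine(a, b):
    """Cosine similarity: near 1 = same direction, <= 0 = unrelated or opposed."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

# Related words point in similar directions; unrelated ones don't.
print(cosine(vecs["king"], vecs["queen"]))  # positive, relatively close
print(cosine(vecs["king"], vecs["apple"]))  # negative, far apart
```

The whole "meaning as geometry" idea rests on this one measurement.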
Here's the catch: if biased patterns exist in the training data (and they do), they get baked right into the math.
So if a model sees "man" near "engineer" and "woman" near "receptionist" often enough, it learns that pattern—without knowing it's bias.
This isn't just theoretical. A 2016 paper by Bolukbasi et al. showed that word2vec embeddings (the same kind of vector representation that underpins modern LLMs) completed the analogy "man is to computer programmer as woman is to homemaker." That bias doesn't stay in the embeddings; it flows into everything built on top of them.
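The analogy test behind that finding is plain vector arithmetic: compute b - a + c, then find the nearest word. Here's a minimal reconstruction with invented numbers, where dimension 0 stands in for a spurious "gender" signal the model absorbed from biased co-occurrence patterns:

```python
import math

# Invented embeddings. Dimension 0 carries a spurious "gender" signal
# learned from biased training text. These are not real model weights.
vecs = {
    "man":        [ 1.0, 0.2, 0.1],
    "woman":      [-1.0, 0.2, 0.1],
    "programmer": [ 0.8, 0.9, 0.3],  # co-occurred mostly with "male" contexts
    "engineer":   [ 0.7, 0.8, 0.4],  # likewise
    "homemaker":  [-0.8, 0.1, 0.9],  # co-occurred mostly with "female" contexts
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via the query vector b - a + c."""
    query = [vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c])]
    candidates = [w for w in vecs if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(query, vecs[w]))

print(analogy("man", "woman", "programmer"))  # "homemaker": the learned bias
```

Nothing in the arithmetic is biased. The answer falls out of the geometry the data created.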
It's not the math that's biased. It's the data.
And if we don't catch it, it becomes part of the product.
If you're working with embeddings, LLMs, or even building search or recommendations—be intentional. Bias shows up where you don't expect it.
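One low-cost sanity check: probe your embeddings directly by comparing occupation words against a gendered direction and flagging skew. A minimal sketch, again with invented 2-d vectors; in practice you'd load `vecs` from the model you actually ship:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

def gender_skew(word, vecs):
    """Positive = closer to 'he', negative = closer to 'she'. Near zero = neutral."""
    return cosine(vecs[word], vecs["he"]) - cosine(vecs[word], vecs["she"])

# Invented vectors for illustration only; swap in your model's embeddings.
vecs = {
    "he":    [ 1.0, 0.1],
    "she":   [-1.0, 0.1],
    "nurse": [-0.6, 0.5],
    "pilot": [ 0.6, 0.5],
}

for word in ("nurse", "pilot"):
    print(word, round(gender_skew(word, vecs), 2))
```

A role-neutral word that skews hard in either direction is a signal worth investigating before it ships in search rankings or recommendations.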
#AI #MachineLearning #LLMs #BiasInAI #NLP #WordEmbeddings #DataScience #LLM #ResponsibleAI