The difference between “homeobox” and “Hox” genes

Update: This post was just a quick rant. If I had known it would become so popular, I would have put in more effort. I get so many emails about this piece, and it gets linked to so often, that it just goes to show how much of a problem this is in education. I’m glad it’s helping people. I keep changing where I blog, so I’m leaving this version here on Medium so people can always link to it.

This is a big pet peeve. Let’s get straight to business: the terms “homeobox” and “Hox” are not interchangeable. They mean different things. I’m correct in saying that Amphioxus (Branchiostoma lanceolatum) has 15 Hox genes, but I’m also correct in pointing out that it has over 130 homeobox genes.

Gene names can be very confusing and difficult to remember, so there are many abbreviations and acronyms in biology. For example, the gene insulin-like growth factor 1 is abbreviated to Igf1. I don’t know if that makes it easier to remember but it certainly makes it easier to write about! I believe the use of abbreviations is partly responsible for the incredible confusion over homeobox and Hox genes, and I do mean incredible.

It’s obviously a confusing topic for students, or anyone new to evo-devo, developmental genetics, or gene regulation… but it’s so much worse than that. Science communicators make the mistake, professional publications make the mistake, academics make the mistake, and they do it often. I think the reason it keeps happening is that the word “Hox” looks like a shortened form of “Homeobox”. All over the internet you will see the terms used interchangeably, sometimes with the apparently shortened version in brackets: “Homeobox (Hox)”. This otherwise decent glossary at Epigenesys manages to dump the terms homeotic, homeobox, and Hox into a single paragraph and glossary entry, which is of little help to a confused student seeking clarity.

If you’re a student, you may have watched science videos by YouTubers like Hank Green and seen the same “Homeobox (Hox)” terminology, with the two words used interchangeably. The first Google result for “homeodomain” (ignoring Wikipedia) is R&D Systems saying, “The DNA sequence that encodes the homeodomain is called the homeobox and homeobox-containing genes are known as hox genes”. This is wrong. A homeobox-containing gene is not necessarily a Hox gene. So let’s clear this up.

First, let’s go over the facts, and what the real difference is, before we discuss why these confusing names were chosen. Scientists discovered that some genes contain a highly conserved region of DNA we now call the homeobox. When I say highly conserved, I really mean it. You have homeobox genes, the birds outside do, the grass outside does… even yeast does. The origin of homeobox genes is truly ancient, definitely pre-dating the origin of animals. The 180-base-pair homeobox codes for a 60-amino-acid stretch of protein known as the homeodomain (sometimes called the homeobox domain). In plain English: the region of the gene is the homeobox, and the region of the resulting protein is the homeodomain. It has stayed so conserved across organisms, through hundreds of millions of years of evolution, because its function constrains its evolution: most changes to its sequence would impair its ability to bind DNA, so natural selection weeds them out. The homeodomain binds DNA (and, in some cases, RNA), allowing a protein with a homeodomain to act in gene regulation. For example, these proteins can turn genes on and off during development. It’s an invention of evolution that has persisted through the origin of fungi, plants, and us animals, and the homeobox itself hasn’t changed much at all. So there’s your definition of a “homeobox gene”. It isn’t a specific gene; it’s a huge and ancient group of genes that all contain the homeobox, a region of DNA that codes for a domain that can bind to DNA.

Every Hox gene is a homeobox gene, but not every homeobox gene is a Hox gene. The homeobox genes have diversified so much through evolutionary history that there are now distinct classes of them, and the most famous is definitely the family of Hox genes. The terms make sense when you consider their history. When scientists first discovered the homeobox, they found it because they happened to be studying animals (fruit flies) with mutated Hox genes. These mutants often had body parts in the wrong place, and were described as “homeotic mutants”. When researchers identified the genes responsible, they discovered that these genes all shared a common motif, so they named it the homeobox. This is one of the most incredible discoveries in biology, as it soon became clear that the homeobox is found in genes from humans, flies, jellyfish, daffodils, yeast, and so on. However, the genes they had actually discovered were a distinct group of homeobox genes, which we now call the Hox genes. They definitely are homeobox genes, and they regulate other genes.

Think about the confusion here. Hox genes are a distinct family of homeobox genes. Scientists discovered the homeobox motif by investigating which genes caused homeotic mutations. What they had found were the Hox genes, so calling the Hox genes homeotic is fine; that’s the effect they have. However, they didn’t understand at the time that the homeobox motif is found in many genes that aren’t Hox genes. Many homeobox genes have absolutely nothing to do with body parts growing in the right or wrong places, but when the homeobox was named, the only examples known were the Hox genes identified through the homeotic mutants. This is where almost all the confusion stems from. Despite having “homeo” in their name, most homeobox genes don’t produce homeotic mutants when modified. The Hox genes, a specific family of homeobox genes, are great examples of genes that can.

In us bilaterian animals, one of the main roles of the Hox genes is to specify identity along the anteroposterior (head-to-tail) axis of the body. It’s a complicated system, but we’ll keep it simple: the Hox genes help determine which body parts grow where, so by messing with them you can make limbs grow in the wrong places. But, as I’ve mentioned, there are plenty of non-Hox homeobox genes, belonging to entirely different families with entirely different roles. While the Hox genes pattern the anteroposterior axis in bilaterians, there’s still some uncertainty over their precise role in non-bilaterian animals. The Hox genes do appear to be unique to animals; we don’t find them in plants or fungi. Those organisms still have other homeobox genes, just not Hox genes, which appear to have arisen very early in animal evolution (there is evidence that sponges had Hox genes too, but have since lost them).

We know so much about homeobox genes, especially the Hox cluster, that we could discuss it all day. The evolution of the Hox, ParaHox, and NK clusters is fascinating, as are the roles of these gene families in a developing animal. Understanding the interplay between development and evolution can provide unique insight into both, which is incredibly exciting, so don’t be put off by the unfortunate naming conventions and the mistakes made by others.

To summarise: Hox genes are homeobox genes because they contain the homeobox, but the homeobox genes also include ParaHox genes, NK genes, and many others. The terms are not interchangeable. It’s such an easy mistake to make that it appears in books, on academic websites, and in helpful videos on YouTube. Just keep it in mind and focus on exactly what is being discussed. It’s not wrong to describe a mobile phone as technology, but the terms aren’t interchangeable. You can’t go around describing technology as mobile phones; it makes no sense to say, “the electron microscope is a wonderful mobile phone”. Homeobox and Hox genes work the same way. You can describe a Hox gene as a homeobox gene, because that’s exactly what it is, but you can’t describe every homeobox gene as a Hox gene.