Watching a human being grow from infancy to independent adulthood is a remarkable journey. Infants grow dramatically, teenagers go through growth spurts and rebellious phases, and young adults display powerful memory and learning abilities. As in every other organism, human growth, maturation, and aging all rely on a set of internal biological instructions: DNA.
Over the past twenty-plus years, researchers have discovered a curious phenomenon: each species carries genes that are found in no other species. How these unique genes originate is a central mystery in understanding how new species arise and how life diversifies.
Professor M. Mar Albà from the Evolutionary Genomics group at the Research Institute of the Hospital del Mar in Spain and her research team have delved deep into this scientific enigma. Their research centers on the transcriptome, the complete set of RNA molecules produced when an animal's DNA is transcribed. DNA stays inside the cell nucleus, where it is transcribed into RNA; the RNA that encodes proteins then enters the cytoplasm, where it is translated into proteins that carry out functions. However, there are also a great many non-coding RNAs (RNAs that are not translated into proteins). Some of these act within the cytoplasm, while many remain in the nucleus to perform their functions.
After analyzing the transcriptomes of humans, chimpanzees, macaques, and mice, Albà and her team found 2714 new genes in humans and chimpanzees that were absent in macaques and mice. These so-called de novo genes have the capability to express new proteins. Interestingly, in the genomes of humans and chimpanzees, nearly 90% of the de novo genes did not come from traditionally protein-coding DNA regions, but were located in so-called introns and intergenic regions.
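The comparison described above can be pictured as a set operation. The following is a minimal sketch, not the team's actual pipeline: it assumes each species' transcriptome has already been reduced to a set of comparable gene identifiers (in practice this step requires sequence-similarity searches), and it assumes a candidate counts if it appears in at least one ingroup species. The function name `lineage_specific` and the identifiers `g1` to `g5` are hypothetical placeholders.

```python
def lineage_specific(transcripts_by_species, ingroup, outgroup):
    """Return IDs seen in at least one ingroup species and in no outgroup species."""
    # Everything expressed in any of the ingroup species (assumption: "any", not "all").
    seen_inside = set().union(*(transcripts_by_species[s] for s in ingroup))
    # Everything expressed in any of the outgroup species.
    seen_outside = set().union(*(transcripts_by_species[s] for s in outgroup))
    return seen_inside - seen_outside


# Toy data with placeholder identifiers, not real genes.
toy = {
    "human": {"g1", "g2", "g3", "g4"},
    "chimpanzee": {"g1", "g2", "g3"},
    "macaque": {"g1", "g5"},
    "mouse": {"g1", "g2"},
}

candidates = lineage_specific(toy, ["human", "chimpanzee"], ["macaque", "mouse"])
print(sorted(candidates))  # → ['g3', 'g4']
```

Here `g1` and `g2` are excluded because they also appear in macaque or mouse, which is the same logic that marks a gene as absent from the more distant species.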
Since the late 1950s, scientists have used the concept of “junk DNA” to refer to non-coding DNA, that is, DNA sequences that do not code for proteins. Japanese scientist Susumu Ohno published an influential paper in 1972 in which he used the term “junk DNA” and proposed that 90% of the mammalian genome consists of this functionless material, including pseudogenes, transposons, and viral fragments. However, as research has progressed, scientists have increasingly found that non-coding DNA can carry specific functions, including regulating gene expression and participating in the construction of protein complexes. More recently, researchers have even found in fruit flies that certain non-coding DNA sequences have the potential to evolve into completely new genes that code for proteins.
Some DNA is transcribed into long non-coding RNAs (lncRNAs), which normally remain in the cell nucleus. For such an RNA to be translated into protein, it must first move into the cytoplasm.
Recent studies show that some lncRNAs can turn into messenger RNAs that code for proteins, expanding our understanding of gene function. By comparing the genomes of humans, chimpanzees, and rhesus monkeys, a research team from Peking University identified 74 cases in which an lncRNA had become a coding RNA. This transformation involves specific mutations in the original non-coding DNA sequence that allow the transcribed RNA to leave the nucleus and be translated into protein by ribosomes in the cytoplasm.
The new genes differ sharply from the lncRNAs they arose from: the functional networks of the original lncRNAs no longer apply to the proteins the new genes produce. Of these 74 new genes, 29 are shared by humans and chimpanzees, indicating that they emerged after the human-chimpanzee lineage separated from rhesus monkeys. The remaining 45 are unique to humans and arose in the roughly 6 million years since humans separated from chimpanzees.
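The reasoning behind these two groups is a simple presence/absence argument with the rhesus monkey as the outgroup. The sketch below illustrates it under stated assumptions: each gene is reduced to three booleans saying whether the protein-coding form is present in each species, the function name `origin_branch` is hypothetical, and the table is built only from the counts given in the text rather than from real gene records.

```python
from collections import Counter


def origin_branch(in_human, in_chimp, in_rhesus):
    """Assign a coding gene to the branch where it most plausibly arose."""
    if in_rhesus:
        # Coding form already present in the outgroup, so it is not new.
        return "predates the split from rhesus"
    if in_human and in_chimp:
        return "shared by humans and chimpanzees"
    if in_human:
        return "human-specific"
    return "not assigned"


# Presence patterns reconstructed from the article's counts (29 shared, 45 human-only).
patterns = [(True, True, False)] * 29 + [(True, False, False)] * 45

tally = Counter(origin_branch(*p) for p in patterns)
print(tally["shared by humans and chimpanzees"])  # → 29
print(tally["human-specific"])                    # → 45
print(sum(tally.values()))                        # → 74
```

The two groups account for all 74 cases, which is why the shared genes are dated to before the human-chimpanzee split and the rest to after it.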
To further explore the function of these uniquely human de novo genes, scientists focused on the new gene ENSG00000205704. They found that this gene is active in the neural progenitor cells of the human brain and produces a protein only 107 amino acids long. The researchers silenced and overexpressed the gene, and also carried out experiments in which embryonic stem cells were differentiated into neural stem cells and grown into cortical organoids. The results indicated that silencing the gene accelerated neuronal maturation, while overexpression delayed it.
When the ENSG00000205704 gene was introduced and expressed in a mouse model, the mice generated more neurons, leading to larger brains and an expanded cortex. This suggests that the protein encoded by ENSG00000205704 prolongs the immature state of neural stem cells, increasing the number of cell divisions that take place before mature neurons form. Conversely, deleting the gene caused neural stem cells to mature prematurely, reducing the number of mature neurons. This research supports the view that ENSG00000205704 helps build larger brains by affecting the rate of neuronal maturation.
In their paper, the researchers point out that introducing an exogenous de novo gene into mice produced a significant enlargement of the brain and cerebral cortex. This suggests that de novo genes can take on functions quickly by forming new interactions with the existing gene network. The research team has also presented preliminary findings on how long non-coding RNAs (lncRNAs) can turn into messenger RNAs (mRNAs) with protein-coding capability.
Gene expression in organisms is regulated in response to changes in the internal environment. Whether this regulation includes a mechanism for converting lncRNA into messenger RNA has become a focus of research. Findings show that lncRNAs contain multiple specific binding sites for particular proteins, and that this binding can change the fate of an lncRNA, determining whether it remains in the nucleus or moves to the cytoplasm.
Some studies have shown that a particular type of RNA-binding protein can promote the transfer of lncRNAs from the nucleus to the cytoplasm. Another study has revealed an intriguing phenomenon: within the cytoplasm, over 70% of lncRNAs were found to have the potential to be translated into proteins, despite their low expression levels. This finding further illustrates the multifunctionality of lncRNAs and indicates that these RNAs provide a rich resource for the screening and innovation of new proteins during evolution.
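One simple way to picture "potential to be translated" is to look for an open reading frame (ORF): a start codon followed in the same frame by a stop codon. The sketch below is only an illustration of that idea, not the method used in the studies mentioned, which typically rely on experimental evidence of ribosome binding. It assumes the standard genetic code, scans only the three forward frames, and uses a made-up sequence; the function name `longest_orf_codons` is hypothetical.

```python
START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}


def longest_orf_codons(rna):
    """Longest ORF across the three forward frames, counted in codons
    (start codon included, stop codon excluded). ORFs lacking a stop are ignored."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    best = 0
    for frame in range(3):
        # Split the sequence into complete codons for this reading frame.
        codons = [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]
        start = None
        for idx, codon in enumerate(codons):
            if start is None:
                if codon == START:
                    start = idx  # open a new ORF at the first start codon
            elif codon in STOPS:
                best = max(best, idx - start)  # close the ORF at the stop codon
                start = None
    return best


# Made-up sequence: the ORF AUG-AAA-GGG-UGA sits in the third reading frame.
print(longest_orf_codons("CCAUGAAAGGGUGACC"))  # → 3
```

By the same counting, a protein of 107 amino acids such as the one described earlier would correspond to an ORF of 107 codons followed by a stop codon.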
The existence of such mechanisms reflects the “preparedness” strategy of organisms when adapting to environmental changes. This strategy undoubtedly enhances the adaptability of organisms to environmental challenges and lays a solid foundation for the continuous development of life.