Legume family tree

The most comprehensive study of the family tree for legumes, the plant family that includes beans, soybeans, peanuts, and many other economically important crop plants, reveals a history of whole-genome duplications. The study also helps to uncover the evolution of genes involved in nitrogen fixation – a key trait likely important in the evolutionary spread and diversification of legumes and vital for their use as “green manure” in agriculture.

To reconstruct the family tree, researchers compared the DNA sequences of more than 1,500 genes from 463 different legume species, including 391 newly sequenced species, that span the diversity of this large plant family.

A paper describing the study, led by Penn State Professor of Biology Hong Ma, appears in the May 2021 issue of the journal Molecular Plant.

“Legumes make up the third-largest family of flowering plants and are incredibly diverse – ranging from tiny herbs to giant trees,” said Ma, who is the Huck Distinguished Research Professor of Plant Molecular Biology at Penn State. “They are essential food crops for both humans and livestock, can be used as lumber, and have many other uses. Maybe most importantly, they can ‘fix’ nitrogen – extracting the vital nutrient from the atmosphere and storing it in nodules on their roots in a symbiotic relationship with soil bacteria – making them important as green manure to improve soil health.”

Artist's conception of legume family tree

Illustration of a tree representing the legume family tree, with branches representing the six subfamilies. On each branch are flowers or pods of species belonging to that subfamily. The lines extending from the nutrient bag in the upper left corner indicate the positions of some of the proposed whole-genome duplications.

IMAGE: Yiyong Zhao, Chien-Hsun Huang, and Hong Ma

There are over 19,000 species in the legume family divided into six subfamilies and then further divided into narrower and narrower groupings based on their evolutionary relationships. There are 765 genera – the grouping one level above species – of which the team sampled members of 333. To build the family tree, the team analyzed gene sequences from the transcriptomes – the portion of the genome that is expressed as genes – of most of the 463 species and a small number of shallowly sequenced whole genomes from across legume diversity.

“This is the largest study of this kind for a single plant family,” said Ma. “We went to great lengths to sample as many species as we could to get a broad representation of the legume family, but it is often difficult to get well-preserved specimens that we can extract DNA or RNA from, especially for species found in remote locations. Having this broad representation of species allowed us to build the most detailed nuclear-gene family tree for legumes to date.”

In addition to helping researchers understand the evolution and diversification of legumes, the new legume family tree helps to clarify the relationship between crop plants and their wild relatives. Although the close relatives of important agricultural crops are often known, studying more distant wild cousins may reveal traits that could be exploited to help plants thrive in changing environments and resist diseases or insect pests.

Across the legume family tree, the research team identified strong evidence for 28 separate whole-genome duplication events. Whole-genome duplications, evolutionary events that result in complete duplication of the entire genome, are fairly common among flowering plants and are thought to allow for functional innovation and evolutionary diversification. One of the duplication events that the team identified appears to have occurred in the ancestor of all members of the legume family.

“Because for most of the species in our study we used transcriptomes and do not have entire genome sequences, we consider these as ‘proposed’ genome duplication events,” said Ma. “These kinds of studies are kind of like solving a mystery. If you only have one or a few witnesses it might be difficult to convince a jury of your evidence, but if you have a hundred witnesses who have different perspectives and they all point to the same thing it becomes difficult to dismiss that evidence. In our case, the different species are like our witnesses. The size of our study allowed us to identify events that we might otherwise have dismissed.”

The two largest subfamilies account for over 17,000 legume species and include all of the species with the ability to fix nitrogen. Nitrogen is an important plant nutrient – most commercial fertilizers contain a mix of nitrogen, phosphorus and potassium – so the symbiotic relationship between some legumes and the microorganisms that allow them to assimilate nitrogen from the atmosphere using root nodules has spurred their success by allowing them to colonize areas with less fertile soil. The research team also identified clues to the evolution of the genes responsible for this important trait.

“Our data support the idea that nodulation and nitrogen fixation originated a single time early in the history of legumes and other related nitrogen-fixing plants and the whole-genome duplication event at the origin of legumes might have been crucial for the evolution of this process,” said Ma. “In addition to this duplication event, we are also able to see gene loss in plants that do not have the ability to nodulate, and evolutionary changes in genes that contributed to their role in nodulation.”

In addition to Ma, the research team includes Yiyong Zhao, Rong Zhang, Kaiwen Jiang, Ji Qi, Yi Hu, Jing Guo, Renbin Zhu, Taikui Zhang, Ashley N. Egan, Ting-Shuang Yi, and Chien-Hsun Huang. This research was funded by the National Natural Science Foundation of China, the Strategic Priority Research Program of the Chinese Academy of Sciences, the State Key Laboratory of Genetic Engineering, the Ministry of Education Key Laboratory of Biodiversity Science and Ecological Engineering at Fudan University, and Penn State.
