UVA Health researchers have developed an important new tool to help scientists sort signal from noise when investigating the genetic causes of cancer and other diseases. In addition to advancing research and potentially speeding up new treatments, the new tool could help improve cancer diagnosis by making it easier for doctors to detect cancer cells.
Developed by UVA’s Chongzhi Zang, PhD, and his team and collaborators, the new tool is a mathematical model that will help ensure the integrity of “big data” about the building blocks of our chromosomes, the genetic material called chromatin. Chromatin, the assembly of DNA and proteins, plays an important role in directing the activity of our genes. When chromatin goes awry, it can turn a healthy cell cancerous or contribute to other diseases.
Scientists can now study chromatin within individual cells using a state-of-the-art technique called “single-cell ATAC-seq,” but this generates an enormous amount of data, with a lot of noise and bias involved. Zang’s new tool cuts through that noise, saving scientists from false leads and wasted effort.
Even at the best of times, large-scale, single-cell genomics research is like “hunting for a needle in a haystack,” says Zang. But his new tool will make the search a whole lot easier by clearing out much of the bad hay.
“Using the traditional way of analyzing the data, you may see some patterns that look like genuine signals of a particular chromatin state, but they are actually spurious because of the bias of the experimental technique. Such spurious signals can mislead scientists. We have developed a model to better capture and filter out such spurious signals, so that the real needle we are looking for can be more easily spotted in the haystack.”
Chongzhi Zang, PhD, computational biologist with the UVA Center for Public Health Genomics and UVA Health Cancer Center
About the Genomics Tool
Zang’s new tool adapts a model from number theory and cryptography called “simplex encoding.” He and his colleagues used it to encode DNA sequences into mathematical forms, ultimately converting the complex genome sequence into a much simpler mathematical form. They can then compare the different sequence variants to detect bias and noise in the sequence data that cannot be easily detected using traditional approaches.
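As a rough illustration of what encoding DNA into a simpler mathematical form can look like, the sketch below maps each base to a corner of a regular tetrahedron (one common textbook form of simplex encoding) and fits a plain linear model of bias on top of it. This is a minimal sketch under stated assumptions, not the SELMA implementation: the vertex assignment, the first-order linear model and the function names `encode_kmer`, `fit_bias` and `predict_bias` are all hypothetical choices made for illustration.

```python
import numpy as np

# Assumed vertex assignment: each base sits on one corner of a regular
# tetrahedron, so every pair of bases is the same distance apart.
SIMPLEX = {
    "A": (1, -1, -1),
    "C": (-1, 1, -1),
    "G": (-1, -1, 1),
    "T": (1, 1, 1),
}


def encode_kmer(kmer):
    """Concatenate the 3-D vertex of each base into a vector of length 3*k."""
    return np.array(
        [value for base in kmer.upper() for value in SIMPLEX[base]],
        dtype=float,
    )


def fit_bias(kmers, observed_bias):
    """Least-squares fit of bias = intercept + weights . encoding.

    Illustrative first-order model only: each position contributes
    independently, so a k-mer needs 3*k + 1 parameters instead of 4**k.
    """
    design = np.array(
        [np.concatenate(([1.0], encode_kmer(k))) for k in kmers]
    )
    target = np.asarray(observed_bias, dtype=float)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_bias(coef, kmer):
    """Predicted bias for one k-mer from fitted coefficients."""
    return float(coef[0] + coef[1:] @ encode_kmer(kmer))
```

The point of the sketch is the parameter count: a 10-base sequence has more than a million possible variants, but this first-order encoding describes all of them with 31 numbers, which is what makes comparing variants and estimating bias tractable.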
“The complexity of DNA sequences increases exponentially when they get longer. They are difficult to model because a typical dataset consists of sequences from thousands to millions of cells,” said Shengen Shawn Hu, PhD, a research scientist in Zang’s lab and the lead author of this work. “But the simplex encoding model can give accurate estimates of sequence biases because of its beautiful mathematical property.”
Tests of the tool showed that it was significantly better at analyzing complex single-cell data to characterize different types of cells. This is important for both basic biology research and disease diagnosis, in which doctors must detect small numbers of disease cells within very large samples ranging from thousands to millions of cells.
“It was not easy to find the biases because they were confounded with the real signals and hidden in the big data,” said Zang, who has co-led several other single-cell genomics studies, most recently on coronary artery disease and gut development. “This may not be a big deal if people are only going to pick the strongest signals from a large number of cells. But when you look at single-cell data, there is no longer any low-hanging fruit. Signals are always weak at the individual cell level, and the effects of noise and biases can be devastating. Bias correction is often overlooked but may be important in single-cell data analysis.”
To make their new tool widely available, the researchers have created free, open-source software and posted it online. The software can be found at https://github.com/zang-lab/SELMA and https://doi.org/10.5281/zenodo.7048767.
“We hope this tool can benefit the biomedical research community in studying chromatin biology and genomics, and ultimately help in disease research,” said Zang. “It’s always exciting to see our colleagues use the tools we’ve developed to make important scientific discoveries in their own research.”
The researchers have published their findings in the scientific journal Nature Communications. (The article is open access, meaning it is free to read.) The team includes Shengen Shawn Hu, Lin Liu, Qi Li, Wenjing Ma, Michael J. Guertin, Clifford A. Meyer, Ke Deng, Tingting Zhang and Chongzhi Zang.
Zang is part of UVA’s Departments of Public Health Sciences, Biochemistry and Molecular Genetics, and Biomedical Engineering. The Department of Biomedical Engineering is a collaboration of UVA’s School of Medicine and School of Engineering.
The work was supported by the National Institutes of Health, grants R35GM133712, K22CA204439 and R35GM128635; National Science Foundation, grant NSF-796 2048991; University of Pittsburgh Center for Research Computing; UVA Cancer Center; and NIH’s National Cancer Institute, Cancer Center Support Grant P30 CA44579.
University of Virginia Health System
Hu, S.S., et al. (2022) Intrinsic bias estimation for improved analysis of bulk and single-cell chromatin accessibility profiles using SELMA. Nature Communications. doi.org/10.1038/s41467-022-33194-z.