In a groundbreaking Nature paper, researchers have developed synthetic regulatory sequences that could prevent targeted gene therapies from having effects in unwanted cell types.
More than methylation
While methylation is the best-known regulator of gene expression, it isn’t the only thing that determines which genes are expressed and when. Cis-regulatory elements (CREs), so called because they sit near the DNA sequences they regulate, help drive expression of the genes that are specific to each cell type [1]. Although they are technically non-coding, as they do not directly code for functional proteins, CREs are critical to epigenomic function.
Manipulating existing CREs in engineered cells is one thing, but it’s not clear if the CREs generated by evolution will always be the ideal candidates for specific therapeutic applications in specific cell types. A decade ago, researchers began seriously asking if it might be possible to generate CREs to more precisely do what we want [2]. Having functional control of CREs would allow therapies to apply only to specific cell types, potentially offering massive improvements to gene therapies that aren’t yet good enough for clinical use [3].
However, the number of possible sequences that could fill a mere 200 base pairs of DNA is far larger than the number of atoms in the universe, so a brute-force computational search will not suffice to find CREs that work. A substantial amount of work has gone into this topic over the past decade, attempting to discover why CREs work the way they do and to develop a regulatory ‘grammar’ and a more complete understanding [4]. Very recently, researchers have designed CREs for use in Drosophila fruit flies [5].
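To put the search-space claim in perspective, here is a quick back-of-the-envelope calculation. It assumes four possible bases at each of 200 positions, and it uses the commonly quoted rough estimate of 10^80 atoms in the observable universe; that estimate is an assumption brought in for illustration, not a figure from the paper.

```python
import math

SEQ_LEN = 200        # base pairs, as stated in the text
ALPHABET_SIZE = 4    # A, C, G, T
ATOMS_EXPONENT = 80  # assumption: roughly 10^80 atoms in the observable universe


def log10_sequence_count(length: int, alphabet_size: int = ALPHABET_SIZE) -> float:
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet_size)


exponent = log10_sequence_count(SEQ_LEN)
print(f"4^{SEQ_LEN} is about 10^{exponent:.1f}")
print(f"That exceeds 10^{ATOMS_EXPONENT} by about {exponent - ATOMS_EXPONENT:.1f} orders of magnitude")
```

Even a search that tested a trillion sequences per second would barely dent a space of this size, which is why a guided, model-driven search is needed.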
However, fruit flies aren’t mice, let alone people, and it was unclear if this process could create sequences for use in cells that can be transplanted into larger animals. These researchers appear to have done it.
A new algorithm with real-world effects
Previous work focused on the downstream epigenetic effects of CREs, but these researchers used a massively parallel reporter assay (MPRA), a system that can accurately gauge the direct effects of any given CRE. To train their model, Malinois, the researchers used sequences derived from three cell lines: bone marrow cells, liver cells, and nerve cancer cells. Even without being directly informed as to their effects, Malinois was able to accurately predict the activity of more than sixty thousand existing, natural CREs. Its predictions of epigenetic behavior were in line with experimental results in all three cell types.
Malinois, however, is just a prediction algorithm. To actually generate new CRE sequences, the researchers developed Computational Optimization of DNA Activity (CODA), which can be used with multiple algorithms. Their intention was to develop sequences that have maximal effects on one of the three cell types and minimal effects on the other two.
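The design goal described above, high activity in one cell type and low activity in the other two, can be sketched as a simple scoring function. This is an illustrative assumption rather than the objective the researchers actually used: the function name, the cell-type labels, and the activity values are all hypothetical.

```python
from typing import Dict


def specificity_score(predicted: Dict[str, float], target: str) -> float:
    """Target-cell activity minus the highest off-target activity.

    `predicted` maps a cell-type label to a predicted activity value.
    A large positive score means the sequence is active in the target
    cell type and comparatively quiet everywhere else.
    """
    off_target = [value for cell, value in predicted.items() if cell != target]
    return predicted[target] - max(off_target)


# Hypothetical predictions for one candidate sequence (made-up numbers).
candidate = {"bone_marrow": 0.2, "liver": 3.1, "neural": -0.4}

liver_score = specificity_score(candidate, "liver")
neural_score = specificity_score(candidate, "neural")
print(f"liver-specific score:  {liver_score:.2f}")
print(f"neural-specific score: {neural_score:.2f}")
```

Subtracting the strongest off-target signal, rather than the average, means a sequence is only rewarded when it stays quiet in every unwanted cell type.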
At first, the algorithm was attracted to certain motifs, yielding 36,000 similar-looking sequences. However, after an algorithmic tweak to penalize re-use, CODA created 15,000 more synthetic sequences, which the researchers compared to 12,000 natural sequences identified primarily by location and 12,000 more natural sequences identified by Malinois.
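The idea of penalizing motif re-use can be illustrated with a toy search. The sketch below is not CODA or Malinois: the predictor is a stand-in that simply counts two arbitrary placeholder motifs, the optimizer is plain single-base hill climbing, and every name and parameter is hypothetical. It only shows how subtracting a penalty for an overused motif steers the search away from it.

```python
import random
from typing import Sequence

BASES = "ACGT"
# Arbitrary placeholder motifs for this toy; not motifs reported in the paper.
REWARDED_MOTIFS = ("GATA", "TGCA")


def toy_activity(seq: str) -> float:
    """Stand-in for a trained predictor: counts copies of the rewarded motifs."""
    return float(sum(seq.count(motif) for motif in REWARDED_MOTIFS))


def objective(seq: str, overused: Sequence[str], weight: float) -> float:
    """Toy activity minus a penalty for each copy of an overused motif."""
    penalty = weight * sum(seq.count(motif) for motif in overused)
    return toy_activity(seq) - penalty


def hill_climb(start: str, steps: int, overused: Sequence[str],
               weight: float, seed: int = 0) -> str:
    """Mutate one base at a time, keeping a change only if the objective does not drop."""
    rng = random.Random(seed)
    seq = list(start)
    best = objective(start, overused, weight)
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        old_base = seq[pos]
        seq[pos] = rng.choice(BASES)
        candidate = objective("".join(seq), overused, weight)
        if candidate >= best:
            best = candidate
        else:
            seq[pos] = old_base  # revert a mutation that made things worse
    return "".join(seq)


start = "GATA" * 3 + "ACGT" * 7
unpenalized = hill_climb(start, 500, overused=(), weight=0.0)
penalized = hill_climb(start, 500, overused=("GATA",), weight=2.0)
print("GATA copies without penalty:", unpenalized.count("GATA"))
print("GATA copies with penalty:   ", penalized.count("GATA"))
```

With the penalty switched on, each copy of the overused motif costs more than it earns, so the search can only keep or reduce the number of copies and must find its gains elsewhere.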
The location-based sequences were found to be less specific and have less effect than the Malinois-identified sequences, but the synthetic sequences were stronger still, having more specificity to each of the desired cell types. Even when CODA’s preferred motifs weren’t used, 92.4% of its generated sequences were still specific to a cell type, compared to 73.6% of the Malinois-identified sequences and only 40.6% of the location-identified sequences. Under far more stringent conditions for specificity, more than half of CODA’s sequences made the cut, while far fewer of the natural sequences did.
These synthetic sequences were found to be higher in useful content than the natural sequences. Rather than relying mostly on activation in the desired cell types, they also actively repressed expression in off-target types.
To confirm their cellular findings, the researchers injected living zebrafish and mouse embryos with gene therapies that used these synthetic CREs. The therapies were found to be specific to cell type in these living animals, both before and after birth.
This could represent a sea change for gene therapy researchers. There are always plenty of cell types in which expressing a gene therapy modification would be harmful. If therapies that use these synthetic sequences can prevent this from happening, it bodes well for a wide variety of potential therapies, including those that target age-related diseases.
Literature
[1] Donohue, L. K., Guo, M. G., Zhao, Y., Jung, N., Bussat, R. T., Kim, D. S., … & Khavari, P. A. (2022). A cis-regulatory lexicon of DNA motif combinations mediating cell-type-specific gene regulation. Cell Genomics, 2(11).
[2] Levo, M., & Segal, E. (2014). In pursuit of design principles of regulatory sequences. Nature Reviews Genetics, 15(7), 453-468.
[3] Deverman, B. E., Ravina, B. M., Bankiewicz, K. S., Paul, S. M., & Sah, D. W. (2018). Gene therapy for neurological disorders: progress and prospects. Nature Reviews Drug Discovery, 17(9), 641-659.
[4] Movva, R., Greenside, P., Marinov, G. K., Nair, S., Shrikumar, A., & Kundaje, A. (2019). Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays. PLoS One, 14(6), e0218073.
[5] de Almeida, B. P., Schaub, C., Pagani, M., Secchia, S., Furlong, E. E., & Stark, A. (2024). Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo. Nature, 626(7997), 207-211.