Design and structure of overlapping regions in PCA via deep learning
Polymerase cycling assembly (PCA) stands out as the predominant method in the synthesis of kilobase-length DNA fragments. The design of overlapping regions is the core factor affecting the success rate of synthesis. However, there still exists DNA sequences that are challenging to design and constru...
Saved in:
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
KeAi Communications Co., Ltd.
2025-06-01
|
Series: | Synthetic and Systems Biotechnology |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2405805X24001595 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Polymerase cycling assembly (PCA) stands out as the predominant method in the synthesis of kilobase-length DNA fragments. The design of overlapping regions is the core factor affecting the success rate of synthesis. However, there still exists DNA sequences that are challenging to design and construct in the genome synthesis. Here we proposed a deep learning model based on extensive synthesis data to discern latent sequence representations in overlapping regions with an AUPR of 0.805. Utilizing the model, we developed the SmartCut algorithm aimed at designing oligonucleotides and enhancing the success rate of PCA experiments. This algorithm was successfully applied to sequences with diverse synthesis constraints, 80.4 % of which were synthesized in a single round. We further discovered structure differences represented by major groove width, stagger, slide, and centroid distance between overlapping and non-overlapping regions, which elucidated the model's reasonableness through the lens of physical chemistry. This comprehensive approach facilitates streamlined and efficient investigations into the genome synthesis. |
---|---|
ISSN: | 2405-805X |