NJU-China

Pymaker: AI modified yeast promoter

Data is the biggest obstacle to applying artificial intelligence. In biotechnology, acquiring data requires a significant investment of manpower, time, and money. In practice, often only tens of thousands of data points are available, while high-performance AI requires at least a million. To bridge this gap, we adopted the "pre-training + fine-tuning" paradigm and applied it to promoters, a key component of synthetic biology. Our AI model learns the inherent features of promoter sequences well from limited experimental data. It can also generate promoter sequences with specific expression rates, enabling highly efficient optimization of yeast promoters. Furthermore, our model demonstrates that this transfer learning paradigm is highly scalable and can be extended to almost every aspect of synthetic biology, offering a realistic answer to the question of how to use AI in biology.
