Abstract:
The discovery of novel pesticide molecular structures is a primary driver for overcoming pest resistance and ensuring sustainable agricultural development. Traditional computer-aided drug design methodologies often rely on restricted chemical libraries and human expertise, which limits exploration of the wider chemical space. Although deep generative models have shown considerable potential in de novo design, existing frameworks frequently encounter structural instability, insufficient chemical diversity, and poor alignment with target-specific biological activity. This study aimed to develop a computational framework that integrates multi-scale structural representation with latent space optimization to generate pesticide candidates with high chemical rationality and potent bioactivity. To this end, a molecular generation model was developed that combines latent diffusion sampling with fragment-based heterogeneous graphs. A Heterogeneous Graph Neural Network (HGNN) was coupled with a Variational Autoencoder (VAE) to map atomic-level topology and fragment-level semantics into a unified latent space. Molecules were fragmented using the Breaking of Retrosynthetically Interesting Chemical Substructures (BRICS) algorithm to ensure chemical validity. The encoder used three Graph Convolutional Network (GCN) layers with a hidden dimension of 300 to process atomic, bond, pharmacophore, and reaction features. A Latent Diffusion Model (LDM) based on a one-dimensional U-Net architecture with six residual layers was then introduced to implement a "noising-denoising" mechanism through 1,000 training steps, alleviating mode collapse.
Finally, a prefix-tuning strategy was integrated into a four-head Transformer-based decoder to guide conditional generation toward specific targets, including insect acetylcholinesterase (AChE) and plant acetolactate synthase (ALS). Systematic evaluations demonstrated the superior performance of the proposed framework. In the AChE-targeted generation task, the generated molecules achieved a validity rate of 100.00%, a novelty rate of 100.00%, and a uniqueness rate of 98.50%. On the benchmark datasets QM9 and ZINC, the model attained New/Sample scores of 64.2% and 98.5%, respectively, outperforming baselines such as MolGPT and GeoBFN. Ablation studies confirmed that jointly modeling the atom-level and fragment-level views was essential for capturing fine-grained topology and high-order semantics. The distributions of physicochemical properties, including the logarithm of the partition coefficient (LogP), topological polar surface area (TPSA), and molecular weight (MW), were highly consistent with those of real-world pesticides. Molecular docking showed that 62.81% of the candidates had a predicted binding energy below -7.0 kcal/mol with the AChE protein (PDB: 6XYU). Furthermore, the generated molecules reproduced critical interaction patterns with essential residues, such as glutamic acid 485 and tyrosine 498, with hydrogen bond lengths ranging from 2.4 to 3.3 Å. Prefix-tuning required only 8,576 trainable parameters, which reduced training time and helped avoid overfitting. The proposed model thus integrated multi-scale representation and latent diffusion to enhance molecular diversity and novelty. The results indicated that the framework captured target-specific structure-activity relationships while maintaining high chemical rationality. This research provides a scalable tool for targeted bioactive molecule design, offering a new approach to overcoming data scarcity and accelerating the discovery of environmentally friendly agrochemicals.
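
The validity, uniqueness, and novelty rates reported above are not defined in the abstract. As a minimal sketch, assuming the definitions commonly used in molecular generation benchmarks (validity over all generated samples, uniqueness over valid samples, novelty over unique valid samples not found in the training set), they could be computed as follows. The function name `generation_metrics` and the `is_valid` callback are hypothetical; in practice the callback would wrap a cheminformatics parser and the strings would be canonical SMILES.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(
    generated: Iterable[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Return validity, uniqueness, and novelty as fractions in [0, 1].

    Assumed definitions (not taken from the paper):
      validity   = valid samples / generated samples
      uniqueness = distinct valid samples / valid samples
      novelty    = distinct valid samples absent from training / distinct valid samples
    """
    generated = list(generated)
    known = set(training_set)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - known
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


# Toy strings stand in for canonical SMILES; the validity check is a placeholder
# that only rejects empty strings.
samples = ["CCO", "CCO", "CCN", "c1ccccc1", ""]
train = ["CCO"]
metrics = generation_metrics(samples, train, is_valid=lambda s: len(s) > 0)
print(metrics)
```

Note that the three rates use different denominators under these assumed definitions, so a model can report 100% validity and novelty while uniqueness is lower, as in the AChE task above.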