Informatics and Drug Discovery | Department of Chemistry

This course and its associated computer lab cover bioinformatics and cheminformatics as applied to the search for new drugs with specific physiological effects (in silico drug discovery). Students will learn the general principles of structure-activity relationship modeling, docking and scoring, homology modeling, statistical learning methods, and advanced data analysis, and will gain familiarity with software for structure-based and ligand-based drug discovery. Some coding and scripting will be required. By the end of the course, each student is expected to present a completed piece of software of significant utility, an analysis of experimental data from the published literature, or both. Students will be encouraged to seek avenues for publishing their most significant results.


  1. Introduction
    • Drug Discovery in the Information-rich age
    • Introduction to Pattern recognition and Machine Learning
    • Supervised and unsupervised learning paradigms and examples
    • Potential applications of Machine Learning in Chem- and Bioinformatics
    • Introduction to Classification and Regression methods, and types of classification and regression
    • KNN and Linear Discriminant Analysis
  2. Representation of Chemical and Biochemical Structures
    • Sequence Descriptors
    • Text mining
    • Representations of Molecular Structures
    • Characterizing 2D structures with Descriptors and Fingerprints
    • Searching 2D Chemical Databases
    • Chemical File Formats and SMARTS
    • Topological Indices
    • Substructural Descriptors
    • Molecular Fingerprints
    • Physicochemical Descriptors
    • Descriptors from Biological Assays
    • Representation and characterization of 3D Molecular Structures
    • Calculation of Structure Descriptors
    • Pharmacophores
    • Molecular Interaction Field Based Models
    • Local Molecular Surface Property Descriptors
    • Quantum Chemical Descriptors
    • Shape Descriptors
    • Protein Shape Comparisons
    • 3D Motif Models
    • Representation of Chemical Reactions and Databases
  3. Analysis and Visualization
    • Molecular Similarity Analysis
    • Molecular Quantum Similarity Measures
    • Cluster and Diversity analysis
    • Network graphs from Molecular Similarity
    • 3D visualization tools
    • Self-Organized Maps
    • Semantic technologies and Linked Data
  4. Mapping Structure to Response: Predictive Modeling
    • Linear Free Energy Relationships
    • Quantitative Structure-Activity Relationships (QSAR) Modeling
    • Ligand-Based and Structure-Based Virtual High Throughput Screening
    • 3D Methods - Pharmacophore Modeling and alignment
    • ADMET Models
    • Activity Cliffs
    • Structure Based Methods, docking and scoring
    • Site Similarity Approaches and Chemogenomics
    • Model Domain of Applicability assessment
  5. Data Mining and Statistical Methods
    • Linear and Non-Linear Models
    • Feature selection
    • Partial Least-Squares Regression
    • Introduction to Neural Nets, Bayesian Methods and Kernel Methods
    • Support Vector Machines for classification and regression, with applications to chemo- and bioinformatics
    • Random Forest, Principal Component Analysis and SVD
    • Data preprocessing and different performance measures in Classification & Regression
    • Introduction to evolutionary computing
    • Deep Learning and Convolutional Neural Nets
    • Data Fusion
    • Model Validation
    • Interpretation of Statistical Models
    • Best Practices in Predictive Cheminformatics


  1. Johann Gasteiger, Thomas Engel, Chemoinformatics: A Textbook (Wiley-VCH, 2003)
  2. Jürgen Bajorath (Editor), Chemoinformatics and Computational Chemical Biology (Methods in Molecular Biology) (Humana Press, 2004)
  3. Andrew R. Leach, Valerie J. Gillet, An Introduction to Chemoinformatics