Visual and Interactive Descriptor Analysis


VIDEAN (Visual and Interactive DEscriptor ANalysis) is a software tool with the following features:

  • combines statistical methods with interactive visualizations to help choose a set of descriptors for predicting a target property.
  • domain expertise can be incorporated into the feature selection process through interactive visual exploration of the data, aided by statistical tools and metrics based on information theory.
  • coordinated visual representations capture the different relationships and interactions among descriptors, target properties and candidate subsets of descriptors.
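Information-theoretic metrics for descriptor selection are often built on mutual information between a descriptor and the target property. The exact metrics VIDEAN uses are not specified here, so the following is only a minimal sketch of that general idea, under stated assumptions: continuous descriptors are discretized by equal-width binning (an assumed choice), and the function names `discretize` and `mutual_information` are hypothetical, not part of VIDEAN.

```python
from collections import Counter
from math import log2


def discretize(values, bins):
    """Equal-width binning of a continuous descriptor into integer levels 0..bins-1.

    Assumption: equal-width binning is used here for illustration only.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant descriptor carries no information; put everything in one bin.
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information I(X;Y), in bits, between two discrete sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    px = Counter(x)
    py = Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        # Sum of p(a,b) * log2(p(a,b) / (p(a) * p(b))) over observed pairs.
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Hypothetical toy data: one descriptor and a two-class target (0/1).
descriptor = [0.1, 0.4, 0.35, 0.8, 0.9, 0.75]
target = [0, 0, 0, 1, 1, 1]
levels = discretize(descriptor, 2)  # [0, 0, 0, 1, 1, 1]
print(mutual_information(levels, target))  # 1.0
```

A descriptor whose binned levels perfectly separate the two classes reaches the maximum of 1 bit here, while a descriptor independent of the target scores 0; ranking candidates by such a score is one common way to pre-screen descriptors before visual inspection.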

Example 1

This dataset contains 122 volatile organic compounds with their logPliver values. This example has 17 descriptors that were selected as candidates by 4 different models.

M.J. Martínez, I. Ponzoni, M.F. Díaz, G.E. Vázquez, A.J. Soto: "Visual Analytics in Cheminformatics: User-Supervised Descriptor Selection for QSAR Methods". Journal of Cheminformatics, 7:39, 2015.

Example 2

This dataset has 77 high-molecular-weight polymers together with their elongation-at-break values. This example has 41 descriptors that were selected as candidates by 10 different models.

M.J. Martínez, I. Ponzoni, M.F. Díaz, G.E. Vázquez, A.J. Soto: "Visual Analytics in Cheminformatics: User-Supervised Descriptor Selection for QSAR Methods". Journal of Cheminformatics, 7:39, 2015.

Tensile Strength at Break

This dataset has 66 polymers together with their "tensile strength at break" values. This case study has 9 descriptors that were selected as candidates by 7 different models.

F. Cravero, M.J. Martinez, G.E. Vazquez, M.F. Díaz, I. Ponzoni: "Intelligent Systems for Predictive Modelling in Cheminformatics: QSPR Models for Material Design using Machine Learning and Visual Analytics Tools"

External data

Load VIDEAN with your own dataset!
If you do not know how to format your files, read this and download this.

Case Studies at CIB-CSIC


SR - HIA (Classification)
This dataset has 202 compounds together with their HIA classes: Absorb (0) and Not Absorb (1). This case study has 5 descriptors.

SR - HIA (Regression)
This dataset has 202 compounds together with their HIA values. This case study has 7 descriptors.

SR - BBB
This dataset has 108 compounds together with their logBB values. This case study has 7 descriptors.

SR - ee
This dataset has 282 compounds together with their ee values. This case study has 10 descriptors.

IC50 for BACE1 inhibitors - Model 3
This dataset has 369 compounds together with their IC50 values. This case study has 3 descriptors.

IC50 for BACE1 inhibitors - Model 30
This dataset has 369 compounds together with their IC50 classes: Low (0), Med (1) and High (2). This case study has 24 descriptors.

IC50 for BACE1 inhibitors - Model 12 (Classification)
This dataset has 369 compounds together with their IC50 classes: Low (0), Med (1) and High (2). This case study has 23 descriptors.

IC50 for BACE1 inhibitors - Model 12 (Regression)
This dataset has 369 compounds together with their IC50 values. This case study has 23 descriptors.

HIA (human intestinal absorption)
This dataset has 202 compounds together with their HIA values. This case study has 28 descriptors that were selected as candidates by 10 different models.

logBB (blood-brain barrier penetration)
This dataset has 108 compounds together with their logBB values. This case study has 34 descriptors that were selected as candidates by 10 different models.