Differential Reaction Fingerprint
Introduction
Reaction transformation information is valuable for reaction data mining: it supports reaction classification, reaction similarity search, and the retrieval of knowledge for synthesis design. Traditional reaction classification techniques use atom mapping to find the transformation between reactants and products. However, producing a correct atom mapping is difficult and time-consuming. The reaction vector is another type of reaction fingerprint. It has been used for clustering and similarity assessment of metabolic reactions and for de novo design of synthetically feasible molecules, but it is not easy to extract and store in a database. The differential reaction fingerprint (DRFP) is a practical strategy for encoding and representing the transformation of a reaction (also called the reaction center). It does not rely on atom mapping to distinguish reactants from reagents, and it is easy to extract and store in a database. This article introduces the differential reaction fingerprint and discusses its application to reaction data mining.
Definition of Differential Reaction Fingerprint
In general, the differential reaction fingerprint is defined as the difference between the chemical fingerprints of the reactants and those of the products. DRFP closely resembles the Chemaxon reaction fingerprint, which stores reactant and product features in addition to the reaction-center fingerprints. The DRFP described in this article can be considered a simplified representation of the Chemaxon reaction fingerprint. The figure below shows an example DRFP for a reaction, considering only one-bond neighborhoods. First, compute all one-bond-neighborhood circular fingerprints of the reactants and of the products. Next, remove the fingerprints common to both sides to obtain the reaction-center fingerprints. Finally, encode the reaction-center fingerprints with an arbitrary hash function that has a sufficiently low collision probability, and fold them into a fixed-length binary vector to generate the DRFP. In the Ferrocene toolkit the vector has 1024 or 2048 bits; the first half holds the reactant DRFP and the second half holds the product DRFP.
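The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the Ferrocene toolkit's implementation: molecules are hand-written heavy-atom graphs (hydrogens omitted), the helper names `atom_features`, `reaction_center`, `fold`, and `drfp` are hypothetical, features are compared as sets rather than counted, and SHA-256 stands in for the "arbitrary hash function".

```python
import hashlib

def atom_features(atoms, bonds):
    """One-bond-neighborhood features of one molecule (hypothetical encoding).

    atoms: list of element symbols; bonds: list of (i, j, bond_order).
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbors[i].append(f"{order}:{atoms[j]}")
        neighbors[j].append(f"{order}:{atoms[i]}")
    # Sort each environment so the feature string is order-independent.
    return {f"{atoms[i]}|{','.join(sorted(env))}" for i, env in neighbors.items()}

def reaction_center(reactants, products):
    """Return (reactant-only features, product-only features)."""
    r = set().union(*(atom_features(a, b) for a, b in reactants))
    p = set().union(*(atom_features(a, b) for a, b in products))
    return r - p, p - r

def fold(features, n_bits):
    """Hash each feature and set the corresponding bit in an n_bits vector."""
    bits = [0] * n_bits
    for feat in features:
        digest = hashlib.sha256(feat.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:8], "big") % n_bits] = 1
    return bits

def drfp(reactants, products, n_bits=1024):
    """First half encodes the reactant side, second half the product side."""
    r_only, p_only = reaction_center(reactants, products)
    half = n_bits // 2
    return fold(r_only, half) + fold(p_only, half)

# Esterification: acetic acid + methanol -> methyl acetate + water
acetic_acid = (["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
methanol = (["C", "O"], [(0, 1, 1)])
methyl_acetate = (["C", "C", "O", "O", "C"],
                  [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1)])
water = (["O"], [])

r_only, p_only = reaction_center([acetic_acid, methanol], [methyl_acetate, water])
fingerprint = drfp([acetic_acid, methanol], [methyl_acetate, water])
```

In this toy example the only reactant-side feature that survives is the singly bonded oxygen with one carbon neighbor, while the product side keeps the ester oxygen and the isolated water oxygen. Every other environment appears on both sides and is removed.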
Application
Reaction Classification
Reaction classification is very useful in chemical data mining, and many approaches have been applied to it. Most of these approaches are based on atom mapping, yet the atom-mapping problem is known to be NP-hard. DRFP is easy to calculate, does not need atom mapping, and still captures the reaction transformation information, which makes it a good candidate for reaction classification. Reymond et al. have applied it to reaction classification and achieved good performance using DRFP with a multilayer perceptron (MLP) classifier.
Reaction Search
Once reactions are classified, the classes can be used for reaction search. But if you do not want to classify reactions first, or you are not familiar with reaction classification algorithms, you can also use DRFP to run a reaction search directly. Because the DRFP contains the transformation information, two reactions with similar DRFPs are themselves similar and, with high probability, belong to the same reaction class.
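A direct search of this kind can be sketched as a ranking of stored fingerprints by similarity to a query fingerprint. The article does not name a similarity measure, so the Tanimoto coefficient used here is an assumption (it is a common choice for binary fingerprints). The function names, the reaction identifiers, and the 8-bit toy vectors are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention assumed here: two empty fingerprints score 0.0.
    return both / either if either else 0.0

def search(query, database, top_k=3):
    """Return the top_k (reaction_id, score) pairs, most similar first."""
    scored = [(tanimoto(query, fp), rid) for rid, fp in database.items()]
    # Sort by descending score, then by id so ties are deterministic.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(rid, score) for score, rid in scored[:top_k]]

# Toy database of precomputed DRFPs (8 bits only, for readability).
database = {
    "rxn_a": [1, 1, 0, 0, 1, 0, 0, 0],
    "rxn_b": [1, 1, 0, 0, 0, 0, 0, 1],
    "rxn_c": [0, 0, 1, 1, 0, 1, 0, 0],
}
query = [1, 1, 0, 0, 1, 0, 0, 0]
hits = search(query, database, top_k=2)
```

A linear scan like this is adequate for small collections. For a large reaction database one would more likely store the fingerprints in an index that supports similarity queries, which is outside the scope of this sketch.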