Leo He

Introduction

The Cahn–Ingold–Prelog (CIP) priority system is used to unambiguously assign the handedness of stereogenic units in organic compounds. The priority of attached ligands is established by applying the ‘Sequence Rules’. The system was created by three chemists, R.S. Cahn, C. Ingold, and V. Prelog; its key paper was published in 1966 and has been revised over the following decades. It is now incorporated into the International Union of Pure and Applied Chemistry (IUPAC) Nomenclature of Organic Chemistry (BB 2013). This post describes the CIP priority system in as much detail as possible.

Preliminary

Cahn-Ingold-Prelog (CIP) stereodescriptors

‘R’ and ‘S’, to designate the absolute configuration of tetracoordinate (quadriligant) chirality centers;
‘r’ and ‘s’, to designate the absolute configuration of pseudoasymmetric centers;
‘M’ and ‘P’, to specify the absolute configuration of an axial or planar entity using the helicity rule;
‘m’ and ‘p’, to specify the absolute configuration of a pseudoasymmetric entity using the helicity rule;
‘seqCis’ and ‘seqTrans’ (some software uses ‘z’ and ‘e’), to describe the configuration of enantiomorphic double bonds;
‘seqcis’ and ‘seqtrans’ (‘Z’ = ‘seqcis’ and ‘E’ = ‘seqtrans’), to describe ‘cis/trans’ isomers at diastereomorphic double bonds.

Capitalized CIP stereodescriptors are variant on reflection in a mirror (i.e. ‘R’ becomes ‘S’ and ‘S’ becomes ‘R’); lower case CIP stereodescriptors are invariant on reflection in a mirror (i.e. ‘r’ remains ‘r’ and ‘s’ remains ‘s’).
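The capitalization convention can be illustrated with a small lookup. This is only a sketch: the table `MIRROR` and the function `reflect` are names invented for this example, and the table simply encodes the statement above that capitalized descriptors swap on reflection while lower-case ones do not.

```python
# Hypothetical lookup: each reflection-variant (capitalized) descriptor
# maps to the descriptor of its mirror image.
MIRROR = {
    "R": "S", "S": "R",
    "M": "P", "P": "M",
    "seqCis": "seqTrans", "seqTrans": "seqCis",
}


def reflect(descriptor: str) -> str:
    """Return the descriptor after reflection in a mirror.

    Lower-case descriptors (r, s, m, p, seqcis, seqtrans) are not in the
    table, so they are returned unchanged.
    """
    return MIRROR.get(descriptor, descriptor)


print(reflect("R"), reflect("r"), reflect("seqCis"), reflect("seqcis"))
```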

The ‘E’ and ‘Z’ stereodescriptors have been classified as non-CIP stereodescriptors. The reason is that they do not distinguish geometrically diastereomorphic double bonds, whose descriptors are reflection invariant (‘common’ double bonds), from geometrically enantiomorphic double bonds, whose stereodescriptors are reflection variant. In the CIP system, reflection-variant descriptors are capitalized (for example ‘R’ and ‘S’) and reflection-invariant descriptors are lower case (for example ‘r’ and ‘s’). The fact that ‘E’ and ‘Z’ are capitalized is contrary to their reflection-invariant status. Hirschmann and Hanson proposed using the descriptors ‘seqcis’, ‘seqtrans’, ‘seqCis’, and ‘seqTrans’ as CIP descriptors.

Hierarchical digraphs

In order to establish the order of precedence of ligands in a stereogenic unit, the atoms of the stereogenic unit are rearranged in a hierarchical diagram, called a ‘digraph’ or ‘tree-graph’, representing the connectivity (topology) and make-up of atoms. A digraph originates from the core of the stereogenic unit and is developed by indicating the various branches representing ligands. A digraph must be established for each stereogenic unit, so several digraphs are generated when several stereogenic units are present in a molecule.

Digraph of stereocenter 3.

Double and triple bonds

If an atom is double-bonded or triple-bonded to another atom, the double or triple bond is split into two or three single bonds respectively, and duplicate atoms fill the extra positions. (C) and (N) are duplicate atom representations of the atoms at the other end of the double or triple bond.

Rings and ring systems

To assign CIP descriptors correctly, a cyclic molecule must be expanded into an acyclic digraph by traversing bonds along all possible paths starting at the stereocenter. When the traversal encounters an atom through which the current path has already passed, a duplicate atom is generated in order to keep the tree finite. A single atom of the original molecule may appear in many places (some as duplicates, some not) in the tree.
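The expansion of multiple bonds and rings into duplicate atoms can be sketched as a recursive traversal. This is a minimal illustration under stated assumptions, not a full implementation: the `Node` class, the `build_digraph` function, and the `{atom: {neighbour: bond_order}}` input format are all invented for this example, and implicit hydrogens and phantom atoms are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    atom: int                      # index of the atom in the molecule
    duplicate: bool = False        # True for duplicate atoms (no children here)
    children: list = field(default_factory=list)


def build_digraph(bonds, root):
    """Expand a molecular graph into an acyclic digraph rooted at `root`.

    `bonds` maps each atom index to {neighbour index: bond order}.
    """
    def expand(atom, parent, path):
        node = Node(atom)
        for neighbour, order in bonds[atom].items():
            # A double/triple bond contributes one/two duplicates of the partner.
            for _ in range(order - 1):
                node.children.append(Node(neighbour, duplicate=True))
            if neighbour == parent:
                continue  # do not walk straight back along the same bond
            if neighbour in path:
                # Ring closure: the path already passed through this atom.
                node.children.append(Node(neighbour, duplicate=True))
            else:
                node.children.append(expand(neighbour, atom, path | {neighbour}))
        return node

    return expand(root, None, {root})


# Cyclopropane skeleton: both branches end in a duplicate of the root atom.
cyclopropane = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1}}
tree = build_digraph(cyclopropane, 0)
```

In the cyclopropane example, each branch runs 0 → 1 → 2 → (0) or 0 → 2 → 1 → (0), where the parenthesized atom is the duplicate that stops the traversal.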

Mancude rings and ring systems

Mancude rings, i.e., rings or ring systems having the maximum number of noncumulative double bonds, are treated as Kekulé structures. For mancude heterocycles, each duplicate atom is given an atomic number that is the mean of what it would have if the double bonds were located at each of the possible positions. For mancude hydrocarbons, it is immaterial which Kekulé structure is used because ‘splitting’ the double bonds gives the same result in all cases. Without averaging the atomic number in Rule 1a, the two chemically equivalent Kekulé structures below give different descriptor assignments at the stereocenter.

‘C-1’ is doubly bonded to one or the other of the nitrogen atoms and never to carbon, so its added duplicate atom has an atomic number of 7 (that of nitrogen). ‘C-3’ is doubly bonded either to ‘C-4’ (atomic number 6) or to ‘N-2’ (atomic number 7), so its added duplicate atom has an atomic number of 6½, as does that of ‘C-8’. ‘C-4a’, however, may be doubly bonded to ‘C-4’, ‘C-5’ or ‘N-9’, so its added duplicate atom has an atomic number of 6⅓.
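The arithmetic behind these averaged atomic numbers can be checked with exact fractions. The function name `duplicate_atomic_number` is an assumption made for this sketch; it only takes the mean over the possible double-bond partners listed above.

```python
from fractions import Fraction


def duplicate_atomic_number(partner_atomic_numbers):
    """Mean atomic number over all possible double-bond partners."""
    return Fraction(sum(partner_atomic_numbers), len(partner_atomic_numbers))


c1 = duplicate_atomic_number([7, 7])      # either nitrogen      -> 7
c3 = duplicate_atomic_number([6, 7])      # C-4 or N-2           -> 13/2 = 6 1/2
c4a = duplicate_atomic_number([6, 6, 7])  # C-4, C-5 or N-9      -> 19/3 = 6 1/3

print(c1, c3, c4a)
```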

Exploration of a hierarchical digraph

Digraphs are constructed to show the ranking of atoms according to the topological distance i.e., number of bonds, from the core of the stereogenic unit (i.e., center) and their evaluation by the Sequence Rules.

Atoms lie in spheres and atoms of equal distance from the core of the stereogenic unit are in the same sphere; spheres are identified as I, II, III, and IV.
Atoms in the nth sphere have precedence over those in the (n + 1)th sphere.
The ranking of each atom in the nth sphere depends in the first place on the ranking of atoms of the same branch in (n - 1)th sphere, and then the application of the Sequence Rules to it.
Those atoms in the nth sphere which are of equal rank with respect to those in the (n − 1)th sphere in the same branch are ranked by means of the Sequence Rules, first by the exhaustive application of Sequence Rule 1; if no decision is reached, Sequence Rule 2 is exhaustively applied, and so on.
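As a rough illustration of how atoms fall into spheres, the sketch below groups the nodes of a small digraph by their distance from the core. The nested-tuple tree `(label, children)` and the function name `spheres` are assumptions made for this example.

```python
from collections import deque


def spheres(root):
    """Group node labels by depth in the digraph.

    `root` is a nested tuple (label, [children]). Index 0 of the result is
    the core of the stereogenic unit, index 1 is sphere I, index 2 is
    sphere II, and so on.
    """
    result = []
    queue = deque([(root, 0)])
    while queue:
        (label, children), depth = queue.popleft()
        if depth == len(result):
            result.append([])          # first node seen at this depth
        result[depth].append(label)
        queue.extend((child, depth + 1) for child in children)
    return result


# A center bearing OH, CH3, H and F.
center = ("C", [
    ("O", [("H", [])]),
    ("C", [("H", []), ("H", []), ("H", [])]),
    ("H", []),
    ("F", []),
])
print(spheres(center))
```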

Ranking of ligands: Application of the Sequence Rules

In general, ligands are ranked sphere by sphere and branch by branch in a breadth-first fashion; two ligands are then compared atom by atom, in the order of that ranking. The Sequence Rules are applied as follows:

each rule is applied in accordance with a hierarchical digraph;
each rule is applied exhaustively to all ligands being compared;
the ligand that is found to have precedence (priority) at the first occurrence of a difference in a digraph retains this precedence (priority) regardless of differences that occur later in the exploration of the digraph;
precedence (priority) of an atom in a group established by a rule does not change on application of a subsequent rule.
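The way rules are applied exhaustively, one after another, with the first difference deciding the outcome, can be sketched as follows. This is a deliberately simplified model: each ligand is a list of spheres, each sphere is compared as a sorted set of values, and the branch-by-branch hierarchy of a real digraph is ignored. The names `atom`, `rule_1a`, `rule_2`, and `compare_ligands` are invented for this example.

```python
def atom(z, mass):
    """Hypothetical node record: atomic number and mass."""
    return {"z": z, "mass": mass}


def rule_1a(node):
    return node["z"]


def rule_2(node):
    return node["mass"]


def compare_ligands(ligand_a, ligand_b, rules):
    """Return (winner, rule_index), or (None, None) if no rule decides.

    Each rule is applied to every sphere before the next rule is tried,
    and the first difference found is final.
    """
    for rule_index, key in enumerate(rules):
        for sphere_a, sphere_b in zip(ligand_a, ligand_b):
            keys_a = sorted((key(n) for n in sphere_a), reverse=True)
            keys_b = sorted((key(n) for n in sphere_b), reverse=True)
            if keys_a != keys_b:
                return ("a" if keys_a > keys_b else "b"), rule_index
    return None, None


# CH2OH versus CH(CH3)2: tie at the first atom, then (O, H, H) beats (C, C, H).
hydroxymethyl = [[atom(6, 12)], [atom(8, 16), atom(1, 1), atom(1, 1)]]
isopropyl = [[atom(6, 12)], [atom(6, 12), atom(6, 12), atom(1, 1)]]
print(compare_ligands(hydroxymethyl, isopropyl, [rule_1a, rule_2]))
```

Here the oxygen in the second sphere decides the comparison under the first rule, so nothing that might follow in later spheres or later rules is examined.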

Auxiliary descriptors

Temporary “auxiliary descriptors” are assigned solely on the basis of the digraph for the particular stereogenic unit in question, and may or may not be the “final” descriptors ultimately used to describe those centers. The example below shows a case where only a minority of the auxiliary descriptors are the same as the final descriptors for the corresponding atoms.

It is important to note that full digraphs are necessary for the analysis of all stereogenic units. Descriptors specified in digraphs may correspond to the final descriptors or to temporary (auxiliary) descriptors used only for ranking ligands and never appearing as final descriptors.

Generation of auxiliary descriptors must start from the highest sphere and proceed toward the root. In this way, all auxiliary descriptors in higher spheres than the one being determined are already assigned. This is sufficient because the descriptor for an auxiliary center does not depend upon any descriptor between it and the root: the ligand leading back to the digraph root is always ranked by Rule 1a, with no need to consider auxiliary centers. This follows from the fact that auxiliary centers are always offset from the root of a digraph, so the path back to the root is always unique in connectivity and atomic numbers.
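The processing order described here, outermost sphere first, can be sketched as a sort by depth. The tree format `(label, children)` and the function name `auxiliary_labelling_order` are assumptions for this example; the sketch only produces the order in which nodes would be visited and does not assign any descriptor.

```python
def auxiliary_labelling_order(tree):
    """Return (sphere, label) pairs, outermost sphere first.

    `tree` is a nested tuple (label, [children]). The root is skipped
    because it receives the final descriptor, not an auxiliary one.
    """
    visited = []
    stack = [(tree, 0)]
    while stack:
        (label, children), sphere = stack.pop()
        if sphere > 0:
            visited.append((sphere, label))
        stack.extend((child, sphere + 1) for child in children)
    # Highest sphere first, so everything beyond a node is already labelled.
    return sorted(visited, key=lambda item: item[0], reverse=True)


digraph = ("root", [("a", [("c", [])]), ("b", [])])
print(auxiliary_labelling_order(digraph))
```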

Pseudoasymmetry

Stereogenic units (center, axis or plane) are called pseudoasymmetric when they have distinguishable ligands ‘a’, ‘b’, ‘c’, ‘d’, two and only two of which are nonsuperposable mirror images of each other (enantiomorphic). The mirror image of a pseudoasymmetric center is superposable on the original. The enantiomorphic ligands are represented by ‘╒’ and ‘╕’, as designated by Prelog and Helmchen. The ‘r/s’ and ‘m/p’ stereodescriptors describing a pseudoasymmetric stereogenic unit are invariant on reflection in a mirror (for example ‘r’ remains ‘r’, and ‘s’ remains ‘s’), but are reversed by the exchange of any two ligands (‘r’ becomes ‘s’, and ‘s’ becomes ‘r’). Lower-case stereodescriptors are used to describe pseudoasymmetric stereogenic units. A pseudoasymmetric descriptor can be assigned only when Rule 5 has been used, because Rule 5 does the final check for enantiomorphic ligands.

Sequence rules

Rule 1a

Higher atomic number precedes lower.
Rule 1a is simple to understand except for one special case (mancude rings and ring systems): when a duplicate atom is involved in multiple resonance structures, its average atomic number should be used. Unfortunately, averaging the atomic number is a difficult procedure to describe, and there is no exact definition of how to apply it. BB 2013 mentions only a few simple examples, such as benzene, pyridine, and the cyclopentadienyl anion. Without averaging the atomic number in Rule 1a, the two chemically equivalent Kekulé structures below give different assignments at the stereocenter.

Rule 1b

A duplicate atom node whose corresponding nonduplicated atom node is the root or is closer to the root ranks higher than a duplicate atom node whose corresponding nonduplicated atom node is farther from the root.
Rule 1b is not sufficient. Although Rule 1b was designed to solve a problem with ring-closure duplicate nodes, the rule as stated also applies to multiple-bond duplicate nodes and to Kekulé structures. To avoid this problem, a revised Rule 1b assigns to a multiple-bond duplicate node the distance to the root of its corresponding attached atom, not of its corresponding duplicated atom.
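The difference between Rule 1b as stated and the revision can be made concrete with a small sketch. Everything here is hypothetical naming: `DuplicateNode` records the sphere of the atom being duplicated, the sphere of the atom the duplicate is attached to, and whether the duplicate comes from a ring closure.

```python
from dataclasses import dataclass


@dataclass
class DuplicateNode:
    original_sphere: int   # sphere of the nonduplicated atom it copies
    attached_sphere: int   # sphere of the atom the duplicate hangs from
    ring_closure: bool     # False for multiple-bond duplicates


def rule_1b_distance(node, revised=True):
    """Distance to the root used for ranking a duplicate node."""
    if revised and not node.ring_closure:
        # Revision: multiple-bond duplicates use the attached atom.
        return node.attached_sphere
    return node.original_sphere


def rule_1b_precedes(a, b, revised=True):
    """True if duplicate `a` ranks above duplicate `b` (closer to the root)."""
    return rule_1b_distance(a, revised) < rule_1b_distance(b, revised)


# Duplicate O on a C=O whose carbon sits in sphere 2 and oxygen in sphere 3.
carbonyl_duplicate = DuplicateNode(original_sphere=3, attached_sphere=2,
                                   ring_closure=False)
print(rule_1b_distance(carbonyl_duplicate, revised=False),
      rule_1b_distance(carbonyl_duplicate, revised=True))
```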

Rule 2

Higher atomic mass number precedes lower.
Rule 2 is also not sufficient. It fails when one atom has an isotope indicated and another does not, and (again) when several alternative Kekulé structures are involved. The problem is that a “mass number” is always an integer, the sum of the numbers of protons and neutrons in the nucleus, so no mass number can be calculated for an element of natural isotopic composition. One way to deal with this issue is to replace the term “mass number” with “atomic mass”. BB 2013 also uses atomic mass to rank ligands: in an example in Section P-92.3, I precedes ¹²⁵I.
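A sketch of the “atomic mass” reading of Rule 2: an atom with no isotope indicated is given the standard atomic weight of its element, while a labelled isotope is approximated by its mass number. Both the function name `rule_2_key` and the use of the mass number as a stand-in for the isotopic mass are assumptions of this example; the table holds rounded standard atomic weights for three elements only.

```python
# Rounded standard atomic weights (natural isotopic composition).
STANDARD_ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "I": 126.904}


def rule_2_key(element, mass_number=None):
    """Mass used for ranking under Rule 2; the higher value precedes."""
    if mass_number is not None:
        return float(mass_number)      # labelled isotope (approximation)
    return STANDARD_ATOMIC_WEIGHT[element]


# Natural iodine outranks iodine-125; deuterium outranks natural hydrogen.
print(rule_2_key("I") > rule_2_key("I", 125))
print(rule_2_key("H", 2) > rule_2_key("H"))
```

With this reading, natural carbon falls between ¹²C and ¹³C, which an integer mass number cannot express.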

Rule 3

When considering double bonds and planar tetraligand atoms ‘seqcis’ = ‘Z’ precedes ‘seqtrans’ = ‘E’ and this precedes nonstereogenic double bonds.
The descriptors ‘E’ and ‘Z’ are used to describe ‘cis/trans-isomers’ at diastereomorphic double bonds. The application of Rule 3 leads to the specification of the configuration of compounds containing sets of ‘cis’ and ‘trans’ double bonds when the direct application of Sequence Rules 1 or 2 does not permit a conclusion to be reached. Auxiliary stereodescriptors are used when direct assignment of configuration cannot be made to double bonds, so before applying Rule 3 all auxiliary descriptors should be labeled. Placement of Rule 3 before Rule 4a ensures that only enantiomorphic (seqCis and seqTrans) comparisons involving double-bonds and cumulenes with an odd number of double bonds are left to consider in Rules 4 and 5.

Rule 4a

Chiral stereogenic units precede pseudoasymmetric stereogenic units and these precede nonstereogenic units.
Rule 4a states that (R or S) > (r or s), (M or P) > (m or p), and (seqCis or seqTrans) > (seqcis or seqtrans), and that all of these have higher priority than digraph nodes with no auxiliary descriptor. The purpose of Rule 4a is to ensure that all comparisons in Rule 4b and later are of the same general type: R vs. S, M vs. P, or seqCis vs. seqTrans in Rule 4b; r vs. s or m vs. p in Rule 4c; R vs. S or M vs. P in Rule 5. In addition, application of Rule 4a guarantees that the lists of ranked descriptors that are being compared in Rule 4b are of equal length.
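The three tiers of Rule 4a can be written as a simple ranking function. The set names and `rule_4a_rank` are invented for this sketch; a node with no auxiliary descriptor is represented by `None`.

```python
CHIRAL = {"R", "S", "M", "P", "seqCis", "seqTrans"}
PSEUDOASYMMETRIC = {"r", "s", "m", "p", "seqcis", "seqtrans"}


def rule_4a_rank(descriptor):
    """Higher value means higher priority under Rule 4a."""
    if descriptor in CHIRAL:
        return 2
    if descriptor in PSEUDOASYMMETRIC:
        return 1
    return 0  # nonstereogenic node, no auxiliary descriptor


nodes = [None, "s", "R", "seqcis", "P"]
print(sorted(nodes, key=rule_4a_rank, reverse=True))
```

Rule 4a does not separate R from S or r from s; that is left to the later rules.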

Rule 4b

When two ligands have different descriptor pairs, the one with the first-chosen like descriptor pair has priority over the one with a corresponding unlike descriptor pair.

Like descriptor pairs are: ‘RR’, ‘SS’
Unlike descriptor pairs are: ‘RS’, ‘SR’

Rule 4b is by far the most difficult rule to comprehend and implement. A new methodology has recently been described by Mata and Lobo to replace that described by Prelog and Helmchen. The rule for pairing stereodescriptors is as follows. A reference descriptor for chirality centers, identified as R or S, is chosen in each ligand. It is not associated with any node of the digraph and is designated here with a bold font; for the purpose of processing Rule 4b, any of R, M, or seqCis can be treated as R. The reference descriptor is:

the one associated with the highest rank node corresponding to a chiral unit in the ligand;
the one that occurs the most in the set of equivalent highest rank nodes; or
sequentially both descriptors (R and S), if these occur in the same number in the set of equivalent highest ranked nodes:
(i) If the number of reference descriptors is different in both ligands then the ligand with one reference descriptor has priority over the ligand with two reference descriptors;
(ii) If both ligands have the same number of reference descriptors, then the reference descriptor is paired with each one of the descriptors, identified as R or S, associated with nodes corresponding to chiral units, respecting their connectivity and hierarchy in the digraph.

In this way, all discussion can be expressed in terms of “equal” or “not equal” to a reference R or S, rather than “like” vs. “unlike”. When assigning seqCis/seqTrans and M/P auxiliary descriptors, which involve multiple atoms, it is critical that an implementation assign those descriptors to the node that is closest to the root. Otherwise the second phase of Rule 4b may fail.
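The choice of reference descriptor and the resulting like/unlike pairing can be sketched as below. This covers only the counting step for one set of equivalent highest-ranked nodes, not the re-ranking of the digraph. The names `NORMALIZE`, `reference_descriptors`, `like_unlike`, and `rule_4b_precedes` are assumptions of this example, as is the mapping of S, P, and seqTrans onto S as the counterpart of the R, M, seqCis mapping mentioned above.

```python
from collections import Counter

# R, M and seqCis are treated as R; their counterparts are treated as S.
NORMALIZE = {"R": "R", "M": "R", "seqCis": "R",
             "S": "S", "P": "S", "seqTrans": "S"}


def reference_descriptors(highest_ranked):
    """Pick the reference descriptor(s) from the set of equivalent
    highest-ranked chiral nodes: the majority one, or both on a tie."""
    counts = Counter(NORMALIZE[d] for d in highest_ranked)
    if counts["R"] > counts["S"]:
        return ["R"]
    if counts["S"] > counts["R"]:
        return ["S"]
    return ["R", "S"]


def like_unlike(reference, descriptors):
    """Pair each descriptor, in rank order, with the reference."""
    return ["like" if NORMALIZE[d] == reference else "unlike"
            for d in descriptors]


def rule_4b_precedes(pairs_a, pairs_b):
    """True if ligand a wins, False if b wins, None if undecided."""
    for pair_a, pair_b in zip(pairs_a, pairs_b):
        if pair_a != pair_b:
            return pair_a == "like"   # like precedes unlike
    return None


print(reference_descriptors(["S", "S", "R"]))
print(like_unlike("S", ["S", "S", "R"]))
```

Expressing each pair as “equal” or “not equal” to the reference is what lets the final comparison reduce to a scan for the first position where the two lists differ.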

Applying Rule 4b is more complex than applying the previous rules, and it can be divided into three steps. First, the ligands are ranked by Rules 1 to 4a and a reference descriptor is chosen for each ligand. Second, the nodes are re-ranked using the reference descriptors, in a way that may cross digraph branches, and the hierarchy used in the comparison of the pairs of descriptors is established. Third, the nodes are scanned in rank order for similarity between their auxiliary descriptors and the reference descriptors.

The following example illustrates criterion 2; it is more complex than the case above. For branch A, three child nodes of 4 in sphere II are equivalent; two of them are ‘S’ and the other is ‘R’. Thus, the reference descriptor is ‘S’: it is the one that occurs most in the set of equivalent highest ranked nodes. Similarly, in the right branch the reference descriptor is ‘S’. The hierarchy used in the comparison of the pairs of descriptors is established as follows. After reordering the three nodes in sphere II of the digraph (nodes bonded to ‘C-4’ in the left branch and to ‘C-6’ in the right branch), the nodes are no longer equivalent. Those that form like pairs have precedence over the one that forms an unlike pair. Similarly, the ranking in branch B gives precedence to like pairs.

The reordering of the digraph is always required when applying the Sequence Rules. Before comparison according to Sequence Rule 4b, partial digraphs 1, 2, and 3 below (nodes at top of the digraph are equivalent or higher ranked than those nodes closer to the bottom of the digraph) are all valid to represent branch A. However, after comparison of sphere II only digraph 1 represents the hierarchy of the nodes.

Rule 4c

‘r’ precedes ‘s’ and ‘m’ precedes ‘p’.
If Rule 4b does not decide the ranking of all ligands of a stereogenic unit, only three possibilities remain: (1) there is no ligand chirality; (2) two or more ligands have identical chirality descriptors; or (3) two ligands each have sub-branches with opposite chirality. Rule 4c takes care of case (3), where r is ranked over s and m over p.

Rule 5

An atom or group with descriptor ‘R’, ‘M’, or ‘seqCis’ has priority over its enantiomorph ‘S’, ‘P’ or ‘seqTrans’.
Rule 5 does a final check for enantiomorphic ligands. If all ligands are finally distinguished after application of Rule 5, an additional test should be done to count the number of pairs of enantiomorphic ligands. The final descriptor will be r/s, m/p, or, in the case of alkenes, seqCis/seqTrans, if and only if this number is one; otherwise it will be R/S, M/P, or seqcis/seqtrans (Z/E).
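The final test on the number of enantiomorphic ligand pairs can be sketched as a small function. The name `final_descriptor` and the plain `"cis"`/`"trans"` inputs for double bonds are assumptions of this example; the case handling simply follows the statement above.

```python
def final_descriptor(raw, enantiomorphic_pairs):
    """Choose the final descriptor once Rule 5 has distinguished all ligands.

    `raw` is 'R', 'S', 'M' or 'P' for centers, axes and planes, or the
    hypothetical values 'cis'/'trans' for a double bond.
    """
    exactly_one_pair = enantiomorphic_pairs == 1
    if raw in ("R", "S", "M", "P"):
        # One enantiomorphic pair: pseudoasymmetric, lower-case descriptor.
        return raw.lower() if exactly_one_pair else raw
    if raw in ("cis", "trans"):
        if exactly_one_pair:
            return "seqCis" if raw == "cis" else "seqTrans"
        return "seqcis" if raw == "cis" else "seqtrans"
    raise ValueError(f"unexpected descriptor: {raw!r}")


print(final_descriptor("R", 1), final_descriptor("R", 0),
      final_descriptor("cis", 1), final_descriptor("cis", 0))
```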

A simple way to apply Rule 5 is to use both R and S as reference descriptors and to compare the like/unlike sequences obtained with the R reference and with the S reference respectively. If an odd number of pairs reverses priority, the two ligands of the pair are enantiomorphic; otherwise they are diastereomorphic. The lists generated for Rule 4b cannot be reused directly in Rule 5; new pair lists must be generated, because priorities may have changed after application of Rule 4c.

The example above is a more complex case for Rule 5. The pair lists are equal in Rule 4b, so the ligands cannot be distinguished by Rule 4b. After application of Rule 4c, the priorities of the ligands change and new lists are generated for Rule 5. The right-hand ligand has higher priority for both the R reference and the S reference; therefore the two ligands are diastereomorphic, the stereogenic unit is asymmetric, and the final descriptor is S, not s.

Rule 6 (proposed)

An undifferentiated reference node has priority over any other undifferentiated node
Early in the development of the CIP system, the key paper (from which the original idea of Rule 6 also comes; see reference 4) mentioned that compounds with C₂, D₂, C₃, and S₄ symmetry need additional consideration when assigning stereodescriptors. However, there is no standard CIP rule to handle these compounds; BB 2013 mentions only simple spiro structures. Rule 6 was therefore proposed by Hanson et al. in 2018, and it can handle all of these cases.

After application of Rule 5, if there are two or three or four pairs of identical ligands, Rule 6 can be applied. The solution for all such cases is simply to select one node of any one of the undistinguished ligands for promotion to higher rank. Basically, by arbitrarily breaking the symmetry in this way, the problem is immediately resolved upon inspection of the digraph.

Below are two examples of the analysis of compounds by Rule 6, which sets Node 1 to be of higher priority than Node 2. This single change also decides the priority 3 > 4, due to the presence of a ring connection from Node 3 back to Node 1 and from Node 4 back to Node 2.

After application of Rule 6, there are two possibilities: (a) There are still two undistinguished ligands. Such will be the case, for example, with simple acyclic compounds, such as CH2Cl2 or CHCl3. The center remains without a descriptor. (b) All ligands are distinguished. The center receives a descriptor. Such will be the case only for compounds that have rings that involve the root atom and three or more ligands. A full application of Rule 6 tests all possible promotions, though this is necessary only for certain symmetries. Any matching R and S pairs are ignored; if a descriptor remains, it is valid.

Incompleteness of CIP priority system

Although many efforts have been made to revise the CIP priority system, it still cannot capture all stereochemical differences in some special cases. The reconstruction problem affects the use of CIP descriptors for unique naming. The distributions of CIP descriptor labels in the two compounds below are identical although the molecules have different configurations. For homo-substituted cycloalkanes, the number of such ambiguities usually increases with ring size, starting at size eight.

Software

Many software packages can assign the CIP descriptors of chiral compounds, such as ACD/ChemSketch, MarvinSketch, ChemDraw, Biovia Draw, RDKit, Indigo, and Centres. John Mayfield et al. have compared these packages; more details can be found in reference 3. A full implementation of the rules is also included in Ferrocene. The examples shown here were generated by Ferrocene's CIP detection; anyone interested can try it on this page.


Cahn-Ingold-Prelog (CIP) Priority System