chemfp.jcmapper_types module

Generate fingerprints using jCompoundMapper

This interface requires a local Java installation, one of the jCompoundMapper JAR files, the CDK jar, and the JPype package for bridging between Python and Java.

How to get started with jCompoundMapper

1) Get either jCMapperCLI.jar or jCMapperLibOnly.jar from https://jcompoundmapper.sourceforge.net/ . Alternatively, a copy of jCMapperCLI.jar is available by unzipping jCMapperCLI.zip from https://github.com/dahvida/NP_Fingerprints/tree/main/Scripts/FP_calc .

2) Download a recent version of the CDK from https://cdk.github.io/ .

3) Specify both JAR locations on your CLASSPATH, with the CDK jar before the jCMapperCLI or jCMapperLibOnly jar. This is important because jCMapperCLI includes an old version of the CDK, which chemfp does not support. By placing the CDK jar first, the newer CDK is used instead of the older one. For example, I use the CLASSPATH:

$HOME/jars/cdk-2.9.jar:$HOME/jars/jCMapperCLI.jar
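As a minimal sketch of the ordering rule, the following Python builds a CLASSPATH value with the CDK jar first. The helper name build_classpath and the jar locations are illustrative assumptions, not part of chemfp.

```python
import os

def build_classpath(cdk_jar, jcmapper_jar, sep=os.pathsep):
    """Return a CLASSPATH string with the CDK jar ahead of the jCompoundMapper jar."""
    # Order matters: the JVM searches CLASSPATH entries from left to right,
    # so the newer CDK classes are found before the ones bundled in jCMapperCLI
    return sep.join([cdk_jar, jcmapper_jar])

home = os.path.expanduser("~")
classpath = build_classpath(
    os.path.join(home, "jars", "cdk-2.9.jar"),      # hypothetical location
    os.path.join(home, "jars", "jCMapperCLI.jar"),  # hypothetical location
)
# To use it from Python, set it before anything starts the JVM:
# os.environ["CLASSPATH"] = classpath
print(classpath)
```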

jCompoundMapper depends on CDK’s old AtomContainer implementation, which is no longer the default, but can be enabled by starting the JVM with the “-DCdkUseLegacyAtomContainer=t” flag before loading the CDK. Unfortunately, CDK’s own fingerprint types do not work with the old AtomContainer implementation, which makes it impossible to use both the CDK and jCompoundMapper fingerprint types at the same time.

4) Install JPype, which is a Python bridge to the JVM. See https://jpype.readthedocs.io/en/latest/ for details. It’s what chemfp uses to work with both the CDK and jCompoundMapper. If you use pip you can install it with:

pip install JPype1

On the chemfp side, when it needs the CDK, and if the JVM isn’t already running, it first checks the CLASSPATH. If either ‘jCMapperCLI.jar’ or ‘jCMapperLibOnly.jar’ is present, it sets the backwards-compatibility flag before starting the JVM. This will cause the CDK to print the following warning to stderr:

[WARN] Using the old AtomContainer implementation.
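The CLASSPATH check described above can be sketched roughly as follows. This is not chemfp’s actual code: the function name needs_legacy_atom_container and the choice to match on the jar filename are assumptions made for illustration.

```python
import os

# The two jar filenames named in the text above
JCMAPPER_JARS = ("jCMapperCLI.jar", "jCMapperLibOnly.jar")

def needs_legacy_atom_container(classpath, sep=os.pathsep):
    """Return True if any CLASSPATH entry is one of the jCompoundMapper jars."""
    for entry in classpath.split(sep):
        # Compare only the filename so the jar's directory does not matter
        if os.path.basename(entry) in JCMAPPER_JARS:
            return True
    return False

print(needs_legacy_atom_container(os.environ.get("CLASSPATH", "")))
```

If the function returns True, the legacy AtomContainer flag would be needed and the CDK’s own fingerprint types would not be usable in the same process.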

The jCompoundMapper fingerprint types

The jCompoundMapper fingerprint types are available in cdk2fps using the --type option or, indirectly, with the --using flag, which gets the type string from the “#type=” header of the fingerprint file. They are also available from the Python API through chemfp.get_fingerprint_type().

The supported fingerprint types and default type strings are:

  • Depth-First Search (DFS) - “jCMapper-DFS hashsize=4096 searchDepth=7 atomLabel=ELEMENT_NEIGHBOR”

  • All Shortest Paths (ASP) - “jCMapper-ASP hashsize=4096 searchDepth=8 atomLabel=ELEMENT_NEIGHBOR”

  • Local Path Environments (LSTAR) - “jCMapper-LSTAR hashsize=4096 searchDepth=6 atomLabel=ELEMENT_NEIGHBOR”

  • Topological Molprint-like fingerprints (RAD2D) - “jCMapper-RAD2D hashsize=4096 searchDepth=3 atomLabel=ELEMENT_SYMBOL”

  • 2-point topological pharmacophore pairs (PH2) - “jCMapper-PH2 hashsize=4096 searchDepth=8”

  • 3-point topological pharmacophore triples (PH3) - “jCMapper-PH3 hashsize=4096 searchDepth=5”

  • 2-point topological atom type pairs (AP2D) - “jCMapper-AP2D hashsize=4096 searchDepth=8 atomLabel=ELEMENT_NEIGHBOR”

  • 3-point topological atom type triplets (AT2D) - “jCMapper-AT2D hashsize=4096 searchDepth=5 atomLabel=ELEMENT_NEIGHBOR”

The generated fingerprint is hashsize bits long, where hashsize must be a positive integer. Most fingerprint types take a searchDepth, which must be a non-negative integer. It specifies the maximum path length, circular environment radius, or shell radius to consider.
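To make the type string layout and the integer constraints concrete, here is a small sketch that splits a type string into its name and parameters and then checks hashsize and searchDepth. The helpers parse_type_string and check_sizes are hypothetical and much simpler than what chemfp does internally.

```python
def parse_type_string(type_string):
    """Split 'name key=value ...' into (name, {key: value})."""
    name, *terms = type_string.split()
    params = {}
    for term in terms:
        key, _, value = term.partition("=")
        params[key] = value
    return name, params

def check_sizes(params):
    """Apply the integer constraints described above; raise ValueError on failure."""
    hashsize = int(params["hashsize"])
    if hashsize <= 0:
        raise ValueError("hashsize must be a positive integer")
    search_depth = int(params["searchDepth"])
    if search_depth < 0:
        raise ValueError("searchDepth must be a non-negative integer")
    return hashsize, search_depth

name, params = parse_type_string("jCMapper-PH2 hashsize=4096 searchDepth=8")
print(name, check_sizes(params))  # jCMapper-PH2 (4096, 8)
```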

Most of the fingerprint types support alternative ways to assign a label to a given atom type, based on different atom and extended atom properties, which in turn affects fingerprint generation. The supported atomLabel methods are:

  • “CDK_ATOM_TYPES”: CDK atom types (eg, ‘C.sp2’, ‘O.minus’)

  • “ELEMENT_SYMBOL”: element symbol (eg, ‘C’, ‘O’)

  • “ELEMENT_NEIGHBOR”: element and number of heavy atom neighbors (eg, ‘C.2’)

  • “ELEMENT_NEIGHBOR_RING”: element, ring type, and number of heavy atom neighbors (eg, ‘C.a.2’)

  • “DAYLIGHT_INVARIANT”: “Atomic number, number of heavy atom neighbors, valency minus the number of connected hydrogens, atomic mass, atomic charge, number of connected hydrogens” (eg, ‘6.2.3.12.0.1’ for a carbon in a benzene ring)

  • “DAYLIGHT_INVARIANT_RING”: DAYLIGHT_INVARIANT followed by a flag if the atom is in a ring (eg, ‘6.2.3.12.0.1.1’)
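The label formats in the list above can be reproduced with a few lines of string formatting. This sketch only shows how the example labels are put together from their fields. The function names are made up for illustration, and the value used for a non-ring atom in the ring variant is an assumption, since only the ring case is shown above.

```python
def element_neighbor(symbol, heavy_neighbors):
    """ELEMENT_NEIGHBOR style label, e.g. 'C.2'."""
    return f"{symbol}.{heavy_neighbors}"

def daylight_invariant(atomic_number, heavy_neighbors, valency_minus_h,
                       mass, charge, h_count, in_ring=None):
    """Join the six DAYLIGHT_INVARIANT fields with '.'; add a ring flag if given."""
    fields = [atomic_number, heavy_neighbors, valency_minus_h, mass, charge, h_count]
    if in_ring is not None:
        # Assumption: the flag is 1 for ring atoms and 0 otherwise
        fields.append(1 if in_ring else 0)
    return ".".join(str(field) for field in fields)

# Aromatic CH carbon in benzene: atomic number 6, two heavy neighbors,
# valency 4 minus 1 hydrogen = 3, mass 12, charge 0, one hydrogen
print(element_neighbor("C", 2))                             # C.2
print(daylight_invariant(6, 2, 3, 12, 0, 1))                # 6.2.3.12.0.1
print(daylight_invariant(6, 2, 3, 12, 0, 1, in_ring=True))  # 6.2.3.12.0.1.1
```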

For more information see “jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints” by Hinselmann, Rosenbaum, Jahn, Fechner, and Zell, J. Cheminform. 3, 3 (2011) https://doi.org/10.1186/1758-2946-3-3

class chemfp.jcmapper_types.JCMapper_AP2D_v1(fingerprint_kwargs)

Bases: JCMapperFingerprintType

jCompoundMapper AP2D (topological atom pairs) fingerprint type, version 1

A topological fingerprint type using the shortest path distance between all pairs of atom labels. See jCMapper-PH2 for a variant using pharmacophore types.
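To illustrate the idea of atom-pair features, the sketch below computes shortest path distances with a breadth-first search and emits one (label, distance, label) tuple per atom pair. The tuple layout, the function names, and the use of searchDepth as a distance cutoff are assumptions for illustration; jCompoundMapper’s own feature encoding may differ.

```python
from collections import deque
from itertools import combinations

def shortest_path_lengths(adjacency, start):
    """Breadth-first search: bond-count distance from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist

def atom_pair_features(labels, adjacency, search_depth):
    """Collect (label, distance, label) for every atom pair within search_depth bonds."""
    features = set()
    for a, b in combinations(range(len(labels)), 2):
        d = shortest_path_lengths(adjacency, a).get(b)
        if d is not None and d <= search_depth:
            # Sort the two labels so the feature does not depend on atom order
            first, second = sorted((labels[a], labels[b]))
            features.add((first, d, second))
    return features

# Heavy atoms of ethanol (C-C-O) with ELEMENT_NEIGHBOR style labels
labels = ["C.1", "C.2", "O.1"]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
print(sorted(atom_pair_features(labels, adjacency, search_depth=8)))
# [('C.1', 1, 'C.2'), ('C.1', 2, 'O.1'), ('C.2', 1, 'O.1')]
```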

The jCMapper-AP2D/1 FingerprintType parameters are:

  • hashsize - the number of bits in the fingerprint (default: 4096)

  • searchDepth - the number of bonds to visit (default: 5)

  • atomLabel - the atom label scheme to use (default: ‘ELEMENT_NEIGHBOR’)

hashsize must be a positive integer, searchDepth must be a non-negative integer, and atomLabel must be one of:

  • “CDK_ATOM_TYPES”: CDK atom types (eg, ‘C.sp2’, ‘O.minus’)

  • “ELEMENT_SYMBOL”: element symbol (eg, ‘C’, ‘O’)

  • “ELEMENT_NEIGHBOR”: element and number of heavy atom neighbors (eg, ‘C.2’)

  • “ELEMENT_NEIGHBOR_RING”: element, ring type, and number of heavy atom neighbors (eg, ‘C.a.2’)

  • “DAYLIGHT_INVARIANT”: “Atomic number, number of heavy atom neighbors, valency minus the number of connected hydrogens, atomic mass, atomic charge, number of connected hydrogens” (eg, ‘6.2.3.12.0.1’ for a carbon in a benzene ring)

  • “DAYLIGHT_INVARIANT_RING”: DAYLIGHT_INVARIANT followed by a flag if the atom is in a ring (eg, ‘6.2.3.12.0.1.1’)

This uses de.zbit.jcmapper.fingerprinters.topological.Encoding2DAtomPair (which internally describes it as “2-Point Atom Pairs 2D”) followed by using a FeatureMap to get the hashed fingerprint.
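As a generic illustration of how a set of features can become a fixed-length hashed fingerprint, the sketch below hashes each feature string to a bit position modulo hashsize. It does not reproduce the FeatureMap implementation: zlib.crc32 is a stand-in hash chosen only because it is reproducible, and the feature strings are invented.

```python
import zlib

def fold_features(features, hashsize):
    """Set one bit per feature in a bit vector of length hashsize."""
    bits = bytearray((hashsize + 7) // 8)
    for feature in features:
        # Stand-in hash; jCompoundMapper uses its own hashing scheme
        position = zlib.crc32(feature.encode("utf-8")) % hashsize
        bits[position // 8] |= 1 << (position % 8)
    return bytes(bits)

fp = fold_features(["C.1-1-C.2", "C.2-1-O.1", "C.1-2-O.1"], hashsize=4096)
num_bits_set = sum(bin(byte).count("1") for byte in fp)
print(len(fp) * 8, num_bits_set)
```

Different features can land on the same bit, so the number of bits set is at most the number of distinct features. A larger hashsize makes such collisions less likely.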

name: str = 'jCMapper-AP2D/1'

the fingerprint name

class chemfp.jcmapper_types.JCMapper_ASP_v1(fingerprint_kwargs)

Bases: JCMapperFingerprintType

jCompoundMapper ASP (All Shortest Paths) fingerprint type, version 1

This is a DFS (depth-first search) variant where only the shortest paths between a pair of atoms are used, rather than using all paths, like jCMapper-DFS does.
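The difference between all paths and shortest paths only shows up when there is more than one route between two atoms, as in a ring. The sketch below enumerates simple paths with a depth-first search and then keeps only the shortest ones. It is meant to show the distinction, not to mirror jCompoundMapper’s traversal; the function name and the four-atom ring are illustrative.

```python
def simple_paths(adjacency, start, end, max_depth):
    """Depth-first enumeration of simple paths from start to end, at most max_depth bonds."""
    stack = [(start, [start])]
    paths = []
    while stack:
        atom, path = stack.pop()
        if atom == end:
            paths.append(path)
            continue
        if len(path) - 1 == max_depth:
            continue  # depth limit reached, do not extend this path
        for nbr in adjacency[atom]:
            if nbr not in path:  # never revisit an atom in the same path
                stack.append((nbr, path + [nbr]))
    return paths

# A four-membered ring: 0-1-2-3-0
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

all_paths = sorted(simple_paths(ring, 0, 1, max_depth=8))
shortest_length = min(len(path) for path in all_paths)
shortest_paths = [path for path in all_paths if len(path) == shortest_length]

print(all_paths)       # [[0, 1], [0, 3, 2, 1]]
print(shortest_paths)  # [[0, 1]]
```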

The jCMapper-ASP/1 FingerprintType parameters are:

  • hashsize - the number of bits in the fingerprint (default: 4096)

  • searchDepth - the maximum number of bonds to visit (default: 8)

  • atomLabel - the atom label scheme to use (default: ‘ELEMENT_NEIGHBOR’)

hashsize must be a positive integer, searchDepth must be a non-negative integer, and atomLabel must be one of:

  • “CDK_ATOM_TYPES”: CDK atom types (eg, ‘C.sp2’, ‘O.minus’)

  • “ELEMENT_SYMBOL”: element symbol (eg, ‘C’, ‘O’)

  • “ELEMENT_NEIGHBOR”: element and number of heavy atom neighbors (eg, ‘C.2’)

  • “ELEMENT_NEIGHBOR_RING”: element, ring type, and number of heavy atom neighbors (eg, ‘C.a.2’)

  • “DAYLIGHT_INVARIANT”: “Atomic number, number of heavy atom neighbors, valency minus the number of connected hydrogens, atomic mass, atomic charge, number of connected hydrogens” (eg, ‘6.2.3.12.0.1’ for a carbon in a benzene ring)

  • “DAYLIGHT_INVARIANT_RING”: DAYLIGHT_INVARIANT followed by a flag if the atom is in a ring (eg, ‘6.2.3.12.0.1.1’)

This uses de.zbit.jcmapper.fingerprinters.topological.Encoding2DAllShortestPath followed by using a FeatureMap to get the hashed fingerprint.

name: str = 'jCMapper-ASP/1'

the fingerprint name

class chemfp.jcmapper_types.JCMapper_AT2D_v1(fingerprint_kwargs)

Bases: JCMapperFingerprintType

jCompoundMapper AT2D (topological atom triples) fingerprint type, version 1

A topological fingerprint type using the shortest path distance between all triples of atom labels. See jCMapper-PH3 for a variant using pharmacophore types.

The jCMapper-AT2D/1 FingerprintType parameters are:

  • hashsize - the number of bits in the fingerprint (default: 4096)

  • searchDepth - the number of bonds to visit (default: 5)

  • atomLabel - the atom label scheme to use (default: ‘ELEMENT_NEIGHBOR’)

hashsize must be a positive integer, searchDepth must be a non-negative integer, and atomLabel must be one of:

  • “CDK_ATOM_TYPES”: CDK atom types (eg, ‘C.sp2’, ‘O.minus’)

  • “ELEMENT_SYMBOL”: element symbol (eg, ‘C’, ‘O’)

  • “ELEMENT_NEIGHBOR”: element and number of heavy atom neighbors (eg, ‘C.2’)

  • “ELEMENT_NEIGHBOR_RING”: element, ring type, and number of heavy atom neighbors (eg, ‘C.a.2’)

  • “DAYLIGHT_INVARIANT”: “Atomic number, number of heavy atom neighbors, valency minus the number of connected hydrogens, atomic mass, atomic charge, number of connected hydrogens” (eg, ‘6.2.3.12.0.1’ for a carbon in a benzene ring)

  • “DAYLIGHT_INVARIANT_RING”: DAYLIGHT_INVARIANT followed by a flag if the atom is in a ring (eg, ‘6.2.3.12.0.1.1’)

This uses de.zbit.jcmapper.fingerprinters.topological.Encoding2DAtomTriple (which internally describes it as “3-Point Atom Pairs 2D”) followed by using a FeatureMap to get the hashed fingerprint.

name: str = 'jCMapper-AT2D/1'

the fingerprint name

class chemfp.jcmapper_types.JCMapper_DFS_v1(fingerprint_kwargs)

Bases: JCMapperFingerprintType

jCompoundMapper DFS (Depth-First Search) fingerprint type, version 1

All-path encoding of a modified graph traversal proposed by:

Ralaivola L, Swamidass SJ, Saigo H, Baldi P: Graph kernels for chemical informatics. Neural Networks. 2005, 18 (8): 1093-1110. https://doi.org/10.1016/j.neunet.2005.07.009

See jCMapper-ASP for a variant which only uses the shortest paths between pairs of atoms.

The jCMapper-DFS/1 FingerprintType parameters are:

  • hashsize - the number of bits in the fingerprint (default: 4096)

  • searchDepth - the maximum number of bonds to visit (default: 8)

  • atomLabel - the atom label scheme to use (default: ‘ELEMENT_NEIGHBOR’)

hashsize must be a positive integer, searchDepth must be a non-negative integer, and atomLabel must be one of:

  • “CDK_ATOM_TYPES”: CDK atom types (eg, ‘C.sp2’, ‘O.minus’)

  • “ELEMENT_SYMBOL”: element symbol (eg, ‘C’, ‘O’)

  • “ELEMENT_NEIGHBOR”: element and number of heavy atom neighbors (eg, ‘C.2’)

  • “ELEMENT_NEIGHBOR_RING”: element, ring type, and number of heavy atom neighbors (eg, ‘C.a.2’)

  • “DAYLIGHT_INVARIANT”: “Atomic number, number of heavy atom neighbors, valency minus the number of connected hydrogens, atomic mass, atomic charge, number of connected hydrogens” (eg, ‘6.2.3.12.0.1’ for a carbon in a benzene ring)

  • “DAYLIGHT_INVARIANT_RING”: DAYLIGHT_INVARIANT followed by a flag if the atom is in a ring (eg, ‘6.2.3.12.0.1.1’)

This uses de.zbit.jcmapper.fingerprinters.topological.DepthFirstSearch.

name: str = 'jCMapper-DFS/1'

the fingerprint name

class chemfp.jcmapper_types.JCMapper_LSTAR_v1(fingerprint_kwargs)

Bases: JCMapperFingerprintType

jCompoundMapper LSTAR (Local Path Environments) fingerprint type, version 1

This is a radial fingerprint similar to the jCMapper-RAD2D fingerprint type, except that all paths up to searchDepth are stored in a shell, and bond information is included.

The jCMapper-LSTAR/1 FingerprintType parameters are:

  • hashsize - the number of bits in the fingerprint (default: 4096)

  • searchDepth - the number of bonds to visit (default: 8)

  • atomLabel - the atom label scheme to use (default: ‘ELEMENT_NEIGHBOR’)

hashsize must be a positive integer, searchDepth must be a non-negative integer, and atomLabel must be one of:

  • “CDK_ATOM_TYPES”: CDK atom types (eg, ‘C.sp2’, ‘O.minus’)

  • “ELEMENT_SYMBOL”: element symbol (eg, ‘C’, ‘O’)

  • “ELEMENT_NEIGHBOR”: element and number of heavy atom neighbors (eg, ‘C.2’)

  • “ELEMENT_NEIGHBOR_RING”: element, ring type, and number of heavy atom neighbors (eg, ‘C.a.2’)

  • “DAYLIGHT_INVARIANT”: “Atomic number, number of heavy atom neighbors, valency minus the number of connected hydrogens, atomic mass, atomic charge, number of connected hydrogens” (eg, ‘6.2.3.12.0.1’ for a carbon in a benzene ring)

  • “DAYLIGHT_INVARIANT_RING”: DAYLIGHT_INVARIANT followed by a flag if the atom is in a ring (eg, ‘6.2.3.12.0.1.1’)

This uses de.zbit.jcmapper.fingerprinters.topological.Encoding2DLocalAtomEnvironment (a radial “Local Path Environments” fingerprint) followed by using a FeatureMap to get the hashed fingerprint.

name: str = 'jCMapper-LSTAR/1'

the fingerprint name

class chemfp.jcmapper_types.JCMapper_PH2_v1(fingerprint_kwargs)

Bases: JCMapperFingerprintType

jCompoundMapper PH2 (pharmacophore pair) fingerprint type, version 1

A path-based pharmacophore fingerprint type using atom labels based on potential pharmacophore patterns and the shortest path distance between all pairs of atom labels.

The jCMapper-PH2/1 FingerprintType parameters are:

  • hashsize - the number of bits in the fingerprint (default: 4096)

  • searchDepth - the number of bonds to visit (default: 8)

This uses de.zbit.jcmapper.fingerprinters.topological.Encoding2DPharmacophore2Point (which internally describes it as “2-Point Pharmacophore Pairs 2D”) followed by using a FeatureMap to get the hashed fingerprint.

This method uses atom labels based on the following potential pharmacophore points: 1) hydrogen-bond donor; 2) hydrogen-bond acceptor; 3) positive; 4) negative; and 5) lipophilic. See the jCompoundMapper paper for details.

name: str = 'jCMapper-PH2/1'

the fingerprint name

class chemfp.jcmapper_types.JCMapper_PH3_v1(fingerprint_kwargs)

Bases: JCMapperFingerprintType

jCompoundMapper PH3 (3-point pharmacophore) fingerprint type, version 1

A path-based pharmacophore fingerprint type using atom labels based on potential pharmacophore patterns and the shortest path distance between all triplets of atom labels.

The jCMapper-PH3/1 FingerprintType parameters are:

  • hashsize - the number of bits in the fingerprint (default: 4096)

  • searchDepth - the number of bonds to visit (default: 5)

This uses de.zbit.jcmapper.fingerprinters.topological.Encoding2DPharmacophore3Point (which internally describes it as “3-Point Pharmacophore Pairs 2D”) followed by using a FeatureMap to get the hashed fingerprint.

This method uses atom labels based on the following potential pharmacophore points: 1) hydrogen-bond donor; 2) hydrogen-bond acceptor; 3) positive; 4) negative; and 5) lipophilic. See the jCompoundMapper paper for details.

name: str = 'jCMapper-PH3/1'

the fingerprint name

class chemfp.jcmapper_types.JCMapper_RAD2D_v1(fingerprint_kwargs)

Bases: JCMapperFingerprintType

jCompoundMapper RAD2D fingerprint type, version 1

This is a Molprint2D-like radial topological fingerprint based on:

Bender A, Mussa HY, Glen RC, Reiling S (2004) Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance. J. Chem. Inf. Comput. Sci. 44(5):1708-1718. https://doi.org/10.1021/ci0498719

The method characterizes the atoms in the radial environment up to a given depth. See jCMapper-LSTAR for a variant which includes bond information.
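A radial environment can be pictured as the atom labels grouped by their bond distance from a central atom. The following sketch collects those shells with a breadth-first search that stops at the given depth. The function name, the dictionary layout, and the example molecule are illustrative assumptions rather than the encoding jCompoundMapper uses.

```python
from collections import deque

def radial_environment(labels, adjacency, center, search_depth):
    """Group atom labels by bond distance from center, out to search_depth bonds."""
    dist = {center: 0}
    shells = {0: [labels[center]]}
    queue = deque([center])
    while queue:
        atom = queue.popleft()
        if dist[atom] == search_depth:
            continue  # do not expand beyond the outermost shell
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                shells.setdefault(dist[nbr], []).append(labels[nbr])
                queue.append(nbr)
    # Sort each shell so the result does not depend on traversal order
    return {depth: sorted(found) for depth, found in shells.items()}

# Heavy atoms of isopropanol: C0-C1(-O2)-C3, with ELEMENT_SYMBOL style labels
labels = ["C", "C", "O", "C"]
adjacency = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print(radial_environment(labels, adjacency, center=0, search_depth=3))
# {0: ['C'], 1: ['C'], 2: ['C', 'O']}
```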

The jCMapper-RAD2D/1 FingerprintType parameters are:

  • hashsize - the number of bits in the fingerprint (default: 4096)

  • searchDepth - the number of bonds to visit (default: 8)

  • atomLabel - the atom label scheme to use (default: ‘ELEMENT_NEIGHBOR’)

hashsize must be a positive integer, searchDepth must be a non-negative integer, and atomLabel must be one of:

  • “CDK_ATOM_TYPES”: CDK atom types (eg, ‘C.sp2’, ‘O.minus’)

  • “ELEMENT_SYMBOL”: element symbol (eg, ‘C’, ‘O’)

  • “ELEMENT_NEIGHBOR”: element and number of heavy atom neighbors (eg, ‘C.2’)

  • “ELEMENT_NEIGHBOR_RING”: element, ring type, and number of heavy atom neighbors (eg, ‘C.a.2’)

  • “DAYLIGHT_INVARIANT”: “Atomic number, number of heavy atom neighbors, valency minus the number of connected hydrogens, atomic mass, atomic charge, number of connected hydrogens” (eg, ‘6.2.3.12.0.1’ for a carbon in a benzene ring)

  • “DAYLIGHT_INVARIANT_RING”: DAYLIGHT_INVARIANT followed by a flag if the atom is in a ring (eg, ‘6.2.3.12.0.1.1’)

This uses de.zbit.jcmapper.fingerprinters.topological.Encoding2DMolprint followed by using a FeatureMap to get the hashed fingerprint.

name: str = 'jCMapper-RAD2D/1'

the fingerprint name

chemfp.jcmapper_types.is_available() → bool

Return True if jCompoundMapper fingerprints are available, else False
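One way to use is_available() is as a guard before asking for a jCompoundMapper fingerprint type. The sketch below returns None instead of failing when chemfp or the Java pieces are missing. The wrapper name get_jcmapper_type is hypothetical, and the sketch assumes that is_available() can be called without any other setup.

```python
import importlib.util

def get_jcmapper_type(type_string="jCMapper-PH2 hashsize=4096 searchDepth=8"):
    """Return a chemfp fingerprint type, or None if jCompoundMapper is unavailable."""
    if importlib.util.find_spec("chemfp") is None:
        return None  # chemfp itself is not installed
    try:
        import chemfp
        from chemfp import jcmapper_types
    except ImportError:
        return None  # this chemfp installation has no jcmapper_types module
    if not jcmapper_types.is_available():
        return None  # missing Java, JPype, or the jars on the CLASSPATH
    return chemfp.get_fingerprint_type(type_string)

fptype = get_jcmapper_type()
print("jCompoundMapper fingerprints available:", fptype is not None)
```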