chemfp.rdkit_types module

This module should not be imported directly.

It contains internal implementation details of RDKit fingerprint generation.

This module is included in the documentation because parts of this module are returned to the user, and are part of the public API.

class chemfp.rdkit_types.FixedSizeFingerprint(fingerprint_kwargs)

Bases: RDKitBaseFingerprintType

This is a fixed-size fingerprint type

class chemfp.rdkit_types.RDKitAtomPairFingerprint_v1(fingerprint_kwargs)

Bases: RDKitBaseAtomPairFingerprintType

RDKit atom pair fingerprints, version 1

See https://rdkit.org/docs/source/rdkit.Chem.rdMolDescriptors.html#rdkit.Chem.rdMolDescriptors.GetHashedAtomPairFingerprintAsBitVect

The RDKit-AtomPair/1 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)

  • minLength - minimum bond count for a pair (default: 1)

  • maxLength - maximum bond count for a pair (default: 30)

Note: this version was only available in ancient (pre-2012) versions of RDKit. Chemfp no longer supports those versions of RDKit.

name: str = 'RDKit-AtomPair/1'

the fingerprint name

class chemfp.rdkit_types.RDKitAtomPairFingerprint_v2(fingerprint_kwargs)

Bases: RDKitBaseAtomPairFingerprintType

RDKit atom pair fingerprints, version 2

RDKit implements two APIs to generate the AtomPair fingerprints. This RDKit-AtomPair/2 fingerprint type works with the older function-based API described at:

https://rdkit.org/docs/source/rdkit.Chem.rdMolDescriptors.html#rdkit.Chem.rdMolDescriptors.GetHashedAtomPairFingerprintAsBitVect

The RDKit-AtomPair/2 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)

  • minLength - minimum bond count for a pair (default: 1 bond)

  • maxLength - maximum bond count for a pair (default: 30, max: 63)

  • nBitsPerEntry - number of bits to use in simulating counts (default: 4)

  • includeChirality - if 1, chirality will be used in the atom invariants (default: 0)

  • use2D - if 1, use a 2D distance matrix, if 0 use the 3D matrix from the first set of conformers, or return an empty fingerprint if no conformers (default: 1)

  • fromAtoms - a list of atom indices which must be in the pair

You should migrate to the generator-based version 3 type described at RDKit-AtomPair/3.

name: str = 'RDKit-AtomPair/2'

the fingerprint name

class chemfp.rdkit_types.RDKitAvalonFingerprintType_v1(fingerprint_kwargs)

Bases: VariableSizeFingerprint

Avalon fingerprints

The Avalon Cheminformatics toolkit is available from https://sourceforge.net/projects/avalontoolkit/ . It is not part of the core RDKit distribution. Instead, RDKit has a compile-time option to download and include it as part of the build process.

The Avalon fingerprints are described in the supplemental information for “QSAR - How Good Is It in Practice? Comparison of Descriptor Sets on an Unbiased Cross Section of Corporate Data Sets”, Peter Gedeck, Bernhard Rohde, and Christian Bartels, J. Chem. Inf. Model., 2006, 46 (5), pp 1924-1936, DOI: 10.1021/ci050413p. The supplemental information is available from https://pubs.acs.org/doi/suppl/10.1021/ci050413p

It uses a set of feature classes which “have been fine-tuned to provide good screen-out for the set of substructure queries encounted at Novartis while limiting redundancy.” The classes are ATOM_COUNT, ATOM_SYMBOL_PATH, AUGMENTED_ATOM, AUGMENTED_BOND, HCOUNT_PAIR, HCOUNT_PATH, RING_PATH, BOND_PATH, HCOUNT_CLASS_PATH, ATOM_CLASS_PATH, RING_PATTERN, RING_SIZE_COUNTS, DEGREE_PATHS, CLASS_SPIDERS, FEATURE_PAIRS and ALL_PATTERNS.

name: str = 'RDKit-Avalon/1'

the fingerprint name

class chemfp.rdkit_types.RDKitBaseAtomPairFingerprintType(fingerprint_kwargs)

Bases: VariableSizeFingerprint

Base class for the RDKitAtomPair fingerprint types

class chemfp.rdkit_types.RDKitBaseFingerprintType(fingerprint_kwargs)

Bases: ThreadsafeFingerprinterMixin, FingerprintType

from_inchi(content: str | bytes, *, sanitize: bool = True, removeHs: bool = True, logLevel: int | None = None, treatWarningAsError: bool = False, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = None, errors: str = 'strict')

Generate a fingerprint from an InChI string and its id

This is equivalent to calling:

mol = fptype.toolkit.parse_inchi(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing

  • removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph

  • logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API

  • treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error

  • delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id

  • errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a fingerprint byte string

from_inchistring(content: str | bytes, *, sanitize: bool = True, removeHs: bool = True, logLevel: int | None = None, treatWarningAsError: bool = False, errors: str = 'strict')

Generate a fingerprint from an InChI string

This is equivalent to calling:

mol = fptype.toolkit.parse_inchistring(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing

  • removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph

  • logLevel (an integer, or None to disable logging completely (default: None)) – the log level for the InChI API

  • treatWarningAsError (Boolean (default: False)) – treat any InChI warnings as an error

  • errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a fingerprint byte string

from_molfile(content: str | bytes, *, sanitize: bool = True, removeHs: bool = True, strictParsing: bool = True, errors: str = 'strict')

Generate a fingerprint from a molfile

This is equivalent to calling:

mol = fptype.toolkit.parse_molfile(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing

  • removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph

  • strictParsing (Boolean (default: True)) – If true, require stricter adherence to the SDF specification

  • errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a fingerprint byte string

from_sdf(content: str | bytes, *, sanitize: bool = True, removeHs: bool = True, strictParsing: bool = True, includeTags: bool = True, errors: str = 'strict')

Generate a fingerprint from an SDF record

This is equivalent to calling:

mol = fptype.toolkit.parse_sdf(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing

  • removeHs (Boolean (default: True)) – If true, remove simple hydrogens from the molecular graph

  • strictParsing (Boolean (default: True)) – If true, require stricter adherence to the SDF specification

  • includeTags (Boolean (default: True)) – if true, extract the structure data tag fields

  • errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a fingerprint byte string

from_smi(content: str | bytes, *, sanitize: bool = True, cxsmiles: bool = True, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = None, errors: str = 'strict')

Generate a fingerprint from a SMILES string and its id

This is equivalent to calling:

mol = fptype.toolkit.parse_smi(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing

  • cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string

  • delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id

  • errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a fingerprint byte string

from_smiles(content: str | bytes, *, sanitize: bool = True, cxsmiles: bool = True, errors: str = 'strict')

Generate a fingerprint from a SMILES string

This is equivalent to calling:

mol = fptype.toolkit.parse_smistring(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing

  • cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string

  • errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a fingerprint byte string

from_smistring(content: str | bytes, *, sanitize: bool = True, cxsmiles: bool = True, errors: str = 'strict')

Generate a fingerprint from a SMILES string

This is equivalent to calling:

mol = fptype.toolkit.parse_smistring(content, ..., errors=errors)
fp = fptype.from_mol(mol) if (mol is not None) else None
Parameters:
  • sanitize (Boolean (default: True)) – If true, sanitize the molecule after parsing

  • cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string

  • errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a fingerprint byte string
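
The "equivalent to calling" snippets for the from_* methods above all share one pattern: parse the record, then fingerprint the molecule only if the parser returned one. The following is a minimal sketch of that pattern. To keep it runnable without chemfp or RDKit installed, fingerprint_or_none, toy_parse and toy_from_mol are hypothetical stand-ins and are not part of chemfp. It also assumes that a parser returns None for a record it cannot handle when errors is not "strict".

```python
def fingerprint_or_none(parse, from_mol, content):
    """Parse content, then fingerprint it only if parsing produced a molecule."""
    mol = parse(content)
    return from_mol(mol) if (mol is not None) else None


# Hypothetical stand-ins so the sketch runs without chemfp or RDKit installed.
def toy_parse(content):
    # Pretend that an empty record fails to parse and yields None.
    text = content.strip()
    return text if text else None


def toy_from_mol(mol):
    # Pretend that the fingerprint is a single byte holding the record length.
    return bytes([len(mol) % 256])


print(fingerprint_or_none(toy_parse, toy_from_mol, "CCO"))  # b'\x03'
print(fingerprint_or_none(toy_parse, toy_from_mol, "   "))  # None
```

Callers that process many records with this pattern should therefore expect None in place of a byte string and skip or report those records.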

module = <module 'chemfp.rdkit_toolkit'>
software: _OptionalStr = ...

a description of the RDKit and chemfp software packages used

class chemfp.rdkit_types.RDKitBasePatternFingerprint(fingerprint_kwargs)

Bases: VariableSizeFingerprint

class chemfp.rdkit_types.RDKitBaseTorsionFingerprintType(fingerprint_kwargs)

Bases: VariableSizeFingerprint

class chemfp.rdkit_types.RDKitFingerprintType_v1(fingerprint_kwargs)

Bases: VariableSizeFingerprint

RDKit’s Daylight-like fingerprint based on linear path and branched tree enumeration, version 1

See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.RDKFingerprint

The RDKit-Fingerprint/1 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)

  • minPath - minimum number of bonds (default: 1)

  • maxPath - maximum number of bonds (default: 7)

  • nBitsPerHash - number of bits to set for each path hash (default: 2)

  • useHs - include information about the number of hydrogens on each atom? (default: True)

Note: this version is only available in ancient (pre-2014) versions of RDKit.

name: str = 'RDKit-Fingerprint/1'

the fingerprint name

class chemfp.rdkit_types.RDKitFingerprintType_v2(fingerprint_kwargs)

Bases: VariableSizeFingerprint

RDKit’s Daylight-like fingerprint based on linear path and branched tree enumeration, version 2

RDKit implements two APIs to generate the RDKit fingerprints. The RDKit-Fingerprint/2 fingerprint type works with the older function-based API described at:

https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.RDKFingerprint

The RDKit-Fingerprint/2 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)

  • minPath - minimum number of bonds (default: 1)

  • maxPath - maximum number of bonds (default: 7)

  • nBitsPerHash - number of bits to set for each path hash (default: 2)

  • useHs - include information about the number of hydrogens on each atom? (default: True)

  • branchedPaths - include both branched and unbranched paths (default: True)

  • useBondOrder - use both bond orders in the path hashes (default: True)

  • fromAtoms - a list of atom indices which must be part of the path enumeration

You should migrate to the generator-based version 3 type described at RDKitFingerprintType_v3.

name: str = 'RDKit-Fingerprint/2'

the fingerprint name

class chemfp.rdkit_types.RDKitFingerprintType_v3(fingerprint_kwargs)

Bases: VariableSizeFingerprint

RDKit’s Daylight-like fingerprint based on linear path and branched tree enumeration, version 3

RDKit implements two APIs to generate the RDKit fingerprints. The RDKit-Fingerprint/3 fingerprint type works with the newer generator-based API described at:

https://rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html#rdkit.Chem.rdFingerprintGenerator.GetRDKitFPGenerator

Use version 2 for the older function-style API. (See RDKitFingerprintType_v2.)

The RDKit-Fingerprint/3 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)

  • minPath - minimum number of bonds (default: 1)

  • maxPath - maximum number of bonds (default: 7)

  • countSimulation - simulate count fingerprints by setting more bits for higher counts (default: 0)

  • countBounds - list of minimum counts needed to set the corresponding bit during count simulation (default: None)

  • nBitsPerFeature - number of bits to set for each path feature (default: 2)

  • useHs - include information about the number of hydrogens on each atom? (default: True)

  • branchedPaths - include both branched and unbranched paths (default: True)

  • useBondOrder - use both bond orders in the path hashes (default: True)

  • fromAtoms - a list of atom indices which must be part of the path enumeration

name: str = 'RDKit-Fingerprint/3'

the fingerprint name

class chemfp.rdkit_types.RDKitMACCSFingerprintType_v1(fingerprint_kwargs: Dict[str, Any])

Bases: NoFingerprintParametersMixin, FixedSizeFingerprint

RDKit’s implementation of the 166 MACCS keys, version 1

See https://rdkit.org/docs/source/rdkit.Chem.rdMolDescriptors.html#rdkit.Chem.rdMolDescriptors.GetMACCSKeysFingerprint

The RDKit-MACCS166/1 fingerprints have no parameters.

This comes from an ancient version of RDKit which does not support MACCS key 44 (“OTHER”).

name: str = 'RDKit-MACCS166/1'

the fingerprint name

num_bits: int = 166
class chemfp.rdkit_types.RDKitMACCSFingerprintType_v2(fingerprint_kwargs: Dict[str, Any])

Bases: NoFingerprintParametersMixin, FixedSizeFingerprint

RDKit’s implementation of the 166 MACCS keys, version 2

See https://rdkit.org/docs/source/rdkit.Chem.rdMolDescriptors.html#rdkit.Chem.rdMolDescriptors.GetMACCSKeysFingerprint

The RDKit-MACCS166/2 fingerprints have no parameters. RDKit added this version in late 2014 to support MACCS key 44 (“OTHER”).
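
Because fingerprints are returned as byte strings, the fixed num_bits value determines the length of each MACCS fingerprint. The arithmetic below is a small sketch which assumes bits are packed eight to a byte and the final byte is padded. That packing is an assumption not stated in this section, and num_bytes_for_bits is a hypothetical helper, not a chemfp function.

```python
def num_bytes_for_bits(num_bits):
    """Return the number of bytes needed to hold num_bits bits (rounding up)."""
    return (num_bits + 7) // 8


print(num_bytes_for_bits(166))   # 21
print(num_bytes_for_bits(2048))  # 256
```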

name: str = 'RDKit-MACCS166/2'

the fingerprint name

num_bits: int = 166
class chemfp.rdkit_types.RDKitMorganFingerprintType_v1(fingerprint_kwargs)

Bases: VariableSizeFingerprint

RDKit Morgan (ECFP-like) fingerprints, version 1

RDKit implements two APIs to generate the Morgan fingerprints. The RDKit-Morgan/1 fingerprint type works with the old function-based API described at:

https://rdkit.org/docs/source/rdkit.Chem.rdMolDescriptors.html#rdkit.Chem.rdMolDescriptors.GetMorganFingerprintAsBitVect

Use version 2 for the newer generator-style API. (See RDKitMorganFingerprintType_v2.)

The RDKit-Morgan/1 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)

  • radius - radius for the Morgan algorithm (default: 2)

  • useFeatures - use chemical-feature invariants (default: 0)

  • useChirality - use chirality information (default: 0)

  • useBondTypes - include bond type information (default: 1)

  • includeRedundantEnvironments - if 1, include redundant environments in the fingerprint (added in RDKit 2020-3) (default: 0)

  • fromAtoms - a list of atom indices to use as centers

In version 2, the radius default is 3, useChirality is renamed to includeChirality, and includeRedundantEnvironments did not appear until 2023.03.1.

When called with the equivalent parameters, the two methods should give identical fingerprints.
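
The differences listed above can be sketched as a parameter translation. The helper below is illustrative only: morgan_v1_to_v2_kwargs is a hypothetical name, not a chemfp function. It assumes the only adjustments needed are the two named in the text, that is, pinning the version 1 radius default of 2 and renaming useChirality. All other parameters are passed through unchanged.

```python
def morgan_v1_to_v2_kwargs(v1_kwargs):
    """Return RDKit-Morgan/2 style parameters for RDKit-Morgan/1 parameters."""
    v2_kwargs = dict(v1_kwargs)  # copy, so the caller's dict is not modified

    # Version 1 defaults to radius=2 but version 2 defaults to radius=3,
    # so make the version 1 default explicit when no radius was given.
    v2_kwargs.setdefault("radius", 2)

    # useChirality was renamed to includeChirality in version 2.
    if "useChirality" in v2_kwargs:
        v2_kwargs["includeChirality"] = v2_kwargs.pop("useChirality")

    return v2_kwargs


print(morgan_v1_to_v2_kwargs({"fpSize": 1024, "useChirality": 1}))
# {'fpSize': 1024, 'radius': 2, 'includeChirality': 1}
```

Making the radius explicit avoids silently switching from radius 2 to radius 3 when a version 1 configuration is reused with version 2.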

name: str = 'RDKit-Morgan/1'

the fingerprint name

class chemfp.rdkit_types.RDKitMorganFingerprintType_v2(fingerprint_kwargs)

Bases: VariableSizeFingerprint

RDKit Morgan (ECFP-like) fingerprints, version 2

RDKit implements two APIs to generate the Morgan fingerprints. The RDKit-Morgan/2 fingerprint type works with the newer generator-based API described at:

https://rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html#rdkit.Chem.rdFingerprintGenerator.GetMorganGenerator

Use version 1 for the older function-style API. (See RDKitMorganFingerprintType_v1.)

The RDKit-Morgan/2 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)

  • radius - radius for the Morgan algorithm (default: 3) (was 2 in v1!)

  • useFeatures - use chemical-feature invariants (default: 0)

  • countSimulation - simulate count fingerprints by setting more bits for higher counts (default: 0)

  • countBounds - list of minimum counts needed to set the corresponding bit during count simulation (default: None)

  • includeChirality - include chirality information in the bond invariants (default: 0)

  • useBondTypes - include bond type information (default: 1)

  • includeRingMembership - if 1, include ring membership in the atom invariants (default: 1)

  • includeRedundantEnvironments - if 1, include redundant environments in the fingerprint (default: 0)

  • fromAtoms - list of atom indices to use as centers (default: None)

name: str = 'RDKit-Morgan/2'

the fingerprint name

class chemfp.rdkit_types.RDKitPatternFingerprint_v1(fingerprint_kwargs)

Bases: RDKitBasePatternFingerprint

RDKit’s experimental substructure screen fingerprint, version 1

See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.PatternFingerprint

The RDKit-Pattern/1 fingerprint has no parameters.

Note: this version is only available in ancient versions of RDKit. Chemfp no longer supports those versions of RDKit.

name: str = 'RDKit-Pattern/1'

the fingerprint name

class chemfp.rdkit_types.RDKitPatternFingerprint_v2(fingerprint_kwargs)

Bases: RDKitBasePatternFingerprint

RDKit’s experimental substructure screen fingerprint, version 2

See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.PatternFingerprint

The RDKit-Pattern/2 fingerprint has no parameters.

Note: this version is only available in ancient versions of RDKit. Chemfp no longer supports those versions of RDKit.

name: str = 'RDKit-Pattern/2'

the fingerprint name

class chemfp.rdkit_types.RDKitPatternFingerprint_v3(fingerprint_kwargs)

Bases: RDKitBasePatternFingerprint

RDKit’s experimental substructure screen fingerprint, version 3

See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.PatternFingerprint

The RDKit-Pattern/3 fingerprint has no parameters. This version was released in RDKit 2017.03.1.

Note: Chemfp no longer supports those versions of RDKit.

name: str = 'RDKit-Pattern/3'

the fingerprint name

class chemfp.rdkit_types.RDKitPatternFingerprint_v4(fingerprint_kwargs)

Bases: RDKitBasePatternFingerprint

RDKit’s experimental substructure screen fingerprint, version 4

See https://rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.PatternFingerprint

The RDKit-Pattern/4 fingerprint has no parameters. This version was introduced in August 2017 for the 2017.09.1 release.

name: str = 'RDKit-Pattern/4'

the fingerprint name

class chemfp.rdkit_types.RDKitSECFPFingerprintType_v1(fingerprint_kwargs)

Bases: VariableSizeFingerprint

SECFP fingerprints

The SMILES Extended Connectivity Fingerprint, as described in:

Probst, D., Reymond, J. A probabilistic molecular fingerprint for big data settings. J Cheminform 10, 66 (2018). https://doi.org/10.1186/s13321-018-0321-8 https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0321-8

These are circular fingerprints which encode the circular region as a fragment SMILES, which is then hashed to produce the fingerprint bits.

The RDKit-SECFP/1 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)

  • radius - analogous to the radius for the Morgan algorithm (default: 3)

  • rings - include ring membership (default: 1)

  • isomeric - use isomeric SMILES (default: 0)

  • kekulize - Kekulize the molecule and use Kekule SMILES (default: 0)

  • min_radius - minimum radius for the Morgan algorithm (default: 1)

name: str = 'RDKit-SECFP/1'

the fingerprint name

class chemfp.rdkit_types.RDKitTorsionFingerprintType_v1(fingerprint_kwargs)

Bases: RDKitBaseTorsionFingerprintType

RDKit torsion fingerprints, version 1

See https://rdkit.org/docs/source/rdkit.Chem.AtomPairs.Torsions.html

An implementation of Topological-torsion fingerprints, as described in: R. Nilakantan, N. Bauman, J. S. Dixon, R. Venkataraghavan; “Topological Torsion: A New Molecular Descriptor for SAR Applications. Comparison with Other Descriptors” JCICS 27, 82-85 (1987).

The RDKit-Torsion/1 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)

  • targetSize - number of bonds per torsion (default: 4)

Note: this version is only available in older (pre-2014) versions of RDKit. Chemfp no longer supports those versions of RDKit.

name: str = 'RDKit-Torsion/1'

the fingerprint name

class chemfp.rdkit_types.RDKitTorsionFingerprintType_v2(fingerprint_kwargs)

Bases: RDKitBaseTorsionFingerprintType

RDKit torsion fingerprints, version 2

See https://rdkit.org/docs/source/rdkit.Chem.AtomPairs.Torsions.html

An implementation of Topological-torsion fingerprints, as described in: R. Nilakantan, N. Bauman, J. S. Dixon, R. Venkataraghavan; “Topological Torsion: A New Molecular Descriptor for SAR Applications. Comparison with Other Descriptors” JCICS 27, 82-85 (1987).

The RDKit-Torsion/2 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)

  • targetSize - number of bonds per torsion (default: 4)

  • nBitsPerEntry - number of bits to set per entry (default: 4)

  • includeChirality - include chirality information (default: 0)

  • fromAtoms - a list of atom indices which must be part of the torsion

name: str = 'RDKit-Torsion/2'

the fingerprint name

class chemfp.rdkit_types.RDKitTorsionFingerprintType_v3(fingerprint_kwargs)

Bases: RDKitBaseTorsionFingerprintType

RDKit torsion fingerprints, version 3

RDKit implements two APIs to generate topological torsion fingerprints. This RDKit-Torsion/3 fingerprint type works with the older function-based API described at:

https://rdkit.org/docs/source/rdkit.Chem.AtomPairs.Torsions.html

The underlying algorithm is described in: R. Nilakantan, N. Bauman, J. S. Dixon, R. Venkataraghavan; “Topological Torsion: A New Molecular Descriptor for SAR Applications. Comparison with Other Descriptors” JCICS 27, 82-85 (1987).

This version started with RDKit 2023.03.1, which changed how includeChirality=1 works.

The RDKit-Torsion/3 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)

  • targetSize - number of bonds per torsion (default: 4)

  • nBitsPerEntry - number of bits to set per entry (default: 4)

  • includeChirality - include chirality information (default: 0)

  • fromAtoms - a list of atom indices which must be part of the torsion

You should migrate to RDKit-Torsion/4 fingerprints. The nBitsPerEntry parameter has been replaced with count simulation, and new options added.

name: str = 'RDKit-Torsion/3'

the fingerprint name

class chemfp.rdkit_types.RDKitTorsionFingerprintType_v4(fingerprint_kwargs)

Bases: RDKitBaseTorsionFingerprintType

RDKit torsion fingerprints, version 4

RDKit implements two APIs to generate topological torsion fingerprints. This RDKit-Torsion/4 fingerprint type works with the newer generator-based API described at:

https://rdkit.org/docs/source/rdkit.Chem.rdFingerprintGenerator.html#rdkit.Chem.rdFingerprintGenerator.GetTopologicalTorsionGenerator

The underlying algorithm is described in: R. Nilakantan, N. Bauman, J. S. Dixon, R. Venkataraghavan; “Topological Torsion: A New Molecular Descriptor for SAR Applications. Comparison with Other Descriptors” JCICS 27, 82-85 (1987).

The RDKit-Torsion/4 FingerprintType parameters are:

  • fpSize - number of bits in the fingerprint (default: 2048)

  • torsionAtomCount - the number of atoms to include in the ‘torsions’ (default: 4)

  • countSimulation - simulate count fingerprints by setting more bits for higher counts (default: 1)

  • countBounds - list of minimum counts needed to set the corresponding bit during count simulation (default: None)

  • includeChirality - if 1, include chirality in the atom invariants (default: 0)

  • onlyShortestPaths - if 1, only include the shortest paths between the start and end atoms, not all paths (default: 0)

  • fromAtoms - a list of atom indices which must be … XXX what?

The “nBitsPerEntry” parameter from version 3 is no longer supported. It can be emulated using countBounds. For nBitsPerEntry=4 use [1,2,4,8] and for other values use list(range(1,nBitsPerEntry+1)).
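
The emulation rule above can be written as a short helper. This is a sketch of the rule exactly as stated in the preceding paragraph. The function name count_bounds_for_nbits_per_entry is hypothetical, and rejecting values below 1 is an added assumption.

```python
def count_bounds_for_nbits_per_entry(nBitsPerEntry):
    """Return a countBounds list which emulates a version 3 nBitsPerEntry value."""
    if nBitsPerEntry < 1:
        # Assumption: at least one bit per entry is required.
        raise ValueError("nBitsPerEntry must be at least 1")
    if nBitsPerEntry == 4:
        # Special case given in the text above.
        return [1, 2, 4, 8]
    # All other values use consecutive minimum counts 1..nBitsPerEntry.
    return list(range(1, nBitsPerEntry + 1))


print(count_bounds_for_nbits_per_entry(4))  # [1, 2, 4, 8]
print(count_bounds_for_nbits_per_entry(2))  # [1, 2]
```

The resulting list would be passed as the countBounds parameter, with countSimulation left enabled.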

name: str = 'RDKit-Torsion/4'

the fingerprint name

class chemfp.rdkit_types.VariableSizeFingerprint(fingerprint_kwargs)

Bases: RDKitBaseFingerprintType

This is a variable-size fingerprint type, specified by the user