chemfp.types module¶

class chemfp.types.BaseFingerprintType(fingerprint_kwargs: Dict[str, Any])¶

Bases: object

base_name: str¶: the part of the name before the ‘/’

compute_fingerprint(mol: _typing.Mol) → _typing.Fingerprint¶

Compute and return the fingerprint byte string for the toolkit molecule

Parameters:: mol – a toolkit molecule
Returns:: the fingerprint as a byte string

compute_fingerprints(mols: _typing.MolIterable) → _typing.FingerprintIter¶

Compute and return the fingerprint for each toolkit molecule in an iterator

This function is a slightly optimized version of:

for mol in mols:
  yield self.compute_fingerprint(mol)

Parameters:: mols – an iterable of toolkit molecules
Returns:: a generator of fingerprints, one per molecule
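The equivalence described above can be sketched with a stand-in class. ToyFingerprintType and its one-byte "fingerprint" are hypothetical and exist only so the example runs without a chemistry toolkit; a real fingerprint type takes toolkit molecules rather than strings.

```python
from typing import Iterable, Iterator


class ToyFingerprintType:
    """Hypothetical stand-in for a fingerprint type (not part of chemfp)."""

    def compute_fingerprint(self, mol: str) -> bytes:
        # Toy "fingerprint": one byte holding the length of the molecule string.
        return bytes([len(mol) % 256])

    def compute_fingerprints(self, mols: Iterable[str]) -> Iterator[bytes]:
        # The unoptimized form described in the text: one call per molecule.
        for mol in mols:
            yield self.compute_fingerprint(mol)


fptype = ToyFingerprintType()
mols = ["CCO", "c1ccccc1", "O"]

# The generator is lazy; list() consumes it.
batch = list(fptype.compute_fingerprints(mols))
one_by_one = [fptype.compute_fingerprint(mol) for mol in mols]
```

Both lists hold the same byte strings in the same order. The generator form avoids building the input list up front when the molecules come from a reader.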

fingerprint_kwargs: _typing.FingerprintKwargs¶

fingerprinter_can_fail: bool = False¶: an internal flag indicating if the fingerprinter can raise an exception when processing a molecule

get_fingerprint_family() → FingerprintFamily¶

Return the fingerprint family for this fingerprint type

Returns:: a FingerprintFamily

get_type() → str¶

Get the full type string (name and parameters) for this fingerprint type

Returns:: a canonical fingerprint type string, including its parameters

make_fingerprinter() → Callable[[str | bytes], bytes]¶

Make a ‘fingerprinter’: a callable which takes a molecule and returns a fingerprint

Returns:: a function object which takes a molecule and returns a fingerprint

make_id_and_molecule_fingerprint_parser(format: str, id_tag: str | None = None, reader_args: Dict[str, Any] | None = None, errors: Literal['strict', 'report', 'ignore'] = 'strict') → Callable[[str | bytes], Tuple[str, bytes | None]]¶

Make a function which parses a molecule from a record and returns the id and computed fingerprint

This is a very specialized function, designed for performance, but it doesn’t appear to give any advantage. You likely don’t need it.

Return a function which parses a content string containing structure records in the given format to get a molecule. Use the molecule to compute the fingerprint and get its id. For an SD record use id_tag to get the record id from the given SD tag instead of from the title line.

The new function will return the (id, fingerprint) pair.

The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and returns None for values it cannot compute, and “ignore” is like “report” but without the error message. For “report” and “ignore”, if the molecule cannot be parsed then the result will be (None, None). If the fingerprint cannot be computed then the result will be (id, None).

Parameters:

format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

a function which takes a content string and returns an (id, fingerprint) pair
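The three error modes can be modeled in plain Python. This is a minimal sketch of the documented behavior, not chemfp's implementation: make_parser, toy_parse_record, toy_fingerprinter, and the one-line "<smiles> <id>" record format are all hypothetical, and only ValueError is treated as a processing failure.

```python
import sys
from typing import Callable, Optional, Tuple

Result = Tuple[Optional[str], Optional[bytes]]


def make_parser(
    parse_record: Callable[[str], Tuple[str, str]],
    fingerprinter: Callable[[str], bytes],
    errors: str = "strict",
) -> Callable[[str], Result]:
    """Model of the documented error handling; all names here are hypothetical."""
    if errors not in ("strict", "report", "ignore"):
        raise ValueError(f"unsupported errors value: {errors!r}")

    def handle(exc: Exception, partial: Result) -> Result:
        if errors == "strict":
            raise exc
        if errors == "report":
            print(f"ERROR: {exc}", file=sys.stderr)
        return partial  # "ignore" returns the same value, silently

    def parse(content: str) -> Result:
        try:
            record_id, mol = parse_record(content)
        except ValueError as exc:
            # No molecule, so neither the id nor the fingerprint is available.
            return handle(exc, (None, None))
        try:
            return record_id, fingerprinter(mol)
        except ValueError as exc:
            # The id is known even though the fingerprint is not.
            return handle(exc, (record_id, None))

    return parse


def toy_parse_record(content: str) -> Tuple[str, str]:
    # Toy format: "<smiles> <id>" on a single line.
    fields = content.split()
    if len(fields) != 2:
        raise ValueError(f"cannot parse record {content!r}")
    return fields[1], fields[0]


def toy_fingerprinter(mol: str) -> bytes:
    # Toy failure condition: a "?" in the structure cannot be fingerprinted.
    if "?" in mol:
        raise ValueError(f"cannot fingerprint {mol!r}")
    return bytes([len(mol)])


parser = make_parser(toy_parse_record, toy_fingerprinter, errors="ignore")
```

Callers using “report” or “ignore” need to check both elements of the result for None before using them.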

name: str¶: the fingerprint name

parse_id_and_molecule_fingerprint(content: str | bytes, format: str, id_tag: str | None = None, reader_args: Dict[str, Any] | None = None, errors: Literal['strict', 'report', 'ignore'] = 'strict') → Tuple[str, bytes | None]¶

Parse the first molecule record of the content then compute and return the id and fingerprint

Read the first molecule from content, which contains records in the given format. Compute its fingerprint and get the molecule id. For an SD record use id_tag to get the record id from the given SD tag instead of from the title line.

Return the id and fingerprint as the (id, fingerprint) pair.

The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and returns None for values it cannot compute, and “ignore” is like “report” but without the error message. For “report” and “ignore”, if the molecule cannot be parsed then the result will be (None, None). If the fingerprint cannot be computed then the result will be (id, None).

Parameters:

content – the string containing at least one structure record
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

a pair of (id string, fingerprint byte string)

parse_molecule_fingerprint(content: str | bytes, format: str, reader_args: Dict[str, Any] | None = None, errors: Literal['strict', 'report', 'ignore'] = 'strict') → bytes | None¶

Parse the first molecule record of the content then compute and return the fingerprint

Read the first molecule from content, which contains records in the given format. Compute and return its fingerprint.

The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and returns None for the fingerprint, and “ignore” returns None for the fingerprint without any extra message.

Parameters:

content – the string containing at least one structure record
format (a format name string, or Format object) – the input structure format
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

the fingerprint as a byte string

read_molecule_fingerprints(source: _typing.Source, format: _OptionalStr = None, id_tag: _OptionalStr = None, reader_args: _typing.OptionalReaderArgs = None, errors: _typing.ErrorsNames = 'strict', location: _typing.OptionalLocation = None) → _typing.FingerprintIterator¶

Read fingerprints from a structure source as a FingerprintIterator

Iterate through the format structure records in source. If format is None then auto-detect the format based on the source. Use the fingerprint type to compute the fingerprint. For SD files, use id_tag to get the record id from the given SD tag instead of the title line.

The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.

The location parameter takes a Location instance. If None then a default Location will be created.

Parameters:

source (a filename, file object, or None to read from stdin) – the structure source
format (a format name string, or Format object, or None to auto-detect) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a Location object, or None) – object used to track parser state information

Returns:

a chemfp.FingerprintIterator which iterates over the (id, fingerprint) pairs
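A small consumer of the returned iterator might look like the following sketch. The helper name fingerprints_as_hex is hypothetical. It relies only on the read_molecule_fingerprints signature documented above, so any object with a compatible method works.

```python
from typing import Any, Dict, Optional


def fingerprints_as_hex(fptype: Any, source: Any, fmt: Optional[str] = None) -> Dict[str, str]:
    """Collect {record id: hex-encoded fingerprint} from a structure source."""
    # With errors="ignore", records that cannot be processed are skipped,
    # so every pair seen here has a fingerprint.
    reader = fptype.read_molecule_fingerprints(source, format=fmt, errors="ignore")
    return {record_id: fp.hex() for record_id, fp in reader}
```

With chemfp and a chemistry toolkit installed, fptype could come from chemfp.get_fingerprint_type(). Building a dictionary loads every fingerprint into memory, which suits small inputs; for large files it is better to iterate over the reader directly.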

read_molecule_fingerprints_from_string(content: _typing.Content, format: _OptionalStr = None, id_tag: _OptionalStr = None, reader_args_: _typing.OptionalReaderArgs = None, errors: _typing.ErrorsNames = 'strict', location: _typing.OptionalLocation = None) → _typing.FingerprintIterator¶

Read fingerprints from structure records in a string, as a FingerprintIterator

Iterate through the format structure records in content. Use the fingerprint type to compute the fingerprint. For SD files, use id_tag to get the record id from the given SD tag instead of the title line.

The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.

The location parameter takes a Location instance. If None then a default Location will be created.

Parameters:

content – the string containing structure records
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a Location object, or None) – object used to track parser state information

Returns:

a chemfp.FingerprintIterator which iterates over the (id, fingerprint) pairs

software: _OptionalStr = None¶: a description of the software package(s) used

toolkit: _Optional[_typing.ToolkitType] = None¶: a reference to the underlying toolkit wrapper

version: _OptionalStr¶: the part of the name after the ‘/’

class chemfp.types.FingerprintFamily(fingerprint_class)¶

Bases: BaseFingerprintFamily

A FingerprintFamily is used to create a FingerprintType or get information about its parameters

Two reasons to use a FingerprintFamily (instead of using chemfp.get_fingerprint_type() or chemfp.get_fingerprint_type_from_text_settings()) are:

to figure out the default arguments;
to filter a text settings or parameter dictionary down to the keys found in the default arguments before creating a FingerprintType (otherwise the creation function will raise an exception for the unknown parameters).
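The second use can be sketched as a dictionary filter. The helper name keep_known_parameters is hypothetical. The defaults are copied from the get_defaults() example elsewhere in this reference, and "radius" is an arbitrary example of a key that is not an RDKit-Fingerprint parameter.

```python
from typing import Any, Dict


def keep_known_parameters(defaults: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the items of params whose keys appear in defaults."""
    return {key: value for key, value in params.items() if key in defaults}


# Defaults as shown for the "RDKit-Fingerprint" family.
defaults = {"maxPath": 7, "fpSize": 2048, "nBitsPerHash": 2, "minPath": 1, "useHs": 1}

# "radius" is not one of the default keys, so it is dropped.
mixed = {"fpSize": 1024, "radius": 3}
filtered = keep_known_parameters(defaults, mixed)
```

With a real family object, defaults would come from family.get_defaults() and the filtered dictionary could be passed to family.from_kwargs().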

All fingerprint families have the following attributes:

name - the type name, including version
toolkit - the toolkit API for the underlying chemistry toolkit, or None

property base_name: str¶: The base fingerprint name, without the version

from_kwargs(fingerprint_kwargs: Dict[str, Any] | None = None) → FingerprintType¶

Create a fingerprint type; items in the fingerprint_kwargs dictionary can override the defaults

The dictionary values are native Python values, not string-encoded values:

>>> import chemfp
>>> family = chemfp.get_fingerprint_family("RDKit-Fingerprint")
>>> fptype = family()
>>> fptype.get_type()
'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=2048 nBitsPerHash=2 useHs=1'
>>> fptype = family.from_kwargs({"fpSize": 1024})
>>> fptype.get_type()
'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=1024 nBitsPerHash=2 useHs=1'

The function will raise an exception for unknown arguments.

Parameters:: fingerprint_kwargs (a dictionary where the values are Python objects) – the fingerprint parameters
Returns:: an object implementing the chemfp.types.FingerprintType API

from_text_settings(settings: Dict[str, str] | None = None) → FingerprintType¶

Create a fingerprint type; settings is a dictionary with string-encoded values that can override the defaults

The dictionary values are string-encoded values, not native Python values. This function exists to help handle command-line arguments and settings files:

>>> import chemfp
>>> family = chemfp.get_fingerprint_family("RDKit-Fingerprint")
>>> fptype = family.from_text_settings()
>>> fptype.get_type()
'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=2048 nBitsPerHash=2 useHs=1'
>>> fptype = family.from_text_settings({"fpSize": "1024"})
>>> fptype.get_type()
'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=1024 nBitsPerHash=2 useHs=1'

The function will raise an exception for unknown arguments.

Parameters:: settings (a dictionary where the values are string-encoded) – the fingerprint text settings
Returns:: an object implementing the chemfp.types.FingerprintType API

get_defaults() → Dict[str, Any]¶

Return the default parameters as a dictionary

The dictionary values are native Python objects:

>>> import chemfp
>>> family = chemfp.get_fingerprint_family("RDKit-Fingerprint")
>>> family.get_defaults()
{'maxPath': 7, 'fpSize': 2048, 'nBitsPerHash': 2, 'minPath': 1, 'useHs': 1}

Returns:: a dictionary of fingerprint parameters

get_kwargs_from_text_settings(settings: Dict[str, str] | None = None) → Dict[str, Any]¶

Convert a dictionary of string-encoded fingerprint parameters into native Python values

String-encoded values (“text settings”) can come from the command-line, a configuration file, a web request, or other text sources. The fingerprint types need actual Python values. This method converts the first to the second:

>>> import chemfp
>>> family = chemfp.get_fingerprint_family("RDKit-Fingerprint")
>>> family.get_kwargs_from_text_settings()
{'maxPath': 7, 'fpSize': 2048, 'nBitsPerHash': 2, 'minPath': 1, 'useHs': 1}
>>> family.get_kwargs_from_text_settings({"fpSize": "128", "maxPath": "5"})
{'maxPath': 5, 'fpSize': 128, 'nBitsPerHash': 2, 'minPath': 1, 'useHs': 1}

Parameters:: settings (a dictionary where the values are string-encoded) – the fingerprint text settings
Returns:: a dictionary of (decoded) fingerprint parameters
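As an illustration of where text settings might come from, the sketch below collects command-line style "key=value" terms into a settings dictionary. The helper name text_settings_from_args and the "key=value" convention are assumptions made for the example; the values deliberately stay as strings so that the family can decode them.

```python
from typing import Dict, Iterable


def text_settings_from_args(args: Iterable[str]) -> Dict[str, str]:
    """Turn ["fpSize=128", "maxPath=5"] into {"fpSize": "128", "maxPath": "5"}."""
    settings: Dict[str, str] = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        settings[key] = value  # values stay as strings; decoding is left to the family
    return settings


settings = text_settings_from_args(["fpSize=128", "maxPath=5"])
```

The resulting dictionary has the same shape as the argument passed to get_kwargs_from_text_settings() in the example above.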

property name: str¶: The full fingerprint name, including the version

property toolkit: _Optional[_typing.ToolkitType]¶: The chemfp toolkit wrapper for the chemistry toolkit used to generate this fingerprint, or None

property version: str¶: The fingerprint version (the part after the ‘/’ in the full name)

class chemfp.types.FingerprintType(fingerprint_kwargs: Dict[str, Any])¶

Bases: BaseFingerprintType

The base class for all fingerprint types

A fingerprint type has the following public attributes:

FingerprintType.name - the fingerprint name, including the version
FingerprintType.base_name - the fingerprint name, without the version
FingerprintType.version - the fingerprint version part of the name
FingerprintType.toolkit - the chemfp wrapper module to the appropriate chemistry toolkit
FingerprintType.software - a description of the software package(s) involved
FingerprintType.num_bits - the number of bits in this fingerprint type
FingerprintType.fingerprint_kwargs - a dictionary containing any fingerprint arguments

The built-in fingerprint types are:

chemfp.openbabel_types.OpenBabelFP2FingerprintType_v1 - OpenBabel-FP2/1 - Open Babel FP2
chemfp.openbabel_types.OpenBabelFP3FingerprintType_v1 - OpenBabel-FP3/1 - Open Babel FP3
chemfp.openbabel_types.OpenBabelFP4FingerprintType_v1 - OpenBabel-FP4/1 - Open Babel FP4
chemfp.openbabel_types.OpenBabelMACCSFingerprintType_v1 - OpenBabel-MACCS/1 - Open Babel 166 MACCS keys
chemfp.openbabel_types.OpenBabelMACCSFingerprintType_v2 - OpenBabel-MACCS/2 - Open Babel 166 MACCS keys
chemfp.openbabel_types.OpenBabelECFP0FingerprintType_v1 - OpenBabel-ECFP0 - Open Babel ECFP0
chemfp.openbabel_types.OpenBabelECFP2FingerprintType_v1 - OpenBabel-ECFP2 - Open Babel ECFP2
chemfp.openbabel_types.OpenBabelECFP4FingerprintType_v1 - OpenBabel-ECFP4 - Open Babel ECFP4
chemfp.openbabel_types.OpenBabelECFP6FingerprintType_v1 - OpenBabel-ECFP6 - Open Babel ECFP6
chemfp.openbabel_types.OpenBabelECFP8FingerprintType_v1 - OpenBabel-ECFP8 - Open Babel ECFP8
chemfp.openbabel_types.OpenBabelECFP10FingerprintType_v1 - OpenBabel-ECFP10 - Open Babel ECFP10
chemfp.openbabel_patterns.SubstructOpenBabelFingerprinter_v1 - ChemFP-Substruct-OpenBabel/1 - chemfp’s 881 CACTVS/PubChem-like keys implemented with Open Babel
chemfp.openbabel_patterns.RDMACCSOpenBabelFingerprinter_v1 - RDMACCS-OpenBabel/1 - chemfp’s own 166 MACCS keys implemented with Open Babel (does not include key 44)
chemfp.openbabel_patterns.RDMACCSOpenBabelFingerprinter_v2 - RDMACCS-OpenBabel/2 - chemfp’s own 166 MACCS keys implemented with Open Babel
chemfp.openeye_types.OpenEyeCircularFingerprintType_v2 - OpenEye-Circular/2 - OEGraphSim circular fingerprints
chemfp.openeye_types.OpenEyeMACCSFingerprintType_v2 - OpenEye-MACCS166/2 - OEGraphSim 166 MACCS keys
chemfp.openeye_types.OpenEyePathFingerprintType_v2 - OpenEye-Path/2 - OEGraphSim path fingerprints
chemfp.openeye_types.OpenEyeTreeFingerprintType_v2 - OpenEye-Tree/2 - OEGraphSim tree fingerprints
chemfp.openeye_types.OpenEyeMoleculeScreenFingerprintType_v1 - OpenEye-MoleculeScreen/1 - OEGraphSim molecule screens
chemfp.openeye_types.OpenEyeSMARTSScreenFingerprintType_v1 - OpenEye-SMARTSScreen/1 - OEGraphSim SMARTS screens
chemfp.openeye_types.OpenEyeMDLScreenFingerprintType_v1 - OpenEye-MDLScreen/1 - OEGraphSim MDL screens
chemfp.openeye_patterns.SubstructOpenEyeFingerprinter_v1 - ChemFP-Substruct-OpenEye/1 - chemfp’s 881 CACTVS/PubChem-like keys implemented with OEChem
chemfp.openeye_patterns.RDMACCSOpenEyeFingerprinter_v1 - RDMACCS-OpenEye/1 - chemfp’s own 166 MACCS keys implemented with OEChem (does not include key 44)
chemfp.openeye_patterns.RDMACCSOpenEyeFingerprinter_v2 - RDMACCS-OpenEye/2 - chemfp’s own 166 MACCS keys implemented with OEChem
chemfp.rdkit_types.RDKitFingerprintType_v2 - RDKit-Fingerprint/2 - RDKit path and tree fingerprint
chemfp.rdkit_types.RDKitMACCSFingerprintType_v2 - RDKit-MACCS/2 - RDKit 166 MACCS keys
chemfp.rdkit_types.RDKitMorganFingerprintType_v1 - RDKit-Morgan/1 - RDKit circular fingerprints
chemfp.rdkit_types.RDKitAtomPairFingerprint_v2 - RDKit-AtomPair/2 - RDKit atom pair fingerprints
chemfp.rdkit_types.RDKitTorsionFingerprintType_v2 - RDKit-Torsion/2 - RDKit torsion fingerprints
chemfp.rdkit_types.RDKitAvalonFingerprintType_v2 - RDKit-Avalon/2 - RDKit’s interface to the Avalon fingerprints
chemfp.rdkit_types.RDKitPatternFingerprint_v4 - RDKit-Pattern/4 - RDKit’s substructure pattern fingerprints
chemfp.rdkit_types.RDKitSECFPFingerprintType_v1 - RDKit-SECFP/1 - SMILES Extended Connectivity Fingerprint from Probst et al.
chemfp.rdkit_patterns.SubstructRDKitFingerprintType_v1 - ChemFP-Substruct-RDKit/1 - chemfp’s 881 CACTVS/PubChem-like keys implemented with RDKit
chemfp.rdkit_patterns.RDMACCSRDKitFingerprinter_v1 - RDMACCS-RDKit/1 - chemfp’s own 166 MACCS keys implemented with RDKit (does not include key 44)
chemfp.rdkit_patterns.RDMACCSRDKitFingerprinter_v2 - RDMACCS-RDKit/2 - chemfp’s own 166 MACCS keys implemented with RDKit

compute_fingerprint(mol: _typing.Mol) → _typing.Fingerprint¶

Compute and return the fingerprint byte string for the toolkit molecule

Parameters:: mol – a toolkit molecule
Returns:: the fingerprint as a byte string

compute_fingerprints(mols: _typing.MolIterable) → _typing.FingerprintIter¶

Compute and return the fingerprint for each toolkit molecule in an iterator

This function is a slightly optimized version of:

for mol in mols:
  yield self.compute_fingerprint(mol)

Parameters:: mols – an iterable of toolkit molecules
Returns:: a generator of fingerprints, one per molecule

from_mol(mol: _typing.Mol) → _typing.FingerprintOrNone¶

Return the corresponding fingerprint byte string for the molecule

Deprecated since version 4.2.

This is an alias for compute_fingerprint. The deprecation warnings will start with chemfp 5.0.

In chemfp 4.0 the compute_fingerprint method was documented as deprecated, for eventual removal, in favor of from_mol.

Experience revealed that the “from_” prefix was confusing, since it’s often used to indicate a class constructor, while the goal here is to, well, compute a fingerprint.

from_mols(mols: _typing.MolIterable) → _typing.FingerprintIter¶

Return the corresponding fingerprint byte strings for a stream of molecules

Deprecated since version 4.2.

This is an alias for compute_fingerprints. The deprecation warnings will start with chemfp 5.0.

In chemfp 4.0 the compute_fingerprints method was documented as deprecated, for eventual removal, in favor of from_mols.

Experience revealed that the “from_” prefix was confusing, since it’s often used to indicate a class constructor, while the goal here is to, well, compute fingerprints from molecules.

get_metadata(sources: _Optional[_typing.FilenameOrNames] = None) → _typing.Metadata¶

Return a Metadata appropriate for the given fingerprint type.

This is most commonly used to make a chemfp.Metadata that can be passed into a chemfp.FingerprintWriter.

If sources is a string or a list of strings then it will be passed to the newly created Metadata instance. It should contain filenames or other descriptions of the fingerprint sources.

Parameters:: sources (None, a string, or list of strings) – fingerprint source filenames or other description
Returns:: a chemfp.Metadata

get_type() → str¶

Get the full type string (name and parameters) for this fingerprint type

Returns:: a canonical fingerprint type string, including its parameters
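The layout of a type string can be illustrated by splitting one apart. This is a minimal sketch based only on the get_type() outputs shown in this reference (a name with an optional "/version", followed by space-separated key=value terms). The helper name split_type_string is hypothetical, and chemfp has its own parser, which this does not replace.

```python
from typing import Dict, Optional, Tuple


def split_type_string(type_string: str) -> Tuple[str, Optional[str], Dict[str, str]]:
    """Split a type string into (base_name, version, text settings)."""
    name, *terms = type_string.split()
    # The base name is the part before the "/", the version the part after it.
    base_name, sep, version = name.partition("/")
    # Parameter values are kept as strings, like text settings.
    settings = dict(term.split("=", 1) for term in terms)
    return base_name, (version if sep else None), settings


base_name, version, settings = split_type_string(
    "RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=2048 nBitsPerHash=2 useHs=1"
)
```

The three parts correspond to the base_name and version attributes and to a text settings dictionary of the kind accepted by FingerprintFamily.from_text_settings().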

make_fingerprinter() → Callable[[str | bytes], bytes]¶

Make a ‘fingerprinter’: a callable which takes a molecule and returns a fingerprint

Returns:: a function object which takes a molecule and returns a fingerprint

make_id_and_molecule_fingerprint_parser(format: str, id_tag: str | None = None, reader_args: Dict[str, Any] | None = None, errors: Literal['strict', 'report', 'ignore'] = 'strict') → Callable[[str | bytes], Tuple[str, bytes | None]]¶

Make a function which parses a molecule from a record and returns the id and computed fingerprint

This is a very specialized function, designed for performance, but it doesn’t appear to give any advantage. You likely don’t need it.

Return a function which parses a content string containing structure records in the given format to get a molecule. Use the molecule to compute the fingerprint and get its id. For an SD record use id_tag to get the record id from the given SD tag instead of from the title line.

The new function will return the (id, fingerprint) pair.

The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and returns None for values it cannot compute, and “ignore” is like “report” but without the error message. For “report” and “ignore”, if the molecule cannot be parsed then the result will be (None, None). If the fingerprint cannot be computed then the result will be (id, None).

Parameters:

format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

a function which takes a content string and returns an (id, fingerprint) pair

num_bits: int¶

parse_id_and_molecule_fingerprint(content: str | bytes, format: str, id_tag: str | None = None, reader_args: Dict[str, Any] | None = None, errors: Literal['strict', 'report', 'ignore'] = 'strict') → Tuple[str, bytes | None]¶

Parse the first molecule record of the content then compute and return the id and fingerprint

Read the first molecule from content, which contains records in the given format. Compute its fingerprint and get the molecule id. For an SD record use id_tag to get the record id from the given SD tag instead of from the title line.

Return the id and fingerprint as the (id, fingerprint) pair.

The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and returns None for values it cannot compute, and “ignore” is like “report” but without the error message. For “report” and “ignore”, if the molecule cannot be parsed then the result will be (None, None). If the fingerprint cannot be computed then the result will be (id, None).

Parameters:

content – the string containing at least one structure record
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

a pair of (id string, fingerprint byte string)

parse_molecule_fingerprint(content: str | bytes, format: str, reader_args: Dict[str, Any] | None = None, errors: Literal['strict', 'report', 'ignore'] = 'strict') → bytes | None¶

Parse the first molecule record of the content then compute and return the fingerprint

Read the first molecule from content, which contains records in the given format. Compute and return its fingerprint.

The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and returns None for the fingerprint, and “ignore” returns None for the fingerprint without any extra message.

Parameters:

content – the string containing at least one structure record
format (a format name string, or Format object) – the input structure format
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

the fingerprint as a byte string

read_molecule_fingerprints(source: _typing.Source, format: _OptionalStr = None, id_tag: _OptionalStr = None, reader_args: _typing.OptionalReaderArgs = None, errors: _typing.ErrorsNames = 'strict', location: _typing.OptionalLocation = None) → _typing.FingerprintIterator¶

Read fingerprints from a structure source as a FingerprintIterator

Iterate through the format structure records in source. If format is None then auto-detect the format based on the source. Use the fingerprint type to compute the fingerprint. For SD files, use id_tag to get the record id from the given SD tag instead of the title line.

The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.

The location parameter takes a Location instance. If None then a default Location will be created.

Parameters:

source (a filename, file object, or None to read from stdin) – the structure source
format (a format name string, or Format object, or None to auto-detect) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a Location object, or None) – object used to track parser state information

Returns:

a chemfp.FingerprintIterator which iterates over the (id, fingerprint) pairs

read_molecule_fingerprints_from_string(content: _typing.Content, format: _OptionalStr = None, id_tag: _OptionalStr = None, reader_args: _typing.OptionalReaderArgs = None, errors: _typing.ErrorsNames = 'strict', location: _typing.OptionalLocation = None) → _typing.FingerprintIterator¶

Read fingerprints from structure records in a string, as a FingerprintIterator

Iterate through the format structure records in content. Use the fingerprint type to compute the fingerprint. For SD files, use id_tag to get the record id from the given SD tag instead of the title line.

The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.

The location parameter takes a Location instance. If None then a default Location will be created.

Parameters:

content – the string containing structure records
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a Location object, or None) – object used to track parser state information

Returns:

a chemfp.FingerprintIterator which iterates over the (id, fingerprint) pairs

software: _OptionalStr = None¶: a description of the software package(s) used

toolkit: _typing.OptionalToolkitType = None¶: a reference to the underlying toolkit wrapper

exception chemfp.types.FingerprintTypeError(name: str | None, reason: str)¶

Bases: FingerprintValueError

Raised when the fingerprint type string is invalid.

name: str | None¶

reason: str¶

exception chemfp.types.FingerprintTypeParameterError(type: str, parsed: TokenizedType, reason: str)¶

Bases: FingerprintValueError

Raised when one of the fingerprint type parameters is invalid.

get_fingerprint_family() → FingerprintFamily¶

parsed: TokenizedType¶

reason: str¶

type: str¶

exception chemfp.types.FingerprintTypeUnknownError(family_name: str, registry: FingerprintFamilyRegistry, toolkit_name: str | None = None)¶

Bases: FingerprintValueError

Raised when the fingerprint family is unknown.

add_aliases(aliases: Dict[str, str]) → None¶: Include aliases {alias name -> chemfp family name} in the help

family_name: str¶

get_all_names() → List[Tuple[str, str | None]]¶

Return a list of available fingerprint names as (name, alias) tuples

“This is part of the internal API”

get_help() → str¶: Get a help message about likely or possible names

get_suggestions(cutoff: float = 0.6) → List[Tuple[str, str | None]]¶

Return a list of likely fingerprint names as (name, alias) tuples

“This is part of the internal API”
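The idea of suggesting likely names for an unknown one can be sketched with the standard library. Whether chemfp uses difflib internally is an assumption. The sketch only reflects that 0.6 is also the default cutoff of difflib.get_close_matches. Unlike the method above, it returns plain names rather than (name, alias) tuples, and suggest_names is a hypothetical helper.

```python
import difflib
from typing import List


def suggest_names(unknown: str, known: List[str], cutoff: float = 0.6) -> List[str]:
    """Return known names similar to the unknown one, best match first."""
    return difflib.get_close_matches(unknown, known, n=3, cutoff=cutoff)


known = ["RDKit-Fingerprint", "RDKit-MACCS", "OpenBabel-FP2"]

# A misspelled family name: the "i" in "Fingerprint" is missing.
suggestions = suggest_names("RDKit-Fingerprnt", known)
```

A higher cutoff gives fewer, closer matches; a lower one is more forgiving of typos but can return unrelated names.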

include_help() → None¶

set_toolkit_name(toolkit_name: str) → None¶: Set the toolkit name used to get the list of available fingerprint types

exception chemfp.types.FingerprintUnavailableError(base_name: str, toolkit_name: str, version: str, reason: str)¶

Bases: FingerprintValueError

Raised when the fingerprint family is registered, but not available.

This may happen if the underlying toolkit isn’t installed or isn’t licensed.

base_name: str¶

copy() → FingerprintUnavailableError¶

name: str¶

reason: str¶

toolkit_name: str¶

version: str¶

exception chemfp.types.FingerprintValueError¶: Bases: ValueError, ChemFPError

class chemfp.types.NoFingerprintParametersMixin(fingerprint_kwargs: Dict[str, Any])¶

Bases: object

This fingerprint type does not support parameters

class chemfp.types.ThreadsafeFingerprinterMixin(fingerprint_kwargs)¶

Bases: object

This fingerprint type is thread-safe.

The make_fingerprinter() method always returns the same fingerprinter.

make_fingerprinter() → Callable[[str | bytes], bytes]¶
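The "same fingerprinter" behavior can be pictured with a stand-in class. ToyThreadsafeType is hypothetical and is not how chemfp implements the mixin. It only shows one shared, stateless callable being used from several threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class ToyThreadsafeType:
    """Hypothetical stand-in; not chemfp's implementation."""

    def __init__(self) -> None:
        # A stateless function is safe to share between threads.
        self._fingerprinter = lambda mol: bytes([len(mol) % 256])

    def make_fingerprinter(self) -> Callable[[str], bytes]:
        return self._fingerprinter  # the same object on every call


fptype = ToyThreadsafeType()
fingerprinter = fptype.make_fingerprinter()
same_object = fingerprinter is fptype.make_fingerprinter()

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves input order even though the calls run concurrently.
    fps = list(pool.map(fingerprinter, ["CCO", "O", "c1ccccc1"]))
```

A fingerprint type that is not thread-safe would presumably need a separate fingerprinter per thread instead of a shared one.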