chemfp.types module
- class chemfp.types.BaseFingerprintType(fingerprint_kwargs: Dict[str, Any])
Bases: object
- base_name: str
the part of the name before the ‘/’
- compute_fingerprint(mol: _typing.Mol) _typing.Fingerprint
Compute and return the fingerprint byte string for the toolkit molecule
- Parameters:
mol – a toolkit molecule
- Returns:
the fingerprint as a byte string
- compute_fingerprints(mols: _typing.MolIterable) _typing.FingerprintIter
Compute and return the fingerprint for each toolkit molecule in an iterator
This function is a slightly optimized version of:
for mol in mols:
    yield self.compute_fingerprint(mol)
- Parameters:
mols – an iterable of toolkit molecules
- Returns:
a generator of fingerprints, one per molecule
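The equivalence described above can be sketched with a stand-in class. ToyFingerprintType and its character-based "fingerprint" are hypothetical and are not part of chemfp; the sketch only shows that the streaming method yields the same byte strings as repeated single calls.

```python
from typing import Iterable, Iterator


class ToyFingerprintType:
    """Hypothetical stand-in for a fingerprint type; not part of chemfp."""

    def compute_fingerprint(self, mol: str) -> bytes:
        # Toy "fingerprint": set one bit per distinct character in a 32-bit field.
        bits = 0
        for ch in set(mol):
            bits |= 1 << (ord(ch) % 32)
        return bits.to_bytes(4, "little")

    def compute_fingerprints(self, mols: Iterable[str]) -> Iterator[bytes]:
        # Same results as calling compute_fingerprint() on each molecule,
        # yielded lazily, one fingerprint per input.
        for mol in mols:
            yield self.compute_fingerprint(mol)


fptype = ToyFingerprintType()
mols = ["CCO", "c1ccccc1", "O"]
one_by_one = [fptype.compute_fingerprint(mol) for mol in mols]
streamed = list(fptype.compute_fingerprints(mols))
```

Because the result is a generator, nothing is computed until it is iterated.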
- fingerprint_kwargs: _typing.FingerprintKwargs
- fingerprinter_can_fail: bool = False
an internal flag indicating if the fingerprinter can raise an exception when processing a molecule
- get_fingerprint_family() FingerprintFamily
Return the fingerprint family for this fingerprint type
- Returns:
- get_type(complete: bool = False) str
Get the type string (name and parameters) for this fingerprint type
By default this generates a canonical type string, which may exclude some parameters. For example, if version 1.0 had parameters A and B, and version 1.1 adds parameter C, but the fingerprints don’t change when C has its default value, then the canonical type string will exclude C so the 1.0 and 1.1 type strings are compatible.
Use complete=True to include all fingerprint parameters.
- Parameters:
complete (bool) – if True, always include all fingerprint parameters
- Returns:
a canonical fingerprint type string, including its parameters
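A minimal sketch of the canonical-versus-complete distinction, assuming a hypothetical parameter table in which C is the later-added parameter. The names PARAMETERS, toy_get_type, and "Toy-FP/1" are illustrative only and do not reflect how chemfp builds its type strings.

```python
from typing import Any, Dict, List

# Hypothetical parameter table: (name, default, omit_when_default).
# "C" stands for a parameter added in a later release whose default
# value leaves the fingerprints unchanged.
PARAMETERS = [("A", 1, False), ("B", 7, False), ("C", 0, True)]


def toy_get_type(name: str, kwargs: Dict[str, Any], complete: bool = False) -> str:
    terms: List[str] = [name]
    for param, default, omit_when_default in PARAMETERS:
        value = kwargs.get(param, default)
        if not complete and omit_when_default and value == default:
            continue  # canonical form: leave out the defaulted new parameter
        terms.append(f"{param}={value}")
    return " ".join(terms)


canonical = toy_get_type("Toy-FP/1", {"A": 1, "B": 7})
full = toy_get_type("Toy-FP/1", {"A": 1, "B": 7}, complete=True)
changed = toy_get_type("Toy-FP/1", {"C": 3})
```

In this sketch the canonical string omits C only while it holds its default, so a non-default C always appears.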
- make_fingerprinter() Callable[[str | bytes], bytes]
Make a ‘fingerprinter’: a callable which takes a molecule and returns a fingerprint
- Returns:
a function object which takes a molecule and returns a fingerprint
- make_id_and_molecule_fingerprint_parser(format: str, id_tag: str | None = None, reader_args: Dict[str, Any] | None = None, errors: Literal['strict', 'report', 'ignore'] = 'strict') Callable[[str | bytes], Tuple[str, bytes | None]]
Make a function which parses a molecule from a record and returns the id and computed fingerprint
This is a very specialized function, designed for performance, but it doesn’t appear to give any advantage. You likely don’t need it.
Return a function which parses a content string containing structure records in the given format to get a molecule. Use the molecule to compute the fingerprint and get its id. For an SD record use id_tag to get the record id from the given SD tag instead of from the title line.
The new function will return the (id, fingerprint) pair.
The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and returns None for values it cannot compute, and “ignore” is like “report” but without the error message. For “report” and “ignore”, if the molecule cannot be parsed then the result will be (None, None). If the fingerprint cannot be computed then the result will be (id, None).
- Parameters:
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
- Returns:
a function which takes a content string and returns an (id, fingerprint) pair
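The three errors modes described above can be sketched as follows. This is an illustration of the documented return values only, not chemfp's implementation: toy_parse, toy_fingerprint, and make_toy_parser are hypothetical helpers, and the "SMILES id" record layout is an assumption made for the example.

```python
import sys
from typing import Callable, Optional, Tuple


def toy_parse(record: str) -> Tuple[str, str]:
    """Hypothetical parser for 'SMILES id' records; returns (id, molecule)."""
    fields = record.split()
    if len(fields) != 2:
        raise ValueError(f"cannot parse record {record!r}")
    return fields[1], fields[0]


def toy_fingerprint(mol: str) -> bytes:
    """Hypothetical fingerprinter which fails on a wildcard atom."""
    if "*" in mol:
        raise ValueError(f"cannot fingerprint {mol!r}")
    return len(mol).to_bytes(2, "little")


def make_toy_parser(
    errors: str = "strict",
) -> Callable[[str], Tuple[Optional[str], Optional[bytes]]]:
    if errors not in ("strict", "report", "ignore"):
        raise ValueError(f"unsupported errors value: {errors!r}")

    def handle(err: Exception) -> None:
        if errors == "strict":
            raise err  # "strict": propagate the exception
        if errors == "report":
            print(f"ERROR: {err}", file=sys.stderr)
        # "ignore": say nothing

    def parse(record: str) -> Tuple[Optional[str], Optional[bytes]]:
        try:
            record_id, mol = toy_parse(record)
        except ValueError as err:
            handle(err)
            return None, None  # the molecule could not be parsed
        try:
            return record_id, toy_fingerprint(mol)
        except ValueError as err:
            handle(err)
            return record_id, None  # parsed, but no fingerprint

    return parse


lenient = make_toy_parser(errors="ignore")
good = lenient("CCO ethanol")
unparsed = lenient("bad-record")
unfingerprinted = lenient("C* wildcard")
```

Callers using "report" or "ignore" need to check for None in both positions before using the pair.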
- name: str
the fingerprint name
- parse_id_and_molecule_fingerprint(content: str | bytes, format: str, id_tag: str | None = None, reader_args: Dict[str, Any] | None = None, errors: Literal['strict', 'report', 'ignore'] = 'strict') Tuple[str, bytes | None]
Parse the first molecule record of the content then compute and return the id and fingerprint
Read the first molecule from content, which contains records in the given format. Compute its fingerprint and get the molecule id. For an SD record use id_tag to get the record id from the given SD tag instead of from the title line.
Return the id and fingerprint as the (id, fingerprint) pair.
The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and returns None for values it cannot compute, and “ignore” is like “report” but without the error message. For “report” and “ignore”, if the molecule cannot be parsed then the result will be (None, None). If the fingerprint cannot be computed then the result will be (id, None).
- Parameters:
content – the string containing at least one structure record
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
- Returns:
a pair of (id string, fingerprint byte string)
- parse_molecule_fingerprint(content: str | bytes, format: str, reader_args: Dict[str, Any] | None = None, errors: Literal['strict', 'report', 'ignore'] = 'strict') bytes | None
Parse the first molecule record of the content then compute and return the fingerprint
Read the first molecule from content, which contains records in the given format. Compute and return its fingerprint.
The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and returns None for the fingerprint, and “ignore” returns None for the fingerprint without any extra message.
- Parameters:
content – the string containing at least one structure record
format (a format name string, or Format object) – the input structure format
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
- Returns:
the fingerprint as a byte string
- read_molecule_fingerprints(source: _typing.Source, format: _OptionalStr = None, id_tag: _OptionalStr = None, reader_args: _typing.OptionalReaderArgs = None, errors: _typing.ErrorsNames = 'strict', location: _typing.OptionalLocation = None) _typing.FingerprintIterator
Read fingerprints from a structure source as a FingerprintIterator
Iterate through the format structure records in source. If format is None then auto-detect the format based on the source. Use the fingerprint type to compute the fingerprint. For SD files, use id_tag to get the record id from the given SD tag instead of the title line.
The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.
The location parameter takes a Location instance. If None then a default Location will be created.
- Parameters:
source (a filename, file object, or None to read from stdin) – the structure source
format (a format name string, or Format object, or None to auto-detect) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a Location object, or None) – object used to track parser state information
- Returns:
a chemfp.FingerprintIterator which iterates over the (id, fingerprint) pairs
- read_molecule_fingerprints_from_string(content: _typing.Content, format: _OptionalStr = None, id_tag: _OptionalStr = None, reader_args_: _typing.OptionalReaderArgs = None, errors: _typing.ErrorsNames = 'strict', location: _typing.OptionalLocation = None) _typing.FingerprintIterator
Read fingerprints from structure records in a string, as a FingerprintIterator
Iterate through the format structure records in content. Use the fingerprint type to compute the fingerprint. For SD files, use id_tag to get the record id from the given SD tag instead of the title line.
The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.
The location parameter takes a Location instance. If None then a default Location will be created.
- Parameters:
content – the string containing structure records
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a Location object, or None) – object used to track parser state information
- Returns:
a chemfp.FingerprintIterator which iterates over the (id, fingerprint) pairs
- software: _OptionalStr = None
a description of the software package(s) used
- toolkit: _Optional[_typing.ToolkitType] = None
a reference to the underlying toolkit wrapper
- version: _OptionalStr
the part of the name after the ‘/’
- class chemfp.types.FingerprintFamily(fingerprint_class)
Bases: BaseFingerprintFamily
A FingerprintFamily is used to create a FingerprintType or get information about its parameters
Two reasons to use a FingerprintFamily (instead of using chemfp.get_fingerprint_type() or chemfp.get_fingerprint_type_from_text_settings()) are:
figure out the default arguments;
given text settings or a parameter dictionary, use the keys from the default arguments to remove other parameters before creating a FingerprintType (otherwise the creation function will raise an exception)
All fingerprint families have the following attributes:
name - the type name, including version
toolkit - the toolkit API for the underlying chemistry toolkit, or None
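The second use case, removing unrelated keys before creating a fingerprint type, might look like the sketch below. The defaults dictionary is copied from the get_defaults() example for "RDKit-Fingerprint" in this section; the settings dictionary, its "radius" key, and the keep_known helper are hypothetical.

```python
from typing import Any, Dict

# Defaults as shown by get_defaults() for "RDKit-Fingerprint" in this section.
defaults = {"maxPath": 7, "fpSize": 2048, "nBitsPerHash": 2, "minPath": 1, "useHs": 1}

# Hypothetical settings gathered from a shared configuration; "radius"
# belongs to some other fingerprint family and would be rejected here.
settings = {"fpSize": 1024, "maxPath": 5, "radius": 2}


def keep_known(settings: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keys that the family lists among its defaults."""
    return {key: value for key, value in settings.items() if key in defaults}


fingerprint_kwargs = keep_known(settings, defaults)
# fingerprint_kwargs now holds only fpSize and maxPath, which is the kind
# of dictionary that from_kwargs() accepts without raising.
```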
- property base_name: str
The base fingerprint name, without the version
- from_kwargs(fingerprint_kwargs: Dict[str, Any] | None = None) FingerprintType
Create a fingerprint type; items in the fingerprint_kwargs dictionary can override the defaults
The dictionary values are native Python values, not string-encoded values:
>>> import chemfp
>>> family = chemfp.get_fingerprint_family("RDKit-Fingerprint")
>>> fptype = family()
>>> fptype.get_type()
'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=2048 nBitsPerHash=2 useHs=1'
>>> fptype = family.from_kwargs({"fpSize": 1024})
>>> fptype.get_type()
'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=1024 nBitsPerHash=2 useHs=1'
The function will raise an exception for unknown arguments.
- Parameters:
fingerprint_kwargs (a dictionary where the values are Python objects) – the fingerprint parameters
- Returns:
an object implementing the chemfp.types.FingerprintType API
- from_text_settings(settings: Dict[str, str] | None = None) FingerprintType
Create a fingerprint type; settings is a dictionary with string-encoded values that can override the defaults
The dictionary values are string-encoded values, not native Python values. This function exists to help handle command-line arguments and settings files:
>>> import chemfp
>>> family = chemfp.get_fingerprint_family("RDKit-Fingerprint")
>>> fptype = family.from_text_settings()
>>> fptype.get_type()
'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=2048 nBitsPerHash=2 useHs=1'
>>> fptype = family.from_text_settings({"fpSize": "1024"})
>>> fptype.get_type()
'RDKit-Fingerprint/2 minPath=1 maxPath=7 fpSize=1024 nBitsPerHash=2 useHs=1'
The function will raise an exception for unknown arguments.
- Parameters:
settings (a dictionary where the values are string-encoded) – the fingerprint text settings
- Returns:
an object implementing the chemfp.types.FingerprintType API
- get_defaults() Dict[str, Any]
Return the default parameters as a dictionary
The dictionary values are native Python objects:
>>> import chemfp
>>> family = chemfp.get_fingerprint_family("RDKit-Fingerprint")
>>> family.get_defaults()
{'maxPath': 7, 'fpSize': 2048, 'nBitsPerHash': 2, 'minPath': 1, 'useHs': 1}
- Returns:
a dictionary of fingerprint parameters
- get_kwargs_from_text_settings(settings: Dict[str, str] | None = None) Dict[str, Any]
Convert a dictionary of string-encoded fingerprint parameters into native Python values
String-encoded values (“text settings”) can come from the command-line, a configuration file, a web request, or other text sources. The fingerprint types need actual Python values. This method converts the first to the second:
>>> import chemfp
>>> family = chemfp.get_fingerprint_family("RDKit-Fingerprint")
>>> family.get_kwargs_from_text_settings()
{'maxPath': 7, 'fpSize': 2048, 'nBitsPerHash': 2, 'minPath': 1, 'useHs': 1}
>>> family.get_kwargs_from_text_settings({"fpSize": "128", "maxPath": "5"})
{'maxPath': 5, 'fpSize': 128, 'nBitsPerHash': 2, 'minPath': 1, 'useHs': 1}
- Parameters:
settings (a dictionary where the values are string-encoded) – the fingerprint text settings
- Returns:
a dictionary of (decoded) fingerprint parameters
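The conversion from text settings to native values can be approximated with a table of decoders. This is a sketch under stated assumptions: the DECODERS table and toy_kwargs_from_text_settings are hypothetical, every parameter is assumed to decode with int, and the defaults are those shown in the example above.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical table: parameter name -> (default value, decoder function).
DECODERS: Dict[str, Tuple[Any, Callable[[str], Any]]] = {
    "minPath": (1, int),
    "maxPath": (7, int),
    "fpSize": (2048, int),
    "nBitsPerHash": (2, int),
    "useHs": (1, int),
}


def toy_kwargs_from_text_settings(
    settings: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    # Start from the defaults, then decode and apply each override.
    kwargs = {name: default for name, (default, _decode) in DECODERS.items()}
    for name, text in (settings or {}).items():
        if name not in DECODERS:
            raise ValueError(f"unknown fingerprint parameter: {name!r}")
        kwargs[name] = DECODERS[name][1](text)
    return kwargs


decoded = toy_kwargs_from_text_settings({"fpSize": "128", "maxPath": "5"})
```

A malformed value such as "abc" for fpSize would raise ValueError from int() in this sketch.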
- property name: str
The full fingerprint name, including the version
- property toolkit: _Optional[_typing.ToolkitType]
The chemfp toolkit wrapper for the chemistry toolkit used to generate this fingerprint, or None
- property version: str
The fingerprint version (the part after the ‘/’ in the full name)
- class chemfp.types.FingerprintType(fingerprint_kwargs: Dict[str, Any])
Bases: BaseFingerprintType
The base class of all fingerprint types
A fingerprint type has the following public attributes:
FingerprintType.name - the fingerprint name, including the version
FingerprintType.base_name - the fingerprint name, without the version
FingerprintType.version - the fingerprint version part of the name
FingerprintType.toolkit - the chemfp wrapper module to the appropriate chemistry toolkit
FingerprintType.software - a description of the software package(s) involved
FingerprintType.num_bits - the number of bits in this fingerprint type
FingerprintType.fingerprint_kwargs - a dictionary containing any fingerprint arguments
The built-in fingerprint types are:
chemfp.openbabel_types.OpenBabelFP2FingerprintType_v1 - OpenBabel-FP2/1 - Open Babel FP2
chemfp.openbabel_types.OpenBabelFP3FingerprintType_v1 - OpenBabel-FP3/1 - Open Babel FP3
chemfp.openbabel_types.OpenBabelFP4FingerprintType_v1 - OpenBabel-FP4/1 - Open Babel FP4
chemfp.openbabel_types.OpenBabelMACCSFingerprintType_v1 - OpenBabel-MACCS/1 - Open Babel 166 MACCS keys
chemfp.openbabel_types.OpenBabelMACCSFingerprintType_v2 - OpenBabel-MACCS/2 - Open Babel 166 MACCS keys
chemfp.openbabel_types.OpenBabelECFP0FingerprintType_v1 - OpenBabel-ECFP0 - Open Babel ECFP0
chemfp.openbabel_types.OpenBabelECFP2FingerprintType_v1 - OpenBabel-ECFP2 - Open Babel ECFP2
chemfp.openbabel_types.OpenBabelECFP4FingerprintType_v1 - OpenBabel-ECFP4 - Open Babel ECFP4
chemfp.openbabel_types.OpenBabelECFP6FingerprintType_v1 - OpenBabel-ECFP6 - Open Babel ECFP6
chemfp.openbabel_types.OpenBabelECFP8FingerprintType_v1 - OpenBabel-ECFP8 - Open Babel ECFP8
chemfp.openbabel_types.OpenBabelECFP10FingerprintType_v1 - OpenBabel-ECFP10 - Open Babel ECFP10
chemfp.openbabel_patterns.SubstructOpenBabelFingerprinter_v1 - ChemFP-Substruct-OpenBabel/1 - chemfp’s 881 CACTVS/PubChem-like keys implemented with Open Babel
chemfp.openbabel_patterns.RDMACCSOpenBabelFingerprinter_v1 - RDMACCS-OpenBabel/1 - chemfp’s own 166 MACCS keys implemented with Open Babel (does not include key 44)
chemfp.openbabel_patterns.RDMACCSOpenBabelFingerprinter_v2 - RDMACCS-OpenBabel/2 - chemfp’s own 166 MACCS keys implemented with Open Babel
chemfp.openeye_types.OpenEyeCircularFingerprintType_v2 - OpenEye-Circular/2 - OEGraphSim circular fingerprints
chemfp.openeye_types.OpenEyeMACCSFingerprintType_v2 - OpenEye-MACCS166/2 - OEGraphSim 166 MACCS keys
chemfp.openeye_types.OpenEyePathFingerprintType_v2 - OpenEye-Path/2 - OEGraphSim path fingerprints
chemfp.openeye_types.OpenEyeTreeFingerprintType_v2 - OpenEye-Tree/2 - OEGraphSim tree fingerprints
chemfp.openeye_types.OpenEyeMoleculeScreenFingerprintType_v1 - OpenEye-MoleculeScreen/1 - OEGraphSim molecule screens
chemfp.openeye_types.OpenEyeSMARTSScreenFingerprintType_v1 - OpenEye-SMARTSScreen/1 - OEGraphSim SMARTS screens
chemfp.openeye_types.OpenEyeMDLScreenFingerprintType_v1 - OpenEye-MDLScreen/1 - OEGraphSim MDL screens
chemfp.openeye_types.KlekotaRothOpenEyeFingerprintType_v1 - KlekotaRoth-OpenEye/1 - chemfp’s implementation of the 4860-bit Klekota-Roth fingerprints implemented with OEChem
chemfp.openeye_patterns.SubstructOpenEyeFingerprinter_v1 - ChemFP-Substruct-OpenEye/1 - chemfp’s 881 CACTVS/PubChem-like keys implemented with OEChem
chemfp.openeye_patterns.RDMACCSOpenEyeFingerprinter_v1 - RDMACCS-OpenEye/1 - chemfp’s own 166 MACCS keys implemented with OEChem (does not include key 44)
chemfp.openeye_patterns.RDMACCSOpenEyeFingerprinter_v2 - RDMACCS-OpenEye/2 - chemfp’s own 166 MACCS keys implemented with OEChem
chemfp.rdkit_types.RDKitFingerprintType_v2 - RDKit-Fingerprint/2 - RDKit path and tree fingerprint (deprecated API)
chemfp.rdkit_types.RDKitFingerprintType_v3 - RDKit-Fingerprint/3 - RDKit path and tree fingerprint (new generator API)
chemfp.rdkit_types.RDKitMACCSFingerprintType_v2 - RDKit-MACCS/2 - RDKit 166 MACCS keys
chemfp.rdkit_types.RDKitMorganFingerprintType_v1 - RDKit-Morgan/1 - RDKit circular fingerprints (deprecated API)
chemfp.rdkit_types.RDKitMorganFingerprintType_v2 - RDKit-Morgan/2 - RDKit circular fingerprints (new generator API)
chemfp.rdkit_types.RDKitAtomPairFingerprint_v2 - RDKit-AtomPair/2 - RDKit atom pair fingerprints (deprecated API)
chemfp.rdkit_types.RDKitAtomPairFingerprint_v2 - RDKit-AtomPair/2 - RDKit atom pair fingerprints (new generator API)
chemfp.rdkit_types.RDKitTorsionFingerprintType_v3 - RDKit-Torsion/3 - RDKit torsion fingerprints (deprecated API)
chemfp.rdkit_types.RDKitTorsionFingerprintType_v4 - RDKit-Torsion/4 - RDKit torsion fingerprints (new generator API)
chemfp.rdkit_types.RDKitTorsionFingerprintType_v2 - RDKit-Torsion/2 - RDKit torsion fingerprints
chemfp.rdkit_types.RDKitAvalonFingerprintType_v2 - RDKit-Avalon/2 - RDKit’s interface to the Avalon fingerprints
chemfp.rdkit_types.RDKitPatternFingerprint_v4 - RDKit-Pattern/4 - RDKit’s substructure pattern fingerprints
chemfp.rdkit_types.RDKitSECFPFingerprintType_v2 - RDKit-SECFP/2 - SMILES Extended Connectivity Fingerprint from Probst et al.
chemfp.rdkit_types.KlekotaRothRDKitFingerprintType_v1 - KlekotaRoth-RDKit/1 - chemfp’s implementation of the 4860-bit Klekota-Roth fingerprints implemented with RDKit
chemfp.rdkit_patterns.SubstructRDKitFingerprintType_v1 - ChemFP-Substruct-RDKit/1 - chemfp’s 881 CACTVS/PubChem-like keys implemented with RDKit
chemfp.rdkit_patterns.RDMACCSRDKitFingerprinter_v1 - RDMACCS-RDKit/1 - chemfp’s own 166 MACCS keys implemented with RDKit (does not include key 44)
chemfp.rdkit_patterns.RDMACCSRDKitFingerprinter_v2 - RDMACCS-RDKit/2 - chemfp’s own 166 MACCS keys implemented with RDKit
- compute_fingerprint(mol: _typing.Mol) _typing.Fingerprint
Compute and return the fingerprint byte string for the toolkit molecule
- Parameters:
mol – a toolkit molecule
- Returns:
the fingerprint as a byte string
- compute_fingerprints(mols: _typing.MolIterable) _typing.FingerprintIter
Compute and return the fingerprint for each toolkit molecule in an iterator
This function is a slightly optimized version of:
for mol in mols:
    yield self.compute_fingerprint(mol)
- Parameters:
mols – an iterable of toolkit molecules
- Returns:
a generator of fingerprints, one per molecule
- from_mol(mol: _typing.Mol) _typing.FingerprintOrNone
Return the corresponding fingerprint byte string for the molecule
Deprecated since version 4.2.
This is an alias for compute_fingerprint(). Deprecation warnings started with chemfp 5.0. This will be removed in chemfp 5.2.
In chemfp 4.0 the compute_fingerprint() method was documented as deprecated, for eventual removal, in favor of from_mol(). This was a mistake.
Experience revealed that the from_ prefix was confusing, since it’s often used to indicate a class constructor, while the goal here is to, well, compute a fingerprint.
- from_mols(mols: _typing.MolIterable) _typing.FingerprintIter
Return the corresponding fingerprint byte strings for a stream of molecules
Deprecated since version 4.2.
This is an alias for compute_fingerprints(). Deprecation warnings started with chemfp 5.0. This will be removed in chemfp 5.2.
In chemfp 4.0 the compute_fingerprints() method was documented as deprecated, for eventual removal, in favor of from_mols(). This was a mistake.
Experience revealed that the from_ prefix was confusing, since it’s often used to indicate a class constructor, while the goal here is to, well, compute fingerprints from molecules.
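A deprecated alias of this kind is commonly written as a thin wrapper that warns and then delegates. The sketch below shows that general pattern with Python's warnings module; AliasExample and its toy fingerprint are hypothetical, and chemfp's own wrapper and warning text may differ.

```python
import warnings


class AliasExample:
    """Hypothetical class showing a deprecated alias; not chemfp code."""

    def compute_fingerprint(self, mol: str) -> bytes:
        return mol.encode("ascii")  # toy fingerprint

    def from_mol(self, mol: str) -> bytes:
        # Warn the caller, then delegate to the preferred method.
        warnings.warn(
            "from_mol() is deprecated; use compute_fingerprint()",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.compute_fingerprint(mol)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    fp = AliasExample().from_mol("CCO")
```

Code that still calls the alias can be migrated by renaming the call; the arguments and return value are the same.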
- get_metadata(sources: _Optional[_typing.FilenameOrNames] = None) _typing.Metadata
Return a Metadata appropriate for the given fingerprint type.
This is most commonly used to make a chemfp.Metadata that can be passed into a chemfp.FingerprintWriter.
If sources is a string or a list of strings then it will be passed to the newly created Metadata instance. It should contain filenames or other descriptions of the fingerprint sources.
- Parameters:
sources (None, a string, or list of strings) – fingerprint source filenames or other description
- Returns:
- get_type(complete: bool = False) str
Get the type string (name and parameters) for this fingerprint type
By default this generates a canonical type string, which may exclude some parameters. For example, if version 1.0 had parameters A and B, and version 1.1 adds parameter C, but the fingerprints don’t change when C has its default value, then the canonical type string will exclude C so the 1.0 and 1.1 type strings are compatible.
Use complete=True to include all fingerprint parameters.
- Parameters:
complete (bool) – if True, always include all fingerprint parameters
- Returns:
a canonical fingerprint type string, including its parameters
- make_fingerprinter() Callable[[str | bytes], bytes]
Make a ‘fingerprinter’: a callable which takes a molecule and returns a fingerprint
- Returns:
a function object which takes a molecule and returns a fingerprint
- make_id_and_molecule_fingerprint_parser(format: str, id_tag: str | None = None, reader_args: Dict[str, Any] | None = None, errors: Literal['strict', 'report', 'ignore'] = 'strict') Callable[[str | bytes], Tuple[str, bytes | None]]
Make a function which parses a molecule from a record and returns the id and computed fingerprint
This is a very specialized function, designed for performance, but it doesn’t appear to give any advantage. You likely don’t need it.
Return a function which parses a content string containing structure records in the given format to get a molecule. Use the molecule to compute the fingerprint and get its id. For an SD record use id_tag to get the record id from the given SD tag instead of from the title line.
The new function will return the (id, fingerprint) pair.
The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and returns None for values it cannot compute, and “ignore” is like “report” but without the error message. For “report” and “ignore”, if the molecule cannot be parsed then the result will be (None, None). If the fingerprint cannot be computed then the result will be (id, None).
- Parameters:
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
- Returns:
a function which takes a content string and returns an (id, fingerprint) pair
- num_bits: int
- parse_id_and_molecule_fingerprint(content: str | bytes, format: str, id_tag: str | None = None, reader_args: Dict[str, Any] | None = None, errors: Literal['strict', 'report', 'ignore'] = 'strict') Tuple[str, bytes | None]
Parse the first molecule record of the content then compute and return the id and fingerprint
Read the first molecule from content, which contains records in the given format. Compute its fingerprint and get the molecule id. For an SD record use id_tag to get the record id from the given SD tag instead of from the title line.
Return the id and fingerprint as the (id, fingerprint) pair.
The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and returns None for values it cannot compute, and “ignore” is like “report” but without the error message. For “report” and “ignore”, if the molecule cannot be parsed then the result will be (None, None). If the fingerprint cannot be computed then the result will be (id, None).
- Parameters:
content – the string containing at least one structure record
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
- Returns:
a pair of (id string, fingerprint byte string)
- parse_molecule_fingerprint(content: str | bytes, format: str, reader_args: Dict[str, Any] | None = None, errors: Literal['strict', 'report', 'ignore'] = 'strict') bytes | None
Parse the first molecule record of the content then compute and return the fingerprint
Read the first molecule from content, which contains records in the given format. Compute and return its fingerprint.
The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and returns None for the fingerprint, and “ignore” returns None for the fingerprint without any extra message.
- Parameters:
content – the string containing at least one structure record
format (a format name string, or Format object) – the input structure format
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
- Returns:
the fingerprint as a byte string
- read_molecule_fingerprints(source: _typing.Source, format: _OptionalStr = None, id_tag: _OptionalStr = None, reader_args: _typing.OptionalReaderArgs = None, errors: _typing.ErrorsNames = 'strict', location: _typing.OptionalLocation = None) _typing.FingerprintIterator
Read fingerprints from a structure source as a FingerprintIterator
Iterate through the format structure records in source. If format is None then auto-detect the format based on the source. Use the fingerprint type to compute the fingerprint. For SD files, use id_tag to get the record id from the given SD tag instead of the title line.
The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.
The location parameter takes a Location instance. If None then a default Location will be created.
- Parameters:
source (a filename, file object, or None to read from stdin) – the structure source
format (a format name string, or Format object, or None to auto-detect) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a Location object, or None) – object used to track parser state information
- Returns:
a chemfp.FingerprintIterator which iterates over the (id, fingerprint) pairs
- read_molecule_fingerprints_from_string(content: _typing.Content, format: _OptionalStr = None, id_tag: _OptionalStr = None, reader_args: _typing.OptionalReaderArgs = None, errors: _typing.ErrorsNames = 'strict', location: _typing.OptionalLocation = None) _typing.FingerprintIterator
Read fingerprints from structure records in a string, as a FingerprintIterator
Iterate through the format structure records in content. Use the fingerprint type to compute the fingerprint. For SD files, use id_tag to get the record id from the given SD tag instead of the title line.
The reader_args dictionary parameters depend on the toolkit and format. For details see the docstring for self.toolkit.read_molecules.
The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.
The location parameter takes a Location instance. If None then a default Location will be created.
- Parameters:
content – the string containing structure records
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a Location object, or None) – object used to track parser state information
- Returns:
a chemfp.FingerprintIterator which iterates over the (id, fingerprint) pairs
- software: _OptionalStr = None
a description of the software package(s) used
- toolkit: _typing.OptionalToolkitType = None
a reference to the underlying toolkit wrapper
- exception chemfp.types.FingerprintTypeError(name: str | None, reason: str)
Bases: FingerprintValueError
Raised when the fingerprint type string is invalid.
- name: str | None
- reason: str
- exception chemfp.types.FingerprintTypeParameterError(type: str, parsed: TokenizedType, reason: str)
Bases: FingerprintValueError
Raised when one of the fingerprint type parameters is invalid.
- get_fingerprint_family() FingerprintFamily
- parsed: TokenizedType
- reason: str
- type: str
- exception chemfp.types.FingerprintTypeUnknownError(family_name: str, registry: FingerprintFamilyRegistry, toolkit_name: str | None = None)
Bases: FingerprintValueError
Raised when the fingerprint family is unknown.
- add_aliases(aliases: Dict[str, str]) None
Include aliases {alias name -> chemfp family name} in the help
- family_name: str
- get_all_names() List[Tuple[str, str | None]]
Return a list of available fingerprint names as (name, alias) tuples
“This is part of the internal API”
- get_help() str
Get a help message about likely or possible names
- get_suggestions(cutoff: float = 0.6) List[Tuple[str, str | None]]
Return a list of likely fingerprint names as (name, alias) tuples
“This is part of the internal API”
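One plausible way to produce such suggestions is fuzzy string matching with a similarity cutoff. The sketch below uses the standard library's difflib.get_close_matches, whose default cutoff is also 0.6; whether chemfp uses difflib is an assumption, and KNOWN_NAMES and toy_suggestions are hypothetical.

```python
import difflib
from typing import List

# Hypothetical list of registered family names, taken from names in this section.
KNOWN_NAMES = ["RDKit-Fingerprint", "RDKit-Morgan", "RDKit-MACCS", "OpenBabel-FP2"]


def toy_suggestions(name: str, cutoff: float = 0.6) -> List[str]:
    # Names scoring below the cutoff are dropped; the best match comes first.
    return difflib.get_close_matches(name, KNOWN_NAMES, n=3, cutoff=cutoff)


suggestions = toy_suggestions("RDKit-Morgen")  # misspelled family name
no_match = toy_suggestions("zzz", cutoff=0.9)
```

Raising the cutoff gives fewer, closer suggestions; lowering it gives more, looser ones.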
- include_help() None
- set_toolkit_name(toolkit_name: str) None
Set the toolkit name used to get the list of available fingerprint types
Bases: FingerprintValueError
Raised when the fingerprint family is registered, but not available.
This may be if the underlying toolkit isn’t installed or isn’t licensed.
- exception chemfp.types.FingerprintValueError
Bases: ValueError, ChemFPError
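Because FingerprintValueError derives from both ValueError and ChemFPError, its subclasses can be caught under any of those names. The sketch below reproduces only the inheritance listed in this section using local stand-in classes; it does not import chemfp, and the constructor body is an assumption.

```python
from typing import List, Optional


class ChemFPError(Exception):
    """Stand-in for chemfp's base exception class."""


class FingerprintValueError(ValueError, ChemFPError):
    """Stand-in with the same bases as listed above."""


class FingerprintTypeError(FingerprintValueError):
    """Stand-in for the error raised for an invalid type string."""

    def __init__(self, name: Optional[str], reason: str) -> None:
        super().__init__(f"{reason}: {name!r}")
        self.name = name
        self.reason = reason


caught_as: List[str] = []
for base in (ValueError, ChemFPError, FingerprintValueError):
    try:
        raise FingerprintTypeError("Bogus-FP/1", "unknown fingerprint type")
    except base:
        # Each of the three handlers accepts the same exception.
        caught_as.append(base.__name__)
```

Existing code that handles ValueError therefore keeps working, while chemfp-aware code can catch the narrower classes.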
- class chemfp.types.NoFingerprintParametersMixin(fingerprint_kwargs: Dict[str, Any])
Bases: object
This fingerprint type does not support parameters