chemfp.cdk_toolkit module¶

The chemfp toolkit API wrapper for the CDK toolkit.

The CDK toolkit is written in Java and distributed as a jar file. The jar file must be on your CLASSPATH and you must install the JPype package so Python can use the jar.

The cdk_toolkit is also available as chemfp.cdk.

chemfp.cdk_toolkit.name¶: The string “cdk”.

chemfp.cdk_toolkit.software¶: The string used in output file metadata to describe this version of CDK. For example, “CDK/2.8”.

chemfp.cdk_toolkit.is_available¶

True if the CDK toolkit is available, otherwise False.

This is mostly used as chemfp.cdk.is_available, because this module cannot be imported if CDK is not available.
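
A guarded check along the following lines can avoid an ImportError at startup. This is only a minimal sketch: the helper name cdk_is_available is hypothetical, and it relies on nothing beyond what is stated above, namely that chemfp.cdk exposes is_available.

```python
import importlib


def cdk_is_available():
    """Return True only if chemfp imports and reports a usable CDK."""
    try:
        chemfp = importlib.import_module("chemfp")
    except ImportError:
        # chemfp itself is not installed
        return False
    # chemfp.cdk is the alias described above; a missing attribute counts as "no"
    cdk = getattr(chemfp, "cdk", None)
    return bool(getattr(cdk, "is_available", False))


if cdk_is_available():
    print("CDK fingerprints can be used")
else:
    print("CDK is not available; check CLASSPATH and JPype")
```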

chemfp.cdk_toolkit.atom_pairs2d¶

The available version of the ‘CDK-AtomPairs2D’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKAtomPairs2DFingerprintType_v20 with the full type:

CDK-AtomPairs2D/2.0

chemfp.cdk_toolkit.daylight¶

The available version of the ‘CDK-Daylight’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKDaylightFingerprintType_v20 with the full type:

CDK-Daylight/2.0 size=1024 searchDepth=7 pathLimit=42000 hashPseudoAtoms=0
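
The full type strings shown on this page appear to follow a "name/version key=value ..." layout. Assuming that layout holds, a type string can be split into its parts as sketched below. The function parse_type_string is a hypothetical helper written for illustration, not part of chemfp, and it leaves every parameter value as a string.

```python
def parse_type_string(type_string):
    """Split a full type string into (name, version, parameters)."""
    head, *terms = type_string.split()
    name, _, version = head.partition("/")
    # Each remaining term is assumed to look like "key=value"
    params = dict(term.split("=", 1) for term in terms)
    return name, version, params


name, version, params = parse_type_string(
    "CDK-Daylight/2.0 size=1024 searchDepth=7 pathLimit=42000 hashPseudoAtoms=0"
)
print(name, version, params["size"])
```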

chemfp.cdk_toolkit.ecfp0¶

The available version of the ‘CDK-ECFP0’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKECFP0FingerprintType_v20 with the full type:

CDK-ECFP0/2.0 size=1024 perceiveStereochemistry=0

chemfp.cdk_toolkit.ecfp2¶

The available version of the ‘CDK-ECFP2’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKECFP2FingerprintType_v20 with the full type:

CDK-ECFP2/2.0 size=1024 perceiveStereochemistry=0

chemfp.cdk_toolkit.ecfp4¶

The available version of the ‘CDK-ECFP4’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKECFP4FingerprintType_v20 with the full type:

CDK-ECFP4/2.0 size=1024 perceiveStereochemistry=0

chemfp.cdk_toolkit.ecfp6¶

The available version of the ‘CDK-ECFP6’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKECFP6FingerprintType_v20 with the full type:

CDK-ECFP6/2.0 size=1024 perceiveStereochemistry=0

chemfp.cdk_toolkit.estate¶

The available version of the ‘CDK-EState’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKEStateFingerprintType_v20 with the full type:

CDK-EState/2.0

chemfp.cdk_toolkit.extended¶

The available version of the ‘CDK-Extended’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKExtendedFingerprintType_v20 with the full type:

CDK-Extended/2.0 size=1024 searchDepth=7 pathLimit=42000 hashPseudoAtoms=0

chemfp.cdk_toolkit.fcfp0¶

The available version of the ‘CDK-FCFP0’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKFCFP0FingerprintType_v20 with the full type:

CDK-FCFP0/2.0 size=1024 perceiveStereochemistry=0

chemfp.cdk_toolkit.fcfp2¶

The available version of the ‘CDK-FCFP2’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKFCFP2FingerprintType_v20 with the full type:

CDK-FCFP2/2.0 size=1024 perceiveStereochemistry=0

chemfp.cdk_toolkit.fcfp4¶

The available version of the ‘CDK-FCFP4’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKFCFP4FingerprintType_v20 with the full type:

CDK-FCFP4/2.0 size=1024 perceiveStereochemistry=0

chemfp.cdk_toolkit.fcfp6¶

The available version of the ‘CDK-FCFP6’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKFCFP6FingerprintType_v20 with the full type:

CDK-FCFP6/2.0 size=1024 perceiveStereochemistry=0

chemfp.cdk_toolkit.graph_only¶

The available version of the ‘CDK-GraphOnly’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKGraphOnlyFingerprintType_v20 with the full type:

CDK-GraphOnly/2.0 size=1024 searchDepth=7 pathLimit=42000 hashPseudoAtoms=0

chemfp.cdk_toolkit.hybridization¶

The available version of the ‘CDK-Hybridization’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKHybridizationFingerprintType_v20 with the full type:

CDK-Hybridization/2.0 size=1024 searchDepth=7 pathLimit=42000 hashPseudoAtoms=0

chemfp.cdk_toolkit.klekota_roth¶

The available version of the ‘CDK-KlekotaRoth’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKKlekotaRothFingerprintType_v20 with the full type:

CDK-KlekotaRoth/2.0

chemfp.cdk_toolkit.maccs¶

The available version of the ‘CDK-MACCS’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKMACCSFingerprintType_v20 with the full type:

CDK-MACCS/2.0

chemfp.cdk_toolkit.pubchem¶

The available version of the ‘CDK-Pubchem’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKPubchemFingerprintType_v20 with the full type:

CDK-Pubchem/2.0

chemfp.cdk_toolkit.shortest_path¶

The available version of the ‘CDK-ShortestPath’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKShortestPathFingerprintType_v27 with the full type:

CDK-ShortestPath/2.7 size=1024

chemfp.cdk_toolkit.substructure¶

The available version of the ‘CDK-Substructure’ fingerprint type, for example, an instance of chemfp.cdk_types.CDKSubstructureFingerprintType_v20 with the full type:

CDK-Substructure/2.0

chemfp.cdk_toolkit.add_tag(mol, tag, value)¶

Add an SD tag value to the CDK molecule

Parameters:

mol (a CDK molecule) – the molecule
tag (string) – the SD tag name
value (string) – the text for the tag

Returns:

None

chemfp.cdk_toolkit.copy_molecule(mol)¶

Return a new CDK molecule which is a copy of the given molecule

Parameters:: mol (a CDK molecule) – the molecule to copy
Returns:: a new CDK molecule

chemfp.cdk_toolkit.create_bytes(mol, format, id=None, writer_args=None, errors='strict', level=None)¶

Convert a CDK molecule into a structure record in the given format as a byte string

If id is not None then use it instead of the molecule’s own title. Warning: this may briefly modify the molecule, so may not be thread-safe.

Parameters:

mol (a CDK molecule) – the molecule to use for the output
format (a format name string, or Format object) – the output structure format
id (a string, or None to use the molecule's own id) – an alternate record id
writer_args (a dictionary) – writer arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats

Returns:

a byte string

chemfp.cdk_toolkit.create_inchi(mol: Any, *, id: str | None = None, RecMet: bool | None = None, FixedH: bool | None = None, DoNotAddH: bool | None = None, options: str | None = None, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = None, include_id: bool = True, errors: str = 'strict') → str | None¶

Generate an InChI string and its id from a CDK molecule

This is equivalent to calling:

create_string(mol, "inchi", id=id, writer_args={...}, errors=errors)

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
RecMet (Boolean or None for the InChI default (default: None)) – Reconnect metals
FixedH (Boolean or None for the InChI default (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean or None for the InChI default (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
include_id (Boolean (default: True)) – if true, include the molecule id in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a string, or None if errors are ignored

chemfp.cdk_toolkit.create_inchikey(mol: Any, *, id: str | None = None, RecMet: bool | None = None, FixedH: bool | None = None, DoNotAddH: bool | None = None, options: str | None = None, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = None, include_id: bool = True, errors: str = 'strict') → str | None¶

Generate an InChIKey string and its id from a CDK molecule

This is equivalent to calling:

create_string(mol, "inchikey", id=id, writer_args={...}, errors=errors)

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
RecMet (Boolean or None for the InChI default (default: None)) – Reconnect metals
FixedH (Boolean or None for the InChI default (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean or None for the InChI default (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChIKey and the id
include_id (Boolean (default: True)) – if true, include the molecule id in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a string, or None if errors are ignored

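chemfp.cdk_toolkit.create_inchikeystring(mol: Any, *, id: str | None = None, RecMet: bool | None = None, FixedH: bool | None = None, DoNotAddH: bool | None = None, options: str | None = None, errors: str = 'strict') → str | None¶

Generate an InChIKey string from a CDK molecule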

This is equivalent to calling:

create_string(mol, "inchikeystring", id=id, writer_args={...}, errors=errors)

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
RecMet (Boolean or None for the InChI default (default: None)) – Reconnect metals
FixedH (Boolean or None for the InChI default (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean or None for the InChI default (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a string, or None if errors are ignored

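chemfp.cdk_toolkit.create_inchistring(mol: Any, *, id: str | None = None, RecMet: bool | None = None, FixedH: bool | None = None, DoNotAddH: bool | None = None, options: str | None = None, errors: str = 'strict') → str | None¶

Generate an InChI string from a CDK molecule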

This is equivalent to calling:

create_string(mol, "inchistring", id=id, writer_args={...}, errors=errors)

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
RecMet (Boolean or None for the InChI default (default: None)) – Reconnect metals
FixedH (Boolean or None for the InChI default (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean or None for the InChI default (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a string, or None if errors are ignored

chemfp.cdk_toolkit.create_molfile(mol: Any, *, id: str | None = None, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = False, errors: str = 'strict') → str | None¶

Generate a molfile record from a CDK molecule

This is equivalent to calling:

create_string(mol, "molfile", id=id, writer_args={...}, errors=errors)

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
writeV3000 (Boolean (default: False)) – if true, always write the record in V3000 format
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a string, or None if errors are ignored

chemfp.cdk_toolkit.create_sdf(mol: Any, *, id: str | None = None, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = False, errors: str = 'strict') → str | None¶

Generate an SDF record from a CDK molecule

This is equivalent to calling:

create_string(mol, "sdf", id=id, writer_args={...}, errors=errors)

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
writeV3000 (Boolean (default: False)) – if true, always write the record in V3000 format
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a string, or None if errors are ignored

chemfp.cdk_toolkit.create_sdf3k(mol: Any, *, id: str | None = None, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = True, errors: str = 'strict') → str | None¶

Generate an SDF record in V3000 format from a CDK molecule

This is equivalent to calling:

create_string(mol, "sdf3k", id=id, writer_args={...}, errors=errors)

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
writeV3000 (Boolean (default: True)) – if true, always write the record in V3000 format
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a string, or None if errors are ignored

chemfp.cdk_toolkit.create_smi(mol: Any, *, id: str | None = None, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = None, flavor: int | str | None = 'Default', cxsmiles: bool = False, errors: str = 'strict') → str | None¶

Generate a SMILES string and its id from a CDK molecule

This is equivalent to calling:

create_string(mol, "smi", id=id, writer_args={...}, errors=errors)

Available bit flag flavors are:

‘Canonical’ = 1 (in default bit flags)
‘InChILabelling’ = 3
‘AtomAtomMap’ = 4
‘AtomicMass’ = 8 (in default bit flags)
‘UseAromaticSymbols’ = 16
‘StereoTetrahedral’ = 256 (in default bit flags)
‘StereoCisTrans’ = 512 (in default bit flags)
‘StereoExTetrahedral’ = 1024 (in default bit flags)
‘StereoExCisTrans’ = 1280 (in default bit flags)
‘StereoSquarePlanar’ = 16777216 (in default bit flags)
‘StereoTrigonalBipyramidal’ = 67108864 (in default bit flags)
‘StereoOctahedral’ = 134217728 (in default bit flags)
‘AtomicMassStrict’ = 2048
‘Stereo’ = 218105600 (in default bit flags)
‘Cx2dCoordinates’ = 4096
‘Cx3dCoordinates’ = 8192
‘CxCoordinates’ = 12288
‘CxAtomLabel’ = 32768
‘CxAtomValue’ = 65536
‘CxRadical’ = 131072
‘CxMulticenter’ = 262144
‘CxPolymer’ = 524288
‘CxFragmentGroup’ = 1048576
‘AtomAtomMapRenumber’ = 33554437
‘CxSmiles’ = 12550400
‘CxSmilesWithCoords’ = 12562688
‘Unique’ = 1 (in default bit flags)
‘Isomeric’ = 218105608 (in default bit flags)
‘Absolute’ = 218105609 (in default bit flags)
‘UniversalSmiles’ = 218105611
‘Default’ = 218105609 (in default bit flags)

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
cxsmiles (Boolean (default: False)) – If true, include the appropriate ChemAxon CXSMILES extensions in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a string, or None if errors are ignored
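
The named flavors look like ordinary bit flags that combine by bitwise OR, and the values in the table above are consistent with that reading. The sketch below checks two of the combinations. It is an illustration only, not chemfp's own parser: flavor_to_int and the FLAVORS subset are hypothetical names, with values copied from the table.

```python
# A subset of the flavor table above; values are copied from this page.
FLAVORS = {
    "Canonical": 1,
    "AtomicMass": 8,
    "UseAromaticSymbols": 16,
    "Stereo": 218105600,
    "Isomeric": 218105608,
    "Default": 218105609,
}


def flavor_to_int(flavor):
    """Combine "|"- or ","-separated flavor names into one integer."""
    if isinstance(flavor, int):
        return flavor
    value = 0
    for term in flavor.replace(",", "|").split("|"):
        value |= FLAVORS[term.strip()]
    return value


print(flavor_to_int("Stereo|AtomicMass") == FLAVORS["Isomeric"])
print(flavor_to_int("Isomeric,Canonical") == FLAVORS["Default"])
```

Both comparisons print True, which matches the table: 'Isomeric' is 'Stereo' plus 'AtomicMass', and 'Default' adds 'Canonical' to that.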

chemfp.cdk_toolkit.create_smiles(mol: Any, *, id: str | None = None, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = None, flavor: int | str | None = 'Default', cxsmiles: bool = False, errors: str = 'strict') → str | None¶

Generate a SMILES string and its id from a CDK molecule

This is equivalent to calling:

create_string(mol, "smi", id=id, writer_args={...}, errors=errors)

Available bit flag flavors are:

‘Canonical’ = 1 (in default bit flags)
‘InChILabelling’ = 3
‘AtomAtomMap’ = 4
‘AtomicMass’ = 8 (in default bit flags)
‘UseAromaticSymbols’ = 16
‘StereoTetrahedral’ = 256 (in default bit flags)
‘StereoCisTrans’ = 512 (in default bit flags)
‘StereoExTetrahedral’ = 1024 (in default bit flags)
‘StereoExCisTrans’ = 1280 (in default bit flags)
‘StereoSquarePlanar’ = 16777216 (in default bit flags)
‘StereoTrigonalBipyramidal’ = 67108864 (in default bit flags)
‘StereoOctahedral’ = 134217728 (in default bit flags)
‘AtomicMassStrict’ = 2048
‘Stereo’ = 218105600 (in default bit flags)
‘Cx2dCoordinates’ = 4096
‘Cx3dCoordinates’ = 8192
‘CxCoordinates’ = 12288
‘CxAtomLabel’ = 32768
‘CxAtomValue’ = 65536
‘CxRadical’ = 131072
‘CxMulticenter’ = 262144
‘CxPolymer’ = 524288
‘CxFragmentGroup’ = 1048576
‘AtomAtomMapRenumber’ = 33554437
‘CxSmiles’ = 12550400
‘CxSmilesWithCoords’ = 12562688
‘Unique’ = 1 (in default bit flags)
‘Isomeric’ = 218105608 (in default bit flags)
‘Absolute’ = 218105609 (in default bit flags)
‘UniversalSmiles’ = 218105611
‘Default’ = 218105609 (in default bit flags)

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
cxsmiles (Boolean (default: False)) – If true, include the appropriate ChemAxon CXSMILES extensions in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a string, or None if errors are ignored

chemfp.cdk_toolkit.create_smistring(mol: Any, *, id: str | None = None, flavor: int | str | None = 'Default', cxsmiles: bool = False, errors: str = 'strict') → str | None¶

Generate a SMILES string from a CDK molecule

This is equivalent to calling:

create_string(mol, "smistring", id=id, writer_args={...}, errors=errors)

Available bit flag flavors are:

‘Canonical’ = 1 (in default bit flags)
‘InChILabelling’ = 3
‘AtomAtomMap’ = 4
‘AtomicMass’ = 8 (in default bit flags)
‘UseAromaticSymbols’ = 16
‘StereoTetrahedral’ = 256 (in default bit flags)
‘StereoCisTrans’ = 512 (in default bit flags)
‘StereoExTetrahedral’ = 1024 (in default bit flags)
‘StereoExCisTrans’ = 1280 (in default bit flags)
‘StereoSquarePlanar’ = 16777216 (in default bit flags)
‘StereoTrigonalBipyramidal’ = 67108864 (in default bit flags)
‘StereoOctahedral’ = 134217728 (in default bit flags)
‘AtomicMassStrict’ = 2048
‘Stereo’ = 218105600 (in default bit flags)
‘Cx2dCoordinates’ = 4096
‘Cx3dCoordinates’ = 8192
‘CxCoordinates’ = 12288
‘CxAtomLabel’ = 32768
‘CxAtomValue’ = 65536
‘CxRadical’ = 131072
‘CxMulticenter’ = 262144
‘CxPolymer’ = 524288
‘CxFragmentGroup’ = 1048576
‘AtomAtomMapRenumber’ = 33554437
‘CxSmiles’ = 12550400
‘CxSmilesWithCoords’ = 12562688
‘Unique’ = 1 (in default bit flags)
‘Isomeric’ = 218105608 (in default bit flags)
‘Absolute’ = 218105609 (in default bit flags)
‘UniversalSmiles’ = 218105611
‘Default’ = 218105609 (in default bit flags)

Parameters:

mol (a CDK molecule) – a molecule object
id (None or a string (default: None)) – an alternate identifier for the output record, if relevant
flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
cxsmiles (Boolean (default: False)) – If true, include the appropriate ChemAxon CXSMILES extensions in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a string, or None if errors are ignored
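
Going the other way, an integer flavor can be broken down into the single-bit flags it contains. The sketch below does this for some of the power-of-two entries in the table above. The names SINGLE_BIT_FLAVORS and flavor_names are hypothetical, and the code is an illustration rather than anything chemfp provides.

```python
# Power-of-two entries copied from the flavor table above (Cx* flags omitted).
SINGLE_BIT_FLAVORS = {
    "Canonical": 1,
    "AtomAtomMap": 4,
    "AtomicMass": 8,
    "UseAromaticSymbols": 16,
    "StereoTetrahedral": 256,
    "StereoCisTrans": 512,
    "StereoExTetrahedral": 1024,
    "AtomicMassStrict": 2048,
    "StereoSquarePlanar": 16777216,
    "StereoTrigonalBipyramidal": 67108864,
    "StereoOctahedral": 134217728,
}


def flavor_names(value):
    """List the single-bit flavor names set in value, lowest bit first."""
    by_bit = sorted(SINGLE_BIT_FLAVORS.items(), key=lambda item: item[1])
    return [name for name, bit in by_bit if value & bit]


# 218105609 is the 'Default' value from the table
print(flavor_names(218105609))
```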

chemfp.cdk_toolkit.create_string(mol, format, id=None, writer_args=None, errors='strict')¶

Convert a CDK molecule into a structure record in the given format as a Unicode string

If id is not None then use it instead of the molecule’s own title. Warning: this may briefly modify the molecule, so may not be thread-safe.

Parameters:

mol (a CDK molecule) – the molecule to use for the output
format (a format name string, or Format object) – the output structure format
id (a string, or None to use the molecule's own id) – an alternate record id
writer_args (a dictionary) – writer arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

a Unicode string

chemfp.cdk_toolkit.get_format(format)¶

Get the named format, or raise a ValueError

This will raise a ValueError if CDK does not implement the format named by format, or if that format is not available.

Parameters:: format (a string) – the format name
Returns:: a chemfp.base_toolkit.Format object

chemfp.cdk_toolkit.get_formats(include_unavailable=False)¶

Get the list of structure formats that CDK supports

If include_unavailable is True then also include CDK formats which aren’t available to this specific version of CDK.

Parameters:: include_unavailable (True or False) – include unavailable formats?
Returns:: a list of Format objects

chemfp.cdk_toolkit.get_id(mol)¶

Get the molecule’s id from CDK’s “cdk:Title” property

Parameters:: mol (a CDK molecule) – the molecule
Returns:: a string

chemfp.cdk_toolkit.get_input_format(format)¶

Get the named input format, or raise a ValueError

This will raise a ValueError if CDK does not implement the format named by format, or if that format is not an input format.

Parameters:: format (a string) – the format name
Returns:: a chemfp.base_toolkit.Format object

chemfp.cdk_toolkit.get_input_format_from_source(source=None, format=None)¶

Get the most appropriate format given the available source and format information

If format is a chemfp.base_toolkit.Format then return it. If it’s a Format-like object with “name” and “compression” attributes use it to make a real Format object with the same attributes. If it’s a string then use it to create a Format object.

If format is None, use the source to auto-detect the format. If auto-detection is not possible, assume it’s an uncompressed SMILES file.

Parameters:

source (a filename (as a string), a file object, or None to read from stdin) – the structure data source.
format (a Format(-like) object, string, or None) – format information, if known.

Returns:

a chemfp.base_toolkit.Format object
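
To make the fallback rule above concrete, here is a simplified, filename-only sketch of this kind of auto-detection. It is not chemfp's implementation: guess_format is a hypothetical helper, the set of compression suffixes is an assumption, and the result is a plain (name, compression) tuple rather than a Format object. It also does not check that the guessed name is a format CDK actually supports.

```python
import os

# Assumed compression suffixes, for illustration only.
COMPRESSION_SUFFIXES = {".gz": "gz", ".zst": "zst"}


def guess_format(source, default="smi"):
    """Return a (format name, compression) guess for a source."""
    if not isinstance(source, str):
        # None (stdin) or a file object: nothing to inspect, use the default
        return default, ""
    base, ext = os.path.splitext(source.lower())
    compression = COMPRESSION_SUFFIXES.get(ext, "")
    if compression:
        # Strip the compression suffix and look at the next extension
        base, ext = os.path.splitext(base)
    return (ext[1:] or default), compression


print(guess_format("data.sdf.gz"))
print(guess_format(None))
```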

chemfp.cdk_toolkit.get_input_formats()¶

Get the list of supported CDK input formats

Returns:: a list of chemfp.base_toolkit.Format objects

chemfp.cdk_toolkit.get_output_format(format)¶

Get the named output format, or raise a ValueError

This will raise a ValueError if CDK does not implement the format named by format, or if that format is not an output format.

Parameters:: format (a string) – the format name
Returns:: a chemfp.base_toolkit.Format object

chemfp.cdk_toolkit.get_output_format_from_destination(destination=None, format=None)¶

Get the most appropriate format given the available destination and format information

If format is a chemfp.base_toolkit.Format then return it. If it’s a Format-like object with “name” and “compression” attributes use it to make a real Format object with the same attributes. If it’s a string then use it to create a Format object.

If format is None, use the destination to auto-detect the format. If auto-detection is not possible, assume it’s an uncompressed SMILES file.

Parameters:

destination (a filename (as a string), a file object, or None to write to stdout) – the structure data destination.
format (a Format(-like) object, string, or None) – format information, if known.

Returns:

a chemfp.base_toolkit.Format object

chemfp.cdk_toolkit.get_output_formats()¶

Get the list of supported CDK output formats

Returns:: a list of chemfp.base_toolkit.Format objects

chemfp.cdk_toolkit.get_tag(mol, tag)¶

Get the named SD tag value, or None if it doesn’t exist

Parameters:

mol (a CDK molecule) – the molecule
tag (string) – the SD tag name

Returns:

a string, or None

chemfp.cdk_toolkit.get_tag_pairs(mol)¶

Get a list of all SD tag (name, value) pairs for the molecule

Parameters:: mol (a CDK molecule) – the molecule
Returns:: a list of (string name, string value) pairs
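
The tag functions compose naturally. As a small sketch, the helper below copies SD tags from one molecule to another using only get_tag_pairs() and add_tag() as documented on this page. The name copy_tags and its skip parameter are hypothetical, and T stands for the toolkit module (for example chemfp.cdk_toolkit).

```python
def copy_tags(T, src_mol, dst_mol, skip=()):
    """Copy SD tags from src_mol to dst_mol, except names listed in skip."""
    copied = 0
    for name, value in T.get_tag_pairs(src_mol):
        if name in skip:
            continue
        T.add_tag(dst_mol, name, value)
        copied += 1
    return copied
```

Passing the toolkit in as an argument keeps the helper usable with any chemfp toolkit module that offers the same two functions.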

chemfp.cdk_toolkit.is_licensed()¶

Return True - CDK is always licensed

Returns:: True

chemfp.cdk_toolkit.make_id_and_molecule_parser(format, id_tag=None, reader_args=None, errors='strict')¶

Create a specialized function which takes a record and returns an (id, CDK molecule) pair

The returned function is optimized for reading many records from individual strings because it only does parameter validation once. However, I haven’t really noticed much of a performance difference between this and chemfp.cdk_toolkit.parse_id_and_molecule(), so I suggest you use that function directly instead of making a specialized function. (Let me know if making a specialized function is useful.)

See chemfp.cdk_toolkit.read_molecules() for details about the other parameters.

Parameters:

format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

a function of the form parser(record string) -> (id, CDK molecule)

chemfp.cdk_toolkit.open_inchi_writer(destination: None | str | BinaryIO, *, RecMet: bool | None = None, FixedH: bool | None = None, DoNotAddH: bool | None = None, options: str | None = None, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = None, include_id: bool = True, errors: str = 'strict')¶

Open an InChI file (with InChI and optional id) to write CDK molecules

This is mostly equivalent to calling:

open_molecule_writer(destination, "inchi", writer_args={...}, errors=errors)

along with compression based on the destination filename’s extension.

Parameters:

destination (None, a filename string, or a file-like object) – where to write the molecules
RecMet (Boolean or None for the InChI default (default: None)) – Reconnect metals
FixedH (Boolean or None for the InChI default (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean or None for the InChI default (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
include_id (Boolean (default: True)) – if true, include the molecule id in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules
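
A typical use of the returned writer might look like the sketch below. It uses write_ids_and_molecules(), one of the MoleculeWriter methods named elsewhere on this page, and it assumes the writer also has a close() method, which this page does not state. The helper name write_inchis is hypothetical, and T stands for the toolkit module (for example chemfp.cdk_toolkit).

```python
def write_inchis(T, id_mol_pairs, destination):
    """Write (id, CDK molecule) pairs as tab-delimited InChI records."""
    writer = T.open_inchi_writer(destination, delimiter="tab", include_id=True)
    try:
        writer.write_ids_and_molecules(id_mol_pairs)
    finally:
        # Assumption: the writer exposes close(); this flushes and releases the file
        writer.close()
```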

chemfp.cdk_toolkit.open_inchi_writer_to_string(*, RecMet: bool | None = None, FixedH: bool | None = None, DoNotAddH: bool | None = None, options: str | None = None, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = None, include_id: bool = True, errors: str = 'strict')¶

Open an InChI file (with InChI and optional id) to write CDK molecules to an in-memory string

This is equivalent to calling:

open_molecule_writer_to_string("inchi", writer_args={...}, errors=errors)

Use write_molecules_to_string() to write compressed output.

Parameters:

RecMet (Boolean or None for the InChI default (default: None)) – Reconnect metals
FixedH (Boolean or None for the InChI default (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean or None for the InChI default (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChI and the id
include_id (Boolean (default: True)) – if true, include the molecule id in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.open_inchikey_writer(destination: None | str | BinaryIO, *, RecMet: bool | None = None, FixedH: bool | None = None, DoNotAddH: bool | None = None, options: str | None = None, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = None, include_id: bool = True, errors: str = 'strict')¶

Open an InChIKey file (with InChIKey and optional id) to write CDK molecules

This is mostly equivalent to calling:

open_molecule_writer(destination, "inchikey", writer_args={...}, errors=errors)

along with compression based on the destination filename’s extension.

Parameters:

destination (None, a filename string, or a file-like object) – where to write the molecules
RecMet (Boolean or None for the InChI default (default: None)) – Reconnect metals
FixedH (Boolean or None for the InChI default (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean or None for the InChI default (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChIKey and the id
include_id (Boolean (default: True)) – if true, include the molecule id in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules
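A minimal sketch of writing InChIKey records with the keyword arguments listed above. The helper name write_inchikeys and its toolkit parameter are assumptions made for illustration: pass chemfp.cdk_toolkit as toolkit when chemfp, CDK, and JPype are installed. The sketch also assumes the returned writer has a close() method.

```python
def write_inchikeys(toolkit, id_mol_pairs, destination, include_id=True):
    """Write (id, CDK molecule) pairs as InChIKey records."""
    # `toolkit` is expected to be chemfp.cdk_toolkit; it is passed in so this
    # sketch does not need chemfp, CDK, or JPype at import time.
    writer = toolkit.open_inchikey_writer(
        destination,
        delimiter="tab",        # tab between the InChIKey and the id
        include_id=include_id,  # False writes only the InChIKey column
        errors="ignore",        # skip molecules that cannot be converted
    )
    try:
        writer.write_ids_and_molecules(id_mol_pairs)
    finally:
        writer.close()
```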

chemfp.cdk_toolkit.open_inchikey_writer_to_string(*, RecMet: bool | None = None, FixedH: bool | None = None, DoNotAddH: bool | None = None, options: str | None = None, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = None, include_id: bool = True, errors: str = 'strict')¶

Open an InChIKey file (with InChIKey and optional id) to write CDK molecules to an in-memory string

This is equivalent to calling:

open_molecule_writer_to_string("inchikey", writer_args={...}, errors=errors)

Use write_molecules_to_string() to write compressed output.

Parameters:

RecMet (Boolean or None for the InChI default (default: None)) – Reconnect metals
FixedH (Boolean or None for the InChI default (default: None)) – Use fixed hydrogens
DoNotAddH (Boolean or None for the InChI default (default: None)) – Do not add hydrogens
options (space separated strings) – Configuration string to pass to the InChI API
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the InChIKey and the id
include_id (Boolean (default: True)) – if true, include the molecule id in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.open_molecule_writer(destination=None, format=None, writer_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict', level=None)¶

Return a MoleculeWriter which can write CDK molecules to a destination.

A chemfp.base_toolkit.MoleculeWriter has the methods write_molecule, write_molecules, and write_ids_and_molecules, which are ways to write a CDK molecule, a CDK molecule iterator, or an (id, CDK molecule) pair iterator to a file.

Molecules are written to destination. The output format can be a string like “sdf.gz” or “smi”, a chemfp.base_toolkit.Format, or Format-like object with “name” and “compression” attributes, or None to auto-detect based on the destination. If auto-detection is not possible, the output will be written as uncompressed SMILES.

The writer_args dictionary parameters depend on the format. These include:

SMILES
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- isomericSmiles - True to generate isomeric SMILES
- kekuleSmiles - True to generate SMILES in Kekule form
- canonical - True to generate a canonical SMILES
- allBondsExplicit - True to write explicit ‘-’ and ‘:’ bonds, even if they can be inferred; default is False
- allHsExplicit - True to write explicit hydrogen counts; default is False
- cxsmiles - True to include CXSMILES annotations; default is False

InChI and InChIKey
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- include_id - True or default to include the id as the second column; False omits the id column
- options - an options string passed to the underlying InChI library
- logLevel - an integer log level
- treatWarningAsError - True raises an exception on error; False or default keeps processing

SDF
- includeStereo - True includes stereo information; False or default does not
- kekulize - True or default creates the connection table with bonds in Kekule form
- v3k - True to always export in V3000 format

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.

The location parameter takes a chemfp.io.Location instance. If None then a default Location will be created.

Parameters:

destination (a filename, file object, or None to write to stdout) – the structure destination
format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
writer_args (a dictionary) – writer parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track writer state information
level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules
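A minimal sketch of passing format-specific writer_args, here for the SMILES format. The helper name write_smiles and its toolkit parameter are assumptions made for illustration: pass chemfp.cdk_toolkit as toolkit in real use. The sketch also assumes the writer has a close() method.

```python
def write_smiles(toolkit, id_mol_pairs, destination=None):
    """Write (id, CDK molecule) pairs as tab-delimited SMILES records."""
    # `toolkit` is assumed to be chemfp.cdk_toolkit (or chemfp.cdk).
    writer_args = {
        "delimiter": "tab",   # separator between the SMILES and the id
        "cxsmiles": False,    # no CXSMILES annotations
    }
    writer = toolkit.open_molecule_writer(
        destination,          # None writes to stdout
        "smi",
        writer_args=writer_args,
        errors="report",      # message to stderr, then go to the next record
    )
    try:
        writer.write_ids_and_molecules(id_mol_pairs)
    finally:
        writer.close()
```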

chemfp.cdk_toolkit.open_molecule_writer_to_bytes(format, writer_args=None, errors='strict', location=None, level=None)¶

Return a MoleculeStringWriter which can write molecule records in the given format to a byte string.

See chemfp.cdk_toolkit.open_molecule_writer() for full parameter details.

Use the writer’s chemfp.base_toolkit.MoleculeStringWriter.getvalue() to get the output as a byte string.

Parameters:

format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
writer_args (a dictionary) – writer arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track writer state information
level (None, a positive integer, or one of the strings 'min', 'default', or 'max') – compression level to use for compressed formats

Returns:

a chemfp.base_toolkit.MoleculeStringWriter expecting CDK molecules

chemfp.cdk_toolkit.open_molecule_writer_to_string(format, writer_args=None, errors='strict', location=None)¶

Return a MoleculeStringWriter which can write molecule records in the given format to a string.

See chemfp.cdk_toolkit.open_molecule_writer() for full parameter details.

Use the writer’s chemfp.base_toolkit.MoleculeStringWriter.getvalue() to get the output as a Unicode string.

Parameters:

format (a format name string, or Format(-like) object, or None to auto-detect) – the output structure format
writer_args (a dictionary) – writer arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track writer state information

Returns:

a chemfp.base_toolkit.MoleculeStringWriter expecting CDK molecules
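A minimal sketch of collecting records in memory and retrieving them with getvalue(). The helper name molecules_to_text and its toolkit parameter are assumptions made for illustration: pass chemfp.cdk_toolkit as toolkit in real use. The value is read before the writer is closed, and the writer is assumed to have a close() method.

```python
def molecules_to_text(toolkit, molecules, format_name="smi", writer_args=None):
    """Return the records for `molecules` as one Unicode string."""
    # `toolkit` is assumed to be chemfp.cdk_toolkit (or chemfp.cdk).
    writer = toolkit.open_molecule_writer_to_string(
        format_name, writer_args=writer_args)
    try:
        writer.write_molecules(molecules)
        # Read the accumulated output while the writer is still open.
        return writer.getvalue()
    finally:
        writer.close()
```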

chemfp.cdk_toolkit.open_sdf3k_writer(destination: None | str | BinaryIO, *, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = True, errors: str = 'strict')¶

Open an SDF file in V3000 format to write CDK molecules

This is mostly equivalent to calling:

open_molecule_writer(destination, "sdf3k", writer_args={...}, errors=errors)

along with compression based on the destination filename’s extension.

Parameters:

destination (None, a filename string, or a file-like object) – where to write the molecules
WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
writeV3000 (Boolean (default: True)) – if true, always write the record in V3000 format
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.open_sdf3k_writer_to_string(*, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = True, errors: str = 'strict')¶

Open an SDF file in V3000 format to write CDK molecules to an in-memory string

This is equivalent to calling:

open_molecule_writer_to_string("sdf3k", writer_args={...}, errors=errors)

Use write_molecules_to_string() to write compressed output.

Parameters:

WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
writeV3000 (Boolean (default: True)) – if true, always write the record in V3000 format
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.open_sdf_writer(destination: None | str | BinaryIO, *, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = False, errors: str = 'strict')¶

Open an SDF file to write CDK molecules

This is mostly equivalent to calling:

open_molecule_writer(destination, "sdf", writer_args={...}, errors=errors)

along with compression based on the destination filename’s extension.

Parameters:

destination (None, a filename string, or a file-like object) – where to write the molecules
WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
writeV3000 (Boolean (default: False)) – if true, always write the record in V3000 format
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules
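A minimal sketch of choosing between open_sdf_writer and open_sdf3k_writer and checking the 8-character ProgramName limit before opening the file. The helper name write_sdf, its toolkit parameter, and the up-front length check are assumptions made for illustration: pass chemfp.cdk_toolkit as toolkit in real use. The sketch also assumes the writer has a close() method.

```python
def write_sdf(toolkit, molecules, destination, v3000=False, program_name="CDK"):
    """Write CDK molecules to an SD file in V2000 or V3000 format."""
    # `toolkit` is assumed to be chemfp.cdk_toolkit (or chemfp.cdk).
    if len(program_name) > 8:
        raise ValueError("ProgramName is limited to 8 characters")
    # The two functions take the same keyword arguments and differ
    # in the default value of writeV3000.
    open_writer = toolkit.open_sdf3k_writer if v3000 else toolkit.open_sdf_writer
    writer = open_writer(
        destination,
        ProgramName=program_name,
        TruncateLongData=True,   # truncate data items longer than 200 characters
    )
    try:
        writer.write_molecules(molecules)
    finally:
        writer.close()
```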

chemfp.cdk_toolkit.open_sdf_writer_to_string(*, WriteAromaticBondTypes: bool = False, WriteMajorIsotopes: bool = True, writeProperties: bool = True, WriteQueryFormatValencies: bool = False, TruncateLongData: bool = False, ProgramName: str = 'CDK', ForceWriteAs2DCoordinates: bool = False, WriteDefaultProperties: bool = True, writeV3000: bool = False, errors: str = 'strict')¶

Open an SDF file to write CDK molecules to an in-memory string

This is equivalent to calling:

open_molecule_writer_to_string("sdf", writer_args={...}, errors=errors)

Use write_molecules_to_string() to write compressed output.

Parameters:

WriteAromaticBondTypes (Boolean (default: False)) – if true, write aromatic bonds as bond type 4
WriteMajorIsotopes (Boolean (default: True)) – if true, include isotopic mass for atoms with a specified mass
writeProperties (Boolean (default: True)) – if true, write non-molecule data to the data tags
WriteQueryFormatValencies (Boolean (default: False)) – if true, write valences in the MDL query format (deprecated)
TruncateLongData (Boolean (default: False)) – if true, truncate data items longer than 200 characters
ProgramName (a string up to 8 characters long) – text to use in the ‘program name’ section of the second line
ForceWriteAs2DCoordinates (Boolean (default: False)) – if true, write coordinates as 2D
WriteDefaultProperties (Boolean (default: True)) – if true, always include zeros in the empty atom and bond block fields
writeV3000 (Boolean (default: False)) – if true, always write the record in V3000 format
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.open_smi_writer(destination: None | str | BinaryIO, *, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = None, flavor: int | str | None = 'Default', cxsmiles: bool = False, errors: str = 'strict')¶

Open a SMILES file to write CDK molecules

This is mostly equivalent to calling:

open_molecule_writer(destination, "smi", writer_args={...}, errors=errors)

along with compression based on the destination filename’s extension.

Available bit flag flavors are:

‘Canonical’ = 1 (in default bit flags)
‘InChILabelling’ = 3
‘AtomAtomMap’ = 4
‘AtomicMass’ = 8 (in default bit flags)
‘UseAromaticSymbols’ = 16
‘StereoTetrahedral’ = 256 (in default bit flags)
‘StereoCisTrans’ = 512 (in default bit flags)
‘StereoExTetrahedral’ = 1024 (in default bit flags)
‘StereoExCisTrans’ = 1280 (in default bit flags)
‘StereoSquarePlanar’ = 16777216 (in default bit flags)
‘StereoTrigonalBipyramidal’ = 67108864 (in default bit flags)
‘StereoOctahedral’ = 134217728 (in default bit flags)
‘AtomicMassStrict’ = 2048
‘Stereo’ = 218105600 (in default bit flags)
‘Cx2dCoordinates’ = 4096
‘Cx3dCoordinates’ = 8192
‘CxCoordinates’ = 12288
‘CxAtomLabel’ = 32768
‘CxAtomValue’ = 65536
‘CxRadical’ = 131072
‘CxMulticenter’ = 262144
‘CxPolymer’ = 524288
‘CxFragmentGroup’ = 1048576
‘AtomAtomMapRenumber’ = 33554437
‘CxSmiles’ = 12550400
‘CxSmilesWithCoords’ = 12562688
‘Unique’ = 1 (in default bit flags)
‘Isomeric’ = 218105608 (in default bit flags)
‘Absolute’ = 218105609 (in default bit flags)
‘UniversalSmiles’ = 218105611
‘Default’ = 218105609 (in default bit flags)
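The composite flavors in this list are bitwise ORs of the single-purpose flags. The sketch below checks that arithmetic using values copied from the list. The names FLAVORS, combine_flavors, and in_default are hypothetical helpers, and the string handling only mimics the "|"- or ","-separated form accepted by the flavor parameter; it is not chemfp's own parser.

```python
# A subset of the flavor values listed above.
FLAVORS = {
    "Canonical": 1,
    "AtomicMass": 8,
    "UseAromaticSymbols": 16,
    "StereoTetrahedral": 256,
    "StereoCisTrans": 512,
    "StereoExTetrahedral": 1024,
    "StereoExCisTrans": 1280,
    "StereoSquarePlanar": 16777216,
    "StereoTrigonalBipyramidal": 67108864,
    "StereoOctahedral": 134217728,
}

DEFAULT = 218105609  # the 'Default' value from the list


def combine_flavors(terms):
    """OR together the flags named in a '|'- or ','-separated string."""
    value = 0
    for name in terms.replace(",", "|").split("|"):
        value |= FLAVORS[name.strip()]
    return value


def in_default(value):
    """True if every bit of `value` is set in the default flags."""
    return (DEFAULT & value) == value


stereo = combine_flavors(
    "StereoTetrahedral|StereoCisTrans|StereoExTetrahedral|StereoExCisTrans|"
    "StereoSquarePlanar|StereoTrigonalBipyramidal|StereoOctahedral"
)
isomeric = stereo | FLAVORS["AtomicMass"]
absolute = isomeric | FLAVORS["Canonical"]
```

Here stereo, isomeric, and absolute reproduce the listed values for ‘Stereo’, ‘Isomeric’, and ‘Absolute’, and absolute equals ‘Default’.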

Parameters:

destination (None, a filename string, or a file-like object) – where to write the molecules
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
cxsmiles (Boolean (default: False)) – If true, include the appropriate ChemAxon CXSMILES extensions in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.open_smi_writer_to_string(*, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = None, flavor: int | str | None = 'Default', cxsmiles: bool = False, errors: str = 'strict')¶

Open a SMILES file to write CDK molecules to an in-memory string

This is equivalent to calling:

open_molecule_writer_to_string("smi", writer_args={...}, errors=errors)

Use write_molecules_to_string() to write compressed output.

Available bit flag flavors are:

‘Canonical’ = 1 (in default bit flags)
‘InChILabelling’ = 3
‘AtomAtomMap’ = 4
‘AtomicMass’ = 8 (in default bit flags)
‘UseAromaticSymbols’ = 16
‘StereoTetrahedral’ = 256 (in default bit flags)
‘StereoCisTrans’ = 512 (in default bit flags)
‘StereoExTetrahedral’ = 1024 (in default bit flags)
‘StereoExCisTrans’ = 1280 (in default bit flags)
‘StereoSquarePlanar’ = 16777216 (in default bit flags)
‘StereoTrigonalBipyramidal’ = 67108864 (in default bit flags)
‘StereoOctahedral’ = 134217728 (in default bit flags)
‘AtomicMassStrict’ = 2048
‘Stereo’ = 218105600 (in default bit flags)
‘Cx2dCoordinates’ = 4096
‘Cx3dCoordinates’ = 8192
‘CxCoordinates’ = 12288
‘CxAtomLabel’ = 32768
‘CxAtomValue’ = 65536
‘CxRadical’ = 131072
‘CxMulticenter’ = 262144
‘CxPolymer’ = 524288
‘CxFragmentGroup’ = 1048576
‘AtomAtomMapRenumber’ = 33554437
‘CxSmiles’ = 12550400
‘CxSmilesWithCoords’ = 12562688
‘Unique’ = 1 (in default bit flags)
‘Isomeric’ = 218105608 (in default bit flags)
‘Absolute’ = 218105609 (in default bit flags)
‘UniversalSmiles’ = 218105611
‘Default’ = 218105609 (in default bit flags)

Parameters:

delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: None)) – The separator between the SMILES and the id
flavor (None, integer or string with "|"- or ","-separated terms (default: "Default")) – Output flavor bit flags
cxsmiles (Boolean (default: False)) – If true, include the appropriate ChemAxon CXSMILES extensions in the output
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeWriter expecting CDK molecules

chemfp.cdk_toolkit.parse_id_and_molecule(content, format, id_tag=None, reader_args=None, errors='strict')¶

Parse the first structure record from content and return the (id, CDK molecule) pair.

content is a string containing a single structure record in format format. (Additional records are ignored.) See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.parse_molecule() if you just want the CDK molecule and not the (id, CDK molecule) pair.

Parameters:

content (a string) – the string containing a structure record
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

an (id, CDK molecule) pair

chemfp.cdk_toolkit.parse_inchi(content: str | bytes, *, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = 'to-eol', prepare: bool = True, errors: str = 'strict')¶

Parse an InChI string and its id using the CDK toolkit

This is equivalent to calling:

parse_molecule(content, "inchi", reader_args={...}, errors=errors)

Parameters:

delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a CDK molecule object

chemfp.cdk_toolkit.parse_inchistring(content: str | bytes, *, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = 'to-eol', prepare: bool = True, errors: str = 'strict')¶

Parse an InChI string using the CDK toolkit

This is equivalent to calling:

parse_molecule(content, "inchistring", reader_args={...}, errors=errors)

Parameters:

delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a CDK molecule object

chemfp.cdk_toolkit.parse_molecule(content, format, id_tag=None, reader_args=None, errors='strict')¶

Parse the first structure record from the content string and return a CDK molecule.

content is a string containing a single structure record in format format. (Additional records are ignored). See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.parse_id_and_molecule() if you want the (id, CDK molecule) pair instead of just the molecule.

Parameters:

content (a string) – the string containing a structure record
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

a CDK molecule
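A minimal sketch of parsing one record at a time with parse_id_and_molecule(). The helper name parse_smiles_lines and its toolkit parameter are assumptions made for illustration: pass chemfp.cdk_toolkit as toolkit in real use. With the default errors="strict", a record that cannot be parsed raises an exception, which this sketch does not catch.

```python
def parse_smiles_lines(toolkit, lines):
    """Parse tab-delimited SMILES records into (id, CDK molecule) pairs."""
    # `toolkit` is assumed to be chemfp.cdk_toolkit (or chemfp.cdk).
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record_id, mol = toolkit.parse_id_and_molecule(
            line, "smi", reader_args={"delimiter": "tab"})
        pairs.append((record_id, mol))
    return pairs
```

Use parse_molecule() with the same arguments when the id is not needed.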

chemfp.cdk_toolkit.parse_molfile(content: str | bytes, *, AddStereo0d: bool = True, AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, ForceReadAs3DCoordinates: bool = False, mode: Literal['RELAXED', 'STRICT'] = 'RELAXED', hydrogens: Literal['as-is', 'make-explicit', 'make-implicit', 'make-nonchiral-implicit'] = 'as-is', implementation: Literal['cdk', 'chemfp'] | None = 'cdk', prepare: bool = True, errors: str = 'strict')¶

Parse a molfile record using the CDK toolkit

This is equivalent to calling:

parse_molecule(content, "molfile", reader_args={...}, errors=errors)

Parameters:

AddStereo0d (Boolean (default: True)) – if true, create stereo from parity value when no coordinates
AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
hydrogens (One of 'as-is', 'make-explicit', 'make-implicit', or 'make-nonchiral-implicit' (default: "as-is")) – Specify how to handle implicit or explicit hydrogens
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a CDK molecule object

chemfp.cdk_toolkit.parse_sdf(content: str | bytes, *, AddStereo0d: bool = True, AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, ForceReadAs3DCoordinates: bool = False, mode: Literal['RELAXED', 'STRICT'] = 'RELAXED', hydrogens: Literal['as-is', 'make-explicit', 'make-implicit', 'make-nonchiral-implicit'] = 'as-is', implementation: Literal['cdk', 'chemfp'] | None = 'cdk', prepare: bool = True, errors: str = 'strict')¶

Parse an SDF record using the CDK toolkit

This is equivalent to calling:

parse_molecule(content, "sdf", reader_args={...}, errors=errors)

Parameters:

AddStereo0d (Boolean (default: True)) – if true, create stereo from parity value when no coordinates
AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
hydrogens (One of 'as-is', 'make-explicit', 'make-implicit', or 'make-nonchiral-implicit' (default: "as-is")) – Specify how to handle implicit or explicit hydrogens
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a CDK molecule object

chemfp.cdk_toolkit.parse_smi(content: str | bytes, *, has_header: bool = False, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = 'to-eol', implementation: Literal['cdk', 'chemfp'] | None = 'cdk', kekulise: bool = True, cxsmiles: bool = True, hydrogens: Literal['as-is', 'make-explicit', 'make-implicit', 'make-nonchiral-implicit'] = 'as-is', prepare: bool = True, errors: str = 'strict')¶

Parse a SMILES string and its id using the CDK toolkit

This is equivalent to calling:

parse_molecule(content, "smi", reader_args={...}, errors=errors)

Parameters:

has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the SMILES and the id
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string (only works with ‘chemfp’ implementation)
hydrogens (One of 'as-is', 'make-explicit', 'make-implicit', or 'make-nonchiral-implicit' (default: "as-is")) – Specify how to handle implicit or explicit hydrogens
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a CDK molecule object

chemfp.cdk_toolkit.parse_smiles(content: str | bytes, *, has_header: bool = False, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = 'to-eol', implementation: Literal['cdk', 'chemfp'] | None = 'cdk', kekulise: bool = True, cxsmiles: bool = True, hydrogens: Literal['as-is', 'make-explicit', 'make-implicit', 'make-nonchiral-implicit'] = 'as-is', prepare: bool = True, errors: str = 'strict')¶

Parse a SMILES string and its id using the CDK toolkit

This is equivalent to calling:

parse_molecule(content, "smi", reader_args={...}, errors=errors)

Parameters:

has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the SMILES and the id
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string (only works with ‘chemfp’ implementation)
hydrogens (One of 'as-is', 'make-explicit', 'make-implicit', or 'make-nonchiral-implicit' (default: "as-is")) – Specify how to handle implicit or explicit hydrogens
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a CDK molecule object

chemfp.cdk_toolkit.parse_smistring(content: str | bytes, *, kekulise: bool = True, cxsmiles: bool = True, hydrogens: Literal['as-is', 'make-explicit', 'make-implicit', 'make-nonchiral-implicit'] = 'as-is', prepare: bool = True, errors: str = 'strict')¶

Parse a SMILES string using the CDK toolkit

This is equivalent to calling:

parse_molecule(content, "smistring", reader_args={...}, errors=errors)

Parameters:

kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string
hydrogens (One of 'as-is', 'make-explicit', 'make-implicit', or 'make-nonchiral-implicit' (default: "as-is")) – Specify how to handle implicit or explicit hydrogens
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a CDK molecule object
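A minimal sketch of parsing bare SMILES strings (no id) while selecting one of the documented hydrogens options. The helper name parse_smiles_strings, its toolkit parameter, and the up-front option check are assumptions made for illustration: pass chemfp.cdk_toolkit as toolkit in real use.

```python
HYDROGEN_OPTIONS = (
    "as-is", "make-explicit", "make-implicit", "make-nonchiral-implicit")


def parse_smiles_strings(toolkit, smiles_list, hydrogens="as-is"):
    """Return one CDK molecule per SMILES string."""
    # `toolkit` is assumed to be chemfp.cdk_toolkit (or chemfp.cdk).
    if hydrogens not in HYDROGEN_OPTIONS:
        raise ValueError("unsupported hydrogens option: %r" % (hydrogens,))
    return [
        toolkit.parse_smistring(smiles, hydrogens=hydrogens)
        for smiles in smiles_list
    ]
```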

chemfp.cdk_toolkit.read_csv_ids_and_molecules(source, *, id_column=1, mol_column=2, dialect=None, has_header=True, compression='auto', format='smi', id_tag=None, reader_args=None, errors='report', csv_errors='strict', location=None, encoding='utf8', encoding_errors='strict')¶

Read ids and molecules from column(s) of a CSV file using CDK.

Read from source, which may be a filename, a file-like object, or None to read from stdin.

Use id_column and mol_column to specify the columns containing the record identifier and molecule record. By default the identifiers come from column 1 (the first column) and the molecules from column 2 (the second column). Columns can be specified by integer position (starting with 1), or by a string matching the title from the header line. If id_column is None then the molecule id will come from parsing the molecule record.

Use dialect to specify the type of CSV file. The default of None infers the dialect from the filename extension; *.csv for comma-separated, and *.tsv for tab-separated. The dialect can be specified directly as “csv” or “tsv”, as a registered Python csv dialect at https://docs.python.org/3/library/csv.html (though “excel” is the same as “csv” and “excel-tab” is the same as “tsv”), or as a csv.Dialect or a CSVDialect instance.

If has_header is True then the first line/record contains column titles, and if False then there are no column titles.

Use compression to specify the file compression format. The default “auto” uses the filename extension. Other options are “gz” and “zst”, or the empty string “” to mean no compression.

Use format to specify the structure format for how to parse the molecule column. The default of ‘smi’ will parse it as a SMILES string and, if id_column=None, will also parse any identifier.

The id_tag and reader_args arguments contain additional format configuration parameters.

The errors and csv_errors parameters describe how to handle failures in molecule parsing and CSV parsing, respectively. The default is to report molecule parse failures to stderr, and to stop parsing if a CSV row does not contain enough columns.

The location parameter takes a chemfp.io.Location instance. If None then a default Location will be created.

The encoding and encoding_errors are strings describing the input file character encoding, and how to handle decoding errors. See https://docs.python.org/3/library/codecs.html#standard-encodings and https://docs.python.org/3/library/codecs.html#error-handlers for details.

Parameters:

source (a filename, file object, or None to read from stdin) – the CSV source
id_column (integer position (starting from 1), string, or None) – the column position or column title containing the identifier
mol_column (integer position (starting from 1), string) – the column position or column title containing the structure record
dialect (None, a string name, or a Dialect instance) – the CSV dialect
has_header (bool) – True if the first record contains titles, False if it does not
compression (string or None) – file compression format
format (a format name string, or Format object) – the molecule structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle molecule parse errors
csv_errors (one of "strict", "report", or "ignore") – specify how to handle CSV errors
location (a chemfp.io.Location object, or None) – object used to track parser state information
encoding (string) – the name of the file’s character encoding
encoding_errors (string) – the method used to handle decoding errors

Returns:

a chemfp.base_toolkit.IdAndMoleculeReader iterating (id, CDK molecule) pairs
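The id_column and mol_column parameters accept either a position (starting from 1) or a column title. The following sketch shows one way such a specifier could be resolved, using only the standard csv module. The names resolve_column and iter_id_and_structure, and the default column positions, are hypothetical. It does not parse molecules, does not handle id_column=None, and is not how chemfp is implemented.

```python
import csv
import io

def resolve_column(column, header):
    """Convert a column specifier into a 0-based index.

    An integer is a position starting from 1; a string is a column
    title looked up in the header row.
    """
    if isinstance(column, int):
        if column < 1:
            raise ValueError("column positions start from 1")
        return column - 1
    if header is None:
        raise ValueError("a column title needs a header row")
    return header.index(column)  # raises ValueError if the title is missing

def iter_id_and_structure(text, id_column=1, mol_column=2, has_header=True):
    """Yield (id, structure string) pairs from CSV text."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows) if has_header else None
    id_idx = resolve_column(id_column, header)
    mol_idx = resolve_column(mol_column, header)
    for row in rows:
        yield row[id_idx], row[mol_idx]

text = "name,smiles\nmethane,C\nethanol,CCO\n"
pairs = list(iter_id_and_structure(text, id_column="name", mol_column=2))
print(pairs)  # [('methane', 'C'), ('ethanol', 'CCO')]
```

Titles and positions can be mixed, as in the call above, because each specifier is resolved independently.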

chemfp.cdk_toolkit.read_ids_and_molecules(source=None, format=None, id_tag=None, reader_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict')¶

Return an iterator that reads (id, CDK molecule) pairs from a structure file

See chemfp.cdk_toolkit.read_molecules() for full parameter details. The major difference is that this returns an iterator of (id, CDK molecule) pairs instead of just the molecules.

Parameters:

source (a filename, file object, or None to read from stdin) – the structure source
format (a format name string, or Format object, or None to auto-detect) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track parser state information

Returns:

a chemfp.base_toolkit.IdAndMoleculeReader iterating (id, CDK molecule) pairs

chemfp.cdk_toolkit.read_ids_and_molecules_from_string(content, format, id_tag=None, reader_args=None, errors='strict', location=None)¶

Return an iterator that reads (id, CDK molecule) pairs from a string containing structure records

content is a string containing 0 or more records in the format format. See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.read_molecules_from_string() if you just want to read the CDK molecules instead of (id, molecule) pairs.

Parameters:

content (a string) – the string containing structure records
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track parser state information

Returns:

a chemfp.base_toolkit.IdAndMoleculeReader iterating (id, CDK molecule) pairs

chemfp.cdk_toolkit.read_inchi_ids_and_molecules(source: None | str | BinaryIO, *, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = 'to-eol', prepare: bool = True, errors: str = 'strict')¶

Read ids and molecules from an InChI file (with InChI and optional id) using the CDK toolkit

This is mostly equivalent to calling:

read_ids_and_molecules(source, "inchi", reader_args={...}, errors=errors)

along with decompression based on the source filename’s extension.

Parameters:

delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.IdAndMoleculeReader iterating (id, CDK molecule) pairs

chemfp.cdk_toolkit.read_inchi_ids_and_molecules_from_string(content: str | bytes, *, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = 'to-eol', prepare: bool = True, errors: str = 'strict')¶

Read ids and molecules from a string containing an InChI file (with InChI and optional id) using the CDK toolkit

This is equivalent to calling:

read_ids_and_molecules_from_string(content, "inchi", reader_args={...}, errors=errors)

Use read_ids_and_molecules_from_string() if the content is compressed.

Parameters:

delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.IdAndMoleculeReader iterating (id, CDK molecule) pairs

chemfp.cdk_toolkit.read_inchi_molecules(source: None | str | BinaryIO, *, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = 'to-eol', prepare: bool = True, errors: str = 'strict')¶

Read molecules from an InChI file (with InChI and optional id) using the CDK toolkit

This is mostly equivalent to calling:

read_molecules(source, "inchi", reader_args={...}, errors=errors)

along with decompression based on the source filename’s extension.

Parameters:

delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.read_inchi_molecules_from_string(content: str | bytes, *, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = 'to-eol', prepare: bool = True, errors: str = 'strict')¶

Read molecules from a string containing an InChI file (with InChI and optional id) using the CDK toolkit

This is equivalent to calling:

read_molecules_from_string(content, "inchi", reader_args={...}, errors=errors)

Use read_molecules_from_string() if the content is compressed.

Parameters:

delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the InChI and the id
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.read_molecules(source=None, format=None, id_tag=None, reader_args=None, errors='strict', location=None, encoding='utf8', encoding_errors='strict')¶

Return an iterator that reads CDK molecules from a structure file

Iterate through the format structure records in source. If format is None then auto-detect the format based on the source. For SD files, use id_tag to get the record id from the given SD tag instead of the title line. (read_molecules() will ignore the id_tag. It exists to make it easier to switch between reader functions.)

Note: the reader returns a new CDK molecule each time.

The reader_args dictionary parameters depend on the format. These include:

SMILES
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- has_header - True or False
- sanitize - True or default sanitizes; False for unsanitized processing
InChI
- delimiter - one of “tab”, “space”, “to-eol”, the space or tab characters, or None
- sanitize - True or default sanitizes; False for unsanitized processing
- removeHs - True or default removes explicit hydrogens; False leaves them in the structure
- logLevel - an integer log level
- treatWarningAsError - True raises an exception on error; False or default keeps processing
SDF
- sanitize - True or default sanitizes; False for unsanitized processing
- removeHs - True or default removes explicit hydrogens; False leaves them in the structure
- strictParsing - True or default for strict parsing; False for lenient parsing

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.
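The three error modes can be pictured with a small generic reader loop. This is only a sketch of the policy described above, written against a hypothetical parse callable that raises ValueError on bad input. The function name parse_all is an assumption for this example, and the real readers work with CDK molecules rather than integers.

```python
import sys

def parse_all(records, parse, errors="strict"):
    """Yield parse(record) for each record, handling failures per `errors`."""
    if errors not in ("strict", "report", "ignore"):
        raise ValueError(f"unsupported errors value: {errors!r}")
    for recno, record in enumerate(records, 1):
        try:
            yield parse(record)
        except ValueError as err:
            if errors == "strict":
                raise  # stop at the first failure
            if errors == "report":
                print(f"record {recno}: {err}", file=sys.stderr)
            # "report" and "ignore" both continue with the next record

records = ["1", "not-a-number", "3"]
print(list(parse_all(records, int, errors="ignore")))  # [1, 3]
```

With errors="strict" the same input raises ValueError when the second record is reached, so a caller only sees the results that were yielded before the failure.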

The location parameter takes a chemfp.io.Location instance. If None then a default Location will be created.

See chemfp.cdk_toolkit.read_ids_and_molecules() if you want (id, molecule) pairs instead of just the molecules.

Parameters:

source (a filename, file object, or None to read from stdin) – the structure source
format (a format name string, or Format object, or None to auto-detect) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader parameters passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track parser state information

Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.read_molecules_from_string(content, format, id_tag=None, reader_args=None, errors='strict', location=None)¶

Return an iterator that reads CDK molecules from a string containing structure records

content is a string containing 0 or more records in the format format. See chemfp.cdk_toolkit.read_molecules() for details about the other parameters. See chemfp.cdk_toolkit.read_ids_and_molecules_from_string() if you want to read (id, CDK molecule) pairs instead of just molecules.

Parameters:

content (a string) – the string containing structure records
format (a format name string, or Format object) – the input structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary) – reader arguments passed to the underlying toolkit
errors (one of "strict", "report", or "ignore") – specify how to handle errors
location (a chemfp.io.Location object, or None) – object used to track parser state information

Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.read_sdf_ids_and_molecules(source: None | str | BinaryIO, *, id_tag: None | str = None, AddStereo0d: bool = True, AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, ForceReadAs3DCoordinates: bool = False, mode: Literal['RELAXED', 'STRICT'] = 'RELAXED', hydrogens: Literal['as-is', 'make-explicit', 'make-implicit', 'make-nonchiral-implicit'] = 'as-is', implementation: Literal['cdk', 'chemfp'] | None = 'cdk', prepare: bool = True, errors: str = 'strict')¶

Read ids and molecules from an SDF file using the CDK toolkit

This is mostly equivalent to calling:

read_ids_and_molecules(source, "sdf", id_tag=id_tag, reader_args={...}, errors=errors)

along with decompression based on the source filename’s extension.

Parameters:

id_tag (a string, or None to use the title) – get the id from the named data item instead of using the record title
AddStereo0d (Boolean (default: True)) – if true, create stereo from parity value when no coordinates
AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
hydrogens (One of 'as-is', 'make-explicit', 'make-implicit', or 'make-nonchiral-implicit' (default: "as-is")) – Specify how to handle implicit or explicit hydrogens
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.IdAndMoleculeReader iterating (id, CDK molecule) pairs
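To make the id_tag and record identification ideas concrete, here is a toolkit-independent sketch based on the general SD file convention: each record ends with a "$$$$" line, the first line is the title, and data items are introduced by a "> <TAG>" header line. The helper names split_sd_records and get_record_id are hypothetical, and neither CDK's nor chemfp's record parser is claimed to work this way.

```python
def split_sd_records(content):
    """Split SD file text into records; each record ends with a "$$$$" line.

    Any text after the last "$$$$" line is ignored in this sketch.
    """
    records, current = [], []
    for line in content.splitlines(keepends=True):
        current.append(line)
        if line.rstrip("\r\n") == "$$$$":
            records.append("".join(current))
            current = []
    return records

def get_record_id(record, id_tag=None):
    """Return the title line, or the first value line of the named data item."""
    lines = record.splitlines()
    if id_tag is None:
        return lines[0] if lines else None
    for i, line in enumerate(lines):
        # Data headers look like "> <TAG>"; the value starts on the next line
        if line.startswith(">") and f"<{id_tag}>" in line:
            return lines[i + 1] if i + 1 < len(lines) else None
    return None

# Abbreviated records: the connection tables are omitted, so a real
# toolkit would not accept these as valid molecules.
content = (
    "methane\n  sketch\n\nM  END\n> <REGNO>\nR-001\n\n$$$$\n"
    "ethane\n  sketch\n\nM  END\n> <REGNO>\nR-002\n\n$$$$\n"
)
records = split_sd_records(content)
print([get_record_id(r) for r in records])           # ['methane', 'ethane']
print([get_record_id(r, "REGNO") for r in records])  # ['R-001', 'R-002']
```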

chemfp.cdk_toolkit.read_sdf_ids_and_molecules_from_string(content: str | bytes, *, id_tag: None | str = None, AddStereo0d: bool = True, AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, ForceReadAs3DCoordinates: bool = False, mode: Literal['RELAXED', 'STRICT'] = 'RELAXED', hydrogens: Literal['as-is', 'make-explicit', 'make-implicit', 'make-nonchiral-implicit'] = 'as-is', implementation: Literal['cdk', 'chemfp'] | None = 'cdk', prepare: bool = True, errors: str = 'strict')¶

Read ids and molecules from a string containing an SDF file using the CDK toolkit

This is equivalent to calling:

read_ids_and_molecules_from_string(content, "sdf", id_tag=id_tag, reader_args={...}, errors=errors)

Use read_ids_and_molecules_from_string() if the content is compressed.

Parameters:

id_tag (a string, or None to use the title) – get the id from the named data item instead of using the record title
AddStereo0d (Boolean (default: True)) – if true, create stereo from parity value when no coordinates
AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
hydrogens (One of 'as-is', 'make-explicit', 'make-implicit', or 'make-nonchiral-implicit' (default: "as-is")) – Specify how to handle implicit or explicit hydrogens
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.IdAndMoleculeReader iterating (id, CDK molecule) pairs

chemfp.cdk_toolkit.read_sdf_molecules(source: None | str | BinaryIO, *, AddStereo0d: bool = True, AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, ForceReadAs3DCoordinates: bool = False, mode: Literal['RELAXED', 'STRICT'] = 'RELAXED', hydrogens: Literal['as-is', 'make-explicit', 'make-implicit', 'make-nonchiral-implicit'] = 'as-is', implementation: Literal['cdk', 'chemfp'] | None = 'cdk', prepare: bool = True, errors: str = 'strict')¶

Read molecules from an SDF file using the CDK toolkit

This is mostly equivalent to calling:

read_molecules(source, "sdf", reader_args={...}, errors=errors)

along with decompression based on the source filename’s extension.

Parameters:

AddStereo0d (Boolean (default: True)) – if true, create stereo from parity value when no coordinates
AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
hydrogens (One of 'as-is', 'make-explicit', 'make-implicit', or 'make-nonchiral-implicit' (default: "as-is")) – Specify how to handle implicit or explicit hydrogens
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.read_sdf_molecules_from_string(content: str | bytes, *, AddStereo0d: bool = True, AddStereoElements: bool = True, InterpretHydrogenIsotopes: bool = True, ForceReadAs3DCoordinates: bool = False, mode: Literal['RELAXED', 'STRICT'] = 'RELAXED', hydrogens: Literal['as-is', 'make-explicit', 'make-implicit', 'make-nonchiral-implicit'] = 'as-is', implementation: Literal['cdk', 'chemfp'] | None = 'cdk', prepare: bool = True, errors: str = 'strict')¶

Read molecules from a string containing an SDF file using the CDK toolkit

This is equivalent to calling:

read_molecules_from_string(content, "sdf", reader_args={...}, errors=errors)

Use read_molecules_from_string() if the content is compressed.

Parameters:

AddStereo0d (Boolean (default: True)) – if true, create stereo from parity value when no coordinates
AddStereoElements (Boolean (default: True)) – if true, detect and create IStereoElements
InterpretHydrogenIsotopes (Boolean (default: True)) – if true, interpret D and T as hydrogen isotopes
ForceReadAs3DCoordinates (Boolean (default: False)) – if true, always interpret coordinates as 3D
mode ('RELAXED' will attempt to recover, 'STRICT' will not) – strictness mode when parsing a record
hydrogens (One of 'as-is', 'make-explicit', 'make-implicit', or 'make-nonchiral-implicit' (default: "as-is")) – Specify how to handle implicit or explicit hydrogens
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.read_smi_ids_and_molecules(source: None | str | BinaryIO, *, has_header: bool = False, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = 'to-eol', implementation: Literal['cdk', 'chemfp'] | None = 'cdk', kekulise: bool = True, cxsmiles: bool = True, hydrogens: Literal['as-is', 'make-explicit', 'make-implicit', 'make-nonchiral-implicit'] = 'as-is', prepare: bool = True, errors: str = 'strict')¶

Read ids and molecules from a SMILES file using the CDK toolkit

This is mostly equivalent to calling:

read_ids_and_molecules(source, "smi", reader_args={...}, errors=errors)

along with decompression based on the source filename’s extension.

Parameters:

has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the SMILES and the id
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string (only works with ‘chemfp’ implementation)
hydrogens (One of 'as-is', 'make-explicit', 'make-implicit', or 'make-nonchiral-implicit' (default: "as-is")) – Specify how to handle implicit or explicit hydrogens
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.IdAndMoleculeReader iterating (id, CDK molecule) pairs
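The delimiter setting controls where the SMILES ends and the id begins. The sketch below shows one plausible reading of the named delimiters, written in plain Python. The exact rules (for example, that “to-eol” takes the rest of the line as the id while “space” and “whitespace” take only the second field) are assumptions for illustration. The None and 'native' settings are left out because their behavior is not described here. The function name split_smiles_line is hypothetical.

```python
def split_smiles_line(line, delimiter="to-eol"):
    """Split one SMILES file line into (smiles, id); id is None if absent."""
    line = line.rstrip("\r\n")
    if delimiter == "to-eol":
        # First whitespace-delimited token is the SMILES,
        # the rest of the line (spaces included) is the id
        parts = line.split(None, 1)
    elif delimiter == "whitespace":
        # Fields are separated by runs of whitespace; the id is the second field
        parts = line.split()[:2]
    else:
        separators = {"space": " ", "tab": "\t", "comma": ",", " ": " ", "\t": "\t"}
        if delimiter not in separators:
            raise ValueError(f"unsupported delimiter in this sketch: {delimiter!r}")
        parts = line.split(separators[delimiter])[:2]
    smiles = parts[0] if parts else ""
    record_id = parts[1] if len(parts) > 1 else None
    return smiles, record_id

print(split_smiles_line("CCO ethyl alcohol"))           # ('CCO', 'ethyl alcohol')
print(split_smiles_line("CCO ethyl alcohol", "space"))  # ('CCO', 'ethyl')
print(split_smiles_line("CCO\tethyl alcohol", "tab"))   # ('CCO', 'ethyl alcohol')
```

The difference matters for ids that contain spaces: only a delimiter that keeps the full remainder of the field preserves them.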

chemfp.cdk_toolkit.read_smi_ids_and_molecules_from_string(content: str | bytes, *, has_header: bool = False, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = 'to-eol', implementation: Literal['cdk', 'chemfp'] | None = 'cdk', kekulise: bool = True, cxsmiles: bool = True, hydrogens: Literal['as-is', 'make-explicit', 'make-implicit', 'make-nonchiral-implicit'] = 'as-is', prepare: bool = True, errors: str = 'strict')¶

Read ids and molecules from a string containing a SMILES file using the CDK toolkit

This is equivalent to calling:

read_ids_and_molecules_from_string(content, "smi", reader_args={...}, errors=errors)

Use read_ids_and_molecules_from_string() if the content is compressed.

Parameters:

has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the SMILES and the id
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string (only works with ‘chemfp’ implementation)
hydrogens (One of 'as-is', 'make-explicit', 'make-implicit', or 'make-nonchiral-implicit' (default: "as-is")) – Specify how to handle implicit or explicit hydrogens
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.IdAndMoleculeReader iterating (id, CDK molecule) pairs

chemfp.cdk_toolkit.read_smi_molecules(source: None | str | BinaryIO, *, has_header: bool = False, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = 'to-eol', implementation: Literal['cdk', 'chemfp'] | None = 'cdk', kekulise: bool = True, cxsmiles: bool = True, hydrogens: Literal['as-is', 'make-explicit', 'make-implicit', 'make-nonchiral-implicit'] = 'as-is', prepare: bool = True, errors: str = 'strict')¶

Read molecules from a SMILES file using the CDK toolkit

This is mostly equivalent to calling:

read_molecules(source, "smi", reader_args={...}, errors=errors)

along with decompression based on the source filename’s extension.

Parameters:

has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the SMILES and the id
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string (only works with ‘chemfp’ implementation)
hydrogens (One of 'as-is', 'make-explicit', 'make-implicit', or 'make-nonchiral-implicit' (default: "as-is")) – Specify how to handle implicit or explicit hydrogens
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.read_smi_molecules_from_string(content: str | bytes, *, has_header: bool = False, delimiter: Literal['to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', ' ', '\t'] | None = 'to-eol', implementation: Literal['cdk', 'chemfp'] | None = 'cdk', kekulise: bool = True, cxsmiles: bool = True, hydrogens: Literal['as-is', 'make-explicit', 'make-implicit', 'make-nonchiral-implicit'] = 'as-is', prepare: bool = True, errors: str = 'strict')¶

Read molecules from a string containing a SMILES file using the CDK toolkit

This is equivalent to calling:

read_molecules_from_string(content, "smi", reader_args={...}, errors=errors)

Use read_molecules_from_string() if the content is compressed.

Parameters:

has_header (Boolean (default: False)) – If true, treat the first line of the SMILES file as a header
delimiter (One of None, 'to-eol', 'space', 'tab', 'comma', 'whitespace', 'native', or the space or tab characters (default: "to-eol")) – The separator between the SMILES and the id
implementation (either 'cdk' or 'chemfp') – use CDK or chemfp to identify records
kekulise (Boolean (default: True)) – if true, ensure a valid Kekule interpretation exists
cxsmiles (Boolean (default: True)) – If true, look for ChemAxon CXSMILES extensions after the SMILES string (only works with ‘chemfp’ implementation)
hydrogens (One of 'as-is', 'make-explicit', 'make-implicit', or 'make-nonchiral-implicit' (default: "as-is")) – Specify how to handle implicit or explicit hydrogens
prepare (Boolean (default: True)) – if true, perform ring and aromaticity perception during input
errors (one of "strict", "ignore", or "log") – specify how to handle errors

Returns:

a chemfp.base_toolkit.MoleculeReader iterating CDK molecules

chemfp.cdk_toolkit.set_id(mol, id)¶

Set the molecule’s id as CDK’s “cdk:Title” property

Parameters:

mol (a CDK molecule) – the molecule
id (string) – the new id

Returns:

None

chemfp.cdk_toolkit.suppress_log_output()¶

Return a context manager to disable CDK logging.

On entry, set the cdk.logging.level to “ERROR”. On exit, restore the previous value.

Example:

with suppress_log_output():
    ... do something which may log ...

The returned context manager is a re-entrant singleton object, so it may be entered multiple times. Logging is not re-enabled until it has been exited the same number of times.

Returns:

a context manager
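The re-entrant behavior can be sketched with a depth counter: the setting is changed on the outermost entry and restored on the matching outermost exit. In this sketch a plain dictionary stands in for wherever CDK keeps cdk.logging.level, and the class name SuppressLevel is hypothetical. It illustrates the pattern only, is not thread-safe, and is not chemfp's implementation.

```python
class SuppressLevel:
    """Re-entrant context manager that quietens a log level while active."""

    def __init__(self, settings, key, quiet_value="ERROR"):
        self._settings = settings  # dict standing in for the real configuration
        self._key = key
        self._quiet_value = quiet_value
        self._depth = 0
        self._saved = None

    def __enter__(self):
        if self._depth == 0:
            # Outermost entry: remember the current value, then quieten
            self._saved = self._settings.get(self._key)
            self._settings[self._key] = self._quiet_value
        self._depth += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._depth -= 1
        if self._depth == 0:
            # Outermost exit: restore whatever was there before
            if self._saved is None:
                self._settings.pop(self._key, None)
            else:
                self._settings[self._key] = self._saved
        return False  # never swallow exceptions

settings = {"cdk.logging.level": "WARN"}
suppress = SuppressLevel(settings, "cdk.logging.level")
with suppress:
    with suppress:
        inner = settings["cdk.logging.level"]
    after_inner_exit = settings["cdk.logging.level"]
after_outer_exit = settings["cdk.logging.level"]
print(inner, after_inner_exit, after_outer_exit)  # ERROR ERROR WARN
```

Leaving the inner block does not restore the level, because the counter has not yet returned to zero.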

chemfp.cdk_toolkit.translate_record(content, in_format='smi', out_format='smi', *, id_tag=None, reader_args=None, writer_args=None, id=None, errors='strict')¶

Translate a molecule record from one format to another

Use the CDK toolkit to parse the content as format in_format (default: “smi”) and translate it into out_format (default: “smi”). For an SDF record, use id_tag to get the record id from the given SD tag instead of the title line. Use reader_args and writer_args to configure format-specific parameters. Use id to set the id of the output record.

The errors parameter specifies how to handle errors. “strict” raises an exception, “report” sends a message to stderr and goes to the next record, and “ignore” goes to the next record.

Parameters:

content (a string) – the string containing a structure record
in_format (a format name string, or Format object) – the input structure format
out_format (a format name string, or Format object) – the output structure format
id_tag (string, or None to use the record title) – SD tag containing the record id
reader_args (a dictionary, or None) – reader arguments for the specified in_format
writer_args (a dictionary, or None) – writer arguments for the specified out_format
id (a string, or None to use the default) – the record id to use for the output record
errors (one of "strict", "report", or "ignore") – specify how to handle errors

Returns:

a string
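A possible way to call translate_record is sketched below. The wrapper smiles_to_sdf is a hypothetical convenience function that takes the toolkit as an argument, so it can be exercised without CDK installed. The keyword names come from the signature above, while the "sdf" output format name, the example record, and the id "phenol-1" are assumptions for illustration. Running main() requires chemfp, JPype, and the CDK jar on the CLASSPATH.

```python
def smiles_to_sdf(smiles_record, toolkit, id=None):
    """Translate one SMILES record to an SDF record with a chemfp-style toolkit."""
    return toolkit.translate_record(
        smiles_record,
        in_format="smi",
        out_format="sdf",
        id=id,            # None keeps the default id from the input record
        errors="strict",  # raise an exception if the record cannot be parsed
    )

def main():
    # Imported here so the helper above can be used without CDK available
    from chemfp import cdk_toolkit
    print(smiles_to_sdf("c1ccccc1O phenol", cdk_toolkit, id="phenol-1"))

# Call main() in an environment where chemfp and CDK are installed.
```

Passing the toolkit as a parameter also makes it straightforward to swap in another chemfp toolkit module that offers a translate_record function with the same keywords.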