chemfp.base_toolkit module¶
Support code which is shared by the toolkit wrappers and the text_toolkit.
This is an internal chemfp module. It should not be imported by programs which use the public API. (Let me know if anything else should be part of the public API.)
This module contains class definitions for objects which are returned as part of the public API.
A Format
contains information about a toolkit format, along
with methods to get information about format-specific parameters.
A FormatMetadata
contains metadata about the structure file
reader or writer, including the record format and any format-specific
parameters.
The BaseMoleculeReader
is the base class for
IdAndMoleculeReader
,
IdAndRecordReader
,
MoleculeReader
, and
RecordReader
, which are returned by the different ways to
read from a structure file.
The BaseMoleculeWriter
is the base class for
MoleculeWriter
and
MoleculeStringWriter
, which are used to write molecules (or
records) to a file or to a string, respectively.
- class chemfp.base_toolkit.BaseMoleculeReader(metadata, structure_reader, location)¶
Bases:
object
Base class for the toolkit readers
A Reader is an iterator, so iter(reader) returns itself. next(reader) returns either a single object or a pair of objects, depending on the reader.
A Reader is also a context manager, and calls self.close() when exiting the context.
location
- a chemfp.io.Location instance
closed
- True if the reader has been closed
- close()¶
Close the reader
If the reader wasn’t previously closed then close it. This will set the location properties to their final values, close any files that the reader may have opened, and set
self.closed
to True.
- closed¶
False if the reader is open, otherwise True
- location¶
a
chemfp.io.Location
instance
- metadata¶
a
chemfp.base_toolkit.FormatMetadata
instance
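The iterator and context-manager behavior described above can be sketched with a small stand-in class. This is not chemfp's implementation: SketchReader and its records are hypothetical, and the choice to yield nothing after close() is an assumption made for the sketch.

```python
class SketchReader:
    """Hypothetical stand-in for a toolkit reader; not chemfp's implementation."""

    def __init__(self, records):
        self._records = iter(records)
        self.closed = False

    def __iter__(self):
        # A reader is its own iterator, so iter(reader) returns itself.
        return self

    def __next__(self):
        # Assumption for this sketch: a closed reader yields nothing more.
        if self.closed:
            raise StopIteration
        return next(self._records)

    def close(self):
        # Only the first call changes state; later calls are harmless.
        if not self.closed:
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not suppress exceptions


with SketchReader([("id1", "CCO"), ("id2", "c1ccccc1")]) as reader:
    same_object = iter(reader) is reader
    pairs = list(reader)
```

After the with block exits, reader.closed is True, which matches the closed attribute documented above.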
- class chemfp.base_toolkit.BaseMoleculeWriter(metadata, structure_writer, location)¶
Bases:
object
The base molecule writer API, implemented by
MoleculeWriter
and MoleculeStringWriter
A writer is a context manager, and calls self.close() when the context exits.
- close()¶
Close the writer
If the writer wasn’t previously closed then close it. This will set the location properties to their final values, close any files that the writer may have opened, and set
self.closed
to True.
- closed¶
False if the writer is open, otherwise True
- location¶
a
chemfp.io.Location
instance
- metadata¶
a
chemfp.base_toolkit.FormatMetadata
instance
- write_id_and_molecule(id, mol)¶
Write an identifier and toolkit molecule
If id is None then the output uses the molecule’s own id/title. Specifying the id may modify the molecule’s id/title, depending on the format and toolkit.
- Parameters:
id (string, or None) – the identifier to use for the molecule
mol (a toolkit molecule) – the molecule to write
- write_ids_and_molecules(ids_and_mols)¶
Write a sequence of (id, molecule) pairs
This function works well with
chemfp.toolkit.read_ids_and_molecules()
, for example, to convert an SD file to a SMILES file while using an alternate id_tag to specify an alternative identifier.
- Parameters:
ids_and_mols (an iterator of (id string, toolkit molecule) pairs) – the ids and molecules to write
- write_molecule(mol)¶
Write a toolkit molecule
- Parameters:
mol (a toolkit molecule) – the molecule to write
- write_molecules(mols)¶
Write a sequence of molecules
- Parameters:
mols (a toolkit molecule iterator) – the molecules to write
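The relationship between the four write methods can be sketched with a stand-in writer. This is an illustration under stated assumptions, not chemfp's code: SketchWriter is hypothetical, a "molecule" is a plain dict with "smiles" and "id" keys, and the output is a simple SMILES-like line.

```python
import io


class SketchWriter:
    """Hypothetical stand-in for a molecule writer; not chemfp's implementation."""

    def __init__(self, outfile):
        self._outfile = outfile
        self.closed = False

    def write_id_and_molecule(self, id, mol):
        # If id is None, fall back to the molecule's own id/title.
        if id is None:
            id = mol["id"]
        self._outfile.write(f"{mol['smiles']} {id}\n")

    def write_molecule(self, mol):
        self.write_id_and_molecule(None, mol)

    def write_molecules(self, mols):
        for mol in mols:
            self.write_molecule(mol)

    def write_ids_and_molecules(self, ids_and_mols):
        for id, mol in ids_and_mols:
            self.write_id_and_molecule(id, mol)

    def close(self):
        # This sketch does not own the output file, so it only records the state.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False


buffer = io.StringIO()
ethanol = {"smiles": "CCO", "id": "ethanol"}
with SketchWriter(buffer) as writer:
    writer.write_molecule(ethanol)                        # uses the molecule's own id
    writer.write_ids_and_molecules([("alt-1", ethanol)])  # uses the supplied id
output = buffer.getvalue()
```

Here the first line of output carries the molecule's own id and the second carries the alternative identifier passed in by the caller.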
- class chemfp.base_toolkit.Format(toolkit_name, format_config, compression=None)¶
Bases:
object
Information about a toolkit format.
Use the toolkit’s
get_format
and related functions to return a Format instance.
- compression¶
the compression type, “” for uncompressed, “gz” for gzip, etc.
- property extensions¶
Return a list of appropriate filename extensions for this format
Returns an empty list if this format does not support io.
- get_default_reader_args()¶
Return a dictionary of the default reader arguments
The keys are unqualified (ie, without dots).
>>> from chemfp import openbabel_toolkit as T
>>> fmt = T.get_format("smi")
>>> fmt.get_default_reader_args()
{'has_header': False, 'delimiter': None, 'options': None}
- Returns:
a dictionary of string keys and Python objects for values
- get_default_writer_args()¶
Return a dictionary of the default writer arguments
The keys are unqualified (ie, without dots).
>>> from chemfp import openbabel_toolkit as T
>>> fmt = T.get_format("smi")
>>> fmt.get_default_writer_args()
{'explicit_hydrogens': False, 'isomeric': True, 'delimiter': None, 'options': None, 'canonicalization': 'default'}
- Returns:
a dictionary of string keys and Python objects for values
- get_reader_args_from_text_settings(reader_settings)¶
Process the reader_settings and return the reader_args for this format.
This function exists to help convert string settings, eg, from the command-line or a configuration, into usable reader_args.
Setting names may be fully-qualified names like “rdkit.sdf.sanitize”, partially qualified names like “rdkit.*.sanitize” or “openeye.smi.delimiter”, or unqualified names like “delimiter”. The qualifiers act as a namespace so the settings can be specified without needing to know the actual toolkit or format.
The function turns the format-appropriate qualified names into unqualified ones and converts the string values into usable Python objects. For example:
>>> from chemfp import rdkit_toolkit as T
>>> fmt = T.get_format("smi")
>>> fmt.get_reader_args_from_text_settings({"rdkit.*.sanitize": "true", "delimiter": "to-eol"})
{'delimiter': 'to-eol', 'sanitize': True}
- Parameters:
reader_settings (a dictionary with string keys and values) – the reader settings
- Returns:
a dictionary of unqualified argument names as keys and processed Python values as values
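The conversion from text settings to reader_args can be sketched as follows. This is a minimal illustration, not chemfp's implementation: sketch_args_from_text_settings, parse_bool, and the converters table are hypothetical names, the accepted boolean spellings are an assumption, and conflicts between differently qualified names for the same setting are not handled here.

```python
def parse_bool(text):
    """Convert a text setting to a bool; the accepted spellings are an assumption."""
    lowered = text.strip().lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    raise ValueError(f"not a boolean setting: {text!r}")


def sketch_args_from_text_settings(settings, toolkit, fmt, converters):
    """Keep the settings that apply to (toolkit, fmt) and parse their string values."""
    # Qualifier prefixes that apply to this toolkit and format.
    accepted = {f"{toolkit}.{fmt}.", f"{toolkit}.*.", f"{fmt}.", ""}
    args = {}
    for name, text in settings.items():
        qualifier, _, short_name = name.rpartition(".")
        prefix = qualifier + "." if qualifier else ""
        # Skip settings for other toolkits or formats, and unknown names.
        if prefix in accepted and short_name in converters:
            args[short_name] = converters[short_name](text)
    return args


# Hypothetical converter table for a SMILES reader.
converters = {"sanitize": parse_bool, "has_header": parse_bool, "delimiter": str}
settings = {
    "rdkit.*.sanitize": "true",
    "delimiter": "to-eol",
    "openeye.smi.delimiter": "tab",  # belongs to another toolkit, so it is ignored
}
reader_args = sketch_args_from_text_settings(settings, "rdkit", "smi", converters)
```

With these inputs, reader_args holds the same two entries as the documented example: the delimiter string unchanged and sanitize converted to a Python bool.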
- get_unqualified_reader_args(reader_args)¶
Convert possibly qualified reader args into unqualified reader args for this format
The reader_args dictionary can be confusing because of the priority rules in how to resolve qualifiers, and because it can include irrelevant parameters, which are ignored.
The get_unqualified_reader_args function applies the qualifier resolution algorithm and removes irrelevant parameters to return a dictionary containing the equivalent unqualified reader args dictionary for this format.
>>> from chemfp import rdkit_toolkit as T
>>> fmt = T.get_format("smi")
>>> fmt.get_unqualified_reader_args({"rdkit.*.delimiter": "tab", "smi.sanitize": False, "X": "Y"})
{'delimiter': 'tab', 'has_header': False, 'sanitize': False}
>>> fmt = T.get_format("can")
>>> fmt.get_unqualified_reader_args({"rdkit.*.delimiter": "tab", "smi.sanitize": False, "X": "Y"})
{'delimiter': 'tab', 'has_header': False, 'sanitize': True}
- Parameters:
reader_args – reader arguments, which can contain qualified and unqualified arguments
- Returns:
a dictionary of reader arguments, containing only unqualified arguments appropriate for this format.
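One way to picture the qualifier resolution is the sketch below. It is not chemfp's algorithm: sketch_unqualified_args is a hypothetical function, the priority order (most specific qualifier first) is an assumption, and the defaults dictionary is a guess chosen to reproduce the example output shown above.

```python
def sketch_unqualified_args(args, toolkit, fmt, defaults):
    """Resolve qualified names to unqualified ones for one toolkit and format."""
    # Assumed priority, highest first; the real rules may differ.
    prefixes = (f"{toolkit}.{fmt}.", f"{toolkit}.*.", f"{fmt}.", "")
    resolved = dict(defaults)
    # Only names with a default are relevant, so "X" is dropped automatically.
    for name in defaults:
        for prefix in prefixes:
            key = prefix + name
            if key in args:
                resolved[name] = args[key]
                break  # the first (most specific) match wins
    return resolved


# Hypothetical defaults for a SMILES reader.
defaults = {"delimiter": None, "has_header": False, "sanitize": True}
query = {"rdkit.*.delimiter": "tab", "smi.sanitize": False, "X": "Y"}

smi_args = sketch_unqualified_args(query, "rdkit", "smi", defaults)
can_args = sketch_unqualified_args(query, "rdkit", "can", defaults)
```

For the "smi" format the "smi.sanitize" entry applies, so sanitize becomes False; for "can" it does not apply, so the default of True is kept, as in the example above.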
- get_unqualified_writer_args(writer_args)¶
Convert possibly qualified writer args into unqualified writer args for this format
The writer_args dictionary can be confusing because of the priority rules in how to resolve qualifiers, and because it can include irrelevant parameters, which are ignored.
The get_unqualified_writer_args function applies the qualifier resolution algorithm and removes irrelevant parameters to return a dictionary containing the equivalent unqualified writer args dictionary for this format.
>>> from chemfp import rdkit_toolkit as T
>>> fmt = T.get_format("smi")
>>> fmt.get_unqualified_writer_args({"rdkit.*.delimiter": "tab", "smi.kekuleSmiles": True, "X": "Y"})
{'isomericSmiles': True, 'delimiter': 'tab', 'kekuleSmiles': True, 'allBondsExplicit': False, 'canonical': True}
>>> fmt = T.get_format("can")
>>> fmt.get_unqualified_writer_args({"rdkit.*.delimiter": "tab", "smi.kekuleSmiles": True, "X": "Y"})
{'isomericSmiles': False, 'delimiter': 'tab', 'kekuleSmiles': False, 'allBondsExplicit': False, 'canonical': True}
- Parameters:
writer_args – writer arguments, which can contain qualified and unqualified arguments
- Returns:
a dictionary of writer arguments, containing only unqualified arguments appropriate for this format.
- get_writer_args_from_text_settings(writer_settings)¶
Process writer_settings and return the writer_args for this format.
This function exists to help convert string settings, eg, from the command-line or a configuration, into usable writer_args.
Setting names may be fully-qualified names like “rdkit.sdf.kekulize”, partially qualified names like “rdkit.*.delimiter” or “openeye.smi.delimiter”, or unqualified names like “delimiter”. The qualifiers act as a namespace so the settings can be specified without needing to know the actual toolkit or format.
The function turns the format-appropriate qualified names into unqualified ones and converts the string values into usable Python objects. For example:
>>> from chemfp import rdkit_toolkit as T
>>> fmt = T.get_format("smi")
>>> fmt.get_writer_args_from_text_settings({"rdkit.*.kekuleSmiles": "true", "canonical": "false"})
{'kekuleSmiles': True, 'canonical': False}
- Parameters:
writer_settings (a dictionary with string keys and values) – the writer settings
- Returns:
a dictionary of unqualified argument names as keys and processed Python values as values
- property is_available¶
Return True if this version of the toolkit understands this format
For example, if your version of RDKit does not support InChI then this would return False for the “inchi” and “inchikey” formats.
- property is_input_format¶
Return True if this toolkit can read molecules in this format
- property is_output_format¶
Return True if this toolkit can write molecules in this format
- name¶
the format name, without any compression information
- property prefix¶
Return the prefix to turn an unqualified parameter into a fully qualified parameter
- Returns:
a string like “rdkit.smi” or “openbabel.sdf”
- property supports_io¶
Return True if this format supports reading or writing records
This will return False for formats like “smistring” and “inchikeystring” because those are not record-based formats.
Note: I don’t like this name. I may change it to
is_record_format
. Let me know if you have ideas, or if changing the name will be a problem.
- toolkit_name¶
the toolkit name; either “cdk”, “openeye”, “openbabel”, or “rdkit”
- class chemfp.base_toolkit.FormatMetadata(filename, record_format, args)¶
Bases:
object
Information about the reader or writer
- args¶
the final reader_args or writer_args, after all processing, and as used by the reader and writer
- filename¶
the source or destination filename, the string “<string>” for string-based I/O, or None if not known
- record_format¶
the normalized record format name. All SMILES formats are “smi” and this does not contain compression information
- class chemfp.base_toolkit.IdAndMoleculeReader(metadata, structure_reader, location)¶
Bases:
BaseMoleculeReader
Read structures from a file and iterate over the (id, toolkit molecule) pairs
Note: the toolkit implementation is free to reuse a molecule instead of returning a new one each time.
- class chemfp.base_toolkit.IdAndRecordReader(metadata, structure_reader, location)¶
Bases:
BaseMoleculeReader
Read records from a file and iterate over the (id, record string) pairs
- class chemfp.base_toolkit.MoleculeReader(metadata, structure_reader, location)¶
Bases:
BaseMoleculeReader
Read structures from a file and iterate over the toolkit molecules
Note: the toolkit implementation is free to reuse a molecule instead of returning a new one each time.
- class chemfp.base_toolkit.MoleculeStringWriter(details, structure_writer, getvalue, location)¶
Bases:
BaseMoleculeWriter
A BaseMoleculeWriter which writes molecules to a string.
A writer is a context manager, and calls self.close() when the context exits.
- close()¶
Close the writer
If the writer wasn’t previously closed then close it. This will set the location properties to their final values, close any files that the writer may have opened, and set
self.closed
to True.
self.getvalue()
will still work after the writer is closed.
- getvalue()¶
Get the string containing all of the written records.
This function can also be called after the writer is closed.
- Returns:
a string
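The guarantee that getvalue() works after close() can be sketched as follows. This is an illustration only, not chemfp's implementation: SketchStringWriter and its write_record method are hypothetical. The sketch saves the text at close time because io.StringIO.getvalue() raises ValueError once the buffer itself is closed.

```python
import io


class SketchStringWriter:
    """Hypothetical stand-in for a string writer; not chemfp's implementation."""

    def __init__(self):
        self._buffer = io.StringIO()
        self._final_value = None
        self.closed = False

    def write_record(self, text):
        # Hypothetical method standing in for the molecule write methods.
        self._buffer.write(text)

    def getvalue(self):
        # After close(), return the saved text instead of touching the buffer.
        if self._final_value is not None:
            return self._final_value
        return self._buffer.getvalue()

    def close(self):
        if not self.closed:
            # Save the text first; a closed StringIO can no longer be read.
            self._final_value = self._buffer.getvalue()
            self._buffer.close()
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False


with SketchStringWriter() as writer:
    writer.write_record("CCO ethanol\n")
    before_close = writer.getvalue()
after_close = writer.getvalue()  # still available once the writer is closed
```

Both calls return the same text, so callers can collect the result either inside or after the with block.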
- class chemfp.base_toolkit.MoleculeWriter(metadata, structure_writer, location)¶
Bases:
BaseMoleculeWriter
A BaseMoleculeWriter which writes molecules to a file.
A writer is a context manager, and calls self.close() when the context exits.
- class chemfp.base_toolkit.RecordReader(metadata, structure_reader, location)¶
Bases:
BaseMoleculeReader
Read and iterate over records as strings