chemfp API

This chapter contains the docstrings for the public portion of the chemfp API. Chemfp also has internal modules and functions that should not be imported or used directly. If you use parts of the undocumented API, your code is more likely to break with newer chemfp releases.

See “Getting started with the API” for some introductory examples.

Overview

The top-level chemfp module is the starting point for using chemfp. It contains functions to read and write fingerprint files, “high-level” commands for working with chemfp, and more.

The APIs for the FPS and FPB fingerprint readers and writers are defined in chemfp.fps_io and chemfp.fpb_io, respectively. Both may refer to a Location object defined in chemfp.io.
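
FPS is a line-oriented text format: header lines start with "#", and each data line holds a hex-encoded fingerprint, a tab, and an identifier. To illustrate roughly what an FPS reader has to do, here is a stdlib-only sketch. The read_fps function is hypothetical, not part of chemfp.fps_io, and it skips work a real reader does, such as interpreting the header metadata or tracking where a parse error occurred.

```python
import io

def read_fps(f):
    """Yield (id, fingerprint) pairs from FPS-formatted text.

    Hypothetical sketch: skips header lines and decodes the hex field of
    each data line into a byte string.
    """
    for line in f:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # blank line or header/metadata line
        fields = line.split("\t")
        hex_fp, fp_id = fields[0], fields[1]  # any later fields are ignored here
        yield fp_id, bytes.fromhex(hex_fp)

fps_text = io.StringIO(
    "#FPS1\n"
    "#num_bits=16\n"
    "0f00\tmol1\n"
    "ff01\tmol2\n"
)
records = list(read_fps(fps_text))
print(records)  # [('mol1', b'\x0f\x00'), ('mol2', b'\xff\x01')]
```

In chemfp itself you would normally let the library open the file for you rather than parse it by hand.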

The fingerprint arena class is defined in chemfp.arena.

The chemfp.search module contains similarity search functions for searching fingerprint arenas, and the SearchResult and SearchResults result classes. It also contains the similarity array functions, which generate an all-by-all NumPy comparison array. These are the low-level APIs used by the high-level chemfp.simsearch() and chemfp.simarray() functions.
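
Conceptually, a threshold search compares a query fingerprint against every target and keeps the hits at or above a Tanimoto threshold, while a similarity array compares every fingerprint with every other one. The pure-Python sketch below shows only that idea. The names tanimoto, threshold_search, and similarity_matrix are hypothetical, the matrix is a list of lists rather than a NumPy array, and chemfp's own search functions are implemented very differently and have their own signatures.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte-string fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen for this sketch when both are empty
    return bin(a & b).count("1") / union

def threshold_search(query_fp, targets, threshold):
    """Return (id, score) pairs with score >= threshold, highest score first."""
    hits = [(target_id, tanimoto(query_fp, target_fp))
            for target_id, target_fp in targets]
    hits = [hit for hit in hits if hit[1] >= threshold]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits

def similarity_matrix(fps):
    """All-by-all Tanimoto scores as a list of lists."""
    return [[tanimoto(a, b) for b in fps] for a in fps]

targets = [("A", bytes([0b1111])), ("B", bytes([0b0011])), ("C", bytes([0b11000000]))]
query = bytes([0b0111])
hits = threshold_search(query, targets, 0.5)
matrix = similarity_matrix([fp for _, fp in targets])
print(hits)  # [('A', 0.75), ('B', 0.6666666666666666)]
```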

The chemfp.fps_search module contains similarity search functions for searching FPS files, and the search result class definitions. This is only needed when working in a streaming environment where fingerprint arena creation overhead is too large.
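
The advantage of searching an FPS file directly is that records can be scored as they are read, so memory use does not grow with the file size. The hypothetical sketch below does a streaming k-nearest search with heapq.nlargest, which holds at most k (score, id) pairs at a time; none of its function names come from chemfp.fps_search.

```python
import heapq
import io

def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte-string fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0

def iter_fps(f):
    """Yield (id, fingerprint) pairs one line at a time from FPS text."""
    for line in f:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # blank line or header line
        hex_fp, fp_id = line.split("\t")[:2]
        yield fp_id, bytes.fromhex(hex_fp)

def knearest_stream(query_fp, f, k):
    """Return up to k (score, id) pairs, best first, without loading the whole file."""
    scored = ((tanimoto(query_fp, fp), fp_id) for fp_id, fp in iter_fps(f))
    # nlargest consumes the generator while holding at most k candidates
    return heapq.nlargest(k, scored)

fps_text = io.StringIO("#FPS1\n0f\tA\n03\tB\nc0\tC\n")
print(knearest_stream(bytes.fromhex("07"), fps_text, k=2))
# [(0.75, 'A'), (0.6666666666666666, 'B')]
```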

The chemfp.diversity module contains chemfp’s diversity pickers, all of which require a fingerprint arena. It is a lower-level API than chemfp.maxmin(), chemfp.heapsweep(), and chemfp.spherex().
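
The greedy MaxMin idea behind a picker like chemfp.maxmin() fits in a few lines: start from a seed, then repeatedly pick the candidate whose highest similarity to the already-picked set is lowest. This is a brute-force, hypothetical sketch with an arbitrary seed choice. It is not chemfp's implementation, and names such as maxmin_pick and seed_index are made up for illustration.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte-string fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0

def maxmin_pick(fps, num_picks, seed_index=0):
    """Greedy MaxMin: each new pick is the candidate least similar to its
    closest already-picked fingerprint."""
    num_picks = min(num_picks, len(fps))
    if num_picks <= 0:
        return []
    picks = [seed_index]
    picked = {seed_index}
    # closest[i] = highest similarity between fps[i] and any picked fingerprint
    closest = [tanimoto(fps[seed_index], fp) for fp in fps]
    while len(picks) < num_picks:
        candidates = [i for i in range(len(fps)) if i not in picked]
        next_pick = min(candidates, key=lambda i: closest[i])
        picks.append(next_pick)
        picked.add(next_pick)
        # Update each fingerprint's similarity to its nearest pick
        for i, fp in enumerate(fps):
            closest[i] = max(closest[i], tanimoto(fps[next_pick], fp))
    return picks

fps = [bytes.fromhex(h) for h in ("0f", "0e", "f0", "01")]
print(maxmin_pick(fps, 3))  # [0, 2, 3]
```

Keeping the closest array avoids recomputing every fingerprint's similarity to the whole picked set on each step.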

The chemfp.clustering module contains ButinaClusters, the class that holds the result of Butina clustering with chemfp.butina().
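
Butina clustering picks the fingerprint with the most neighbors within a similarity threshold as a cluster centroid, assigns it and its still-unassigned neighbors to that cluster, and repeats. The brute-force, hypothetical sketch below implements that classic algorithm. It makes no attempt to match chemfp.butina()'s options or tie-breaking, and it returns plain lists rather than a ButinaClusters object.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte-string fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0

def butina(fps, threshold):
    """Taylor-Butina clustering; returns lists of indices, centroid first."""
    n = len(fps)
    # Neighbors of i: every other fingerprint at or above the threshold
    neighbors = [[j for j in range(n) if j != i and tanimoto(fps[i], fps[j]) >= threshold]
                 for i in range(n)]
    # Try centroids in order of decreasing neighbor count (ties keep input order)
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue  # already a member of an earlier cluster
        members = [i] + [j for j in neighbors[i] if j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters

fps = [bytes.fromhex(h) for h in ("0f", "0e", "0c", "f0", "e0", "01")]
print(butina(fps, 0.5))  # [[0, 1, 2], [3, 4], [5]]
```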

The chemfp.cdk_toolkit, chemfp.openbabel_toolkit, chemfp.openeye_toolkit, and chemfp.rdkit_toolkit modules contain the public-facing API for chemfp’s cheminformatics toolkit wrapper implementations. The chemfp.cdk, chemfp.openbabel, chemfp.openeye, and chemfp.rdkit objects automatically import the underlying toolkit wrapper and forward to it.

The FingerprintType implementations for the different toolkits are:

  • CDK
  • RDKit
    • chemfp.rdkit_types: core RDKit toolkit fingerprints

    • chemfp.rdkit_patterns: chemfp’s RDKit-based fingerprints

  • OpenEye
    • chemfp.openeye_types: core OEGraphSim fingerprints

    • chemfp.openeye_patterns: chemfp’s OEChem-based fingerprints

  • Open Babel
    • chemfp.openbabel_types: core Open Babel toolkit fingerprints

    • chemfp.openbabel_patterns: chemfp’s Open Babel-based fingerprints

Sometimes you need to work with SMILES or SD files as text records, not molecules. For that, use the chemfp.text_toolkit module.
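
Treating a structure file as text means splitting it into records without building molecules. In an SD file each record ends with a "$$$$" line, and the first line of the record is its title. The sketch below shows that splitting in plain Python; iter_sdf_records is a hypothetical helper, not the chemfp.text_toolkit API, and it silently drops a trailing record that lacks the "$$$$" terminator.

```python
import io

def iter_sdf_records(f):
    """Yield (title, record_text) pairs from SD-formatted text."""
    lines = []
    for line in f:
        lines.append(line)
        if line.rstrip("\r\n") == "$$$$":
            # The first line of each record is its title line
            yield lines[0].rstrip("\r\n"), "".join(lines)
            lines = []
    # Any trailing lines without a closing "$$$$" are ignored in this sketch

# Two placeholder records with no atoms; only their text structure matters here
sdf_text = (
    "first\n  sketch\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n$$$$\n"
    "second\n  sketch\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    "> <note>\nkept as raw text\n\n$$$$\n"
)
records = list(iter_sdf_records(io.StringIO(sdf_text)))
print([title for title, _ in records])  # ['first', 'second']
```

Because each record stays as raw text, SD data items such as the "note" field pass through unchanged.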

Sometimes you need to work with CSV files containing structure records or fingerprints. For that, use functions like read_csv_ids_and_fingerprints() and read_csv_rows() from the chemfp.csv_readers module, or the read_csv_ids_and_molecules() function in the toolkit wrapper modules.
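
For a CSV file with an identifier column and a hex-encoded fingerprint column, the core of the job can be sketched with Python's csv module. The function name and the "id" and "fp" column names are assumptions for illustration; chemfp's read_csv_ids_and_fingerprints() has its own parameters, which this sketch does not reproduce.

```python
import csv
import io

def read_csv_hex_fingerprints(f, id_column="id", fp_column="fp"):
    """Yield (id, fingerprint) pairs from a CSV file with a header row.

    Hypothetical sketch: assumes the fingerprint column is hex-encoded.
    """
    for row in csv.DictReader(f):
        yield row[id_column], bytes.fromhex(row[fp_column])

csv_text = io.StringIO("id,fp,activity\nmol1,0f00,active\nmol2,ff01,inactive\n")
pairs = list(read_csv_hex_fingerprints(csv_text))
print(pairs)  # [('mol1', b'\x0f\x00'), ('mol2', b'\xff\x01')]
```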

The chemfp.bitops module has functions to work with fingerprints represented as byte strings or hex-encoded strings, as well as functions to configure chemfp’s bit operations. Use the chemfp.encodings module to decode various fingerprint string representations into a byte string.
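
To show what "fingerprints represented as byte strings or hex-encoded strings" means in practice, here is a pure-Python sketch of a popcount and a Tanimoto score over both representations. The function names are hypothetical, chosen so they are not mistaken for chemfp.bitops functions, and the length check is a choice made for this sketch.

```python
def popcount(fp: bytes) -> int:
    """Number of on bits in a byte-string fingerprint."""
    return sum(bin(byte).count("1") for byte in fp)

def byte_tanimoto_sketch(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity of two byte-string fingerprints."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    intersect = sum(bin(a & b).count("1") for a, b in zip(fp1, fp2))
    union = sum(bin(a | b).count("1") for a, b in zip(fp1, fp2))
    return intersect / union if union else 0.0

def hex_tanimoto_sketch(hex1: str, hex2: str) -> float:
    # Decode the hex strings to bytes, then reuse the byte-string version
    return byte_tanimoto_sketch(bytes.fromhex(hex1), bytes.fromhex(hex2))

print(popcount(bytes.fromhex("0f01")))      # 5
print(hex_tanimoto_sketch("0f00", "ff01"))  # 0.4444444444444444
```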

Finally, the chemfp.types module contains a few public exceptions that derive from ValueError but do not yet also derive from ChemFPError.