Supported toolkits

Chemfp is a fingerprint toolkit with an extensive Python API. It depends on a third-party chemistry toolkit to generate fingerprints from a chemical structure. The currently supported toolkits are OEChem/OEGraphSim, RDKit, Open Babel, and CDK.

The latest version of each toolkit is supported, as well as several previous releases.

Command-line support for different toolkits

The toolkit integration occurs at multiple levels.

At the command-line level you can use oe2fps, rdkit2fps, ob2fps, and cdk2fps to generate toolkit-specific fingerprints from a SMILES file, an SDF, or another chemistry structure format, and save the result in chemfp's FPS or FPB format. (The FPB format is only supported in the commercial version of chemfp.)

Cross-toolkit API

These tools are built using a common cross-toolkit API, which is part of the public chemfp API. If you are a Python programmer, you can use it for your own tools.

The toolkit wrappers provide a consistent API for reading and writing structure files, parsing and creating structure records, and for fingerprint generation. This might not be that important if you only deal with one toolkit, but it's very handy if you want to handle or switch between multiple toolkits in your software.

Take a look for yourself at the wrapper APIs for CDK, OpenEye's OEChem, Open Babel, and RDKit.

Example fingerprint search web service

The Toolkit API examples section of the documentation contains many examples of how to use the toolkit API.

To get you started, here's a small program named "fpsearch.py" which uses the Flask microframework to implement a web service that finds the 10 nearest ChEMBL matches to a query SMILES. It uses only the chemfp APIs, which means it will work with any of the supported fingerprint types and toolkits.

# Save this as "fpsearch.py"
from flask import Flask, request, abort, Response

import chemfp

# Load the database, and use the 'type' metadata to figure out which
# toolkit and which fingerprint parameters to use.
db = chemfp.load_fingerprints("chembl_32.fpb")
fptype = db.get_fingerprint_type()

app = Flask(__name__)

@app.route("/search")
def search():
    # Get the 'q' query parameter and try to process it as a SMILES string.
    smiles = request.args.get("q", None)
    if smiles is None:
        # Report a client error instead of the default 200 status.
        abort(Response("Missing 'q' parameter", status=400))

    fp = fptype.from_smistring(smiles, errors="ignore")
    if fp is None:
        abort(Response("Cannot parse 'q' parameter as a SMILES", status=400))

    # Search the database and report the 10 nearest hits.
    result = db.knearest_tanimoto_search_fp(fp, k=10, threshold=0.0)
    ids_and_scores = result.get_ids_and_scores()
    response = "".join(f"{score:.3f},{id}\n" for (id, score) in ids_and_scores)
    return Response(response, content_type="text/plain")

To make it work:

  1. Install the flask framework with pip install flask

  2. Download and uncompress the pre-generated fingerprints for ChEMBL 32, for example, with the following:

    • curl -O https://chemfp.com/datasets/chembl_32.fpb.gz
    • gunzip chembl_32.fpb.gz
  3. Save the above program as fpsearch.py.

  4. Set the environment variable FLASK_APP to "fpsearch.py", e.g.,

    • export FLASK_APP=fpsearch.py
  5. In the directory containing fpsearch.py, run the command "flask run" to start the server.

  6. With your web browser, go to: http://127.0.0.1:5000/search?q=c1ccccc1N

You should see output like:

1.000,CHEMBL538
0.917,CHEMBL3182415
0.625,CHEMBL44201
0.588,CHEMBL3185160
0.583,CHEMBL403741
0.579,CHEMBL3392014
0.538,CHEMBL3561416
0.533,CHEMBL1595914
0.526,CHEMBL572203
0.526,CHEMBL69011
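
If you would rather call the service from a script than from a browser, the following is a minimal client sketch using only the Python standard library. The helper names (build_search_url, parse_hits, search) are made up for this illustration and are not part of chemfp. It assumes the server from fpsearch.py is running at the default Flask address and returns one "score,id" line per hit, as shown above. Percent-encoding the query matters because SMILES strings often contain characters such as "#" and "+" that have special meaning in a URL.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(smiles, base_url="http://127.0.0.1:5000/search"):
    """Return the search URL with the SMILES percent-encoded as 'q'."""
    return base_url + "?" + urlencode({"q": smiles})


def parse_hits(text):
    """Convert 'score,id' lines into a list of (id, score) tuples."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        score, _, hit_id = line.partition(",")
        hits.append((hit_id, float(score)))
    return hits


def search(smiles):
    """Query the running fpsearch.py service (hypothetical helper)."""
    with urlopen(build_search_url(smiles)) as response:
        return parse_hits(response.read().decode("utf-8"))
```

With the server running, search("c1ccccc1N") should return the same hits as the browser request, as a list of (id, score) tuples.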

The file "chembl_32.fpb" derives from the FPS files distributed as part of ChEMBL 32. It contains RDKit Morgan fingerprints with the fingerprint type "RDKit-Morgan/1 radius=2 fpSize=2048 useFeatures=0 useChirality=0 useBondTypes=1".

In addition, I embedded a special database license key in the FPB file that unlocks chemfp functionality that otherwise requires a chemfp license key.

What makes the chemfp API useful is that I could replace the FPB file with one containing, say, OpenEye circular fingerprints, restart the server, and the search service would switch from using RDKit to OEChem and OEGraphSim, with no other changes to the code.
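
As background on the scores the service reports: the Tanimoto similarity of two bit fingerprints is the number of bits set in both, divided by the number of bits set in either. Here is a pure-Python sketch of a k-nearest Tanimoto search over fingerprints stored as byte strings. It is only an illustration of the idea, not how chemfp implements its search, and the function names are invented for this example. It also assumes that two all-zero fingerprints score 0.0.

```python
import heapq


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity between two equal-length byte fingerprints."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumption: no bits set in either fingerprint
    return bin(a & b).count("1") / union


def knearest(query, targets, k=10, threshold=0.0):
    """Return up to k (id, score) pairs with score >= threshold, best first.

    'targets' is an iterable of (id, fingerprint) pairs.
    """
    hits = []
    for target_id, fp in targets:
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((target_id, score))
    return heapq.nlargest(k, hits, key=lambda hit: hit[1])
```

This linear scan is fine for a few thousand fingerprints, but a database the size of ChEMBL is where a dedicated search implementation earns its keep.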