Supported toolkits

Chemfp is a fingerprint toolkit. It depends on a third-party chemistry toolkit to generate fingerprints from a chemical structure. The currently supported toolkits are OEChem/OEGraphSim, RDKit and Open Babel.

The latest version of each toolkit is supported, along with the several releases before it. OEChem 2017.Oct, which is the last version of OEChem/OEGraphSim to support Python 2.7, will be supported until 2020.

Command-line support for different toolkits

The toolkit integration occurs at multiple levels.

At the command-line level you can use oe2fps, rdkit2fps, and ob2fps to generate toolkit-specific fingerprints from a SMILES file, an SD file, or another chemistry structure format, and save the results in chemfp's FPS or FPB format. (The FPB format is only supported in the commercial version of chemfp.)
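To give a feel for what those tools write, here is a minimal sketch of a reader for the text-based FPS format. It assumes the layout as I understand it: header lines start with "#" and mostly look like "#key=value", and each data line is a hex-encoded fingerprint, a tab, then the identifier. The function name read_fps, the 16-bit fingerprints, and the identifiers are made up for illustration. In real code you would let chemfp read the file instead.

```python
import io


def read_fps(lines):
    """Return (metadata, records) from an iterable of FPS text lines.

    This is an illustrative reader, not chemfp's own parser.
    """
    metadata = {}
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Header lines look like "#key=value". The "#FPS1" signature
            # line has no "=" and is skipped. Splitting on the first "="
            # keeps values such as "Name/1 numbits=4096" intact.
            key, sep, value = line[1:].partition("=")
            if sep:
                metadata[key] = value
            continue
        # Data lines: hex-encoded fingerprint, a tab, then the identifier.
        hex_fp, record_id = line.split("\t")[:2]
        records.append((record_id, bytes.fromhex(hex_fp)))
    return metadata, records


# A tiny hand-written example with made-up fingerprints and identifiers.
example = io.StringIO(
    "#FPS1\n"
    "#num_bits=16\n"
    "#type=Example-Type/1\n"
    "00ff\tID1\n"
    "1234\tID2\n"
)
metadata, records = read_fps(example)
print(metadata["type"])
print(records)
```

The "type" header is the part that matters most for the rest of this page, because it records which toolkit and which parameters produced the fingerprints.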

Cross-toolkit API

If you are a Python programmer, you can also use chemfp's fingerprint and toolkit APIs. These work with toolkit-native molecules, so you are free to create the molecule any way you like.

In addition, chemfp provides a common API for fingerprint generation and file I/O across all of the supported toolkits. This might not be that important if you only deal with one toolkit, but it's very handy if you want to handle multiple toolkits.

Example fingerprint search web service

For example, here's a small program named "fpsearch.py" which uses the flask microframework to implement a web service that finds the nearest 10 ChEMBL matches to a query SMILES. It uses only the chemfp APIs, which means it will work with any of the supported fingerprint types and toolkits.

# Save this as "fpsearch.py"
from flask import Flask, request, abort, Response

import chemfp

# Load the database, and use the 'type' metadata to figure out which
# toolkit and which fingerprint parameters to use.
db = chemfp.load_fingerprints("chembl_23.fps")
fptype = db.get_fingerprint_type()

app = Flask(__name__)

@app.route("/search")
def search():
    # Get the 'q' query parameter and try to process it as a SMILES string.
    smiles = request.args.get("q", None)
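    if smiles is None:
        # Report a missing parameter as a client error, not a 200 OK.
        abort(Response("Missing 'q' parameter", status=400))

    fp = fptype.parse_molecule_fingerprint(smiles, "smistring", errors="ignore")
    if fp is None:
        # An unparseable SMILES is also a client error.
        abort(Response("Cannot parse 'q' parameter as a SMILES", status=400))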

    # Search the database and report the 10 nearest hits.
    result = db.knearest_tanimoto_search_fp(fp, k=10, threshold=0.0)
    ids_and_scores = result.get_ids_and_scores()
    response = "".join("%.3f,%s\n" % (score, hit_id) for (hit_id, score) in ids_and_scores)
    return Response(response, content_type="text/plain")
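
If you are wondering what the k-nearest Tanimoto search in fpsearch.py computes, the following pure-Python sketch shows the idea. It is not chemfp's implementation, which is far faster. The names tanimoto and knearest, the one-byte fingerprints, and the identifiers are all hypothetical, and returning 0.0 when neither fingerprint has any bits set is simply the convention chosen for this sketch.

```python
import heapq


def tanimoto(fp1, fp2):
    """Tanimoto similarity between two equal-length byte-string fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention used in this sketch when no bits are set
    # Number of bits in common divided by number of bits set in either.
    return bin(a & b).count("1") / union


def knearest(query_fp, targets, k=10, threshold=0.0):
    """Return up to k (id, score) pairs with score >= threshold, best first."""
    scored = ((target_id, tanimoto(query_fp, fp)) for target_id, fp in targets)
    hits = [(target_id, score) for target_id, score in scored if score >= threshold]
    return heapq.nlargest(k, hits, key=lambda hit: hit[1])


# Made-up 8-bit fingerprints, just to exercise the functions.
targets = [("A", b"\x0f"), ("B", b"\x03"), ("C", b"\xf0")]
hits = knearest(b"\x0f", targets, k=2)
print(hits)
```

With threshold=0.0, as in the web service, every target qualifies and the search simply returns the k highest-scoring ones.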

To make fpsearch.py work:

  1. Install the Flask framework with pip install flask.

  2. Download the ChEMBL 23 SD file and use one of the following to generate fingerprints:

    • ob2fps chembl_23.sdf.gz -o chembl_23.fps
    • oe2fps chembl_23.sdf.gz -o chembl_23.fps
    • rdkit2fps chembl_23.sdf.gz -o chembl_23.fps

  3. Save the web service program shown earlier as fpsearch.py.

  4. Set the environment variable FLASK_APP to "fpsearch.py", e.g.,

    • export FLASK_APP=fpsearch.py

  5. In the directory containing fpsearch.py, run the command flask run to start the server.

  6. With your web browser, go to: http://127.0.0.1:5000/search?q=c1ccccc1N

You should see output like:

1.000,CHEMBL538
0.955,CHEMBL3182415
0.656,CHEMBL3392014
0.600,CHEMBL572203
0.588,CHEMBL44201
0.583,CHEMBL403741
0.583,CHEMBL3186715
0.571,CHEMBL3185160
0.567,CHEMBL1595914
0.560,CHEMBL3561416
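
The browser URL above works as typed because the SMILES for aniline contains no characters with a special meaning in URLs. Many SMILES strings do: "#" (triple bond) would start a URL fragment, and "+" (positive charge) may be decoded as a space in a query string. If you query the service from a script, a small helper along these lines takes care of the escaping. The name build_search_url is hypothetical, and the base URL assumes the default address used by flask run.

```python
from urllib.parse import urlencode


def build_search_url(smiles, base_url="http://127.0.0.1:5000/search"):
    """Build a search URL with the SMILES safely encoded as the 'q' parameter."""
    return base_url + "?" + urlencode({"q": smiles})


print(build_search_url("c1ccccc1N"))  # no escaping needed
print(build_search_url("C#N"))        # "#" becomes "%23"
print(build_search_url("[NH4+]"))     # brackets and "+" are escaped
```

With the server running, passing the resulting URL to urllib.request.urlopen should return the same text you see in the browser.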

In case you are curious, I generated the "chembl_23.fps" file using the fingerprint type OpenEye-Tree/2 numbits=4096 minbonds=0 maxbonds=4 atype=Arom|AtmNum|Chiral|FCharge|HvyDeg|Hyb btype=Order.
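
That type string is self-describing: a name and version, followed by space-separated key=value parameters. chemfp interprets it for you (that is what get_fingerprint_type() in the program relies on), so the sketch below is only meant to show the structure. The function name parse_type_string is hypothetical, and the parsing rules are my reading of the string above rather than chemfp's own parser.

```python
def parse_type_string(type_string):
    """Split a fingerprint type string into (name, version, parameters).

    Illustrative only: assumes "name/version key=value key=value ...".
    """
    terms = type_string.split()
    name, _, version = terms[0].partition("/")
    # Split each parameter on the first "=" only; values are kept as strings.
    params = dict(term.split("=", 1) for term in terms[1:])
    return name, version, params


name, version, params = parse_type_string(
    "OpenEye-Tree/2 numbits=4096 minbonds=0 maxbonds=4 "
    "atype=Arom|AtmNum|Chiral|FCharge|HvyDeg|Hyb btype=Order"
)
print(name, version)
print(params["numbits"], params["atype"].split("|"))
```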

What makes the chemfp API useful is that I could replace the FPS file with one containing, say, RDKit MACCS fingerprints, restart the server, and the search service would switch from using OEChem and OEGraphSim to using RDKit, with no other changes to the code.