The newly released chemfp 3.5.1 adds support for licensed FPB files. These are cheminformatics fingerprint data sets that can be used under the terms of chemfp's base license agreement, even without a chemfp license key or a source code distribution.
As the first (and so far only) data set, I've converted the RDKit Morgan fingerprints from the ChEMBL 27 release into FPB format and made it available from https://chemfp.com/datasets/ as chembl_27.fpb.gz. The file is distributed under the terms of the ChEMBL license, which is CC BY-SA 3.0.
What is chemfp?
Chemfp is a Python package for working with cheminformatics fingerprints, including high-performance Tanimoto similarity search, built-in support for RDKit, OEChem/OEGraphSim, Open Babel, and the CDK, and integration with NumPy/SciPy. It contains an extensive and well-documented Python API and a set of command-line tools for fingerprint generation, search, and format conversion.
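Tanimoto similarity between two binary fingerprints is the number of bits set in both fingerprints divided by the number of bits set in either. The following plain-Python sketch shows that calculation, assuming fingerprints are stored as equal-length byte strings. It is only an illustration, not chemfp's API or its optimized search code, and returning 0.0 for two all-zero fingerprints is a convention chosen here.

```python
def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity of two equal-length binary fingerprints."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    # Treat each fingerprint as one big integer so bitwise AND/OR are simple.
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")  # bits set in either fingerprint
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union  # bits set in both / bits set in either
```

For example, tanimoto(b"\x0c", b"\x0a") compares the bit patterns 1100 and 1010, which share 1 of the 3 set bits, so it returns 1/3.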
Chemfp natively supports the text-oriented FPS and the binary-oriented FPB fingerprint file formats. A licensed FPB file contains an authorization token which enables chemfp's Tanimoto search functionality for that data set.
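An FPS file is line-oriented text: it starts with a "#FPS1" line and optional "#key=value" header lines, followed by one record per line made of the hex-encoded fingerprint, a tab, and the identifier. A simplified reader for that layout might look like the sketch below. It skips validation and ignores any extra fields after the identifier; in practice you would use chemfp's own readers. The sample text is made up for illustration.

```python
from typing import Dict, List, Tuple


def parse_fps(text: str) -> Tuple[Dict[str, str], List[Tuple[str, bytes]]]:
    """Return (header metadata, [(id, fingerprint bytes), ...]) from FPS text."""
    metadata: Dict[str, str] = {}
    records: List[Tuple[str, bytes]] = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            # Header lines: "#FPS1" (no '=') or "#key=value" metadata.
            if "=" in line:
                key, _, value = line[1:].partition("=")
                metadata[key] = value
            continue
        # Record lines: hex fingerprint, a tab, then the identifier.
        hex_fp, _, rest = line.partition("\t")
        fp_id = rest.split("\t", 1)[0]  # ignore any extra tab-separated fields
        records.append((fp_id, bytes.fromhex(hex_fp)))
    return metadata, records


example = "#FPS1\n#num_bits=16\nff00\tmol1\n0f0f\tmol2\n"
metadata, records = parse_fps(example)
```

Here metadata is {"num_bits": "16"} and records holds the two 2-byte fingerprints with their identifiers.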
How to get started with the ChEMBL 27 FPB file
If you are on a Linux-based OS and RDKit is already installed, here are the steps to get started:
1) Install a pre-compiled version of chemfp for Linux using the following:
python -m pip install chemfp -i https://chemfp.com/packages/
2) Download the ChEMBL data set in FPB format using one of the following:
curl -O https://chemfp.com/datasets/chembl_27.fpb.gz
or use your browser to save chembl_27.fpb.gz directly.
3) (Optional but recommended) Uncompress it:
gunzip chembl_27.fpb.gz
4) Do a similarity search, for example with a query SMILES string or a query file:
simsearch --query c1ccccc1O chembl_27.fpb
simsearch --queries your_queries.sdf chembl_27.fpb
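One common kind of similarity search is a k-nearest search: compare the query fingerprint against every target fingerprint and keep the k highest Tanimoto scores. chemfp does this in optimized C code, so the following brute-force Python sketch is only meant to show the idea. The function name knearest_tanimoto, the (id, bytes) target layout, and the default values of k and threshold are assumptions for illustration.

```python
import heapq
from typing import List, Sequence, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def knearest_tanimoto(
    query: bytes,
    targets: Sequence[Tuple[str, bytes]],
    k: int = 3,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (id, score) pairs with the highest Tanimoto scores."""
    q = int.from_bytes(query, "big")
    scored = []
    for fp_id, fp in targets:
        t = int.from_bytes(fp, "big")
        union = popcount(q | t)
        # Tanimoto = |A AND B| / |A OR B|; 0.0 when both are empty.
        score = popcount(q & t) / union if union else 0.0
        if score >= threshold:
            scored.append((fp_id, score))
    # Keep only the k best-scoring targets, highest score first.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

With the query b"\x0f" and targets [("a", b"\xff"), ("b", b"\x0f"), ("c", b"\x00")], calling knearest_tanimoto(query, targets, k=2) returns [("b", 1.0), ("a", 0.5)].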
For more help about the simsearch command, use simsearch --help on the command line or see the chapter "Working with the Command-line Tools" in the chemfp documentation.
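If the gunzip command is not available, the uncompression in step 3 can also be done with Python's standard library. Here is a minimal sketch; gunzip_file is a hypothetical helper name, and it streams the data so the whole file never needs to fit in memory at once.

```python
import gzip
import shutil


def gunzip_file(src: str, dst: str) -> None:
    """Decompress the gzip file src into the file dst."""
    # copyfileobj copies in chunks instead of reading everything into memory.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


# Example usage (assumes chembl_27.fpb.gz is in the current directory):
# gunzip_file("chembl_27.fpb.gz", "chembl_27.fpb")
```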
To view the text stored in the licensed FPB file, use:
python -m chemfp fpb_text chembl_27.fpb
Chemfp's base license agreement lets you use most chemfp functionality for in-house use, except that you may not use it to:
- generate FPB files;
- create in-memory fingerprint arenas with more than 50,000 fingerprints;
- search in-memory fingerprint arenas with more than 50,000 fingerprints, unless they are licensed FPB files;
- perform Tversky searches;
- perform Tanimoto searches of FPS files with more than 20 queries at a time.
These features are present but disabled in the pre-compiled Linux distribution unless a time-based chemfp license key is found.
As an alternative, most customers choose to purchase a source code license, which has no time limit (you may continue to use it even after your support period ends) and can also be used under macOS.
No-cost academic licensing is available.
See the chemfp licensing page for more details on the licensing options and for information about how to request an evaluation license.