Licensed chemfp datasets
Chemfp 3.5.1 added support for licensed FPB files. Chemfp's Base License Agreement describes how you may use chemfp if no other legal agreement is in place. It usually restricts you from doing in-memory searches with more than 50,000 fingerprints unless you have a valid chemfp license key. (See the licensing page to request an evaluation key or a no-cost academic license.)
A licensed FPB is an FPB file with an embedded authorization key generated by Andrew Dalke Scientific, AB. The Base License Agreement permits you to do Tanimoto searches of those files, and the authorization key unlocks that part of the chemfp software.
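For readers unfamiliar with the term, the following is a minimal pure-Python sketch of what a Tanimoto threshold search computes. It is not chemfp's implementation or API: the function names `tanimoto` and `threshold_search` are made up for this illustration, fingerprints are assumed to be equal-length byte strings, and the choice to score two empty fingerprints as 0.0 is a convention of this sketch.

```python
def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention used in this sketch when no bits are set
    return bin(a & b).count("1") / union


def threshold_search(query: bytes, targets, threshold: float = 0.7):
    """Return (id, score) pairs for targets scoring at least `threshold`."""
    hits = []
    for target_id, fp in targets:
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((target_id, score))
    return hits


if __name__ == "__main__":
    # 0b1111 and 0b0011 share 2 of 4 distinct on-bits.
    print(tanimoto(bytes([0b1111]), bytes([0b0011])))  # 0.5
```

A linear scan like this is only meant to show the arithmetic; it says nothing about how chemfp organizes or accelerates its searches.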
Available data sets
At present two data sets are available:
The RDKit/Morgan circular fingerprints from the respective ChEMBL releases were converted into FPB format and are distributed under the terms of the ChEMBL license.
These files are gzip-compressed for distribution. In most cases you should either uncompress them after downloading or re-compress them with Zstandard.
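If a `gunzip` command is not at hand, the uncompress step can be done with Python's standard library. The sketch below is one way to do it; the helper name `gunzip_file` is hypothetical, and it assumes the downloaded file ends in `.gz`, as the distributed files do.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(src, dest=None):
    """Decompress a gzip file to `dest` (default: `src` without its .gz suffix)."""
    src = Path(src)
    if dest is None:
        if src.suffix != ".gz":
            raise ValueError(f"cannot infer an output name from {src.name!r}")
        dest = src.with_suffix("")  # chembl27.fpb.gz -> chembl27.fpb
    dest = Path(dest)
    # Copy in chunks so the whole file never has to fit in memory.
    with gzip.open(src, "rb") as infile, open(dest, "wb") as outfile:
        shutil.copyfileobj(infile, outfile)
    return dest


if __name__ == "__main__":
    import sys

    # Usage: python gunzip_fpb.py chembl27.fpb.gz chembl28.fpb.gz
    for name in sys.argv[1:]:
        print(gunzip_file(name))
```

The original `.gz` file is left in place, so it can be deleted by hand once the uncompressed copy has been checked.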
The required attribution is:
For publications using ChEMBL data, the primary current citation is:
Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, Magariños MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Marañón M, Hunter F, Junco L, Mugumbate G, Rodriguez-Lopez M, Atkinson F, Bosc N, Radoux CJ, Segura-Cabrera A, Hersey A, Leach AR. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 2019 47(D1):D930-D940. DOI: 10.1093/nar/gky1075
If ChEMBL is incorporated into other works, we ask that the ChEMBL IDs are preserved, and that the release number of ChEMBL is clearly displayed.
The ChEMBL attribution and license terms, and a legal notice about how it was adapted for this distribution, are embedded in the FPB file. To see them, either view the uncompressed content directly or use:
python -m chemfp fpb_text chembl27.fpb.gz
python -m chemfp fpb_text chembl28.fpb.gz
If you have uncompressed the file or re-compressed it with Zstandard, use the corresponding filename instead.
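As a convenience, a small wrapper can pick whichever copy of the file is present and pass it to the `fpb_text` command shown above. This is only a sketch: the helper names are hypothetical, and the `.fpb.zst` filename for a Zstandard copy is an assumption about how you named the re-compressed file.

```python
import subprocess
import sys
from pathlib import Path

# Suffixes to try, in order. ".fpb.zst" is an assumed name for a Zstandard copy.
SUFFIXES = (".fpb", ".fpb.zst", ".fpb.gz")


def find_fpb(stem, directory="."):
    """Return the first existing file named stem + suffix, or None."""
    for suffix in SUFFIXES:
        candidate = Path(directory) / (stem + suffix)
        if candidate.is_file():
            return candidate
    return None


def fpb_text_command(path):
    """Build the 'python -m chemfp fpb_text' command for the given file."""
    return [sys.executable, "-m", "chemfp", "fpb_text", str(path)]


if __name__ == "__main__":
    path = find_fpb("chembl27")
    if path is None:
        sys.exit("no chembl27 FPB file found in the current directory")
    # Requires chemfp to be installed for this Python interpreter.
    subprocess.run(fpb_text_command(path), check=True)
```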
Note that the embedded FPB authorization token only affects your use of chemfp to process this data file. It does not add additional restrictions on your use of ChEMBL data in the FPB file.