Licensed chemfp datasets
Chemfp 3.5.1 added support for licensed FPB files. Chemfp's Base License Agreement describes how you may use chemfp if no other legal agreement is in place. It usually restricts you from doing in-memory searches with more than 50,000 fingerprints unless you have a valid chemfp license key. (See the licensing page to request an evaluation key or a no-cost academic license.)
A licensed FPB is an FPB file with an embedded authorization key generated by Andrew Dalke Scientific, AB. The Base License Agreement permits you to do Tanimoto searches of those files, and the authorization key unlocks that part of the chemfp software.
Available data sets
At present three data sets are available:
- chembl_32.fpb.gz - The fingerprints chembl_32.fps.gz from the ChEMBL 32 release.
- chembl_33.fpb.gz - The fingerprints chembl_33.fps.gz from the ChEMBL 33 release.
- chembl_34.fpb.gz - The fingerprints chembl_34.fps.gz from the ChEMBL 34 release.
The RDKit/Morgan circular fingerprints from the respective ChEMBL releases were converted into FPB format and are distributed under the terms of the ChEMBL license.
These files are gzip-compressed for distribution. In most cases you should either uncompress them after downloading or re-compress them using Zstandard.
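As a minimal sketch of the uncompress step, the following uses only Python's standard library. The helper name gunzip_fpb is hypothetical and is not part of chemfp; it assumes the downloaded file keeps its .gz extension. Re-compressing with Zstandard is not shown, since that needs a separate tool or package.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_fpb(src, dest=None):
    """Decompress a gzip-compressed file (e.g. chembl_34.fpb.gz) to disk.

    Hypothetical helper, not part of chemfp.
    Returns the path of the uncompressed file.
    """
    src = Path(src)
    if dest is None:
        if src.suffix != ".gz":
            raise ValueError(f"expected a .gz filename, got {src.name!r}")
        # Drop only the final suffix: chembl_34.fpb.gz -> chembl_34.fpb
        dest = src.with_suffix("")
    dest = Path(dest)
    # Copy in chunks so the whole file is never held in memory at once.
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


# Example call, assuming the file has been downloaded to the current directory:
#     gunzip_fpb("chembl_34.fpb.gz")
```

The original .gz file is left in place, so you can delete it once you have confirmed the uncompressed copy works.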
The required attribution is:
For publications using ChEMBL data, the primary current citation is:
Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, Magariños MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Marañón M, Hunter F, Junco L, Mugumbate G, Rodriguez-Lopez M, Atkinson F, Bosc N, Radoux CJ, Segura-Cabrera A, Hersey A, Leach AR. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 2019 47(D1):D930-D940. DOI: 10.1093/nar/gky1075
If ChEMBL is incorporated into other works, we ask that the ChEMBL IDs are preserved, and that the release number of ChEMBL is clearly displayed.
The ChEMBL attribution and license terms, and a legal notice about how it was adapted for this distribution, are embedded in the FPB file. To see them, use chemfp's fpb_text subcommand:
chemfp fpb_text chembl_32.fpb.gz
chemfp fpb_text chembl_33.fpb.gz
chemfp fpb_text chembl_34.fpb.gz
If you have uncompressed the file or re-compressed it with Zstandard, use the corresponding filename instead.
Note that the embedded FPB authorization token only affects your use of chemfp to process this data file. It does not add additional restrictions on your use of ChEMBL data in the FPB file. You may use any other program to process these files.