chemfp makes cheminformatics fingerprints easy and fast.
Chemfp is a set of command-line tools and a Python library for fingerprint generation and high-performance similarity search. Chemfp supports the FPS exchange format for fingerprint data, so if you want something beyond the OpenEye, Open Babel, and RDKit fingerprints that chemfp supports, you can write your own FPS files and still use the fast Tanimoto search tool. It also supports the FPB binary format for fast loading, which is handy for fast web server reloads and subsecond command-line searches.
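To illustrate what writing your own FPS file might involve, here is a minimal sketch in plain Python. It assumes the usual FPS layout: a `#FPS1` first line, optional `#key=value` header lines, and then one record per line with the hex-encoded fingerprint, a tab, and the identifier. The `write_fps` helper, the `Example-FP/1` type name, and the two records are hypothetical and are not part of chemfp.

```python
import io


def write_fps(out, records, num_bits, fp_type=None):
    """Write (id, fingerprint-bytes) pairs to `out` as FPS text.

    Hypothetical helper for illustration only; not a chemfp function.
    """
    out.write("#FPS1\n")
    out.write(f"#num_bits={num_bits}\n")
    if fp_type is not None:
        out.write(f"#type={fp_type}\n")
    for rec_id, fp in records:
        # Each data line: hex fingerprint, tab, record id.
        out.write(f"{fp.hex()}\t{rec_id}\n")


# Two made-up 16-bit fingerprints.
records = [
    ("mol1", bytes([0x0F, 0xA0])),
    ("mol2", bytes([0xFF, 0x00])),
]

buf = io.StringIO()
write_fps(buf, records, num_bits=16, fp_type="Example-FP/1")
fps_text = buf.getvalue()
print(fps_text, end="")
```

Writing to a file-like object keeps the sketch testable; passing an open text file instead of `io.StringIO` would produce a file on disk.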
The command-line tools are written in Python, with a C extension for better performance. The Python library has a large and well-documented public API which you can use from your own programs. The core functionality works with fingerprints as byte or hex strings, reads and writes fingerprint files, and performs similarity searches. Chemfp does not understand chemistry. Instead, it knows how to use RDKit, OEChem/OEGraphSim, and Open Babel to handle molecule I/O and to compute fingerprints for a molecule, and makes all of that work through a portable cross-toolkit API.
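As a rough illustration of working with fingerprints as hex strings, the sketch below computes a Tanimoto score between two hex-encoded fingerprints using only the Python standard library. The function name `tanimoto_hex` is hypothetical, and returning 0.0 when both fingerprints are empty is a convention assumed here. This is not chemfp's implementation, which relies on its C extension for speed.

```python
def tanimoto_hex(hex_a, hex_b):
    """Tanimoto similarity of two equal-length hex-encoded fingerprints.

    Hypothetical helper for illustration; not a chemfp function.
    """
    if len(hex_a) != len(hex_b):
        raise ValueError("fingerprints must have the same length")
    a = int.from_bytes(bytes.fromhex(hex_a), "big")
    b = int.from_bytes(bytes.fromhex(hex_b), "big")
    union = bin(a | b).count("1")
    if union == 0:
        # Assumed convention: two all-zero fingerprints score 0.0.
        return 0.0
    return bin(a & b).count("1") / union


score = tanimoto_hex("0f", "03")
print(score)
```

Here `0f` has four bits set and `03` has two, both of which are shared, so the score is 2 shared bits over 4 bits in the union.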
As a result, you can write a web service which takes a molecule record in a supported format, searches a fingerprint file of one of the supported fingerprint types, finds the k=3 nearest neighbors, and returns the hit ids and scores, in about six lines of chemfp code, plus another five lines for the web service adapter. The fingerprint file contains the complete fingerprint type as metadata, which is enough for chemfp to figure out which toolkit to use to parse the input record and which fingerprint method and parameters to use to make the query fingerprint. Working with multiple chemistry toolkits has never been so easy.
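The search step of such a service can be sketched conceptually as follows. This is a plain-Python stand-in and does not use the chemfp API: the helpers `popcount`, `tanimoto`, and `knearest` and the four target fingerprints are all hypothetical, and the query fingerprint is given directly rather than computed from a molecule record by a chemistry toolkit.

```python
import heapq


def popcount(data):
    """Number of set bits in a byte string."""
    return bin(int.from_bytes(data, "big")).count("1")


def tanimoto(a, b):
    """Tanimoto similarity of two equal-length byte-string fingerprints."""
    inter = popcount(bytes(x & y for x, y in zip(a, b)))
    union = popcount(bytes(x | y for x, y in zip(a, b)))
    # Assumed convention: two all-zero fingerprints score 0.0.
    return inter / union if union else 0.0


def knearest(query, targets, k=3):
    """Return the k best (id, score) pairs, highest score first."""
    scored = ((tanimoto(query, fp), rec_id) for rec_id, fp in targets)
    best = heapq.nlargest(k, scored)
    return [(rec_id, score) for score, rec_id in best]


# Made-up 8-bit target fingerprints keyed by id.
targets = [
    ("a", bytes([0b1111])),
    ("b", bytes([0b0011])),
    ("c", bytes([0b1100])),
    ("d", bytes([0b0001])),
]

hits = knearest(bytes([0b0111]), targets, k=3)
for rec_id, score in hits:
    print(rec_id, round(score, 3))
```

A linear scan with `heapq.nlargest` is enough to show the idea; it makes no attempt at the optimizations a dedicated search tool would use.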
The commercial license is available for €28 000 (+VAT if appropriate). It includes access to chemfp-2.0, free upgrades for one year, support, and reduced rates for support contract renewal.
The goal with version 2.0 is to make web development easy, across all of the supported toolkits. Some of the new features include:
- Binary FPB format support for fast load times. Command-line searches for the nearest 10 neighbors take 0.1 seconds, and web-server reloads of multi-million fingerprint data sets are nearly instant!
- 40% faster FPS reader
- Support for reading structures from a string, in addition to file-based I/O. No longer do you need to save your CGI query to a temporary file before working with it.
- New APIs to discover the available fingerprint types and their default parameters.
- New toolkit API for molecule I/O, tied to each fingerprint type so your code can easily use the appropriate toolkit.
Version 1.1 of the chemfp toolkit is available at no cost under the MIT/BSD license. New features will be added to the commercial version first, and after a few years older versions of the toolkit will be released at no cost.