chemfp makes cheminformatics fingerprints easy and fast.

Chemfp is a set of command-line tools and a Python library for fingerprint generation and high-performance similarity search. Chemfp supports the FPS exchange format for fingerprint data, so if you want something beyond the OpenEye, Open Babel and RDKit fingerprints that chemfp supports, you can write your own FPS files and still use the fast Tanimoto search tool. It also supports the FPB binary format for fast loading, which is handy for fast web server reloads and subsecond command-line searches.
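As a rough illustration of writing your own FPS file, here is a minimal sketch in plain Python that does not use chemfp itself. It assumes the basic FPS layout: a "#FPS1" first line, optional "#key=value" header lines such as "#num_bits" and "#type", then one record per line holding the hex-encoded fingerprint, a tab, and the identifier. The helper names write_fps and read_fps and the type string "Example-Custom/1" are made up for this example.

```python
import io


def write_fps(outfile, records, num_bits, fp_type=None):
    """Write (id, fingerprint_bytes) records as a minimal FPS file."""
    outfile.write("#FPS1\n")
    outfile.write(f"#num_bits={num_bits}\n")
    if fp_type is not None:
        # Hypothetical type string; real files name the fingerprint method used.
        outfile.write(f"#type={fp_type}\n")
    for record_id, fp in records:
        # Each data line: hex-encoded fingerprint, a tab, then the identifier.
        outfile.write(f"{fp.hex()}\t{record_id}\n")


def read_fps(infile):
    """Yield (id, fingerprint_bytes) pairs, skipping '#' header lines."""
    for line in infile:
        if line.startswith("#") or not line.strip():
            continue
        hex_fp, record_id = line.rstrip("\n").split("\t")[:2]
        yield record_id, bytes.fromhex(hex_fp)


# Two made-up 16-bit fingerprints, written to an in-memory file.
records = [("mol1", bytes([0x0F, 0xA0])), ("mol2", bytes([0xFF, 0x01]))]
buffer = io.StringIO()
write_fps(buffer, records, num_bits=16, fp_type="Example-Custom/1")
fps_text = buffer.getvalue()
print(fps_text, end="")
```

A file written this way should be usable as a search target, provided the header describes the fingerprints accurately. Check the FPS specification for the full list of header fields.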

The command-line tools are written in Python, with a C extension for better performance. The Python library has a large and well-documented public API which you can use from your own programs. The core functionality works with fingerprints as byte or hex strings, reads and writes fingerprint files, and performs similarity searches. Chemfp does not understand chemistry. Instead, it knows how to use RDKit, OEChem/OEGraphSim, and Open Babel to handle molecule I/O and to compute fingerprints for a molecule, and makes all of that work through a portable cross-toolkit API.
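To make "fingerprints as byte or hex strings" concrete, the following plain-Python sketch computes a Tanimoto score for two fingerprints in either representation. It is not chemfp's implementation, which uses a C extension. The function names and the choice to return 0.0 when neither fingerprint has any bits set are assumptions made for this example.

```python
def tanimoto_bytes(fp1, fp2):
    """Tanimoto similarity of two equal-length byte-string fingerprints."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    # Count bits set in both fingerprints and bits set in either one.
    intersection = sum(bin(a & b).count("1") for a, b in zip(fp1, fp2))
    union = sum(bin(a | b).count("1") for a, b in zip(fp1, fp2))
    if union == 0:
        return 0.0  # convention chosen here for two all-zero fingerprints
    return intersection / union


def tanimoto_hex(hex1, hex2):
    """Same calculation for hex-encoded fingerprints."""
    return tanimoto_bytes(bytes.fromhex(hex1), bytes.fromhex(hex2))


score = tanimoto_hex("0fa0", "ff01")  # 4 shared bits out of 11 set in either
print(round(score, 4))
```

The byte-string form is the natural working representation, and the hex form is what appears in an FPS file.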

As a result, you can write a web service which takes a molecule record in a supported format, searches a fingerprint file of one of the supported fingerprint types, finds the k=3 nearest neighbors, and returns the hit ids and scores, in about six lines of chemfp code, plus another five lines for the web service adapter. The fingerprint file contains the complete fingerprint type as metadata, which is enough for chemfp to figure out which toolkit to use to parse the input record and which fingerprint method and parameters to use to make the query fingerprint. Working with multiple chemistry toolkits has never been so easy.
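The search step of such a service, finding the k=3 nearest neighbors and returning ids and scores, can be sketched in plain Python as below. This is a conceptual illustration only. It does not use chemfp's API, it skips molecule parsing and fingerprint generation, and a linear scan like this is far slower than chemfp's search. The function names and the sample fingerprints are invented for the example.

```python
import heapq


def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte-string fingerprints."""
    intersection = sum(bin(a & b).count("1") for a, b in zip(fp1, fp2))
    union = sum(bin(a | b).count("1") for a, b in zip(fp1, fp2))
    return intersection / union if union else 0.0


def knearest_search(query_fp, targets, k=3, threshold=0.0):
    """Return up to k (id, score) hits with score >= threshold, best first."""
    scored = ((tanimoto(query_fp, fp), target_id) for target_id, fp in targets)
    candidates = (hit for hit in scored if hit[0] >= threshold)
    best = heapq.nlargest(k, candidates, key=lambda hit: hit[0])
    return [(target_id, score) for score, target_id in best]


# Made-up 16-bit target fingerprints keyed by id.
targets = [
    ("a", b"\xff\x00"),
    ("b", b"\x0f\x00"),
    ("c", b"\x00\xff"),
    ("d", b"\xff\x0f"),
]
hits = knearest_search(b"\xff\x00", targets, k=3)
for hit_id, score in hits:
    print(hit_id, round(score, 3))
```

In a real service the query fingerprint would come from the submitted molecule record, generated with the fingerprint type recorded in the target file's metadata.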

Licensing options

There are two development tracks for chemfp. Version 1.3 is the current no-cost version of chemfp, and version 3.1 is the current commercial version.

The commercial version is available for €28 000 (+VAT if appropriate). It includes access to chemfp 3.1, free upgrades for one year, support, and reduced rates for support contract renewal. You also get the source code under the MIT license. Yes, chemfp is commercial open source!

The focus of the 2.0 series was to make web development easy, across all of the supported toolkits, and the 3.0 series added Python 3 support. Some of the improvements over chemfp 1.3 include:

  • Binary FPB format support for fast load times. Command-line searches for the nearest 10 neighbors take 0.1 seconds, and web-server reloads of multi-million fingerprint data sets are nearly instant!
  • Support for reading structures from a string, in addition to file-based I/O. No longer do you need to save your CGI query to a temporary file before working with it.
  • Similarity search performance is about 15% faster.
  • New APIs to discover the available fingerprint types and their default parameters.
  • New toolkit API molecule I/O, tied to each fingerprint type so your code can easily use the appropriate toolkit.
  • The same code base supports both Python 2.7 and Python 3.5 or later.

Download the no-cost version

Version 1.4 of the chemfp toolkit is available at no cost under the MIT license.