chemfp 3.5 is available

I've just released chemfp 3.5, the latest version of the commercial chemfp development track. To install it on a Linux-based OS, run:

python -m pip install chemfp -i https://chemfp.com/packages/

The packages are available at no cost under the Chemfp Base License Agreement. This license lets you use most chemfp features in-house, and generate FPS files for any purpose.

Some features are either limited or disabled. See the chemfp licensing page to learn how to request an evaluation key and to learn about the different licensing options.

Below are a couple of highlights. See the documentation for the full list of notable changes.

Search-related changes

This release improves the integration between chemfp search results and NumPy, with a new function to access the hits as a zero-copy NumPy array view of the underlying data. In addition, reordering the search results by score is significantly faster. There is also a small helper function to pick a fingerprint at random. Finally, there is an option to load the contents of an FPB file into memory rather than use an mmap, which seems to give better run-time performance, especially on networked filesystems.
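To make the mmap trade-off concrete, here is a minimal sketch using only the Python standard library. It is not chemfp's implementation. The helper names and the temporary stand-in file are hypothetical. It only contrasts a memory-mapped read, where pages are fetched from the filesystem on first access, with an up-front read of the whole file.

```python
import mmap
import os
import tempfile


def read_with_mmap(path, n=16):
    """Map the file and copy out the first n bytes.

    Pages are pulled from the filesystem lazily, when they are first touched.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return bytes(mm[:n])


def read_into_memory(path, n=16):
    """Read the whole file up front, then slice the in-memory copy."""
    with open(path, "rb") as f:
        data = f.read()
    return data[:n]


# Hypothetical stand-in for a fingerprint file: 1 KiB of repeating bytes.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(bytes(range(256)) * 4)
    demo_path = tmp.name

mmap_head = read_with_mmap(demo_path)
eager_head = read_into_memory(demo_path)
os.remove(demo_path)
```

Both approaches return the same bytes. The difference is when the I/O happens: the eager read pays the cost once at load time, while the mapped file may pay it during the search, which is one plausible reason the in-memory option helps on a networked filesystem.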

The following small program shows how the new features work together to select a random fingerprint from ChEMBL, find all Tanimoto scores to the rest of the data set, then plot the distribution of scores from most similar to least:

import time
import chemfp
from matplotlib import pyplot

arena = chemfp.load_fingerprints("chembl_23.fpb", allow_mmap=False)
query_fp = arena.fingerprints.random_choice(rng=1235)
result = arena.threshold_tanimoto_search_fp(query_fp, threshold=0.0)
result.reorder("decreasing-score") # ~5x faster
scores = result.get_scores_as_numpy_array() # over 300x faster
pyplot.plot(scores)
pyplot.show()
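For readers who want to see the ideas without a chemfp install, the following toy sketch computes Tanimoto scores for a few one-byte fingerprints in pure Python, stores the hits in a NumPy structured array, and takes a zero-copy view of the score column. The `tanimoto` helper, the field names `index` and `score`, and the tiny fingerprints are all assumptions made for illustration. They say nothing about how chemfp lays out its search results internally.

```python
import numpy as np


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity: shared on-bits divided by total on-bits."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumed convention when both fingerprints are empty
    return bin(a & b).count("1") / union


# Made-up 8-bit fingerprints, purely for illustration.
query = bytes([0b11110000])
targets = [
    bytes([0b11110000]),
    bytes([0b11000000]),
    bytes([0b00001111]),
    bytes([0b11111100]),
]

# Hypothetical hit layout: one (index, score) record per target.
hits = np.zeros(len(targets), dtype=[("index", "i4"), ("score", "f8")])
for i, fp in enumerate(targets):
    hits[i] = (i, tanimoto(query, fp))

# Reorder by decreasing score (fancy indexing makes a new array here).
hits = hits[np.argsort(-hits["score"], kind="stable")]

# Selecting a single field gives a view onto the same memory, not a copy.
scores = hits["score"]
```

After the reordering, `scores` can be passed straight to `pyplot.plot`, and because it is a view, any change to the records in `hits` is visible through `scores` without copying.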

CDK support

Chemfp 3.5 adds support for the CDK, which is a Java-based chemistry toolkit. Chemfp uses JPype to connect Python to Java. See the CDK installation notes for help getting CDK to work with chemfp.

This adds the new fingerprint types CDK-Daylight, CDK-GraphOnly, CDK-MACCS, CDK-EState, CDK-Extended, CDK-Hybridization, CDK-Pubchem, CDK-Substructure, CDK-ShortestPath, CDK-ECFP0, CDK-ECFP2, CDK-ECFP4, CDK-ECFP6, CDK-FCFP0, CDK-FCFP2, CDK-FCFP4, CDK-FCFP6, CDK-AtomPairs2D, RDMACCS-CDK, and ChemFP-Substruct-CDK.

The supported CDK formats are smi, can, usm, smistring, canstring, usmstring, inchi, inchistring, inchikey, inchikeystring, sdf, and molfile.