chemfp 3.4 is available

I've just released chemfp 3.4, which is the latest version of the commercial chemfp development track. Use the following to download pre-compiled packages for most Linux-based operating systems:

python -m pip install chemfp -i https://chemfp.com/packages/

These are available at no cost under the Chemfp Base License Agreement. This license lets you use most chemfp features in-house, and generate FPS files for any purpose.

Some features are either limited or disabled. The Base License does not permit you to:

  • generate FPB files;
  • create or search in-memory fingerprint arenas with more than 50,000 fingerprints;
  • perform Tversky searches;
  • perform Tanimoto searches of FPS files with more than 20 queries at a time.

These features can be enabled with a time-based license key. See the chemfp licensing page to learn how to request an evaluation key and to learn about the different licensing options.

Notable chemistry toolkit changes

chemfp's RDKit interface added support for the “SECFP” SMILES-based circular fingerprints from the Reymond group, added options to write cxsmiles annotations and to always write v3000 molfiles, and added support for RDKit's Mol2, PDB, Maestro, XYZ, HELM, and FASTA formats.

chemfp's Open Babel interface added support for Open Babel 3.0 and its ECFP fingerprint implementations.

chemfp's OpenEye interface added support for its OEZ, CIF, mmCIF, PDB, FASTA, and CSV parsers, and experimental support for its substructure screens.

Tool changes

Simsearch now accepts a structure as query input, either directly as a command-line argument or read from a file. Previously you first had to convert the structure to a fingerprint using ob2fps, oe2fps, or rdkit2fps.

Added a --help-format option to rdkit2fps, ob2fps, and oe2fps to show the list of available input structure formats.

I/O changes

Chemfp now supports compressed FPB files as input. These are not memory-mapped but are read into memory instead. Compressed FPB files may be useful on network filesystems, where the time saved by transferring less data is much greater than the time spent on decompression.
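As a rough way to reason about when a compressed FPB file pays off, the sketch below compares estimated load times. It is a back-of-the-envelope model, not part of chemfp: the function names and all of the numbers (file size, compression ratio, network bandwidth, decompression throughput) are assumptions chosen for illustration, and it assumes transfer and decompression do not overlap.

```python
def load_time_uncompressed(size_mb, bandwidth_mb_s):
    """Seconds to transfer an uncompressed file."""
    return size_mb / bandwidth_mb_s


def load_time_compressed(size_mb, ratio, bandwidth_mb_s, decompress_mb_s):
    """Seconds to transfer a compressed file and then decompress it.

    ratio is uncompressed size / compressed size; decompress_mb_s is
    measured in MB of uncompressed output per second.
    """
    transfer = (size_mb / ratio) / bandwidth_mb_s
    decompress = size_mb / decompress_mb_s
    return transfer + decompress


# Hypothetical numbers: a 1000 MB FPB file that compresses 4:1 and
# decompresses at 400 MB/s.
slow_plain = load_time_uncompressed(1000, 50)         # 50 MB/s network share
slow_packed = load_time_compressed(1000, 4, 50, 400)
fast_plain = load_time_uncompressed(1000, 2000)       # fast local disk
fast_packed = load_time_compressed(1000, 4, 2000, 400)

print(f"slow network: {slow_plain:.1f}s plain, {slow_packed:.1f}s compressed")
print(f"fast disk:    {fast_plain:.1f}s plain, {fast_packed:.1f}s compressed")
```

With these made-up numbers the compressed file loads faster over the slow network share (7.5 seconds versus 20 seconds) but slower from the fast local disk, where decompression dominates. Measure your own filesystem before deciding.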

The gzip reader is now about 15% faster, and the FPS reader about 20% faster. Overall, sdf2fps is about 10% faster on PubChem files.

If the 'zstandard' package is installed then chemfp will use it to create and read fps.zst and fpb.zst files.
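The "use it if installed" behaviour can be sketched as follows. This is not chemfp's code: `open_text_by_extension` is a hypothetical helper that picks a text-mode opener from the filename extension, and it assumes the third-party `zstandard` package provides `zstandard.open()`, as recent releases of that package do. Requesting a `.zst` file without the package installed raises a clear error instead of failing at import time.

```python
import gzip

try:
    import zstandard  # optional third-party package
except ImportError:
    zstandard = None


def open_text_by_extension(filename, mode="r"):
    """Open a possibly compressed text file based on its extension.

    Hypothetical helper for illustration; mode is "r" or "w".
    """
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    text_mode = mode + "t"
    if filename.endswith(".gz"):
        return gzip.open(filename, text_mode, encoding="utf-8")
    if filename.endswith(".zst"):
        if zstandard is None:
            raise RuntimeError(
                "install the 'zstandard' package to use .zst files")
        return zstandard.open(filename, text_mode, encoding="utf-8")
    # No recognized compression suffix: treat as plain text.
    return open(filename, text_mode, encoding="utf-8")
```

A caller would use it the same way for `example.fps`, `example.fps.gz`, or `example.fps.zst`, for instance `with open_text_by_extension("example.fps.gz") as f:` followed by a loop over the lines.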

For a more comprehensive list of changes, see the chemfp 3.4 documentation.