Chemfp 1.5 is now available for download or through PyPI as "pip install chemfp".
The k-nearest and count searches are about 10% faster. They now use a fast integer test to reject obvious mismatches before doing the computationally expensive division operation. This optimization was already available in threshold search.
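As a rough illustration of the kind of integer test described above, the sketch below checks a Tanimoto score against a threshold by cross-multiplying instead of dividing. It is not chemfp's actual code: the function names, the representation of the threshold as a numerator/denominator pair, and the use of Python integers as fingerprints are all assumptions made for the example.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def passes_threshold(query, target, thr_num, thr_den):
    """Return True if Tanimoto(query, target) >= thr_num / thr_den.

    The score is c / u, where c is the intersection popcount and u is the
    union popcount. Since u and thr_den are positive, c / u >= thr_num / thr_den
    is equivalent to c * thr_den >= thr_num * u, which needs no division.
    """
    c = popcount(query & target)
    u = popcount(query | target)
    if u == 0:
        # Assumption for this sketch: two empty fingerprints score 0.
        return False
    return c * thr_den >= thr_num * u


def count_hits(query, targets, thr_num, thr_den):
    """Count the targets at or above the threshold, using only integer math."""
    return sum(1 for t in targets if passes_threshold(query, t, thr_num, thr_den))
```

With this approach the division is only needed for candidates that survive the test, for example when the actual score has to be reported.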
Similarity search using un-indexed fingerprints (e.g., when reorder=False is used to keep the fingerprints in input order) is about 5x faster. The new code uses the optimized popcount functions rather than the generic and slow Tanimoto calculation.
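To make the idea concrete, here is a minimal sketch of a linear scan over fingerprints kept in input order, with the Tanimoto score computed from popcounts of the intersection and union. It only illustrates the technique and is not chemfp's API or implementation: the function names and the use of byte strings converted to Python integers are assumptions, and chemfp's optimized popcount functions are implemented in C.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte-string fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        # Assumption for this sketch: two empty fingerprints score 0.
        return 0.0
    return bin(a & b).count("1") / union


def threshold_search(query, targets, threshold):
    """Scan targets in input order; return (index, score) pairs at or above threshold."""
    hits = []
    for index, fp in enumerate(targets):
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((index, score))
    return hits
```

Because the targets are not sorted by popcount, every fingerprint has to be visited, so the cost of each individual comparison dominates the search time.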
The "--times" option now distinguishes between search time and output time. This change was made to make chemfp 1.5 a better baseline for fingerprint similarity benchmarks.
A serious bug in the symmetric k-nearest search has been fixed. If there were multiple fingerprints with no bits set then all of the hits were merged together. This caused the code to crash in multi-threaded clustering when multiple threads tried to reallocate the same data structure. In practice, it seems that this was only a problem when using k-nearest neighbors to cluster PubChem fingerprints.
For more information, see the What's New section of the documentation.