Fast cheminformatics fingerprint search, anywhere you use Python
Chemfp is an analytics platform for cheminformatics fingerprints. It contains command-line tools and an extensive Python library for fingerprint generation, high-performance similarity search, diversity selection, and exploratory research.
Its market-leading performance and comprehensive API make it easy for you to add fast similarity search anywhere you use Python.
NEW! Chemfp 4.1 was released on 17 May 2023. See the documentation for the full list of notable changes or go to the download page.
Why chemfp?
- Do you want single-threaded search of 1M 1024-bit fingerprints in under 10 milliseconds?
- Do you want to make a sparse similarity matrix from 1M 2048-bit fingerprints in less than 30 minutes on a four-core machine?
- Do you want to include fingerprint similarity results in your Python web application?
- ... with fast reload times during development, and without the complexity of using a dedicated search server?
- Do you want fast Butina clustering? Or MaxMin or sphere exclusion diversity selection?
- Do you work with fingerprints from multiple chemistry toolkits, or have custom fingerprint types?
- Do you want command-line tools with sub-second similarity search times?
- Do you program in Python and want to write new fingerprint analysis programs?
- Do you want the option to have the source code with no time-based licensing?
If that sounds interesting
You can get started by downloading the pre-compiled Linux version of chemfp using the following:
python -m pip install chemfp -i https://chemfp.com/packages/
A few features are either limited or disabled. Visit the licensing page to see the licensing terms, to request an evaluation key to unlock those features, and to learn about some of the available licensing options.
You do not need to request a license key for Tanimoto searches of the licensed FPB files available from the datasets page, so long as you follow the terms of the Chemfp Base License Agreement.
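For readers new to the term, a Tanimoto search scores each target fingerprint by the number of bits it shares with the query, divided by the number of bits set in either. The pure-Python sketch below only illustrates that definition. It is not chemfp's implementation or API; the function names `tanimoto` and `threshold_search` are hypothetical, and fingerprints are assumed to be equal-length byte strings.

```python
from typing import Iterable, List, Tuple


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity of two equal-length byte fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        # Convention assumed here: two empty fingerprints score 0.0.
        return 0.0
    return bin(a & b).count("1") / union


def threshold_search(
    query: bytes,
    targets: Iterable[Tuple[str, bytes]],
    threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    """Return (id, score) pairs scoring at least `threshold`, best first."""
    hits = []
    for target_id, fp in targets:
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((target_id, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```

A linear scan like this is fine for a few thousand fingerprints. The timings quoted in the list above depend on optimized native code, which this sketch does not attempt to reproduce.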
More information
Chemfp includes extensive documentation. For a more scholarly description, see: Dalke, A. The chemfp project. J. Cheminformatics 11, 76 (2019). doi: 10.1186/s13321-019-0398-8
Open source reference baseline for benchmarking
Chemfp 1.6.1 is the latest version of the no-cost/open source chemfp development track. It only supports Python 2.7. It is being maintained in order to provide a good reference baseline to evaluate similarity search performance, and to support the dwindling number of legacy users who haven't moved to Python 3. See the download page for download details.
Some of the many improvements in chemfp 4.1 are: Butina clustering, reading and writing SciPy compressed sparse matrix npz files, CXSMILES support, CSV format readers, parallelized sphere exclusion along with new ranking methods, and an improved API for molecule structure and file processing.
Some of the improvements in chemfp 4.0 were MaxMin and sphere exclusion diversity selection, an improved API for notebook use, pandas integration, and support for CSV/TSV output.
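To give a sense of what MaxMin diversity selection does, here is a minimal greedy sketch: starting from one seed fingerprint, it repeatedly picks the candidate whose nearest already-picked neighbor is farthest away, using 1 - Tanimoto as the distance. This is a textbook-style illustration under those assumptions, not chemfp's algorithm or API; `maxmin_select` and its parameters are hypothetical names.

```python
from typing import List, Sequence


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity of two equal-length byte fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def maxmin_select(
    fingerprints: Sequence[bytes], num_picks: int, first: int = 0
) -> List[int]:
    """Greedy MaxMin: return indices of up to `num_picks` diverse fingerprints."""
    n = len(fingerprints)
    if n == 0 or num_picks <= 0:
        return []
    picks = [first]
    picked = {first}
    # Distance from each fingerprint to its nearest picked fingerprint.
    min_dist = [1.0 - tanimoto(fp, fingerprints[first]) for fp in fingerprints]
    while len(picks) < min(num_picks, n):
        # Choose the unpicked candidate farthest from everything picked so far.
        best = max((i for i in range(n) if i not in picked), key=min_dist.__getitem__)
        picks.append(best)
        picked.add(best)
        for i in range(n):
            dist = 1.0 - tanimoto(fingerprints[i], fingerprints[best])
            if dist < min_dist[i]:
                min_dist[i] = dist
    return picks
```

Each pick costs one pass over the dataset, so the sketch is O(n × picks) Tanimoto evaluations. It is meant to show the selection rule, not to be fast.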