chemfp heapsweep
The “chemfp heapsweep” command-line tool implements the heapsweep diversity selection algorithm. See Heapsweep diversity selection for an example of selecting diverse fingerprints from a set of references.
This functionality is also available from Python using the high-level chemfp.heapsweep() function, with an example at Select diverse fingerprints with Heapsweep.
The main use case is to find the globally most diverse fingerprint or fingerprints in a dataset. While it can be used to pick additional fingerprints beyond those, I’m not sure the result is scientifically useful.
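To make "globally most diverse" concrete, here is a minimal brute-force sketch. It assumes that the most diverse fingerprint is the one whose nearest neighbor (highest Tanimoto similarity to any other fingerprint) has the lowest score. That reading, the toy integer fingerprints, and the helper names tanimoto and most_diverse are assumptions made for illustration. This is an O(N²) scan, not the heapsweep algorithm itself and not chemfp's API.

```python
from __future__ import annotations


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def most_diverse(fingerprints: dict[str, int]) -> tuple[str, float]:
    """Return (id, score) for the fingerprint whose nearest neighbor
    is least similar. Needs at least two fingerprints."""
    best_id, best_score = None, None
    for id_a, fp_a in fingerprints.items():
        # Similarity to the nearest neighbor of fp_a.
        nearest = max(
            tanimoto(fp_a, fp_b)
            for id_b, fp_b in fingerprints.items()
            if id_b != id_a
        )
        # Keep the fingerprint with the smallest nearest-neighbor score.
        if best_score is None or nearest < best_score:
            best_id, best_score = id_a, nearest
    return best_id, best_score


# Hypothetical 8-bit fingerprints: a, b and c overlap heavily, d does not.
fingerprints = {
    "a": 0b11110000,
    "b": 0b11100000,
    "c": 0b01110000,
    "d": 0b00011101,
}
pick_id, pick_score = most_diverse(fingerprints)
print(pick_id, f"{pick_score:.3f}")  # d 0.167
```

Here "d" is picked because its closest neighbor ("c") shares only 1 of 6 set bits, while "a", "b" and "c" each have a neighbor at 0.75.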
The rest of this chapter contains the output from chemfp heapsweep --help.
chemfp heapsweep command-line options
The following comes from chemfp heapsweep --help:
Usage: chemfp heapsweep [OPTIONS] CANDIDATES

  Diversity selection using the heapsweep algorithm.

Options:
  -t, --threshold FLOAT           Maximum similarity (default: 1.0)
  -n, --num-picks N               Number of picks (default: 'all')
  --all-equal                     Continue picking past --num-picks if the
                                  pick score is unchanged
  --in, --candidates-format TEXT  Format of the candidates file (default uses
                                  filename extension, or 'fps')
  --randomize / --no-randomize    Use --randomize (the default) to shuffle the
                                  candidates before starting MaxMin
  --seed N                        Specify the random number generator seed
                                  between 0 and 2**64-1, inclusive, or use -1
                                  to have one picked at random (default: -1)
  --mmap / --no-mmap              Don't use mmap to read uncompressed FPB
                                  files. May give better performance on
                                  networked file systems, at the expense of
                                  higher memory use.
  --neighbors FILENAME            For each pick, includes the nearest neighbor
                                  and score from fingerprints in FILENAME
  --neighbors-format FORMAT       Format of the neighbors file (default uses
                                  filename extension, or 'fps')
  --save-picks PATH               Write picked fingerprints to the named file.
  --save-picks-format PATH        Specify the format for the picked
                                  fingerprints.
  --save-candidates PATH          Write remaining candidate fingerprints to
                                  the named file.
  --save-candidates-format FORMAT
                                  Specify the format for the remaining
                                  candidate fingerprints.
  --precision [1|2|3|4|5|6|7|8|9|10]
                                  Number of digits in Tanimoto score (default:
                                  based on the fingerprint size)
  -o, --output PATH               Write output to the named file instead of
                                  stdout.
  --out TEXT                      Output format. Must be one of 'diversity'
                                  (the default), 'csv', or 'tsv' with optional
                                  compression
  --pick-time / --no-pick-time    Include the elapsed time for each pick
  --no-date                       Do not include the 'date' metadata in the
                                  output header
  --date STR                      An ISO 8601 date (like
                                  '2025-02-07T11:10:15') to use for the 'date'
                                  metadata in the output header
  --times / --no-times            Write timing information to stderr
  --progress / --no-progress      Show a progress bar (default: show unless
                                  the output is a terminal)
  --help                          Show this message and exit.
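
As an illustration of how the options above combine, the following sketch assembles a command line intended to be reproducible from run to run: a fixed --seed so the shuffle is repeatable, and a fixed --date so the output header does not change. Only flags listed in the help text are used. The helper name build_heapsweep_command and the filenames candidates.fps and picks.txt are hypothetical, and the sketch only builds and prints the command; it does not run chemfp.

```python
import shlex
from datetime import datetime


def build_heapsweep_command(candidates, num_picks=1, seed=42,
                            output=None, date=None):
    """Build an argument list for 'chemfp heapsweep' (hypothetical helper)."""
    # The help text allows seeds from 0 to 2**64-1 inclusive, or -1 for random.
    if seed != -1 and not (0 <= seed <= 2**64 - 1):
        raise ValueError("seed must be -1 or between 0 and 2**64-1")
    argv = ["chemfp", "heapsweep",
            "--num-picks", str(num_picks),
            "--seed", str(seed)]
    if date is not None:
        # ISO 8601 with second resolution, like '2025-02-07T11:10:15'.
        argv += ["--date", date.isoformat(timespec="seconds")]
    if output is not None:
        argv += ["--output", output]
    argv.append(candidates)  # the CANDIDATES positional argument
    return argv


argv = build_heapsweep_command(
    "candidates.fps",            # hypothetical input file
    num_picks=1,
    seed=42,
    output="picks.txt",          # hypothetical output file
    date=datetime(2025, 2, 7, 11, 10, 15),
)
command = shlex.join(argv)
print(command)
```

The resulting list could be passed to subprocess.run() on a system where chemfp is installed. Using a list rather than a single string avoids shell quoting problems with unusual filenames.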