fpcat
The “fpcat” command-line tool (also available as “chemfp fpcat”) concatenates one or more fingerprint files. It is often used to convert between FPS and FPB formats.
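There is no equivalent high-level function in the Python API. Instead, use
chemfp.open() and chemfp.open_fingerprint_writer(), like:

    import chemfp

    # Copy the input's metadata (fingerprint type, number of bits, etc.) to the output.
    with chemfp.open("input.fps") as reader:
        with chemfp.open_fingerprint_writer("output.fpb", metadata=reader.metadata) as writer:
            writer.write_fingerprints(reader)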
The rest of this chapter contains the output from fpcat --help.
fpcat command-line options
The following comes from fpcat --help:
Usage: fpcat [OPTIONS] FILENAME

  Combine multiple fingerprint files into a single file.

Options:
  --in FORMAT                     Input fingerprint format. One of fps or fpb
                                  (with optional gz or zst compression), or
                                  flush. (default guesses from filename or is
                                  fps)
  --merge                         Assume the input fingerprint files are in
                                  popcount order and do a merge sort.
  -o, --output FILENAME           Save the fingerprints to FILENAME
                                  (default=stdout)
  --out FORMAT                    Output format, one of 'fps', 'fps.gz',
                                  'fps.zst', 'fpb', or 'flush' (default
                                  guesses from output filename, or is 'fps')
  --include-metadata / --no-metadata
                                  With --no-metadata, do not include the
                                  header metadata for FPS output.
  --no-date                       Do not include the 'date' metadata in the
                                  output header
  --date STR                      An ISO 8601 date (like
                                  '2025-02-07T11:10:15') to use for the 'date'
                                  metadata in the output header
  --level LEVEL                   Compression level. Must be a positive
                                  integer or one of 'min', 'default', or
                                  'max'.
  --reorder                       Reorder the output fingerprints by popcount.
                                  (default for FPB output)
  --preserve-order                Save the output fingerprints in the same
                                  order as the input. (default for FPS output)
  --alignment [1|2|4|8|16|32|64|128|256]
                                  Alignment size when saving a FPB file.
                                  (default=8)
  --show-progress                 Show progress.
  --max-spool-size SIZE           Use temporary files for extra storage space
                                  for huge FPB files (default uses RAM).
  --tmpdir DIRNAME                Directory for the temporary files (default
                                  uses the system temp directory).
  --version                       Show the version and exit.
  --license-check                 Check the license and report results to
                                  stdout.
  --license-file FILENAME         Specify a chemfp license file
  --traceback                     Print the traceback on KeyboardInterrupt
  --version                       Show the version and exit.
  --help                          Show this message and exit.
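Both --in and --out guess the format from the filename when it is not given, and fall back to FPS. Below is a hypothetical Python sketch of that kind of suffix-based guessing. The function name guess_format and its suffix table are assumptions made for illustration. This is not chemfp's actual format-detection code, and the sketch ignores the flush format.

```python
# Hypothetical sketch: guess_format() and SUFFIX_FORMATS are illustrative
# assumptions, not chemfp's real format-detection code.

SUFFIX_FORMATS = [
    (".fps.gz", "fps.gz"),
    (".fps.zst", "fps.zst"),
    (".fpb", "fpb"),
    (".fps", "fps"),
]

def guess_format(filename, default="fps"):
    """Return a format name based on the filename suffix, else `default`."""
    if filename is None:  # no filename given, e.g. reading stdin or writing stdout
        return default
    name = filename.lower()  # treat "DATA.FPB" the same as "data.fpb"
    for suffix, fmt in SUFFIX_FORMATS:
        if name.endswith(suffix):
            return fmt
    return default  # unknown suffix: fall back to FPS, as the help text describes

for fn in ["fingerprints.fpb", "Compound_001.fps.gz", "out.fps.zst", "data.txt", None]:
    print(fn, "->", guess_format(fn))
```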
Examples:

fpcat can be used to convert between FPS and FPB formats. This is handy if
you want to see what's inside of an FPB file:

    fpcat fingerprints.fpb

You can also use fpcat to make an FPB file from an FPS file:

    fpcat fingerprints.fps -o fingerprints.fpb
You might have generated a set of FPS files which you want to merge into a
single FPB file. For example, you might have used GNU parallel to generate an
FPS file for each of the PubChem files. To merge them:

    fpcat Compound_*.fps -o pubchem.fpb
By default fpcat sorts FPB output by popcount. (Use --preserve-order if you
really want to preserve the input order.) The sort overhead for PubChem uses
about 10 GB of RAM. If you don't have that much memory, ask fpcat to use
less:

    fpcat --max-spool-size 1GB Compound_*.fps -o pubchem.fpb

This will use about 2 GB of RAM, and the --tmpdir directory for the rest.
(Yes, it would be nice if I could get those two memory size numbers to
match.)
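The following is a minimal stand-alone sketch of popcount ordering. It sorts (id, fingerprint) pairs by the number of set bits. It assumes fingerprints are plain Python bytes objects and uses an in-memory sort. It does not show how chemfp implements --reorder, and it does not model --max-spool-size spooling to disk.

```python
# Illustrative sketch only: fingerprints are assumed to be plain bytes,
# and everything is sorted in memory (no spooling to temporary files).

def popcount(fp: bytes) -> int:
    """Return the number of 1 bits in the fingerprint."""
    return bin(int.from_bytes(fp, "big")).count("1")

def reorder_by_popcount(pairs):
    """Return the (id, fp) pairs sorted by increasing popcount."""
    # sorted() is stable, so pairs with equal popcount keep their input order.
    # That tie-breaking rule is this sketch's choice, not a documented chemfp behavior.
    return sorted(pairs, key=lambda pair: popcount(pair[1]))

pairs = [("a", b"\xff\x00"), ("b", b"\x01\x00"), ("c", b"\x03\x00"), ("d", b"\x00\x80")]
for id_, fp in reorder_by_popcount(pairs):
    print(id_, popcount(fp))
```

Because the sort key depends only on the fingerprint, the identifiers travel with their fingerprints unchanged.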
The --merge option is experimental. Use it if the input fingerprints are
already in popcount order, because the sorted output is then a simple merge
sort of the individually sorted inputs. However, this option opens all input
files at the same time, which may exceed your resource limit on file
descriptors. The current implementation also requires a lot of disk seeks, so
it is slow for many files.
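The merge idea behind --merge can be sketched with the standard library's heapq.merge, which lazily combines already-sorted iterables. This is an illustration under assumptions: in-memory lists stand in for the input files, and fingerprints are bytes. It is not chemfp's implementation. The sketch also shows why every input must be open at once. To decide which record comes out next, the merge needs the next record from each input.

```python
import heapq

def popcount(fp: bytes) -> int:
    """Return the number of 1 bits in the fingerprint."""
    return bin(int.from_bytes(fp, "big")).count("1")

def merge_by_popcount(*sorted_inputs):
    """Lazily merge (id, fp) iterables that are each already in popcount order."""
    # heapq.merge holds one pending record per input, so every input stays
    # "open" until it is exhausted. This mirrors the file-descriptor caveat above.
    return heapq.merge(*sorted_inputs, key=lambda pair: popcount(pair[1]))

# Two hypothetical inputs, each already sorted by popcount.
file1 = [("a", b"\x01"), ("b", b"\x07"), ("c", b"\xff")]   # popcounts 1, 3, 8
file2 = [("x", b"\x00"), ("y", b"\x03"), ("z", b"\x0f")]   # popcounts 0, 2, 4
print([id_ for id_, fp in merge_by_popcount(file1, file2)])
```

If any input is not actually in popcount order, the merged output will not be fully sorted either. That is why the option is only suitable for pre-sorted inputs.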
The flush format is only available if the chemfp_converter package was
installed.