fpcat

The “fpcat” command-line tool (also available as “chemfp fpcat”) concatenates one or more fingerprint files. It is often used to convert between FPS and FPB formats.

There is no equivalent high-level function in the Python API. Instead, use chemfp.open() and chemfp.open_fingerprint_writer(), like:

import chemfp

with chemfp.open("input.fps") as reader:
  with chemfp.open_fingerprint_writer("output.fpb", metadata=reader.metadata) as writer:
    writer.write_fingerprints(reader)
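For plain FPS-to-FPS concatenation, the FPS layout (an '#FPS1' signature line, other '#'-prefixed header lines, then one record per line: a hex-encoded fingerprint, a tab, and an identifier) is simple enough to sketch with only the standard library. This is a minimal illustration, not how chemfp implements fpcat. The helper names `read_fps_records` and `concat_fps` are hypothetical, the sketch writes only the '#FPS1' line as its header, keeps just the first two tab-separated fields of each record, and does not check that the inputs have compatible metadata such as the number of bits.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def read_fps_records(infile: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (hex_fingerprint, identifier) pairs, skipping '#' header lines."""
    for line in infile:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # Keep only the first two tab-separated fields: fingerprint and id.
        hex_fp, fp_id = line.split("\t")[:2]
        yield hex_fp, fp_id


def concat_fps(infiles: Iterable[TextIO], outfile: TextIO) -> int:
    """Copy every record from every input, in input order; return the count.

    Hypothetical helper: writes only the '#FPS1' signature line as its
    header and does not compare the inputs' metadata.
    """
    outfile.write("#FPS1\n")
    count = 0
    for infile in infiles:
        for hex_fp, fp_id in read_fps_records(infile):
            outfile.write(f"{hex_fp}\t{fp_id}\n")
            count += 1
    return count


# Example using in-memory "files" in place of real FPS files.
part1 = io.StringIO("#FPS1\n#num_bits=8\n0f\tmol1\n")
part2 = io.StringIO("#FPS1\n#num_bits=8\nff\tmol2\n")
out = io.StringIO()
n = concat_fps([part1, part2], out)
print(out.getvalue(), end="")
```

Because the helpers take file objects, the same code works with open() on real files, or with gzip.open(filename, "rt") for gzip-compressed FPS input.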

The rest of this chapter contains the output from fpcat --help.

fpcat command-line options

The following comes from fpcat --help:

Usage: fpcat [OPTIONS] FILENAME

  Combine multiple fingerprint files into a single file.

Options:
  --in FORMAT                     Input fingerprint format. One of fps or fpb
                                  (with optional gz or zst compression), or
                                  flush. (default guesses from filename or is
                                  fps)
  --merge                         Assume the input fingerprint files are in
                                  popcount order and do a merge sort.
  -o, --output FILENAME           Save the fingerprints to FILENAME
                                  (default=stdout)
  --out FORMAT                    Output format, one of 'fps', 'fps.gz',
                                  'fps.zst', 'fpb', or 'flush' (default
                                  guesses from output filename, or is 'fps')
  --include-metadata / --no-metadata
                                  With --no-metadata, do not include the
                                  header metadata for FPS output.
  --no-date                       Do not include the 'date' metadata in the
                                  output header
  --date STR                      An ISO 8601 date (like
                                  '2025-02-07T11:10:15') to use for the 'date'
                                  metadata in the output header
  --level LEVEL                   Compression level. Must be a positive
                                  integer or one of 'min', 'default', or
                                  'max'.
  --reorder                       Reorder the output fingerprints by popcount.
                                  (default for FPB output)
  --preserve-order                Save the output fingerprints in the same
                                  order as the input. (default for FPS output)
  --alignment [1|2|4|8|16|32|64|128|256]
                                  Alignment size when saving an FPB file.
                                  (default=8)
  --show-progress                 Show progress.
  --max-spool-size SIZE           Use temporary files for extra storage space
                                  for huge FPB files (default uses RAM).
  --tmpdir DIRNAME                Directory for the temporary files (default
                                  uses the system temp directory).
  --version                       Show the version and exit.
  --license-check                 Check the license and report results to
                                  stdout.
  --license-file FILENAME         Specify a chemfp license file
  --traceback                     Print the traceback on KeyboardInterrupt
  --help                          Show this message and exit.

  Examples:

  fpcat can be used to convert between FPS and FPB formats. This is handy if
  you want to see what's inside of an FPB file:

      fpcat fingerprints.fpb

  You can also use fpcat to make an FPB file from an FPS file:

      fpcat fingerprints.fps -o fingerprints.fpb

  You might have generated a set of FPS files which you want to merge into a
  single FPB file. For example, you might have used GNU parallel to generate
  one FPS file for each of the PubChem files:

      fpcat Compound_*.fps -o pubchem.fpb

  By default, FPB output is sorted by popcount. (Use --preserve-order if you
  really want to preserve the input order.) The sort overhead for PubChem is
  about 10 GB of RAM. If you don't have that much memory, ask fpcat to use
  less:

      fpcat --max-spool-size 1GB Compound_*.fps -o pubchem.fpb

  This uses about 2 GB of RAM, and temporary files in the --tmpdir directory
  for the rest. (Yes, it would be nice if I could get those two memory size
  numbers to match.)

  The --merge option is experimental. Use it if each input fingerprint file is
  already in popcount order; the sorted output is then a simple merge of the
  individually sorted inputs. However, this option opens all of the input
  files at the same time, which may exceed your resource limit on file
  descriptors. The current implementation also requires many disk seeks, so
  it is slow when there are many input files.

  The flush format is only available if the chemfp_converter package is
  installed.
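The help text says FPB output is sorted by popcount by default, and that --max-spool-size trades RAM for temporary files in --tmpdir. The general technique behind that trade-off is an external sort: sort fixed-size chunks in memory, spool each sorted run to a temporary file, then merge the runs. The following is a minimal, generic sketch of that idea, not chemfp's implementation. The function names are hypothetical, and the sketch limits its sort buffer by a record count rather than by a byte size such as 1GB.

```python
import heapq
import tempfile
from typing import IO, Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (hex-encoded fingerprint, identifier)


def popcount(hex_fp: str) -> int:
    """Count the set bits in a hex-encoded fingerprint."""
    return bin(int(hex_fp, 16)).count("1")


def _popcount_key(record: Record) -> int:
    return popcount(record[0])


def _spool_run(sorted_chunk: List[Record], tmpdir: Optional[str]) -> IO[str]:
    """Write one sorted run to an anonymous temporary file and rewind it."""
    f = tempfile.TemporaryFile(mode="w+", dir=tmpdir)
    for hex_fp, fp_id in sorted_chunk:
        f.write(f"{hex_fp}\t{fp_id}\n")
    f.seek(0)
    return f


def _read_run(f: IO[str]) -> Iterator[Record]:
    """Read back the records of one spooled run, in their sorted order."""
    for line in f:
        hex_fp, fp_id = line.rstrip("\n").split("\t", 1)
        yield hex_fp, fp_id


def external_popcount_sort(records: Iterable[Record], max_in_memory: int,
                           tmpdir: Optional[str] = None) -> Iterator[Record]:
    """Yield records in popcount order, buffering at most max_in_memory
    records before spooling a sorted run to a temporary file.

    Ties keep their input order: sorted() is stable, and heapq.merge breaks
    ties in favor of earlier runs.
    """
    if max_in_memory < 1:
        raise ValueError("max_in_memory must be at least 1")
    run_files: List[IO[str]] = []
    chunk: List[Record] = []
    try:
        for record in records:
            chunk.append(record)
            if len(chunk) >= max_in_memory:
                run_files.append(_spool_run(sorted(chunk, key=_popcount_key), tmpdir))
                chunk = []
        chunk.sort(key=_popcount_key)  # the final, partial run stays in memory
        runs = [_read_run(f) for f in run_files] + [iter(chunk)]
        # The merge keeps only one pending record per run in memory.
        yield from heapq.merge(*runs, key=_popcount_key)
    finally:
        for f in run_files:
            f.close()


data = [("ff", "a"), ("01", "b"), ("0f", "c"), ("03", "d"), ("00", "e")]
result = list(external_popcount_sort(data, max_in_memory=2))
print(result)
# [('00', 'e'), ('01', 'b'), ('03', 'd'), ('0f', 'c'), ('ff', 'a')]
```

Every spooled run holds one open temporary file during the final merge, which is the same file-descriptor consideration the help text raises for --merge.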
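The --merge option relies on each input already being in popcount order, so the output can come from a streaming k-way merge instead of a full sort. A minimal sketch of that idea using the standard library's heapq.merge follows. It is not chemfp's implementation, the helper names are hypothetical, and in-memory io.StringIO objects stand in for FPS files. Note that heapq.merge does not check that its inputs are sorted, so an unsorted input silently produces out-of-order output.

```python
import heapq
import io
from typing import Iterator, List, TextIO, Tuple


def popcount(hex_fp: str) -> int:
    """Count the set bits in a hex-encoded fingerprint."""
    return bin(int(hex_fp, 16)).count("1")


def read_fps_records(infile: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (hex_fingerprint, identifier) pairs, skipping '#' header lines."""
    for line in infile:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):
            hex_fp, fp_id = line.split("\t")[:2]
            yield hex_fp, fp_id


def merge_popcount_sorted(infiles: List[TextIO]) -> Iterator[Tuple[str, str]]:
    """Lazily merge inputs that are each already sorted by popcount.

    One reader per input stays active for the whole merge, which is why
    merging many real files at once can hit the file-descriptor limit.
    """
    readers = [read_fps_records(f) for f in infiles]
    return heapq.merge(*readers, key=lambda record: popcount(record[0]))


part_a = io.StringIO("#FPS1\n00\tA0\n03\tA2\nff\tA8\n")
part_b = io.StringIO("#FPS1\n01\tB1\n0f\tB4\n")
merged = list(merge_popcount_sorted([part_a, part_b]))
print([fp_id for _, fp_id in merged])  # ['A0', 'B1', 'A2', 'B4', 'A8']
```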