chemfp rdkit2fpc

The “chemfp rdkit2fpc” command-line tool uses RDKit to generate sparse count fingerprints from the input structures. It accepts the same input formats and fingerprint type command-line arguments as rdkit2fps.

The output is in FPC format.

The rest of this chapter contains the output from chemfp rdkit2fpc --help. Use rdkit2fps --help-formats for details about the supported input formats.
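
As a rough illustration of how the pieces described above fit together, the following Python sketch assembles a "chemfp rdkit2fpc" command line from a fingerprint type flag, an optional output file, and one or more input files. The helper names (rdkit2fpc_command, run_rdkit2fpc) and the file names are hypothetical and exist only for this example; the flags themselves are taken from the --help output reproduced later in this chapter. The ".fpc" output extension is an assumption; if the extension is not recognized, the help text says the output format defaults to 'fpc' anyway.

```python
import shlex
import subprocess


def rdkit2fpc_command(filenames, fp_flag="--morgan2", output=None):
    """Build an argv list for 'chemfp rdkit2fpc' (hypothetical helper).

    fp_flag is one of the fingerprint type flags listed in the --help
    output, such as "--RDK", "--morgan2", "--torsion", or "--pair".
    """
    cmd = ["chemfp", "rdkit2fpc", fp_flag]
    if output is not None:
        # -o/--output saves the fingerprints to a file instead of stdout.
        cmd += ["-o", output]
    # With no filenames the tool reads structures from stdin.
    cmd += list(filenames)
    return cmd


def run_rdkit2fpc(filenames, **kwargs):
    """Run the command. Requires chemfp and RDKit to be installed."""
    cmd = rdkit2fpc_command(filenames, **kwargs)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


# Hypothetical file names, for illustration only.
example = rdkit2fpc_command(["compounds.smi"], output="compounds.fpc")
print(shlex.join(example))
```

The sketch only prints the assembled command; calling run_rdkit2fpc() would actually execute it. Building the argument list separately from running it makes it easy to check which flags will be passed before any structures are processed.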

chemfp rdkit2fpc command-line options

The following comes from chemfp rdkit2fpc --help:

Usage: chemfp rdkit2fpc [OPTIONS] [FILENAMES]...

  Generate count fingerprints from a structure file using RDKit.

  If specified, process the filenames, otherwise read from stdin.

Options:
  --RDK, --RDK/3                  Generate RDK/3 count fingerprints (default).
  --morgan1                       Generate MorganCount/2 fingerprints
                                  (radius=1).
  --morgan2                       Generate MorganCount/2 count fingerprints
                                  (radius=2).
  --morgan, --morgan3             Generate MorganCount/2 fingerprints
                                  (radius=3).
  --morgan4                       Generate MorganCount/2 fingerprints
                                  (radius=4).
  --torsion, --torsions, --torsion/4
                                  Generate Topological TorsionCount/4
                                  fingerprints.
  --pair, --pairs                 Generate AtomPairCount/3 fingerprints.
  --type TYPE_STR                 Specify a chemfp type string
  --using FILENAME                Get the fingerprint type from the metadata
                                  of a fingerprint file
  --minPath INT                   Minimum number of bonds to include in the
                                  subgraph (default=1) [RDKitCount/3]
  --maxPath INT                   Maximum number of bonds to include in the
                                  subgraph (default=7) [RDKitCount/3]
  --numBitsPerFeature INT         Number of bits to set per path (default=2)
                                  [RDKitCount/3]
  --useHs 0|1                     Include information about the number of
                                  hydrogens on each atom (default=1)
                                  [RDKitCount/3]
  --branchedPaths 0|1             If 1, both branched and unbranched paths
                                  will be used in the fingerprint (default=1)
                                  [RDKitCount/3]
  --useBondOrder 0|1              If 1, both bond orders will be used in the
                                  path hashes (default=1) [RDKitCount/3]
  --radius INT                    Radius for the Morgan algorithm (default=3)
                                  [MorganCount/2]
  --useFeatures 0|1               if 1, use chemical-feature invariants
                                  (default=0) [MorganCount/2]
  --includeChirality 0|1          include chirality information
                                  [AtomPairCount/3, MorganCount/2,
                                  TorsionCount/4]
  --useBondTypes 0|1              if 1, include bond type information
                                  (default=1) [MorganCount/2]
  --includeRingMembership 0|1     if 1, include ring membership in the atom
                                  invariants (default=1) [MorganCount/2]
  --includeRedundantEnvironments 0|1
                                  if 1, include redundant environments in the
                                  fingerprint (default=0) [MorganCount/2]
  --minDistance INT               minimum bond distance for two atoms to be
                                  considered a pair (default=1)
                                  [AtomPairCount/3]
  --maxDistance INT               maximum bond distance for two atoms to be
                                  considered a pair (default=30)
                                  [AtomPairCount/3]
  --use2D 0|1                     If 1, use 2D instead of 3D distance matrix
                                  (default=1) [AtomPairCount/3]
  --torsionAtomCount INT          the number of atoms to include in the
                                  'torsions' (default=4) [TorsionCount/4]
  --onlyShortestPaths 0|1         if 1, only include the shortest paths
                                  between the start and end atoms, not all
                                  paths (default=0) [TorsionCount/4]
  --id-tag TAG                    Tag name containing the record id (SD files
                                  only)
  --delimiter VALUE               Delimiter style for SMILES and InChI files.
                                  Forces '-R delimiter=VALUE'.
  --has-header                    Skip the first line of a SMILES or InChI
                                  file. Forces '-R has_header=1'.
  -R NAME=VALUE                   Specify a reader argument
  --cxsmiles / --no-cxsmiles      Use --no-cxsmiles to disable the default
                                  support for CXSMILES extensions. Forces '-R
                                  cxsmiles=1' or '-R cxsmiles=0'.
  --in FORMAT                     Input structure format (default guesses from
                                  filename)
  -o, --output FILENAME           Save the fingerprints to FILENAME
                                  (default=stdout)
  --out FORMAT                    Output structure format (default guesses
                                  from output filename, or is 'fpc')
  --include-metadata / --no-metadata
                                  With --no-metadata, do not include the
                                  header metadata for FPC output.
  --no-date                       Do not include the 'date' metadata in the
                                  output header
  --date STR                      An ISO 8601 date (like
                                  '2025-02-07T11:10:15') to use for the 'date'
                                  metadata in the output header
  --errors [strict|report|ignore]
                                  How should structure parse errors be
                                  handled? (default=ignore)
  --progress / --no-progress      Show a progress bar (default: show unless
                                  the output is a terminal)
  --help-formats                  List the available formats and reader
                                  arguments
  --help                          Show this message and exit.

  This program guesses the input structure format and the compression based on
  the filename extension. If the guess fails then it assumes the input is an
  uncompressed SMILES file.

  If the data comes from stdin, or the guess based on extension name is wrong,
  then use "--in" to change the default input format.

  Use the '-R' reader arguments option to pass in format-specific structure
  reader arguments. The details depend on the specific format.

  Use the command-line option `--help-formats` to display a list of available
  formats and reader arguments.
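
The notes above on stdin input, "--in", and "-R" can be sketched in Python as follows. This is a minimal example under stated assumptions, not a definitive recipe: the helper names (stdin_command, fingerprints_from_text) are hypothetical, the format name "smi" is assumed to be accepted by "--in" (check --help-formats for the actual list), and the reader argument names has_header and cxsmiles are the ones mentioned in the option descriptions above.

```python
import subprocess

# The three choices listed for --errors in the help output.
ERROR_MODES = ("strict", "report", "ignore")


def stdin_command(in_format, reader_args=None, errors="ignore", include_date=True):
    """Build an argv list for reading structures from stdin (hypothetical helper)."""
    if errors not in ERROR_MODES:
        raise ValueError(f"errors must be one of {ERROR_MODES}, not {errors!r}")
    # stdin has no filename extension to guess from, so name the format with --in.
    cmd = ["chemfp", "rdkit2fpc", "--in", in_format, "--errors", errors]
    # Each reader argument is passed as its own '-R NAME=VALUE' pair.
    for name, value in (reader_args or {}).items():
        cmd += ["-R", f"{name}={value}"]
    if not include_date:
        # --no-date omits the 'date' metadata from the output header.
        cmd.append("--no-date")
    return cmd


def fingerprints_from_text(structure_text, in_format="smi", **kwargs):
    """Pipe structure records to the tool and return its stdout.

    Requires chemfp and RDKit to be installed.
    """
    cmd = stdin_command(in_format, **kwargs)
    result = subprocess.run(
        cmd, input=structure_text, check=True, capture_output=True, text=True
    )
    return result.stdout


# Skip a header line, disable CXSMILES parsing, and stop on the first bad record.
example_cmd = stdin_command(
    "smi",
    reader_args={"has_header": 1, "cxsmiles": 0},
    errors="strict",
    include_date=False,
)
print(" ".join(example_cmd))
```

Omitting the date with "--no-date" is one way to keep the output header from changing between otherwise identical runs; "--date" with a fixed ISO 8601 value is the alternative listed in the options.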