chemfp rdkit2fpc
The “chemfp rdkit2fpc” command-line tool uses RDKit to generate sparse count fingerprints from the input structures. The input formats and fingerprint type command-line arguments are shared with rdkit2fps.
The output is in FPC format.
The rest of this chapter contains the output from chemfp rdkit2fpc --help. Use rdkit2fps --help-formats for details about the supported input formats.
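As a rough illustration of how the options fit together, the following Python sketch assembles a command line from flags that appear in the --help output. The helper name `build_rdkit2fpc_command` and the file names are hypothetical and not part of chemfp. The sketch only builds and prints the argument list; it does not run chemfp.

```python
import shlex


def build_rdkit2fpc_command(filenames, fp_flag="--morgan2", input_format=None,
                            errors=None, reader_args=None, output=None):
    """Return an argv list for 'chemfp rdkit2fpc' (hypothetical helper).

    Only flags shown in the --help output are used: a fingerprint
    type flag, --in, --errors, -R NAME=VALUE, and -o.
    """
    argv = ["chemfp", "rdkit2fpc", fp_flag]
    if input_format is not None:
        argv += ["--in", input_format]
    if errors is not None:
        argv += ["--errors", errors]
    # Each reader argument becomes its own '-R NAME=VALUE' pair.
    for name, value in (reader_args or {}).items():
        argv += ["-R", f"{name}={value}"]
    if output is not None:
        argv += ["-o", output]
    argv += list(filenames)
    return argv


# Assumed file names, for illustration only.
cmd = build_rdkit2fpc_command(
    ["compounds.smi"],
    errors="report",
    reader_args={"has_header": 1},
    output="compounds.fpc",
)
print(shlex.join(cmd))
```

With chemfp and RDKit installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a single string avoids shell quoting problems with unusual file names.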
chemfp rdkit2fpc command-line options
The following comes from chemfp rdkit2fpc --help:
Usage: chemfp rdkit2fpc [OPTIONS] [FILENAMES]...

  Generate count fingerprints from a structure file using RDKit.

  If specified, process the filenames, otherwise read from stdin.

Options:
  --RDK, --RDK/3                  Generate RDK/3 count fingerprints (default).
  --morgan1                       Generate MorganCount/2 fingerprints
                                  (radius=1).
  --morgan2                       Generate MorganCount/2 count fingerprints
                                  (radius=2).
  --morgan, --morgan3             Generate MorganCount/2 fingerprints
                                  (radius=3).
  --morgan4                       Generate MorganCount/2 fingerprints
                                  (radius=4).
  --torsion, --torsions, --torsion/4
                                  Generate Topological TorsionCount/4
                                  fingerprints.
  --pair, --pairs                 Generate AtomPairCount/3 fingerprints.
  --type TYPE_STR                 Specify a chemfp type string
  --using FILENAME                Get the fingerprint type from the metadata
                                  of a fingerprint file
  --minPath INT                   Minimum number of bonds to include in the
                                  subgraph (default=1) [RDKitCount/3]
  --maxPath INT                   Maximum number of bonds to include in the
                                  subgraph (default=7) [RDKitCount/3]
  --numBitsPerFeature INT         Number of bits to set per path (default=2)
                                  [RDKitCount/3]
  --useHs 0|1                     Include information about the number of
                                  hydrogens on each atom (default=1)
                                  [RDKitCount/3]
  --branchedPaths 0|1             If 1, both branched and unbranched paths
                                  will be used in the fingerprint (default=1)
                                  [RDKitCount/3]
  --useBondOrder 0|1              If 1, both bond orders will be used in the
                                  path hashes (default=1) [RDKitCount/3]
  --radius INT                    Radius for the Morgan algorithm (default=3)
                                  [MorganCount/2]
  --useFeatures 0|1               if 1, use chemical-feature invariants
                                  (default=0) [MorganCount/2]
  --includeChirality 0|1          include chirality information
                                  [AtomPairCount/3, MorganCount/2,
                                  TorsionCount/4]
  --useBondTypes 0|1              if 1, include bond type information
                                  (default=1) [MorganCount/2]
  --includeRingMembership 0|1     if 1, include ring membership in the atom
                                  invariants (default=1) [MorganCount/2]
  --includeRedundantEnvironments 0|1
                                  if 1, include redundant environments in the
                                  fingerprint (default=0) [MorganCount/2]
  --minDistance INT               minimum bond distance for two atoms to be
                                  considered a pair (default=1)
                                  [AtomPairCount/3]
  --maxDistance INT               maximum bond distance for two atoms to be
                                  considered a pair (default=30)
                                  [AtomPairCount/3]
  --use2D 0|1                     If 1, use 2D instead of 3D distance matrix
                                  (default=1) [AtomPairCount/3]
  --torsionAtomCount INT          the number of atoms to include in the
                                  'torsions' (default=4) [TorsionCount/4]
  --onlyShortestPaths 0|1         if 1, only include the shortest paths
                                  between the start and end atoms, not all
                                  paths (default=0) [TorsionCount/4]
  --id-tag TAG                    Tag name containing the record id (SD files
                                  only)
  --delimiter VALUE               Delimiter style for SMILES and InChI files.
                                  Forces '-R delimiter=VALUE'.
  --has-header                    Skip the first line of a SMILES or InChI
                                  file. Forces '-R has_header=1'.
  -R NAME=VALUE                   Specify a reader argument
  --cxsmiles / --no-cxsmiles      Use --no-cxsmiles to disable the default
                                  support for CXSMILES extensions. Forces '-R
                                  cxsmiles=1' or '-R cxsmiles=0'.
  --in FORMAT                     Input structure format (default guesses from
                                  filename)
  -o, --output FILENAME           Save the fingerprints to FILENAME
                                  (default=stdout)
  --out FORMAT                    Output structure format (default guesses
                                  from output filename, or is 'fpc')
  --include-metadata / --no-metadata
                                  With --no-metadata, do not include the
                                  header metadata for FPC output.
  --no-date                       Do not include the 'date' metadata in the
                                  output header
  --date STR                      An ISO 8601 date (like
                                  '2025-02-07T11:10:15') to use for the 'date'
                                  metadata in the output header
  --errors [strict|report|ignore]
                                  How should structure parse errors be
                                  handled? (default=ignore)
  --progress / --no-progress      Show a progress bar (default: show unless
                                  the output is a terminal)
  --help-formats                  List the available formats and reader
                                  arguments
  --help                          Show this message and exit.

  This program guesses the input structure format and the compression based on
  the filename extension. If the guess fails then it assumes the input is an
  uncompressed SMILES file.

  If the data comes from stdin, or the guess based on extension name is wrong,
  then use "--in" to change the default input format.

  Use the '-R' reader arguments option to pass in format-specific structure
  reader arguments. The details depend on the specific format.

  Use the command-line option `--help-formats` to display a list of available
  formats and reader arguments.
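The extension-based guessing described in the help text can be pictured with a small Python sketch. This is not chemfp's implementation. The function name `guess_format`, the extension tables, and the returned format names are assumptions made for illustration. Only the fallback to uncompressed SMILES is taken from the help text.

```python
import os

# Hypothetical lookup tables, for illustration only. The formats and
# compressions that chemfp actually recognizes are listed by --help-formats.
COMPRESSIONS = {".gz": "gz", ".zst": "zst"}
FORMATS = {".smi": "smi", ".sdf": "sdf", ".inchi": "inchi"}


def guess_format(filename):
    """Return a (format, compression) pair guessed from the extension."""
    base, ext = os.path.splitext(filename.lower())
    compression = COMPRESSIONS.get(ext, "")
    if compression:
        # Strip the compression suffix and look at the next extension.
        base, ext = os.path.splitext(base)
    fmt = FORMATS.get(ext)
    if fmt is None:
        # The guess failed: assume an uncompressed SMILES file.
        return ("smi", "")
    return (fmt, compression)


for name in ["compounds.sdf.gz", "ligands.smi", "README"]:
    print(name, guess_format(name))
```

When the input comes from stdin there is no file name to inspect, which is why the help text recommends passing "--in" explicitly in that case.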