oe2fps
The “oe2fps” command (also available as the “chemfp oe2fps” subcommand) uses the OpenEye toolkits to generate OEGraphSim fingerprints from structure files.
This functionality is also available from Python using the high-level chemfp.oe2fps() function, following chemfp’s “*2fps” API.
The rest of this chapter contains the output from oe2fps --help and oe2fps --help-formats.
oe2fps command-line options
The following comes from oe2fps --help:
Usage: oe2fps [OPTIONS] [FILENAMES]...
Generate fingerprints from a structure file using OEChem and OEGraphSim.
If specified, process the filenames, otherwise read from stdin.
Fingerprint types:
--path Generate path fingerprints (default).
--circular Generate circular fingerprints.
--tree Generate tree fingerprints.
--maccs166 Generate 166-bit MACCS fingerprints.
--substruct Generate chemfp's PubChem-like substructure
fingerprints.
--rdmaccs, --rdmaccs/2 Generate chemfp's MACCS fingerprints, version 2.
--rdmaccs/1 Generate chemfp's MACCS fingerprints, version 1.
--type TYPE_STR Specify a chemfp type string.
--using FILENAME Get the fingerprint type from the metadata of a
fingerprint file.
Fingerprint options:
--numbits INT Number of bits in the fingerprint (default=4096) [circular,
path, tree]
--minbonds INT Minimum number of bonds in the fingerprint (default=0)
[path, tree]
--maxbonds INT Maximum number of bonds in the fingerprint (default=4 for
tree, 5 for path) [path, tree]
--atype OPT Atom type flags, described below (default=Default)
[circular, path, tree]
--btype OPT Bond type flags, described below (default=Default)
[circular, path, tree]
--minradius INT Minimum radius for the circular fingerprint (default=0)
[circular]
--maxradius INT Maximum radius for the circular fingerprint (default=5)
[circular]
Options:
--aromaticity NAME Use the named aromaticity model (same as '-R
aromaticity=NAME')
--id-tag TAG Get the record id from the tag TAG instead
of the first line of the record.
--in FORMAT Input structure format (default guesses from
filename)
-o, --output FILENAME Save the fingerprints to FILENAME
(default=stdout)
--out FORMAT Output structure format (default guesses
from output filename, or is 'fps')
--include-metadata / --no-metadata
With --no-metadata, do not include the
header metadata for FPS output.
--no-date Do not include the 'date' metadata in the
output header
--date STR An ISO 8601 date (like
'2025-02-07T11:10:15') to use for the 'date'
metadata in the output header
--delimiter VALUE Delimiter style for SMILES and InChI files.
Forces '-R delimiter=VALUE'.
--has-header Skip the first line of a SMILES or InChI
file. Forces '-R has_header=1'.
-R NAME=VALUE Specify a reader argument
--cxsmiles / --no-cxsmiles Use --no-cxsmiles to disable the default
support for CXSMILES extensions. Forces '-R
cxsmiles=1' or '-R cxsmiles=0'.
--errors [strict|report|ignore]
How should structure parse errors be
handled? (default=ignore)
--progress / --no-progress Show a progress bar (default: show unless
the output is a terminal)
--help-formats List the available formats and reader
arguments
--version Show the version and exit.
--license-check Check the license and report results to
stdout.
--help Show this message and exit.
ATYPE is one or more of the following, separated by the '|' character
Arom AtmNum Chiral EqArom EqHBAcc EqHBDon EqHalo FCharge HCount HvyDeg
Hyb InRing
The following shorthand terms and expansions are also available:
DefaultPathAtom = AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb|EqHalo
DefaultCircularAtom = AtmNum|Arom|Chiral|FCharge|HCount|EqHalo
DefaultTreeAtom = AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb
and 'Default' selects the correct value for the specified fingerprint.
Examples:
--atype Default
--atype "Arom|AtmNum|FCharge|HCount"
--atype Arom,AtmNum,FCharge,HCount
BTYPE is one or more of the following, separated by the '|' character
Chiral InRing Order
The following shorthand terms and expansions are also available:
DefaultPathBond = Order|Chiral
DefaultCircularBond = Order
DefaultTreeBond = Order
and 'Default' selects the correct value for the specified fingerprint.
Examples:
--btype Default
--btype "Order|InRing"
To simplify command-line use, a comma may be used instead of a '|' to
separate different fields. Example: --atype AtmNum,HvyDeg
By default, chemfp will use the filename extension to determine the
structure file format type and possible compression. Most of the file
readers support configuration parameters. Use the '-R' option to specify
those parameters.
Use '--help-formats' to list available formats and reader parameters.
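The bracketed annotations in the "Fingerprint options" section of the help output above say which fingerprint types each option applies to, and some defaults differ by type (for example, --maxbonds defaults to 5 for path and 4 for tree fingerprints). The minimal Python sketch below collects those documented defaults into one table. `resolve_fp_options` is a hypothetical helper, not part of chemfp's API, and rejecting an option that does not apply to the chosen type is this sketch's own choice; the help text does not say whether oe2fps rejects or ignores such an option.

```python
# Per-type defaults copied from the "Fingerprint options" section of oe2fps --help.
# An option is listed under a type only if its [..] annotation names that type.
DEFAULTS = {
    "path": {"numbits": 4096, "minbonds": 0, "maxbonds": 5,
             "atype": "Default", "btype": "Default"},
    "tree": {"numbits": 4096, "minbonds": 0, "maxbonds": 4,
             "atype": "Default", "btype": "Default"},
    "circular": {"numbits": 4096, "minradius": 0, "maxradius": 5,
                 "atype": "Default", "btype": "Default"},
}


def resolve_fp_options(fp_kind, **overrides):
    """Merge user overrides into the documented defaults (hypothetical helper)."""
    if fp_kind not in DEFAULTS:
        raise ValueError(f"unsupported fingerprint type {fp_kind!r}")
    defaults = DEFAULTS[fp_kind]
    # This sketch rejects options whose annotation does not list fp_kind.
    inapplicable = sorted(set(overrides) - set(defaults))
    if inapplicable:
        raise ValueError(f"{inapplicable} do not apply to {fp_kind} fingerprints")
    return {**defaults, **overrides}
```

For example, `resolve_fp_options("circular", numbits=2048)` keeps the documented radius defaults of 0 and 5, while `resolve_fp_options("path", maxradius=3)` raises ValueError because --maxradius is annotated as [circular] only.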
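The ATYPE and BTYPE rules in the help output above (a '|' or ',' separated list of flag names, named shorthand expansions, and a 'Default' whose meaning depends on the fingerprint type) can be modeled with a small parser. The following is a minimal Python sketch of those rules as the help text describes them; `parse_type_flags` and the table names are hypothetical and are not chemfp's API, which does its own parsing internally.

```python
import re

# Flag names and shorthand expansions copied from the oe2fps --help text.
ATYPE_FLAGS = {"Arom", "AtmNum", "Chiral", "EqArom", "EqHBAcc", "EqHBDon",
               "EqHalo", "FCharge", "HCount", "HvyDeg", "Hyb", "InRing"}
ATYPE_SHORTHANDS = {
    "DefaultPathAtom": "AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb|EqHalo",
    "DefaultCircularAtom": "AtmNum|Arom|Chiral|FCharge|HCount|EqHalo",
    "DefaultTreeAtom": "AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb",
}
BTYPE_FLAGS = {"Chiral", "InRing", "Order"}
BTYPE_SHORTHANDS = {
    "DefaultPathBond": "Order|Chiral",
    "DefaultCircularBond": "Order",
    "DefaultTreeBond": "Order",
}


def parse_type_flags(spec, fp_kind, flags, shorthands, suffix):
    """Expand an --atype/--btype string into a set of flag names (hypothetical helper).

    fp_kind is "path", "circular", or "tree"; suffix is "Atom" or "Bond".
    """
    names = set()
    # A comma is accepted as an alternative to '|'.
    for term in re.split(r"[|,]", spec):
        term = term.strip()
        if term == "Default":
            # 'Default' selects the shorthand for this fingerprint kind,
            # e.g. DefaultTreeAtom for tree fingerprints.
            term = "Default" + fp_kind.capitalize() + suffix
        if term in shorthands:
            names.update(shorthands[term].split("|"))
        elif term in flags:
            names.add(term)
        else:
            raise ValueError(f"unknown term {term!r} in {spec!r}")
    return names
```

In this sketch, "Arom,AtmNum,FCharge,HCount" and "Arom|AtmNum|FCharge|HCount" produce the same set, and a misspelled name such as HvyDegree raises ValueError instead of being silently ignored.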
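When oe2fps is launched from a Python script, passing its arguments as a list sidesteps the shell quoting that a '|'-separated --atype or --btype value otherwise needs. The sketch below only builds the argument list and does not run anything; `build_oe2fps_argv` is a hypothetical helper, while every flag it emits is taken from the help output above. It also shows one way to produce a --date value in the documented ISO 8601 form.

```python
import shlex
from datetime import datetime


def build_oe2fps_argv(inputs, output, fp_flag="--path", atype=None,
                      reader_args=(), date=None):
    """Build an oe2fps argument list (hypothetical helper; it does not run anything)."""
    argv = ["oe2fps", fp_flag]
    if atype is not None:
        # No shell quoting needed: each list item reaches oe2fps as one argument.
        argv += ["--atype", atype]
    for name, value in reader_args:
        argv += ["-R", f"{name}={value}"]
    if date is not None:
        # ISO 8601 without fractional seconds, e.g. 2025-02-07T11:10:15
        argv += ["--date", date.isoformat(timespec="seconds")]
    argv += ["-o", output, *inputs]
    return argv


argv = build_oe2fps_argv(
    ["in.smi"], "out.fps", atype="AtmNum|Arom",
    reader_args=[("delimiter", "tab")],
    date=datetime(2025, 2, 7, 11, 10, 15),
)
# shlex.join() shows the equivalent, correctly quoted shell command.
print(shlex.join(argv))
```

To actually run the command, the list can be passed to `subprocess.run(argv, check=True)`, assuming oe2fps is on the PATH and the OpenEye toolkits are installed and licensed.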
Supported oe2fps formats
The following comes from oe2fps --help-formats:
These are the structure file formats that chemfp can read when using the
OEChem toolkit.
By default, chemfp uses the filename extension to determine the format type.
If the filename ends with ".gz" then it is interpreted as a gzip compressed
file, and the second-to-last extension is used to determine the format type.
Unknown or unsupported extensions are interpreted as a SMILES file.
(The OEChem structure file readers do not support Zstandard compression.)
You may instead specify the file format by name (see below). This is
especially important when reading from stdin, which has no associated filename
extension.
The supported filename extensions are:
File Type       Extension(s)
==============  ==========================
SMILES          can, ism, isosmi, smi, usm
SDF             mdl, rxn, sd, sdf
InChI           inchi
Tripos Mol2     mol2, mol2h
PDB             ent, pdb
XYZ             xyz
SKC             skc
Macromodel      mmd, mmod
ChemDraw CDX    cdx
OE binary       oeb
OEB compressed  oez
CIF             cif
mmCIF           mmcif
FASTA           fasta
CSV             csv
Append a '.gz' to the filename to indicate that the contents are gzip-compressed.
The format can also be specified by name using the '--in' option:
File Type       Format name
==============  =============
SMILES          smi, can, usm
SDF             sdf
InChI           inchi
Tripos Mol2     mol2, mol2h
PDB             pdb
XYZ             xyz
SKC             skc
Macromodel      mmod
ChemDraw CDX    cdx
OE binary       oeb
OEB compressed  oez
CIF             cif
mmCIF           mmcif
FASTA           fasta
CSV             csv
Append a '.gz' to the format name to indicate that the contents are gzip-compressed.
The input format parsers can be configured with the "-R" option. For example,
the following reader arguments tell the SMILES readers that the fields are
whitespace delimited and the first line is a header.
-R delimiter=whitespace -R has_header=true
All formats handle the following two reader arguments:
aromaticity - one of 'openeye', 'daylight', 'tripos', 'mdl', or 'mmff'
(this can also be set via the older '--aromaticity' command-line option)
flavor - a '|' or ',' separated list of flavor names, or a numeric value.
A leading '-' means to remove the given flavor. Examples include:
o Canon,Strict -- the bitwise merger of the format's Canon and Strict values
o Default,-Kekule -- the format's Default flavor but without the Kekule bits
(every flavor has a Default)
o 42 -- the specific OEChem flavor value 42
The SMILES and InChI formats also handle reader arguments for the delimiter
style and the presence of an initial header line using the following:
delimiter - one of 'to-eol' (Daylight/OEChem style), 'tab',
'whitespace', 'space', or 'native' (for the native toolkit style)
has_header - '1' if the first line contains a header, else '0'.
The SMILES formats also support the `cxsmiles` option, which describes how to
handle CXSMILES extensions. The default (true) will have OEChem process the
extension as OEFormat_CXSMILES. If false the record will be parsed as
OEFormat_SMI and any extension will be treated as part of the identifier.
The supported formats, default reader arguments, and input flavors are:
Format: can
aromaticity: openeye
delimiter: to-eol
flavor: Default
default flags: <none>
available flags: Canon, Strict
has_header: 0
Format: cdx
aromaticity: openeye
flavor: Default
default flags: SuperAtom
available flags: SuperAtom
Format: cif
aromaticity: openeye
flavor: Default
default flags: BondHydToClosest, BondOrder, FormalCrg, ImplicitH,
NormalizeHydPos, OccFilterOneHalf, RemovePBCImages,
RemoveQuestionMarkInLabel, Rings
available flags: BondHydToClosest, BondOrder, FormalCrg, ImplicitH,
NormalizeHydPos, OccFilterOneHalf, RemovePBCImages,
RemoveQuestionMarkInLabel, Rings
Format: csv
aromaticity: openeye
flavor: Default
default flags: Header
available flags: Header
Format: cxsmi
aromaticity: openeye
delimiter: to-eol
flavor: Default
default flags: <none>
available flags: Canon, Strict
has_header: 0
Format: fasta
aromaticity: openeye
flavor: Default
default flags: <none>
available flags: CustomResidues, EmbeddedSMILES
Format: inchi
aromaticity: <N/A>
delimiter: to-eol
flavor: Default
no flavor flags available
has_header: 0
Format: mmcif
aromaticity: openeye
flavor: Default
default flags: <none>
available flags: NoAltLoc
Format: mmod
aromaticity: openeye
flavor: Default
default flags: <none>
available flags: FormalCrg
Format: mol2
aromaticity: openeye
flavor: Default
default flags: <none>
available flags: Forcefield, M2H
Format: mol2h
aromaticity: openeye
flavor: Default
default flags: M2H
available flags: M2H
Format: oeb
aromaticity: <N/A>
flavor: Default
no flavor flags available
Format: oez
aromaticity: <N/A>
flavor: Default
no flavor flags available
Format: pdb
aromaticity: openeye
flavor: Default
default flags: BondOrder, Connect, END, ENDM, FormalCrg, ImplicitH,
Rings, SecStruct
available flags: ALL, ALTLOC, BondOrder, CHARGE, Connect, DATA, END,
ENDM, FORMALCHARGE, FormalCrg, ImplicitH, RADIUS, Rings,
SecStruct, TER
Format: sdf
aromaticity: openeye
flavor: Default
default flags: <none>
available flags: FixBondMarks, SuppressEmptyMolSkip,
SuppressImp2ExpENHSTE
Format: sdf3k
aromaticity: openeye
flavor: Default
default flags: <none>
available flags: FixBondMarks, SuppressEmptyMolSkip,
SuppressImp2ExpENHSTE
Format: skc
aromaticity: openeye
flavor: Default
no flavor flags available
Format: smi
aromaticity: openeye
delimiter: to-eol
flavor: Default
default flags: <none>
available flags: Canon, Strict
has_header: 0
Format: usm
aromaticity: openeye
delimiter: to-eol
flavor: Default
default flags: <none>
available flags: Canon, Strict
has_header: 0
Format: xyz
aromaticity: openeye
flavor: Default
default flags: BondOrder, Connect, FormalCrg, ImplicitH, Rings
available flags: BondOrder, Connect, FormalCrg, ImplicitH, Rings
See
https://docs.eyesopen.com/toolkits/cpp/oechemtk/molreadwrite.html#flavored-input-and-output
for documentation about the flavors for each format.
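The filename rules at the start of the --help-formats output (a trailing ".gz" means gzip compression, the extension before it selects the format, and anything unrecognized is read as SMILES) can be sketched in Python as follows. The extension table is copied from the output above. `guess_file_type` is a hypothetical helper: it returns the file type label rather than an OEChem format name, and it treats extensions case-sensitively, which is an assumption.

```python
import os

# Extension table copied from the oe2fps --help-formats output.
FILE_TYPE_EXTENSIONS = {
    "SMILES": ["can", "ism", "isosmi", "smi", "usm"],
    "SDF": ["mdl", "rxn", "sd", "sdf"],
    "InChI": ["inchi"],
    "Tripos Mol2": ["mol2", "mol2h"],
    "PDB": ["ent", "pdb"],
    "XYZ": ["xyz"],
    "SKC": ["skc"],
    "Macromodel": ["mmd", "mmod"],
    "ChemDraw CDX": ["cdx"],
    "OE binary": ["oeb"],
    "OEB compressed": ["oez"],
    "CIF": ["cif"],
    "mmCIF": ["mmcif"],
    "FASTA": ["fasta"],
    "CSV": ["csv"],
}
EXTENSION_TO_TYPE = {ext: ftype for ftype, exts in FILE_TYPE_EXTENSIONS.items()
                     for ext in exts}


def guess_file_type(filename):
    """Return (file_type, is_gzipped) using the documented extension rules."""
    root, ext = os.path.splitext(filename)
    is_gzipped = ext == ".gz"
    if is_gzipped:
        # Use the second-to-last extension, e.g. ".sdf" in "x.sdf.gz".
        root, ext = os.path.splitext(root)
    # Unknown or missing extensions fall back to SMILES.
    return EXTENSION_TO_TYPE.get(ext[1:], "SMILES"), is_gzipped
```

This sketch does not recognize ".zst", consistent with the note that the OEChem readers do not support Zstandard compression, and it has nothing to inspect for stdin, which is why --in matters there.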
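The flavor reader argument combines named flags with '|' or ',', removes a flag with a leading '-', and also accepts a raw number. The Python sketch below applies the terms left to right, which is an assumption about how chemfp resolves them. `parse_flavor` is a hypothetical helper, and the bit values in `DEMO_FLAVORS` are invented for illustration; they are not real OEChem flavor constants, which come from the OEChem toolkit itself.

```python
import re

# Invented bit values for a made-up format; real values come from OEChem.
DEMO_FLAVORS = {"Canon": 1, "Strict": 2, "Kekule": 4, "Default": 1 | 4}


def parse_flavor(spec, flavors):
    """Turn a flavor string such as 'Default,-Kekule' into an integer (hypothetical helper)."""
    spec = spec.strip()
    if spec.isdigit():
        # A plain number is used as the flavor value directly.
        return int(spec)
    value = 0
    # Terms are applied left to right (an assumption), so later terms win.
    for term in re.split(r"[|,]", spec):
        term = term.strip()
        remove = term.startswith("-")
        name = term[1:] if remove else term
        if name not in flavors:
            raise ValueError(f"unknown flavor {name!r}")
        if remove:
            value &= ~flavors[name]   # clear this flavor's bits
        else:
            value |= flavors[name]    # set this flavor's bits
    return value
```

With these invented values, "Canon,Strict" gives 3 (the bitwise merger of 1 and 2), "Default,-Kekule" gives 1 (Default's 5 with the Kekule bit 4 cleared), and "42" is passed through unchanged.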
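The delimiter styles and the cxsmiles option together determine how a SMILES line is divided into a structure and an identifier. The Python sketch below is one reading of those descriptions, under these assumptions: 'to-eol' takes everything after the first run of whitespace as the identifier; 'tab', 'space', and 'whitespace' take the second field; and CXSMILES handling is shown only for 'to-eol' and only for an extension that starts and ends with '|'. `split_smiles_line` is a hypothetical helper, not chemfp's or OEChem's parser, and the 'native' style is omitted because it depends on the toolkit.

```python
def split_smiles_line(line, delimiter="to-eol", cxsmiles=True):
    """Split one SMILES record into (smiles, cx_extension, identifier).

    Hypothetical helper illustrating the delimiter styles; not chemfp's parser.
    """
    line = line.rstrip("\r\n")
    if delimiter == "to-eol":
        # SMILES up to the first whitespace; the identifier is the rest of the line.
        parts = line.split(None, 1)
        smiles = parts[0] if parts else ""
        rest = parts[1] if len(parts) > 1 else ""
        extension = ""
        if cxsmiles and rest.startswith("|"):
            # Treat "|...|" as the CXSMILES extension, not part of the identifier.
            end = rest.find("|", 1)
            if end != -1:
                extension, rest = rest[:end + 1], rest[end + 1:].strip()
        return smiles, extension, rest
    if delimiter == "tab":
        fields = line.split("\t")
    elif delimiter == "space":
        fields = line.split(" ")
    elif delimiter == "whitespace":
        fields = line.split()
    else:
        raise ValueError(f"unsupported delimiter style {delimiter!r}")
    # For the field-based styles, the identifier is the second field.
    smiles = fields[0] if fields else ""
    identifier = fields[1] if len(fields) > 1 else ""
    return smiles, "", identifier
```

With cxsmiles=False, the line "C[C@H](O)N |&1:1| mol1" keeps "|&1:1| mol1" as its identifier, which mirrors the documented behavior of treating any extension as part of the identifier.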