oe2fps

The “oe2fps” command (also available as the “chemfp oe2fps” subcommand) uses the OpenEye toolkits to generate OEGraphSim fingerprints from structure files.

This functionality is also available from Python using the high-level chemfp.oe2fps() function, following chemfp’s “*2fps” API.

The rest of this chapter contains the output from oe2fps --help and oe2fps --help-formats.

oe2fps command-line options

The following comes from oe2fps --help:

Usage: oe2fps [OPTIONS] [FILENAMES]...

  Generate fingerprints from a structure file using OEChem and OEGraphSim.

  If specified, process the filenames, otherwise read from stdin.

Fingerprint types:
  --path                  Generate path fingerprints (default).
  --circular              Generate circular fingerprints.
  --tree                  Generate tree fingerprints
  --maccs166              Generate 166-bit MACCS fingerprints
  --substruct             Generate chemfp's PubChem-like substructure
                          fingerprints.
  --rdmaccs, --rdmaccs/2  Generate chemfp's MACCS fingerprints, version 2.
  --rdmaccs/1             Generate chemfp's MACCS fingerprints, version 1.
  --type TYPE_STR         Specify a chemfp type string
  --using FILENAME        Get the fingerprint type from the metadata of a
                          fingerprint file

Fingerprint options:
  --numbits INT    Number of bits in the fingerprint (default=4096) [circular,
                   path, tree]
  --minbonds INT   Minimum number of bonds in the fingerprint (default=0)
                   [path, tree]
  --maxbonds INT   Maximum number of bonds in the fingerprint (default=4 for
                   tree, 5 for path) [path, tree]
  --atype OPT      Atom type flags, described below (default=Default)
                   [circular, path, tree]
  --btype OPT      Bond type flags, described below (default=Default)
                   [circular, path, tree]
  --minradius INT  Minimum radius for the circular fingerprint (default=0)
                   [circular]
  --maxradius INT  Maximum radius for the circular fingerprint (default=5)
                   [circular]

Options:
  --aromaticity NAME              Use the named aromaticity model (same as '-R
                                  aromaticity=NAME')
  --id-tag TAG                    Get the record id from the tag TAG instead
                                  of the first line of the record.
  --in FORMAT                     Input structure format (default guesses from
                                  filename)
  -o, --output FILENAME           Save the fingerprints to FILENAME
                                  (default=stdout)
  --out FORMAT                    Output fingerprint format (default guesses
                                  from output filename, or is 'fps')
  --include-metadata / --no-metadata
                                  With --no-metadata, do not include the
                                  header metadata for FPS output.
  --no-date                       Do not include the 'date' metadata in the
                                  output header
  --date STR                      An ISO 8601 date (like
                                  '2025-02-07T11:10:15') to use for the 'date'
                                  metadata in the output header
  --delimiter VALUE               Delimiter style for SMILES and InChI files.
                                  Forces '-R delimiter=VALUE'.
  --has-header                    Skip the first line of a SMILES or InChI
                                  file. Forces '-R has_header=1'.
  -R NAME=VALUE                   Specify a reader argument
  --cxsmiles / --no-cxsmiles      Use --no-cxsmiles to disable the default
                                  support for CXSMILES extensions. Forces '-R
                                  cxsmiles=1' or '-R cxsmiles=0'.
  --errors [strict|report|ignore]
                                  How should structure parse errors be
                                  handled? (default=ignore)
  --progress / --no-progress      Show a progress bar (default: show unless
                                  the output is a terminal)
  --help-formats                  List the available formats and reader
                                  arguments
  --version                       Show the version and exit.
  --license-check                 Check the license and report results to
                                  stdout.
  --help                          Show this message and exit.

  ATYPE is one or more of the following, separated by the '|' character

    Arom AtmNum Chiral EqArom EqHBAcc EqHBDon EqHalo FCharge HCount HvyDeg
    Hyb InRing

  The following shorthand terms and expansions are also available:

   DefaultPathAtom = AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb|EqHalo
   DefaultCircularAtom = AtmNum|Arom|Chiral|FCharge|HCount|EqHalo
   DefaultTreeAtom = AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb

  and 'Default' selects the correct value for the specified fingerprint.

  Examples:

    --atype Default
    --atype "Arom|AtmNum|FCharge|HCount"
    --atype Arom,AtmNum,FCharge,HCount

  BTYPE is one or more of the following, separated by the '|' character

    Chiral InRing Order

  The following shorthand terms and expansions are also available:

   DefaultPathBond = Order|Chiral
   DefaultCircularBond = Order
   DefaultTreeBond = Order

  and 'Default' selects the correct value for the specified fingerprint.

  Examples:

     --btype Default
     --btype Order|InRing

  To simplify command-line use, a comma may be used instead of a '|' to
  separate different fields. Example:   --atype AtmNum,HvyDeg

  By default, chemfp will use the filename extension to determine the
  structure file format type and possible compression. Most of the file
  readers support configuration parameters. Use the '-R' option to specify
  those parameters.

  Use '--help-formats' to list available formats and reader parameters.
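
The ATYPE rules described in the help text can be sketched in Python. This is only an illustration of the documented syntax ('|' or ',' separators, the three shorthand terms, and 'Default' resolving per fingerprint family), not chemfp's own parser. The function name expand_atype and its fp_family parameter are hypothetical.

```python
import re

# Flag names and shorthand expansions copied from the help text above.
ATYPE_FLAGS = {
    "Arom", "AtmNum", "Chiral", "EqArom", "EqHBAcc", "EqHBDon",
    "EqHalo", "FCharge", "HCount", "HvyDeg", "Hyb", "InRing",
}
ATYPE_SHORTHAND = {
    "DefaultPathAtom": "AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb|EqHalo",
    "DefaultCircularAtom": "AtmNum|Arom|Chiral|FCharge|HCount|EqHalo",
    "DefaultTreeAtom": "AtmNum|Arom|Chiral|FCharge|HvyDeg|Hyb",
}


def expand_atype(value, fp_family="path"):
    """Return the sorted atom type flags named by an --atype value.

    Hypothetical helper: fp_family is "path", "circular" or "tree" and is
    only used to resolve the bare term 'Default'.
    """
    flags = set()
    # Accept either '|' or ',' between terms, as the help text allows.
    for term in re.split(r"[|,]", value):
        term = term.strip()
        if term == "Default":
            term = f"Default{fp_family.capitalize()}Atom"
        if term in ATYPE_SHORTHAND:
            flags.update(ATYPE_SHORTHAND[term].split("|"))
        elif term in ATYPE_FLAGS:
            flags.add(term)
        else:
            raise ValueError(f"unknown atom type flag: {term!r}")
    return sorted(flags)


print(expand_atype("Arom,AtmNum,FCharge,HCount"))
print(expand_atype("Default", fp_family="tree"))
```

The same approach would apply to BTYPE with the bond flag names and the DefaultPathBond, DefaultCircularBond and DefaultTreeBond expansions.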

Supported oe2fps formats

The following comes from oe2fps --help-formats:

These are the structure file formats that chemfp can read when using the
OEChem toolkit.

By default, chemfp uses the filename extension to determine the format type.
If the filename ends with ".gz" then it is interpreted as a gzip compressed
file, and the second-to-last extension is used to determine the format type.
Unknown or unsupported extensions are interpreted as a SMILES file.

(The OEChem structure file readers do not support Zstandard compression.)

You may instead specify the file format by name (see below), which is
especially important when reading from stdin, which has no associated filename
extension.

The supported filename extensions are:

   File Type    Extension(s)
   ==========   =============
    SMILES      can, ism, isosmi, smi, usm
      SDF       mdl, rxn, sd, sdf
     InChI      inchi
  Tripos Mol2   mol2, mol2h
      PDB       ent, pdb
      XYZ       xyz
      SKC       skc
  Macromodel    mmd, mmod
 ChemDraw CDX   cdx
   OE binary    oeb
OEB compressed  oez
      CIF       cif
     mmCIF      mmcif
     FASTA      fasta
      CSV       csv

Append a '.gz' to the filename to indicate that the contents are gzip-
compressed.
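
The extension rules above can be sketched in Python as follows. This is a simplified illustration of the documented behavior (strip a trailing ".gz", look up the remaining extension, fall back to SMILES), not chemfp's implementation. The function name guess_file_type is hypothetical, and case-sensitive matching is an assumption.

```python
import os

# Extension table copied from the listing above.
EXTENSIONS = {
    "SMILES": ["can", "ism", "isosmi", "smi", "usm"],
    "SDF": ["mdl", "rxn", "sd", "sdf"],
    "InChI": ["inchi"],
    "Tripos Mol2": ["mol2", "mol2h"],
    "PDB": ["ent", "pdb"],
    "XYZ": ["xyz"],
    "SKC": ["skc"],
    "Macromodel": ["mmd", "mmod"],
    "ChemDraw CDX": ["cdx"],
    "OE binary": ["oeb"],
    "OEB compressed": ["oez"],
    "CIF": ["cif"],
    "mmCIF": ["mmcif"],
    "FASTA": ["fasta"],
    "CSV": ["csv"],
}
EXT_TO_TYPE = {ext: ftype for ftype, exts in EXTENSIONS.items() for ext in exts}


def guess_file_type(filename):
    """Return (file_type, is_gzipped) for a structure filename."""
    # Everything after the first '.' in the base name is treated as extensions.
    exts = os.path.basename(filename).split(".")[1:]
    is_gzipped = bool(exts) and exts[-1] == "gz"
    if is_gzipped:
        # Use the second-to-last extension to determine the format.
        exts = exts[:-1]
    ext = exts[-1] if exts else ""
    # Unknown or unsupported extensions are interpreted as SMILES.
    return EXT_TO_TYPE.get(ext, "SMILES"), is_gzipped


print(guess_file_type("ligands.sdf.gz"))
print(guess_file_type("notes.txt"))
```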

The format can also be specified by name using the '--in' option:

   File Type    Format name
   ==========   =============
    SMILES      smi, can, usm
      SDF       sdf
     InChI      inchi
  Tripos Mol2   mol2, mol2h
      PDB       pdb
      XYZ       xyz
      SKC       skc
  Macromodel    mmod
 ChemDraw CDX   cdx
   OE binary    oeb
OEB compressed  oez
      CIF       cif
     mmCIF      mmcif
     FASTA      fasta
      CSV       csv

Append a '.gz' to the format name to indicate that the contents are gzip-
compressed.

The input format parsers can be configured with the "-R" option. For example,
the following reader arguments tell the SMILES readers that the fields are
whitespace delimited and the first line is a header.

   -R delimiter=whitespace -R has_header=true
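
When oe2fps is driven from a script, the reader arguments can be assembled programmatically. The sketch below only builds the argument list from options shown in the help output ("--in", "-R", "-o") and does not run the command. The helper name build_oe2fps_args and the filenames are hypothetical.

```python
import shlex


def build_oe2fps_args(filenames, reader_args=None, in_format=None, output=None):
    """Build an oe2fps argument list (hypothetical helper, nothing is executed)."""
    args = ["oe2fps"]
    if in_format is not None:
        args += ["--in", in_format]
    # Each reader argument becomes its own '-R NAME=VALUE' pair.
    for name, value in (reader_args or {}).items():
        args += ["-R", f"{name}={value}"]
    if output is not None:
        args += ["-o", output]
    args += list(filenames)
    return args


args = build_oe2fps_args(
    ["input.smi"],
    reader_args={"delimiter": "whitespace", "has_header": "true"},
    output="out.fps",
)
print(shlex.join(args))
# prints: oe2fps -R delimiter=whitespace -R has_header=true -o out.fps input.smi
```

The resulting list could be passed to subprocess.run() on a system where oe2fps and a valid OpenEye license are available.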

All formats handle the following two reader arguments:

  aromaticity - one of 'openeye', 'daylight', 'tripos', 'mdl', or 'mmff'
      (this can also be set via the older '--aromaticity' command-line option)

  flavor - a '|' or ',' separated list of flavor names, or a numeric value.
       A leading '-' means to remove the given flavor. Examples include:
       o  Canon,Strict  -- the bitwise merger of the format's Canon and Strict values
       o  Default,-Kekule -- the format's Default flavor but without the Kekule bits
                      (every flavor has a Default)
       o  42  -- the specific OEChem flavor value 42
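
The flavor syntax can be sketched in Python as a bitwise merge over named flags. The bit values in EXAMPLE_FLAVORS are hypothetical and chosen only for illustration; the real values come from OEChem and differ by format. Processing terms left to right is also an assumption.

```python
import re

# HYPOTHETICAL flag values for illustration only; not the OEChem constants.
EXAMPLE_FLAVORS = {
    "Canon": 0b001,
    "Strict": 0b010,
    "Kekule": 0b100,
    "Default": 0b101,  # pretend Default = Canon | Kekule
}


def parse_flavor(value, table):
    """Turn a flavor string into an integer using the given name table."""
    value = value.strip()
    if value.isdigit():
        # A plain number is used directly as the flavor value.
        return int(value)
    result = 0
    for term in re.split(r"[|,]", value):
        term = term.strip()
        if term.startswith("-"):
            # A leading '-' removes the named bits.
            result &= ~table[term[1:]]
        else:
            # Otherwise merge the named bits into the result.
            result |= table[term]
    return result


print(parse_flavor("Canon,Strict", EXAMPLE_FLAVORS))
print(parse_flavor("Default,-Kekule", EXAMPLE_FLAVORS))
print(parse_flavor("42", EXAMPLE_FLAVORS))
```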

The SMILES and InChI formats also handle reader arguments for the delimiter
style and the presence of an initial header line using the following:

   delimiter - one of 'to-eol' (Daylight/OEChem style), 'tab',
        'whitespace', 'space', or 'native' (for the native toolkit style)

   has_header - '1' if the first line contains a header, else '0'.
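
As a rough illustration of how the delimiter styles differ, the Python sketch below splits one SMILES line into a structure field and an identifier. It assumes that 'to-eol' takes everything after the first run of whitespace as the identifier, while the other styles take only the second field. This is a reading of the option names, not chemfp's code, and the 'native' style is left out because it depends on the toolkit. The function name split_smiles_line is hypothetical.

```python
def split_smiles_line(line, delimiter="to-eol"):
    """Return (smiles, record_id) for one line of a SMILES file.

    Hypothetical helper; record_id is None when the line has no second field.
    """
    line = line.rstrip("\r\n")
    if delimiter == "to-eol":
        # The identifier is the rest of the line after the first whitespace.
        fields = line.split(None, 1)
    elif delimiter == "whitespace":
        fields = line.split()
    elif delimiter == "tab":
        fields = line.split("\t")
    elif delimiter == "space":
        fields = line.split(" ")
    else:
        raise ValueError(f"unsupported delimiter style: {delimiter!r}")
    smiles = fields[0] if fields else ""
    record_id = fields[1] if len(fields) > 1 else None
    return smiles, record_id


line = "c1ccccc1O phenol from vendor\n"
print(split_smiles_line(line, "to-eol"))
print(split_smiles_line(line, "whitespace"))
```

Under these assumptions the first call keeps "phenol from vendor" as the identifier, while the second keeps only "phenol".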

The SMILES formats also support the `cxsmiles` option, which describes how to
handle CXSMILES extensions. The default (true) will have OEChem process the
extension as OEFormat_CXSMILES. If false, the record will be parsed as
OEFormat_SMI and any extension will be treated as part of the identifier.

The supported formats, default reader arguments, and input flavors are:

Format: can
    aromaticity: openeye
    delimiter: to-eol
    flavor: Default
        default flags: <none>
        available flags: Canon, Strict
    has_header: 0

Format: cdx
    aromaticity: openeye
    flavor: Default
        default flags: SuperAtom
        available flags: SuperAtom

Format: cif
    aromaticity: openeye
    flavor: Default
        default flags: BondHydToClosest, BondOrder, FormalCrg, ImplicitH,
            NormalizeHydPos, OccFilterOneHalf, RemovePBCImages,
            RemoveQuestionMarkInLabel, Rings
        available flags: BondHydToClosest, BondOrder, FormalCrg, ImplicitH,
            NormalizeHydPos, OccFilterOneHalf, RemovePBCImages,
            RemoveQuestionMarkInLabel, Rings

Format: csv
    aromaticity: openeye
    flavor: Default
        default flags: Header
        available flags: Header

Format: cxsmi
    aromaticity: openeye
    delimiter: to-eol
    flavor: Default
        default flags: <none>
        available flags: Canon, Strict
    has_header: 0

Format: fasta
    aromaticity: openeye
    flavor: Default
        default flags: <none>
        available flags: CustomResidues, EmbeddedSMILES

Format: inchi
    aromaticity: <N/A>
    delimiter: to-eol
    flavor: Default
      no flavor flags available
    has_header: 0

Format: mmcif
    aromaticity: openeye
    flavor: Default
        default flags: <none>
        available flags: NoAltLoc

Format: mmod
    aromaticity: openeye
    flavor: Default
        default flags: <none>
        available flags: FormalCrg

Format: mol2
    aromaticity: openeye
    flavor: Default
        default flags: <none>
        available flags: Forcefield, M2H

Format: mol2h
    aromaticity: openeye
    flavor: Default
        default flags: M2H
        available flags: M2H

Format: oeb
    aromaticity: <N/A>
    flavor: Default
      no flavor flags available

Format: oez
    aromaticity: <N/A>
    flavor: Default
      no flavor flags available

Format: pdb
    aromaticity: openeye
    flavor: Default
        default flags: BondOrder, Connect, END, ENDM, FormalCrg, ImplicitH,
            Rings, SecStruct
        available flags: ALL, ALTLOC, BondOrder, CHARGE, Connect, DATA, END,
            ENDM, FORMALCHARGE, FormalCrg, ImplicitH, RADIUS, Rings,
            SecStruct, TER

Format: sdf
    aromaticity: openeye
    flavor: Default
        default flags: <none>
        available flags: FixBondMarks, SuppressEmptyMolSkip,
            SuppressImp2ExpENHSTE

Format: sdf3k
    aromaticity: openeye
    flavor: Default
        default flags: <none>
        available flags: FixBondMarks, SuppressEmptyMolSkip,
            SuppressImp2ExpENHSTE

Format: skc
    aromaticity: openeye
    flavor: Default
      no flavor flags available

Format: smi
    aromaticity: openeye
    delimiter: to-eol
    flavor: Default
        default flags: <none>
        available flags: Canon, Strict
    has_header: 0

Format: usm
    aromaticity: openeye
    delimiter: to-eol
    flavor: Default
        default flags: <none>
        available flags: Canon, Strict
    has_header: 0

Format: xyz
    aromaticity: openeye
    flavor: Default
        default flags: BondOrder, Connect, FormalCrg, ImplicitH, Rings
        available flags: BondOrder, Connect, FormalCrg, ImplicitH, Rings

See
https://docs.eyesopen.com/toolkits/cpp/oechemtk/molreadwrite.html#flavored-input-and-output
for documentation about the flavors for each format.