ob2fps

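The “ob2fps” command (also available as the “chemfp ob2fps” subcommand) uses the Open Babel toolkit to generate fingerprints from structure files. These include Open Babel’s own fingerprint types as well as chemfp’s substructure and MACCS fingerprints computed with Open Babel.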

This functionality is also available from Python using the high-level chemfp.ob2fps() function, following chemfp’s “*2fps” API.

The rest of this chapter contains the output from ob2fps --help and ob2fps --help-formats.

ob2fps command-line options

The following comes from ob2fps --help:

Usage: ob2fps [OPTIONS] [FILENAMES]...

  Generate fingerprints from a structure file using Open Babel.

  If specified, process the filenames, otherwise read from stdin.

Fingerprint types:
  --FP2                         Linear fragments up to 7 atoms (default)
  --FP3                         SMARTS patterns specified in the file
                                patterns.txt
  --FP4                         SMARTS patterns specified in the file
                                SMARTS_InteLigand.txt
  --MACCS, --maccs, --maccs166  Open Babel's implementation of the MACCS 166
                                keys
  --ECFP0                       ECFP (circular) fingerprints with diameter 0
  --ECFP2                       ECFP (circular) fingerprints with diameter 2
  --ECFP4                       ECFP (circular) fingerprints with diameter 4
  --ECFP6                       ECFP (circular) fingerprints with diameter 6
  --ECFP8                       ECFP (circular) fingerprints with diameter 8
  --ECFP10                      ECFP (circular) fingerprints with diameter 10
  --substruct                   chemfp's PubChem-like substructure
                                fingerprints
  --rdmaccs, --rdmaccs/2        chemfp's MACCS fingerprints, version 2
  --rdmaccs/1                   chemfp's MACCS fingerprints, version 1
  --type TYPE_STR               Specify a chemfp type string
  --using FILENAME              Get the fingerprint type from the metadata of
                                a fingerprint file

Fingerprint options:
  --nBits INT  number of bits in the fingerprint (default=4096) [ECFP]

Options:
  --id-tag TAG                    Tag name containing the record id (SD files
                                  only)
  --in FORMAT                     Input structure format (default guesses from
                                  filename)
  -o, --output FILENAME           Save the fingerprints to FILENAME
                                  (default=stdout)
  --out FORMAT                    Output structure format (default guesses
                                  from output filename, or is 'fps')
  --include-metadata / --no-metadata
                                  With --no-metadata, do not include the
                                  header metadata for FPS output.
  --no-date                       Do not include the 'date' metadata in the
                                  output header
  --date STR                      An ISO 8601 date (like
                                  '2025-02-07T11:10:15') to use for the 'date'
                                  metadata in the output header
  --delimiter VALUE               Delimiter style for SMILES and InChI files.
                                  Forces '-R delimiter=VALUE'.
  --has-header                    Skip the first line of a SMILES or InChI
                                  file. Forces '-R has_header=1'.
  -R NAME=VALUE                   Specify a reader argument
  --cxsmiles / --no-cxsmiles      Use --no-cxsmiles to disable the default
                                  support for CXSMILES extensions. Forces '-R
                                  cxsmiles=1' or '-R cxsmiles=0'.
  --errors [strict|report|ignore]
                                  How should structure parse errors be
                                  handled? (default=ignore)
  --progress / --no-progress      Show a progress bar (default: show unless
                                  the output is a terminal)
  --help-formats                  List the available formats and reader
                                  arguments
  --version                       Show the version and exit.
  --license-check                 Check the license and report results to
                                  stdout.
  --help                          Show this message and exit.

  By default the Open Babel structure reader determines the file format and
  compression type based on the filename extension. Unknown filename
  extensions are treated as uncompressed SMILES files.

  If the data comes from stdin, or the guess based on the filename extension is
  wrong, then use the "--in FORMAT" option to change the default input format.
  For example:

     --in smi    --in sdf.gz

  Use `-R` to specify format-specific reader arguments.

  Use `--help-formats` for a list of available formats and reader arguments.
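The --include-metadata/--no-metadata, --no-date, --date, and --using options all relate to the metadata header of the FPS output. The following Python sketch writes and reads a simplified FPS file, assuming the layout of a '#FPS1' line, then '#key=value' metadata lines, then one 'hex-fingerprint<TAB>id' line per record. Other header lines that chemfp writes (for example, software and source information) are omitted. The names write_fps and read_fps_type are hypothetical and are not chemfp functions.

```python
import io
from datetime import datetime, timezone


def write_fps(out, records, num_bits, fp_type, include_metadata=True,
              include_date=True, date=None):
    """Hypothetical FPS writer; records is an iterable of (id, fp_bytes) pairs."""
    if include_metadata:  # the default; --no-metadata would skip the header
        out.write("#FPS1\n")
        out.write(f"#num_bits={num_bits}\n")
        out.write(f"#type={fp_type}\n")
        if include_date:  # --no-date would set include_date=False
            if date is None:
                # Assumption for this sketch: use the current UTC time, to the second.
                date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
            else:
                # Like --date STR: fromisoformat() raises ValueError on bad input.
                datetime.fromisoformat(date)
            out.write(f"#date={date}\n")
    for rec_id, fp in records:
        # Each record: hex-encoded fingerprint bytes, a tab, then the id.
        out.write(f"{fp.hex()}\t{rec_id}\n")


def read_fps_type(infile):
    """Hypothetical helper in the spirit of --using: return the '#type=' value."""
    for line in infile:
        if not line.startswith("#"):
            break  # the header ends at the first fingerprint line
        if line.startswith("#type="):
            return line[len("#type="):].rstrip("\n")
    return None


buf = io.StringIO()
write_fps(buf, [("mol1", b"\x01\x80")], num_bits=16,
          fp_type="OpenBabel-FP2/1",  # an example type string
          date="2025-02-07T11:10:15")
print(buf.getvalue(), end="")
print(read_fps_type(io.StringIO(buf.getvalue())))  # OpenBabel-FP2/1
```

With include_metadata=False only the record lines are written, which mirrors what --no-metadata is described as doing.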

Supported ob2fps formats

The following comes from ob2fps --help-formats:

These are the structure file formats that chemfp can read when using the Open
Babel toolkit.

chemfp has special support for the SMILES, InChI, and SDF formats when using
the Open Babel toolkit.

For these formats, by default, chemfp uses the filename extension to determine
the format type. If the filename ends with ".gz" or ".zst" then it is
interpreted as a gzip or Zstandard compressed file, and the second-to-last
extension is used to determine the format type. Unknown or unsupported
extensions are then tested against Open Babel format names (see below), and if
still unknown, interpreted as a SMILES file.
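The extension rules in the paragraph above can be summarized as a small Python function. This is a minimal sketch, not chemfp's code: the function name guess_format, the openbabel_formats parameter (a stand-in for Open Babel's list of format names), and the returned labels are hypothetical, and reading from stdin, which has no filename, is not handled.

```python
import os

# Specially supported extensions, as listed in this section.
SPECIAL_EXTENSIONS = {
    "can": "SMILES", "ism": "SMILES", "isosmi": "SMILES",
    "smi": "SMILES", "usm": "SMILES",
    "sdf": "SDF",
    "inchi": "InChI",
}
COMPRESSION_SUFFIXES = {".gz": "gzip", ".zst": "zstd"}


def guess_format(filename, openbabel_formats=frozenset()):
    """Return (format, compression) following the extension rules above."""
    base, ext = os.path.splitext(filename)
    compression = COMPRESSION_SUFFIXES.get(ext)
    if compression is not None:
        # Compressed file: the second-to-last extension names the format.
        _, ext = os.path.splitext(base)
    ext = ext[1:]  # drop the leading "."
    if ext in SPECIAL_EXTENSIONS:
        return SPECIAL_EXTENSIONS[ext], compression
    if ext in openbabel_formats:
        return ext, compression  # fall back to an Open Babel format name
    return "SMILES", compression  # still unknown: treat it as SMILES


print(guess_format("compounds.sdf.gz"))           # ('SDF', 'gzip')
print(guess_format("mols.mol2.gz", {"mol2"}))     # ('mol2', 'gzip')
print(guess_format("notes.unknown"))              # ('SMILES', None)
```

The sketch only reports the compression type; it does not check whether that compression is actually supported for the detected format.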

You will need to use "-R implementation=chemfp" to enable zst support for the
SDF format.

You may instead specify the file format by name (see below), which is
especially important when reading from stdin, which has no associated filename
extension.

These specially supported filename extensions are:

   File Type    Extension(s)
   ==========   =============
     SMILES     can, ism, isosmi, smi, usm
      SDF       sdf
     InChI      inchi

The format can also be specified by name using the '--in' option:

   File Type    Format name (append .gz or .zst if compressed)
   ==========   ===========
     SMILES     smi, can, usm
      SDF       sdf
     InChI      inchi

The input format parsers can be configured with the "-R" option. For example,
the following reader arguments tell the SMILES readers that the fields are
whitespace delimited and the first line is a header.

   -R delimiter=whitespace -R has_header=true
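As a rough illustration of how repeated "-R NAME=VALUE" options, plus the --delimiter and --has-header shortcuts (which the --help text says force '-R delimiter=VALUE' and '-R has_header=1'), could be collected into one set of reader arguments, here is a hypothetical Python sketch. The function name parse_reader_args is an assumption, all values are kept as strings, and how chemfp converts values such as "true" or "1" is not shown.

```python
def parse_reader_args(r_options, delimiter=None, has_header=False):
    """Hypothetical: collect '-R NAME=VALUE' strings into a dict of strings."""
    reader_args = {}
    for option in r_options:
        # Split at the first "=" only, so values may themselves contain "=".
        name, sep, value = option.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {option!r}")
        reader_args[name] = value  # in this sketch, a later -R with the same NAME wins
    # The shortcut options force the matching reader arguments.
    if delimiter is not None:
        reader_args["delimiter"] = delimiter
    if has_header:
        reader_args["has_header"] = "1"
    return reader_args


print(parse_reader_args(["delimiter=whitespace", "has_header=true"]))
# {'delimiter': 'whitespace', 'has_header': 'true'}
```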

All of the readers support the 'options' reader argument, which is a string
passed directly to OBConversion(). This is a compact way to encode all of the
Open Babel parameters used in the conversion. For example, 'ab"text"' would
set option 'a' to True and option 'b' to the string "text".
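The compact 'options' encoding can be illustrated with a small parser. This sketch is based only on the single example given above: a bare letter sets that option to True, and a letter followed by a double-quoted string takes the quoted text as its value. The name parse_options_string is hypothetical, and chemfp's or Open Babel's real handling of quoting, escaping, or multi-character option names may differ.

```python
def parse_options_string(options):
    """Hypothetical parser for the compact option encoding described above."""
    result = {}
    i = 0
    while i < len(options):
        name = options[i]  # each option name is a single character
        i += 1
        if i < len(options) and options[i] == '"':
            # A quoted value runs up to the next double quote.
            end = options.find('"', i + 1)
            if end == -1:
                raise ValueError(f"unterminated value for option {name!r}")
            result[name] = options[i + 1:end]
            i = end + 1
        else:
            result[name] = True  # no value: the option is simply switched on
    return result


print(parse_options_string('ab"text"'))  # {'a': True, 'b': 'text'}
```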

The SMILES format parsers use three additional reader arguments:

   * 'delimiter' specifies the delimiter type. The default is 'to-eol'.
     The other values are 'tab', 'whitespace', 'space' and 'native'.
     Use "-R delimiter=native" to match Open Babel's native delimiter
     style, which is 'to-eol'.
   * 'has_header', if true, skips the first line of the SMILES
     file (because it is a header line).
   * 'cxsmiles' describes how to handle CXSMILES extensions. Open
     Babel does not handle CXSMILES. The default (true) removes
     the extension before processing. If false, any extension is
     treated as part of the identifier.
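The following Python sketch shows one way to interpret the three SMILES reader arguments together. It is not chemfp's implementation. The per-style splitting rules (for 'to-eol', the SMILES ends at the first space or tab and the id is the rest of the line), the simplified CXSMILES pattern, and the name read_smiles_records are all assumptions made for illustration.

```python
import re

# Simplified CXSMILES extension: a space and "|...|" right after the SMILES.
# Assumption: the extension itself contains no "|" characters.
_CXSMILES_EXTENSION = re.compile(r"^(\S+) \|[^|]*\|")


def read_smiles_records(lines, delimiter="to-eol", has_header=False, cxsmiles=True):
    """Hypothetical sketch: yield (smiles, id) pairs; id is None if missing."""
    if delimiter == "native":
        delimiter = "to-eol"  # described as Open Babel's native style
    lines = iter(lines)
    if has_header:
        next(lines, None)  # the first line is a header, not a record
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if cxsmiles:
            # Drop " |...|" so the extension does not become part of the id.
            line = _CXSMILES_EXTENSION.sub(r"\1", line, count=1)
        if delimiter == "to-eol":
            fields = re.split(r"[ \t]+", line, maxsplit=1)
        elif delimiter == "whitespace":
            fields = re.split(r"[ \t]+", line.strip())
        elif delimiter == "tab":
            fields = line.split("\t")
        elif delimiter == "space":
            fields = line.split(" ")
        else:
            raise ValueError(f"unsupported delimiter: {delimiter!r}")
        yield fields[0], (fields[1] if len(fields) > 1 else None)


lines = ["smiles id\n", "CCO |$;;$| ethanol\n", "c1ccccc1\tbenzene\n"]
for smiles, rec_id in read_smiles_records(lines, has_header=True):
    print(smiles, rec_id)
```

With cxsmiles=False, the same "CCO |$;;$| ethanol" line gives the id "|$;;$| ethanol", matching the description of that setting.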

The SDF format parser supports one additional reader argument:

   * 'implementation': if "openbabel" or "native", use Open Babel's
     native SDF parser. If "chemfp", use chemfp's own implementation
     to find SDF records, which are then passed to Open Babel for
     parsing. This gives more fine-grained error reporting and
     supports zst compression, with similar performance.
   (Note: Open Babel supports additional options.)
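The "chemfp" implementation finds SD record boundaries before Open Babel parses each record. The Python sketch below illustrates only that general idea, together with the --id-tag option (take the record id from a data item instead of the title line). The functions iter_sdf_records and get_record_id are hypothetical and simplified; they are not chemfp's implementation, and the sample records are abbreviated rather than valid molfiles.

```python
def iter_sdf_records(text):
    """Yield the text of each SD record, including its "$$$$" terminator line.

    Simplification: trailing text without a final "$$$$" line is ignored.
    """
    record = []
    for line in text.splitlines(keepends=True):
        record.append(line)
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(record)
            record = []


def get_record_id(record, id_tag=None):
    """Return the title (first) line, or the first value line of the id_tag item."""
    lines = record.splitlines()
    if id_tag is None:
        return lines[0] if lines else None
    # Data items follow the connection table, which ends with "M  END".
    start = next((i for i, line in enumerate(lines) if line.startswith("M  END")), 0)
    wanted = f"<{id_tag}>"
    for i in range(start, len(lines)):
        if lines[i].startswith(">") and wanted in lines[i]:
            return lines[i + 1] if i + 1 < len(lines) else ""
    return None  # the tag is not present in this record


sd_text = ("ethanol\n  sketch\n\nM  END\n> <ID>\nCHEM-1\n\n$$$$\n"
           "water\n  sketch\n\nM  END\n$$$$\n")
for record in iter_sdf_records(sd_text):
    print(get_record_id(record), get_record_id(record, id_tag="ID"))
```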

The InChI format parser supports one additional reader argument:

   * 'delimiter' works the same as it does for the SMILES formats

In addition, you may specify any of the following Open Babel formats, either
by format name or by reading a filename whose extension is one of the format
names, optionally with a .gz suffix. Zstandard compression is not supported by
the native Open Babel reader.

 Format   Description and options
========= ==========================
 CONFIG   DL-POLY CONFIG
 CONTCAR  VASP format
            s  Output single bonds only
            b  Disable bonding entirely
 CONTFF   MDFF format
 HISTORY  DL-POLY HISTORY
  MDFF    MDFF format
 POSCAR   VASP format
            s  Output single bonds only
            b  Disable bonding entirely
  POSFF   MDFF format
  VASP    VASP format
            s  Output single bonds only
            b  Disable bonding entirely
 abinit   ABINIT Output Format
            s  Output single bonds only
            b  Disable bonding entirely
 acesout  ACES output format
            s  Output single bonds only
            b  Disable bonding entirely
   acr    ACR format
 adfband  ADF Band output format
 adfdftb  ADF DFTB output format
 adfout   ADF output format
            s  Output single bonds only
            b  Disable bonding entirely
   alc    Alchemy format
 aoforce  Turbomole AOFORCE output format
   arc    Accelrys/MSI Biosym/Insight II CAR format
            s  Output single bonds only
            b  Disable bonding entirely
  axsf    XCrySDen Structure Format
            s  Output single bonds only
            b  Disable bonding entirely
   bgf    MSI BGF format
   box    Dock 3.5 Box format
   bs     Ball and Stick format
 c09out   Crystal 09 output format
            s  Consider single bonds only
  c3d1    Chem3D Cartesian 1 format
  c3d2    Chem3D Cartesian 2 format
 caccrt   Cacao Cartesian format
            s  Output single bonds only
            b  Disable bonding entirely
   car    Accelrys/MSI Biosym/Insight II CAR format
            s  Output single bonds only
            b  Disable bonding entirely
 castep   CASTEP format
   ccc    CCC format
 cdjson   ChemDoodle JSON
            c  <num>  coordinate multiplier (default: 20)
   cdx    ChemDraw binary format
            m  read molecules only; no reactions
            d  output CDX tree to OBText object
  cdxml   ChemDraw CDXML format
   cif    Crystallographic Information File
            s  Output single bonds only
            b  Disable bonding entirely
            B  Use bonds listed in CIF file from _geom_bond_etc records (overrides option b)
   ck     ChemKin format
            f  <file> File with standard thermo data: default therm.dat
            z  Use standard thermo only
            L  Reactions have labels (Usually optional)
   cml    Chemical Markup Language
            2  read 2D rather than 3D coordinates if both provided
  cmlr    CML Reaction format
   cof    Culgi object file format
  crk2d   Chemical Resource Kit diagram(2D)
  crk3d   Chemical Resource Kit 3D format
   ct     ChemDraw Connection Table format
   cub    Gaussian cube format
            b  no bonds
            s  no multiple bonds
  cube    Gaussian cube format
            b  no bonds
            s  no multiple bonds
 dallog   DALTON output format
            s  Output single bonds only
 dalmol   DALTON input format
            s  Output single bonds only
            b  Disable bonding entirely
   dat    Generic Output file format
            s  Output single bonds only
            b  Disable bonding entirely
  dmol    DMol3 coordinates format
            s  Output single bonds only
            b  Disable bonding entirely
   dx     OpenDX cube format for APBS
   ent    Protein Data Bank format
            s  Output single bonds only
            b  Disable bonding entirely
            c  Ignore CONECT records
  exyz    Extended XYZ cartesian coordinates format
            s  Output single bonds only
            b  Disable bonding entirely
   fa     FASTA format
            1  Output single-stranded DNA
            t  <turns>  Use the specified number of base pairs per turn (e.g., 10)
            s  Output single bonds only
            b  Disable bonding entirely
  fasta   FASTA format
            1  Output single-stranded DNA
            t  <turns>  Use the specified number of base pairs per turn (e.g., 10)
            s  Output single bonds only
            b  Disable bonding entirely
   fch    Gaussian formatted checkpoint file format
  fchk    Gaussian formatted checkpoint file format
   fck    Gaussian formatted checkpoint file format
  feat    Feature format
            s  Output single bonds only
            b  Disable bonding entirely
 fhiaims  FHIaims XYZ format
            s  Output single bonds only
            b  Disable bonding entirely
  fract   Free Form Fractional format
            s  Output single bonds only
            b  Disable bonding entirely
   fs     Fastsearch format
            t  # Do similarity search:#mols or # as min Tanimoto
            a  Add Tanimoto coeff to title in similarity search
            l  # Maximum number of candidates. Default<4000>
            e  Exact match
               Alternative to using 'exact' in the -s parameter
            n  No further SMARTS filtering after fingerprint phase
   fsa    FASTA format
            1  Output single-stranded DNA
            t  <turns>  Use the specified number of base pairs per turn (e.g., 10)
            s  Output single bonds only
            b  Disable bonding entirely
   g03    Gaussian Output
            s  Output single bonds only
            b  Disable bonding entirely
   g09    Gaussian Output
            s  Output single bonds only
            b  Disable bonding entirely
   g16    Gaussian Output
            s  Output single bonds only
            b  Disable bonding entirely
   g92    Gaussian Output
            s  Output single bonds only
            b  Disable bonding entirely
   g94    Gaussian Output
            s  Output single bonds only
            b  Disable bonding entirely
   g98    Gaussian Output
            s  Output single bonds only
            b  Disable bonding entirely
   gal    Gaussian Output
            s  Output single bonds only
            b  Disable bonding entirely
   gam    GAMESS Output
            s  Output single bonds only
            b  Disable bonding entirely
            c  Read multiple conformers
 gamess   GAMESS Output
            s  Output single bonds only
            b  Disable bonding entirely
            c  Read multiple conformers
  gamin   GAMESS Input
 gamout   GAMESS Output
            s  Output single bonds only
            b  Disable bonding entirely
            c  Read multiple conformers
   got    GULP format
   gpr    Ghemical format
   gro    GRO format
            s  Consider single bonds only
  gukin   GAMESS-UK Input
 gukout   GAMESS-UK Output
  gzmat   Gaussian Z-Matrix Input
            s  Output single bonds only
            b  Disable bonding entirely
   hin    HyperChem HIN format
   inp    GAMESS Input
   ins    ShelX format
            s  Output single bonds only
            b  Disable bonding entirely
   jin    Jaguar input format
            s  Output single bonds only
            b  Disable bonding entirely
  jout    Jaguar output format
            s  Output single bonds only
            b  Disable bonding entirely
   log    Generic Output file format
            s  Output single bonds only
            b  Disable bonding entirely
  lpmd    LPMD format
            s  Output single bonds only
            b  Disable bonding entirely
   mae    Maestro format
  maegz   Maestro format
  mcdl    MCDL format
  mcif    Macromolecular Crystallographic Info
   mdl    MDL MOL format
            s  determine chirality from atom parity flags
               The default setting for 2D and 3D is to ignore atom parity and
               work out the chirality based on the bond
               stereochemistry (2D) or coordinates (3D).
               For 0D the default is already to determine the chirality
               from the atom parity.
            S  do not read stereochemistry from 0D MOL files
               Open Babel supports reading and writing cis/trans
               and tetrahedral stereochemistry to 0D MOL files.
               This is an extension to the standard which you can
               turn off using this option.
            T  read title only
            P  read title and properties only
               When filtering an sdf file on title or properties
               only, avoid lengthy chemical interpretation by
               using the T or P option together with the
               copy format.
   ml2    Sybyl Mol2 format
            c  Read UCSF Dock scores saved in comments preceding molecules
  mmcif   Macromolecular Crystallographic Info
   mmd    MacroModel format
  mmod    MacroModel format
   mol    MDL MOL format
            s  determine chirality from atom parity flags
               The default setting for 2D and 3D is to ignore atom parity and
               work out the chirality based on the bond
               stereochemistry (2D) or coordinates (3D).
               For 0D the default is already to determine the chirality
               from the atom parity.
            S  do not read stereochemistry from 0D MOL files
               Open Babel supports reading and writing cis/trans
               and tetrahedral stereochemistry to 0D MOL files.
               This is an extension to the standard which you can
               turn off using this option.
            T  read title only
            P  read title and properties only
               When filtering an sdf file on title or properties
               only, avoid lengthy chemical interpretation by
               using the T or P option together with the
               copy format.
  mol2    Sybyl Mol2 format
            c  Read UCSF Dock scores saved in comments preceding molecules
  mold    Molden format
            b  no bonds
            s  no multiple bonds
 molden   Molden format
            b  no bonds
            s  no multiple bonds
  molf    Molden format
            b  no bonds
            s  no multiple bonds
   moo    MOPAC Output format
            s  Output single bonds only
            b  Disable bonding entirely
   mop    MOPAC Cartesian format
            s  Output single bonds only
            b  Disable bonding entirely
 mopcrt   MOPAC Cartesian format
            s  Output single bonds only
            b  Disable bonding entirely
  mopin   MOPAC Internal
 mopout   MOPAC Output format
            s  Output single bonds only
            b  Disable bonding entirely
   mpc    MOPAC Cartesian format
            s  Output single bonds only
            b  Disable bonding entirely
   mpo    Molpro output format
            s  Output single bonds only
            b  Disable bonding entirely
  mpqc    MPQC output format
            s  Output single bonds only
            b  Disable bonding entirely
   mrv    Chemical Markup Language
            2  read 2D rather than 3D coordinates if both provided
   msi    Accelrys/MSI Cerius II MSI format
   nwo    NWChem output format
            s  Output single bonds only
            f  Overwrite molecule if more than one
               calculation with different molecules
               is present in the output file
               (last calculation will be preferred)
            b  Disable bonding entirely
  orca    ORCA output format
            s  Output single bonds only
            b  Disable bonding entirely
   out    Generic Output file format
            s  Output single bonds only
            b  Disable bonding entirely
 outmol   DMol3 coordinates format
            s  Output single bonds only
            b  Disable bonding entirely
 output   Generic Output file format
            s  Output single bonds only
            b  Disable bonding entirely
   pc     PubChem format
 pcjson   PubChem JSON
            s  disable stereo perception and just read stereo information from input
   pcm    PCModel Format
   pdb    Protein Data Bank format
            s  Output single bonds only
            b  Disable bonding entirely
            c  Ignore CONECT records
  pdbqt   AutoDock PDBQT format
            b  Disable automatic bonding
            d  Input file is in dlg (AutoDock docking log) format
   png    PNG 2D depiction
            y  <additional chunk ID> Look also in chunks with specified ID
   pos    POS cartesian coordinates format
            s  Output single bonds only
            b  Disable bonding entirely
   pqr    PQR format
            s  Output single bonds only
            b  Disable bonding entirely
   pqs    Parallel Quantum Solutions format
  prep    Amber Prep format
  pwscf   PWscf format
  qcout   Q-Chem output format
            s  Output single bonds only
            b  Disable bonding entirely
   res    ShelX format
            s  Output single bonds only
            b  Disable bonding entirely
  rsmi    Reaction SMILES format
   rxn    MDL RXN format
   sd     MDL MOL format
            s  determine chirality from atom parity flags
               The default setting for 2D and 3D is to ignore atom parity and
               work out the chirality based on the bond
               stereochemistry (2D) or coordinates (3D).
               For 0D the default is already to determine the chirality
               from the atom parity.
            S  do not read stereochemistry from 0D MOL files
               Open Babel supports reading and writing cis/trans
               and tetrahedral stereochemistry to 0D MOL files.
               This is an extension to the standard which you can
               turn off using this option.
            T  read title only
            P  read title and properties only
               When filtering an sdf file on title or properties
               only, avoid lengthy chemical interpretation by
               using the T or P option together with the
               copy format.
 siesta   SIESTA format
 smiles   SMILES format
            a  Preserve aromaticity present in the SMILES
               This option should only be used if reading aromatic SMILES
               generated by the same version of Open Babel. Any other
               use will lead to undefined behavior. The advantage of this
               option is that it avoids aromaticity perception, thus speeding
               up reading SMILES.
            S  Clean stereochemistry
               By default, stereochemistry is accepted as given. If you wish
               to clean up stereochemistry (e.g. by removing tetrahedral
               stereochemistry where two of the substituents are identical)
               then specifying this option will reperceive stereocenters.
   smy    SMILES format using Smiley parser
   sy2    Sybyl Mol2 format
            c  Read UCSF Dock scores saved in comments preceding molecules
   t41    ADF TAPE41 format
            s  Output single bonds only
            b  Disable bonding entirely
   tdd    Thermo format
            e  Terminate on "END"
  text    Read and write raw text
  therm   Thermo format
            e  Terminate on "END"
  tmol    TurboMole Coordinate format
            s  Output single bonds only
            b  Disable bonding entirely
            a  Input in Angstroms
   txt    Title format
  txyz    Tinker XYZ format
            s  Generate single bonds only
 unixyz   UniChem XYZ format
            s  Output single bonds only
            b  Disable bonding entirely
  vmol    ViewMol format
            s  Output single bonds only
            b  Disable bonding entirely
   wln    Wiswesser Line Notation
   xml    General XML format
            n  Read objects of first namespace only
   xsf    XCrySDen Structure Format
            s  Output single bonds only
            b  Disable bonding entirely
   xyz    XYZ cartesian coordinates format
            s  Output single bonds only
            b  Disable bonding entirely
   yob    YASARA.org YOB format

You will need to consult the Open Babel documentation (see
https://openbabel.org/wiki/List_of_extensions) and implementation for full
details about how these options work.