Stanford University Libraries

Chemical Literature (Chem 184/284)
University of California at Santa Barbara

Lecture 14: Chemical Abstracts Registry File, Part 1

Searching for Chemical Substances by Registry Number

  • In the CA file, searching by Registry Number (RN) is the best way to obtain documents dealing with specific chemical substances.
  • You may obtain RNs from a variety of sources and search them directly in the CA File; RNs are indexed in the Basic Index of the CA File.
  • However, Registry Numbers sometimes become obsolete and are deleted or replaced.
  • Deleted and replaced Registry Numbers remain attached to the CA records in which they were originally indexed.
  • So, searching directly in the CA File with a single RN may miss relevant records indexed under an older number.

Crossover from the Registry File

  • The online Registry file contains records for each chemical substance, including both the current and deleted or replaced Registry Numbers.
  • Therefore, the best way to search for individual substances is to locate them in the Registry File and cross them over to the CA File. Fortunately, crossover is easy.
  • Crossover can be as simple as three commands:
    => s aspirin/cn
    L1      1 ASPIRIN/CN
    => file ca
    => s L1
    L2      12850  L1
    
  • All RNs in the L1 answer set, including any deleted or replaced numbers in the aspirin record, are searched automatically.
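The sketch below illustrates why crossover is more thorough than a direct RN search. It is a toy model, not how STN is implemented: the Registry record, the CA "file", and the placeholder numbers RN-A, RN-A-OLD, and RN-B are all hypothetical.

```python
from dataclasses import dataclass, field

# Toy model only: RN-A, RN-A-OLD, RN-B and the doc IDs are hypothetical
# placeholders, not real CAS data.


@dataclass
class RegistryRecord:
    rn: str                                           # current RN
    old_rns: list[str] = field(default_factory=list)  # deleted/replaced RNs


# Toy CA file: each document lists the RNs it was indexed under.
CA_FILE = {
    "doc1": {"RN-A"},
    "doc2": {"RN-A"},
    "doc3": {"RN-A-OLD"},  # indexed before RN-A-OLD was replaced by RN-A
    "doc4": {"RN-B"},      # an unrelated substance
}


def search_rn(rn: str) -> set[str]:
    """Direct RN search in the CA file: matches one number only."""
    return {doc for doc, rns in CA_FILE.items() if rn in rns}


def crossover(answer_set: list[RegistryRecord]) -> set[str]:
    """OR together every current and old RN in a Registry answer set."""
    hits: set[str] = set()
    for record in answer_set:
        for rn in [record.rn, *record.old_rns]:
            hits |= search_rn(rn)
    return hits


l1 = [RegistryRecord(rn="RN-A", old_rns=["RN-A-OLD"])]
print(sorted(search_rn("RN-A")))  # ['doc1', 'doc2']
print(sorted(crossover(l1)))      # ['doc1', 'doc2', 'doc3']
```

The direct search misses doc3, which was indexed only under the old number; the crossover picks it up.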

Registry File Records

  • A Registry File record is created for each distinct substance: compounds, mixtures, ions, alloys, etc.
  • Variations in stereochemistry (including undefined stereochemistry) and isotopic composition get their own Registry record as well.
  • All Registry File records contain the Registry Number and systematic chemical name of the substance.
  • Where applicable, the record also contains synonyms for the chemical name, the molecular formula, and the chemical structure.
  • Compounds with rings will have ring system data, like that found in the Ring System Handbook.
  • Certain substances, such as alloys, polymers, proteins, and nucleic acids, have additional information appropriate to their type.
  • The record also lists all the STN files in which the Registry Number appears, and how many times it appears in the CA, CAplus, and CAOLD files.
  • See the Registry file summary sheet at http://info.cas.org/ONLINE/DBSS/registryss.html for examples of Registry file records and a list of the searchable and displayable fields.
  • See Using the CAS Registry File on STN at http://www.cas.org/ACAD2/cover.html for more information.

Searching the Registry File:

Chemical Names
  • The Basic Index of the Registry File consists of chemical name fragments from the Index Name (/in) and other chemical names (/cn), plus molecular formulas and Registry Numbers.
  • Proximity operators and right-hand truncation may be used in the Basic Index.
  • Chemical Name Fragmentation Example
    dicyclopenta[def,pqr]tetraphenylene-1,8-dione
    breaks into:
    • dicyclopenta (plus di, cyclo, penta, dicyclo, cyclopenta)
    • [def,pqr]
    • tetraphenylene
    • 1,8 (plus 1 and 8)
    • dione (plus di and one)
  • Trade names and the like are usually not fragmented (except at spaces and punctuation).
  • Combinations of name fragments (using the (L) operator) can be very powerful when you are not certain of a compound's full name but can make an educated guess about its nomenclature. They work best combined with molecular formula (/mf) searching.
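As a rough illustration of the fragmentation example above, this hypothetical Python function splits a systematic name at bracketed descriptors and hyphens and also breaks a locant list into its individual numbers. It reproduces only the primary fragments from the example; the sub-word fragments (di, cyclo, penta, and so on) depend on a morpheme dictionary that is not modeled here, so treat this as a sketch rather than the Registry File's actual algorithm.

```python
import re


def primary_fragments(name: str) -> list[str]:
    """Hypothetical sketch: split a systematic name into primary fragments.

    Bracketed fusion descriptors such as [def,pqr] are kept whole, and a
    locant list such as 1,8 also yields its individual locants. Sub-word
    fragments (di, cyclo, penta, ...) are deliberately not modeled.
    """
    fragments: list[str] = []
    # A capturing group makes re.split keep the bracketed pieces.
    for part in re.split(r"(\[[^\]]*\])", name):
        if not part:
            continue
        if part.startswith("["):
            fragments.append(part)
            continue
        for word in part.split("-"):
            if not word:
                continue
            fragments.append(word)
            locants = word.split(",")
            if len(locants) > 1 and all(loc.isdigit() for loc in locants):
                fragments.extend(locants)
    return fragments


print(primary_fragments("dicyclopenta[def,pqr]tetraphenylene-1,8-dione"))
# ['dicyclopenta', '[def,pqr]', 'tetraphenylene', '1,8', '1', '8', 'dione']
```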
Chemical Name Field (/cn)
  • This index contains the complete chemical name from the IN field as well as all the synonyms in the record (e.g., trade names, common names, and old CA index names).
  • Note that not all possible synonyms are listed, and many compounds are listed only under their systematic names.
  • Remember that stereoisomers, mixtures, etc. each get their own Registry Number.
  • EXPAND is very useful in /cn searches; it lets you check the form of a name without wasting money on incorrect search terms.
  • Remember that the E-numbers (E1, E2, and so on) appear in strict left-to-right alphanumeric order: numbers sort before letters, and spaces and other punctuation sort before both.
  • Examples:
    => e benzene, 1,2-dimethyl-/cn
    
    E1           1     BENZENE, 1,2-DIMETHYL(OCTADECYLOXY)-/CN
    E2           1     BENZENE, 1,2-DIMETHYL(PHENYLETHYL)-/CN
    E3           1 --> BENZENE, 1,2-DIMETHYL-/CN
    E4           1     BENZENE, 1,2-DIMETHYL-, ALUMINUM COMPLEX/CN
    E5           1     BENZENE, 1,2-DIMETHYL-, ALUMINUM-TITANIUM COMPLEX/CN
    E6          14     BENZENE, 1,2-DIMETHYL-, BORON COMPLEX/CN
    E7           2     BENZENE, 1,2-DIMETHYL-, BORON-IRON-TUNGSTEN COMPLEX/CN
    E8           1     BENZENE, 1,2-DIMETHYL-, BORON-MOLYBDENUM-TUNGSTEN COMP
                       LEX/CN
    E9           3     BENZENE, 1,2-DIMETHYL-, BORON-RHODIUM COMPLEX/CN
    E10          1     BENZENE, 1,2-DIMETHYL-, CADMIUM COMPLEX/CN
    E11          1     BENZENE, 1,2-DIMETHYL-, CESIUM COMPLEX/CN
    
    => e "uridine, 5'-amino"/cn
    
    E1           1     URIDINE, 5'-ACETATE CYCLIC 2',3'-PHOSPHATE/CN
    E2           1     URIDINE, 5'-ACRYLATE/CN
    E3           0 --> URIDINE, 5'-AMINO/CN
    E4           1     URIDINE, 5'-AMINO-2',3',5'-TRIDEOXY-3'-FLUORO-/CN
    E5           1     URIDINE, 5'-AMINO-2',3',5'-TRIDEOXY-5-ETHYL-/CN
    E6           1     URIDINE, 5'-AMINO-2',3'-DIDEHYDRO-2',3',5'-TRIDEOXY-/C
                       N
    E7           1     URIDINE, 5'-AMINO-2',3'-DIDEHYDRO-2',3',5'-TRIDEOXY-, 
                       MONOACETATE (SALT)/CN
    E8           1     URIDINE, 5'-AMINO-2',5'-DIDEOXY-/CN
    E9           1     URIDINE, 5'-AMINO-2',5'-DIDEOXY-, MERCURY COMPLEX/CN
    E10          1     URIDINE, 5'-AMINO-2',5'-DIDEOXY-, PLATINUM COMPLEX/CN
    E11          1     URIDINE, 5'-AMINO-2',5'-DIDEOXY-5-(TRIMETHYLSILYL)-/CN
    
    => e hexane, 2,2,5,5-tetramethyl-/cn
    
    E1           1     HEXANE, 2,2,5,5-TETRAKIS(METHYLSELENO)-/CN
    E2           1     HEXANE, 2,2,5,5-TETRAKIS(P-(2,3-EPOXYPROPOXY)PHENYL)-/
                       CN
    E3           1 --> HEXANE, 2,2,5,5-TETRAMETHYL-/CN
    E4           1     HEXANE, 2,2,5,5-TETRAMETHYL-3,4-BIS(METHYLENE)-/CN
    E5           1     HEXANE, 2,2,5,5-TETRAMETHYL-3,4-BIS(METHYLENE)-, IRON 
                       COMPLEX/CN
    E6           1     HEXANE, 2,2,5,5-TETRAMETHYL-3,4-DIPHENYL-3,4-DI-1-PYRE
                       NYL-/CN
    E7           1     HEXANE, 2,2,5,5-TETRAMETHYL-3-METHYLENE-4-(NITROMETHYL
                       ENE)-, RADICAL ION(1-)/CN
    E8           1     HEXANE, 2,2,5,5-TETRAMETHYL-3-PHENYL-/CN
    E9           1     HEXANE, 2,2,5-TRIMETHYL-/CN
    E10          1     HEXANE, 2,2,5-TRIMETHYL-, LITHIUM COMPLEX/CN
    E11          1     HEXANE, 2,2,5-TRIMETHYL-3,4-BIS(METHYLENE)-/CN
    
  • Note that names must be searched EXACTLY as they appear in the record, including punctuation, Greek letters, etc.
  • In general, when searching names with complex punctuation, it is best to put them in quotation marks.
    => s "benzo(a)thiophene"/cn
    
  • You can right-truncate names, much like an exact title search in MELVYL. This is an effective way to retrieve many variations of a compound.
  • Remember, however, that there is an upper limit on the number of terms retrievable with truncation.
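The EXPAND ordering rule can be checked against the hexane display above. Assuming STN collates uppercase terms by plain character code (an assumption; the notes only describe the order informally), an ordinary string sort in Python reproduces that display exactly.

```python
# Hexane terms E1-E11 from the EXPAND display above, with wrapped lines
# rejoined and the /CN suffix dropped.
EXPAND_TERMS = [
    "HEXANE, 2,2,5,5-TETRAKIS(METHYLSELENO)-",
    "HEXANE, 2,2,5,5-TETRAKIS(P-(2,3-EPOXYPROPOXY)PHENYL)-",
    "HEXANE, 2,2,5,5-TETRAMETHYL-",
    "HEXANE, 2,2,5,5-TETRAMETHYL-3,4-BIS(METHYLENE)-",
    "HEXANE, 2,2,5,5-TETRAMETHYL-3,4-BIS(METHYLENE)-, IRON COMPLEX",
    "HEXANE, 2,2,5,5-TETRAMETHYL-3,4-DIPHENYL-3,4-DI-1-PYRENYL-",
    "HEXANE, 2,2,5,5-TETRAMETHYL-3-METHYLENE-4-(NITROMETHYLENE)-, RADICAL ION(1-)",
    "HEXANE, 2,2,5,5-TETRAMETHYL-3-PHENYL-",
    "HEXANE, 2,2,5-TRIMETHYL-",
    "HEXANE, 2,2,5-TRIMETHYL-, LITHIUM COMPLEX",
    "HEXANE, 2,2,5-TRIMETHYL-3,4-BIS(METHYLENE)-",
]

# Python compares strings character by character by code point. For the
# characters used here that means space < punctuation < digits < uppercase
# letters, and a term that is a prefix of another sorts first.
reordered = sorted(reversed(EXPAND_TERMS))
print(reordered == EXPAND_TERMS)  # True
```

Note that plain code-point order does not put every punctuation mark before digits (":" and "[" sort after them), so this only shows that the displays above are consistent with such a sort.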
Heading Parent Field (/hp)
  • You can search the “parent” compound part of a chemical name separately in the /hp field. This has the advantage of not requiring truncation to find a whole family of compounds.
  • Remember that the choice of heading parent can be tricky. Benzenamine is the heading parent for anilines, not benzene. Benzoic acid is the heading parent for benzene with a carboxylic acid group attached. So which is the heading parent for 4-aminobenzoic acid? Answer: benzoic acid, because CAS ranks the carboxylic acid group above the amino group when choosing the parent; that order of precedence is not something you can guess intuitively.
  • You can combine /hp searches with chemical name fragments using the (L) operator.
  • An example of an EXPAND in the Heading Parent field:
    => e 1,3,4-oxadiazole/hp
    
    E1        1199     1,3,4-OXADIAZOL-2-AMINE/HP
    E2          47     1,3,4-OXADIAZOL-2-OL/HP
    E3        4098 --> 1,3,4-OXADIAZOLE/HP
    E4        1680     1,3,4-OXADIAZOLE-2(3H)-THIONE/HP
    E5           1     1,3,4-OXADIAZOLE-2(5H)-THIONE/HP
    E6           3     1,3,4-OXADIAZOLE-2,3(2H)-DICARBOXYLIC ACID/HP
    E7           1     1,3,4-OXADIAZOLE-2,5-D2/HP
    E8           1     1,3,4-OXADIAZOLE-2,5-DIACETIC ACID/HP
    E9           1     1,3,4-OXADIAZOLE-2,5-DIACETONITRILE/HP
    E10         51     1,3,4-OXADIAZOLE-2,5-DIAMINE/HP
    E11          1     1,3,4-OXADIAZOLE-2,5-DIBUTANOIC ACID/HP
    
Chemical Name Segment Field (/cns)
  • This field lists only the “basic” name segments. Its advantage is that you can use left-hand truncation in this field.
  • This can be very powerful for families of chemicals with similar roots in their names, e.g.
    => s ?cillin?/cns
    => s ?sterone?/cns
    => s ?porph?/cns
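To make the effect of left- and right-hand truncation concrete, the sketch below translates a term with a leading or trailing ? into a regular expression and filters a short list of name segments. The segment list and the helper names are hypothetical, and only the unlimited ? truncation used in these notes is modeled.

```python
import re


def truncation_to_regex(term: str) -> re.Pattern:
    """Hypothetical helper: '?' at either end means any number of characters."""
    core = term.strip("?")
    prefix = ".*" if term.startswith("?") else ""
    suffix = ".*" if term.endswith("?") else ""
    return re.compile(prefix + re.escape(core) + suffix, re.IGNORECASE)


# Hypothetical name segments standing in for /cns index entries.
SEGMENTS = ["penicillin", "ampicillin", "testosterone", "androsterone",
            "porphyrin", "protoporphyrin"]


def matches(term: str) -> list[str]:
    """Segments that match the whole truncated term."""
    pattern = truncation_to_regex(term)
    return [s for s in SEGMENTS if pattern.fullmatch(s)]


print(matches("?cillin?"))  # ['penicillin', 'ampicillin']
print(matches("porph?"))    # ['porphyrin']  (right truncation only)
print(matches("?porph?"))   # ['porphyrin', 'protoporphyrin']
```

Only the left-truncated form reaches protoporphyrin, which is why left-hand truncation in /cns is useful for name families.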
    
Chemical Formula Searching
  • The REGISTRY file makes chemical formula information searchable in a number of different ways.
  • Remember that the molecular formula rules of the printed Chemical Substance Index can affect searches here too. Example: copper(II) nitrate is Cu.2HNO3.
  • The main formula field is Molecular Formula (/mf).
  • Formulas are written in Hill order, just as in the printed Molecular Formula index.
  • Remember that many substances can have the same MF, so MF searches are usually combined with some other search terms (e.g. name fragments) to narrow the search.
  • Multicomponent substances are written with a period separating the component formulas (see copper nitrate above).
  • Polymers are written as, for example, (C2H4)x or (C2H4)n: x for a polymer indexed by its monomers, n for one indexed by its structural repeating unit.
  • Copolymers combine the above, e.g. (C4H6)x.(C8H8)x for styrene-butadiene copolymers.
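The Hill order convention can be written as a short function. This sketch assumes the standard Hill rules (carbon first, then hydrogen, then the remaining elements alphabetically; with no carbon, everything alphabetically) and joins multicomponent substances with a period and a multiplier, as in Cu.2HNO3. The function names are hypothetical, and components are kept in the order given rather than reordered by any CAS rule.

```python
def hill_formula(counts: dict[str, int]) -> str:
    """Write one component's formula in Hill order; a count of 1 is omitted."""
    others = sorted(e for e in counts if e not in ("C", "H"))
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else []) + others
    else:
        # No carbon: every element, hydrogen included, is alphabetical.
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] != 1 else "") for e in order)


def multicomponent(components: list[tuple[int, dict[str, int]]]) -> str:
    """Join (multiplier, counts) components with '.', in the order given."""
    parts = []
    for multiplier, counts in components:
        prefix = str(multiplier) if multiplier > 1 else ""
        parts.append(prefix + hill_formula(counts))
    return ".".join(parts)


print(hill_formula({"O": 4, "H": 8, "C": 9}))  # C9H8O4 (aspirin)
print(hill_formula({"O": 3, "N": 1, "H": 1}))  # HNO3
print(multicomponent([(1, {"Cu": 1}), (2, {"H": 1, "N": 1, "O": 3})]))  # Cu.2HNO3
```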
Element Symbol (/els)
  • You can search for the presence of a particular element (in any quantity) by searching its element symbol in the /els field, e.g.:
    => s si/els and n/els
    
  • You can search for a specific number of atoms of a given element by searching the number, a slash, and the element symbol, e.g.:
    => s 6/c and 6/o
    
  • This is a numeric field, so it is range-searchable, e.g. si>=2.
  • You may search for all halogens (with X) or all metals (with M).
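The /els searches above amount to conditions on per-element atom counts. This hypothetical sketch parses single-component formulas and applies rough analogues of the three example searches to a small hand-made list of substances; it says nothing about how STN stores or indexes the field.

```python
import re

# An element symbol followed by an optional whole-number count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula: str) -> dict[str, int]:
    """Atoms per element for a single-component Hill formula."""
    counts: dict[str, int] = {}
    for symbol, number in TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


# A small hand-made list of substances and their molecular formulas.
FORMULAS = {
    "glucose": "C6H12O6",
    "benzene": "C6H6",
    "hexamethyldisilazane": "C6H19NSi2",
}


def query(predicate) -> list[str]:
    """Names of substances whose element counts satisfy the predicate."""
    return [name for name, f in FORMULAS.items() if predicate(element_counts(f))]


# Rough analogues of the searches shown above.
print(query(lambda c: "Si" in c and "N" in c))              # s si/els and n/els
print(query(lambda c: c.get("C") == 6 and c.get("O") == 6))  # s 6/c and 6/o
print(query(lambda c: c.get("Si", 0) >= 2))                 # si>=2
```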
Atom Count (/atc) and Element Count (/elc)
  • With these two fields, you can specify the total number of atoms in the molecular formula (/atc) and the number of different elements (/elc).
  • Both are numeric fields and can be range-searched.
  • These can get tricky with multicomponent substances, so check a few answers before relying on a count.
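One reason multicomponent substances are tricky is that a multiplier such as the 2 in Cu.2HNO3 may or may not be applied when atoms are totaled. The notes do not say which convention /atc uses, so this hypothetical sketch computes both; /elc comes out the same either way.

```python
import re

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def component_counts(component: str) -> tuple[int, dict[str, int]]:
    """Split off a leading multiplier (the 2 in 2HNO3) and count atoms."""
    match = re.match(r"(\d*)(.*)", component)
    multiplier = int(match.group(1)) if match.group(1) else 1
    counts: dict[str, int] = {}
    for symbol, number in TOKEN.findall(match.group(2)):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return multiplier, counts


def atom_and_element_counts(formula: str, apply_multipliers: bool) -> tuple[int, int]:
    """Return (total atoms, distinct elements) under one counting convention."""
    total: dict[str, int] = {}
    for component in formula.split("."):
        multiplier, counts = component_counts(component)
        factor = multiplier if apply_multipliers else 1
        for symbol, n in counts.items():
            total[symbol] = total.get(symbol, 0) + n * factor
    return sum(total.values()), len(total)


print(atom_and_element_counts("C9H8O4", True))     # (21, 3)
print(atom_and_element_counts("Cu.2HNO3", True))   # (11, 4)
print(atom_and_element_counts("Cu.2HNO3", False))  # (6, 4)
```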
Element Formula (/elf)
  • This field allows you to search for a particular combination of elements without specifying the stoichiometry.
  • This can be very handy for substances like alloys or ceramic superconductors, where a given family of compounds of interest may have widely varying ratios of constituent elements and fractional stoichiometries.
    => s ba cu la o/elf
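The idea behind /elf is that only the set of elements matters, not their ratios. The sketch below assumes an /elf query matches substances made of exactly the listed elements (check the Registry summary sheet for the field's actual behavior) and uses an illustrative formula with fractional stoichiometry.

```python
import re

# An element symbol followed by an optional count, which may be fractional.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def element_set(formula: str) -> frozenset[str]:
    """Elements present in a formula, ignoring stoichiometry."""
    return frozenset(symbol for symbol, _ in TOKEN.findall(formula))


def elf_match(query: str, formula: str) -> bool:
    """Assumed /elf behavior: the formula has exactly the queried elements."""
    wanted = frozenset(s.capitalize() for s in query.split())
    return element_set(formula) == wanted


# Illustrative formulas (Hill order, no carbon, so alphabetical).
for formula in ["Ba0.15CuLa1.85O4", "CuLa2O4", "Ba2Cu3O7Y"]:
    print(formula, elf_match("ba cu la o", formula))
```

Only the first formula prints True: the second lacks barium and the third contains yttrium instead of lanthanum.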
    
Ring System Data
  • All the data given in the print Ring System Handbook is searchable, including number of rings, size of smallest ring, formula of rings, etc.
  • See the Registry file summary sheet for a list of fields.
  • These can be very useful in finding classes of compounds, or in limiting other searches.
Compound Class Identifier (/ci)
  • The Registry file adds a Compound Class identifier to let you home in on or exclude certain broad classes of compounds, e.g. alloys (ays), polymers (pms), mixtures (mxs).
Component Registry Number (/crn)
  • Remember that each distinct substance gets its own registry number; this includes mixtures and salts.
  • You can search for a substance as part of a mixture by searching its RN in the /crn field.
    => s 51-34-7/crn
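The relationship between a substance and the mixtures or salts containing it can be pictured as each record carrying a list of component RNs. In this hypothetical model (placeholder RNs, not real CAS data), an ordinary RN search finds only the substance's own record, while a /crn search finds the multicomponent records that include it.

```python
# Hypothetical Registry records mapped to their component RNs.
# RN-X, RN-Y, RN-Z, RN-NA and the record names are placeholders.
RECORDS: dict[str, list[str]] = {
    "RN-X": [],                    # the substance itself
    "RN-MIX1": ["RN-X", "RN-Y"],   # a mixture containing RN-X
    "RN-MIX2": ["RN-Y", "RN-Z"],   # a mixture without RN-X
    "RN-SALT": ["RN-X", "RN-NA"],  # a salt of RN-X
}


def search_rn(rn: str) -> list[str]:
    """Ordinary RN search: only the record with exactly this number."""
    return [r for r in RECORDS if r == rn]


def search_crn(rn: str) -> list[str]:
    """Component RN search: records that list this RN as a component."""
    return [r for r, components in RECORDS.items() if rn in components]


print(search_rn("RN-X"))   # ['RN-X']
print(search_crn("RN-X"))  # ['RN-MIX1', 'RN-SALT']
```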
    
File Crossover
  • As mentioned above, you can search a Registry File answer set (L#) directly in the CA file to find all articles dealing with the set of substances.
  • Role identifiers can be used with crossover L-numbers:
    => s aspirin/cn
    L1      1  ASPIRIN
    => file ca
    => s L1/spn
    L2      145 L1 (L)  SPN/RL
    
  • In addition, you may cross over Registry answer sets into any file listed in the Locator Code (/lc) field of the Registry record.
  • Note that some non-CA files assign Registry Numbers by an algorithm and so are not always completely accurate.
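The (L) in the L2 answer above links the role to the RN itself, so a document matches only when SPN is attached to one of the L1 substances. This toy model, with hypothetical records and RCT as a second role code for contrast, compares that with a plain AND of RN and role anywhere in the document.

```python
# Hypothetical CA records: each indexing entry pairs an RN with its roles.
CA_RECORDS: dict[str, list[tuple[str, set[str]]]] = {
    "doc1": [("RN-A", {"SPN"}), ("RN-B", {"RCT"})],
    "doc2": [("RN-A", {"RCT"}), ("RN-B", {"SPN"})],
    "doc3": [("RN-A", {"SPN"})],
}


def crossover_with_role(answer_set: set[str], role: str) -> list[str]:
    """Like L1/spn: the role must be attached to an answer-set RN."""
    return [
        doc for doc, entries in CA_RECORDS.items()
        if any(rn in answer_set and role in roles for rn, roles in entries)
    ]


def naive_and(answer_set: set[str], role: str) -> list[str]:
    """For contrast: the RN and the role anywhere in the same document."""
    return [
        doc for doc, entries in CA_RECORDS.items()
        if any(rn in answer_set for rn, _ in entries)
        and any(role in roles for _, roles in entries)
    ]


l1 = {"RN-A"}
print(crossover_with_role(l1, "SPN"))  # ['doc1', 'doc3']
print(naive_and(l1, "SPN"))            # ['doc1', 'doc2', 'doc3']
```

doc2 is a false drop for the plain AND: it mentions RN-A, but the SPN role belongs to a different substance.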

This page created by Chuck Huber (huber@library.ucsb.edu).