Typical Scenarios Where You Need OakLeaf Systems' Online Code of Federal Regulations XML Web Services

Note: You can access each of the preceding sections from a list on the HELP page. The help page has about 20 links to CFR sections of general interest.


About the U.S. Code of Federal Regulations

"The Code of Federal Regulations (CFR) is a codification of the general and permanent rules published in the Federal Register by the Executive departments and agencies of the Federal Government. The CFR online is a joint project authorized by the publisher, the National Archives and Records Administration's Office of the Federal Register, and the Government Printing Office (GPO) to provide the public with enhanced access to Government information. GPO will continue to make the paper editions of the CFR and Federal Register available through its Superintendent of Documents Sales service."

"The CFR is divided into 50 titles which represent broad areas subject to Federal regulation. Each title is divided into chapters which usually bear the name of the issuing agency. Each chapter is further subdivided into parts covering specific regulatory areas. Large parts may be subdivided into subparts. All parts are organized in sections, and most citations to the CFR will be provided at the section level." (The preceding is from NARA's About the CFR page.)

Note: The Government Printing Office (GPO) provides WAIS access to the CFR and its sections. The WAIS database is quite slow; downloading a CFR section as text or HTML often takes several minutes. The GPO has not explained the performance problems, despite requests. WAIS access to sections of the CFR doesn't appear to be practical at this time.

About the Beta Version of the Electronic CFR Database (eCFR Beta)

"The Electronic Code of Federal Regulations (e-CFR) is a prototype of a currently updated version of the Code of Federal Regulations (CFR). The e-CFR prototype is a demonstration project. It is not an official legal edition of the CFR.The e-CFR prototype is authorized and maintained by the National Archives and Records Administration's (NARA) Office of the Federal Register (OFR) and the Government Printing Office (GPO)."

"The e-CFR consists of two linked databases: the "current Code" and "amendment files." The OFR updates the current Code database according to the effective dates of amendments published in the Federal Register. As amendments become effective, the OFR integrates the changes into the current Code database to display the full text of the currently updated CFR. For future-effective amendments, the OFR inserts hypertext links into the affected sections or parts of the current Code to take users to the pertinent amendment files. The amendment files contain amendatory instructions, the text of amendments (if any) and their effective dates. If the effective date of a regulation falls on a weekend or federal holiday, the amendments will be integrated into the current Code on the next federal business day."

"Most users will prefer to use the HTML option for viewing and downloading e-CFR material. For the convenience of users, e-CFR material is also available in SGML. The SGML option enables users to view and download fully tagged data files, which can be repurposed in other documents and applications. We do not recommend viewing files in SGML since they do not display well on a browser, and do not link to the amendment files." (The preceding is from the GPO's Important Information for the User page.)

Note: Retrieval of the text of a section from the primary e-CFR database (ecfr) requires an average of about 6 to 10 seconds with a fast Internet connection. When the GPO switches to the backup database (ecfrback), retrieval time often increases to two minutes or more.

Structure of the U.S. Code of Federal Regulations

The CFR is organized in a hierarchy of titles, subtitles, chapters, subchapters, parts, subparts, and sections or appendices. Sections contain the text of a regulation; appendices to parts contain explanatory matter. Only a few titles have subtitles, and not all chapters have subchapters. Some subtitles contain parts but no chapters. Roman numerals identify most (but not all) chapters. These inconsistencies add to the difficulty of constructing individual tables of contents (TOCs) as well-formed XML documents. TOCs are the primary method for navigating the OakLeaf CFR database.
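The following Python sketch illustrates one way to nest TOC entries as well-formed XML when intermediate levels are missing. It is not the OakLeaf code: the element names, the label attribute, the build_toc function, and the sample entries (a made-up "Title 99") are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names mapped to nesting depth (outermost = 0).
# Sections and appendices share the innermost depth.
DEPTH = {
    "title": 0, "subtitle": 1, "chapter": 2, "subchapter": 3,
    "part": 4, "subpart": 5, "section": 6, "appendix": 6,
}

def build_toc(entries):
    """Nest (level, label) pairs; a missing intermediate level is skipped."""
    root = ET.Element("toc")
    stack = [(-1, root)]  # (depth, element) for each open ancestor
    for level, label in entries:
        depth = DEPTH[level]
        # Close every open element at the same depth or deeper.
        while stack[-1][0] >= depth:
            stack.pop()
        node = ET.SubElement(stack[-1][1], level, {"label": label})
        stack.append((depth, node))
    return root

# Made-up entries: a part directly under a subtitle, then a chapter.
entries = [
    ("title", "Title 99"),
    ("subtitle", "Subtitle A"),
    ("part", "Part 1"),
    ("section", "1.1"),
    ("chapter", "Chapter I"),
    ("part", "Part 100"),
    ("section", "100.1"),
]
toc = build_toc(entries)
print(ET.tostring(toc, encoding="unicode"))
```

Because each entry is attached to the nearest open ancestor of a shallower depth, a part with no chapter lands directly under its subtitle and the output remains well-formed.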

The SGML text retrieved from the e-CFR database has a header to identify the title/subtitle, chapter/subchapter, part, and section or appendix. (See the sample SGML text on the About page.) The text of the section is unstructured (flat) SGML; that is, there is no hierarchy of paragraphs (a), sub-paragraphs (1), sub-sub-paragraphs (i), and sub-sub-sub-paragraphs (A). In many cases, paragraph levels are concatenated, as in (a)(1), and many sections have missing levels, for instance, (a)...(i) or (1)...(A). Most exhibits and extracts have structure based on headings, so these elements are converted by a process similar to that for section text.
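As an illustration of the problem, the Python sketch below splits concatenated designators such as (a)(1) and guesses the level of each one. It is a heuristic written for this page, not the OakLeaf transformation code; the function names and the tie-breaking rule for ambiguous designators such as (i) are assumptions, and double-letter designators such as (aa) are not handled.

```python
import re

# One designator: digits, lowercase roman numeral, or a single letter.
MARKER = re.compile(r"\((\d+|[ivxlc]+|[a-z]|[A-Z])\)")
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}

def leading_markers(text):
    """Return the run of designators at the start of a paragraph."""
    markers, pos = [], 0
    while True:
        match = MARKER.match(text, pos)
        if not match:
            return markers
        markers.append(match.group(1))
        pos = match.end()

def roman_value(text):
    """Convert a lowercase roman numeral to an integer."""
    total = 0
    for ch, nxt in zip(text, text[1:] + " "):
        value = ROMAN_VALUES[ch]
        total += -value if ROMAN_VALUES.get(nxt, 0) > value else value
    return total

def assign_levels(markers):
    """Guess levels for markers in document order: 1=(a), 2=(1), 3=(i), 4=(A)."""
    result = []
    last_letter = None  # most recent level-1 designator
    last_roman = 0      # value of the most recent level-3 numeral
    prev_level = 0
    for m in markers:
        if m.isdigit():
            level = 2
        elif m.isupper():
            level = 4
        elif not all(ch in ROMAN_VALUES for ch in m):
            level = 1
        else:
            # Ambiguous: could be a letter or a roman numeral.
            roman_ok = roman_value(m) == last_roman + 1
            letter_ok = len(m) == 1 and last_letter is not None \
                and ord(m) == ord(last_letter) + 1
            if roman_ok and letter_ok:
                level = 3 if prev_level == 2 else 1
            elif roman_ok:
                level = 3
            else:
                level = 1 if len(m) == 1 else 3
        if level == 1:
            last_letter, last_roman = m, 0
        elif level == 2:
            last_roman = 0
        elif level == 3:
            last_roman = roman_value(m)
        result.append((m, level))
        prev_level = level
    return result

print(leading_markers("(a)(1) General rule."))
print(assign_levels(["a", "1", "i", "ii", "A", "b", "c"]))
```

A designator that directly follows (h)(2) could be either paragraph (i) or sub-sub-paragraph (i); the sketch picks the roman numeral in that case, which will sometimes be wrong.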

The section text also contains elements such as <EAR>, <EXAMPLE>, and <FTNT>. These elements are collected in an EndNotes XML element at the end of the page. Tables are provided from the e-CFR database as pre-formatted HTML text (<PRE>...</PRE>) and in GPO format for typesetting. This version of the transformation software doesn't generate the XML for tables. Instead, a link is provided to the e-CFR URL for the SGML text of the section. A similar link is provided if the code detects that some content might be missing. (The link returns SGML so you can view the typesetting code for GPO tables.) A future version of the software might attempt to transform the GPO table data to an XML structure and then to an XHTML table.
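The Python sketch below shows one way such elements could be gathered into an EndNotes element. It is an illustration only: the split_end_notes function is hypothetical, the sample text is made up, and the sketch assumes that each element has an explicit closing tag, that the elements are not nested, and that they are removed from the body text once collected.

```python
import re
import xml.etree.ElementTree as ET

# Matches a non-nested <EAR>, <EXAMPLE>, or <FTNT> element with a closing tag.
NOTE_RE = re.compile(r"<(EAR|EXAMPLE|FTNT)>(.*?)</\1>", re.DOTALL)

def split_end_notes(section_text):
    """Return (body without note elements, EndNotes element)."""
    end_notes = ET.Element("EndNotes")

    def collect(match):
        # Copy the note into EndNotes and drop it from the body.
        note = ET.SubElement(end_notes, match.group(1))
        note.text = match.group(2).strip()
        return ""

    body = NOTE_RE.sub(collect, section_text)
    return body, end_notes

# Made-up section text for illustration.
sample = "Text one.<FTNT>Footnote 1.</FTNT> More text.<EXAMPLE>A sample case.</EXAMPLE>"
body, notes = split_end_notes(sample)
print(body)
print(ET.tostring(notes, encoding="unicode"))
```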

The printed version of the 2001 CFR, which is available from the Government Printing Office, has 204 volumes and costs $1,094 for a one-year subscription. The subscription includes back issues of volumes and updates to the volumes as they are printed during the year.

The OakLeaf Demonstration Version of the Electronic CFR Database

The 840-MB OakLeaf CFR SQL Server 2000 development database runs under Windows 2000 Server (Standard Edition) on an 866-MHz Pentium III computer with 512 MB of RAM and a 20-GB UDMA/100 drive. IIS and the components run on a 667-MHz Pentium III box with 512 MB of RAM connected to the Internet by a DSL connection having a maximum upload speed of 128 kbps. Internet performance is limited by the speed of the site's DSL connection, not the hardware.

Full-text search of the XML TOCs and section text is supported by separate full-text indexes of the TOCs (9,179 entries, 700,203 usable keywords, 15 MB) and section text (172,801 entries, 869,105 usable keywords, 107 MB). Searches can be limited to a single title, and TOC searches can be limited to any level of the TOC hierarchy.