How to Use the OakLeaf CFR Full-Text Search Feature

The search page provides default values for all required entries, including a search term that retrieves a typical mix of sections, parts, and chapters. Click Search Now to return a list of hits with the default options. Elements are displayed in reverse hierarchical order: sections, parts, subchapters, chapters, subtitles, and titles. Relevence ranking isn't implemented. Highlighted search terms in the XHTML document might be added to a future version, but doing so requires maintaining session state.

To perform a custom search, modify the following values:

Search Tip: Restricting searches to a single relevent title greatly reduces the response time, especially when you include section text in the search. Typical times to generate the XHTML for a 50-item search against a single title is less than half a second on the development hardware.


Technical Descripiton of the Search Feature

The Search form uses HTTP GET (not POST) to make the query visible in the Address bar. The URL for the default query modified to search in title 47 only is: http://localhost/cfr/results.asp?Term=satellite+NEAR+broadcast&InTOC=on&InSect=on&Title=47+-+Telecommunication&Level=Parts&MaxHits=50.

The search component translates the form data to multiple Transact-SQL SELECT statements. To conserve table space, the database stores TOCs and section text XML documents having a length less than 7,000 characters in varchar fields. Larger documents are stored in text fields. (The CFR is in English so Unicode isn't required for the XML text.) Thus, a UNION query is required to combine the results from the SectText (text) and SectChar varchar(7000) fields, and the corresponding TOC fields. Storing all XML text in text fields would result in a 3-GB database instead of the curren size of less than 1 GB. For additional information on SQL Server 2000's full-text search features, read the "Run Full-Text Database Searches on a Shoestring" article from the December 19, 2001 issue of Fawcette Technical Publications' .NETInsight newsletter.

Following are are the first two of the six SQL statements required to return hits for section text, part TOCs, subchapter TOCs, chapter TOCs, subtitle TOCs, and title TOCs.

SELECT TOP 50 TitleID, PartID, SectionID, Section 
   FROM Sections 
   WHERE CONTAINS(SectText, 'satellite NEAR broadcast') AND TitleID = 47 
UNION SELECT TOP 50 TitleID, PartID, SectionID, Section 
   FROM Sections 
   WHERE CONTAINS(SectChar, 'satellite NEAR broadcast') AND TitleID = 47

SELECT TOP 50 TitleID, SubtitleID, ChapterID, SubchapterID, PartID, Part 
   FROM Parts 
   WHERE CONTAINS(TOCText, 'satellite NEAR broadcast') AND TitleID = 47 
UNION SELECT TOP 50 TitleID, SubtitleID, ChapterID, SubchapterID, PartID, Part 
   FROM Parts 
   WHERE CONTAINS(TOCChar, 'satellite NEAR broadcast') AND TitleID = 47 
         

Following are the statistics (as of 12/31/01) generated by SQL Server's fulltextcatalogproperty function for the two full-text catalogs:

Full-text indexing status for tables of contents:
    Number of entries indexed:  8,660
    Size of index in megabytes: 15
    Number of usable keywords:  714,432

 Full-text indexing status for sections:
    Number of entries indexed:  181,447
    Size of index in megabytes: 148
    Number of usable keywords:  1,288,380