How to Use the OakLeaf CFR Full-Text Search Feature
The search page provides default values for all required entries, including a search term that retrieves a typical mix of sections, parts, and chapters. Click Search Now to return a list of hits with the default options. Elements are displayed in reverse hierarchical order: sections, parts, subchapters, chapters, subtitles, and titles. Relevence ranking isn't implemented. Highlighted search terms in the XHTML document might be added to a future version, but doing so requires maintaining session state.
To perform a custom search, modify the following values:
"search text"
), and separate individual words or phrases by by commas. Terms that use the *
wildcard must be enclosed in double quotation marks, as in "agricultur*"
to find references to "agriculture"
and "agricultural"
. You also can use boolean search expressions with AND
, AND NOT
, OR
, or NEAR
. The tilde (~
is a synonym for NEAR
. Don't use URI special characters, such as &
, +
, or ?
in the search string.
Search Tip: Restricting searches to a single relevent title greatly reduces the response time, especially when you include section text in the search. Typical times to generate the XHTML for a 50-item search against a single title is less than half a second on the development hardware.
Technical Descripiton of the Search Feature
The Search form uses HTTP GET
(not POST
) to make the query visible in the Address bar. The URL for the default query modified to search in title 47 only is: http://localhost/cfr/results.asp?Term=satellite+NEAR+broadcast&InTOC=on&InSect=on&Title=47+-+Telecommunication&Level=Parts&MaxHits=50
.
The search component translates the form data to multiple Transact-SQL SELECT
statements. To conserve table space, the database stores TOCs and section text XML documents having a length less than 7,000 characters in varchar
fields. Larger documents are stored in text
fields. (The CFR is in English so Unicode isn't required for the XML text.) Thus, a UNION
query is required to combine the results from the SectText (text
) and SectChar varchar(7000)
fields, and the corresponding TOC fields. Storing all XML text in text
fields would result in a 3-GB database instead of the curren size of less than 1 GB. For additional information on SQL Server 2000's full-text search features, read the "Run Full-Text Database Searches on a Shoestring" article from the December 19, 2001 issue of Fawcette Technical Publications' .NETInsight newsletter.
Following are are the first two of the six SQL statements required to return hits for section text, part TOCs, subchapter TOCs, chapter TOCs, subtitle TOCs, and title TOCs.
SELECT TOP 50 TitleID, PartID, SectionID, Section FROM Sections WHERE CONTAINS(SectText, 'satellite NEAR broadcast') AND TitleID = 47 UNION SELECT TOP 50 TitleID, PartID, SectionID, Section FROM Sections WHERE CONTAINS(SectChar, 'satellite NEAR broadcast') AND TitleID = 47 SELECT TOP 50 TitleID, SubtitleID, ChapterID, SubchapterID, PartID, Part FROM Parts WHERE CONTAINS(TOCText, 'satellite NEAR broadcast') AND TitleID = 47 UNION SELECT TOP 50 TitleID, SubtitleID, ChapterID, SubchapterID, PartID, Part FROM Parts WHERE CONTAINS(TOCChar, 'satellite NEAR broadcast') AND TitleID = 47
Following are the statistics (as of 12/31/01) generated by SQL Server's fulltextcatalogproperty
function for the two full-text catalogs:
Full-text indexing status for tables of contents: Number of entries indexed: 8,660 Size of index in megabytes: 15 Number of usable keywords: 714,432 Full-text indexing status for sections: Number of entries indexed: 181,447 Size of index in megabytes: 148 Number of usable keywords: 1,288,380