Posted on: September 10, 2015in Blog
4 Methods to Help Improve Your Keyword Search Results
This post explains four best practices for developing better functioning keywords: visibility, dictionary access, iteration/testing, and keyword expansion via concept analytics.
In 1985, the Blair and Maron study, found that keywords alone identified less than 20 percent of the relevant documents in their test case of roughly 350,000 pages of text. Thirty years and a many conversation later, the majority of cases we work continue to rely on keyword searching as the primary means by which to reduce document collections by identifying a relevant, document-review set. Despite these findings and judicial statements about the inefficacy of keyword searching in eDiscovery, the prevailing practice continues to be search, process, review, produce.
Learn how to optimize your keyword search terms with proven best practices and expert tips in this 4-week course.
So, given that keywords are not very effective and that they still remain as the dominant culling tool in eDiscovery, how do we make them function better? There are four keys to developing better functioning keywords. Those keys are: visibility, dictionary access, iteration/testing and keyword expansion via concept analytics.
Too often search terms are created in a vacuum with no insight into what words or phrases actually exist in the dataset. In the past when processing rates exceeded $2000 per gigabyte terms were used to reduce collections prior to processing thereby saving vast amounts of money. Today the market has changed and processing rates are at historic lows. Searching in the review tool affords greater visibility into search results. Case teams have immediate access to search hits and can quickly determine which terms are successful and which terms need refinement. Visibility allows you to see keywords in context rather than as a number on a report.
All keyword indexes are built off a dictionary or list of terms in a particular database. Some review tools expose this dictionary to end users so that you can look up the number of instances and documents that contain any specific term. This is a powerful tool because before negotiating terms with the opposition, you can get an understanding of what terms exist. Dictionaries also allow you the ability to submit fuzzy searches to search terms with variable spellings. Fuzzy searching is a good strategy for commonly misspelled last names. For databases with a high volume of OCR, it may not be enough to simply search for the name, “Johnson.” A dtSearch dictionary will tell you how many variations of the word “Johnson” exist, while allowing you to eliminate legitimate variations such as “Johanson” and “Johnston.”
Although keyword expansion requires that addition of an analytics engine, it can be a powerful tool for creating search terms lists to run against the dtSearch index. When you submit a keyword to the expansion engine, the analytics index will identify all the terms that are “conceptually” related to your term. By conceptually related we don’t mean synonyms, but terms that share conceptual meaning within your specific dataset. For example, an R&D team may be working on a project code named “falcon” but the product name might be GL4800. The case team probably knows about GL4800, but may not know to also search for “falcon.” Keyword expansion will help you find those conceptually similar terms.
Finally, it is important to test your terms and to run multiple iterations of terms to ensure that your recall is what you might expect. Too often, we run terms only to have case teams want to refine them after review has started because of the large number of false positives. Testing and sampling terms is an iterative process. Most review tools allow you to sample search hits, allowing you to confirm the results of your search terms before finalizing and building a review. Building this practice into your future reviews may require a little time at the outset, but will save you time in the end.
While search terms remain the primary means by which case teams cull and identify documents to review for litigation, the fact is that they are terribly unreliable. That being said, the use of keywords as a review strategy is not going away any time soon. The four keys noted here will help to improve keyword recall and make sure that you are making the most of your terms. The success of your search terms is predicated upon the time you put into development and testing before you finalize them.
D4 Weekly eDiscovery Outlook
Power your eDiscovery intellect with our weekly newsletter.
Posted October 27, 2016
Unplugged: What China’s Internet and Data Restrictions Mean for U.S. Companies and China’s Economy
Posted October 20, 2016
3 Areas of Focus When Migrating Data to a New Document Review Tool
Posted October 13, 2016
3 Best Practices for eDiscovery Custodian Interviews
Posted October 06, 2016
Latest Developments in U.S. FCPA Enforcement
Posted September 22, 2016
The Hybrid Approach to an eDiscovery Managed Services Model
Posted September 15, 2016
5 Document Management Best Practices for Beginners
Posted September 13, 2016
Innoxcell Annual Symposium 2016 | Shanghai Series
Posted September 08, 2016
Maintaining a Great Wall of Data Control in eDiscovery
Posted September 08, 2016
PREX16 | 5th Annual Conference on Preservation Excellence
Posted September 01, 2016
Uncovering Enterprise Vault Stub Files and Their Missing Attachments