Posted on: October 7, 2015 in Blog
How to Remove False Positives From Proximity Searches
This post explains how to reduce the number of false positives returned in your keyword search results by using these customizable dtSearch operators.
Counsel conducting discovery often need to remove false positives from their search hits through the use of exclusion criteria. This is typically the case when a search term hits on words contained in an email footer. The NOT W/ or the AND NOT dtSearch operators can be used to exclude most false positives. For example, if we are searching for the word privileged, we can use privileged NOT W/1 “This message is privileged and confidential.” However, other cases can be much more difficult.
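To see how that kind of exclusion behaves, here is a minimal Python sketch, not dtSearch itself, that approximates term NOT W/n phrase by discarding hits of the term that fall within n words of the exclusion phrase. The function names are our own, and word positions are simple token indices:

```python
import re

def tokens(text):
    """Lowercased word tokens in order of appearance."""
    return [m.group(0).lower() for m in re.finditer(r"\w+", text)]

def not_within(text, term, phrase, n):
    """Token positions of `term` that are NOT within n words of `phrase`,
    a rough simulation of dtSearch's `term NOT W/n phrase`."""
    words = tokens(text)
    phrase_words = tokens(phrase)
    # Start positions where the exclusion phrase occurs
    starts = [i for i in range(len(words) - len(phrase_words) + 1)
              if words[i:i + len(phrase_words)] == phrase_words]
    hits = []
    for i, w in enumerate(words):
        if w != term.lower():
            continue
        # Exclude hits inside the phrase or within n words of either end
        near = any(s - n <= i <= s + len(phrase_words) - 1 + n for s in starts)
        if not near:
            hits.append(i)
    return hits
```

Run against an email whose body mentions privileged material and whose footer contains the boilerplate sentence, the function keeps the substantive hit and drops the footer hit. Real dtSearch counts intervening words and handles punctuation with its own rules, so treat this only as an illustration of the logic.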
Consider the following hypothetical case. Suppose your client is a milk company that wants to search an employee’s email account for any correspondence regarding price skimming. A search for skim* W/15 price yields too many false positives that reference skim milk instead of price skimming. You might consider using (skim* W/15 price) AND NOT milk; however, this construction is over-exclusive and would filter out documents that discuss both price skimming and skim milk. You might also be tempted to use the following invalid construction:
INVALID SEARCH: (skim* NOT W/3 milk) W/15 (price)
Though the above search may appear to return your intended search hits, a NOT W/ proximity operator should not be nested within an additional W/ proximity operator because this results in unclear search syntax and an inaccurate set of search hits. The above string may only return a partial list of hits instead of the full population of hits responsive to your discovery request. Additionally, there is a chance the above string will return a handful of documents that are completely outside the intended search criteria. A more accurate method of discovering our responsive documents will combine the following two search formulas:
CERTAIN HITS: (skim* W/15 price) AND NOT (skim* W/3 milk)
- AND -
POTENTIAL HITS: (skim* NOT W/3 milk) AND (skim* W/15 price) AND (skim* W/3 milk)
The Certain Hits search string will return a subset of documents that meet our search criteria without returning any of the false positives that reference milk. The Potential Hits string will return additional documents meeting our search criteria as well as a limited number of false positives. In the first search above, we verify that skim* is near price and that milk is not near skim*. These hits meet our criteria and can be prioritized for review. In the second search above, we verify that a document contains skim* without milk nearby and that an instance of skim* falls within 15 words of price. However, if that document also contains skim* near milk somewhere, then we cannot be sure which instance of skim* is actually falling within 15 words of price. Only a careful review of each document in the Potential Hits group will determine whether it is a false positive. Alternatively, you may want to return all potential and certain hits in one search, and that string is below:
ALL CERTAIN AND POTENTIAL HITS: (skim* NOT W/3 milk) AND (skim* W/15 price)
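The three-way split described above (certain hits, potential hits, and non-hits) can be sketched in Python. This is a rough, hypothetical simulation, not dtSearch: W/n proximity is approximated by comparing token indices, and the helper names are our own:

```python
import re

def positions(words, term):
    """Indices in `words` matching `term` (a trailing * acts as a wildcard)."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return [i for i, w in enumerate(words) if w.startswith(stem)]
    return [i for i, w in enumerate(words) if w == term.lower()]

def near(a, b, n):
    """True if any index in a is within n words of any index in b."""
    return any(abs(i - j) <= n for i in a for j in b)

def classify(text):
    """Bucket a document as 'certain', 'potential', or 'non-hit',
    mirroring the Certain Hits and Potential Hits search strings."""
    words = [w.lower() for w in re.findall(r"\w+", text)]
    skim = positions(words, "skim*")
    price = positions(words, "price")
    milk = positions(words, "milk")
    hits_price = near(skim, price, 15)        # (skim* W/15 price)
    any_near_milk = near(skim, milk, 3)       # (skim* W/3 milk)
    # (skim* NOT W/3 milk): some skim* instance with no milk within 3 words
    any_far_from_milk = any(all(abs(s - m) > 3 for m in milk) for s in skim)
    if hits_price and not any_near_milk:
        return "certain"
    if hits_price and any_near_milk and any_far_from_milk:
        return "potential"
    return "non-hit"
```

A document mentioning only price skimming classifies as certain; one that discusses both price skimming and skim milk classifies as potential and needs manual review; one that merely mentions skim milk near a price falls out as a non-hit. Again, dtSearch counts intervening words rather than raw token distance, so the boundaries here are only approximate.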
The above case is certainly an exception, but it highlights the need to intimately understand how different dtSearch operators function together and where problems can arise. As advanced as today’s search tools are, they do not always throw errors when search terms contain invalid syntax. If there is any doubt about how a search term is functioning, it is always best to have an industry expert assist with formatting and testing search terms. Included below is a resource for your personal use; in it, each string is substituted with generic placeholders that can be modified with terms specific to your search.