Posted on: October 07, 2015 in Blog
How to Remove False Positives From Proximity Searches
This post explains how to reduce the number of false positives returned in your keyword search results by using the dtSearch NOT W/ and AND NOT operators.
Counsel conducting discovery often need to remove false positives from their search hits through the use of exclusion criteria. This is typically the case when a search term hits on words contained in an email footer. The NOT W/ or the AND NOT dtSearch operators can be used to exclude most false positives. For example, if we are searching for the word privileged, we can use privileged NOT W/1 "This message is privileged and confidential". However, other cases can be much more difficult.
Consider the following hypothetical case. Suppose your client is a milk company that wants to search an employee's email account for any correspondence regarding price skimming. A search for skim* W/15 price yields too many false positives that reference skim milk instead of price skimming. You might consider using (skim* W/15 price) AND NOT milk. However, this construction is over-exclusive: it would filter out every document that mentions milk anywhere, including documents that discuss both price skimming and skim milk. You might also be tempted to use the following invalid construction:
INVALID SEARCH: (skim* NOT W/3 milk) W/15 (price)
Though the above search may appear to return your intended search hits, a NOT W/ proximity operator should not be nested within an additional W/ proximity operator because this results in unclear search syntax and an inaccurate set of search hits. The above string may only return a partial list of hits instead of the full population of hits responsive to your discovery request. Additionally, there is a chance the above string will return a handful of documents that are completely outside the intended search criteria. A more accurate method of discovering our responsive documents will combine the following two search formulas:
CERTAIN HITS: (skim* W/15 price) AND NOT (skim* W/3 milk)
- AND -
POTENTIAL HITS: (skim* NOT W/3 milk) AND (skim* W/15 price) AND (skim* W/3 milk)
The Certain Hits search string will return a subset of documents that meet our search criteria without returning any of the false positives that reference milk. The Potential Hits string will return additional documents meeting our search criteria as well as a limited number of false positives.
In the first search above, we verify that skim* is near price and that milk is not near skim*. These hits meet our criteria, and we can prioritize them for review.
In the second search above, we verify that a document contains an instance of skim* without milk nearby and that an instance of skim* is within 15 words of price. However, because the document also contains skim* near milk somewhere else, we cannot be sure which instance of skim* actually falls within 15 words of price. Only a careful review of each document in the Potential Hits group will determine whether or not it is a false positive.
Alternatively, you may want to return all potential and certain hits in one search, and that string is below:
ALL CERTAIN AND POTENTIAL HITS: (skim* NOT W/3 milk) AND (skim* W/15 price)
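The way the three strings above partition a document set can be sketched with a small model. The Python below is not dtSearch and does not call any dtSearch API; it is a simplified, hypothetical token-distance model (the helper names `tokens`, `positions`, `within`, `not_within`, and `classify` are invented for illustration), and a real index may tokenize and count words differently.

```python
import re

def tokens(text):
    """Lowercase word tokens; a stand-in for a real indexer's tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(words, term):
    """Indexes of tokens matching a term; a trailing * means prefix match."""
    if term.endswith("*"):
        prefix = term[:-1]
        return [i for i, w in enumerate(words) if w.startswith(prefix)]
    return [i for i, w in enumerate(words) if w == term]

def within(words, a, b, n):
    """Models (a W/n b): some a-token lies within n words of some b-token."""
    pb = positions(words, b)
    return any(abs(i - j) <= n for i in positions(words, a) for j in pb)

def not_within(words, a, b, n):
    """Models (a NOT W/n b): some a-token has no b-token within n words."""
    pb = positions(words, b)
    return any(all(abs(i - j) > n for j in pb) for i in positions(words, a))

def classify(text):
    """Label a document as 'certain', 'potential', or 'excluded'."""
    words = tokens(text)
    near_price = within(words, "skim*", "price", 15)
    near_milk = within(words, "skim*", "milk", 3)
    clean_skim = not_within(words, "skim*", "milk", 3)
    # CERTAIN HITS: (skim* W/15 price) AND NOT (skim* W/3 milk)
    if near_price and not near_milk:
        return "certain"
    # POTENTIAL HITS: (skim* NOT W/3 milk) AND (skim* W/15 price) AND (skim* W/3 milk)
    if clean_skim and near_price and near_milk:
        return "potential"
    return "excluded"
```

Under this model, a message that only says "The price of skim milk rose again." is excluded, a message about price skimming with no mention of milk is a certain hit, and a message that mentions skim milk sales and, separately, a price skimming plan lands in the potential group. That last message is exactly the kind of document the over-exclusive AND NOT milk construction would have dropped.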
The above case is certainly an exception, but it highlights the need to intimately understand how different dtSearch operators function together and where problems can arise. As advanced as today's search tools are, they do not always throw errors when search terms contain invalid syntax. If there is any doubt about how a search term is functioning, it is always best to have an industry expert assist with formatting and testing search terms. Included below is a resource for your personal use; each string uses generic placeholders that can be replaced with terms specific to your search.
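As a rough illustration of the placeholder idea (this is not the resource referred to above, only a sketch), the three search strings from this post can be generated from generic terms. The function name `build_searches` and its parameter names are hypothetical; the output follows the operator patterns used in the skim milk example and should still be tested in your own review platform.

```python
def build_searches(term, context, noise, context_dist=15, noise_dist=3):
    """Build the three search strings from generic placeholders.

    term    -- the word you care about (e.g. skim*)
    context -- the word that makes a hit responsive (e.g. price)
    noise   -- the word that signals a false positive (e.g. milk)
    The default distances mirror the example in this post; they are assumptions.
    """
    near_context = f"({term} W/{context_dist} {context})"
    near_noise = f"({term} W/{noise_dist} {noise})"
    away_from_noise = f"({term} NOT W/{noise_dist} {noise})"
    return {
        "certain": f"{near_context} AND NOT {near_noise}",
        "potential": f"{away_from_noise} AND {near_context} AND {near_noise}",
        "all": f"{away_from_noise} AND {near_context}",
    }
```

Calling `build_searches("skim*", "price", "milk")` reproduces the Certain Hits, Potential Hits, and combined strings shown earlier, which makes it easy to check the template against a known case before substituting your own terms.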