Posted on: October 07, 2015 in Blog
How to Remove False Positives From Proximity Searches
This post explains how to reduce the number of false positives returned in your keyword search results by using customizable dtSearch operators such as NOT W/ and AND NOT.
Counsel conducting discovery often need to remove false positives from their search hits by applying exclusion criteria. This typically happens when a search term hits on words contained in an email footer. The NOT W/ and AND NOT dtSearch operators can be used to exclude most such false positives. For example, if we are searching for the word privileged, we can use privileged NOT W/1 “This message is privileged and confidential.” However, other cases can be much more difficult.
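To make the footer exclusion concrete, here is a minimal Python sketch of the idea behind the NOT W/1 example. It is a toy model and not dtSearch's engine: the tokenizer, the function names (tokens, phrase_spans, gap, hits_outside_phrase), and the sample emails are all assumptions made for illustration, and real indexing rules may treat punctuation and distance differently.

```python
import re

def tokens(text):
    """Lowercase word tokens (a simplification of real indexing rules)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def phrase_spans(words, phrase):
    """(start, end) index pairs for each occurrence of the phrase tokens."""
    n = len(phrase)
    return [(i, i + n - 1) for i in range(len(words) - n + 1)
            if words[i:i + n] == phrase]

def gap(i, span):
    """Word distance from position i to a phrase span; 0 if i is inside it."""
    start, end = span
    if start <= i <= end:
        return 0
    return start - i if i < start else i - end

def hits_outside_phrase(text, term, phrase_text, distance=1):
    """Positions of `term` that are more than `distance` words away from
    every occurrence of the phrase. Assumed semantics, for illustration."""
    words = tokens(text)
    spans = phrase_spans(words, tokens(phrase_text))
    return [i for i, w in enumerate(words)
            if w == term and all(gap(i, s) > distance for s in spans)]

FOOTER = "This message is privileged and confidential."
footer_only = "See you at noon. " + FOOTER
real_hit = "That memo is privileged, do not forward it. " + FOOTER

print(hits_outside_phrase(footer_only, "privileged", FOOTER))  # []
print(hits_outside_phrase(real_hit, "privileged", FOOTER))     # [3]
```

In this sketch the first email is not a hit because its only instance of privileged sits inside the footer, while the second email is a hit because of the instance in the message body.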
Consider the following hypothetical case. Suppose your client is a milk company and wants to search an employee’s email account for any correspondence regarding price skimming. A search for skim* W/15 price yields too many false positives that reference skim milk instead of price skimming. You might consider using (skim* W/15 price) AND NOT milk. However, this construction is over-exclusive: it would filter out documents that discuss both price skimming and skim milk. You might also be tempted to use the following invalid construction:
INVALID SEARCH: (skim* NOT W/3 milk) W/15 (price)
Though the above search may appear to return your intended search hits, a NOT W/ proximity operator should not be nested within an additional W/ proximity operator because this results in unclear search syntax and an inaccurate set of search hits. The above string may only return a partial list of hits instead of the full population of hits responsive to your discovery request. Additionally, there is a chance the above string will return a handful of documents that are completely outside the intended search criteria. A more accurate method of discovering our responsive documents will combine the following two search formulas:
CERTAIN HITS: (skim* W/15 price) AND NOT (skim* W/3 milk)
- AND -
POTENTIAL HITS: (skim* NOT W/3 milk) AND (skim* W/15 price) AND (skim* W/3 milk)
The Certain Hits search string will return a subset of documents that meet our search criteria without returning any of the false positives that reference milk. The Potential Hits string will return additional documents meeting our search criteria as well as a limited number of false positives. In the first search above, we verify that skim* is near price and that milk is not near any instance of skim*. These hits meet our criteria and can be prioritized for review. In the second search above, we verify that a document has an instance of skim* without milk nearby and that an instance of skim* is within 15 words of price. However, because the document also contains an instance of skim* near milk, we cannot be sure which instance of skim* actually falls within 15 words of price. Only a careful review of each document in the Potential Hits group will determine whether or not it is a false positive. Alternatively, you may want to return all potential and certain hits in one search, and that string is below:
ALL CERTAIN AND POTENTIAL HITS: (skim* NOT W/3 milk) AND (skim* W/15 price)
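The relationship between the three strings can be checked with a small Python sketch. This is a toy model of word-proximity matching, not dtSearch itself: the helper names (tokens, positions, within, not_within, classify), the tokenizer, and the sample documents are assumptions for illustration, and W/N is approximated as "at most N word positions apart."

```python
import re

def tokens(text):
    """Lowercase word tokens (a simplification of real indexing rules)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(words, pattern):
    """Indexes of words matching a term; a trailing * means prefix match."""
    if pattern.endswith("*"):
        return [i for i, w in enumerate(words) if w.startswith(pattern[:-1])]
    return [i for i, w in enumerate(words) if w == pattern]

def within(words, a, b, n):
    """Some instance of a lies within n words of some instance of b."""
    pb = positions(words, b)
    return any(abs(i - j) <= n for i in positions(words, a) for j in pb)

def not_within(words, a, b, n):
    """Some instance of a has no instance of b within n words."""
    pb = positions(words, b)
    return any(all(abs(i - j) > n for j in pb) for i in positions(words, a))

def classify(text):
    """Label a document as a certain hit, a potential hit, or no hit."""
    words = tokens(text)
    near_price = within(words, "skim*", "price", 15)
    near_milk = within(words, "skim*", "milk", 3)
    clean_skim = not_within(words, "skim*", "milk", 3)
    if near_price and not near_milk:
        return "certain"    # (skim* W/15 price) AND NOT (skim* W/3 milk)
    if clean_skim and near_price and near_milk:
        return "potential"  # all three clauses of the Potential Hits string
    return "no hit"

docs = [
    "We should discuss price skimming before the launch.",
    "Price skimming worked last year. Separately, the skim milk order shipped.",
    "The price of skim milk went up.",
]
for doc in docs:
    print(classify(doc), "|", doc)
```

Under these assumptions the first document is a certain hit, the second is a potential hit (it mentions both price skimming and skim milk, so a blanket AND NOT milk would have dropped it), and the third is excluded because its only instance of skim* sits next to milk. Any document labeled certain or potential also satisfies the combined string, since that string simply omits the final clause that separates the two groups.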
The above case is certainly an exception, but it highlights the need to understand in detail how different dtSearch operators function together and where problems can arise. As advanced as today’s search tools are, they do not always throw errors when search terms contain invalid syntax. If there’s any doubt about how a search term is functioning, it’s always best to have an industry expert assist with formatting and testing search terms. Included below is a resource for your personal use. Each string uses generic placeholders that can be replaced with terms specific to your search.
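As a rough illustration of the placeholder idea, the three strings from this post could be generated with a short Python helper. The function name build_searches and its parameter names are hypothetical, and the default distances of 3 and 15 are simply the values used in the milk example. Any generated string should still be tested in your own review platform before use.

```python
def build_searches(term, noise, context, noise_distance=3, context_distance=15):
    """Build the Certain, Potential, and combined strings from placeholders.

    term    -- the keyword of interest, e.g. "skim*"
    noise   -- the word that signals a false positive, e.g. "milk"
    context -- the word that signals a responsive hit, e.g. "price"
    """
    near_context = f"({term} W/{context_distance} {context})"
    near_noise = f"({term} W/{noise_distance} {noise})"
    away_from_noise = f"({term} NOT W/{noise_distance} {noise})"
    return {
        "certain": f"{near_context} AND NOT {near_noise}",
        "potential": f"{away_from_noise} AND {near_context} AND {near_noise}",
        "all": f"{away_from_noise} AND {near_context}",
    }

searches = build_searches("skim*", "milk", "price")
for label, query in searches.items():
    print(f"{label}: {query}")
```

Called with skim*, milk, and price, the helper reproduces the Certain Hits, Potential Hits, and combined strings shown earlier in this post.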