Posted on: February 24, 2015 in Blog
PDF Files Can Wreak Havoc in ESI Processing, Review and Production
This post explains how the PDF format can complicate document review and production workflows, and three key things to keep in mind when using this format.
While it may be easy to create PDF files for day-to-day business use, when it comes to document review and production deliverables, the PDF format complicates things.
3 Things to Consider When Using PDF Format in Review and Production Deliverables
Here are three important considerations to remember when including PDF files in data deliverables:
1. PDF files are considered native files, not image files, by nearly all review applications.
Simply put, getting PDF files into a review tool requires some form of processing or conversion, unless you don’t mind reviewing in Adobe Acrobat directly or in an application with Acrobat-like functionality. Either way, the PDF format leads to inefficient workflows and higher review costs.
2. Bad Document Breaks / Boundaries
You can do a lot with PDF files: merge multiple PDFs into a single document; create PDF portfolio documents from multiple sources; create e-books with hyperlinks to external files; convert electronic files, such as Microsoft Office documents, to PDFs “on the fly.” During document processing, review, and production, however, the PDF format can be a nightmare.
How many of us have been on the receiving end of a single, 1,000-page, 100 MB PDF file that actually turns out to be 100 to 200 individual documents combined into one giant file? Even if the file itself doesn’t generate errors, how do you review something that large? Some documents will be responsive and some will be privileged, but you cannot code them separately.
D4 often gets requests to perform Logical Document Determination (LDD) on PDF files after they have been loaded into a review platform. Bad document breaks in PDF files are causing a huge uptick in LDD, which in turn leads to repeated processing requests, multiple inefficient reviews, project delays, and additional, often unnecessary, costs. When agreeing to a form of production, go with the industry and review platform standard: single-page, Group IV TIFF images with load files, rather than PDF. This will significantly streamline the review and/or production process.
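To illustrate why single-page images with a load file avoid the document-break problem, here is a minimal sketch of how a load file can record where each logical document begins. The seven-field layout is loosely modeled on the Opticon-style image load file; that format, the function name `image_load_file_lines`, and the default prefix, volume, and folder values are all assumptions made for illustration, so check the exact layout your review platform expects.

```python
from typing import List


def image_load_file_lines(
    doc_page_counts: List[int],
    prefix: str = "ABC",        # hypothetical Bates prefix
    volume: str = "VOL001",     # hypothetical volume name
    image_dir: str = "IMAGES",  # hypothetical image folder
    start: int = 1,
) -> List[str]:
    """Return one load-file line per page image.

    Assumed field order: key, volume, path, document break, folder break,
    box break, page count. Only the first page of each logical document
    carries the "Y" break flag and the document's page count.
    """
    lines = []
    page_no = start
    for count in doc_page_counts:
        if count < 1:
            raise ValueError("each document needs at least one page")
        for offset in range(count):
            key = f"{prefix}{page_no:07d}"
            is_first = offset == 0
            doc_break = "Y" if is_first else ""
            page_count = str(count) if is_first else ""
            lines.append(
                f"{key},{volume},{image_dir}\\{key}.tif,{doc_break},,,{page_count}"
            )
            page_no += 1
    return lines


if __name__ == "__main__":
    # A two-page document followed by a one-page document.
    for line in image_load_file_lines([2, 1]):
        print(line)
```

Because the break flag lives in the load file rather than inside one merged file, each logical document can be coded as responsive or privileged on its own.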
3. It is difficult to work with large files.
What can be done with a single, 1,000 page, 100 MB PDF file? Not a lot, and here’s why:
- Large files are difficult to email. Most email systems enforce attachment size limits, typically around 10 MB, and oversized messages get kicked back to the sender.
- Large files usually crash native file viewers in review platforms. This leads end-users to mistakenly believe something is wrong with the review application, resulting in time wasted troubleshooting “technical/database” issues.
- Large files usually take a long time to upload and download. Data deliverables and productions can be delayed, leading to missed deadlines.
- Large files sometimes crash computer systems. Applications typically get “hung up” because they cannot handle the large file sizes. Many times this causes project and production delays as well as lost work product.
What is the solution for large PDF files? In my experience at D4, customers who are unable to request new deliverables from their client or the producing party almost always choose to convert the PDFs to single-page, Group IV TIFF images for their in-house or hosted review platform.
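Before deciding whether to request new deliverables or convert, it can help to know which files in a delivery are oversized. The sketch below flags files above a size threshold using only the Python standard library. The 10 MB default simply mirrors the email limit mentioned above and is an assumption, as is the function name `find_oversized`; adjust the limit to whatever your viewer or platform tolerates.

```python
import os
from typing import Iterable, List, Tuple

# Assumed threshold: the 10 MB figure cited above for email attachments.
SIZE_LIMIT_BYTES = 10 * 1024 * 1024


def find_oversized(
    paths: Iterable[str], limit: int = SIZE_LIMIT_BYTES
) -> List[Tuple[str, int]]:
    """Return (path, size_in_bytes) for files larger than limit, largest first."""
    oversized = []
    for path in paths:
        size = os.path.getsize(path)
        if size > limit:
            oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

Running this over a delivery folder before loading gives a short list of files to split, convert, or send back, rather than discovering them when a viewer hangs.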
Processing and Review Platforms Handle These Types of PDF Formats Differently
Here are a few examples of PDF formats that, if handled improperly, have been known to wreak havoc on processing, document review and production, leading to inefficient workflows and higher review costs.
Pay close attention to PDF files that are/contain:
- Text-searchable, with poor OCR text
- Image-only PDF files (which cannot be easily distinguished from searchable PDF files)
- Secured/password protected
- Saved/created with a variety of different compressions that impact processing (especially PDF files created from scans on in-house copiers)
- Portfolio files
- Hyperlinks and bookmarks to external content
- Embedded files
- Embedded fonts/text
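Several of the traits above can be roughly screened for before processing. The following is a rough heuristic sketch, not a PDF parser: it scans raw bytes for dictionary key names defined in the PDF specification (such as `/Encrypt` and `/EmbeddedFiles`). The function name `scan_pdf_bytes` and the flag names are hypothetical. Newer PDFs can store these keys inside compressed object streams, where a byte scan will not see them, so treat the results as hints for triage and confirm with a real PDF library or your processing tool.

```python
from typing import Dict, Tuple

# PDF dictionary keys that hint at each trait (heuristic only).
MARKERS: Dict[str, Tuple[bytes, ...]] = {
    "encrypted": (b"/Encrypt",),
    "embedded_files": (b"/EmbeddedFiles",),
    "portfolio": (b"/Collection",),
    "has_fonts": (b"/Font",),
    "external_links": (b"/URI", b"/GoToR", b"/Launch"),
}


def scan_pdf_bytes(data: bytes) -> Dict[str, bool]:
    """Flag traits by searching raw bytes; compressed objects may be missed."""
    if b"%PDF-" not in data[:1024]:
        raise ValueError("data does not look like a PDF")
    flags = {
        name: any(marker in data for marker in markers)
        for name, markers in MARKERS.items()
    }
    # Images but no fonts suggests a scan without a text layer.
    flags["possibly_image_only"] = b"/Image" in data and not flags["has_fonts"]
    return flags


if __name__ == "__main__":
    sample = b"%PDF-1.4\n<< /Subtype /Image >>\ntrailer << /Encrypt 5 0 R >>"
    print(scan_pdf_bytes(sample))
```

A file flagged `possibly_image_only` is a candidate for OCR, while `encrypted` files usually need a password or a new deliverable before they can be processed at all.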
There is a common misconception in the industry: many people treat PDF files the same way they do TIFF images. However, there is one key difference. A PDF is considered a native file, and a TIFF image is considered an image, which is why review platforms handle them differently and offer different functionality for each format.
Keeping that in mind can help save your team and clients many work hours and unnecessary costs.