Posted on: February 24, 2015 in Blog
PDF Files Can Wreak Havoc in ESI Processing, Review and Production
This post explains how the PDF format can complicate document review and production workflows, and three key things to keep in mind when using this format.
While it may be easy to create PDF files for day-to-day business use, when it comes to document review and production deliverables, the PDF format complicates things.
3 Things to Consider When Using PDF Format in Review and Production Deliverables
Here are three important considerations to remember when including PDF files in data deliverables:
1. PDF files are considered native files, not image files, by nearly all review applications.
Simply put, to get PDF files into a review tool, some form of processing or conversion must occur, unless you don’t mind reviewing directly in Adobe Acrobat or in an application with Acrobat-like functionality. Either way, the PDF format leads to inefficient workflows and higher review costs.
2. Bad Document Breaks / Boundaries
You can do a lot with PDF files: merge multiple PDFs into a single document; create PDF portfolio documents from multiple sources; create e-books with hyperlinks to external files; convert electronic files, such as Microsoft Office documents, to PDFs “on the fly.” During document processing, review, and production, however, the PDF format can be a nightmare.
How many of us have been on the receiving end of a single, 1,000-page, 100 MB PDF file that actually turns out to be 100-200 individual documents combined into one giant file? Even if the file itself doesn’t generate errors, how do you review something that large? Some documents will be responsive and some will be privileged, but you cannot code them separately.
D4 often gets requests to have Logical Document Determination (LDD) performed on PDF files after they have been loaded into a review platform. Bad document breaks in PDF files are causing a huge uptick in LDD, which in turn leads to multiple processing requests, repeated and inefficient reviews, project delays, and additional costs that are often unnecessary. When agreeing to a form of production, go with the industry and review platform standard: single-page, Group IV TIFF images with load files instead of PDF. This will significantly streamline the review and/or production process.
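To make the point about document boundaries concrete, here is a minimal Python sketch of how an image load file can record where each document starts when pages are delivered as single-page TIFFs. The seven-field, comma-delimited layout is loosely modeled on a common image load-file style; the field order, the `ABC` prefix, the `VOL001` volume name, and the `IMAGES\` path are all assumptions for illustration, not a specification of any particular review platform.

```python
def build_load_file(page_counts, prefix="ABC", volume="VOL001", width=7):
    """Return load-file lines, one per page image.

    page_counts: pages in each logical document, in production order.
    Assumed field order (hypothetical layout):
      image key, volume, image path, doc break, folder break, box break, page count
    """
    lines = []
    page_no = 1
    for page_count in page_counts:
        for offset in range(page_count):
            key = f"{prefix}{page_no:0{width}d}"
            first_page = offset == 0
            fields = [
                key,
                volume,
                f"IMAGES\\{key}.TIF",
                "Y" if first_page else "",              # marks the start of a document
                "",                                     # folder break (unused here)
                "",                                     # box break (unused here)
                str(page_count) if first_page else "",  # page count on first page only
            ]
            lines.append(",".join(fields))
            page_no += 1
    return lines


if __name__ == "__main__":
    # Two logical documents: a 2-page letter followed by a 1-page memo.
    for line in build_load_file([2, 1]):
        print(line)
```

Because every page is its own image and the load file flags the first page of each document, the review platform can treat each document as a separate record that can be coded on its own. A merged PDF carries no such markers.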
3. It is difficult to work with large files.
What can be done with a single, 1,000-page, 100 MB PDF file? Not a lot, and here’s why:
- Large files are difficult to email. Most email systems enforce attachment size limits, typically around 10 MB, and oversized messages get kicked back to the sender.
- Large files usually crash native file viewers in review platforms. This leads end-users to mistakenly believe something is wrong with the review application, resulting in time wasted troubleshooting “technical/database” issues.
- Large files usually take a long time to upload and download. Data deliverables and productions can be delayed, leading to missed deadlines.
- Large files sometimes crash computer systems. Applications typically get “hung up” because they cannot handle the large file sizes. Many times this causes project and production delays as well as lost work product.
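One way to catch these problems before a deliverable goes out is to flag oversized PDFs up front. The sketch below is a rough illustration only: the 10 MB email limit comes from the figure mentioned above, while the 100 MB viewer threshold, the function names, and the folder layout are assumptions that should be adjusted to your own systems.

```python
import os

EMAIL_LIMIT_MB = 10    # typical email limit cited above
VIEWER_WARN_MB = 100   # assumed threshold; tune to your review platform


def triage(size_bytes):
    """Return a list of warnings for a file of the given size."""
    size_mb = size_bytes / (1024 * 1024)
    warnings = []
    if size_mb > EMAIL_LIMIT_MB:
        warnings.append("too large to email")
    if size_mb >= VIEWER_WARN_MB:
        warnings.append("may strain native viewer")
    return warnings


def scan_folder(folder):
    """Walk a folder and yield (path, warnings) for each flagged PDF."""
    for root, _dirs, names in os.walk(folder):
        for name in names:
            if name.lower().endswith(".pdf"):
                path = os.path.join(root, name)
                warnings = triage(os.path.getsize(path))
                if warnings:
                    yield path, warnings


if __name__ == "__main__":
    # "." is a placeholder; point this at the folder holding the deliverable.
    for path, warnings in scan_folder("."):
        print(path, "->", ", ".join(warnings))
```

A report like this does not fix a large file, but it tells you early which PDFs are candidates for conversion to single-page TIFF images.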
What is the solution for large PDF files? In my experience at D4, customers who are unable to request new deliverables from their client or the producing party almost always choose to convert the PDFs to single-page, Group IV TIFF images for their in-house or hosted review platform.
Processing and Review Platforms Handle these Types of PDF Formats Differently
Here are a few examples of PDF formats that, if handled improperly, have been known to wreak havoc on processing, document review and production, leading to inefficient workflows and higher review costs.
Pay close attention to PDF files that are/contain:
- Text-searchable, with poor OCR text
- Image-only PDF files (which cannot be easily distinguished from searchable PDF files)
- Secured/password protected
- Saved/created with a variety of different compressions that impact processing (especially PDF files created from scans on in-house copiers)
- Portfolio files
- Hyperlinks and bookmarks to external content
- Embedded files
- Embedded fonts/text
There is a common misconception in the industry: many people treat PDF files the same way they do TIFF images. However, there is one key difference. A PDF is considered a native file, and a TIFF image is considered an image, which is why review platforms handle them differently and offer different functionality for each format.
Keeping that in mind can help save your team and clients many work hours and unnecessary costs.