Organizations are lining up paperless processes heading into 2021. Document scanning is part one on the road to managing information. But what happens after scanning and what’s the best way to go about storing scanned documents? Good questions! First, let’s talk about why businesses are going paperless, what the easy part of that process is, and what elements are more difficult to accomplish during digital transformation.
While businesses proactively digitize internal documents and enthusiastically manage those day-to-day documents, it is not always possible to avoid an inflow of paper documents from external sources. When paper does come in, it is scanned into an image format and put through an OCR (Optical Character Recognition) engine to convert the image into electronic text. This is called Day Forward Scanning.
Sometimes, government regulations mandate long-term archives of vital documents. Digital files are more amenable to long-term preservation than paper, so older records are also scanned on a schedule, sometimes in batches depending on budget.
A common question asked at the beginning of the scanning process is “what’s the best format to store scanned documents?” You don’t want to fork out money to digitize unless the end result is digital documents that are well-preserved and readily accessible for years to come.
The file format you choose for saving digital files depends on the attributes of the content in each file and on how you want to use the files going forward.
PDF/A for Archival or Preservation of Digital Records
PDF/A is an ISO-standardized version of the typical PDF file format. It is ideal for archiving and preserving digital records long-term. Standard PDF files can link to fonts instead of embedding them, and they can be encrypted. Neither feature is well suited to long-term storage, because a linked font may no longer be available and an encrypted file may no longer be readable years later. The PDF/A format prohibits these features, making it more suitable for long-term storage.
The ISO requirements for PDF/A viewing applications include colour management guidelines, support for embedded fonts, and a user interface for reading embedded annotations. Simply put, the PDF/A format ensures your images will be true and accurate representations of the original documents.
PDF/A essentially has all the same functionality as a regular PDF file format but with the added advantage of being archival (/A). At MES, we use the PDF/A format for over 90% of the documents we scan.
PDF/A standards and conformance levels have changed slightly over time. PDF/A-1 (ISO 19005-1:2005) was the original and most restrictive version, followed by PDF/A-2 (ISO 19005-2:2011), which brought in better compression and added other features like layers. PDF/A-3 (ISO 19005-3:2012) allowed alternative content types as embedded files or attachments. PDF/A-4 (ISO 19005-4:2020) is based on PDF 2.0. It was called PDF/A-NEXT during development and introduces two new conformance levels to the format, PDF/A-4e and PDF/A-4f.
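A PDF/A file declares its part and conformance level in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The following is a minimal sketch, not a validator: the function name `declared_pdfa_level` is hypothetical, it assumes the XMP packet can be read as plain bytes in the file, and it reports only what the file claims, not whether the file actually meets the standard.

```python
import re
from typing import Optional, Tuple

# XMP properties of the PDF/A identification schema, matched in either
# element form (<pdfaid:part>2</pdfaid:part>) or attribute form (pdfaid:part="2").
_PART = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")
_CONF = re.compile(rb"pdfaid:conformance(?:>|=[\"'])\s*([A-Za-z])")


def declared_pdfa_level(pdf_bytes: bytes) -> Optional[Tuple[int, Optional[str]]]:
    """Return the (part, conformance) a PDF declares, or None if it declares none.

    This reads the declaration only; it does not check real compliance.
    """
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    conf = _CONF.search(pdf_bytes)
    # The conformance property can be absent, so it is optional here.
    level = conf.group(1).decode("ascii").upper() if conf else None
    return int(part.group(1)), level
```

To use it, pass the raw file contents, for example `declared_pdfa_level(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder file name. A dedicated PDF/A validation tool is still needed to confirm that a file truly conforms.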
Companies prefer PDF/A as it is universally accepted and is not platform-dependent. It supports metadata, which helps make files searchable. It preserves the look, fonts, and layout of the original document.
There are several ways to save files in the PDF/A format. For instance, most modern document scanners can produce PDF/A output. Configure the scanner software to create a PDF/A-compliant file and run a scan, ensuring that all pages of a document end up in the same file. You can also run an OCR tool to convert the scanned images to electronic text.
You can create a native PDF/A using a tool like Adobe Acrobat Professional that supports the format. Microsoft Office Suite products also support saving to the PDF/A format. Depending on the volume of the documents you create, you may have to purchase a product that supports PDF/A.
TIFF: The File Format for Scanned Images
At times, you may want to preserve scanned files in image format without converting them into electronic text. TIFF (Tagged Image File Format) is a format that supports lossless compression, making it possible to store images with a great deal of detail.
TIFFs are well suited to high-privacy documents like student records, health records, and government records, which industry regulations require to be kept private. For example, healthcare institutions prefer a single-page TIFF of one patient's test results to a multi-page PDF that holds the records of several patients in the same file.
Organizations also use the TIFF format when they want the file preserved in its 'as-original' state. In the PDF format, it is possible to manipulate the content after scanning.
However, if you want the text in the file to be searchable, TIFF is not the appropriate format unless you use a document management system that uses a separate text file for each TIFF.
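The separate-text-file approach can be sketched as follows. This is an illustration under stated assumptions, not how any particular document management system works: it assumes each TIFF has a sidecar text file with the same base name and a `.txt` extension holding its OCR output, and the function name `search_tiff_sidecars` is hypothetical.

```python
from pathlib import Path
from typing import List


def search_tiff_sidecars(folder: Path, keyword: str) -> List[Path]:
    """Return the TIFF files whose sidecar .txt file contains the keyword."""
    needle = keyword.lower()
    matches = []
    for tiff in sorted(folder.iterdir()):
        if tiff.suffix.lower() not in {".tif", ".tiff"}:
            continue
        # Assumed naming convention: scan001.tif pairs with scan001.txt.
        sidecar = tiff.with_suffix(".txt")
        if not sidecar.is_file():
            continue  # no OCR text stored for this image
        text = sidecar.read_text(encoding="utf-8", errors="replace")
        if needle in text.lower():
            matches.append(tiff)
    return matches
```

The image itself is never modified. The search runs only over the text files, and a TIFF without a sidecar is simply not findable by keyword.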
Other Image Formats
There are other image formats like PNG (Portable Network Graphics) and JPEG (Joint Photographic Experts Group) that are suitable for graphical content. Companies use these formats to preserve logos, photographs, marketing collateral, and website images. However, these formats are not suitable for documents.
Organizations often struggle with choosing PDF or TIFF formats for their scanned documents. It's a good idea to approach a professional document scanning company to analyze your organization's requirements thoroughly before you decide.
Ask these questions when deciding scanning formats:
- What is the volume of files to be stored?
- Who will use the files and how frequently?
- What are the privacy guidelines for the content of the files?
- What is the type of content in the files?
- How long will the copies be retained?
- Are aspects of searchability and instant retrieval critical?
- Are you using a document management system?
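As a rough illustration, a few of these answers can be turned into a first-pass suggestion using the guidance in this article. The sketch below is deliberately simplified: the `ScanNeeds` fields and the `suggest_format` function are hypothetical names, the rules cover only some of the questions, and a real decision should weigh all of them.

```python
from dataclasses import dataclass


@dataclass
class ScanNeeds:
    graphical_content: bool = False        # logos, photographs, website images
    needs_text_search: bool = False        # searchability and instant retrieval
    keep_as_original: bool = False         # image must stay in its scanned state
    has_dms_with_text_files: bool = False  # DMS keeps a text file per TIFF


def suggest_format(needs: ScanNeeds) -> str:
    """Suggest a storage format from a few yes/no answers (illustrative only)."""
    if needs.graphical_content:
        return "PNG or JPEG"
    if needs.keep_as_original:
        # TIFF is searchable only when a DMS stores the text separately.
        if needs.needs_text_search and not needs.has_dms_with_text_files:
            return "PDF/A"
        return "TIFF"
    # Default for documents kept long-term.
    return "PDF/A"
```

For example, `suggest_format(ScanNeeds(keep_as_original=True))` returns `"TIFF"`, while adding `needs_text_search=True` without a document management system returns `"PDF/A"`.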
Do you have more questions? Our experienced staff takes the hassle out of document scanning by answering questions about file formats and how best to store scanned documents. We manage every aspect of your scanning project. Contact MES today for a free quote.