Once you’ve chosen your ECM system, how do you get started? Here are a few things to consider as you move forward.
Determining Hard Drive Capacity
One question I often get is, “How do I know how much drive space I need?”
The process looks similar to step one in our sample ROI back in chapter 2. You can use the following procedure to get a good idea of how much information you have and, consequently, how much space you need.
- Estimate how many pages of paper documents you want to scan and store.
- If you’re going to store electronic files, such as Word documents, AutoCAD drawings, or media files, determine how much disk space these files currently occupy.
- Determine how many pages you will be storing each month.
- Determine how many years you want your storage capacity to last.
For example, say you have all 8.5 x 11” single-sided documents kept in 10 four-drawer filing cabinets, 5 four-drawer lateral filing cabinets, and 20 standard storage boxes. This adds up to about 260,000 pages.
You can use a spreadsheet like this one to calculate the number of pages you have:
|Storage type||Unit||Quantity||Pages|
|Standard 4-drawer filing cabinets (12,000 pages each)||Standard cabinet||10||120,000|
|Lateral filing cabinets (18,000 pages each)||Lateral cabinet||5||90,000|
|Standard banker boxes (2,500 pages each)||Boxes||20||50,000|
|Large banker boxes (5,000 pages each)||Boxes||0||0|
|Feet of open shelving (1,500 pages per foot)||Feet||0||0|
|1” binders (265 pages each)||Binders||0||0|
|2” binders (474 pages each)||Binders||0||0|
|3” binders (660 pages each)||Binders||0||0|
|Stacks of paper in inches (175 pages per inch)||Inches||0||0|
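If you would rather script the page count than keep a spreadsheet, a minimal Python sketch follows. The pages-per-unit figures are the ones quoted in the table; the dictionary keys and the function name are illustrative choices of mine, not part of any ECM product.

```python
# Pages-per-unit figures are the ones quoted in the table above.
PAGES_PER_UNIT = {
    "standard_cabinet": 12_000,
    "lateral_cabinet": 18_000,
    "standard_box": 2_500,
    "large_box": 5_000,
    "shelf_foot": 1_500,
    "binder_1in": 265,
    "binder_2in": 474,
    "binder_3in": 660,
    "stack_inch": 175,
}


def total_pages(inventory):
    """Sum the estimated page count for a {unit_name: quantity} inventory."""
    return sum(PAGES_PER_UNIT[unit] * qty for unit, qty in inventory.items())


# The example office: 10 standard cabinets, 5 lateral cabinets, 20 boxes.
example_inventory = {"standard_cabinet": 10, "lateral_cabinet": 5, "standard_box": 20}
print(total_pages(example_inventory))  # 260000
```

Any unit you leave out of the inventory simply counts as zero, so you only need to list what you actually have.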
Next, we can calculate how much hard drive storage space these documents will take. If you’re scanning at 300 dpi and using Group IV TIFF files, these documents will take only about 11 GB of space.
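To fold in the monthly volume and planning horizon from the procedure above, the calculation can be sketched as follows. The average page size here is an assumption derived from this chapter’s own example (260,000 pages in about 11 GB); real sizes vary with page content and scan settings. The monthly volume, the number of years, and the size of the existing electronic files in the usage line are hypothetical.

```python
# Average page size implied by the example above: 260,000 pages in about 11 GB.
# This is an assumption; real sizes vary with page content and scan settings.
BYTES_PER_PAGE = 11e9 / 260_000  # roughly 42 KB per scanned page


def required_gb(existing_pages, pages_per_month, years, electronic_gb=0.0):
    """Estimate storage in GB for the backlog, future scans, and existing files."""
    future_pages = pages_per_month * 12 * years
    scanned_gb = (existing_pages + future_pages) * BYTES_PER_PAGE / 1e9
    return scanned_gb + electronic_gb


# Hypothetical inputs: 260,000 existing pages, 5,000 new pages a month,
# a 5-year planning horizon, and 20 GB of existing electronic files.
print(round(required_gb(260_000, 5_000, 5, electronic_gb=20), 1))  # 43.7
```

Treat the result as a floor rather than a purchase specification, since databases, indexes, and backups need room too.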
If you want to simplify the process even further, you can use my easy estimate method. Simply assume that each of the following will use about 1 GB of hard drive space:
- 1 four-drawer standard filing cabinet
- 1 four-drawer lateral filing cabinet
- 7 standard-sized storage boxes
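The easy estimate can be written as a one-line function. This is only a sketch of the rule of thumb above; the function and parameter names are mine.

```python
def easy_estimate_gb(standard_cabinets=0, lateral_cabinets=0, storage_boxes=0):
    """Rule of thumb: 1 GB per four-drawer cabinet (either type) or per 7 boxes."""
    return standard_cabinets + lateral_cabinets + storage_boxes / 7


# The example office: 10 standard cabinets, 5 lateral cabinets, 20 boxes.
print(round(easy_estimate_gb(10, 5, 20), 1))  # 17.9
```

For the example office, the rule of thumb lands higher than the 11 GB per-page calculation, so it leaves some headroom.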
Determining File Formats
Before documents are scanned, you should decide in which format you will store them. There are two primary choices: PDF or TIFF. While there are technical differences between the two, both require roughly the same amount of hard drive space.
PDF files are created using Adobe Acrobat software and require a proprietary reader to view the file.
Does that raise red flags for you? Perhaps you remember from my 7 Ingredients that one of the major components of a successful ECM project is a product that is not proprietary.
PDF document image files are technically just a file wrapper around a TIFF image, but unlike TIFF (which is a pure raster image format), PDF files can combine raster and vector content, which can create authenticity issues.
My opinion is that scanning your paperwork and storing it as PDFs on your server isn’t much better than keeping the paper! Here are a few reasons why:
- PDFs have very limited search capabilities. You have to know the name of the document in order to find it. This is not evident at first when you begin scanning, but as time goes on, and you accumulate more and more documents, you’ll find yourself opening each one to find the correct file. It’s not that much different than rummaging through a filing cabinet – and it’s just as time consuming!
For example, I recently met with an organization that had scanned all their important documents as PDFs. However, they’d failed to use a naming convention or index. There was no way to figure out where files were and which were in the wrong directories. It was a mess!
- PDFs can pose a security threat. It’s possible for PDFs to carry malware or to exploit code-execution vulnerabilities. For example, a flaw in Adobe Acrobat 8.1 could allow hackers to include dangerous code in PDF files to take control of Windows XP computers.
- PDFs can be corrupted. Not long ago, I tried to open a PDF of an important contract. I got the following error message: There was an error opening this document. The file is damaged and could not be repaired.
I did a Google search on “corrupt PDF” and found nearly 1,000,000 hits. I found that there are programs to repair corrupt files and even companies that specialize in attempting to recover severely corrupted files. It’s a common problem! I was lucky: the client was able to find a copy of the PDF that was not corrupt. But if he hadn’t, the details of the contract could have been permanently lost.
- You can’t secure PDF files. While storing PDFs on your network seems easy, you’ll find that securing them is much more difficult. How do you keep someone from renaming, moving, viewing and editing archived PDF documents? You can’t.
Conversely, the uncompressed TIFF file is an industry standard. The National Archives recommends it because it accommodates large file sizes, is widely used and supported, and has a long track record, among other reasons. One drawback is that current web browsers have no native support for it.
I recommend the TIFF file to all of my clients, and I’d suggest that you use it as well.
Recommendations for Setting Up Your ECM
In my years in the field, I’ve set up my fair share of ECM projects. Here are four simple things that I recommend doing for every ECM set-up.
- Load the document imaging software on a dedicated, commercial-grade server, such as an HP ProLiant ML350.
- Store the documents and metadata on the hard drives of the server using hardware RAID technology or on a SAN.
- Back up the imaging data on a tape backup unit or portable external drives using a proven rotation scheme, such as a 12-tape rotation. I highly recommend using the SQL backup agent of your backup software. The backup tapes or drives should be stored off-site in a secure location (such as a bank vault) at least 30 miles from your office. This can also be an effective disaster recovery backup plan.
- Burn the data to CDs or DVDs, and keep those off-site too. They should be kept in their protective cases and stored in a cool, dry location at least 30 miles from your office.
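To make the idea of a rotation scheme concrete, here is one way a 12-tape rotation could be laid out in Python. The text above does not spell out the layout, so the split used here (4 daily tapes, 4 weekly tapes, 4 monthly tapes) is purely my assumption for illustration, as are the tape labels. Check your backup software’s documentation for the scheme it actually supports.

```python
from datetime import date

DAILY_LABELS = ["Mon", "Tue", "Wed", "Thu"]


def tape_for(day):
    """Return the tape label for a backup date, or None on weekends.

    Assumed layout (not specified in the text): 4 daily tapes (Mon-Thu),
    4 weekly tapes (Fridays that are not the last of the month), and
    4 monthly tapes (last Friday of each month, reused every 4 months).
    """
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday >= 5:
        return None
    if weekday < 4:
        return "Daily-" + DAILY_LABELS[weekday]
    # Friday: work out whether another Friday follows in the same month.
    if day.month == 12:
        first_of_next = date(day.year + 1, 1, 1)
    else:
        first_of_next = date(day.year, day.month + 1, 1)
    days_in_month = (first_of_next - date(day.year, day.month, 1)).days
    if day.day + 7 > days_in_month:
        return f"Monthly-{(day.month - 1) % 4 + 1}"
    return f"Weekly-{(day.day - 1) // 7 + 1}"


print(tape_for(date(2024, 3, 4)))   # Daily-Mon
print(tape_for(date(2024, 3, 22)))  # Weekly-4
print(tape_for(date(2024, 3, 29)))  # Monthly-3
```

Under this layout the monthly tapes are the natural candidates for the off-site location, since they are overwritten least often.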
Scanning can be an overwhelming task that sometimes prevents organizations from purchasing and implementing an ECM. If the idea of a total conversion of all your old paper files sounds intimidating, you’re not alone!
However, there are a couple different ways to approach scanning:
- The scan-forward approach
- Back-file scanning
The Scan-Forward Approach
In a scan-forward implementation (also known as “day forward” or “forward scanning”), you begin by choosing a cut-off date. After that day, all new documents will be scanned. Existing paper records are only scanned if they’re used. In many cases, documents that aren’t scanned after a year can be archived or even thrown away.
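The scan-forward rule is simple enough to express in a few lines of Python. This is only a sketch of the policy described above; the function name, its parameters, and the example cut-off date are hypothetical.

```python
from datetime import date


def should_scan(received_on, cutoff, retrieved_since_cutoff=False):
    """Scan new documents; scan older ones only when someone actually uses them."""
    if received_on > cutoff:
        return True
    return retrieved_since_cutoff


cutoff = date(2024, 1, 1)  # hypothetical cut-off date
print(should_scan(date(2024, 2, 15), cutoff))                              # True
print(should_scan(date(2019, 6, 1), cutoff))                               # False
print(should_scan(date(2019, 6, 1), cutoff, retrieved_since_cutoff=True))  # True
```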
This approach means that you can get started much more quickly, since there’s no delay while a backlog of paper is scanned. Once the value of the system is proven, you can decide whether or not to start scanning older files.
Getting Started with the Scan-Forward Approach
Starting a scan-forward process can be intimidating, but it’s an easier transition than you’d imagine.
Here are 14 simple ideas for getting started:
- Pick a department that will run with the product. This should not be the department that most needs the technology.
- Keep things simple at the beginning. Don’t attempt a company-wide implementation, unless it’s absolutely necessary.
- Start scanning from this day forward to avoid adding daily to your problem.
- If you have lots of files you’d like scanned into your system, consider hiring a vendor. There’s more information about this in the following section on back-file scanning, and I’d be happy to recommend some great companies.
- Hire temporary workers or interns to do the scanning.
- Have each department scan their own files and set a completion date.
- Ask anyone who retrieves a paper file to scan it instead of re-filing it.
- Set goals and make scanning a priority.
- Make a game of it! Have a contest to see which department can scan the most documents in a month.
- Start as soon as possible and don’t stop. It’s important to keep up the momentum.
- Check your digital copier to see if you can add scanning capability to do large batches.
- Look into bar-coding or cover sheets to capture the necessary information to file large batches automatically.
- Check to make sure that your backups are working and are storing both images and databases.
- Once you have a good number of documents scanned, archive your data to WORM media and store it off-site.
Back-File Scanning
For some businesses, access to old paper files is essential. In those cases, you’ll need to invest in back-file scanning services. Document management vendors generally don’t perform back scanning for you, but they’ll be able to recommend file scanning services.
These niche vendors have expensive high-volume scanners and plenty of staff to operate them. It’s not uncommon for them to scan millions of documents for a single customer.
Typically, you ship your documents to these scanning services, but some also come to you. This can cost more than twice as much, but if your files contain highly sensitive material, it can be reassuring to keep them on-site.
Once the documents are scanned, the services provide CDs or DVDs of the images, or they import them directly into your ECM. Depending on your preference, they’ll either destroy or return your paper documents.
Choosing a Scanner
Depending on which approach you decide to take with your scanning effort, you might already have the scanning capabilities you need in-house: your copier.
A modern digital copier with an automatic document feed and network connectivity can be exactly what you need to keep up with your day-to-day scanning needs. Almost any network copier can be used with an ECM.
If you’re buying a scanner, your vendor can provide valuable assistance. They’ll be able to recommend certain brands or models that will work well with their software. Often, they can even sell you the hardware directly, which can simplify your purchasing process.
When choosing scanners, you’ll have to consider the balance between quality and speed. If you want to scan paper documents so that the full text is searchable, you’ll need optical character recognition (OCR). This requires more expensive, high-quality scanning. If your system will simply be storing your documents as images, the level of detail isn’t quite as important.
Additionally, document scanners are different from the image scanners used for graphics. Image scanners offer very high resolution, which is not necessary when scanning documents. Document scanners have very efficient document feeders. We typically scan documents at 200 or 300 dots per inch (dpi). If you use higher resolutions than these, you’ll use a lot more file space.
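To see why higher resolutions use so much more space, it helps to look at the raw pixel count, which grows with the square of the dpi. The Python sketch below computes the uncompressed size of a black-and-white letter-size page. It assumes 1 bit per pixel and ignores compression, so actual stored files will be much smaller, but the growth trend is the point. The function name and defaults are mine.

```python
def uncompressed_bytes(dpi, width_in=8.5, height_in=11.0, bits_per_pixel=1):
    """Raw size of one scanned page before any compression is applied."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


for dpi in (200, 300, 600):
    print(dpi, uncompressed_bytes(dpi))
# 200 467500
# 300 1051875
# 600 4207500
```

Going from 300 to 600 dpi quadruples the raw data for every page, which is rarely worth it for ordinary office documents.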
Whether or not you do back-file scanning will also impact your scanner purchase decision. If you plan to scan millions of pages yourself, you’ll want a high-end scanner – one that can scan more than 80 pages per minute (ppm). However, this could cost $5,000 or more.
A more reasonable scanner, rated at around 40 ppm, might cost around $1,200. A machine like this can easily handle around 1,500 pages per day, but it won’t be adequate for processing large amounts of back-files.
To meet high scanning volumes, I suggest that you consider getting multiple scanners instead of one ultra-high-capacity model. Three 40 ppm machines will be considerably less expensive than one 120 ppm device. Plus, they’ll allow three people to work on scanning simultaneously, and provide two people the ability to continue working if one of the scanners needs service.
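A quick back-of-the-envelope calculation shows how scanner count affects a backlog. The Python sketch below uses the rated speeds and approximate prices quoted above, applied to the 260,000-page example from earlier in the chapter. It assumes continuous feeding at rated speed, which no real operation achieves once you count document prep, jams, and rescans, so read the hours as a lower bound. The function name is mine.

```python
def feed_hours(pages, ppm, scanners=1):
    """Hours of continuous feeding at rated speed; ignores prep and rescans."""
    return pages / (ppm * scanners) / 60


backlog = 260_000  # pages in the filing-cabinet example earlier in the chapter

print(round(feed_hours(backlog, 40), 1))              # 108.3
print(round(feed_hours(backlog, 40, scanners=3), 1))  # 36.1
print(3 * 1200)  # 3600 for three 40 ppm units at about $1,200 each
```

Three mid-range units come in under the $5,000 starting price quoted for a single high-end scanner while matching a 120 ppm combined rated speed.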