Bureau of Competition Production Guide
Getting Started
Protocols for All Submissions
Before processing documents in response to a formal request, please note: The following protocols apply to ALL formats submitted to the Bureau of Competition. The Bureau has additional requirements pertaining to metadata, format, etc., for certain types of documents. See Preparing Collections for details.
Extracted Text / OCR
Submit text:
- as document-level text files,
- named for the beginning Bates number, and
- organized into a folder separate from images.
We cannot accept Unicode text files.
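The text-file rules above can be sketched as a small pre-submission check. This is a minimal illustration, not a Bureau tool: the `.txt` extension, the helper names `text_path` and `has_unicode_bom`, and the reading of "Unicode" as "file begins with a UTF-8, UTF-16, or UTF-32 byte order mark" are all assumptions. Confirm the intended meaning with your Bureau representative.

```python
import codecs
from pathlib import Path

# Byte order marks that commonly identify a Unicode-encoded text file.
# Treating a BOM as the test for "Unicode" is an assumption of this sketch.
_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
)

def text_path(text_root: str, beg_bates: str) -> Path:
    """Return the document-level text file path, named for the beginning Bates number."""
    # The ".txt" extension is assumed; the guide does not state one.
    return Path(text_root) / f"{beg_bates}.txt"

def has_unicode_bom(data: bytes) -> bool:
    """Return True if the raw bytes start with a known Unicode byte order mark."""
    return any(data.startswith(bom) for bom in _BOMS)
```

For example, `text_path("TEXT", "ABC-0001")` points at `ABC-0001.txt` inside a `TEXT` folder kept apart from the image folder. A file without a byte order mark can still be Unicode-encoded, so a `False` result from `has_unicode_bom` is not proof of acceptability.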
Deduplication
You must obtain approval from the Bureau representative before deduplicating globally (across custodians) or applying email threading. You do not need prior Bureau approval to deduplicate within a single custodian’s document set.
Labeling & Numbering Files
For image file names, Bates numbers, and document identification numbers (Doc IDs), use a consistent number of digits, padding with leading zeros where necessary, to prevent problems with image display. Do not use a space to separate the prefix from the number.
Acceptable formats (as long as you are consistent)
- ABC-0001
- ABC0001
Unacceptable format
- ABC 0001
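The numbering rules above could be applied and checked along these lines. This is a sketch under assumptions: the function names, the default width of four digits, and the restriction of prefixes to letters are illustrative choices, not Bureau requirements.

```python
import re

def format_bates(prefix: str, number: int, width: int = 4, hyphen: bool = False) -> str:
    """Build an identifier such as ABC0001 or ABC-0001 with zero-padded digits."""
    if number < 0:
        raise ValueError("number must be non-negative")
    digits = str(number).zfill(width)
    if len(digits) > width:
        raise ValueError("number does not fit in the chosen width")
    separator = "-" if hyphen else ""
    return f"{prefix}{separator}{digits}"

def is_consistent(ids: list[str]) -> bool:
    """Check that every ID has no space and that all IDs share one separator style and digit count."""
    shapes = set()
    for bates in ids:
        # Assumed shape: letters, an optional hyphen, then digits. A space never matches.
        match = re.fullmatch(r"([A-Za-z]+)(-?)(\d+)", bates)
        if match is None:
            return False
        shapes.add((match.group(2), len(match.group(3))))
    return len(shapes) <= 1
```

Here `format_bates("ABC", 1)` gives `ABC0001` and `format_bates("ABC", 1, hyphen=True)` gives `ABC-0001`. `is_consistent` rejects a set that mixes the two styles or contains `ABC 0001`.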
Recommended Delimiters
We strongly recommend using these delimiters in delimited data load files:
Description               Symbol           ASCII Character
Field Separator           (non-printing)   20
Quote Character           Þ                254
Multi Entry delimiter     ®                174
<Return> Value in data    ~                126
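As an illustration of how the recommended delimiters fit together, the sketch below assembles one row of a delimited data load file using the character codes listed above. The function name `build_row`, the sample field values, and the suggestion to write the file in a single-byte encoding such as Latin-1 (so that code 254 is stored as one byte) are assumptions of this example.

```python
# Character codes taken from the recommended delimiter list.
FIELD_SEP = chr(20)      # field separator (non-printing control character)
QUOTE = chr(254)         # quote character wrapped around every value
MULTI_ENTRY = chr(174)   # separates multiple entries inside one field
RETURN_SUB = chr(126)    # stands in for a line break inside a value

def build_row(fields) -> str:
    """Join field values into one load-file line using the recommended delimiters."""
    cells = []
    for value in fields:
        # A list or tuple is treated as a multi-entry field.
        if isinstance(value, (list, tuple)):
            value = MULTI_ENTRY.join(value)
        # Replace embedded line breaks so each document stays on one line.
        for newline in ("\r\n", "\n", "\r"):
            value = value.replace(newline, RETURN_SUB)
        cells.append(f"{QUOTE}{value}{QUOTE}")
    return FIELD_SEP.join(cells)

row = build_row(["ABC-0001", ["Alice", "Bob"], "line one\nline two"])
```

Writing `row` with `encoding="latin-1"` is one way to keep each delimiter as a single byte; whether that matches your export tool's behavior should be verified before production.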
Image Files
We accept only image files that are:
- 300 DPI
- single-page Group IV TIFF files
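The image requirements above can be spot-checked by reading a file's TIFF header directly. The sketch below is a simplified reader, not a full TIFF parser. It looks only at the first image directory and at four standard baseline tags: Compression (259, where 4 means Group IV), XResolution (282), YResolution (283), and ResolutionUnit (296, where 2 means inches). The function names are hypothetical.

```python
import struct

def inspect_tiff(data: bytes) -> dict:
    """Read compression, resolution, and page-count hints from raw TIFF bytes."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a TIFF file")
    ifd = struct.unpack(order + "I", data[4:8])[0]           # offset of first directory
    count = struct.unpack(order + "H", data[ifd:ifd + 2])[0]  # number of tag entries
    tags = {}
    for i in range(count):
        start = ifd + 2 + 12 * i
        entry = data[start:start + 12]
        tag, kind, n = struct.unpack(order + "HHI", entry[:8])
        if kind == 3 and n == 1:      # SHORT stored inline
            tags[tag] = struct.unpack(order + "H", entry[8:10])[0]
        elif kind == 5 and n == 1:    # RATIONAL stored at an offset
            offset = struct.unpack(order + "I", entry[8:12])[0]
            num, den = struct.unpack(order + "II", data[offset:offset + 8])
            tags[tag] = num / den if den else 0.0
    end = ifd + 2 + 12 * count
    next_ifd = struct.unpack(order + "I", data[end:end + 4])[0]
    return {
        "group4": tags.get(259) == 4,
        "x_dpi": tags.get(282),
        "y_dpi": tags.get(283),
        "resolution_unit_inch": tags.get(296, 2) == 2,  # inches is the TIFF default
        "single_page": next_ifd == 0,                   # no further directory follows
    }

def meets_image_spec(data: bytes, dpi: int = 300) -> bool:
    """Return True if the file looks like a single-page Group IV TIFF at the given DPI."""
    report = inspect_tiff(data)
    return bool(
        report["group4"]
        and report["single_page"]
        and report["resolution_unit_inch"]
        and report["x_dpi"] == dpi
        and report["y_dpi"] == dpi
    )
```

A typical call would be `meets_image_spec(open(path, "rb").read())`, where `path` is an image file of your own. A passing result is only a first screen, since the sketch does not validate the image data itself.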
Load Files
The Bureau of Competition uses LexisNexis® Concordance® 2007 v 9.58. With the production, you must submit:
- an image load file containing a line for every image file in the production, and
- a delimited data load file containing a line for every document in the production.
Date & Time Format
Submit date and time data in separate fields so Concordance can load it.
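As a small illustration of keeping date and time in separate fields, the sketch below splits one timestamp into two strings. The function name and the format strings shown (month/day/year and 24-hour time) are assumptions. This section does not specify the expected formats, so confirm them before producing.

```python
from datetime import datetime

def split_datetime(value: datetime,
                   date_fmt: str = "%m/%d/%Y",
                   time_fmt: str = "%H:%M:%S") -> tuple[str, str]:
    """Return (date_text, time_text) so each can go in its own load-file field."""
    # Both default formats are illustrative, not Bureau-mandated.
    return value.strftime(date_fmt), value.strftime(time_fmt)

sent_date, sent_time = split_datetime(datetime(2009, 3, 5, 14, 7, 9))
```

The two returned strings would then populate two distinct columns of the data load file rather than one combined date-time column.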
