Bureau of Competition Production Guide

Getting Started

Protocols for All Submissions

Before processing documents in response to a formal request, please note: The following protocols apply to ALL formats submitted to the Bureau of Competition. The Bureau has additional requirements pertaining to metadata, format, etc., for certain types of documents. See Preparing Collections for details.

Extracted Text / OCR

Submit text:
  • as document-level text files,
  • named for the beginning Bates number, and
  • organized into a folder separate from images.

We cannot accept Unicode text files.
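
The naming rules above can be sketched roughly as follows. This is a minimal illustration, not a prescribed tool: the function name `write_text_file`, the folder name `TEXT`, and the use of plain ASCII as the non-Unicode encoding are all assumptions. Confirm the expected encoding with your Bureau representative.

```python
from pathlib import Path


def write_text_file(root, beg_bates, text, folder="TEXT"):
    """Write one document's text to <root>/<folder>/<beg_bates>.txt."""
    out_dir = Path(root) / folder  # text folder kept separate from images
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{beg_bates}.txt"  # named for the beginning Bates number
    # Assumption: "not Unicode" is approximated here as plain ASCII.
    # Characters outside ASCII are replaced with "?".
    out_path.write_bytes(text.encode("ascii", errors="replace"))
    return out_path
```

Replacing unsupported characters loses information, so a real workflow may prefer to flag such documents for review instead.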

Deduplication

You must have the approval of the Bureau representative to globally de-dupe or to apply email threading. You do not need prior Bureau approval to deduplicate within a custodian’s document set.
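
One way to picture deduplication scoped to a single custodian is sketched below. The record shape `(custodian, doc_id, content_bytes)`, the function name, and the use of a SHA-256 content hash are assumptions made for illustration; this guide does not prescribe a hashing method.

```python
import hashlib


def dedupe_within_custodian(docs):
    """Keep the first copy of each file per custodian.

    `docs` is an iterable of (custodian, doc_id, content_bytes) tuples,
    a hypothetical record shape used only for this sketch.
    """
    seen = set()
    kept = []
    for custodian, doc_id, content in docs:
        digest = hashlib.sha256(content).hexdigest()
        key = (custodian, digest)  # scoped to one custodian, not global
        if key not in seen:
            seen.add(key)
            kept.append(doc_id)
    return kept
```

Because the key includes the custodian, identical files held by two different custodians are both kept. Dropping the custodian from the key would make the pass global, which requires Bureau approval.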

Labeling & Numbering Files

For image file names, Bates numbers, and document identification numbers (Doc IDs), use a consistent number of digits, adding leading zeros where necessary, to prevent problems with image display. Do not use a space to separate the prefix from the number.

Acceptable formats (as long as you are consistent)

  • ABC-0001
  • ABC0001

Unacceptable format

  • ABC 0001
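
The numbering rules above can be sketched as a formatter plus a consistency check. The function names, the default width of four digits, and the assumption that prefixes are letters only are illustrative choices, not requirements stated in this guide.

```python
import re


def format_bates(prefix, number, width=4, separator="-"):
    """Build an identifier such as ABC-0001 or ABC0001 (never 'ABC 0001')."""
    if separator not in ("", "-"):
        raise ValueError("separator must be '' or '-'")
    return f"{prefix}{separator}{number:0{width}d}"


def is_consistent(ids):
    """True if every ID is well formed and all share one separator style
    and one digit count."""
    pattern = re.compile(r"([A-Za-z]+)(-?)(\d+)")  # assumed: letters-only prefix
    shapes = set()
    for identifier in ids:
        match = pattern.fullmatch(identifier)
        if match is None:  # e.g. a space between prefix and number
            return False
        shapes.add((match.group(2), len(match.group(3))))
    return len(shapes) <= 1
```

For example, `format_bates("ABC", 1)` yields `ABC-0001`, and `is_consistent(["ABC-0001", "ABC0002"])` is false because the two IDs mix separator styles.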

Recommended Delimiters

We strongly recommend using these delimiters in delimited data load files:

Description              Symbol           ASCII Code (decimal)
Field Separator          (non-printing)   20
Quote Character          þ                254
Multi-Entry Delimiter    ®                174
<Return> Value in Data   ~                126
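
As a rough sketch of how the recommended delimiters fit together, the function below builds one line of a delimited data load file. The function name `build_record`, the choice to wrap every field in the quote character, and the use of a Python list to represent a multi-entry field are assumptions for illustration.

```python
FIELD_SEP = chr(20)     # field separator
QUOTE = chr(254)        # quote character
MULTI = chr(174)        # multi-entry delimiter
RETURN_SUB = chr(126)   # stands in for a <Return> inside field data


def build_record(fields):
    """Join field values into one load-file line.

    Each value is a string, or a list of strings for a multi-entry field.
    """
    parts = []
    for value in fields:
        if isinstance(value, (list, tuple)):
            value = MULTI.join(value)
        # Replace embedded line breaks so the record stays on one line.
        for line_break in ("\r\n", "\n", "\r"):
            value = value.replace(line_break, RETURN_SUB)
        parts.append(f"{QUOTE}{value}{QUOTE}")
    return FIELD_SEP.join(parts)
```

When writing the result to disk, a single-byte encoding such as Latin-1 (for example `line.encode("latin-1")`) keeps each delimiter at the byte value listed above.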

Image Files

We only accept image files that are:
  • 300 DPI
  • single-page Group IV TIFF files

Load Files

The Bureau of Competition uses LexisNexis® Concordance® 2007 v 9.58. With the production, you must submit:

  1. an image load file containing a line for every image file in the production, and
  2. a delimited data load file containing a line for every document in the production.
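
A simple pre-submission check of the two "one line per item" requirements might look like the sketch below. It only compares counts and makes no assumption about the internal layout of either load file. The function name and arguments are hypothetical, and it assumes any header row has already been removed from the data load lines.

```python
def check_load_files(image_load_lines, image_files, data_load_lines, doc_ids):
    """Return a list of problems; an empty list means the counts match."""
    problems = []
    # Blank lines (for example, from a trailing newline) are not entries.
    image_entries = [ln for ln in image_load_lines if ln.strip()]
    data_entries = [ln for ln in data_load_lines if ln.strip()]
    if len(image_entries) != len(image_files):
        problems.append(
            f"image load file has {len(image_entries)} lines "
            f"for {len(image_files)} image files"
        )
    if len(data_entries) != len(doc_ids):
        problems.append(
            f"data load file has {len(data_entries)} lines "
            f"for {len(doc_ids)} documents"
        )
    return problems
```

Matching counts do not prove that each line points at the right file, so this is a first sanity check rather than a full validation.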

Date & Time Format

Submit date and time data in separate fields so Concordance can load it.
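
Splitting a combined timestamp into separate date and time fields can be sketched as follows. The function name and the default format strings (`MM/DD/YYYY` and 24-hour `HH:MM:SS`) are placeholders chosen for illustration; this section does not state the required formats, so adjust them to whatever the Bureau specifies.

```python
from datetime import datetime


def split_date_time(value, date_fmt="%m/%d/%Y", time_fmt="%H:%M:%S"):
    """Return (date_text, time_text) for loading into two separate fields."""
    return value.strftime(date_fmt), value.strftime(time_fmt)


# Example: one timestamp becomes two field values.
date_field, time_field = split_date_time(datetime(2009, 3, 27, 14, 5, 9))
```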

Last Modified: Friday, March 27, 2009