
File Processing and Quality Control

Title: Fuel oil and kerosene sales [electronic resource]

Original diskette files: Each diskette was mounted in a 5 1/4" drive on a Windows 98 PC, and its files were copied to the PC hard drive. To preserve the original file content, all files were then transferred to a UNIX server in binary format; they retain their original time and date stamps.

Compressed file: The DOS self-extracting file was executed on a Windows PC. The decompressed DOS ASCII files were transferred to a UNIX server in binary format and retain their original time and date stamps.
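The goal of the transfer step, unchanged bytes and unchanged time stamps, can be checked mechanically. The following is a minimal Python sketch of such a check for a local copy; it is not the procedure used in this project, and the function names copy_preserving and sha256_of are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def copy_preserving(src: Path, dst: Path) -> bool:
    """Copy src to dst byte for byte, keeping the modification time.

    Returns True when the content and the whole-second modification
    time of the copy both match the original.
    """
    shutil.copy2(src, dst)  # copies the data plus metadata such as mtime
    same_bytes = sha256_of(src) == sha256_of(dst)
    same_mtime = int(src.stat().st_mtime) == int(dst.stat().st_mtime)
    return same_bytes and same_mtime
```

Comparing digests confirms that no line-ending or character-set translation took place, which is the usual risk of a text-mode transfer.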

Processed documentation file and report: The original DOS ASCII documentation and report were examined with Perl "cio" (check-it-out) to identify extraneous control and high ASCII characters. Form feeds (octal 14) and the substitute control code (octal 32) were replaced with a blank using Perl "fix". DOS carriage return characters (octal 15) were removed with UNIX "tr". The "fixed" documentation and report files were saved with .TXT file extensions.
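The cleanup described above can be sketched in a few lines of Python. This is only an illustration of the same byte substitutions; the project used Perl "fix" and UNIX "tr", whose source is not shown here, and the function name fix_dos_text is hypothetical.

```python
FORM_FEED = 0o14        # octal 14
SUBSTITUTE = 0o32       # octal 32, the Ctrl-Z that often ends DOS text files
CARRIAGE_RETURN = 0o15  # octal 15
BLANK = 0o40            # ASCII space


def fix_dos_text(raw: bytes, blank_out=(FORM_FEED, SUBSTITUTE)) -> bytes:
    """Replace each byte listed in blank_out with a blank and drop carriage returns."""
    # Build a 256-entry table that maps the unwanted bytes to a blank
    # and every other byte to itself.
    table = bytes(BLANK if b in blank_out else b for b in range(256))
    # translate() removes the bytes named in delete, then maps the rest.
    return raw.translate(table, delete=bytes([CARRIAGE_RETURN]))
```

Passing blank_out=(0o14, 0o257) would cover the substitutions listed for the data table instead of those for the documentation files.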

Processed data table and file: The DOS ASCII data table TABLES91.TXT was examined with Perl "cio" to identify extraneous control and high ASCII characters. Form feeds (octal 14) and octal 257 were replaced with a blank using Perl "fix". DOS carriage return characters (octal 15) were removed with UNIX "tr". The "fixed" table was saved with the standard .TXT file extension.

You can view the 1991 "cio" output for the original ASCII diskette files.
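For readers without access to "cio", a comparable scan is easy to approximate. The Python sketch below counts control bytes and high ASCII bytes and reports them by octal code; it is an assumption about what such a check reports, not a reproduction of the "cio" output format, and the name check_it_out is hypothetical.

```python
from collections import Counter

# Control codes treated as normal in a text file. Only line feed is allowed
# here, so carriage returns and tabs are reported; adjust as needed.
ALLOWED_CONTROLS = {0o12}


def check_it_out(raw: bytes) -> dict:
    """Return {octal code: count} for control and high ASCII bytes in raw."""
    report = {}
    for byte, count in sorted(Counter(raw).items()):
        is_control = byte < 0o40 and byte not in ALLOWED_CONTROLS
        is_high = byte >= 0o177  # DEL and everything above 7-bit printable ASCII
        if is_control or is_high:
            report[f"{byte:03o}"] = count
    return report
```

An empty result after the cleanup step indicates that no extraneous characters remain.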

The non-delimited, fixed-format ASCII data file DATA91.TXT was translated to ASCII comma-separated value (ASCII CSV) format with Windows DBMS/COPY according to the specifications in the User's Guide. Variable labels were added to the output file. The DBMS/COPY translation log was checked for data translation error messages. The first and last five records of the output file were checked for data value translation accuracy. The DBMS/COPY input and output data dictionaries follow:

INPUT DATA DICTIONARY STATEMENT:
dictionary
extension=TXT
missing=
numeric=n
fixed=y
dictionary=dct
date=mm/dd/yyyy
variables
     1      4 c YEAR
     5      3 c PRODUCT
     8      2 c END USE
    10      4 c GEO AREA
    14      1 c TABLE GROUP
    15      1 c NULL VALUE SWITCH
    16      8 R CELL VALUE
endvars

OUTPUT DATA DICTIONARY STATEMENT:
dictionary
names=y
separator=,
mustsurround=n
surroundchar="
quotechar='
extension=CSV
missing=
numeric=n
fixed=n
dictionary=dct
date=mm/dd/yyyy
variables
     1      4 c YEAR
     6      3 c PRODUCT
    10      2 c END USE
    13      4 c GEO AREA
    18      1 c TABLE GROUP
    20      1 c NULL VALUE SWITCH
    22      8 R CELL VALUE
endvars
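The translation can also be approximated without DBMS/COPY. The Python sketch below slices each record at the start columns and widths given in the input data dictionary and writes a CSV file with a header row of variable names. It is an illustration only: the function names are hypothetical, and copying field values without trimming blanks is an assumption based on the field widths in the output dictionary.

```python
import csv

# (start column, width, name) from the input data dictionary; columns are 1-based.
LAYOUT = [
    (1, 4, "YEAR"),
    (5, 3, "PRODUCT"),
    (8, 2, "END USE"),
    (10, 4, "GEO AREA"),
    (14, 1, "TABLE GROUP"),
    (15, 1, "NULL VALUE SWITCH"),
    (16, 8, "CELL VALUE"),
]


def fixed_to_csv(lines, out) -> int:
    """Write fixed-width records to out as CSV and return the record count."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for _, _, name in LAYOUT])
    count = 0
    for line in lines:
        record = line.rstrip("\r\n")
        if not record:
            continue  # skip blank lines
        writer.writerow(
            [record[start - 1:start - 1 + width] for start, width, _ in LAYOUT]
        )
        count += 1
    return count


def head_and_tail(records, n=5):
    """Return the first and last n records for a manual accuracy check."""
    records = list(records)
    return records[:n], records[-n:]
```

A typical use would open DATA91.TXT for reading, pass the file object and an output file to fixed_to_csv, and then compare the records returned by head_and_tail for the input and output files by eye.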

 


Official Web Page of the University of California, San Diego
© Copyright 2000, UCSD, All Rights Reserved. This site may not be reproduced.
Social Sciences & Humanities Library, 9500 Gilman Drive, La Jolla, CA 92093, 858-534-3336