File Processing and Quality Control

Title: Annual report of major natural gas companies [electronic resource]

Original diskette files: The diskette was mounted on a 5 1/4" drive of a Windows 98 PC and the diskette files were copied to the PC hard drive. To preserve the original diskette file contents, all files were transferred to a UNIX server in binary format with their original time and date stamps.

Compressed file: The DOS compressed data file was extracted with the DOS extraction program on the original diskette (PKUNZIP.EXE). The decompressed file was transferred to a UNIX server in binary format and retains its original time and date stamp.

Processed data file: The decompressed, non-delimited (fixed-width) DOS ASCII data file was translated to ASCII comma-separated value (CSV) format using Windows DBMS/COPY, following the specifications in the printed documentation, EIA Microcomputer Products: Annual Report of Major Natural Gas Companies. Variable labels were added to the output file. The last character in each "Data Field" was translated to an integer and, where required, the "Data Field" was assigned a negative value. Blank fields were dropped. The first and last ten records of the translated file were checked for data accuracy.
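The sign translation described above can be illustrated with a short Python sketch. This page does not state which characters encode the sign, so the mapping below is an assumption: it uses the common signed-overpunch convention ("{" and A-I for positive 0-9, "}" and J-R for negative 0-9). The printed documentation remains the authority for the actual encoding, and the function name decode_data_field is hypothetical.

```python
# ASSUMPTION: signed-overpunch mapping, not taken from this page.
# "{", A..I stand for a final digit 0..9 with a positive sign;
# "}", J..R stand for a final digit 0..9 with a negative sign.
POSITIVE = {ch: digit for digit, ch in enumerate("{ABCDEFGHI")}
NEGATIVE = {ch: digit for digit, ch in enumerate("}JKLMNOPQR")}


def decode_data_field(raw):
    """Return the integer value of a raw "Data Field", or None if it is blank."""
    text = raw.strip()
    if not text:
        return None  # blank fields carry no value
    if text[-1].isdigit():
        return int(text)  # plain unsigned digits
    body, last = text[:-1], text[-1]
    if last in POSITIVE:
        sign, digit = 1, POSITIVE[last]
    elif last in NEGATIVE:
        sign, digit = -1, NEGATIVE[last]
    else:
        raise ValueError(f"unrecognized sign character: {last!r}")
    # Replace the sign character with the digit it stands for, then apply the sign.
    return sign * int(body + str(digit))


# Hypothetical field values, for illustration only.
for sample in ("00000001234{", "00000001234J", "000000012345"):
    print(sample, decode_data_field(sample))
```

If the printed documentation specifies a different character table, only the two dictionaries need to change.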

This data file must be used together with the printed documentation, which identifies the type of data each gas company reported in the "Data Field".

The data dictionary input and output statements for the DBMS/COPY translation are:

DATA DICTIONARY INPUT STATEMENT:
dictionary
extension=txt
missing=
numeric=n
fixed=y
dictionary=dct
date=mm/dd/yyyy
variables
     1      6 c Company ID Number
     7      2 c Schedule Number
     9      3 c Line Number
    12      1 c Column Number
    13      1 c Blank
    14     12 l Data Field
    26     10 c blank1
    36     34 c Company Name
    70      9 c blank2
    79      2 c Year
endvars
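
For readers without DBMS/COPY, the input layout above can be approximated in Python. This is a minimal sketch, not the procedure used for the original translation: it copies the start columns and widths from the input dictionary, assumes start columns are 1-based and records are 80 characters wide, and uses a hypothetical function name (parse_record) and a made-up sample record.

```python
# Field layout copied from the input dictionary: (name, start column, width).
# ASSUMPTION: start columns are 1-based, giving an 80-character record.
INPUT_LAYOUT = [
    ("Company ID Number", 1, 6),
    ("Schedule Number", 7, 2),
    ("Line Number", 9, 3),
    ("Column Number", 12, 1),
    ("Blank", 13, 1),
    ("Data Field", 14, 12),
    ("blank1", 26, 10),
    ("Company Name", 36, 34),
    ("blank2", 70, 9),
    ("Year", 79, 2),
]


def parse_record(line):
    """Slice one fixed-width record into a dict of raw (unstripped) strings."""
    # Drop the DOS line ending and pad short lines so every slice has full width.
    line = line.rstrip("\r\n").ljust(80)
    return {
        name: line[start - 1:start - 1 + width]
        for name, start, width in INPUT_LAYOUT
    }


# Made-up record for illustration; it does not come from the data file.
sample = (
    "123456" + "01" + "002" + "3" + " " + "000000012345"
    + " " * 10 + "EXAMPLE GAS CO".ljust(34) + " " * 9 + "95"
)
for name, value in parse_record(sample).items():
    print(f"{name!r}: {value!r}")
```

The fields are returned as raw strings so that the "Data Field" can be decoded separately according to the printed documentation.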

DATA DICTIONARY OUTPUT STATEMENT:
dictionary
names=y
separator=,
mustsurround=n
surroundchar="
quotechar='
extension=CSV
missing=
numeric=n
fixed=n
dictionary=dct
date=mm/dd/yyyy
variables
     1      6 c Company ID Number
     8      2 c Schedule Number
    11      3 c Line Number
    15      1 c Column Number
    17     11 l Data Field
    29     34 c Company Name
    64      2 c Year
endvars
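
The output side can be sketched the same way with Python's standard csv module. The sketch below writes a header row of variable names, separates values with commas, and omits the three blank fields, as the output dictionary does. Treating mustsurround=n as "quote only when necessary" is an assumption about intent, not a statement about DBMS/COPY behavior, and the function name write_csv and the sample record are hypothetical.

```python
import csv
import io

# Output variables in the order listed in the output dictionary;
# the blank fields (Blank, blank1, blank2) are dropped.
OUTPUT_FIELDS = [
    "Company ID Number",
    "Schedule Number",
    "Line Number",
    "Column Number",
    "Data Field",
    "Company Name",
    "Year",
]


def write_csv(records, stream):
    """Write records (dicts keyed by variable name) as CSV with a header row."""
    writer = csv.DictWriter(
        stream,
        fieldnames=OUTPUT_FIELDS,
        extrasaction="ignore",      # silently skip the dropped blank fields
        quoting=csv.QUOTE_MINIMAL,  # ASSUMPTION: quote only values that need it
    )
    writer.writeheader()
    for record in records:
        # Trim the padding left over from the fixed-width input.
        writer.writerow(
            {name: str(record.get(name, "")).strip() for name in OUTPUT_FIELDS}
        )


# Made-up record for illustration; it does not come from the data file.
buffer = io.StringIO()
write_csv(
    [{
        "Company ID Number": "123456",
        "Schedule Number": "01",
        "Line Number": "002",
        "Column Number": "3",
        "Blank": " ",
        "Data Field": -12345,
        "Company Name": "EXAMPLE GAS CO".ljust(34),
        "Year": "95",
    }],
    buffer,
)
print(buffer.getvalue())
```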

 


Official Web Page of the University of California, San Diego
© Copyright 2000, UCSD, All Rights Reserved. This site may not be reproduced.
Social Sciences & Humanities Library, 9500 Gilman Drive, La Jolla, CA 92093, 858-534-3336