
File Processing and Quality Control

Title: Performance profiles of major energy producers [electronic resource]

Original diskette files: All diskettes were mounted on a 5 1/4" drive of a Windows 98 PC and the diskette files were copied to the PC hard drive. To preserve the original file content, all files were transferred to a UNIX server in binary format and retain their original time and date stamps.
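The copy step can be illustrated with a minimal Python sketch. It is not the procedure used for this project, which copied files from a Windows 98 PC to a UNIX server; it only shows one way to copy a file byte for byte while carrying over its modification time. The function name copy_preserving_stamp is hypothetical.

```python
import shutil


def copy_preserving_stamp(src, dst):
    """Copy src to dst byte for byte and carry over its time stamps.

    Illustrative only: shutil.copy2 copies the file data and then
    attempts to preserve metadata such as the modification time.
    """
    shutil.copy2(src, dst)
    return dst
```

Comparing the bytes and the modification time of the source and the copy afterwards is a simple way to confirm that nothing was altered in transit.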

Processed documentation file: The original DOS ASCII documentation was examined with Perl "cio" (check-it-out) to identify extraneous control and high ASCII characters. The substitute control code (octal 32) was replaced with a blank using Perl "fix". DOS carriage returns (octal 15) were removed with UNIX "tr". The "fixed" file was saved with a .TXT file extension.
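The clean-up described above can be sketched in Python. The Perl "cio" and "fix" scripts are local tools whose source is not shown here, so the functions below (find_suspect_bytes and fix_dos_text, both hypothetical names) only approximate the same steps: report control and high ASCII bytes, replace octal 32 with a blank, and remove octal 15.

```python
SUB = 0o32  # substitute control code (Ctrl-Z)
CR = 0o15   # DOS carriage return


def find_suspect_bytes(data: bytes) -> dict:
    """Count control bytes (other than tab, LF, CR) and high ASCII bytes."""
    counts = {}
    for b in data:
        is_control = b < 0o40 and b not in (0o11, 0o12, 0o15)
        is_high = b >= 0o200
        if is_control or is_high:
            counts[b] = counts.get(b, 0) + 1
    return counts


def fix_dos_text(data: bytes) -> bytes:
    """Replace each SUB byte with a blank, then drop carriage returns."""
    return data.replace(bytes([SUB]), b" ").replace(bytes([CR]), b"")
```

Running find_suspect_bytes before and after fix_dos_text gives a quick indication of whether any unexpected bytes remain in the documentation file.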

Processed data tables: The DOS Lotus 123 worksheets were translated to ASCII comma-separated value (ASCII CSV) data table format with Windows DBMS/COPY. A separate script was used for each table conversion because each DOS Lotus table contained a different range of data cells. DBMS log files were checked for errors. The first and last 10 rows of data in the files T157786.CSV and TABLE52.CSV were compared to the corresponding .WK1 tables for data accuracy. No other checks were run on the translations.
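The row comparison described above could be automated along the following lines. This is a sketch under stated assumptions, not the check that was actually performed: it assumes the worksheet rows have already been exported or keyed in as lists of strings, and the names sample_indexes and spot_check are hypothetical.

```python
import csv
import io


def sample_indexes(total, n=10):
    """Return the indexes of the first n and last n rows, without repeats."""
    head = range(min(n, total))
    tail = range(max(total - n, n), total)
    return list(head) + list(tail)


def spot_check(csv_text, worksheet_rows, n=10):
    """Return the indexes of sampled rows where the CSV and worksheet differ.

    worksheet_rows is assumed to be a list of rows, each a list of strings.
    """
    csv_rows = list(csv.reader(io.StringIO(csv_text)))
    if len(csv_rows) != len(worksheet_rows):
        raise ValueError("row counts differ")
    return [
        i
        for i in sample_indexes(len(csv_rows), n)
        if csv_rows[i] != list(worksheet_rows[i])
    ]
```

An empty result means the sampled rows agree. As with the manual check, rows in the middle of the table are not examined.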

In general, the following program script was used for DOS Lotus table translations:

compute;
in=c:\convert\pps\[1,34,a,k,0,0]t157786.wk1
   out=c:\convert\pps\out\t157786.ascii2;
run;

where [1,34,a,k] gives the beginning and ending locations of the range
of data cells and [0,0] sets the column header row values to "0".

The following dictionary script, with the exception of the variables
list (which is unique to each Lotus spreadsheet), was used for all
translations:

dictionary
names=n
separator=,
mustsurround=n
surroundchar="
quotechar='
extension=CSV
missing=
numeric=n
fixed=n
dictionary=dct
date=mm/dd/yyyy
variables
     1    132 c A
   134     12 R B
   147     12 R C
   160     12 R D
   173     12 R E
   186     12 R F
   199     12 R G
   212     12 R H
   225     12 R I
   238     12 R J
   251     10 c K
endvars
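The variables list above can be read programmatically. The sketch below assumes, without confirmation from the DBMS/COPY documentation, that the four columns of each entry are start position, width, type code, and variable name. Under that assumption it parses the list and reports the gap between consecutive fields. The names Field, parse_variables, and gaps_between are hypothetical.

```python
from typing import NamedTuple


class Field(NamedTuple):
    start: int  # assumed: 1-based starting position
    width: int  # assumed: field width in characters
    kind: str   # type code as written in the dictionary (c or R)
    name: str   # variable name (worksheet column letter)


VARIABLES = """\
     1    132 c A
   134     12 R B
   147     12 R C
   160     12 R D
   173     12 R E
   186     12 R F
   199     12 R G
   212     12 R H
   225     12 R I
   238     12 R J
   251     10 c K
"""


def parse_variables(text):
    """Parse 'start width type name' lines into Field tuples."""
    fields = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or unrelated lines
        start, width, kind, name = parts
        fields.append(Field(int(start), int(width), kind, name))
    return fields


def gaps_between(fields):
    """Return the number of unused positions between consecutive fields."""
    return [
        nxt.start - (cur.start + cur.width)
        for cur, nxt in zip(fields, fields[1:])
    ]
```

If the assumed reading is right, each field is followed by exactly one unused position before the next begins, so gaps_between returns a list of ones for this table. A negative value would indicate overlapping fields.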


 


Official Web Page of the University of California, San Diego
© Copyright 2000, UCSD, All Rights Reserved. This site may not be reproduced.
Social Sciences & Humanities Library, 9500 Gilman Drive, La Jolla, CA 92093, 858-534-3336