Processing and Quality Control

Title: Natural gas annual [electronic resource]

Original diskette files: All diskettes were mounted on a 5 1/4" drive of a Windows 98 PC and the diskette files were copied to the PC hard drive. To preserve the original file content, all files were transferred to a UNIX server in binary format and retain their original time and date stamps.

Processed documentation files: The original DOS ASCII documentation was examined with Perl "cio" (check-it-out) to identify extraneous control and high ASCII characters. Form feeds (octal 14), nulls (octal 0) and the substitute control code (octal 32) were replaced with a blank using Perl "fix". DOS ASCII carriage return characters (octal 15) were removed with UNIX "tr". The "fixed" files were saved with .TXT file extensions.
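The examination step can be illustrated with a short Python sketch. This is not the original Perl "cio" program, whose source is not reproduced here; the function name `scan_bytes` and the set of "expected" bytes (tab, line feed, and printable ASCII) are assumptions made for illustration. It reports each control or high byte by its octal value, in the same notation used above.

```python
from collections import Counter

# Assumption: bytes considered normal in a plain-text file are
# tab (octal 11), line feed (octal 12), and printable ASCII (octal 40-176).
EXPECTED = {0o11, 0o12} | set(range(0o40, 0o177))


def scan_bytes(data: bytes) -> dict[str, int]:
    """Count every byte that is a control code or lies above 7-bit ASCII.

    Keys are three-digit octal strings; values are occurrence counts.
    """
    counts = Counter(b for b in data if b not in EXPECTED)
    return {format(b, "03o"): n for b, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical sample: a form feed, a null, a substitute code,
    # and two DOS line endings (carriage return + line feed).
    sample = b"TITLE\x0c\r\nA,B\x00,C\x1a\r\n"
    print(scan_bytes(sample))
```

Carriage returns are reported here because they are unexpected on the UNIX side, which matches their later removal.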

Processed data files: The original DOS ASCII CSV data files were examined with Perl "cio" (check-it-out) to identify extraneous control and high ASCII characters. Nulls (octal 0) and the substitute control code (octal 32) were replaced with a blank using Perl "fix". DOS ASCII carriage return characters (octal 15) were removed with UNIX "tr". The "fixed" files were saved with .CSV file extensions.
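The replacement and carriage-return steps described for the documentation and data files can be sketched in Python as follows. This is a stand-in for the Perl "fix" program and the UNIX "tr" command, not a copy of either; the function name `fix_bytes` and its `to_blank` parameter are hypothetical.

```python
def fix_bytes(data: bytes, to_blank: bytes = b"\x00\x1a") -> bytes:
    """Replace each byte listed in to_blank with a space and drop carriage returns.

    The default covers the data files: null (octal 0) and substitute (octal 32).
    For documentation files, pass b"\\x00\\x0c\\x1a" to include form feed (octal 14).
    """
    # Map every unwanted byte to a blank (octal 40).
    table = bytes.maketrans(to_blank, b" " * len(to_blank))
    # Carriage returns (octal 15) are deleted, leaving UNIX line feeds only.
    return data.translate(table, delete=b"\r")


if __name__ == "__main__":
    # Hypothetical record with a null, a substitute code, and a DOS line ending.
    print(fix_bytes(b"A,B\x00,C\x1a\r\n"))
```

Replacing with a blank rather than deleting keeps each record the same length apart from the removed carriage returns, which is consistent with the description above.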

Users are advised that the 1989 DOS ASCII CSV data file T18907.PRN has a "bad" data record (record or line 6). This record contains a byte with octal value 377 (decimal 255), which falls outside the 7-bit ASCII range. This byte was replaced with a blank in the "fixed" file T18907.CORRUPTED.CSV.

Users are advised that the 1990.1 DOS ASCII CSV data file T19026.PRN is corrupted. The first eight records of the file contain WordPerfect for DOS binary code; the remaining records are ASCII CSV format. The last record contains the ASCII characters "??". In an attempt to salvage this file, it was converted in Microsoft Word to ASCII text format and the "fixed" file was saved as T19026.CORRUPTED.CSV.

Users are advised that the 1990.1 DOS ASCII CSV data file T19096.PRN has a "bad" data record (record or line 40). This record contains a byte with octal value 377 (decimal 255), which falls outside the 7-bit ASCII range. This byte was replaced with a blank in the "fixed" file T19096.CORRUPTED.CSV.
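Users who want to confirm which records in an original .PRN file contain an octal 377 can use a check along these lines. It is a minimal sketch, assuming records are separated by line feeds; the function name `find_records_with_byte` is hypothetical and the sample data is invented for illustration.

```python
def find_records_with_byte(data: bytes, bad_byte: int = 0o377) -> list[int]:
    """Return the 1-based record (line) numbers that contain bad_byte."""
    records = data.split(b"\n")
    return [n for n, rec in enumerate(records, start=1) if bad_byte in rec]


if __name__ == "__main__":
    # Hypothetical three-record file with an octal 377 in the second record.
    sample = b"1989,10\r\n1989,\xff20\r\n1989,30\r\n"
    print(find_records_with_byte(sample))
```

To run it against a real file, read the file in binary mode, for example `open(path, "rb").read()`, so that no byte is altered by text decoding.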

You can view the "cio" output for the 1988.2, 1989, 1990.1, 1990.2, 1992.1, and 1992.2 original diskette files.


 


Official Web Page of the University of California, San Diego
© Copyright 2000, UCSD, All Rights Reserved. This site may not be reproduced.
Social Sciences & Humanities Library, 9500 Gilman Drive, La Jolla, CA 92093, 858-534-3336