
File Processing and Quality Control

Title: Commercial buildings energy consumption survey. Public use data diskettes [electronic resource]

Original diskette files: All diskettes were mounted on a 5 1/4" drive of a Windows 98 PC, and the diskette files were copied to the PC hard drive. To preserve the original file content, all files were then transferred to a UNIX server in binary format; they retain their original time and date stamps.
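
As an illustration only, the following Python sketch shows one way to copy a file without altering its bytes or its modification time, and then to confirm both. It is not the transfer method used for this collection, which is not described in detail here; the function names are hypothetical.

```python
import hashlib
import os
import shutil


def copy_preserving(src: str, dst: str) -> None:
    """Copy src to dst byte for byte, carrying over the modification time."""
    # shutil.copy2 copies the file data unchanged and then copies file
    # metadata, including the last-modification time, where the OS allows it.
    shutil.copy2(src, dst)


def same_content_and_mtime(a: str, b: str) -> bool:
    """Return True if both files have identical bytes and the same mtime (to the second)."""
    def digest(path: str) -> str:
        with open(path, "rb") as handle:  # binary mode: no newline translation
            return hashlib.sha256(handle.read()).hexdigest()

    same_bytes = digest(a) == digest(b)
    same_time = int(os.path.getmtime(a)) == int(os.path.getmtime(b))
    return same_bytes and same_time
```

Opening files in binary mode matters here because a text-mode copy could rewrite DOS line endings and so change the original content.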

Compressed files: The 1989 DOS compressed data files were extracted on a Windows 98 PC using the file compression software included on the original diskettes (DOS ARCE.COM). Users should note that Windows versions built on the NT kernel (rather than DOS) will not run the DOS ARCE program. The extracted files were transferred in binary format with their original time and date stamps.

The 1992 DOS compressed files are Windows self-extracting files. These files were executed on a Windows PC. All files decompressed from the self-extracting files were transferred in binary format and retain their original time and date stamps.

Processed documentation files: The original DOS ASCII documentation files were examined with Perl "cio" (check-it-out) to identify extraneous control and high ASCII characters. Form feeds (octal 14) and substitute control codes (octal 32) were replaced with a blank using Perl "fix". In the documentation files, octal 376 characters were replaced with a "-" (octal 55) and DOS carriage returns (octal 15) were deleted with UNIX "tr". The "fixed" files were saved with the .TXT file extension.
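
The character substitutions described above can be sketched in Python as follows. This is a hypothetical re-creation for illustration, not the original Perl "fix" script or "tr" command, whose exact invocations are not given here; the names DOC_TABLE and clean_documentation are assumptions.

```python
# Byte values named in the text, written in octal to match it.
FORM_FEED = 0o14        # replaced with a blank
SUBSTITUTE = 0o32       # replaced with a blank
HIGH_CHAR = 0o376       # replaced with "-" (octal 55)
CARRIAGE_RETURN = 0o15  # deleted

# Build a 256-entry translation table that maps every byte to itself,
# then override the three bytes that are replaced.
_table = bytearray(range(256))
_table[FORM_FEED] = ord(" ")
_table[SUBSTITUTE] = ord(" ")
_table[HIGH_CHAR] = ord("-")
DOC_TABLE = bytes(_table)


def clean_documentation(raw: bytes) -> bytes:
    """Apply the documentation-file substitutions to raw file bytes."""
    # bytes.translate first removes the bytes listed in the second argument,
    # then maps the remaining bytes through the table.
    return raw.translate(DOC_TABLE, bytes([CARRIAGE_RETURN]))
```

For example, clean_documentation(b"A\x0cB\r\n") yields b"A B\n": the form feed becomes a blank and the carriage return is dropped, leaving a UNIX line ending.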

SAS data definitions are part of the original ASCII documentation files. These data definitions were "cut" from the documentation files and are available separately as ASCII text files.

Processed data tables and files: The original 1989 ASCII data files were examined with Perl "cio" (check-it-out) to identify extraneous control and high ASCII characters. The substitute control code (octal 32) was replaced with a blank using Perl "fix". DOS carriage return characters (octal 15) were removed with UNIX "tr". The "fixed" files were saved with the .CSV file extension.
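
The examination step can be approximated with a short Python sketch that counts bytes falling outside plain printable ASCII. The actual "cio" script and its report format are not reproduced here, so the definition of "extraneous" below (anything other than printable ASCII, tab, and line feed) and the function name scan_extraneous are assumptions.

```python
from collections import Counter


def scan_extraneous(raw: bytes) -> dict:
    """Count control and high (non-ASCII) bytes, keyed by octal value."""
    # Assumed "expected" bytes: printable ASCII (0x20-0x7E), tab, line feed.
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A}
    counts = Counter(b for b in raw if b not in allowed)
    # Report in octal notation, e.g. '0o32' for the substitute control code.
    return {oct(b): n for b, n in sorted(counts.items())}
```

Run against an unprocessed DOS file, a scan like this would be expected to report the carriage returns (0o15) and any substitute control codes (0o32) that the later steps remove.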

The original 1992 ASCII CSV diskette files (BC92*.TXT) were not "fixed" or translated because these files do not include variable names. Instead, the 1992 dBase data tables, which include variable names, were translated to ASCII CSV format. Users are advised that the original 1992 ASCII CSV files contain the substitute control code (octal 32) as the last record or line in each file.
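
Users who read the original BC92*.TXT files directly may want to drop that trailing substitute code before parsing. A minimal Python sketch, assuming the code is the final byte of the file apart from any trailing line ending (the function name is hypothetical):

```python
SUBSTITUTE = b"\x1a"  # octal 32


def drop_trailing_sub(raw: bytes) -> bytes:
    """Remove a substitute control code that forms the last line of the file."""
    # Ignore any line ending that follows the substitute code.
    body = raw.rstrip(b"\r\n")
    if body.endswith(SUBSTITUTE):
        return body[:-1]
    # No trailing substitute code: return the data unchanged.
    return raw
```

The function leaves all other bytes, including DOS carriage returns, as they are, so the original line structure of the data records is unchanged.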

You can view the 1989 and 1992 "cio" output for the original ASCII diskette files.

The 1992 dBase data tables were translated to ASCII comma-separated value (ASCII CSV) data file format with Windows DBMS/COPY version 7 (www.conceptual.com). Variable names are part of the dBase tables and are included in the translated files. DBMS/COPY translation logs were checked for data translation error messages. The translated data cell values themselves were not compared against the source tables.

The DBMS data dictionary output statement for file BC92F01D.CSV follows. Users should note that this statement is specific to this file; each ASCII CSV file has its own list of variable names, corresponding to its dBase input file.

dictionary
names=y
separator=,
mustsurround=n
surroundchar="
quotechar='
extension=CSV
missing=
numeric=n
fixed=n
dictionary=dct
date=mm/dd/yyyy
variables
     1      5 c BLDGID5
     7      1 c REGION5
     9      1 c CENDIV5
    11      1 c MSA5
    13      1 c CLIMATE5
    15      9 R SQFT5
    25      2 c SQFTC5
    28      3 R NFLOOR5
    32      1 c PORBLG5
    34      3 R NUMBLG5
    38      4 R YRCON5
    43      2 R MONCON5
    46      2 c YRCONC5
    49      1 c EXPRED5
    51      9 R AMTDIF5
    61      2 c PBA5
    64      1 c HT15
    66      1 c HT25
    68      1 c COOL5
    70      1 c WATR5
    72      1 c COOK5
    74      1 c MANU5
    76      1 c GENR5
    78      1 c ELUSED5
    80      1 c NGUSED5
    82      1 c FKUSED5
    84      1 c STUSED5
    86      1 c HWUSED5
    88      3 R HEATP5
    92      3 R COOLP5
    96      3 R WKHRS5
   100      1 c WKHRSC5
   102      5 R TOTWK5
   108      2 c TOTWKC5
   111      5 R NWKER5
   117      2 c NWKERC5
   120      2 c WLCNS5
   123      2 c RFCNS5
   126      2 c BLDSHP5
   129      5 R BLDLEN5
   135      5 R BLDWID5
   141      1 c ATTWLL5
   143      1 c GLSSPC5
   145      3 R LTOHRP5
   149      8 R ADJWT5
   158      1 c PAIR5
   160      2 c STRATUM5
endvars
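
The variable list in a dictionary statement like the one above can be read programmatically. The Python sketch below assumes, from the layout of the listing rather than from DBMS/COPY documentation, that each line between "variables" and "endvars" holds a start position, a width, a type code (c or R), and a variable name; the Variable type and parse_variables function are hypothetical names.

```python
from typing import List, NamedTuple


class Variable(NamedTuple):
    start: int   # first number on the line (assumed start position)
    width: int   # second number on the line (assumed field width)
    kind: str    # type code as written in the listing: "c" or "R"
    name: str    # variable name, e.g. "BLDGID5"


def parse_variables(dictionary_text: str) -> List[Variable]:
    """Extract the entries between 'variables' and 'endvars'."""
    variables = []
    in_block = False
    for line in dictionary_text.splitlines():
        token = line.strip()
        if token == "variables":
            in_block = True
        elif token == "endvars":
            break
        elif in_block and token:
            start, width, kind, name = token.split()
            variables.append(Variable(int(start), int(width), kind, name))
    return variables
```

The resulting names can be compared with the header row of the matching CSV file to confirm that the translated file carries the expected variables.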


Official Web Page of the University of California, San Diego
© Copyright 2000, UCSD, All Rights Reserved. This site may not be reproduced.
Social Sciences & Humanities Library, 9500 Gilman Drive, La Jolla, CA 92093, 858-534-3336