
Processing and Quality Control

Original diskette files: All files on GPO distribution diskettes were copied and preserved in their original formats, with their original time and date stamps. The diskettes were mounted on the appropriate drive of a Windows 98 PC, and the files were copied to the PC hard drive. The files were then transferred to a UNIX server in binary mode, so that no bytes were altered in transit. The software programs used to examine and transfer the original diskette files are:

  • V.EXE - A Windows file viewer with tools that can view files in text and Hex modes; available from www.fileviewer.com.
  • SSH Secure File Transfer - Windows client version 3.2; available from www.ssh.com.
  • "cio" - A UNIX Perl program that reports extraneous ASCII characters (control and high) in ASCII document and data files. Specifically, the octal ranges of characters reported are [\000-\011] [\013-\037] [\177-\377], that is, every byte outside printable ASCII except the line feed (octal 012). You can get a copy of "cio" from our software page.
  • UNIX "tr" - A UNIX program that was used to remove DOS ASCII carriage returns (octal 015).
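The kind of check that "cio" performs can be illustrated with a short Python sketch. This is not the "cio" program itself (which is written in Perl and is not reproduced here); the function name report_extraneous and the sample bytes are hypothetical, and only the octal ranges are taken from the description above.

```python
from collections import Counter

# Byte values in the octal ranges quoted above:
# [\000-\011] [\013-\037] [\177-\377]. Line feed (octal 012) is not reported.
EXTRANEOUS = (
    frozenset(range(0o000, 0o012))
    | frozenset(range(0o013, 0o040))
    | frozenset(range(0o177, 0o400))
)


def report_extraneous(data: bytes) -> Counter:
    """Count every extraneous byte in data, keyed by its integer value."""
    return Counter(b for b in data if b in EXTRANEOUS)


if __name__ == "__main__":
    # Hypothetical DOS-style sample: a tab, CR/LF line endings, one high byte.
    sample = b"NAME\tVALUE\r\nCaf\x82 12\r\n"
    for value, count in sorted(report_extraneous(sample).items()):
        print(f"octal {value:03o}: {count} occurrence(s)")
```

Note that the carriage return (octal 015) falls inside the reported ranges, so a DOS text file will normally show one reported character per line until the carriage returns are removed.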

Compressed files: All original archived files were extracted using the DOS file decompression software included on the original diskettes. Some files are DOS self-extracting archives. All decompressed files are available in their original formats and retain their original time and date stamps. The DOS software programs used to extract files from the original archived files are:

  • ARCE.COM - This DOS program will not run with the command processor on Windows versions using the NT kernel. Use Windows 98 or an earlier version of Windows that uses the DOS kernel.
  • PKUNZIP.EXE - A DOS program that usually works on all versions of Windows. If you have problems or get memory allocation errors, run it on Windows 98 or earlier, as described for ARCE.COM above.
  • *.EXE - Self-extracting DOS archive files may not run on Windows versions with the NT kernel. See the ARCE.COM note above.
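For readers without access to a DOS environment, the following Python sketch shows one way to extract a ZIP archive while keeping the time and date stamps stored in it. It is an alternative illustration, not the procedure used for this project, which relied on the DOS programs listed above. The function name extract_with_timestamps is hypothetical. Python's zipfile module does not read ARC archives, and it may fail on ZIP files that use older PKZIP compression methods.

```python
import os
import time
import zipfile


def extract_with_timestamps(archive_path, dest_dir):
    """Extract every member and set its modification time from the archive.

    zipfile's extract() alone leaves files stamped with the extraction time,
    so the stored DOS date and time are applied afterwards with os.utime().
    """
    extracted = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            target = zf.extract(info, dest_dir)
            # date_time is (year, month, day, hour, minute, second) in local
            # time; -1 lets mktime decide whether daylight saving applies.
            mtime = time.mktime(info.date_time + (0, 0, -1))
            os.utime(target, (mtime, mtime))
            extracted.append(target)
    return extracted
```

A call such as extract_with_timestamps("DATA.ZIP", "extracted") would write the members under the "extracted" directory; both names are placeholders.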

Processed documents and documentation files: DOS WordPerfect documents and data documentation files were opened and converted with Microsoft Word for Windows. The files were then printed to Portable Document Format (PDF) with Adobe Acrobat. The PDF output was checked to ensure that the entire document was translated and that formatting was preserved. Minor reformatting was done with Word.

DOS ASCII text documents and data documentation files were examined with "cio". Extraneous ASCII characters (control and high) were replaced with blanks in these processed files. DOS carriage returns were removed with UNIX "tr". These "fixed" ASCII files were saved with the standard .TXT file extension.

Software programs used to process, translate and convert documents and documentation files are:

  • Word - Microsoft Word version 2002 (XP).
  • Acrobat - Adobe Acrobat version 5.0.5; available from www.adobe.com.
  • "fix" - A UNIX Perl program that can replace extraneous ASCII characters (control and high) with blanks. You can get a copy of "fix" from our software page.
  • "vi" - A UNIX ASCII file editor that was used to reformat some ASCII documents.
  • "tr" - A UNIX utility that was used to replace non-printing ASCII characters with other ASCII characters in some documentation files. Also used to remove DOS ASCII carriage returns.
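The ASCII cleanup described above (replace control and high characters with blanks, remove DOS carriage returns) can be sketched in a few lines of Python. This is an illustration only, not the "fix" program itself. The function name fix_ascii is hypothetical, the byte ranges are assumed to match those reported by "cio", and the sketch assumes carriage returns are deleted rather than turned into blanks.

```python
# Byte values treated as extraneous; assumed to match the "cio" ranges
# [\000-\011] [\013-\037] [\177-\377]. Line feed (octal 012) is kept.
EXTRANEOUS = (
    frozenset(range(0o000, 0o012))
    | frozenset(range(0o013, 0o040))
    | frozenset(range(0o177, 0o400))
)

# 256-entry translation table: extraneous bytes map to a blank (octal 040),
# every other byte maps to itself.
BLANK_TABLE = bytes(0o040 if b in EXTRANEOUS else b for b in range(256))


def fix_ascii(data: bytes) -> bytes:
    """Delete carriage returns, then blank out remaining extraneous bytes."""
    # bytes.translate() removes the bytes listed in `delete` before mapping,
    # so carriage returns are dropped instead of being replaced by blanks.
    return data.translate(BLANK_TABLE, delete=b"\r")


if __name__ == "__main__":
    print(fix_ascii(b"A\tB\r\nCaf\x82\r\n"))
```

Because tabs fall in the extraneous range, a tab-delimited file would lose its delimiters under this rule; such files would need to be converted before cleanup.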

Processed data tables and files: Whenever possible, DOS Lotus 1-2-3 and DOS dBase data tables were translated to ASCII comma-separated values format (ASCII CSV). Delimited and non-delimited, free-format and fixed-format ASCII data files were also converted to ASCII CSV format. Variable names or labels were added to the ASCII CSV files whenever possible. Data translation logs were checked for data translation error messages. Data dictionary input and output statements are documented. Comparisons may have been made on translated data cell values.
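As an illustration of converting a fixed-format ASCII file to CSV with a header row of variable names, here is a minimal Python sketch. The project itself used DBMS/COPY with data dictionary statements; the column layout, variable names, and sample records below are hypothetical and stand in for such a dictionary.

```python
import csv
import io

# Hypothetical data dictionary: (variable name, start column, end column),
# with 0-based start and exclusive end.
LAYOUT = [("state", 0, 2), ("year", 2, 6), ("sales", 6, 14)]


def fixed_to_csv(lines, layout, out):
    """Write fixed-format records to `out` as CSV, with variable names first."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in layout])
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank records
        # Slice each field by column position and trim the padding blanks.
        writer.writerow([line[start:end].strip() for _, start, end in layout])


if __name__ == "__main__":
    records = ["CA1994  1234.5\n", "NY1994   987.0\n"]
    buf = io.StringIO()
    fixed_to_csv(records, LAYOUT, buf)
    print(buf.getvalue(), end="")
```

A simple quality check in the same spirit as the translation-log review is to confirm that the number of CSV data rows equals the number of non-blank input records.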

Original data files that were already in ASCII CSV format were examined with "cio", and extraneous ASCII characters (control and high) were replaced with blanks. DOS ASCII carriage returns were removed with UNIX "tr". These "fixed" data files have the standard .CSV file extension.

Software programs used to process, translate and convert DOS data tables and ASCII data files are:

  • DBMS/COPY - Version 7.0 for Windows translates spreadsheets, databases and ASCII data files to other formats. Requires data dictionary statements. Available from www.conceptual.com.
  • "fix" - A UNIX Perl program that can replace extraneous ASCII characters (control and high) with blanks. You can get a copy of "fix" from our software page.

File Processing and Quality Control: Each of the titles in the Data Migration Project has a separate file processing and quality control page that details the specific procedures used for that title.

  1. A Guide to Program design options ... file processing and quality control.
  2. Annual energy review database ... file processing and quality control.
  3. Annual report of major natural gas ... file processing and quality control.
  4. Assessment for at-risk youth [electronic resource] : a decision maker's ... file processing and quality control.
  5. Assessment for at-risk youth [electronic resource] : a practitioner's guide ... file processing and quality control.
  6. Commercial buildings energy consumption ... file processing and quality control.
  7. Epi info... file processing and quality control.
  8. Fuel oil and kerosene sales ... file processing and quality control.
  9. Health data on older Americans... file processing and quality control.
  10. Historical monthly energy review database... file processing and quality control.
  11. International coal statistics ... file processing and quality control.
  12. International mortality data base ... file processing and quality control.
  13. Market penetration models ... file processing and quality control.
  14. Monthly electric utility sales ... file processing and quality control.
  15. Monthly energy review database ... file processing and quality control.
  16. Monthly power plant report ... file processing and quality control.
  17. Natural gas annual ... file processing and quality control.
  18. Oak Ridge uranium market model ... file processing and quality control.
  19. Oil market simulation model (OMS) ... file processing and quality control.
  20. Performance profiles of major energy producers... file processing and quality control.
  21. Residential energy consumption ... file processing and quality control.
  22. State energy data system ... file processing and quality control.
  23. State energy price & expenditure data ... file processing and quality control.
  24. U.S. crude oil, natural gas, ... file processing and quality control.
  25. WEPS [electronic resource] : archival of world ... file processing and quality control.
  26. World integrated nuclear ... file processing and quality control.


 


Official Web Page of the University of California, San Diego
© Copyright 2000, UCSD, All Rights Reserved. This site may not be reproduced.
Social Sciences & Humanities Library, 9500 Gilman Drive, La Jolla, CA 92093, 858-534-3336