
Data Migration Project FAQ

- QUESTIONS -

General Questions:

  • Q1. What is the Data Migration Project?
  • Q2. What are the objectives of the Data Migration Project?
  • Q3. Who worked on the Data Migration Project?
Study Description Questions:
  • Q4. What is a study description?
  • Q5. What are the elements of a study description?
Copying and archiving the original diskette files:
  • Q6. How did you copy the original diskette files?
  • Q7. Are the files in your archive exact copies of the originals?
  • Q8. How did you transfer the original files to your archive?
Extracting or decompressing original diskette files:
  • Q9. How did you decompress the original diskette files?
  • Q10. Why can't I decompress the original diskette files?
File processing and quality control:
  • Q11. How did you process or convert the original files so that they can be used on many different types of computers running different types of software programs?
  • Q12. Why did you remove ASCII control or high code in ASCII files?
  • Q13. Why didn't you convert or process some of the DOS Lotus 123 spreadsheet files?
  • Q14. Where can I find detailed information about file processing and quality control?
  • Q15. Why did you choose the PDF format for some documents?
  • Q16. Why did you choose ASCII CSV format for data files?
  • Q17. Why did you choose DBMS to translate or convert the data files to ASCII CSV format?
Displaying files in your web browser:
  • Q18. Why don't the project files display correctly in my web browser?
Downloading files with your web browser:
  • Q19. Why do some of the text files that I downloaded contain characters that won't display properly in my PC text viewer?
  • Q20. Why can't I use or display some of the files that I downloaded to my PC?
  • Q21. Downloading each file can be time consuming. Why didn't you create a compressed file or archive of all processed files for each title?
Other GPO diskette archives:
  • Q22. Are there other archives for GPO diskettes?
  • Q23. Do federal agency web sites archive any of the data on these GPO diskettes?

- ANSWERS -

Q1. What is the Data Migration Project?

The UCSD Library Data Migration Project preserves legacy machine-readable computer files that the Government Printing Office (GPO) originally distributed on Microsoft DOS diskettes to the University of California, San Diego Library.

Q2. What are the objectives of the Data Migration Project?

  • Provide a permanent archive for the original Microsoft DOS files.
  • Provide access to the archive.
  • Whenever possible, extract and/or process the files so they can be utilized by a variety of software programs running on many different computers.
  • Catalog the archive titles in our OPAC, ROGER.

Q3. Who worked on the Data Migration Project?

The project is a joint effort of the UCSD Libraries' Government Information and Social Sciences Data Collection staff.

Q4. What is a study description?

The study description describes the documentation and/or data on the original GPO diskettes, file extraction and processing, minimum software requirements, a list of files that can be downloaded, related publications and a bibliographic citation.

Q5. What are the elements of a study description?

  • Title: The UCSD Pactech/Roger catalog title (MARC field 245 00).
  • Distribution Media: The type of media the data was originally distributed on.
  • SuDoc Number: Superintendent of Documents classification number.
  • Abstract: A brief description of the title.
  • Extent of Collection: The files contained on the distribution media.
  • Data Conversion: Describes how the data was extracted and/or processed.
  • Minimum Software Requirements: Software required to use the original or processed documentation and data files.
  • Files: A list of all original, extracted and processed files by filename, type and format. These files can be downloaded.
  • Related Publications: Documentation for the original GPO diskettes, documentation and data available in other formats, historical and current (if any) documentation and data files.
  • Bibliographic Citation: A citation for the original GPO diskettes.

Q6. How did you copy the original diskette files?

A 5 1/4-inch floppy disk drive was mounted on a desktop computer running Microsoft Windows 98 (version 4.10.2222A). The files on the diskettes were copied to the hard disk on the computer.

Q7. Are the files in your archive exact copies of the originals?

Yes.

Q8. How did you transfer the original files to your archive?

Original diskette files were transferred in binary mode to a UNIX server using Windows SSH Secure Shell Client (version 3.2.0, Build 267). The mode was selected to ensure that the original file types were correctly transferred with their original time and date stamps.
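
One way to check that a transferred file is an exact copy of its original is to compare checksums of the two files. The following is a minimal sketch, not a description of the procedure the project used. The function names `sha256_of` and `is_exact_copy` are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in binary mode."""
    digest = hashlib.sha256()
    # "rb" reads raw bytes, so no line-ending translation can occur.
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_exact_copy(original: Path, copy: Path) -> bool:
    """True when both files have identical contents, byte for byte."""
    return sha256_of(original) == sha256_of(copy)
```

A file transferred in text mode would typically fail this check, because text mode may rewrite the DOS carriage return and line feed pairs.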

Q9. How did you decompress the original diskette files?

Compressed diskette files were extracted with the original DOS decompression software supplied on the original diskettes. In some cases, the compressed files were self-extracting. Extracted files retain their original time and date stamps.

Q10. Why can't I decompress the original diskette files?

If you cannot extract the original compressed files, you are probably using a Microsoft operating system that uses the NT kernel, rather than the DOS kernel. Try using Microsoft Windows 98 or an earlier Microsoft Windows or DOS operating system.

Q11. How did you process or convert the original files so that they can be used on many different types of computers running different types of software programs?

Original diskette DOS ASCII documents that have been processed are available in "fixed" ASCII text format. "Fixed" means that some ASCII control or high codes were replaced with a blank and DOS carriage return characters were removed. DOS WordPerfect documents were printed to Adobe Acrobat Portable Document Format (PDF). Whenever possible, original data tables and data files were converted to ASCII comma separated value (ASCII CSV) files. Variable names or labels may have been added to the ASCII CSV files.

Q12. Why did you remove ASCII control or high code in ASCII files?

These ASCII codes (octal 0-11,13-37,177-377) are not required to display ASCII text files or use ASCII text data files. Moreover, they can cause display problems with some ASCII text viewers/editors and will import meaningless records into some spreadsheet or database software applications.
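
The "fixing" step can be sketched in a few lines of Python. This is a minimal illustration based only on the octal ranges listed above, not the project's actual program. The function name `fix_ascii` is hypothetical, and the sketch assumes that every listed code except the carriage return is replaced with a blank.

```python
def fix_ascii(raw: bytes) -> bytes:
    """Blank out control and high codes and drop DOS carriage returns."""
    fixed = bytearray()
    for byte in raw:
        if byte == 0o15:
            # Carriage return (octal 15): removed entirely.
            continue
        if byte == 0o12:
            # Line feed (octal 12): kept as the line ending.
            fixed.append(byte)
        elif byte <= 0o37 or byte >= 0o177:
            # Octal 0-11, 13-37 and 177-377: replaced with a blank.
            fixed.append(0x20)
        else:
            # Printable ASCII (octal 40-176): kept unchanged.
            fixed.append(byte)
    return bytes(fixed)
```

Note that the tab character (octal 11) falls inside the first range, so under this assumption it also becomes a blank.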

Q13. Why didn't you convert or process some of the DOS Lotus 123 spreadsheet files?

Some of the DOS Lotus 123 data tables or spreadsheets have embedded macros. These macros are unique to Lotus and cannot be converted or translated to ASCII or other spreadsheet software. This information is noted in the study description.

Q14. Where can I find detailed information about file processing and quality control?

See Processing and Quality Control.

Q15. Why did you choose the PDF format for some documents?

The PDF format is a published standard and preserves the format of the original DOS WordPerfect files. The Adobe Acrobat PDF Reader is freely available (www.adobe.com) and runs on most computers.

Q16. Why did you choose ASCII CSV format for data files?

ASCII files can be utilized by a variety of computers. Moreover, ASCII CSV data files can be imported to most spreadsheet or database software.
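
As an illustration of how little software an ASCII CSV file requires, the sketch below reads one with Python's standard `csv` module. The column names and values in `sample` are made up for the example and do not come from any file in the archive. It assumes the first row holds the variable names.

```python
import csv
import io

# Made-up contents: a header row of variable names, then two data rows.
sample = "AREA,YEAR,VALUE\nAA,1990,100\nBB,1990,250\n"


def read_csv_records(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by variable name."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


records = read_csv_records(sample)
for record in records:
    print(record["AREA"], record["YEAR"], record["VALUE"])
```

All values are read as text. Converting them to numbers is left to the importing program.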

Q17. Why did you choose DBMS to translate or convert the data files to ASCII CSV format?

The only function of DBMS is to translate or convert data files from one format to another. Moreover, DBMS directly reads and writes native binary and ASCII data files and requires input and output data dictionaries. This methodology is more rigorous than using statistical or spreadsheet software to parse and convert the original data.

Q18. Why don't the project files display correctly in my web browser?

Your PC web browser software works with the server's Hypertext Transfer Protocol (HTTP) configuration to display files correctly in your browser. The server uses the file extension to recognize each file type and tell your browser how to display it. Since many of the original GPO files do not use file extensions, or use non-standard file extensions, many original files will not display properly. You can download the files and use your PC software to display or import the files. If you are using Windows to display ASCII text files, Wordpad will work better than Notepad. Processed files use standard file extensions and should display properly in your web browser.
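
The effect of a missing or non-standard extension can be seen with Python's standard `mimetypes` module, which maps file extensions to content types much as a web server does. This is a sketch for illustration only. It is not the configuration of the project's server, and the filenames are hypothetical.

```python
import mimetypes
from typing import Optional

# A fresh MimeTypes instance starts from the module's default extension table.
_types = mimetypes.MimeTypes()


def guess_content_type(filename: str) -> Optional[str]:
    """Return the content type implied by the extension, or None if unknown."""
    content_type, _encoding = _types.guess_type(filename)
    return content_type


# Hypothetical filenames: two with standard extensions, one with none.
for name in ("report.pdf", "notes.txt", "README"):
    print(name, "->", guess_content_type(name))
```

When no type can be determined, a server has to fall back on a default, which is why such files may download or display incorrectly.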

Q19. Why do some of the text files that I downloaded contain characters that won't display properly in my PC text viewer?

As far as we can tell, many of the original diskette files were originally stored in EBCDIC format on tape media and used on IBM mainframe computers. On many of the original diskettes, the files translated from IBM EBCDIC to ASCII contain control or high ASCII codes (nulls, backspaces, carriage returns, form feeds, the substitute control character, and other non-printing codes). These characters may not alter documentation or data files to the extent that you cannot use them on your PC. They have been eliminated in the processed files.
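
The sketch below shows how non-printing codes can survive an EBCDIC to ASCII translation. It assumes code page 037, a common US EBCDIC variant available in Python as the `cp037` codec. We do not know which code page the original tapes used, and the sample record is made up.

```python
# Assumption: code page 037. The original code page is not documented.
EBCDIC_CODEC = "cp037"


def ebcdic_to_text(raw: bytes) -> str:
    """Translate EBCDIC bytes to text using the assumed code page."""
    return raw.decode(EBCDIC_CODEC)


# Made-up record: the letters HELLO in EBCDIC, padded with two null bytes.
record = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6, 0x00, 0x00])
text = ebcdic_to_text(record)

# The letters translate cleanly, but the null padding is carried over
# as control characters that a text viewer may not display properly.
print(repr(text))
```

Mainframe records were often padded to a fixed length, so a direct translation like this one can leave many such characters in the output.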

Q20. Why can't I use or display some of the files that I downloaded to my PC?

See Downloading Tips.

Q21. Downloading each file can be time consuming. Why didn't you create a compressed file or archive of all processed files for each title?

We don't know of a compression program that will run on every computer operating system.

Q22. Are there other archives for GPO diskettes?

The Floppy Disk Project (FDP) at Indiana University Bloomington maintains an archive of GPO floppy disks. All files are Windows compressed self-extracting executable files. The files at this archive have not been extracted or processed.

Q23. Do federal agency web sites archive any of the data on these GPO diskettes?

Some agencies maintain archives of historical data that include data originally distributed on these GPO diskettes. The "Related Publications:" section of each title's study description contains links to these archives.


 


Official Web Page of the University of California, San Diego
© Copyright 2000, UCSD, All Rights Reserved. This site may not be reproduced.
Social Sciences & Humanities Library, 9500 Gilman Drive, La Jolla, CA 92093, 858-534-3336