- QUESTIONS -
General Questions:
- Q1. What is the Data Migration Project?
- Q2. What are the objectives of the Data Migration Project?
- Q3. Who worked on the Data Migration Project?
Study Description Questions:
- Q4. What is a study description?
- Q5. What are the elements of a study description?
Copying and archiving the original diskette files:
- Q6. How did you copy the original diskette files?
- Q7. Are the files in your archive exact copies of the
originals?
- Q8. How did you transfer the original files to your
archive?
Extracting or decompressing original diskette files:
- Q9. How did you decompress the original diskette files?
- Q10. Why can't I decompress the original diskette files?
File processing and quality control:
- Q11. How did you process or convert the original files
so that they can be used on many different types of computers running different
types of software programs?
- Q12. Why did you remove control or high ASCII codes
from ASCII files?
- Q13. Why didn't you convert or process some of the DOS
Lotus 123 spreadsheet files?
- Q14. Where can I find detailed information about file
processing and quality control?
- Q15. Why did you choose the PDF format for some documents?
- Q16. Why did you choose ASCII CSV format for data files?
- Q17. Why did you choose DBMS to translate or convert the
data files to ASCII CSV format?
Displaying files in your web browser:
- Q18. Why don't the project files display correctly in my
web browser?
Downloading files with your web browser:
- Q19. Why do some of the text files that I downloaded contain
characters that won't display properly in my PC text viewer?
- Q20. Why can't I use or display some of the files that
I downloaded to my PC?
- Q21. Downloading each file can be time consuming. Why
didn't you create a compressed file or archive of all processed files for each
title?
Other GPO diskette archives:
- Q22. Are there other archives for GPO diskettes?
- Q23. Do federal agency web sites archive any of the
data on these GPO diskettes?
- ANSWERS -
Q1. What is the Data Migration Project?
The UCSD Library Data Migration Project preserves legacy
Government Printing Office machine readable computer files that were originally
distributed on Microsoft DOS diskettes to the University of California, San
Diego Library by the Government Printing Office (GPO).
Q2. What are the objectives of the Data Migration Project?
- Provide a permanent archive for the original Microsoft
DOS files.
- Provide access to the archive.
- Whenever possible, extract and/or process the files so they can be utilized
by a variety of software programs running on many different computers.
- Catalog the archive titles in our OPAC, ROGER.
Q3. Who worked on the Data Migration Project?
The project is a joint effort of the UCSD Libraries'
Government Information and Social Sciences Data Collection staff.
Q4. What is a study description?
The study description describes the documentation
and/or data on the original GPO diskettes, file extraction and processing,
minimum software requirements, a list of files that can be downloaded,
related publications and a bibliographic citation.
Q5. What are the elements of a study description?
- Title: The UCSD Pactech/Roger catalog title (MARC field 245 00).
- Distribution Media: The type of media the data was
originally distributed on.
- SuDoc Number: Superintendent of Documents classification
number.
- Abstract: A brief description of the title.
- Extent of Collection: The files contained on the distribution
media.
- Data Conversion: Describes how the data was extracted and/or
processed.
- Minimum Software Requirements: Software required to use
the original or processed documentation and data files.
- Files: A list of all original, extracted and processed files
by filename, type and format. These files can be downloaded.
- Related Publications: Documentation for the original GPO diskettes,
documentation and data available in other formats, historical and current
(if any) documentation and data files.
- Bibliographic Citation: A citation for the original GPO
diskettes.
Q6. How did you copy the original diskette files?
A 5 1/4-inch floppy disk drive was mounted on a desktop computer running
Microsoft Windows 98 (version 4.10.2222A). The files on the diskettes
were copied to the hard disk on the computer.
Q7. Are the files in your archive exact copies of
the originals?
Yes.
Q8. How did you transfer the original files to your
archive?
Original diskette files were transferred in binary
mode to a UNIX server using Windows SSH Secure Shell Client
(version 3.2.0, Build 267). Binary mode was selected to ensure that all
original file types were transferred correctly, with their original
time and date stamps.
Q9. How did you decompress the original diskette files?
Compressed diskette files were extracted with the original DOS
decompression software supplied on the original diskettes. In some cases,
the compressed files were self-extracting. Extracted files retain their
original time and date stamps.
Q10. Why can't I decompress the original diskette files?
If you cannot extract the original compressed files, you are probably
using a Microsoft operating system that uses the NT kernel, rather
than the DOS kernel. Try using Microsoft Windows 98 or an earlier
Microsoft Windows or DOS operating system.
Q11. How did you process or convert the original files
so that they can be used on many different types of computers running different
types of software programs?
Original diskette DOS ASCII documents that have been processed are available
in "fixed" ASCII text format. "Fixed" means that some control or high ASCII
codes were replaced with a blank and DOS carriage return characters were
removed. DOS WordPerfect documents were printed to Adobe Acrobat Portable
Document Format (PDF). Whenever possible, original data tables and data files were
converted to ASCII comma separated value (ASCII CSV) files. Variable names or
labels may have been added to the ASCII CSV files.
Q12. Why did you remove control or high ASCII codes
from ASCII files?
These ASCII codes (octal 0-11,13-37,177-377) are not required to display
ASCII text files or use ASCII text data files. Moreover, they can cause
display problems with some ASCII text viewers/editors and will import
meaningless records into some spreadsheet or database software applications.
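The cleanup described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual processing tool: the function name fix_ascii is hypothetical, and the sketch assumes that carriage returns (octal 15) are dropped while every other code in the listed octal ranges becomes a blank. The line feed (octal 12) lies outside those ranges and is kept.

```python
def fix_ascii(raw: bytes) -> bytes:
    """Return a "fixed" copy of raw (hypothetical helper, see assumptions above)."""
    out = bytearray()
    for b in raw:
        if b == 0o15:
            # DOS carriage return: removed outright
            continue
        if b <= 0o11 or 0o13 <= b <= 0o37 or b >= 0o177:
            # control or high code (octal 0-11, 13-37, 177-377): replaced with a blank
            out.append(0x20)
        else:
            # printable ASCII and the line feed (octal 12) are kept unchanged
            out.append(b)
    return bytes(out)


sample = b"AB\r\nC\x00D\x1a\xffE\n"
fixed = fix_ascii(sample)
```

Note that under this reading the tab character (octal 11) is also replaced with a blank, since it falls inside the first range.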
Q13. Why didn't you convert or process some of the DOS
Lotus 123 spreadsheet files?
Some of the DOS Lotus 123 data tables or spreadsheets have embedded macros.
These macros are unique to Lotus and cannot be converted or translated to
ASCII or other spreadsheet software. This information is noted in the study
description.
Q14. Where can I find detailed information about file
processing and quality control?
See Processing and Quality Control.
Q15. Why did you choose the PDF format for some documents?
The PDF format is a published standard and preserves the format of the original
DOS WordPerfect files. The Adobe Acrobat PDF Reader is freely available
(www.adobe.com) and runs on most computers.
Q16. Why did you choose ASCII CSV format for data files?
ASCII files can be utilized by a variety of computers. Moreover, ASCII CSV data files
can be imported to most spreadsheet or database software.
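As one illustration of how easily an ASCII CSV file can be read, the sketch below parses a small sample with Python's standard csv module. The column names and values are made up for the example and do not come from any project file; the helper name read_csv_rows is also hypothetical.

```python
import csv
import io


def read_csv_rows(text: str) -> list:
    """Parse CSV text whose first line holds the variable names."""
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample; real project files have their own variable names.
sample = "state,year,value\nCA,1990,12.5\nNY,1990,9.1\n"
rows = read_csv_rows(sample)
for row in rows:
    print(row["state"], row["year"], row["value"])
```

For a downloaded file, the same reader can be given a file opened with open(path, newline="") instead of the in-memory sample.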
Q17. Why did you choose DBMS to translate or convert the
data files to ASCII CSV format?
The only function of DBMS is to translate or convert data files from
one format to another. Moreover, DBMS directly reads and writes native
binary and ASCII data files and requires data input and output data
dictionaries. This methodology is more rigorous than using statistical
or spreadsheet software to parse and convert the original data.
Q18. Why don't the project files display correctly
in my web browser?
Your PC web browser works with the server's Hypertext Transfer Protocol (HTTP)
configuration to display files correctly in your browser. The server uses
the file extension to recognize and display files in your browser. Since many of the
original GPO files do not use file extensions, or use non-standard file extensions,
many original files will not display properly. You can download the files and use your PC
software to display or import the files. If you are using Windows to display ASCII
text files, Wordpad will work better than Notepad. Processed files use standard
file extensions and should display properly in your web browser.
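The role of the file extension can be illustrated with Python's standard mimetypes module, which guesses a content type from a filename in much the same way a web server does. This is a sketch of the general idea only; it does not reproduce this server's configuration, and the filenames are hypothetical.

```python
import mimetypes

# Hypothetical filenames: one with a standard extension, one with none.
for name in ["notes.txt", "DATAFILE"]:
    content_type, _encoding = mimetypes.guess_type(name)
    if content_type is None:
        print(name, "-> no known type; a browser may not display it properly")
    else:
        print(name, "->", content_type)
```

A file with no extension gets no content type from this lookup, which is why downloading such files and opening them with local software is the more reliable route.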
Q19. Why do some of the text files that I downloaded contain
characters that won't display properly in my PC text viewer?
As far as we can tell, many of the original diskette files were originally
stored in EBCDIC format on tape media and used on IBM mainframe computers.
The files translated from IBM EBCDIC to ASCII on many of the original diskettes
contain control or high ASCII codes (nulls, backspaces, carriage returns, form feeds,
the substitute control character, and other non-printing codes).
These characters may not alter documentation or data files to the extent that
you cannot use them on your PC. These characters have been eliminated in the
processed files.
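The sketch below shows how a straight EBCDIC to ASCII translation can carry non-printing codes along with the text. It assumes EBCDIC code page 037 (Python's built-in "cp037" codec); we do not know which code page or translation program was actually used for the original diskettes, and the sample record is made up.

```python
def ebcdic_to_text(record: bytes) -> str:
    """Translate an EBCDIC record to text, assuming code page 037."""
    return record.decode("cp037")


# Made-up record: the word HELLO in EBCDIC followed by two null padding bytes.
record = b"\xc8\xc5\xd3\xd3\xd6\x00\x00"
text = ebcdic_to_text(record)
print(repr(text))  # the two nulls survive the translation
```

The letters translate cleanly, but the padding nulls come through unchanged, which is the kind of residue a PC text viewer may show as odd characters.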
Q20. Why can't I use or display some of the files that I
downloaded to my PC?
See Downloading Tips.
Q21. Downloading each file can be time consuming. Why
didn't you create a compressed file or archive of all processed files for each title?
We don't know of a compression program that will run on every computer operating
system.
Q22. Are there other archives for GPO diskettes?
The Floppy Disk Project (FDP) at Indiana University Bloomington maintains
an archive of GPO floppy disks. All files are Windows compressed
self-extracting executable files. The files at this archive have not
been extracted or processed.
Q23. Do federal agency web sites archive any of the
data on these GPO diskettes?
Some agencies maintain archives of historical data that include data
originally distributed on these GPO diskettes. The "Related Publications:"
section of each title's study description contains links to these archives.