
Downloading Tips

Many of the files in this archive do not have file extensions or do not use standard file extensions. If you download a file from this archive and it looks odd, or you cannot use it or read it, we have some tips for you.

Web browsers use the Hypertext Transfer Protocol (HTTP) to transmit files from one place to another. When a web browser downloads a file, it must choose between two transfer types: binary or text (ASCII). The browser uses the file extension to make this choice. If a file does not have a file extension, or the browser does not recognize the file extension, the browser makes a "default" choice, and that choice may be incorrect.

Netscape (version 4.78) - the default choice is text
Internet Explorer (version 5.5) - the default choice is binary
Mozilla (version 0.9.5) - the default choice is binary

If a binary file is downloaded as text, the file will be corrupted. If a text file is downloaded as binary, its lines may "run on"; that is, the line breaks may not be displayed and the text may not wrap properly. If you are using a Windows PC, open such a file in WordPad rather than Notepad; WordPad displays UNIX-style line endings correctly.
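When a downloaded file has no extension, a quick look at its bytes can suggest whether it is text or binary. The following is a minimal Python sketch, not a tool supplied by this archive. The names `looks_like_text` and `file_looks_like_text` and the 1024-byte sample size are illustrative assumptions, and the heuristic (only printable ASCII plus common whitespace) can misclassify unusual files.

```python
# Heuristic sketch: guess whether a file is ASCII text or binary.
# Function names and the sample size are illustrative assumptions.

# Bytes treated as text: tab, line feed, form feed, carriage return,
# and the printable ASCII range (space through tilde).
TEXT_BYTES = bytes([9, 10, 12, 13]) + bytes(range(32, 127))
SAMPLE_SIZE = 1024


def looks_like_text(data: bytes) -> bool:
    """Return True if every byte in the sample is printable ASCII or whitespace."""
    sample = data[:SAMPLE_SIZE]
    return all(b in TEXT_BYTES for b in sample)


def file_looks_like_text(path: str) -> bool:
    """Read the start of a file in binary mode and apply the heuristic."""
    with open(path, "rb") as f:  # "rb" leaves the bytes unaltered
        return looks_like_text(f.read(SAMPLE_SIZE))
```

If the check reports binary for a file that the study description lists as ASCII, the download was probably made in the wrong mode and is worth repeating.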

Our study descriptions contain a table that lists the file names, types, and formats. Unless the format is explicitly listed as ASCII, you should assume the file is binary. In the example below, the decompressed file CBECS89.DOS is the only ASCII file. The other files are binary.

Original Diskette Files (Name-Type-Format) | Decompressed Files (Name-Type-Format) | Processed Files (Name-Type-Format)
CBECS89.ARC - compressed file - DOS ARCE | CBECS89.DOS - documentation file - ASCII | CBECS89.PDF - documentation file - PDF

A binary file is a file whose content must be interpreted by a program that understands in advance exactly how it is formatted. Examples are Word files (*.doc), Excel files (*.xls) and PDF files (*.pdf). When transmitting these files from one place to another, browsers do not attempt to look within or change them; they simply pass them along as a sequence of zeros and ones.
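Because many binary formats begin with a fixed sequence of bytes, a file without an extension can sometimes be identified from its first few bytes. The sketch below is illustrative only: `SIGNATURES` and `identify_format` are hypothetical names, and the table covers just two well-known signatures (the `%PDF-` header of PDF files and the 8-byte header of the OLE2 compound file container commonly used by legacy Word and Excel files). A result of "unknown" says nothing about whether the file is valid.

```python
# Sketch: recognize a few binary formats from their leading "magic" bytes.
# SIGNATURES and identify_format are illustrative names, not archive tools.

SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",
     "OLE2 compound file (legacy Word .doc / Excel .xls)"),
]


def identify_format(data: bytes) -> str:
    """Return a description of the format, or 'unknown' if no signature matches."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first 16 bytes in binary mode and match them against SIGNATURES."""
    with open(path, "rb") as f:
        return identify_format(f.read(16))
```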

ASCII (American Standard Code for Information Interchange) is the most common format for text files on computers. UNIX and DOS-based operating systems both use ASCII for text files, but they mark the end of a line differently: UNIX uses a single line-feed code, while DOS uses a carriage-return code followed by a line-feed code. When you transfer ASCII text files between these systems in text mode, the transfer protocol adds or deletes the carriage-return code as needed.
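If a text file arrives with the wrong end-of-line codes, the conversion can also be done after the download. The following Python sketch shows one way to do it; the function names `to_unix` and `to_dos` are illustrative assumptions, and the functions are meant for ASCII files only.

```python
# Sketch: convert text between UNIX (LF) and DOS (CR+LF) line endings.
# Function names are illustrative assumptions.


def to_unix(data: bytes) -> bytes:
    """Replace DOS CR+LF line endings with a single UNIX LF."""
    return data.replace(b"\r\n", b"\n")


def to_dos(data: bytes) -> bytes:
    """Replace line endings with DOS CR+LF.

    Normalizing to LF first prevents an existing CR+LF from
    becoming CR+CR+LF.
    """
    return to_unix(data).replace(b"\n", b"\r\n")


def convert_file(src: str, dst: str, dos: bool = True) -> None:
    """Read src in binary mode, convert its line endings, and write dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(to_dos(data) if dos else to_unix(data))
```

Running a conversion like this on a binary file would alter any bytes that happen to match the end-of-line codes, so it should be applied only to files whose format is listed as ASCII.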


 


Official Web Page of the University of California, San Diego
© Copyright 2000, UCSD, All Rights Reserved. This site may not be reproduced.
Social Sciences & Humanities Library, 9500 Gilman Drive, La Jolla, CA 92093, 858-534-3336