ASCII - American Standard Code for Information Interchange is the
most common format for text files in computers and on the Internet.
In an ASCII file, each alphabetic, numeric, or special character is
represented with a 7-bit binary number (a string of seven 0s or 1s).
128 possible characters are defined. UNIX and DOS-based operating systems
use ASCII for text files. Windows NT and 2000 use a newer code, Unicode.
IBM's S/390 systems use a proprietary 8-bit code called EBCDIC.
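As a small illustration of the 7-bit representation described above, the following Python sketch prints the binary number behind each character. The function name to_ascii_bits is made up for this example and is not part of any project tool.

```python
def to_ascii_bits(text: str) -> list[str]:
    """Return the 7-bit binary string for each ASCII character in text."""
    bits = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            # Only codes 0-127 fit in seven bits.
            raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
        bits.append(format(code, "07b"))
    return bits


print(to_ascii_bits("A1"))  # ['1000001', '0110001']
```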
ASCII control characters -
The first 32 ASCII characters (octal codes 000 through 037) form a
special set of non-printing characters called the control characters.
They are called control characters because they perform various legacy printer/display
control operations rather than displaying symbols. Unfortunately, different
control characters perform different operations on different output devices
and there is very little standardization among output devices. With the
exception of the line feed character, we have replaced these "extraneous"
control characters in the processed ASCII documents and data files with a
blank so that the processed files can be used with a variety of software on
many kinds of computers.
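The replacement described above can be sketched in Python roughly as follows. This is only an illustration of the idea, not the program actually used to process the files; the function name blank_control_chars is an assumption made for the example.

```python
LINE_FEED = 0x0A  # octal 012, the one control character that is kept


def blank_control_chars(data: bytes) -> bytes:
    """Replace control characters (codes 0-31) other than line feed with a blank."""
    return bytes(
        0x20 if byte < 0x20 and byte != LINE_FEED else byte
        for byte in data
    )


# Tab, carriage return and NUL become blanks; the line feed survives.
print(blank_control_chars(b"a\tb\r\nc\x00"))  # b'a b \nc '
```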
ASCII high characters - Extra or extended characters (decimal codes 128-255,
octal 200-377) added to the original ASCII character set. These can be foreign
language characters, math characters, symbols, and so on. Unfortunately, extended
character sets vary and display differently on various ASCII file viewers/editors.
They can also cause meaningless records to be imported into statistical or
database software. Most of these characters in the original diskette files are
probably untranslated EBCDIC-ASCII characters. These high or extended
"extraneous" characters have been replaced with a blank or an English
alpha/numeric character in the processed files.
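One way to locate high characters in a file and blank them out is sketched below in Python. It covers only the simple "replace with a blank" case; choosing an English alphanumeric substitute requires judgment about the original data and is not attempted here. The function names and the sample bytes are hypothetical.

```python
def find_high_chars(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte above the 7-bit ASCII range."""
    return [(i, b) for i, b in enumerate(data) if b > 0x7F]


def blank_high_chars(data: bytes) -> bytes:
    """Replace every byte above 127 with a blank."""
    return bytes(0x20 if b > 0x7F else b for b in data)


sample = b"caf\xe9 10\xb0"
print(find_high_chars(sample))   # [(3, 233), (7, 176)]
print(blank_high_chars(sample))  # b'caf  10 '
```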
CSV -
A comma separated value data file is a physical ASCII file structure that
contains records whose values are delimited or separated by commas.
Within the context of the Data Migration Project, many of the
original diskette data files have been translated to ASCII CSV format
so they can be used by a variety of software on many types of
computers.
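For readers who want to see the structure, here is a minimal Python sketch that reads comma-separated records with the standard csv module. The sample values are invented for the example and do not come from any project file.

```python
import csv
import io

# Hypothetical three-record CSV text; a real file would be opened from disk.
sample = "station,year,value\nA01,1985,12.5\nB02,1986,7.25\n"

rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    print(row)
# ['station', 'year', 'value']
# ['A01', '1985', '12.5']
# ['B02', '1986', '7.25']
```

Every value is read as text; converting a field such as "12.5" to a number is left to the software that uses the file.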
DOS -
MS-DOS (Microsoft Disk Operating System) was the Microsoft-marketed version of the first
widely installed operating system in personal computers. It was essentially
the same operating system that Bill Gates's young company developed for
IBM as Personal Computer Disk Operating System (PC-DOS). The Data Migration
Project original diskette files were written to be used on DOS computers.
data -
In computing, data is information that has been translated into a form that
is more convenient to move or process. Relative to today's computers and
transmission media, data is information converted into binary digital form.
database -
A database is a collection of data that is organized so that its contents
can easily be accessed, analyzed and updated. In the context of the
Data Migration Project, databases are usually in the DOS dBase format.
data file -
In the context of the Data Migration Project, a data file is an ASCII
text file that has a header record consisting of variable names or values
that is followed by "n" records of data values. Records may contain actual
numbers or coded values.
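The header-plus-records layout can be illustrated with a short Python sketch. It assumes the data file is comma-delimited, which is common in this project but not guaranteed; the function name read_data_file and the sample values are hypothetical.

```python
import csv
import io


def read_data_file(text: str) -> tuple[list[str], list[dict[str, str]]]:
    """Split a data file into its header record and its data records.

    Assumes one header record followed by comma-delimited data records.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    records = [dict(zip(header, row)) for row in reader]
    return header, records


header, records = read_data_file("id,code\n1,A\n2,B\n")
print(header)      # ['id', 'code']
print(records[1])  # {'id': '2', 'code': 'B'}
```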
data table -
In the context of the Data Migration Project, a data table can be an
ASCII text file that has many header records followed by "n" records
of data values. A data table may also be in DOS Lotus 1-2-3 (spreadsheet)
format. Typically, data tables are prepared for viewing data, rather
than processing or analyzing data.
EBCDIC - EBCDIC (pronounced either "ehb-suh-dik" or "ehb-kuh-dik") is
a binary code for alpha-numeric characters that IBM developed for its larger
operating systems. It is the code for text files that is used in IBM's OS/390
operating system for its S/390 servers and that thousands of corporations use
for their legacy applications and databases. In an EBCDIC file, each alphabetic
or numeric character is represented with an 8-bit binary number (a string of eight
0s or 1s). 256 possible characters (letters of the alphabet, numerals, and
special characters) are defined. We believe that many of the ASCII diskette
files are translations of the EBCDIC character set used on government
agency computers.
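To show what an EBCDIC-to-ASCII translation involves, the Python sketch below decodes five EBCDIC bytes using the standard "cp037" codec. EBCDIC exists in several code pages, and which one the original agency computers used is not known; code page 037 is only an assumption chosen for the example.

```python
# The word "Hello" as EBCDIC bytes in code page 037 (assumed variant).
ebcdic_bytes = bytes([0xC8, 0x85, 0x93, 0x93, 0x96])

text = ebcdic_bytes.decode("cp037")   # EBCDIC bytes -> characters
ascii_bytes = text.encode("ascii")    # characters -> ASCII bytes

print(text)         # Hello
print(ascii_bytes)  # b'Hello'
```

The same letters have different numeric codes in the two character sets, which is why a file that is copied without translation shows up as meaningless or high characters.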
extension -
In computer operating systems, a file name extension is an optional addition
to the file name in a suffix of the form ".xxx" where "xxx" represents a
limited number of alphanumeric characters depending on the operating system.
The file name extension helps an application program recognize whether a file
is a type that it can work with.
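A program can pick out the extension with a few lines of Python, as in the sketch below. The file names are made up for the example, and extension_of is a hypothetical helper name.

```python
import os.path


def extension_of(filename: str) -> str:
    """Return the extension (including the dot) in lower case, or '' if none."""
    return os.path.splitext(filename)[1].lower()


print(extension_of("SURVEY.DBF"))  # .dbf
print(extension_of("data.csv"))    # .csv
print(extension_of("README"))      # (empty string)
```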
file -
In data processing, a file is a related collection of records. For example,
you might put the records you have on each of your customers in a file.
In turn, each record would consist of fields for individual data items,
such as customer name, customer number, customer address, and so forth.
By providing the same information in the same fields in each record
(so that all records are consistent), your file will be easily accessible
for analysis and manipulation by a computer program.
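The customer example above might look like this in Python. The class name, field names, and sample customers are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    """One record; every record carries the same three fields."""
    name: str
    number: int
    address: str


# The "file" is simply a collection of records with a consistent layout.
customer_file = [
    CustomerRecord("Ada Example", 1001, "1 Main St"),
    CustomerRecord("Bob Sample", 1002, "2 Oak Ave"),
]


def find_by_number(records, number):
    """Return the first record with the given customer number, or None."""
    for record in records:
        if record.number == number:
            return record
    return None


print(find_by_number(customer_file, 1002).name)  # Bob Sample
```

Because every record has the same fields, a program can search or summarize the file without special handling for individual records.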
kernel -
The kernel is the essential center of a computer operating system, the core
that provides basic services for all other parts of the operating system.
Windows NT and 2000 use the NT kernel, while earlier versions of Windows
use the DOS kernel. Some of the original DOS diskette applications will not
run on versions of Windows that use the NT kernel.
legacy application -
In information technology, legacy applications and data are those that
have been inherited from languages, platforms, and techniques earlier
than current technology. In the past, much programming has been written
for specific operating systems. Currently, efforts are underway to
migrate legacy applications to new programming languages and
operating systems that follow open or standard programming interfaces.
Theoretically, this will make it easier in the future to update applications
without having to rewrite them entirely and will allow applications to run
on any operating system.
octal -
Octal (pronounced AHK-tuhl, from Latin octo or "eight") is a term that
describes a base-8 number system. An octal number system consists
of eight single-digit numbers: 0, 1, 2, 3, 4, 5, 6, and 7. The number
after 7 is 10 (eight in decimal), the number after 17 is 20, and so forth. In computer programming,
the octal equivalent of a binary number is sometimes used to represent it
because it is shorter.
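The counting rule and the binary-to-octal shorthand can be checked with a short Python sketch. The helper names are hypothetical; format and int are standard built-ins.

```python
def binary_to_octal(bits: str) -> str:
    """Convert a string of 0s and 1s to its octal representation."""
    return format(int(bits, 2), "o")


def next_octal(octal: str) -> str:
    """Return the octal number that follows the given octal number."""
    return format(int(octal, 8) + 1, "o")


print(next_octal("7"))             # 10
print(next_octal("17"))            # 20
print(binary_to_octal("1111111"))  # 177
```

Seven binary digits collapse to three octal digits, which is why octal codes such as 177 appear in character tables.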
PDF -
Portable Document Format is a file format that captures all
the elements of a printed document as an electronic image that you
can view, navigate, print, or forward to someone else. PDF files are
created using Adobe Acrobat, Acrobat Capture, or similar products.
To view and use the files, you need the free Acrobat Reader, which
you can easily download. Once you've downloaded the Reader, it will
start automatically whenever you want to look at a PDF file.
record -
A physical record is a chunk of data that has a specified and constant size
in bytes or that is clearly delimited from other records by a newline character
or sector of a disk or other means identifiable to a computer program reading
the file.
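The two most common cases, constant-size records and newline-delimited records, can be sketched in Python as follows. The function names and sample bytes are assumptions made for the example.

```python
def fixed_size_records(data: bytes, size: int) -> list[bytes]:
    """Cut data into records of exactly `size` bytes (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def newline_records(data: bytes) -> list[bytes]:
    """Split data into records at line boundaries (LF, CR, or CR+LF)."""
    return data.splitlines()


print(fixed_size_records(b"AAAABBBBCCCC", 4))  # [b'AAAA', b'BBBB', b'CCCC']
print(newline_records(b"one\ntwo\n"))          # [b'one', b'two']
```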
spreadsheet - A spreadsheet is a sheet of paper that shows accounting
or other data in rows and columns; a spreadsheet is also a computer application
program that simulates a physical spreadsheet by capturing, displaying, and
manipulating data arranged in rows and columns. The spreadsheet is one of
the most popular uses of the personal computer.
text file - A text file is a human-readable sequence of characters and
the words they form that can be encoded into computer-readable formats such
as ASCII.
Unicode - Unicode is an entirely new idea in setting up binary codes
for text or script characters. Officially called the Unicode Worldwide Character
Standard, it is a system for "the interchange, processing, and display
of the written texts of the diverse languages of the modern world." It
also supports many classical and historical texts in a number of
languages. At the time of writing, the Unicode standard contains 34,168 distinct coded
characters derived from 24 supported language scripts. These
characters cover the principal written languages of the world.