Social Sciences Data Collection (SSDC) Help

Creating Subsets and Using the Subset/Convert Form:

For additional definitions and explanations of terminology used here, consult the Glossary of Social Science Data and Computer Terms.

Rows:

The first option on the generic extraction screen allows you to select which rows in the datafile will be in your output file. ("Rows" of a rectangular data file are also sometimes called "lines" or "cases" or "observations" or "records." In a social survey or poll, for instance, each row is usually one case or observation and contains all the responses to a questionnaire of one person.)

Example:

This example uses a very small sample dataset. The sample data has the following variables that we'll use for examples:

Variable Name      Column Location   Values of Variable
rec no             1-2               (record number)
first name         3-9               (actual name)
sex                10                1=female 2=male
year of birth      11-14             (actual year)
(other variables)  15-71             (various)

Now, if you wanted to select only cases for female respondents, you would require that column 10 (the location of the sex variable) contain a "1" (the code for female). (See filled-in form below.)

Notice that you can create fairly complex criteria by changing the = to a > or a < or other comparisons. Here is a complete list of the comparison operators and how they work:

Operator   Does this
=          numeric equal to
<          numeric less than
>          numeric greater than
<=         numeric less than or equal to
>=         numeric greater than or equal to
!=         numeric not equal to
EQ         text equal to
!EQ        text not equal to

Notice also that you can combine several criteria with AND and OR.
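
If you would like to check a selection on your own copy of a datafile, the row criteria described above can be sketched in Python. This is only an illustration of the idea, not the code SSDC runs: the sample records, the helper names (make_record, field, matches), and the treatment of blank or non-numeric fields are all assumptions made for the example.

```python
import operator

# Comparison operators from the table above, mapped to Python functions.
NUMERIC_OPS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "!=": operator.ne,
}
TEXT_OPS = {"EQ": operator.eq, "!EQ": operator.ne}


def make_record(rec_no, first_name, sex, year):
    """Build a made-up fixed-width record: cols 1-2, 3-9, 10, 11-14."""
    return f"{rec_no:>2}{first_name:<7}{sex}{year}"


def field(record, start, end):
    """Return columns start..end (1-based, inclusive) of a record."""
    return record[start - 1:end]


def matches(record, start, end, op, value):
    """Test one criterion, e.g. (10, 10, "=", 1) for 'column 10 = 1'."""
    raw = field(record, start, end)
    if op in NUMERIC_OPS:
        try:
            return NUMERIC_OPS[op](float(raw), float(value))
        except ValueError:
            # Assumption: a blank or non-numeric field never matches.
            return False
    return TEXT_OPS[op](raw.strip(), str(value))


# Hypothetical sample data (not taken from any SSDC study).
records = [
    make_record(1, "Maria", 1, 1962),
    make_record(2, "John", 2, 1958),
    make_record(3, "Ana", 1, 1975),
]

# Single criterion: column 10 = 1 (female respondents).
females = [r for r in records if matches(r, 10, 10, "=", 1)]

# Criteria combined with AND: female and born after 1970.
females_after_1970 = [
    r for r in records
    if matches(r, 10, 10, "=", 1) and matches(r, 11, 14, ">", 1970)
]

# Criteria combined with OR: named John or born before 1960.
john_or_before_1960 = [
    r for r in records
    if matches(r, 3, 9, "EQ", "John") or matches(r, 11, 14, "<", 1960)
]
```

The numeric operators compare the field as a number, while EQ and !EQ compare it as text, which is why the name test uses EQ.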

Columns:
The second option on the generic extraction screen allows you to select which columns in the original datafile will be in your output file.

For most studies, you will need to consult the codebook and determine the column locations of the variables you want and enter those column numbers in the "Select columns to output" box on the form.

Some studies, however, have a "pick list" of available variables here instead of an empty box for typing in column numbers and ranges. For these studies, all you have to do is use your browser options to select one or more variables from the pick list. SSDC will take care of the rest for you. You can look at an example of a pick list or look at a study that has a pick list for a full example.

The following is an example of a study that does not have a pick list.

As you can see above, a column or a sequence of columns contains the codes for a "variable." Thus, in the sample dataset, column 10 contains a code of "1" to indicate a female respondent and a code of "2" to indicate a male respondent. Some variables don't require codes; for instance, columns 3-9 contain the first name of the respondent.

Example:

The form below is filled out to get just female respondents (column 10 = 1) and to print to our output dataset just the record number (columns 1-2), first name (columns 3-9), and sex (column 10) of each of these.
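
The same request (rows where column 10 = 1, output columns 1-2, 3-9, and 10) can be sketched in Python for readers who want to see what the result should look like. The function names parse_columns and extract and the three sample records are hypothetical; the column specification follows the "1-3 6-8 12-20" style shown on the form.

```python
def parse_columns(spec):
    """Turn a spec such as '1-2 3-9 10' into [(1, 2), (3, 9), (10, 10)]."""
    ranges = []
    for token in spec.split():
        start, _, end = token.partition("-")
        # A single column such as '10' is treated as the range 10-10.
        ranges.append((int(start), int(end or start)))
    return ranges


def extract(record, ranges):
    """Keep only the requested columns (1-based, inclusive) of a record."""
    return "".join(record[start - 1:end] for start, end in ranges)


# Hypothetical fixed-width records: rec no 1-2, first name 3-9,
# sex 10, year of birth 11-14.
records = [
    " 1Maria  11962",
    " 2John   21958",
    " 3Ana    11975",
]

ranges = parse_columns("1-2 3-9 10")

# Select rows where column 10 holds "1", then output the chosen columns.
subset = [extract(r, ranges) for r in records if r[9:10] == "1"]

for line in subset:
    print(line)
```

Each output line keeps the record number, first name, and sex code and drops the year of birth and everything after it.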

Output Options:
The third option on the generic extraction screen allows you to select the format of the subset you are creating and the maximum number of records in the subset.

We support several different output formats and will be adding others. (If there are formats you'd like to see, please let us know. Send mail to ssdb@weber.ucsd.edu.) Currently, we support ASCII; dBase; Excel versions 3, 4, and 5; Lotus 1-2-3 versions 1 through 5; Quattro Pro; RATS; SAS for Windows and PC/DOS; SPSS for PC/DOS, Windows, and Sun; SPSS Portable files; and SYSTAT.

Simply choose the format you want in the Format box.

The maximum number of records option is useful for testing. By setting the number of records to a small number you can quickly get a few records to verify that you are getting the subset you want and that the format you chose works with your software.

When you are ready to get the complete subset (all records that match the criteria you set in the form), be sure to set the "Record #" box to "All."
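
The effect of the "Record #" setting can be pictured with a short Python sketch. It is only an analogy for the form's behavior: the function name limit_records is hypothetical, and the assumption is that the limit simply keeps the first matching records in file order.

```python
from itertools import islice


def limit_records(records, max_records="All"):
    """Return at most max_records records, or every record for "All"."""
    stop = None if max_records == "All" else int(max_records)
    return list(islice(records, stop))


# Hypothetical matching rows from a larger datafile.
matching = [f"record {n}" for n in range(1, 1001)]

test_run = limit_records(matching, 5)      # quick check of a few records
full_run = limit_records(matching, "All")  # the complete subset
```

A small test run like this is enough to confirm both the selection criteria and that the chosen format opens in your software before requesting the full subset.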


Go ahead! Look at the form and then click the "Get the Data" button at the bottom of the form to see how this works!

Select on Variables/Columns
Select rows with column locations and values (ex: 1-4 = 1234)

Output Variables/Columns
Select columns to output (ex: 1-3 6-8 12-20)

Output Options
Select the maximum number of records to output and export format (optional): Record #, Format


Official Web Page of the University of California, San Diego
© Copyright 2000, UCSD, All Rights Reserved. This site may not be reproduced.
Social Sciences & Humanities Library, 9500 Gilman Drive, La Jolla, CA 92093, 858-534-3336