The DualLabs Census Bureau 1960 State census tract data files use population and housing counts of zero to identify counts that are 1) zero, 2) suppressed or 3) not tabulated. The printed Census volume uses what the Bureau calls a "leader" (an ellipsis: "...") to identify any of these three counts. Neither source adequately distinguishes counts that are actually zero from counts the Census Bureau suppressed or did not tabulate.
The SSDC staff created a revised DualLabs data file for San Diego County that resolves this problem.
The SSDC data file uses a missing value indicator where values have been suppressed or not tabulated and the number zero to indicate a value of zero.
During the revision process, staff discovered additional motives for revising the DualLabs data file:
- Researchers must use the printed Census volume to identify suppressed counts in the DualLabs data file.1
- DualLabs labels for "Puerto Rican or Spanish Surname persons" should be changed to "Spanish Surname persons" in San Diego, California.2
- The suppression rule for "Nonwhites" and "Spanish Surname persons" is not applied to counts for these population groups in the DualLabs data file.
There are three versions of the SSDC data file. All versions change DualLabs labels for "Puerto Rican or Spanish Surname persons" to "Spanish Surname persons". In all three versions, the number zero is used only to indicate a count of zero, never to indicate a suppressed or not tabulated count. Not tabulated and suppressed values are indicated differently:
- Excel spreadsheet: Leaders "..." are used to indicate counts that are suppressed or not tabulated.
- ASCII data file: Values of 99999 are used to indicate counts that are suppressed or not tabulated.
- SPSS Portable file: System missing values (discrete values of 99999) are used to indicate counts that are suppressed or not tabulated.
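As an illustration of how a user of the ASCII version might turn the 99999 indicator into a true missing value, here is a minimal sketch. The record layout of the ASCII file is not described in this section, so the function works on field strings that have already been extracted; the names `MISSING_SENTINEL` and `parse_count` and the example fields are hypothetical, not part of the SSDC files.

```python
from typing import List, Optional

# Value the ASCII file uses for suppressed or not tabulated counts.
MISSING_SENTINEL = 99999


def parse_count(field: str) -> Optional[int]:
    """Return the count as an int, or None when the field holds the sentinel."""
    value = int(field.strip())
    return None if value == MISSING_SENTINEL else value


# Made-up example fields: a real count, a true zero, and a missing value.
fields: List[str] = ["27", "0", "99999"]
counts = [parse_count(f) for f in fields]
print(counts)
```

Keeping zero and missing as distinct values (0 versus None) preserves the distinction that the SSDC revision was designed to restore.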
Tabulation Rules
The Census Bureau developed and used "tabulation rules" that determine whether a value for a particular cell in a table is reported. Five tabulation rules are documented on page three of the DualLabs codebook. SSDC staff used the tabulation rules to change zeros in the DualLabs data file to missing value indicators in the SSDC data file, where appropriate.
Tabulation Rules Applied to the SSDC Data File
Which tabulation rule applies to a tract depends on whether the tract lies in a place with 50,000 or more persons or in a place with less than 50,000 persons. The only place in San Diego County with 50,000 or more persons is San Diego City. All other places (Chula Vista, El Cajon, La Mesa and National City) have less than 50,000 persons. The following tabulation rules were applied to the SSDC data file:
Tabulation Rule 1 - In tables 7 and 10, counts are only included in tracts of places with 50,000 or more inhabitants. Therefore, in tables 7 and 10, in tracts of places with less than 50,000 persons, SSDC staff replaced zeros with missing value indicators in the SSDC data file.
- Exceptions - Census tracts 107-112 contain counts with values other than zero in the DualLabs data file. These counts are included in the SSDC data file.
Tabulation Rule 2 - In table 35, counts are only included for rural farm and non-farm residents in tracts of places with less than 50,000 inhabitants. Therefore, in table 35, in tracts of places with 50,000 or more persons, SSDC staff replaced zeros with missing value indicators in the SSDC data file.
Tabulation Rule 3 - No changes were made to Table 52 in the SSDC data file.
Tabulation Rule 4 - In table 56, counts are included only in tracts of places with 50,000 or more inhabitants. Therefore, in table 56, in tracts of places with less than 50,000 persons, SSDC staff replaced zeros with missing value indicators in the SSDC data file.
Tabulation Rule 5 - In tables 58 and 59, counts are included only in tracts of places with less than 50,000 inhabitants. Therefore, in tables 58 and 59, in tracts of places with 50,000 or more persons, SSDC staff replaced zeros with missing value indicators in the SSDC data file.
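The tabulation rules above can be sketched as a small lookup. This is an illustration under stated assumptions, not the procedure SSDC staff actually ran: the function names, the use of None as the missing value indicator, and the example populations are all hypothetical. Because only zeros are replaced, non-zero counts such as those noted in the exception to Rule 1 pass through unchanged.

```python
from typing import List, Optional

# Tables reported only for tracts in places with 50,000 or more inhabitants
# (Tabulation Rules 1 and 4), or only in places with less than 50,000
# (Tabulation Rules 2 and 5). Rule 3 (table 52) required no changes.
LARGE_PLACE_ONLY = {7, 10, 56}
SMALL_PLACE_ONLY = {35, 58, 59}


def is_tabulated(table: int, place_population: int) -> bool:
    """Return True if the table is reported for a place of this size."""
    large = place_population >= 50_000
    if table in LARGE_PLACE_ONLY:
        return large
    if table in SMALL_PLACE_ONLY:
        return not large
    return True


def apply_tabulation_rules(
    table: int, place_population: int, cells: List[Optional[int]]
) -> List[Optional[int]]:
    """Replace zeros with None when the table is not tabulated for the place."""
    if is_tabulated(table, place_population):
        return list(cells)
    # Only zeros are replaced; any non-zero count is kept as-is.
    return [None if c == 0 else c for c in cells]


# Hypothetical tracts: one in a small place, one in a large place.
print(apply_tabulation_rules(7, 30_000, [0, 0, 12]))
print(apply_tabulation_rules(35, 600_000, [0, 0]))
```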
100% Suppression Rules
The Census Bureau suppressed counts of less than 5 persons or housing units in 100% complete count tables for reasons of confidentiality. Although individual cells in a table may be suppressed, in the printed volume the Bureau did not suppress totals for a tract. It is possible, therefore, to use the presence of totals in the printed volume to determine if cell values of zero in the DualLabs file were suppressed or were, in fact, zero.3
Figure 1 illustrates county and census tracts counts available in a typical table in the printed volume.
Printed Volume Table - Characteristics of the Population

| Subject | Total, San Diego County | Tract A | Tract B | Tract C | Tract D |
|---|---|---|---|---|---|
| Total Population | 2141 | 902 | 1235 | 4 | ... |
| Household Relationship | | | | | |
| Population in Households | 1694 | 807 | 883 | ... | ... |
| Head of primary family | 56 | 27 | 29 | ... | ... |
| Primary individual | 514 | 236 | 278 | ... | ... |
| Wife of head | 821 | 370 | 451 | ... | ... |
| Related single child under 18 | 255 | 155 | 100 | ... | ... |
| Other relative of head | 28 | 11 | 17 | ... | ... |
| Non-relative of head | 20 | 8 | 8 | ... | ... |
| Population in Group Quarters | 447 | 95 | 352 | ... | ... |
| Inmate of institution | 148 | 22 | 126 | ... | ... |
| Other | 299 | 73 | 226 | ... | ... |

Figure 1

The first row has total population counts for the county and tracts in the county; the value 2141 ("Total, San Diego County") is a sum of the "Total Population" values for all tracts in the county. The "leader" in Tract D is an indicator of a value of zero.
In Figure 1, the two Tract C subtotals ("population in households" and "population in group quarters"), highlighted in yellow in the original figure, have been suppressed because the total population of Tract C is less than 5. Subject counts in the remaining rows in Tract C are also suppressed. The "leaders" in the Tract D column actually indicate zero because the total population count in Tract D is zero.
Table Name and Label: DualLabs Table 000 - Household Relationship

| Table Cell Label | Table Cell Name | Census Tract A | Census Tract B | Census Tract C | Census Tract D |
|---|---|---|---|---|---|
| Head of primary family | T000001 | 27 | 29 | 0 | 0 |
| Primary individual | T000002 | 236 | 278 | 0 | 0 |
| Wife of head | T000003 | 370 | 451 | 0 | 0 |
| Related single child under 18 | T000004 | 155 | 100 | 0 | 0 |
| Other relative of head | T000005 | 11 | 17 | 0 | 0 |
| Non-relative of head | T000006 | 8 | 8 | 0 | 0 |
| Inmate of Institution | T000007 | 22 | 126 | 0 | 0 |
| Other in group quarters | T000008 | 73 | 226 | 0 | 0 |

Figure 2

Figure 2 is an example of data in the DualLabs file. In this example, the zeros in Census Tracts C and D could indicate either an actual value of zero or a suppressed value. By using the printed census volume (as illustrated in Figure 1), we can see that in Tract C the zeros should be replaced with missing value indicators because the total population count for Tract C is 4. In Tract D, the zeros would not be replaced because the printed volume reveals that Tract D's total population is zero.
100% Suppression Rules Applied to the SSDC Data File
After applying the tabulation rule changes explained above, SSDC staff determined which zeros in the DualLabs file were actual values of zero and which indicated suppressed values by doing the following:
- Computed a total for each census tract in every table.
- If the computed total tract count was zero, staff looked up the value for that tract in row 1 of the printed volume.
- If the tract count was greater than zero in the printed volume, staff replaced zeros with missing value indicators in the SSDC data file.
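The three steps above can be sketched as follows, using the Figure 2 cell values and the Figure 1 printed totals. This is a minimal illustration, assuming the printed total has been looked up by hand and passed in; the function name `resolve_zeros` and the use of None as the missing value indicator are hypothetical.

```python
from typing import List, Optional


def resolve_zeros(cells: List[int], printed_total: int) -> List[Optional[int]]:
    """Decide whether a tract's zeros are true zeros or suppressed counts.

    cells: the tract's counts for one table in the DualLabs file.
    printed_total: the tract total from row 1 of the printed volume.
    """
    computed_total = sum(cells)
    if computed_total == 0 and printed_total > 0:
        # The printed volume shows people (or units), so the zeros are
        # suppressed counts and become missing values.
        return [None if c == 0 else c for c in cells]
    # Either the tract has real counts or its printed total is truly zero.
    return list(cells)


tract_a = [27, 236, 370, 155, 11, 8, 22, 73]
tract_c = [0] * 8
tract_d = [0] * 8

print(resolve_zeros(tract_a, 902))  # unchanged
print(resolve_zeros(tract_c, 4))    # all cells become None
print(resolve_zeros(tract_d, 0))    # zeros are kept
```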
SSDC staff applied the following suppression rules to make this determination:
Suppression Rule 1 - In tables 1, 8, and 11 through 16, if total population is greater than 0 and less than 5 (1-4), cells with a value of zero were replaced with missing value indicators in the SSDC data file.
- SSDC staff computed total population for each census tract by summing all table cells across each tract in Tables 1 and 8 in the DualLabs file.
- Tract sums of zero were compared to the same tract total population counts in Table P-1 in the printed volume. Parts of census tracts 106 (place description 4, place code 9999), 119 (place description 7, place code 1885) and 120 (place description 7, place code 1885) had nonzero totals (3, 4, and 4, respectively) in the printed volume, so their zero cells are suppressed counts rather than true zeros. The total population of all census tracts in the DualLabs data file (county total=1,033,000) plus the sum of the suppressed census tract counts in the printed volume (11) is equal to the county total of Table P-1 (1,033,011) in the printed volume.
Suppression Rule 2 - In tables 2 through 7, if total housing unit count is greater than 0 and less than 5 (1-4), cells with a value of zero in the DualLabs data file were replaced with missing value indicators in the SSDC data file.
- Total housing units for each census tract were computed by summing all table cells across each tract in Table 4 in the DualLabs file. Tract sums of zero were compared to the same tract total housing unit counts in Table H-1 in the printed volume. Parts of census tracts X-97 (place description code 7, place code 2475), X-98 (place description code 7, place code 2475), 119 (place description code 7, place code 1885), 120 (place description code 7, place code 1885), 132 (place description code 3, place code 2475) and 133 (place description code 3, place code 2475) had nonzero totals (2, 2, 2, 3, 3, and 2, respectively) in the printed volume, so their zero cells are suppressed counts rather than true zeros. The total housing units count of all census tracts in the DualLabs data file (county total=339,426) plus the sum of the suppressed counts in the printed volume (14) is 339,440. The total county housing units count in Table H-1 in the printed volume is 339,442. The difference of 2 housing units is due to the counts assigned to census tracts 122 and 124 in the printed volume (921+1719=2640) and the DualLabs file (960+1678=2638).4
- Exceptions - Parts of census tracts G-32B (place description code 7, place code 2475) and U-85H (place description code 7, place code 2475) have total housing unit counts of 4 and the DualLabs file does not suppress housing unit table cell counts for these tracts. These DualLabs counts are not replaced in the SSDC data file.
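The reconciliation arithmetic reported under Suppression Rules 1 and 2 can be checked directly. The sketch below uses only the figures quoted in the text; the helper name `reconcile` is hypothetical.

```python
from typing import List


def reconcile(file_total: int, suppressed: List[int], printed_total: int) -> int:
    """Return the printed county total minus (file total + suppressed counts)."""
    return printed_total - (file_total + sum(suppressed))


# Suppression Rule 1: population, Table P-1.
population_gap = reconcile(1_033_000, [3, 4, 4], 1_033_011)

# Suppression Rule 2: housing units, Table H-1.
housing_gap = reconcile(339_426, [2, 2, 2, 3, 3, 2], 339_442)

# Tracts 122 and 124: printed volume versus DualLabs file.
tract_gap = (921 + 1719) - (960 + 1678)

print(population_gap, housing_gap, tract_gap)  # 0 2 2
```

The population totals reconcile exactly, while the housing totals differ by 2, the same size as the difference between the two sources for tracts 122 and 124.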
Suppression Rule 3a - In table 7, if total housing unit count is 5 or more but total count of "owner occupied units reporting value" is greater than 0 but less than 5 (1-4), then cell counts of zero were replaced with missing value indicators in the SSDC data file.
- Note that Table 7 was previously subject to Tabulation Rule 1 and Suppression Rule 2. Census tract I-37 and part of census tract L-54 (place description code 3, place code 2475, crews of vessels code 0) were not previously suppressed.
- Note also that the total occupied housing unit count for the county in DualLabs Table 7 is not relevant (many counts are not tabulated or suppressed). The printed volume uses counts from DualLabs Table 7 (100% counts) and Table 53 (25% sample counts) for census tract owner occupied units reporting value in Table H-2. The total county count for Table H-2 in the printed volume is derived from the sum of the tract counts of owner occupied units reporting value in DualLabs sample Table 53 (155,055). See Tabulation Rule 1 for exceptions to Suppression Rule 3a.
Suppression Rule 3b - In table 10, if total count of housing units is 5 or more but total count of "renter occupied units reporting rent" is less than 5, then cells with a value of zero in the DualLabs file were replaced with missing value indicators in the SSDC data file.
- Note that table 10 was previously subject to Tabulation Rule 1.
- Note that, in cells where the count of "renter occupied units reporting rent" has a value of zero, the value is suppressed in Table 10 because "renter occupied units reporting no cash rent" are excluded from the universe in Table 10 (i.e., there are no cash rent values of $0.00).
- The printed volume uses counts from DualLabs Table 54 for the county total of renter occupied units reporting rent (120,193) in Table H-2.
- There is no equivalent table for DualLabs Table 10 in the printed volume (printed volume Table H-2 reports "median contract rent" rather than DualLabs Table 10 "aggregate contract rent").
- The printed volume suppresses the median value of rent if a census tract has fewer than 200 renter occupied units, because median rent values for these tracts are subject to high degrees of uncertainty.
- Researchers should note that the DualLabs data file Table 10 includes aggregate rent values that have large sampling errors.
- The tracts in Table 10 that have suppressed values are too numerous to list here. Researchers can refer to the SSDC SPSS file "supprule3btospss.sav" for a complete listing of census tract table cells that have missing value indicators in the SSDC data file. See Tabulation Rule 1 for exceptions to Suppression Rule 3b.
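Suppression Rules 3a and 3b differ only in whether a reporting count of zero triggers suppression. The sketch below shows one way to express that difference; it is an illustration, not the SSDC procedure. The function name, the `include_zero` flag, the example counts, and the use of None as the missing value indicator are all hypothetical.

```python
from typing import List, Optional


def apply_rule_3(
    cells: List[int],
    total_housing_units: int,
    reporting_units: int,
    include_zero: bool,
) -> List[Optional[int]]:
    """Replace zeros with None when the reporting universe is too small.

    include_zero=False models Rule 3a (table 7): reporting count of 1-4.
    include_zero=True models Rule 3b (table 10): reporting count below 5,
    including 0.
    """
    lower_bound = 0 if include_zero else 1
    if total_housing_units >= 5 and lower_bound <= reporting_units < 5:
        return [None if c == 0 else c for c in cells]
    return list(cells)


# Hypothetical tract with 120 housing units.
print(apply_rule_3([0, 0, 0], 120, 3, include_zero=False))  # Rule 3a applies
print(apply_rule_3([0, 0, 0], 120, 0, include_zero=False))  # Rule 3a does not
print(apply_rule_3([0, 0, 0], 120, 0, include_zero=True))   # Rule 3b applies
```

Tracts with fewer than 5 total housing units are left to Suppression Rule 2, which is why the function requires a total of 5 or more.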
Sample (25%, 20%, and 5%) Suppression Rules
Estimated subject counts in sample tables have a degree of uncertainty associated with the counts. In general, the smaller the sample, the higher the level of sampling error. The DualLabs codebook documents one sample suppression rule:
- "If Nonwhite (Spanish Surname population) is less than 400, then sample detail for these universes will not be shown and all pertinent fields in Tables 17 through 63 will be filled with zeros."5
SSDC staff examined sample table cell counts in the DualLabs data file and determined that this sample suppression rule was not applied to any "Nonwhite" or "Spanish Surname" tables in the DualLabs data file. In fact, there are no suppression rules applied to sample Tables 17 through 63.
Therefore, SSDC staff made no changes to the DualLabs tables 17 through 63 in the SPSS and ASCII SSDC data files. Researchers can determine the confidence intervals and confidence levels suitable for their research when using sample table counts in the DualLabs data file.
SSDC staff did make changes to the spreadsheet version of the SSDC data file to make it conform to the printed census volume. The Census Bureau did suppress sample counts for "Nonwhite" and "Spanish Surname" persons in the printed Census volume. Population counts for "Nonwhite" and "Spanish Surname" persons were suppressed in census tracts with fewer than 400 persons in Tables P-4 and P-5. Housing unit counts for "Nonwhites" were suppressed in census tracts with fewer than 100 units in Table H-3 and housing unit counts for "Spanish Surname" persons were suppressed in census tracts with fewer than 400 units in Table H-4.
Therefore, in the spreadsheet version of the SSDC data file, SSDC staff suppressed the estimated counts for the above "Nonwhite" and "Spanish Surname" sample table cells. Staff do not want to supply counts that have high levels of sampling error to users who are looking up counts rather than performing analysis on counts.
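The spreadsheet suppression described above can be sketched as a threshold lookup keyed by printed table. This is a minimal illustration: the thresholds are those stated in the text, but the function name, the example counts, and the assumption that the threshold is compared against the tract's total for the population group in question are hypothetical.

```python
from typing import Dict, List, Union

LEADER = "..."  # leader used in the spreadsheet for suppressed counts

# Tract totals below these values are suppressed in the printed volume.
THRESHOLDS: Dict[str, int] = {
    "P-4": 400,  # population
    "P-5": 400,  # population
    "H-3": 100,  # "Nonwhite" housing units
    "H-4": 400,  # "Spanish Surname" housing units
}


def spreadsheet_cells(
    printed_table: str, group_total: int, cells: List[int]
) -> List[Union[int, str]]:
    """Return the cells as shown in the spreadsheet version.

    group_total is assumed to be the tract's total for the group
    (persons or housing units) that the printed table covers.
    """
    if group_total < THRESHOLDS[printed_table]:
        return [LEADER] * len(cells)
    return list(cells)


# Hypothetical tracts.
print(spreadsheet_cells("H-3", 99, [12, 30]))    # suppressed
print(spreadsheet_cells("H-3", 100, [12, 30]))   # shown
print(spreadsheet_cells("P-4", 399, [150, 249])) # suppressed
```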
Users can subset the SSDC data file for "Nonwhite" or "Spanish Surname" persons on the SSDC extraction Web page in CSV or Excel output formats if they require subsets without suppressed census tract counts for these population groups in spreadsheet format.
Footnotes
1. United States. Bureau of the Census. U.S. Censuses of Population and Housing 1960. Census Tracts. Final Report PHC(1)-135 [San Diego, Calif.] U.S. Government Printing Office, Washington, D.C. 1962.
2. "... white persons of Spanish surname were distinguished separately in five Southwestern States (Arizona, California, Colorado, New Mexico and Texas). In all other States, Puerto Rican persons ... were identified" (Source: Page 1, column 2 of the Introduction to the Census Printed Volume).
3. The "total" columns (for the SMSA, counties, cities, etc.) include statistics [suppressed counts] for those tracts which are omitted from the tables because they have fewer than the specified number of persons or housing units. These totals, therefore, are not necessarily the sum of the figures for the tracts [which exclude suppressed counts] that are shown in the tables. (Source: Page 1, column 2 of the Introduction to the Census Printed Volume).
4. The computed sum of all census tract housing units in the printed Census volume table H-1 is 339,440 because there is an error in the count of housing units in Census Tract 183 (1186). The housing units count in Tract 183 should be 1188. This can be confirmed by summing owner occupied + renter occupied + available vacant + other vacant housing units in Tract 183 (270+715+135+68=1188). Therefore, the total county count of housing units in the printed volume (339,442) is correct.
5. It is important to make a distinction between the "Nonwhite" and "Spanish Surname" population groups. The Census Bureau defines "Spanish Surname" persons as a subgroup of "White" persons. "In order to obtain data on Spanish- and Mexican-Americans ... white persons (and white heads of households) of Spanish surname were distinguished separately ...". (Source: Page 3, column 2 of the Introduction to the Census Printed Volume). A detailed discussion of Spanish Surname population counts is available in Hernandez, et al. "Census Data and the Problem of Conceptually Defining the Mexican American Population." SOCIAL SCIENCE QUARTERLY, Vol. 53, no.4. March 1973, pp. 677 ff.