
Translating between hierarchies

Most surveys that contain occupation-related variables record occupations as 4-digit ISCO codes. What does that mean? That you’re working with the most fine-grained definition of an occupation. In some cases, you want to work with aggregated groups instead: rather than knowing that someone is a mathematician, you may prefer to group all math-related occupations into a broader “Scientist” category. DIGCLASS implements this following the rules of each ISCO schema. Let’s load DIGCLASS:

library(DIGCLASS)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

In ISCO parlance, the most granular occupations are called 4-digit occupations, because all four digits carry information. Occupation 2111 is a 4-digit occupation: none of its digits is zero. In contrast, 2110 is the “parent” category of 2111. To make it more concrete, in ISCO88 2111 is the occupation “Physicists and astronomers”, while 2110 is “Physicists, chemists and related professionals”. You can intuitively group physicists into that broader category. Similarly, 2110 is nested within the even broader group 2100, “Physical, mathematical and engineering science professionals”. Finally, the broadest group is 2000, whose definition is “Professionals”. In other words, codes stay four characters long and broader levels are marked with trailing zeros: 2110 is a 3-digit group, 2100 a 2-digit group and 2000 a 1-digit group.
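The trailing-zero convention can be sketched in a few lines of base R. `isco_parents()` below is a hypothetical helper written only for this illustration, not a DIGCLASS function: it lists the broader groups a 4-digit code belongs to by keeping its first digits and padding the rest with zeros.

```r
# Hypothetical helper for illustration only (not a DIGCLASS function).
# Given one 4-character ISCO code, return its 3-, 2- and 1-digit parents
# using the trailing-zero convention: keep the first d digits, pad with zeros.
isco_parents <- function(code) {
  stopifnot(is.character(code), length(code) == 1, nchar(code) == 4)
  digit_levels <- 3:1
  parents <- vapply(
    digit_levels,
    function(d) paste0(substr(code, 1, d), strrep("0", 4 - d)),
    character(1)
  )
  names(parents) <- paste0(digit_levels, "-digit")
  parents
}

# ISCO88 2111 "Physicists and astronomers" and the groups above it
isco_parents("2111")
# returns "2110", "2100", "2000" (named "3-digit", "2-digit", "1-digit")
```

The sketch handles a single code and does not check whether a group actually exists in a given ISCO version; that is the kind of schema-specific rule DIGCLASS takes care of.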

This was just an intuitive explanation of how ISCO codes work. You don’t have to remember what each category is: you can always look up these values for a better understanding, but DIGCLASS will do the translation work for you. The important thing to remember is that sometimes you’ll want to group fine-grained occupations into broader categories. For example, you may want all occupations within “Physicists, chemists and related professionals” (2110) grouped together. This means “converting” the column from 4 digits to 3 digits. In DIGCLASS you can do that with the isco*_swap() functions, where * is the ISCO version you are working with (for example, isco08_swap()). Let’s look at the ISCO variables in the ESS data included in DIGCLASS:

ess
#> # A tibble: 48,285 × 12
#>    isco68 isco88 isco88com isco08 emplno self_employed is_supervisor
#>    <chr>  <chr>  <chr>     <chr>   <dbl>         <dbl>         <dbl>
#>  1 5890   5169   5169      5414        0             1             0
#>  2 2120   1222   1222      1321        0             0             1
#>  3 7200   8120   8120      3135        0             0             0
#>  4 9310   7141   7141      7131        0             0             1
#>  5 6220   6111   6111      6111        0             0             0
#>  6 6220   6111   6111      6111        0             0             1
#>  7 9595   9313   9313      9313        0             0             1
#>  8 6000   1221   1221      1311        0             0             1
#>  9 6000   1221   1221      1311        2             1             1
#> 10 6220   6111   6111      6111        0             0             1
#> # ℹ 48,275 more rows
#> # ℹ 5 more variables: control_work <dbl>, control_daily <dbl>,
#> #   work_status <dbl>, main_activity <dbl>, agea <dbl>

All four ISCO variables are at the 4-digit level, but we can convert them to 3 digits:

ess %>%
  transmute(
    isco88,
    isco88_three = isco88_swap(isco88, from = 4, to = 3)
  )
#> # A tibble: 48,285 × 2
#>    isco88 isco88_three
#>    <chr>  <chr>       
#>  1 5169   5160        
#>  2 1222   1220        
#>  3 8120   8120        
#>  4 7141   7140        
#>  5 6111   6110        
#>  6 6111   6110        
#>  7 9313   9310        
#>  8 1221   1220        
#>  9 1221   1220        
#> 10 6111   6110        
#> # ℹ 48,275 more rows
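For these rows, the output follows a simple rule: keep the first `to` digits and replace the rest with zeros. `truncate_isco()` below is a hypothetical base-R sketch of that rule only, not the DIGCLASS implementation. The real isco*_swap() functions also follow the rules of each ISCO schema, which is why some ISCO68 codes come back as NA, as discussed further down.

```r
# Hypothetical sketch of the truncation rule (not the DIGCLASS implementation):
# keep the first `to` digits of a 4-character code and pad with zeros.
truncate_isco <- function(x, from = 4, to = 3) {
  # Refuse to go from a broader level to a finer one (message is our own wording)
  if (to > from) {
    stop("`to` must be a broader (smaller) digit level than `from`.")
  }
  out <- paste0(substr(x, 1, to), strrep("0", 4 - to))
  out[is.na(x)] <- NA_character_  # keep missing codes missing
  out
}

# First rows of isco88 from the ESS output above
isco88 <- c("5169", "1222", "8120", "7141", "6111")
truncate_isco(isco88, from = 4, to = 3)
# "5160" "1220" "8120" "7140" "6110"
```

Note that 8120 stays 8120: it already ends in zero, so the 3-digit group has the same code.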

As you can see, every 3-digit code ends in a zero, meaning that it was translated into a broader group (codes that already end in zero, such as 8120, stay unchanged). We can do the same for an even broader group, translating from 4 to 2 digits:

ess %>%
  transmute(
    isco08,
    isco08_two = isco08_swap(isco08, from = 4, to = 2)
  )
#> # A tibble: 48,285 × 2
#>    isco08 isco08_two
#>    <chr>  <chr>     
#>  1 5414   5400      
#>  2 1321   1300      
#>  3 3135   3100      
#>  4 7131   7100      
#>  5 6111   6100      
#>  6 6111   6100      
#>  7 9313   9300      
#>  8 1311   1300      
#>  9 1311   1300      
#> 10 6111   6100      
#> # ℹ 48,275 more rows

Each 2-digit code is a broader category than the original 4-digit occupation. Note that you can translate from finer to broader levels (all the way from 4 digits down to 1), but not the other way around:

ess %>%
  transmute(
    isco08,
    isco08_two = isco08_swap(isco08, from = 2, to = 4)
  )
#> Error in `transmute()`:
#>  In argument: `isco08_two = isco08_swap(isco08, from = 2, to = 4)`.
#> Caused by error in `isco08_swap()`:
#> ! `from` should always be a bigger digit group than `to`.

That’s because a broader group cannot be translated into a finer occupation: a broad group contains many specific occupations, and there is no way to know which one applies. Finally, note that ISCO68 lacks some 1-digit groups (0000 and 1000 do not exist as 1-digit groups), so when you translate ISCO68 codes from any digit level to 1 digit, you may get missing values for occupations within major groups 0000 and 1000:

ess %>%
  transmute(
    isco68,
    isco68_one = isco68_swap(isco68, from = 4, to = 1)
  )
#> # A tibble: 48,285 × 2
#>    isco68 isco68_one
#>    <chr>  <chr>     
#>  1 5890   5000      
#>  2 2120   2000      
#>  3 7200   7000      
#>  4 9310   9000      
#>  5 6220   6000      
#>  6 6220   6000      
#>  7 9595   9000      
#>  8 6000   6000      
#>  9 6000   6000      
#> 10 6220   6000      
#> # ℹ 48,275 more rows

Codes in 1-digit groups such as 2000, 3000, 5000 and 8000 are translated correctly, but codes in groups 1000 and 0000 never are, because those groups don’t exist in ISCO68. DIGCLASS still performs the translation, but those occupations become NA, so their information is lost in any later translation to another schema.
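The mechanism behind those NAs can be sketched by combining truncation with a lookup of valid groups. In the hypothetical base-R example below, `valid_isco68_major` is an assumed list that only mirrors the description above (no 0000 or 1000 group); it is not DIGCLASS’s internal table, and `to_major_group()` is not a DIGCLASS function.

```r
# ASSUMED set of valid 1-digit ISCO68 groups, mirroring the text above:
# 0000 and 1000 are absent. Illustration only, not DIGCLASS's internal table.
valid_isco68_major <- c("2000", "3000", "4000", "5000",
                        "6000", "7000", "8000", "9000")

# Hypothetical helper: truncate to 1 digit, then set non-existent groups to NA
to_major_group <- function(x) {
  major <- paste0(substr(x, 1, 1), "000")
  ifelse(major %in% valid_isco68_major, major, NA_character_)
}

# Two codes from the ESS output, plus two example codes in major groups 0 and 1
codes <- c("5890", "2120", "0110", "1310")
to_major_group(codes)
# "5000" "2000" NA NA

# How many occupations lose their 1-digit group
sum(is.na(to_major_group(codes)))
# 2
```

Running a count like this on your own data before translating to another schema shows how much information the missing groups would cost.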

Using translated hierarchies for translation between schemas

The isco*_swap() functions are important because some translations require ISCO variables at a specific digit level. For example, to translate ISCO08 to the ESEC class schema with isco08_to_esec(), ISCO08 needs to be at the 3-digit level. Here’s what that translation looks like:

library(dplyr)

# convert isco08 to three digits
ess$isco08_three <- isco08_swap(ess$isco08, from = 4, to = 3)

ess %>%
  transmute(
    isco08_three,
    esec = isco08_to_esec(
      isco08_three,
      is_supervisor,
      self_employed,
      emplno,
      label = FALSE
    )
  )
#> # A tibble: 48,285 × 2
#>    isco08_three esec 
#>    <chr>        <chr>
#>  1 5410         3    
#>  2 1320         2    
#>  3 3130         6    
#>  4 7130         6    
#>  5 6110         8    
#>  6 6110         6    
#>  7 9310         6    
#>  8 1310         2    
#>  9 1310         5    
#> 10 6110         6    
#> # ℹ 48,275 more rows

ESEC also has an alternative translation, isco08_two_to_esec(), which is based on 2-digit ISCO08 codes. Here’s an example:

# convert to two digits
ess$isco08_two <- isco08_swap(ess$isco08, from = 4, to = 2)

ess %>%
  transmute(
    isco08_two,
    esec = isco08_two_to_esec(
      isco08_two,
      is_supervisor,
      self_employed,
      emplno,
      label = FALSE
    )
  )
#> # A tibble: 48,285 × 2
#>    isco08_two esec 
#>    <chr>      <chr>
#>  1 5400       3    
#>  2 1300       4    
#>  3 3100       2    
#>  4 7100       4    
#>  5 6100       6    
#>  6 6100       5    
#>  7 9300       5    
#>  8 1300       4    
#>  9 1300       1    
#> 10 6100       5    
#> # ℹ 48,275 more rows

In both cases, isco*_swap() handles a step that many ISCO translations depend on: bringing the codes to the digit level the target schema expects.
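The two ESEC translations can assign different classes to the same respondent. Copying the `esec` columns of the first ten rows printed above into base R makes this easy to check; ten rows say nothing about the full sample, but they show that the two versions are not interchangeable.

```r
# ESEC classes for the first ten ESS rows, copied from the two outputs above
esec_from_three_digit <- c("3", "2", "6", "6", "8", "6", "6", "2", "5", "6")
esec_from_two_digit   <- c("3", "4", "2", "4", "6", "5", "5", "4", "1", "5")

# Share of respondents assigned the same class by both versions
mean(esec_from_three_digit == esec_from_two_digit)
# 0.1

# Cross-tabulation of the two assignments
table(three_digit = esec_from_three_digit, two_digit = esec_from_two_digit)
```

Only the first respondent gets the same class in both versions, so it is worth stating which ESEC variant (and which ISCO digit level) an analysis uses.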

Filling ISCO codes with trailing zeros

When working with ISCO codes at different digit levels, you may need to convert them to a specific digit format for certain functions. The isco_fill() function makes this easy by automatically appending the appropriate number of trailing zeros:

# Example with 2-digit ISCO codes filled to 4-digit (default)
isco_2digit <- c("11", "21", "31", "61")
isco_filled <- isco_fill(isco_2digit)
#>  Filled 2-digit ISCO codes to 4-digit format
print(isco_filled)
#> [1] "1100" "2100" "3100" "6100"

# The filled codes can now be passed to the OEP calculation
oep_scores <- isco08_to_oep(isco_filled)
print(oep_scores)
#> [1] "87" "79" "63" "26"

You can also fill to other digit levels:

isco_1digit <- c("1", "2", "6")
isco_fill(isco_1digit, digits = 3)  # Fill to 3-digit
#>  Filled 1-digit ISCO codes to 3-digit format
#> [1] "100" "200" "600"

isco_fill(isco_1digit, digits = 2)  # Fill to 2-digit
#>  Filled 1-digit ISCO codes to 2-digit format
#> [1] "10" "20" "60"

This is particularly useful when your data contains ISCO codes at the 1-, 2- or 3-digit level but you need functions that expect 4-digit codes, such as isco*_to_oep(). The function follows the same logic as CROSSWALK/ISCOGEN in Stata.
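The padding rule itself is simple to express. `fill_isco()` below is a hypothetical base-R sketch, not the isco_fill() source: it appends zeros until each code reaches the requested length, pads each code independently, keeps NA as NA and rejects codes that are already too long. Unlike isco_fill(), it prints no message.

```r
# Hypothetical sketch of trailing-zero padding (not the DIGCLASS source).
fill_isco <- function(x, digits = 4) {
  x <- as.character(x)
  n_missing <- digits - nchar(x)  # zeros needed per code
  if (any(n_missing < 0, na.rm = TRUE)) {
    stop("Some codes already have more than `digits` digits.")
  }
  out <- paste0(x, strrep("0", pmax(n_missing, 0)))
  out[is.na(x)] <- NA_character_  # keep missing codes missing
  out
}

fill_isco(c("11", "21", "31", "61"))     # "1100" "2100" "3100" "6100"
fill_isco(c("1", "2", "6"), digits = 3)  # "100" "200" "600"
```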

Here’s a typical workflow combining isco_fill() with other DIGCLASS functions:

# Simulate some 2-digit ISCO data
isco_data <- data.frame(
  respondent_id = 1:4,
  isco08_2digit = c("11", "21", "31", "61"),
  self_employed = c(0, 0, 1, 0),
  emplno = c(0, 0, 3, 0)
)

# Fill to 4-digit and calculate OEP and OESCH
isco_data %>%
  mutate(
    isco08_4digit = isco_fill(isco08_2digit),
    oep = isco08_to_oep(isco08_4digit),
    oesch = isco08_to_oesch(isco08_4digit, self_employed, emplno, label = TRUE)
  )
#>  Filled 2-digit ISCO codes to 4-digit format
#>   respondent_id isco08_2digit self_employed emplno isco08_4digit oep
#> 1             1            11             0      0          1100  87
#> 2             2            21             0      0          2100  79
#> 3             3            31             1      3          3100  63
#> 4             4            61             0      0          6100  26
#>                                        oesch
#> 1 'Higher-grade managers and administrators'
#> 2                        'Technical experts'
#> 3     'Small business owners with employees'
#> 4                           'Skilled manual'

The isco_fill() function is designed to work seamlessly with the rest of the DIGCLASS ecosystem, making it easy to prepare your ISCO data for analysis regardless of the original digit level.