Loading Data with Repeating Groups

KoboToolbox enables grouping of questions, allowing them to be answered multiple times. This feature is particularly useful during household surveys where a set of questions is designed to be answered by each member of the household.

Loading data

KoboToolbox implements this feature by incorporating the concept of repeat group, enabling the repetition of a group of questions. This method involves enclosing the questions intended for repetition within a begin_repeat/end_repeat loop. Furthermore, repeat group allows for nesting, thus enabling the repetition of a question group within another repeat group. This concept can be demonstrated using the project and associated form below.

type name label::English (en) label::Francais (fr) repeat_count calculation
start start
end end
today today
begin_repeat demo Demographic Characteristics Caracteristique Demographique
text name Name Nom
integer age Age Age
select_one sex sex Sex Sexe
integer hobby How many hobbies does ${name} have? Combien de hobbies ${name} a ?
select_one yesno morelang Does ${name} speak more than one language? Est-ce que ${name} parle plus d’une langue ?
calculate name_individual indexed-repeat(${name}, ${demo}, position(..))
begin_repeat hobbies_list List of Hobbies Liste de hobbies ${hobby}
text hobbies Hobbies of ${name_individual} Hobbies de ${name_individual}
end_repeat
begin_repeat lang_list List of Languages Liste de langues ${morelang}
select_multiple lang langs Languages spoken by ${name_individual} Langue parle par ${name_individual}
end_repeat
end_repeat
calculate family_count count(${demo})
note family_count_note Number of family members: ${family_count} Nombre de membre dans la famille: ${family_count}
begin_repeat education Education information Information sur l’education ${family_count}
calculate name_individual2 indexed-repeat(${name}, ${demo}, position(..))
select_one edu_level edu_level What is ${name_individual2}’s level of education Quel est le niveau d’education de ${name_individual2}
end_repeat
list_name name label::English (en) label::Francais (fr)
sex 1 Male Homme
sex 2 Female Femme
sex 3 Prefer not to say Prefere ne pas dire
edu_level 1 Primary Primaire
edu_level 2 Secondary Secondaire
edu_level 3 Higher Secondary & Above Lycee et superieur
yesno 1 Yes Oui
yesno 0 No Non
lang 1 French Francais
lang 2 Spanish Espagnol
lang 3 Arabic Arabe
lang 99 Other Autre

Loading the survey

The aforementioned survey, named nested_roster, was uploaded to the server. It can be accessed from the list of asset asset_list.

library(robotoolbox)
library(dplyr)
asset_list <- kobo_asset_list()
uid <- filter(asset_list, name == "nested_roster") |>
  pull(uid)
asset <- kobo_asset(uid)
asset
#> <robotoolbox asset>  aANhxwX9S6BCsiYMgQj9kV 
#>   Asset name: nested_roster
#>   Asset type: survey
#>   Asset owner: dickoa
#>   Created: 2022-01-05 21:22:51
#>   Last modified: 2022-01-06 15:15:31
#>   Submissions: 3

Extracting the data

The output here deviates from a standard data.frame. It consists of a listing of each repeat group loop present in our form.

df <- kobo_data(asset)
df
#> ── Metadata ────────────────────────────────────────────────────────────────────
#> Tables: `main`, `demo`, `hobbies_list`, `lang_list`, `education`
#> Columns: 49
#> Primary keys: 5
#> Foreign keys: 4

The output is a dm object, sourced from the the dm package.

#> [1] "dm"

Manipulating repeat group as dm object

A dm object, which is a list of interconnected data.frame instances, can be manipulated using the dm package.

Visualizing the relationship between tables

To comprehend the data storage structure, we can visualize the relationships among tables (repeat group loops) and the schema of the dataset. This schema can be depicted using the dm_draw function.

library(dm)
dm_draw(df)

Number of rows of each table

The dm package offers numerous helper functions for manipulating dm objects. For instance, the dm_nrow function can be used to ascertain the number of rows in each table.

dm_nrow(df)
#>         main         demo hobbies_list    lang_list    education 
#>            3            7           14            4            7

A dm object is a list of data.frame

A dm object is a list of data.frame. Similar to any list of data.frame, you can extract each table (data.frame), and analyze it separately. The principal table, where you have the first repeat group, is termede as main.

glimpse(df$main)
#> Rows: 3
#> Columns: 17
#> $ start                <dttm> 2022-01-06 15:25:12, 2022-01-06 15:17:18, 2022-0…
#> $ end                  <dttm> 2022-01-06 15:28:36, 2022-01-06 15:25:11, 2022-0…
#> $ today                <date> 2022-01-06, 2022-01-06, 2022-01-06
#> $ family_count         <dbl> 3, 2, 2
#> $ family_count_note    <lgl> NA, NA, NA
#> $ `_index`             <int> 1, 2, 3
#> $ `_id`                <int> 17727576, 17727538, 17727380
#> $ uuid                 <chr> "ee485fd6655b4e328fdd895ac0451656", "ee485fd6655…
#> $ education_count      <dbl> 3, 2, 2
#> $ `__version__`        <chr> "vcs3hEpGKxBo8G5uQa94oD", "vcs3hEpGKxBo8G5uQa94oD…
#> $ instanceID           <chr> "uuid:c4723152-3b1c-4f60-ac30-0f3d6505d617", "uui…
#> $ `_xform_id_string`   <chr> "aANhxwX9S6BCsiYMgQj9kV", "aANhxwX9S6BCsiYMgQj9kV…
#> $ `_uuid`              <chr> "c4723152-3b1c-4f60-ac30-0f3d6505d617", "06552d3d…
#> $ `_status`            <chr> "submitted_via_web", "submitted_via_web", "submit…
#> $ `_submission_time`   <dttm> 2022-01-06 15:28:47, 2022-01-06 15:25:23, 2022-01…
#> $ `_validation_status` <chr> NA, NA, NA
#> $ `_submitted_by`      <lgl> NA, NA, NA

The other tables are named following the names of their associated repeat groups. For instance, the education table is named after the education repeat group.

glimpse(df$education)
#> Rows: 7
#> Columns: 6
#> $ name_individual2     <chr> "Jemelle", "Fatim", "Ali", "Shannon", "Skip", "Ah…
#> $ edu_level            <chr+lbl> "3", "3", "1", "3", "3", "3", "3"
#> $ `_index`             <int> 1, 2, 3, 4, 5, 6, 7
#> $ `_parent_index`      <int> 1, 1, 1, 2, 2, 3, 3
#> $ `_parent_table_name` <chr> "main", "main", "main", "main", "main", "main…
#> $ `_validation_status` <chr> NA, NA, NA, NA, NA, NA, NA

Filtering data

One key benefit of using the dm package is its capability to dynamically filter tables while maintaining their interconnections. For example, filtering the main table will automatically extend to the education and demo tables. As the hobbies_list and lang_list tables are linked to the demo table, they will be filtered as well.

df |>
  dm_filter(main = (`_index` == 2)) |>
  dm_nrow()
#>         main         demo hobbies_list    lang_list    education 
#>            1            2            4            0            2

Joining tables

In certain instances, analyzing joined data may prove simpler. The dm_flatten_to_tbl function can be used to join data safely while preserving its structure and the connections between tables. We can merge the education table with the main table using the dm_flatten_to_tbl function, with the operation starting from education.

df |>
  dm_flatten_to_tbl(.start = education,
                    .join = left_join) |>
  glimpse()
#> Rows: 7
#> Columns: 22
#> $ name_individual2               <chr> "Jemelle", "Fatim", "Ali", "Shannon", "…
#> $ edu_level                      <chr+lbl> "3", "3", "1", "3", "3", "3", "3"
#> $ `_index`                       <int> 1, 2, 3, 4, 5, 6, 7
#> $ `_parent_index`                <int> 1, 1, 1, 2, 2, 3, 3
#> $ `_parent_table_name`           <chr> "main", "main", "main", "main", "ma…
#> $ `_validation_status.education` <chr> NA, NA, NA, NA, NA, NA, NA
#> $ start                          <dttm> 2022-01-06 15:25:12, 2022-01-06 15:25:1…
#> $ end                            <dttm> 2022-01-06 15:28:36, 2022-01-06 15:28:3…
#> $ today                          <date> 2022-01-06, 2022-01-06, 2022-01-06, 202…
#> $ family_count                   <dbl> 3, 3, 3, 2, 2, 2, 2
#> $ family_count_note              <lgl> NA, NA, NA, NA, NA, NA, NA
#> $ `_id`                          <int> 17727576, 17727576, 17727576, 17727538,…
#> $ uuid                           <chr> "ee485fd6655b4e328fdd895ac0451656", "e…
#> $ education_count                <dbl> 3, 3, 3, 2, 2, 2, 2
#> $ `__version__`                  <chr> "vcs3hEpGKxBo8G5uQa94oD", "vcs3hEpGKxB…
#> $ instanceID                     <chr> "uuid:c4723152-3b1c-4f60-ac30-0f3d6505…
#> $ `_xform_id_string`             <chr> "aANhxwX9S6BCsiYMgQj9kV", "aANhxwX9S6BC…
#> $ `_uuid`                        <chr> "c4723152-3b1c-4f60-ac30-0f3d6505d617",…
#> $ `_status`                      <chr> "submitted_via_web", "submitted_via_web…
#> $ `_submission_time`             <dttm> 2022-01-06 15:28:47, 2022-01-06 15:28:4…
#> $ `_validation_status.main`      <chr> NA, NA, NA, NA, NA, NA, NA
#> $ `_submitted_by`                <lgl> NA, NA, NA, NA, NA, NA, NA

This logic can be extended to create the widest possible table through a cascade of joins, commencing from a deeper table (.start argument) and ending at the main table. Taking .start = hobbies_list as an example, two joins will be performed: hobbies_list will be merged with the demo table, and subsequently, the demo table will be combined with the main table.

df |>
  dm_flatten_to_tbl(.start = hobbies_list,
                    .join = left_join,
                    .recursive = TRUE) |>
  glimpse()
#> Rows: 14
#> Columns: 32
#> $ hobbies                           <chr> "Reading", "MMA", "Origami", "Soccer…
#> $ `_index`                          <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1…
#> $ `_parent_index.hobbies_list`      <int> 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 6, …
#> $ `_parent_table_name.hobbies_list` <chr> "demo", "demo", "demo", "demo", "dem…
#> $ `_validation_status.hobbies_list` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ name                              <chr> "Jemelle", "Jemelle", "Jemelle", "Fa…
#> $ age                               <dbl> 35, 35, 35, 15, 15, 9, 9, 40, 40, 40…
#> $ sex                               <chr+lbl> "2", "2", "2", "2", "2", "1", "1…
#> $ hobby                             <dbl> 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 1, 2, …
#> $ morelang                          <chr+lbl> "1", "1", "1", "1", "1", "1", "1…
#> $ name_individual                   <chr> "Jemelle", "Jemelle", "Jemelle", "Fa…
#> $ `_parent_index.demo`              <int> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, …
#> $ hobbies_list_count                <dbl> 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 1, 2, …
#> $ lang_list_count                   <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, …
#> $ `_parent_table_name.demo`         <chr> "main", "main", "main", "main", "mai…
#> $ `_validation_status.demo`         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ start                             <dttm> 2022-01-06 15:25:12, 2022-01-06 15:…
#> $ end                               <dttm> 2022-01-06 15:28:36, 2022-01-06 15:…
#> $ today                             <date> 2022-01-06, 2022-01-06, 2022-01-06,…
#> $ family_count                      <dbl> 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, …
#> $ family_count_note                 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ `_id`                             <int> 17727576, 17727576, 17727576, 177275…
#> $ uuid                              <chr> "ee485fd6655b4e328fdd895ac0451656", …
#> $ education_count                   <dbl> 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, …
#> $ `__version__`                     <chr> "vcs3hEpGKxBo8G5uQa94oD", "vcs3hEpGK…
#> $ instanceID                        <chr> "uuid:c4723152-3b1c-4f60-ac30-0f3d65…
#> $ `_xform_id_string`                <chr> "aANhxwX9S6BCsiYMgQj9kV", "aANhxwX9S…
#> $ `_uuid`                           <chr> "c4723152-3b1c-4f60-ac30-0f3d6505d61…
#> $ `_status`                         <chr> "submitted_via_web", "submitted_via_…
#> $ `_submission_time`                <dttm> 2022-01-06 15:28:47, 2022-01-06 15:…
#> $ `_validation_status.main`         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ `_submitted_by`                   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, …

You can gain extensive knowledge about the dm package by going through its detailed documentation.