KoboToolbox
enables grouping of questions, allowing them
to be answered multiple times. This feature is particularly useful
during household surveys where a set of questions is designed to be
answered by each member of the household.
KoboToolbox
implements this feature by incorporating the
concept of repeat group
,
enabling the repetition of a group of questions. This method involves
enclosing the questions intended for repetition within a
begin_repeat
/end_repeat
loop. Furthermore,
repeat group
allows for nesting, thus enabling the
repetition of a question group within another repeat group
.
This concept can be demonstrated using the project and associated form
below.
type | name | label::English (en) | label::Francais (fr) | repeat_count | calculation |
---|---|---|---|---|---|
start | start | ||||
end | end | ||||
today | today | ||||
begin_repeat | demo | Demographic Characteristics | Caracteristique Demographique | ||
text | name | Name | Nom | ||
integer | age | Age | Age | ||
select_one sex | sex | Sex | Sexe | ||
integer | hobby | How many hobbies does ${name} have? | Combien de hobbies ${name} a ? | ||
select_one yesno | morelang | Does ${name} speak more than one language? | Est-ce que ${name} parle plus d’une langue ? | ||
calculate | name_individual | indexed-repeat(${name}, ${demo}, position(..)) | |||
begin_repeat | hobbies_list | List of Hobbies | Liste de hobbies | ${hobby} | |
text | hobbies | Hobbies of ${name_individual} | Hobbies de ${name_individual} | ||
end_repeat | |||||
begin_repeat | lang_list | List of Languages | Liste de langues | ${morelang} | |
select_multiple lang | langs | Languages spoken by ${name_individual} | Langue parle par ${name_individual} | ||
end_repeat | |||||
end_repeat | |||||
calculate | family_count | count(${demo}) | |||
note | family_count_note | Number of family members: ${family_count} | Nombre de membre dans la famille: ${family_count} | ||
begin_repeat | education | Education information | Information sur l’education | ${family_count} | |
calculate | name_individual2 | indexed-repeat(${name}, ${demo}, position(..)) | |||
select_one edu_level | edu_level | What is ${name_individual2}’s level of education | Quel est le niveau d’education de ${name_individual2} | ||
end_repeat |
list_name | name | label::English (en) | label::Francais (fr) |
---|---|---|---|
sex | 1 | Male | Homme |
sex | 2 | Female | Femme |
sex | 3 | Prefer not to say | Prefere ne pas dire |
edu_level | 1 | Primary | Primaire |
edu_level | 2 | Secondary | Secondaire |
edu_level | 3 | Higher Secondary & Above | Lycee et superieur |
yesno | 1 | Yes | Oui |
yesno | 0 | No | Non |
lang | 1 | French | Francais |
lang | 2 | Spanish | Espagnol |
lang | 3 | Arabic | Arabe |
lang | 99 | Other | Autre |
The aforementioned survey, named nested_roster
, was
uploaded to the server. It can be accessed from the list of asset
asset_list
.
library(robotoolbox)
library(dplyr)
asset_list <- kobo_asset_list()
uid <- filter(asset_list, name == "nested_roster") |>
pull(uid)
asset <- kobo_asset(uid)
asset
#> <robotoolbox asset> aANhxwX9S6BCsiYMgQj9kV
#> Asset name: nested_roster
#> Asset type: survey
#> Asset owner: dickoa
#> Created: 2022-01-05 21:22:51
#> Last modified: 2022-01-06 15:15:31
#> Submissions: 3
The output here deviates from a standard data.frame
. It
consists of a listing of each repeat group
loop present in
our form.
#> ── Metadata ────────────────────────────────────────────────────────────────────
#> Tables: `main`, `demo`, `hobbies_list`, `lang_list`, `education`
#> Columns: 49
#> Primary keys: 5
#> Foreign keys: 4
The output is a dm
object, sourced from the the
dm
package.
#> [1] "dm"
repeat group
as dm
objectA dm
object, which is a list of interconnected
data.frame
instances, can be manipulated using the
dm
package.
To comprehend the data storage structure, we can visualize the
relationships among tables (repeat group loops) and the schema of the
dataset. This schema can be depicted using the dm_draw
function.
The dm
package offers numerous helper functions for
manipulating dm
objects. For instance, the
dm_nrow
function can be used to ascertain the number of
rows in each table.
dm
object is a list of data.frame
A dm
object is a list
of
data.frame
. Similar to any list of data.frame
,
you can extract each table (data.frame
), and analyze it
separately. The principal table, where you have the first
repeat group
, is termede as main
.
glimpse(df$main)
#> Rows: 3
#> Columns: 17
#> $ start <dttm> 2022-01-06 15:25:12, 2022-01-06 15:17:18, 2022-0…
#> $ end <dttm> 2022-01-06 15:28:36, 2022-01-06 15:25:11, 2022-0…
#> $ today <date> 2022-01-06, 2022-01-06, 2022-01-06
#> $ family_count <dbl> 3, 2, 2
#> $ family_count_note <lgl> NA, NA, NA
#> $ `_index` <int> 1, 2, 3
#> $ `_id` <int> 17727576, 17727538, 17727380
#> $ uuid <chr> "ee485fd6655b4e328fdd895ac0451656", "ee485fd6655…
#> $ education_count <dbl> 3, 2, 2
#> $ `__version__` <chr> "vcs3hEpGKxBo8G5uQa94oD", "vcs3hEpGKxBo8G5uQa94oD…
#> $ instanceID <chr> "uuid:c4723152-3b1c-4f60-ac30-0f3d6505d617", "uui…
#> $ `_xform_id_string` <chr> "aANhxwX9S6BCsiYMgQj9kV", "aANhxwX9S6BCsiYMgQj9kV…
#> $ `_uuid` <chr> "c4723152-3b1c-4f60-ac30-0f3d6505d617", "06552d3d…
#> $ `_status` <chr> "submitted_via_web", "submitted_via_web", "submit…
#> $ `_submission_time` <dttm> 2022-01-06 15:28:47, 2022-01-06 15:25:23, 2022-01…
#> $ `_validation_status` <chr> NA, NA, NA
#> $ `_submitted_by` <lgl> NA, NA, NA
The other tables are named following the names of their associated
repeat groups
. For instance, the education
table is named after the education
repeat group
.
glimpse(df$education)
#> Rows: 7
#> Columns: 6
#> $ name_individual2 <chr> "Jemelle", "Fatim", "Ali", "Shannon", "Skip", "Ah…
#> $ edu_level <chr+lbl> "3", "3", "1", "3", "3", "3", "3"
#> $ `_index` <int> 1, 2, 3, 4, 5, 6, 7
#> $ `_parent_index` <int> 1, 1, 1, 2, 2, 3, 3
#> $ `_parent_table_name` <chr> "main", "main", "main", "main", "main", "main…
#> $ `_validation_status` <chr> NA, NA, NA, NA, NA, NA, NA
One key benefit of using the dm
package is its
capability to dynamically filter tables while maintaining their
interconnections. For example, filtering the main
table
will automatically extend to the education
and
demo
tables. As the hobbies_list
and
lang_list
tables are linked to the demo
table,
they will be filtered as well.
In certain instances, analyzing joined data may prove simpler. The
dm_flatten_to_tbl
function can be used to join data safely
while preserving its structure and the connections between tables. We
can merge the education
table with the main
table using the dm_flatten_to_tbl
function, with the
operation starting from education
.
df |>
dm_flatten_to_tbl(.start = education,
.join = left_join) |>
glimpse()
#> Rows: 7
#> Columns: 22
#> $ name_individual2 <chr> "Jemelle", "Fatim", "Ali", "Shannon", "…
#> $ edu_level <chr+lbl> "3", "3", "1", "3", "3", "3", "3"
#> $ `_index` <int> 1, 2, 3, 4, 5, 6, 7
#> $ `_parent_index` <int> 1, 1, 1, 2, 2, 3, 3
#> $ `_parent_table_name` <chr> "main", "main", "main", "main", "ma…
#> $ `_validation_status.education` <chr> NA, NA, NA, NA, NA, NA, NA
#> $ start <dttm> 2022-01-06 15:25:12, 2022-01-06 15:25:1…
#> $ end <dttm> 2022-01-06 15:28:36, 2022-01-06 15:28:3…
#> $ today <date> 2022-01-06, 2022-01-06, 2022-01-06, 202…
#> $ family_count <dbl> 3, 3, 3, 2, 2, 2, 2
#> $ family_count_note <lgl> NA, NA, NA, NA, NA, NA, NA
#> $ `_id` <int> 17727576, 17727576, 17727576, 17727538,…
#> $ uuid <chr> "ee485fd6655b4e328fdd895ac0451656", "e…
#> $ education_count <dbl> 3, 3, 3, 2, 2, 2, 2
#> $ `__version__` <chr> "vcs3hEpGKxBo8G5uQa94oD", "vcs3hEpGKxB…
#> $ instanceID <chr> "uuid:c4723152-3b1c-4f60-ac30-0f3d6505…
#> $ `_xform_id_string` <chr> "aANhxwX9S6BCsiYMgQj9kV", "aANhxwX9S6BC…
#> $ `_uuid` <chr> "c4723152-3b1c-4f60-ac30-0f3d6505d617",…
#> $ `_status` <chr> "submitted_via_web", "submitted_via_web…
#> $ `_submission_time` <dttm> 2022-01-06 15:28:47, 2022-01-06 15:28:4…
#> $ `_validation_status.main` <chr> NA, NA, NA, NA, NA, NA, NA
#> $ `_submitted_by` <lgl> NA, NA, NA, NA, NA, NA, NA
This logic can be extended to create the widest possible table
through a cascade of joins, commencing from a deeper table
(.start
argument) and ending at the main table. Taking
.start = hobbies_list
as an example, two joins will be
performed: hobbies_list
will be merged with the
demo
table, and subsequently, the demo
table
will be combined with the main
table.
df |>
dm_flatten_to_tbl(.start = hobbies_list,
.join = left_join,
.recursive = TRUE) |>
glimpse()
#> Rows: 14
#> Columns: 32
#> $ hobbies <chr> "Reading", "MMA", "Origami", "Soccer…
#> $ `_index` <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1…
#> $ `_parent_index.hobbies_list` <int> 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 6, …
#> $ `_parent_table_name.hobbies_list` <chr> "demo", "demo", "demo", "demo", "dem…
#> $ `_validation_status.hobbies_list` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ name <chr> "Jemelle", "Jemelle", "Jemelle", "Fa…
#> $ age <dbl> 35, 35, 35, 15, 15, 9, 9, 40, 40, 40…
#> $ sex <chr+lbl> "2", "2", "2", "2", "2", "1", "1…
#> $ hobby <dbl> 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 1, 2, …
#> $ morelang <chr+lbl> "1", "1", "1", "1", "1", "1", "1…
#> $ name_individual <chr> "Jemelle", "Jemelle", "Jemelle", "Fa…
#> $ `_parent_index.demo` <int> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, …
#> $ hobbies_list_count <dbl> 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 1, 2, …
#> $ lang_list_count <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, …
#> $ `_parent_table_name.demo` <chr> "main", "main", "main", "main", "mai…
#> $ `_validation_status.demo` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ start <dttm> 2022-01-06 15:25:12, 2022-01-06 15:…
#> $ end <dttm> 2022-01-06 15:28:36, 2022-01-06 15:…
#> $ today <date> 2022-01-06, 2022-01-06, 2022-01-06,…
#> $ family_count <dbl> 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, …
#> $ family_count_note <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ `_id` <int> 17727576, 17727576, 17727576, 177275…
#> $ uuid <chr> "ee485fd6655b4e328fdd895ac0451656", …
#> $ education_count <dbl> 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, …
#> $ `__version__` <chr> "vcs3hEpGKxBo8G5uQa94oD", "vcs3hEpGK…
#> $ instanceID <chr> "uuid:c4723152-3b1c-4f60-ac30-0f3d65…
#> $ `_xform_id_string` <chr> "aANhxwX9S6BCsiYMgQj9kV", "aANhxwX9S…
#> $ `_uuid` <chr> "c4723152-3b1c-4f60-ac30-0f3d6505d61…
#> $ `_status` <chr> "submitted_via_web", "submitted_via_…
#> $ `_submission_time` <dttm> 2022-01-06 15:28:47, 2022-01-06 15:…
#> $ `_validation_status.main` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ `_submitted_by` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
You can gain extensive knowledge about the dm
package
by going through its detailed documentation.