The vld_ functions are used within the chk_
functions. The chk_ functions (and their vld_
equivalents) can be divided into the following families.
In the code examples, we will use the vld_*
functions.
If you want to learn more about the logic behind some of the functions explained here, we recommend reading the book Advanced R (Wickham, 2019).
For reasons of space, the x_name = NULL argument is not
shown. For a simplified list of the chk functions,
see the Reference
section.
chk_ Functions
Check if the function input is missing or not
The chk_missing function uses missing() to
check if an argument has been left out when the function is called.
| Function | Code |
|---|---|
| `chk_missing()` | `missing()` |
| `chk_not_missing()` | `!missing()` |
... Checker
Check if the function input comes from ...
(dot-dot-dot) or not
The functions chk_used(...) and
chk_unused(...) check if any arguments have been provided
through ... (called dot-dot-dot or ellipsis),
which is commonly used in R to allow a variable number of arguments.
| Function | Code |
|---|---|
| `chk_used(...)` | `length(list(...)) != 0L` |
| `chk_unused(...)` | `length(list(...)) == 0L` |
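The expressions in the table can be tried in base R without loading chk. The following is a minimal sketch; the helper names `dots_used` and `dots_unused` are hypothetical and only wrap the expressions shown above.

```r
# Hypothetical helpers (not part of chk) wrapping the table's expressions.
dots_used <- function(...) length(list(...)) != 0L
dots_unused <- function(...) length(list(...)) == 0L

dots_used()        # nothing passed through ...
#> [1] FALSE
dots_used(1, "a")  # two arguments passed through ...
#> [1] TRUE
dots_unused()
#> [1] TRUE
```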
Check if the function input is a valid external data source.
These chk functions check the existence of a file, the
validity of its extension, and the existence of a directory.
| Function | Code |
|---|---|
| `chk_file(x)` | `vld_string(x) && file.exists(x) && !dir.exists(x)` |
| `chk_ext(x, ext)` | `vld_string(x) && vld_subset(tools::file_ext(x), ext)` |
| `chk_dir(x)` | `vld_string(x) && dir.exists(x)` |
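As an illustration of the logic in the table, the sketch below applies the same base R expressions to a temporary file. It assumes only base R and the tools package that ships with R; the helper `is_string` is a hypothetical stand-in for vld_string().

```r
# Create a temporary file so the example does not depend on existing paths.
path <- tempfile(fileext = ".csv")
invisible(file.create(path))

# Hypothetical stand-in for vld_string(): a single non-missing string.
is_string <- function(x) is.character(x) && length(x) == 1L && !anyNA(x)

# A file must exist and must not be a directory.
file_ok <- is_string(path) && file.exists(path) && !dir.exists(path)
# The extension must be one of the allowed values.
ext_ok <- is_string(path) && all(tools::file_ext(path) %in% c("csv", "txt"))
# The parent directory of the temporary file exists.
dir_ok <- is_string(dirname(path)) && dir.exists(dirname(path))
# A directory passes file.exists() but is rejected by !dir.exists().
dir_as_file <- file.exists(dirname(path)) && !dir.exists(dirname(path))

unlink(path)  # clean up the temporary file
```

Here `file_ok`, `ext_ok` and `dir_ok` are TRUE, while `dir_as_file` is FALSE, which is why the file check also excludes directories.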
Check if the function input is NULL or not
| Function | Code |
|---|---|
| `chk_null(x)` | `is.null(x)` |
| `chk_not_null(x)` | `!is.null(x)` |
Check if the function input is a scalar. In R, scalars are vectors of length 1.
| Function | Code |
|---|---|
| `chk_scalar(x)` | `length(x) == 1L` |
The following functions check if the function inputs are vectors of length 1 of a particular data type. Each data type has a special syntax to create an individual value or “scalar”.
| Function | Code |
|---|---|
| `chk_string(x)` | `is.character(x) && length(x) == 1L && !anyNA(x)` |
| `chk_number(x)` | `is.numeric(x) && length(x) == 1L && !anyNA(x)` |
For logical data types, you can check flags using
chk_flag(), which accepts only TRUE or
FALSE, or use chk_lgl() to
verify that a scalar is of type logical, in which case NA is also accepted.
| Function | Code |
|---|---|
| `chk_flag(x)` | `is.logical(x) && length(x) == 1L && !anyNA(x)` |
| `chk_lgl(x)` | `is.logical(x) && length(x) == 1L` |
It is also possible to check if the user-provided argument is only
TRUE or only FALSE:
| Function | Code |
|---|---|
| `chk_true(x)` | `is.logical(x) && length(x) == 1L && !anyNA(x) && x` |
| `chk_false(x)` | `is.logical(x) && length(x) == 1L && !anyNA(x) && !x` |
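The difference between the flag, logical-scalar and TRUE-only checks shows up with NA. The following base R sketch copies the expressions from the tables above; the helper names `is_flag`, `is_lgl` and `is_true` are hypothetical.

```r
# Hypothetical helpers mirroring the expressions in the tables above.
is_flag <- function(x) is.logical(x) && length(x) == 1L && !anyNA(x)
is_lgl <- function(x) is.logical(x) && length(x) == 1L
is_true <- function(x) is_flag(x) && x

is_flag(NA)              # NA is not a flag
#> [1] FALSE
is_lgl(NA)               # but it is a logical scalar
#> [1] TRUE
is_flag(c(TRUE, FALSE))  # length 2, so not a scalar
#> [1] FALSE
is_true(TRUE)
#> [1] TRUE
is_true(1)               # numeric 1 is not logical
#> [1] FALSE
```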
Check if the function input is of class Date or DateTime
Date and datetime classes can be checked with chk_date
and chk_date_time.
| Function | Code |
|---|---|
| `chk_date(x)` | `inherits(x, "Date") && length(x) == 1L && !anyNA(x)` |
| `chk_date_time(x)` | `inherits(x, "POSIXct") && length(x) == 1L && !anyNA(x)` |
You can also check the time zone with chk_tz(). The
available time zones can be retrieved using the function
OlsonNames().
| Function | Code |
|---|---|
| `chk_tz(x)` | `is.character(x) && length(x) == 1L && !anyNA(x) && x %in% OlsonNames()` |
Check if the function input has a specific data structure.
Vectors are a family of data types that come in two forms: atomic vectors and lists. When vectors consist of elements of the same data type, they are atomic vectors, matrices, or arrays. The elements in a list, however, can be of different types.
To check if a function argument is a vector you can use
chk_vector().
| Function | Code |
|---|---|
| `chk_vector(x)` | `(is.atomic(x) && !is.matrix(x) && !is.array(x)) \|\| is.list(x)` |
Note that chk_vector() and
vld_vector() differ from is.vector(),
which returns FALSE if the vector has any attributes other than
names.
vector <- c(1, 2, 3)
is.vector(vector) # TRUE
#> [1] TRUE
vld_vector(vector) # TRUE
#> [1] TRUE
attributes(vector) <- list("a" = 10, "b" = 20, "c" = 30)
is.vector(vector) # FALSE
#> [1] FALSE
vld_vector(vector) # TRUE
#> [1] TRUE
| Function | Code |
|---|---|
| `chk_atomic(x)` | `is.atomic(x)` |
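Notice that is.atomic is TRUE for the types logical,
integer, numeric, complex, character and raw. In R versions before 4.4.0 it is
also TRUE for NULL; from R 4.4.0 onward, is.atomic(NULL) returns FALSE, as in
the output below.
vector <- c(1, 2, 3)
is.atomic(vector) # TRUE
#> [1] TRUE
vld_vector(vector) # TRUE
#> [1] TRUE
is.atomic(NULL) # FALSE in R >= 4.4.0 (TRUE in earlier versions)
#> [1] FALSE
vld_vector(NULL) # FALSE in R >= 4.4.0 (TRUE in earlier versions)
#> [1] FALSE
The dimension attribute converts vectors into matrices and arrays.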
| Function | Code |
|---|---|
| `chk_array(x)` | `is.array(x)` |
| `chk_matrix(x)` | `is.matrix(x)` |
When a vector is composed of heterogeneous data types, it is a list. Data frames are among the most important S3 vectors and are built on top of lists.
| Function | Code |
|---|---|
| `chk_list(x)` | `is.list(x)` |
| `chk_data(x)` | `inherits(x, "data.frame")` |
Be careful not to confuse the function chk_data with
check_data. Please read the check_ functions
section below and the function documentation.
Check if the function input has a specific data type. You can use the function
typeof() to confirm the data type.
| Function | Code |
|---|---|
| `chk_environment(x)` | `is.environment(x)` |
| `chk_logical(x)` | `is.logical(x)` |
| `chk_character(x)` | `is.character(x)` |
For numbers there are four functions. R differentiates between
doubles (chk_double()) and integers
(chk_integer()). You can also use the generic function
chk_numeric(), which will detect both. The third type of
number is complex (chk_complex()).
| Function | Code |
|---|---|
| `chk_numeric(x)` | `is.numeric(x)` |
| `chk_double(x)` | `is.double(x)` |
| `chk_integer(x)` | `is.integer(x)` |
| `chk_complex(x)` | `is.complex(x)` |
Note that to explicitly create an integer in R, you need to use
the suffix L.
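A short base R sketch makes the distinction between doubles and integers concrete. The variable names are illustrative only.

```r
# A bare number literal is a double; the L suffix creates an integer.
type_plain <- typeof(3)
type_suffix <- typeof(3L)
type_plain
#> [1] "double"
type_suffix
#> [1] "integer"

# is.numeric() is TRUE for both, while is.integer() separates them.
both_numeric <- is.numeric(3) && is.numeric(3L)
plain_is_integer <- is.integer(3)
both_numeric
#> [1] TRUE
plain_is_integer
#> [1] FALSE

# The two values compare equal but are not identical objects.
same_object <- identical(3, 3L)
same_object
#> [1] FALSE
```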
The following functions accept whole numbers, whether they are explicitly integers or double types without fractional parts.
| Function | Code |
|---|---|
| `chk_whole_numeric` | `is.integer(x) \|\| (is.double(x) && vld_true(all.equal(x[!is.na(x)], trunc(x[!is.na(x)]))))` |
| `chk_whole_number` | `vld_number(x) && (is.integer(x) \|\| vld_true(all.equal(x, trunc(x))))` |
| `chk_count` | `vld_whole_number(x) && x >= 0` |
If you want to consider both 3.0 and 3L as integers, it is safer to
use the function chk_whole_numeric. Here, x is
valid if it’s an integer or a double that can be converted to an integer
without changing its value.
# Integer vector
vld_whole_numeric(c(1L, 2L, 3L)) # TRUE
#> [1] TRUE
# Double vector representing whole numbers
vld_whole_numeric(c(1.0, 2.0, 3.0)) # TRUE
#> [1] TRUE
# Double vector with fractional numbers
vld_whole_numeric(c(1.0, 2.2, 3.0)) # FALSE
#> [1] FALSE
The function chk_whole_number is similar to
chk_whole_numeric, but chk_whole_number also checks that
the input has length(x) == 1L.
# Integer vector
vld_whole_numeric(c(1L, 2L, 3L)) # TRUE
#> [1] TRUE
vld_whole_number(c(1L, 2L, 3L)) # FALSE
#> [1] FALSE
vld_whole_number(c(1L)) # TRUE
#> [1] TRUE
chk_count() is a special case of
chk_whole_number, differing in that it ensures values are
non-negative whole numbers.
Check if the function input is a factor
| Function | Code |
|---|---|
| `chk_factor` | `is.factor(x)` |
| `chk_character_or_factor` | `is.character(x) \|\| is.factor(x)` |
Factors can be especially confusing for users because, although they are displayed as characters, they are built on top of integer vectors.
chk provides the function
chk_character_or_factor() that allows detecting if the
argument that the user is providing contains strings.
# Factor with specified levels
vector_fruits <- c("apple", "banana", "apple", "orange", "banana", "apple")
factor_fruits <- factor(c("apple", "banana", "apple", "orange", "banana", "apple"),
levels = c("apple", "banana", "orange"))
is.factor(factor_fruits) # TRUE
#> [1] TRUE
vld_factor(factor_fruits) # TRUE
#> [1] TRUE
is.character(factor_fruits) # FALSE
#> [1] FALSE
vld_character(factor_fruits) # FALSE
#> [1] FALSE
vld_character_or_factor(factor_fruits) # TRUE
#> [1] TRUE
Check if the function input has a characteristic shared by all its elements.
If you want to apply any of the previously defined functions for
length(x) == 1L to the elements of a vector, you can use
chk_all().
| Function | Code |
|---|---|
| `chk_all(x, chk_fun, ...)` | `all(vapply(x, chk_fun, TRUE, ...))` |
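The element-wise pattern in the table can be sketched in base R as follows. The helpers `is_string` and `all_strings` are hypothetical names; `is_string` stands in for a scalar check such as the string check shown earlier.

```r
# Hypothetical scalar check: a single non-missing string.
is_string <- function(x) is.character(x) && length(x) == 1L && !anyNA(x)

# Apply the scalar check to every element and require all to pass.
all_strings <- function(x) all(vapply(x, is_string, TRUE))

all_strings(c("a", "b"))
#> [1] TRUE
all_strings(c("a", NA))     # the missing element fails the scalar check
#> [1] FALSE
all_strings(list("a", 1))   # the numeric element fails the scalar check
#> [1] FALSE
```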
Check if the function input is another function
The formals argument is the expected number of formal
arguments of the function. When it is NULL, the number of arguments is not checked.
| Function | Code |
|---|---|
| `chk_function` | `is.function(x) && (is.null(formals) \|\| length(formals(x)) == formals)` |
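To see how the count of formal arguments enters the check, here is a base R sketch of the expression in the table. The helper `has_formals` and its parameter `n_formals` are hypothetical names chosen to avoid confusion with base R's formals() function.

```r
# Hypothetical helper mirroring the expression in the table above.
# n_formals = NULL means the number of arguments is not checked.
has_formals <- function(x, n_formals = NULL) {
  is.function(x) && (is.null(n_formals) || length(formals(x)) == n_formals)
}

add <- function(a, b) a + b

has_formals(add)       # any function passes when n_formals is NULL
#> [1] TRUE
has_formals(add, 2)    # add() has exactly two formal arguments
#> [1] TRUE
has_formals(add, 1)
#> [1] FALSE
has_formals("add")     # a string is not a function
#> [1] FALSE
```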
Check if the function input has names and whether they are valid
The chk_named function works with vectors, lists, data frames,
and matrices that have named columns or rows. Do not confuse it with
check_names.
The chk_valid_name function is specifically designed to check
if the elements of a character vector are valid R names. If you want to
know what is considered a valid name, please refer to the documentation
for the make.names function.
| Function | Code |
|---|---|
| `chk_named(x)` | `!is.null(names(x))` |
| `chk_valid_name(x)` | `identical(make.names(x[!is.na(x)]), as.character(x[!is.na(x)]))` |
vld_valid_name(c("name1", NA, "name_2", "validName")) # TRUE
#> [1] TRUE
vld_valid_name(c(1, 2, 3)) # FALSE
#> [1] FALSE
vld_named(data.frame(a = 1:5, b = 6:10)) # TRUE
#> [1] TRUE
vld_named(list(a = 1, b = 2)) # TRUE
#> [1] TRUE
vld_named(c(a = 1, b = 2)) # TRUE
#> [1] TRUE
vld_named(c(1, 2, 3)) # FALSE
#> [1] FALSE
Check if the function input is part of a range of values. The function input should be numeric.
| Function | Code |
|---|---|
| `chk_range(x, range = c(0, 1))` | `all(x[!is.na(x)] >= range[1] & x[!is.na(x)] <= range[2])` |
| `chk_lt(x, value = 0)` | `all(x[!is.na(x)] < value)` |
| `chk_lte(x, value = 0)` | `all(x[!is.na(x)] <= value)` |
| `chk_gt(x, value = 0)` | `all(x[!is.na(x)] > value)` |
| `chk_gte(x, value = 0)` | `all(x[!is.na(x)] >= value)` |
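The range expressions drop missing values before comparing, which the following base R sketch illustrates. The helpers `in_range` and `is_gt` are hypothetical names wrapping two of the expressions in the table.

```r
# Hypothetical helpers mirroring the expressions in the table above.
in_range <- function(x, range = c(0, 1)) {
  x <- x[!is.na(x)]  # missing values are ignored
  all(x >= range[1] & x <= range[2])
}
is_gt <- function(x, value = 0) all(x[!is.na(x)] > value)

in_range(c(0, 0.5, NA, 1))  # bounds are inclusive, NA is dropped
#> [1] TRUE
in_range(c(0.5, 1.5))
#> [1] FALSE
is_gt(c(1, 2, NA))
#> [1] TRUE
is_gt(c(0, 1))              # 0 is not strictly greater than 0
#> [1] FALSE
```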
Check if the function input is equal or similar to a predefined object.
The functions chk_identical(), chk_equal(),
and chk_equivalent() are used to compare two objects, but
they differ in how strict the comparison is.
chk_identical checks that x and y are exactly identical.
chk_equal and chk_equivalent check if x and
y are numerically equivalent within a specified tolerance, but
chk_equivalent ignores differences in attributes.
| Function | Code |
|---|---|
| `chk_identical(x, y)` | `identical(x, y)` |
| `chk_equal(x, y, tolerance = sqrt(.Machine$double.eps))` | `vld_true(all.equal(x, y, tolerance))` |
| `chk_equivalent(x, y, tolerance = sqrt(.Machine$double.eps))` | `vld_true(all.equal(x, y, tolerance, check.attributes = FALSE))` |
If you want to compare the elements of a vector, you can use
the chk_all_* functions.
| Function | Code |
|---|---|
| `chk_all_identical(x)` | `length(x) < 2L \|\| all(vapply(x, vld_identical, TRUE, y = x[[1]]))` |
| `chk_all_equal(x, tolerance = sqrt(.Machine$double.eps))` | `length(x) < 2L \|\| all(vapply(x, vld_equal, TRUE, y = x[[1]], tolerance = tolerance))` |
| `chk_all_equivalent(x, tolerance = sqrt(.Machine$double.eps))` | `length(x) < 2L \|\| all(vapply(x, vld_equivalent, TRUE, y = x[[1]], tolerance = tolerance))` |
vld_all_identical(c(1, 2, 3)) # FALSE
#> [1] FALSE
vld_all_identical(c(1, 1, 1)) # TRUE
#> [1] TRUE
vld_identical(c(1, 2, 3), c(1, 2, 3)) # TRUE
#> [1] TRUE
vld_all_equal(c(0.1, 0.12, 0.13))
#> [1] FALSE
vld_all_equal(c(0.1, 0.12, 0.13), tolerance = 0.2)
#> [1] TRUE
vld_equal(c(0.1, 0.12, 0.13), c(0.1, 0.12, 0.13)) # TRUE
#> [1] TRUE
vld_equal(c(0.1, 0.12, 0.13), c(0.1, 0.12, 0.4), tolerance = 0.5) # TRUE
#> [1] TRUE
x <- c(0.1, 0.1, 0.1)
y <- c(0.1, 0.12, 0.13)
attr(y, "label") <- "Numbers"
vld_equal(x, y, tolerance = 0.5) # FALSE
#> [1] FALSE
vld_equivalent(x, y, tolerance = 0.5) # TRUE
#> [1] TRUE
Check if the function input is sorted in increasing order
The chk_sorted function checks if x is sorted in
non-decreasing order, ignoring any NA values.
| Function | Code |
|---|---|
| `chk_sorted(x)` | `!is.unsorted(x, na.rm = TRUE)` |
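The behaviour with ties and missing values can be checked with the base R expression from the table. The helper name `is_sorted` is hypothetical.

```r
# Hypothetical helper mirroring the expression in the table above.
is_sorted <- function(x) !is.unsorted(x, na.rm = TRUE)

is_sorted(c(1, 2, 2, 5))  # ties are allowed (non-decreasing order)
#> [1] TRUE
is_sorted(c(1, NA, 3))    # NA values are removed before checking
#> [1] TRUE
is_sorted(c(3, 1, 2))
#> [1] FALSE
```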
Check if the function input is composed of certain elements
The setequal function in R is used to check if two
vectors contain exactly the same elements, regardless of the order or
number of repetitions.
| Function | Code |
|---|---|
| `chk_setequal(x, values)` | `setequal(x, values)` |
vld_setequal(c(1, 2, 3), c(3, 2, 1)) # TRUE
#> [1] TRUE
vld_setequal(c(1, 2, 3), c(3, 2, 1, 4)) # FALSE
#> [1] FALSE
vld_setequal(c(1, 2, 3, 4), c(3, 2, 1)) # FALSE
#> [1] FALSE
vld_setequal(c(1, 2), c(1, 1, 1, 1, 1, 1, 2, 1)) # TRUE
#> [1] TRUE
The %in% operator is used to check whether the
elements of a vector x are present in a specified set of
values. This returns a logical vector, which is then reduced by
all(), which checks whether all values
in the vector are TRUE. For
vld_subset() and chk_subset(), a TRUE result means that all elements in
the x vector are present in values. Similarly, for
vld_superset() and chk_superset(), it means that all
elements of values are present in x.
| Function | Code |
|---|---|
| `chk_subset(x, values)` | `all(x %in% values)` |
| `chk_not_subset(x, values)` | `!any(x %in% values) \|\| !length(x)` |
| `chk_superset(x, values)` | `all(values %in% x)` |
# When both function inputs have the same elements,
# all functions return TRUE
vld_setequal(c(1, 2, 3), c(3, 2, 1)) # TRUE
#> [1] TRUE
vld_subset(c(1, 2, 3), c(3, 2, 1)) # TRUE
#> [1] TRUE
vld_superset(c(1, 2, 3), c(3, 2, 1)) # TRUE
#> [1] TRUE
vld_setequal(c(1, 2), c(1, 1, 1, 1, 1, 1, 2, 1)) # TRUE
#> [1] TRUE
vld_subset(c(1, 2), c(1, 1, 1, 1, 1, 1, 2, 1)) # TRUE
#> [1] TRUE
vld_superset(c(1, 2), c(1, 1, 1, 1, 1, 1, 2, 1)) # TRUE
#> [1] TRUE
# When there are elements present in one vector but not the other,
# `vld_setequal()` will return FALSE
vld_setequal(c(1, 2, 3), c(3, 2, 1, 4)) # FALSE
#> [1] FALSE
vld_setequal(c(1, 2, 3, 4), c(3, 2, 1)) # FALSE
#> [1] FALSE
# When some elements of the `x` input are not present in `values`,
# `vld_subset()` returns FALSE
vld_subset(c(1, 2, 3, 4), c(3, 2, 1)) # FALSE
#> [1] FALSE
vld_superset(c(1, 2, 3, 4), c(3, 2, 1)) # TRUE
#> [1] TRUE
# When some elements of the `values` input are not present in `x`,
# `vld_superset()` returns FALSE
vld_subset(c(1, 2, 3), c(3, 2, 1, 4)) # TRUE
#> [1] TRUE
vld_superset(c(1, 2, 3), c(3, 2, 1, 4)) # FALSE
#> [1] FALSE
# An empty set is considered a subset of any set, and any set is a superset of an empty set.
vld_subset(c(), c("apple", "banana")) # TRUE
#> [1] TRUE
vld_superset(c("apple", "banana"), c()) # TRUE
#> [1] TRUE
chk_orderset() validates whether the
values in a vector x match a specified set of allowed
values (represented by values) while
preserving the order of those values.
| Function | Code |
|---|---|
| `chk_orderset` | `vld_equivalent(unique(x[x %in% values]), values[values %in% x])` |
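The order comparison in the table can be sketched in base R. The helper `is_orderset` is a hypothetical name, and it assumes that the equivalence check corresponds to all.equal() with check.attributes = FALSE, as shown in the table for chk_equivalent.

```r
# Hypothetical helper mirroring the expression in the table above.
is_orderset <- function(x, values) {
  isTRUE(all.equal(
    unique(x[x %in% values]),   # allowed values, in order of first appearance in x
    values[values %in% x],      # the same values, in the order given by `values`
    check.attributes = FALSE
  ))
}

sizes <- c("small", "medium", "large")

is_orderset(c("small", "small", "large"), sizes)  # same relative order
#> [1] TRUE
is_orderset(c("large", "small"), sizes)           # order is reversed
#> [1] FALSE
```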
Check if the function input belongs to a class or type.
These functions check if x is an S3 or S4 object of the
specified class.
| Function | Code |
|---|---|
| `chk_s3_class(x, class)` | `!isS4(x) && inherits(x, class)` |
| `chk_s4_class(x, class)` | `isS4(x) && methods::is(x, class)` |
chk_is() checks if x inherits from a specified class,
regardless of whether it is an S3 or S4 object.
| Function | Code |
|---|---|
| `chk_is(x, class)` | `inherits(x, class)` |
Check if the function input matches a regular expression (REGEX).
chk_match(x, regexp = ".+") checks if the regular
expression pattern specified by regexp matches all the
non-missing values in the vector x. If regexp
is not specified by the user, chk_match checks whether
all non-missing values in x contain at least one character
(regexp = ".+").
| Function | Code |
|---|---|
| `chk_match(x, regexp = ".+")` | `all(grepl(regexp, x[!is.na(x)]))` |
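As a quick illustration of the default pattern and the handling of missing values, the following base R sketch wraps the expression from the table. The helper name `matches_all` is hypothetical.

```r
# Hypothetical helper mirroring the expression in the table above.
matches_all <- function(x, regexp = ".+") all(grepl(regexp, x[!is.na(x)]))

matches_all(c("abc", NA, "de"), "^[a-z]+$")  # NA is ignored
#> [1] TRUE
matches_all(c("abc", "D1"), "^[a-z]+$")      # "D1" is not all lower-case letters
#> [1] FALSE
matches_all(c("a", ""))                      # default ".+" rejects empty strings
#> [1] FALSE
```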
Check if the function input meets some user-defined quality criteria.
The chk_not_empty function checks if the length of the
object is not zero. For a matrix, the length corresponds
to the number of elements (not rows or columns); for a data frame, it
corresponds to the number of columns; and for a vector or
list, it corresponds to the number of elements.
The chk_not_any_na function checks if there are no NA values
present in the entire object.
| Function | Code |
|---|---|
| `chk_not_empty(x)` | `length(x) != 0L` |
| `chk_not_any_na(x)` | `!anyNA(x)` |
vld_not_empty(c()) # FALSE
#> [1] FALSE
vld_not_empty(list()) # FALSE
#> [1] FALSE
vld_not_empty(data.frame()) # FALSE
#> [1] FALSE
vld_not_empty(data.frame(a = 1:3, b = 4:6)) # TRUE
#> [1] TRUE
vld_not_any_na(data.frame(a = 1:3, b = 4:6)) # TRUE
#> [1] TRUE
vld_not_any_na(data.frame(a = c(1, NA, 3), b = c(4, 5, 6))) # FALSE
#> [1] FALSE
The chk_unique() function verifies that
there are no duplicate elements in a vector.
| Function | Code |
|---|---|
| `chk_unique(x, incomparables = FALSE)` | `!anyDuplicated(x, incomparables = incomparables)` |
The function chk_length checks whether the length of
x is within a specified range. It ensures that the length
is at least equal to length and no more than
upper. It can be used with vectors, lists and data
frames.
| Function | Code |
|---|---|
| `chk_length(x, length = 1L, upper = length)` | `length(x) >= length && length(x) <= upper` |
vld_length(c(1, 2, 3), length = 2, upper = 5) # TRUE
#> [1] TRUE
vld_length(c("a", "b"), length = 3) # FALSE
#> [1] FALSE
vld_length(list(a = 1, b = 2, c = 3), length = 2, upper = 4) # TRUE
#> [1] TRUE
vld_length(list(a = 1, b = 2, c = 3), length = 4) # FALSE
#> [1] FALSE
# 2 columns
vld_length(data.frame(x = 1:3, y = 4:6), length = 1, upper = 3) # TRUE
#> [1] TRUE
vld_length(data.frame(x = 1:3, y = 4:6), length = 3) # FALSE
#> [1] FALSE
# length of NULL is 0
vld_length(NULL, length = 0) # TRUE
#> [1] TRUE
vld_length(NULL, length = 1) # FALSE
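#> [1] FALSE
Another useful function is chk_compatible_lengths().
This function checks whether vectors can be ‘strictly recycled’. In the examples below, vectors pass when they share the same length or when one of them has length 1.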
a <- integer(0)
b <- numeric(0)
vld_compatible_lengths(a, b) # TRUE
#> [1] TRUE
a <- 1
b <- 2
vld_compatible_lengths(a, b) # TRUE
#> [1] TRUE
a <- 1:3
b <- 1:3
vld_compatible_lengths(a, b) # TRUE
#> [1] TRUE
b <- 1
vld_compatible_lengths(a, b) # TRUE
#> [1] TRUE
b <- 1:2
vld_compatible_lengths(a, b) # FALSE
#> [1] FALSE
b <- 1:6
vld_compatible_lengths(a, b) # FALSE
#> [1] FALSE
The chk_join() function is designed to validate whether
the number of rows in the resulting data frame from merging two data
frames (x and y) is equal to the number of
rows in the first data frame (x). This is useful when you
want to ensure that a join operation does not change the number of rows
in your main data frame.
| Function | Code |
|---|---|
| `chk_join(x, y, by)` | `identical(nrow(x), nrow(merge(x, unique(y[if (is.null(names(by))) by else names(by)]), by = by)))` |
x <- data.frame(id = c(1, 2, 3), value_x = c("A", "B", "C"))
y <- data.frame(id = c(1, 2, 3), value_y = c("D", "E", "F"))
vld_join(x, y, by = "id") # TRUE
#> [1] TRUE
# Perform a join that reduces the number of rows
y <- data.frame(id = c(1, 2, 1), value_y = c("D", "E", "F"))
vld_join(x, y, by = "id") # FALSE
#> [1] FALSE
check_ functions
The check_ functions combine several chk_
functions internally. Read the documentation for each function to learn
more about its specific use.
| Function | Description |
|---|---|
| `check_values(x, values)` | Checks values and S3 class of an atomic object. |
| `check_key(x, key = character(0), na_distinct = FALSE)` | Checks if columns have unique rows. |
| `check_data(x, values, exclusive, order, nrow, key)` | Checks column names, values, number of rows and key for a data.frame. |
| `check_dim(x, dim, values, dim_name)` | Checks dimension of an object. |
| `check_dirs(x, exists)` | Checks if all directories exist (or if exists = FALSE do not exist as directories or files). |
| `check_files(x, exists)` | Checks if all files exist (or if exists = FALSE do not exist as files or directories). |
| `check_names(x, names, exclusive, order)` | Checks the names of an object. |
Wickham, H. (2019). Advanced R (2nd ed.). Chapman and Hall/CRC.