Missing Data Explorer

Nelson Gonzabato

2020-02-24

The acronym mde stands for Missing Data Explorer, a package that is intended to make missing data exploration as smooth and easy as possible.

The goal of mde is to make exploring missingness easy, with a particular focus on simple syntax so that users do not feel overwhelmed.

Installation

We can install mde as follows:


# install.packages("mde")

Loading the package

library(mde)
#> Welcome to mde. This is mde version 0.1.0.
#>  Please file issues and feedback at https://www.github.com/Nelson-Gon/mde/issues
#> Turn this message off using 'suppressPackageStartupMessages(library(mde))'
#>  Happy Exploration :)

Currently available functions.

  1. get_na_counts

This provides a convenient way to show the number of missing values in each column. It is relatively fast (in tests on about 400,000 rows, it took a few microseconds).

get_na_counts(airquality)
#>   Ozone Solar.R    Wind    Temp   Month     Day 
#>      37       7       0       0       0       0
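
For readers who want to see what these counts correspond to, the same numbers can be reproduced in base R. This is only an illustrative sketch of the computation, not mde's implementation.

```r
# Count missing values per column using base R only.
na_counts <- colSums(is.na(airquality))
print(na_counts)
```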

The above might be less useful if one would like to get the results by group. In that case, one can set grouped to TRUE and provide a vector of names in grouping_cols that will be used for grouping.


test <- structure(list(Subject = structure(c(1L, 1L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), res = c(NA, 1, 2, 3), ID = structure(c(1L, 
1L, 2L, 2L), .Label = c("1", "2"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))

get_na_counts(test, grouped = TRUE, grouping_cols = "ID")
#> # A tibble: 2 x 3
#>   ID    Subject   res
#>   <fct>   <int> <int>
#> 1 1           0     1
#> 2 2           0     0
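
The grouped counts above can be cross-checked with a base R sketch. It rebuilds the small example data and uses aggregate, which is an assumption about how one might verify the result, not how mde computes it internally.

```r
# Rebuild the small example data (same values as `test` above).
test <- data.frame(
  Subject = factor(c("A", "A", "B", "B")),
  res = c(NA, 1, 2, 3),
  ID = factor(c("1", "1", "2", "2"))
)

# Sum the NA indicators of the non-grouping columns within each ID.
value_cols <- setdiff(names(test), "ID")
na_by_id <- aggregate(is.na(test[value_cols]),
                      by = list(ID = test$ID), FUN = sum)
print(na_by_id)
```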
  2. percent_missing

This is a simple and quick way to look at the percentage of data that is missing in each column.


percent_missing(airquality)
#>     Ozone   Solar.R      Wind      Temp     Month       Day 
#> 24.183007  4.575163  0.000000  0.000000  0.000000  0.000000
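
As a sanity check, the percentages can be derived by hand: the mean of a logical NA indicator is the proportion missing. The snippet below is a base R sketch of that arithmetic and does not use mde.

```r
# Proportion of NAs per column, scaled to a percentage.
pct_missing <- colMeans(is.na(airquality)) * 100
print(round(pct_missing, 6))
```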

We can get the results by group by providing an optional grouping_cols character vector.


percent_missing(test, grouping_cols = "Subject")
#> # A tibble: 2 x 3
#>   Subject   res    ID
#>   <fct>   <dbl> <dbl>
#> 1 A          50     0
#> 2 B           0     0

To exclude some columns from the above exploration, one can provide an optional character vector in exclude_cols.

percent_missing(airquality,exclude_cols = c("Day","Temp"))
#>     Ozone   Solar.R      Wind     Month 
#> 24.183007  4.575163  0.000000  0.000000
  3. recode_as_na

As the name might imply, this converts any value or vector of values to NA, i.e. we take a value such as “missing” and convert it to R’s representation of missing values (NA).

To use the function out of the box (with default arguments), one simply does something like:


dummy_test <- data.frame(ID = c("A","B","B","A"), 
                         values = c("n/a",NA,"Yes","No"))
# Convert n/a to NA
recode_as_na(dummy_test, value = "n/a")
#> Warning in recode_as_na.data.frame(dummy_test, value = "n/a"): Factor columns
#> have been converted to character
#>   ID values
#> 1  A   <NA>
#> 2  B   <NA>
#> 3  B    Yes
#> 4  A     No
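
To make the idea concrete, here is a base R sketch of the same recoding. It is an assumed equivalent for illustration only; it keeps the columns as character (via stringsAsFactors = FALSE) so no factor conversion is involved.

```r
# Same example data, stored as character columns.
dummy_test <- data.frame(ID = c("A", "B", "B", "A"),
                         values = c("n/a", NA, "Yes", "No"),
                         stringsAsFactors = FALSE)

# Replace every "n/a" with NA, column by column.
# %in% never returns NA, so existing NAs are left untouched.
recoded <- dummy_test
recoded[] <- lapply(recoded, function(col) replace(col, col %in% "n/a", NA))
print(recoded)
```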

Great, but what if I want to do so for specific columns rather than the entire dataset? You can do this by setting subset_df to TRUE and providing column names to subset_cols.


another_dummy <- data.frame(ID = 1:5, Subject = 7:11, 
Change = c("missing","n/a",2:4 ))
# Only change values at the column Change
recode_as_na(another_dummy, subset_df = TRUE,
             subset_cols = "Change", value = c("n/a",
                                               "missing"))
#> Warning in recode_as_na.data.frame(another_dummy, subset_df = TRUE, subset_cols
#> = "Change", : Factor columns have been converted to character
#>   ID Subject Change
#> 1  1       7   <NA>
#> 2  2       8   <NA>
#> 3  3       9      2
#> 4  4      10      3
#> 5  5      11      4

To use tidy selection, one can do the following:

mde::recode_as_na(airquality, subset_df = TRUE,
tidy=TRUE, pattern_type="starts_with",
pattern="Solar")
#>     Ozone Solar.R Wind Temp Month Day
#> 1      41     190  7.4   67     5   1
#> 2      36     118  8.0   72     5   2
#> 3      12     149 12.6   74     5   3
#> 4      18     313 11.5   62     5   4
#> 5      NA      NA 14.3   56     5   5
#> 6      28      NA 14.9   66     5   6
#> 7      23     299  8.6   65     5   7
#> 8      19      99 13.8   59     5   8
#> 9       8      19 20.1   61     5   9
#> 10     NA     194  8.6   69     5  10
#> 11      7      NA  6.9   74     5  11
#> 12     16     256  9.7   69     5  12
#> 13     11     290  9.2   66     5  13
#> 14     14     274 10.9   68     5  14
#> 15     18      65 13.2   58     5  15
#> 16     14     334 11.5   64     5  16
#> 17     34     307 12.0   66     5  17
#> 18      6      78 18.4   57     5  18
#> 19     30     322 11.5   68     5  19
#> 20     11      44  9.7   62     5  20
#> 21      1       8  9.7   59     5  21
#> 22     11     320 16.6   73     5  22
#> 23      4      25  9.7   61     5  23
#> 24     32      92 12.0   61     5  24
#> 25     NA      66 16.6   57     5  25
#> 26     NA     266 14.9   58     5  26
#> 27     NA      NA  8.0   57     5  27
#> 28     23      13 12.0   67     5  28
#> 29     45     252 14.9   81     5  29
#> 30    115     223  5.7   79     5  30
#> 31     37     279  7.4   76     5  31
#> 32     NA     286  8.6   78     6   1
#> 33     NA     287  9.7   74     6   2
#> 34     NA     242 16.1   67     6   3
#> 35     NA     186  9.2   84     6   4
#> 36     NA     220  8.6   85     6   5
#> 37     NA     264 14.3   79     6   6
#> 38     29     127  9.7   82     6   7
#> 39     NA     273  6.9   87     6   8
#> 40     71     291 13.8   90     6   9
#> 41     39     323 11.5   87     6  10
#> 42     NA     259 10.9   93     6  11
#> 43     NA     250  9.2   92     6  12
#> 44     23     148  8.0   82     6  13
#> 45     NA     332 13.8   80     6  14
#> 46     NA     322 11.5   79     6  15
#> 47     21     191 14.9   77     6  16
#> 48     37     284 20.7   72     6  17
#> 49     20      37  9.2   65     6  18
#> 50     12     120 11.5   73     6  19
#> 51     13     137 10.3   76     6  20
#> 52     NA     150  6.3   77     6  21
#> 53     NA      59  1.7   76     6  22
#> 54     NA      91  4.6   76     6  23
#> 55     NA     250  6.3   76     6  24
#> 56     NA     135  8.0   75     6  25
#> 57     NA     127  8.0   78     6  26
#> 58     NA      47 10.3   73     6  27
#> 59     NA      98 11.5   80     6  28
#> 60     NA      31 14.9   77     6  29
#> 61     NA     138  8.0   83     6  30
#> 62    135     269  4.1   84     7   1
#> 63     49     248  9.2   85     7   2
#> 64     32     236  9.2   81     7   3
#> 65     NA     101 10.9   84     7   4
#> 66     64     175  4.6   83     7   5
#> 67     40     314 10.9   83     7   6
#> 68     77     276  5.1   88     7   7
#> 69     97     267  6.3   92     7   8
#> 70     97     272  5.7   92     7   9
#> 71     85     175  7.4   89     7  10
#> 72     NA     139  8.6   82     7  11
#> 73     10     264 14.3   73     7  12
#> 74     27     175 14.9   81     7  13
#> 75     NA     291 14.9   91     7  14
#> 76      7      48 14.3   80     7  15
#> 77     48     260  6.9   81     7  16
#> 78     35     274 10.3   82     7  17
#> 79     61     285  6.3   84     7  18
#> 80     79     187  5.1   87     7  19
#> 81     63     220 11.5   85     7  20
#> 82     16       7  6.9   74     7  21
#> 83     NA     258  9.7   81     7  22
#> 84     NA     295 11.5   82     7  23
#> 85     80     294  8.6   86     7  24
#> 86    108     223  8.0   85     7  25
#> 87     20      81  8.6   82     7  26
#> 88     52      82 12.0   86     7  27
#> 89     82     213  7.4   88     7  28
#> 90     50     275  7.4   86     7  29
#> 91     64     253  7.4   83     7  30
#> 92     59     254  9.2   81     7  31
#> 93     39      83  6.9   81     8   1
#> 94      9      24 13.8   81     8   2
#> 95     16      77  7.4   82     8   3
#> 96     78      NA  6.9   86     8   4
#> 97     35      NA  7.4   85     8   5
#> 98     66      NA  4.6   87     8   6
#> 99    122     255  4.0   89     8   7
#> 100    89     229 10.3   90     8   8
#> 101   110     207  8.0   90     8   9
#> 102    NA     222  8.6   92     8  10
#> 103    NA     137 11.5   86     8  11
#> 104    44     192 11.5   86     8  12
#> 105    28     273 11.5   82     8  13
#> 106    65     157  9.7   80     8  14
#> 107    NA      64 11.5   79     8  15
#> 108    22      71 10.3   77     8  16
#> 109    59      51  6.3   79     8  17
#> 110    23     115  7.4   76     8  18
#> 111    31     244 10.9   78     8  19
#> 112    44     190 10.3   78     8  20
#> 113    21     259 15.5   77     8  21
#> 114     9      36 14.3   72     8  22
#> 115    NA     255 12.6   75     8  23
#> 116    45     212  9.7   79     8  24
#> 117   168     238  3.4   81     8  25
#> 118    73     215  8.0   86     8  26
#> 119    NA     153  5.7   88     8  27
#> 120    76     203  9.7   97     8  28
#> 121   118     225  2.3   94     8  29
#> 122    84     237  6.3   96     8  30
#> 123    85     188  6.3   94     8  31
#> 124    96     167  6.9   91     9   1
#> 125    78     197  5.1   92     9   2
#> 126    73     183  2.8   93     9   3
#> 127    91     189  4.6   93     9   4
#> 128    47      95  7.4   87     9   5
#> 129    32      92 15.5   84     9   6
#> 130    20     252 10.9   80     9   7
#> 131    23     220 10.3   78     9   8
#> 132    21     230 10.9   75     9   9
#> 133    24     259  9.7   73     9  10
#> 134    44     236 14.9   81     9  11
#> 135    21     259 15.5   76     9  12
#> 136    28     238  6.3   77     9  13
#> 137     9      24 10.9   71     9  14
#> 138    13     112 11.5   71     9  15
#> 139    46     237  6.9   78     9  16
#> 140    18     224 13.8   67     9  17
#> 141    13      27 10.3   76     9  18
#> 142    24     238 10.3   68     9  19
#> 143    16     201  8.0   82     9  20
#> 144    13     238 12.6   64     9  21
#> 145    23      14  9.2   71     9  22
#> 146    36     139 10.3   81     9  23
#> 147     7      49 10.3   69     9  24
#> 148    14      20 16.6   63     9  25
#> 149    30     193  6.9   70     9  26
#> 150    NA     145 13.2   77     9  27
#> 151    14     191 14.3   75     9  28
#> 152    18     131  8.0   76     9  29
#> 153    20     223 11.5   68     9  30
  4. sort_by_missingness

This provides a very simple but relatively fast way to sort variables by missingness. Unless otherwise stated, this does not currently support arranging grouped percents.

Usage:


sort_by_missingness(airquality, sort_by = "counts")
#>   variable count
#> 1     Wind     0
#> 2     Temp     0
#> 3    Month     0
#> 4      Day     0
#> 5  Solar.R     7
#> 6    Ozone    37

# sort in descending order

sort_by_missingness(airquality, sort_by = "counts",
descend = TRUE)
#>   variable count
#> 1    Ozone    37
#> 2  Solar.R     7
#> 3     Wind     0
#> 4     Temp     0
#> 5    Month     0
#> 6      Day     0

# Use percents
sort_by_missingness(airquality, sort_by = "percents")
#>   variable   percent
#> 1     Wind  0.000000
#> 2     Temp  0.000000
#> 3    Month  0.000000
#> 4      Day  0.000000
#> 5  Solar.R  4.575163
#> 6    Ozone 24.183007
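
The ordering shown above can be reproduced with a few lines of base R. This is a sketch under the assumption that ties keep their original column order; it is not taken from mde's source.

```r
# Build a variable/count table and sort it by the number of NAs.
na_counts <- colSums(is.na(airquality))
sorted <- data.frame(variable = names(na_counts),
                     count = unname(na_counts),
                     stringsAsFactors = FALSE)
sorted <- sorted[order(sorted$count), ]           # ascending
sorted_desc <- sorted[order(-sorted$count), ]     # descending
print(sorted)
```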
  5. recode_na_as

Sometimes, for whatever reason, one would like to replace NAs with a value of their choice. recode_na_as provides a very simple way to do just that.

# defaults
head(recode_na_as(airquality))
#>   Ozone Solar.R Wind Temp Month Day
#> 1    41     190  7.4   67     5   1
#> 2    36     118  8.0   72     5   2
#> 3    12     149 12.6   74     5   3
#> 4    18     313 11.5   62     5   4
#> 5     0       0 14.3   56     5   5
#> 6    28       0 14.9   66     5   6
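
For comparison, the default behaviour shown above (NAs become 0) matches this base R idiom. It is a sketch for intuition and is not claimed to be how recode_na_as is implemented.

```r
# Replace every NA in a copy of the data with 0.
aq <- airquality
aq[is.na(aq)] <- 0
print(head(aq))
```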

To use a different value:

head(recode_na_as(airquality, value=NaN))
#>   Ozone Solar.R Wind Temp Month Day
#> 1    41     190  7.4   67     5   1
#> 2    36     118  8.0   72     5   2
#> 3    12     149 12.6   74     5   3
#> 4    18     313 11.5   62     5   4
#> 5   NaN     NaN 14.3   56     5   5
#> 6    28     NaN 14.9   66     5   6

As a “bonus”, you can manipulate the data only at specific columns as shown here:


head(recode_na_as(airquality, value=0, subset_df=TRUE, subset_cols="Ozone"))
#>   Ozone Solar.R Wind Temp Month Day
#> 1    41     190  7.4   67     5   1
#> 2    36     118  8.0   72     5   2
#> 3    12     149 12.6   74     5   3
#> 4    18     313 11.5   62     5   4
#> 5     0      NA 14.3   56     5   5
#> 6    28      NA 14.9   66     5   6

The above also supports tidy selection as follows:


head(mde::recode_na_as(airquality, subset_df=TRUE, tidy=TRUE,
                  value=0, pattern_type="starts_with",
                  pattern="solar",ignore.case=TRUE))
#>   Ozone Solar.R Wind Temp Month Day
#> 1    41     190  7.4   67     5   1
#> 2    36     118  8.0   72     5   2
#> 3    12     149 12.6   74     5   3
#> 4    18     313 11.5   62     5   4
#> 5    NA       0 14.3   56     5   5
#> 6    28       0 14.9   66     5   6
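
The case-insensitive prefix match above can be imitated in base R with grepl. The snippet is a hedged sketch of the selection logic only; the regular expression "^solar" is an illustrative stand-in for starts_with and is not part of mde.

```r
# Select columns whose names start with "solar", ignoring case.
target <- grepl("^solar", names(airquality), ignore.case = TRUE)

# Replace NAs with 0 in the selected columns only.
aq <- airquality
aq[target] <- lapply(aq[target], function(col) replace(col, is.na(col), 0))
print(head(aq))
```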
  6. recode_na_if

Given a data.frame object, one can recode NAs as another value based on a grouping variable. In the example below, we replace all NAs in all columns with 0s if the ID is A2 or A3.

some_data <- data.frame(ID=c("A1","A2","A3", "A4"), 
                        A=c(5,NA,0,8), B=c(10,0,0,1),
                        C=c(1,NA,NA,25))
                        
recode_na_if(some_data,grouping_col="ID", target_groups=c("A2","A3"),
           replacement= 0)  
#> # A tibble: 4 x 4
#>   ID        A     B     C
#>   <fct> <dbl> <dbl> <dbl>
#> 1 A1        5    10     1
#> 2 A2        0     0     0
#> 3 A3        0     0     0
#> 4 A4        8     1    25
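
The conditional replacement can be spelled out step by step in base R. This sketch assumes the rule is "fill NAs only in rows whose ID is in the target groups"; the variable names are illustrative.

```r
some_data <- data.frame(ID = c("A1", "A2", "A3", "A4"),
                        A = c(5, NA, 0, 8), B = c(10, 0, 0, 1),
                        C = c(1, NA, NA, 25))

# Rows that belong to the target groups.
in_target <- some_data$ID %in% c("A2", "A3")

# Fill NAs with 0, but only in those rows.
filled <- some_data
for (col in setdiff(names(filled), "ID")) {
  to_fill <- in_target & is.na(filled[[col]])
  filled[[col]][to_fill] <- 0
}
print(filled)
```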
  7. drop_na_if

Suppose you wanted to drop any column whose percentage of NAs is greater than or equal to a certain value. drop_na_if does just that.

We can drop any columns in airquality that have at least 24% of their values missing:

drop_na_if(airquality, sign = "gteq",percent_na = 24)
#>     Solar.R Wind Temp Month Day
#> 1       190  7.4   67     5   1
#> 2       118  8.0   72     5   2
#> 3       149 12.6   74     5   3
#> 4       313 11.5   62     5   4
#> 5        NA 14.3   56     5   5
#> 6        NA 14.9   66     5   6
#> 7       299  8.6   65     5   7
#> 8        99 13.8   59     5   8
#> 9        19 20.1   61     5   9
#> 10      194  8.6   69     5  10
#> 11       NA  6.9   74     5  11
#> 12      256  9.7   69     5  12
#> 13      290  9.2   66     5  13
#> 14      274 10.9   68     5  14
#> 15       65 13.2   58     5  15
#> 16      334 11.5   64     5  16
#> 17      307 12.0   66     5  17
#> 18       78 18.4   57     5  18
#> 19      322 11.5   68     5  19
#> 20       44  9.7   62     5  20
#> 21        8  9.7   59     5  21
#> 22      320 16.6   73     5  22
#> 23       25  9.7   61     5  23
#> 24       92 12.0   61     5  24
#> 25       66 16.6   57     5  25
#> 26      266 14.9   58     5  26
#> 27       NA  8.0   57     5  27
#> 28       13 12.0   67     5  28
#> 29      252 14.9   81     5  29
#> 30      223  5.7   79     5  30
#> 31      279  7.4   76     5  31
#> 32      286  8.6   78     6   1
#> 33      287  9.7   74     6   2
#> 34      242 16.1   67     6   3
#> 35      186  9.2   84     6   4
#> 36      220  8.6   85     6   5
#> 37      264 14.3   79     6   6
#> 38      127  9.7   82     6   7
#> 39      273  6.9   87     6   8
#> 40      291 13.8   90     6   9
#> 41      323 11.5   87     6  10
#> 42      259 10.9   93     6  11
#> 43      250  9.2   92     6  12
#> 44      148  8.0   82     6  13
#> 45      332 13.8   80     6  14
#> 46      322 11.5   79     6  15
#> 47      191 14.9   77     6  16
#> 48      284 20.7   72     6  17
#> 49       37  9.2   65     6  18
#> 50      120 11.5   73     6  19
#> 51      137 10.3   76     6  20
#> 52      150  6.3   77     6  21
#> 53       59  1.7   76     6  22
#> 54       91  4.6   76     6  23
#> 55      250  6.3   76     6  24
#> 56      135  8.0   75     6  25
#> 57      127  8.0   78     6  26
#> 58       47 10.3   73     6  27
#> 59       98 11.5   80     6  28
#> 60       31 14.9   77     6  29
#> 61      138  8.0   83     6  30
#> 62      269  4.1   84     7   1
#> 63      248  9.2   85     7   2
#> 64      236  9.2   81     7   3
#> 65      101 10.9   84     7   4
#> 66      175  4.6   83     7   5
#> 67      314 10.9   83     7   6
#> 68      276  5.1   88     7   7
#> 69      267  6.3   92     7   8
#> 70      272  5.7   92     7   9
#> 71      175  7.4   89     7  10
#> 72      139  8.6   82     7  11
#> 73      264 14.3   73     7  12
#> 74      175 14.9   81     7  13
#> 75      291 14.9   91     7  14
#> 76       48 14.3   80     7  15
#> 77      260  6.9   81     7  16
#> 78      274 10.3   82     7  17
#> 79      285  6.3   84     7  18
#> 80      187  5.1   87     7  19
#> 81      220 11.5   85     7  20
#> 82        7  6.9   74     7  21
#> 83      258  9.7   81     7  22
#> 84      295 11.5   82     7  23
#> 85      294  8.6   86     7  24
#> 86      223  8.0   85     7  25
#> 87       81  8.6   82     7  26
#> 88       82 12.0   86     7  27
#> 89      213  7.4   88     7  28
#> 90      275  7.4   86     7  29
#> 91      253  7.4   83     7  30
#> 92      254  9.2   81     7  31
#> 93       83  6.9   81     8   1
#> 94       24 13.8   81     8   2
#> 95       77  7.4   82     8   3
#> 96       NA  6.9   86     8   4
#> 97       NA  7.4   85     8   5
#> 98       NA  4.6   87     8   6
#> 99      255  4.0   89     8   7
#> 100     229 10.3   90     8   8
#> 101     207  8.0   90     8   9
#> 102     222  8.6   92     8  10
#> 103     137 11.5   86     8  11
#> 104     192 11.5   86     8  12
#> 105     273 11.5   82     8  13
#> 106     157  9.7   80     8  14
#> 107      64 11.5   79     8  15
#> 108      71 10.3   77     8  16
#> 109      51  6.3   79     8  17
#> 110     115  7.4   76     8  18
#> 111     244 10.9   78     8  19
#> 112     190 10.3   78     8  20
#> 113     259 15.5   77     8  21
#> 114      36 14.3   72     8  22
#> 115     255 12.6   75     8  23
#> 116     212  9.7   79     8  24
#> 117     238  3.4   81     8  25
#> 118     215  8.0   86     8  26
#> 119     153  5.7   88     8  27
#> 120     203  9.7   97     8  28
#> 121     225  2.3   94     8  29
#> 122     237  6.3   96     8  30
#> 123     188  6.3   94     8  31
#> 124     167  6.9   91     9   1
#> 125     197  5.1   92     9   2
#> 126     183  2.8   93     9   3
#> 127     189  4.6   93     9   4
#> 128      95  7.4   87     9   5
#> 129      92 15.5   84     9   6
#> 130     252 10.9   80     9   7
#> 131     220 10.3   78     9   8
#> 132     230 10.9   75     9   9
#> 133     259  9.7   73     9  10
#> 134     236 14.9   81     9  11
#> 135     259 15.5   76     9  12
#> 136     238  6.3   77     9  13
#> 137      24 10.9   71     9  14
#> 138     112 11.5   71     9  15
#> 139     237  6.9   78     9  16
#> 140     224 13.8   67     9  17
#> 141      27 10.3   76     9  18
#> 142     238 10.3   68     9  19
#> 143     201  8.0   82     9  20
#> 144     238 12.6   64     9  21
#> 145      14  9.2   71     9  22
#> 146     139 10.3   81     9  23
#> 147      49 10.3   69     9  24
#> 148      20 16.6   63     9  25
#> 149     193  6.9   70     9  26
#> 150     145 13.2   77     9  27
#> 151     191 14.3   75     9  28
#> 152     131  8.0   76     9  29
#> 153     223 11.5   68     9  30
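
The column filter above amounts to comparing each column's percentage of NAs with the threshold. A base R sketch of that comparison, offered as an illustration and not as mde's code:

```r
# Keep only the columns with less than 24% missing values.
pct_missing <- colMeans(is.na(airquality)) * 100
kept <- airquality[, pct_missing < 24, drop = FALSE]
print(names(kept))
```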

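The threshold does not have to be a whole number. Note that in the output below, percent_na = 0.24 drops both Ozone (about 24.2% missing) and Solar.R (about 4.6% missing), which suggests the value is read as 0.24% rather than as a proportion equal to 24%: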


drop_na_if(airquality, sign="gteq",percent_na = 0.24)
#>     Wind Temp Month Day
#> 1    7.4   67     5   1
#> 2    8.0   72     5   2
#> 3   12.6   74     5   3
#> 4   11.5   62     5   4
#> 5   14.3   56     5   5
#> 6   14.9   66     5   6
#> 7    8.6   65     5   7
#> 8   13.8   59     5   8
#> 9   20.1   61     5   9
#> 10   8.6   69     5  10
#> 11   6.9   74     5  11
#> 12   9.7   69     5  12
#> 13   9.2   66     5  13
#> 14  10.9   68     5  14
#> 15  13.2   58     5  15
#> 16  11.5   64     5  16
#> 17  12.0   66     5  17
#> 18  18.4   57     5  18
#> 19  11.5   68     5  19
#> 20   9.7   62     5  20
#> 21   9.7   59     5  21
#> 22  16.6   73     5  22
#> 23   9.7   61     5  23
#> 24  12.0   61     5  24
#> 25  16.6   57     5  25
#> 26  14.9   58     5  26
#> 27   8.0   57     5  27
#> 28  12.0   67     5  28
#> 29  14.9   81     5  29
#> 30   5.7   79     5  30
#> 31   7.4   76     5  31
#> 32   8.6   78     6   1
#> 33   9.7   74     6   2
#> 34  16.1   67     6   3
#> 35   9.2   84     6   4
#> 36   8.6   85     6   5
#> 37  14.3   79     6   6
#> 38   9.7   82     6   7
#> 39   6.9   87     6   8
#> 40  13.8   90     6   9
#> 41  11.5   87     6  10
#> 42  10.9   93     6  11
#> 43   9.2   92     6  12
#> 44   8.0   82     6  13
#> 45  13.8   80     6  14
#> 46  11.5   79     6  15
#> 47  14.9   77     6  16
#> 48  20.7   72     6  17
#> 49   9.2   65     6  18
#> 50  11.5   73     6  19
#> 51  10.3   76     6  20
#> 52   6.3   77     6  21
#> 53   1.7   76     6  22
#> 54   4.6   76     6  23
#> 55   6.3   76     6  24
#> 56   8.0   75     6  25
#> 57   8.0   78     6  26
#> 58  10.3   73     6  27
#> 59  11.5   80     6  28
#> 60  14.9   77     6  29
#> 61   8.0   83     6  30
#> 62   4.1   84     7   1
#> 63   9.2   85     7   2
#> 64   9.2   81     7   3
#> 65  10.9   84     7   4
#> 66   4.6   83     7   5
#> 67  10.9   83     7   6
#> 68   5.1   88     7   7
#> 69   6.3   92     7   8
#> 70   5.7   92     7   9
#> 71   7.4   89     7  10
#> 72   8.6   82     7  11
#> 73  14.3   73     7  12
#> 74  14.9   81     7  13
#> 75  14.9   91     7  14
#> 76  14.3   80     7  15
#> 77   6.9   81     7  16
#> 78  10.3   82     7  17
#> 79   6.3   84     7  18
#> 80   5.1   87     7  19
#> 81  11.5   85     7  20
#> 82   6.9   74     7  21
#> 83   9.7   81     7  22
#> 84  11.5   82     7  23
#> 85   8.6   86     7  24
#> 86   8.0   85     7  25
#> 87   8.6   82     7  26
#> 88  12.0   86     7  27
#> 89   7.4   88     7  28
#> 90   7.4   86     7  29
#> 91   7.4   83     7  30
#> 92   9.2   81     7  31
#> 93   6.9   81     8   1
#> 94  13.8   81     8   2
#> 95   7.4   82     8   3
#> 96   6.9   86     8   4
#> 97   7.4   85     8   5
#> 98   4.6   87     8   6
#> 99   4.0   89     8   7
#> 100 10.3   90     8   8
#> 101  8.0   90     8   9
#> 102  8.6   92     8  10
#> 103 11.5   86     8  11
#> 104 11.5   86     8  12
#> 105 11.5   82     8  13
#> 106  9.7   80     8  14
#> 107 11.5   79     8  15
#> 108 10.3   77     8  16
#> 109  6.3   79     8  17
#> 110  7.4   76     8  18
#> 111 10.9   78     8  19
#> 112 10.3   78     8  20
#> 113 15.5   77     8  21
#> 114 14.3   72     8  22
#> 115 12.6   75     8  23
#> 116  9.7   79     8  24
#> 117  3.4   81     8  25
#> 118  8.0   86     8  26
#> 119  5.7   88     8  27
#> 120  9.7   97     8  28
#> 121  2.3   94     8  29
#> 122  6.3   96     8  30
#> 123  6.3   94     8  31
#> 124  6.9   91     9   1
#> 125  5.1   92     9   2
#> 126  2.8   93     9   3
#> 127  4.6   93     9   4
#> 128  7.4   87     9   5
#> 129 15.5   84     9   6
#> 130 10.9   80     9   7
#> 131 10.3   78     9   8
#> 132 10.9   75     9   9
#> 133  9.7   73     9  10
#> 134 14.9   81     9  11
#> 135 15.5   76     9  12
#> 136  6.3   77     9  13
#> 137 10.9   71     9  14
#> 138 11.5   71     9  15
#> 139  6.9   78     9  16
#> 140 13.8   67     9  17
#> 141 10.3   76     9  18
#> 142 10.3   68     9  19
#> 143  8.0   82     9  20
#> 144 12.6   64     9  21
#> 145  9.2   71     9  22
#> 146 10.3   81     9  23
#> 147 10.3   69     9  24
#> 148 16.6   63     9  25
#> 149  6.9   70     9  26
#> 150 13.2   77     9  27
#> 151 14.3   75     9  28
#> 152  8.0   76     9  29
#> 153 11.5   68     9  30

The above also supports less than or equal to (lteq), equal to (eq), greater than (gt) and less than (lt).

To keep certain columns despite fitting the target percent_na criteria, one can provide an optional keep_columns character vector.


head(drop_na_if(airquality, percent_na = 24, keep_columns = "Ozone"))
#>   Solar.R Wind Temp Month Day Ozone
#> 1     190  7.4   67     5   1    41
#> 2     118  8.0   72     5   2    36
#> 3     149 12.6   74     5   3    12
#> 4     313 11.5   62     5   4    18
#> 5      NA 14.3   56     5   5    NA
#> 6      NA 14.9   66     5   6    28

Compare the above result to the following:


head(drop_na_if(airquality, percent_na = 24))
#>   Solar.R Wind Temp Month Day
#> 1     190  7.4   67     5   1
#> 2     118  8.0   72     5   2
#> 3     149 12.6   74     5   3
#> 4     313 11.5   62     5   4
#> 5      NA 14.3   56     5   5
#> 6      NA 14.9   66     5   6

For more information, please see the documentation for drop_na_if, especially regarding grouping support.

  8. drop_na_at

This provides a simple way to drop missing values only at specific columns. It currently only returns those columns with their missing values removed. See usage below. Further details are given in the documentation. It is currently case sensitive.


drop_na_at(airquality,pattern_type = "starts_with","O")
#>     Ozone
#> 1      41
#> 2      36
#> 3      12
#> 4      18
#> 5      28
#> 6      23
#> 7      19
#> 8       8
#> 9       7
#> 10     16
#> 11     11
#> 12     14
#> 13     18
#> 14     14
#> 15     34
#> 16      6
#> 17     30
#> 18     11
#> 19      1
#> 20     11
#> 21      4
#> 22     32
#> 23     23
#> 24     45
#> 25    115
#> 26     37
#> 27     29
#> 28     71
#> 29     39
#> 30     23
#> 31     21
#> 32     37
#> 33     20
#> 34     12
#> 35     13
#> 36    135
#> 37     49
#> 38     32
#> 39     64
#> 40     40
#> 41     77
#> 42     97
#> 43     97
#> 44     85
#> 45     10
#> 46     27
#> 47      7
#> 48     48
#> 49     35
#> 50     61
#> 51     79
#> 52     63
#> 53     16
#> 54     80
#> 55    108
#> 56     20
#> 57     52
#> 58     82
#> 59     50
#> 60     64
#> 61     59
#> 62     39
#> 63      9
#> 64     16
#> 65     78
#> 66     35
#> 67     66
#> 68    122
#> 69     89
#> 70    110
#> 71     44
#> 72     28
#> 73     65
#> 74     22
#> 75     59
#> 76     23
#> 77     31
#> 78     44
#> 79     21
#> 80      9
#> 81     45
#> 82    168
#> 83     73
#> 84     76
#> 85    118
#> 86     84
#> 87     85
#> 88     96
#> 89     78
#> 90     73
#> 91     91
#> 92     47
#> 93     32
#> 94     20
#> 95     23
#> 96     21
#> 97     24
#> 98     44
#> 99     21
#> 100    28
#> 101     9
#> 102    13
#> 103    46
#> 104    18
#> 105    13
#> 106    24
#> 107    16
#> 108    13
#> 109    23
#> 110    36
#> 111     7
#> 112    14
#> 113    30
#> 114    14
#> 115    18
#> 116    20
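
The result above has 116 rows, which is the 153 rows of airquality minus the 37 missing Ozone values. A base R sketch of the same selection and filtering, assuming a case-sensitive prefix match on column names:

```r
# Select columns whose names start with "O" (case sensitive).
selected <- airquality[startsWith(names(airquality), "O")]

# Drop rows with missing values in the selected columns.
complete <- selected[complete.cases(selected), , drop = FALSE]
rownames(complete) <- NULL  # renumber rows from 1
print(dim(complete))
```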
  9. recode_as_na_for

Can I convert all values that are greater than, less than, less than or equal to, or greater than or equal to some value to NA?

Yes, you can! All we have to do is use recode_as_na_for:


recode_as_na_for(airquality,criteria="gt",value=25)
#>     Ozone Solar.R Wind Temp Month Day
#> 1      NA      NA  7.4   NA     5   1
#> 2      NA      NA  8.0   NA     5   2
#> 3      12      NA 12.6   NA     5   3
#> 4      18      NA 11.5   NA     5   4
#> 5      NA      NA 14.3   NA     5   5
#> 6      NA      NA 14.9   NA     5   6
#> 7      23      NA  8.6   NA     5   7
#> 8      19      NA 13.8   NA     5   8
#> 9       8      19 20.1   NA     5   9
#> 10     NA      NA  8.6   NA     5  10
#> 11      7      NA  6.9   NA     5  11
#> 12     16      NA  9.7   NA     5  12
#> 13     11      NA  9.2   NA     5  13
#> 14     14      NA 10.9   NA     5  14
#> 15     18      NA 13.2   NA     5  15
#> 16     14      NA 11.5   NA     5  16
#> 17     NA      NA 12.0   NA     5  17
#> 18      6      NA 18.4   NA     5  18
#> 19     NA      NA 11.5   NA     5  19
#> 20     11      NA  9.7   NA     5  20
#> 21      1       8  9.7   NA     5  21
#> 22     11      NA 16.6   NA     5  22
#> 23      4      25  9.7   NA     5  23
#> 24     NA      NA 12.0   NA     5  24
#> 25     NA      NA 16.6   NA     5  25
#> 26     NA      NA 14.9   NA     5  NA
#> 27     NA      NA  8.0   NA     5  NA
#> 28     23      13 12.0   NA     5  NA
#> 29     NA      NA 14.9   NA     5  NA
#> 30     NA      NA  5.7   NA     5  NA
#> 31     NA      NA  7.4   NA     5  NA
#> 32     NA      NA  8.6   NA     6   1
#> 33     NA      NA  9.7   NA     6   2
#> 34     NA      NA 16.1   NA     6   3
#> 35     NA      NA  9.2   NA     6   4
#> 36     NA      NA  8.6   NA     6   5
#> 37     NA      NA 14.3   NA     6   6
#> 38     NA      NA  9.7   NA     6   7
#> 39     NA      NA  6.9   NA     6   8
#> 40     NA      NA 13.8   NA     6   9
#> 41     NA      NA 11.5   NA     6  10
#> 42     NA      NA 10.9   NA     6  11
#> 43     NA      NA  9.2   NA     6  12
#> 44     23      NA  8.0   NA     6  13
#> 45     NA      NA 13.8   NA     6  14
#> 46     NA      NA 11.5   NA     6  15
#> 47     21      NA 14.9   NA     6  16
#> 48     NA      NA 20.7   NA     6  17
#> 49     20      NA  9.2   NA     6  18
#> 50     12      NA 11.5   NA     6  19
#> 51     13      NA 10.3   NA     6  20
#> 52     NA      NA  6.3   NA     6  21
#> 53     NA      NA  1.7   NA     6  22
#> 54     NA      NA  4.6   NA     6  23
#> 55     NA      NA  6.3   NA     6  24
#> 56     NA      NA  8.0   NA     6  25
#> 57     NA      NA  8.0   NA     6  NA
#> 58     NA      NA 10.3   NA     6  NA
#> 59     NA      NA 11.5   NA     6  NA
#> 60     NA      NA 14.9   NA     6  NA
#> 61     NA      NA  8.0   NA     6  NA
#> 62     NA      NA  4.1   NA     7   1
#> 63     NA      NA  9.2   NA     7   2
#> 64     NA      NA  9.2   NA     7   3
#> 65     NA      NA 10.9   NA     7   4
#> 66     NA      NA  4.6   NA     7   5
#> 67     NA      NA 10.9   NA     7   6
#> 68     NA      NA  5.1   NA     7   7
#> 69     NA      NA  6.3   NA     7   8
#> 70     NA      NA  5.7   NA     7   9
#> 71     NA      NA  7.4   NA     7  10
#> 72     NA      NA  8.6   NA     7  11
#> 73     10      NA 14.3   NA     7  12
#> 74     NA      NA 14.9   NA     7  13
#> 75     NA      NA 14.9   NA     7  14
#> 76      7      NA 14.3   NA     7  15
#> 77     NA      NA  6.9   NA     7  16
#> 78     NA      NA 10.3   NA     7  17
#> 79     NA      NA  6.3   NA     7  18
#> 80     NA      NA  5.1   NA     7  19
#> 81     NA      NA 11.5   NA     7  20
#> 82     16       7  6.9   NA     7  21
#> 83     NA      NA  9.7   NA     7  22
#> 84     NA      NA 11.5   NA     7  23
#> 85     NA      NA  8.6   NA     7  24
#> 86     NA      NA  8.0   NA     7  25
#> 87     20      NA  8.6   NA     7  NA
#> 88     NA      NA 12.0   NA     7  NA
#> 89     NA      NA  7.4   NA     7  NA
#> 90     NA      NA  7.4   NA     7  NA
#> 91     NA      NA  7.4   NA     7  NA
#> 92     NA      NA  9.2   NA     7  NA
#> 93     NA      NA  6.9   NA     8   1
#> 94      9      24 13.8   NA     8   2
#> 95     16      NA  7.4   NA     8   3
#> 96     NA      NA  6.9   NA     8   4
#> 97     NA      NA  7.4   NA     8   5
#> 98     NA      NA  4.6   NA     8   6
#> 99     NA      NA  4.0   NA     8   7
#> 100    NA      NA 10.3   NA     8   8
#> 101    NA      NA  8.0   NA     8   9
#> 102    NA      NA  8.6   NA     8  10
#> 103    NA      NA 11.5   NA     8  11
#> 104    NA      NA 11.5   NA     8  12
#> 105    NA      NA 11.5   NA     8  13
#> 106    NA      NA  9.7   NA     8  14
#> 107    NA      NA 11.5   NA     8  15
#> 108    22      NA 10.3   NA     8  16
#> 109    NA      NA  6.3   NA     8  17
#> 110    23      NA  7.4   NA     8  18
#> 111    NA      NA 10.9   NA     8  19
#> 112    NA      NA 10.3   NA     8  20
#> 113    21      NA 15.5   NA     8  21
#> 114     9      NA 14.3   NA     8  22
#> 115    NA      NA 12.6   NA     8  23
#> 116    NA      NA  9.7   NA     8  24
#> 117    NA      NA  3.4   NA     8  25
#> 118    NA      NA  8.0   NA     8  NA
#> 119    NA      NA  5.7   NA     8  NA
#> 120    NA      NA  9.7   NA     8  NA
#> 121    NA      NA  2.3   NA     8  NA
#> 122    NA      NA  6.3   NA     8  NA
#> 123    NA      NA  6.3   NA     8  NA
#> 124    NA      NA  6.9   NA     9   1
#> 125    NA      NA  5.1   NA     9   2
#> 126    NA      NA  2.8   NA     9   3
#> 127    NA      NA  4.6   NA     9   4
#> 128    NA      NA  7.4   NA     9   5
#> 129    NA      NA 15.5   NA     9   6
#> 130    20      NA 10.9   NA     9   7
#> 131    23      NA 10.3   NA     9   8
#> 132    21      NA 10.9   NA     9   9
#> 133    24      NA  9.7   NA     9  10
#> 134    NA      NA 14.9   NA     9  11
#> 135    21      NA 15.5   NA     9  12
#> 136    NA      NA  6.3   NA     9  13
#> 137     9      24 10.9   NA     9  14
#> 138    13      NA 11.5   NA     9  15
#> 139    NA      NA  6.9   NA     9  16
#> 140    18      NA 13.8   NA     9  17
#> 141    13      NA 10.3   NA     9  18
#> 142    24      NA 10.3   NA     9  19
#> 143    16      NA  8.0   NA     9  20
#> 144    13      NA 12.6   NA     9  21
#> 145    23      14  9.2   NA     9  22
#> 146    NA      NA 10.3   NA     9  23
#> 147     7      NA 10.3   NA     9  24
#> 148    14      20 16.6   NA     9  25
#> 149    NA      NA  6.9   NA     9  NA
#> 150    NA      NA 13.2   NA     9  NA
#> 151    14      NA 14.3   NA     9  NA
#> 152    18      NA  8.0   NA     9  NA
#> 153    20      NA 11.5   NA     9  NA

To restrict the recoding to specific columns, pass an optional subset_cols character vector:


recode_as_na_for(airquality, value = 25, subset_cols = "Solar.R",
                 criteria = "gt")
#>     Ozone Solar.R Wind Temp Month Day
#> 1      41      NA  7.4   67     5   1
#> 2      36      NA  8.0   72     5   2
#> 3      12      NA 12.6   74     5   3
#> 4      18      NA 11.5   62     5   4
#> 5      NA      NA 14.3   56     5   5
#> 6      28      NA 14.9   66     5   6
#> 7      23      NA  8.6   65     5   7
#> 8      19      NA 13.8   59     5   8
#> 9       8      19 20.1   61     5   9
#> 10     NA      NA  8.6   69     5  10
#> 11      7      NA  6.9   74     5  11
#> 12     16      NA  9.7   69     5  12
#> 13     11      NA  9.2   66     5  13
#> 14     14      NA 10.9   68     5  14
#> 15     18      NA 13.2   58     5  15
#> 16     14      NA 11.5   64     5  16
#> 17     34      NA 12.0   66     5  17
#> 18      6      NA 18.4   57     5  18
#> 19     30      NA 11.5   68     5  19
#> 20     11      NA  9.7   62     5  20
#> 21      1       8  9.7   59     5  21
#> 22     11      NA 16.6   73     5  22
#> 23      4      25  9.7   61     5  23
#> 24     32      NA 12.0   61     5  24
#> 25     NA      NA 16.6   57     5  25
#> 26     NA      NA 14.9   58     5  26
#> 27     NA      NA  8.0   57     5  27
#> 28     23      13 12.0   67     5  28
#> 29     45      NA 14.9   81     5  29
#> 30    115      NA  5.7   79     5  30
#> 31     37      NA  7.4   76     5  31
#> 32     NA      NA  8.6   78     6   1
#> 33     NA      NA  9.7   74     6   2
#> 34     NA      NA 16.1   67     6   3
#> 35     NA      NA  9.2   84     6   4
#> 36     NA      NA  8.6   85     6   5
#> 37     NA      NA 14.3   79     6   6
#> 38     29      NA  9.7   82     6   7
#> 39     NA      NA  6.9   87     6   8
#> 40     71      NA 13.8   90     6   9
#> 41     39      NA 11.5   87     6  10
#> 42     NA      NA 10.9   93     6  11
#> 43     NA      NA  9.2   92     6  12
#> 44     23      NA  8.0   82     6  13
#> 45     NA      NA 13.8   80     6  14
#> 46     NA      NA 11.5   79     6  15
#> 47     21      NA 14.9   77     6  16
#> 48     37      NA 20.7   72     6  17
#> 49     20      NA  9.2   65     6  18
#> 50     12      NA 11.5   73     6  19
#> 51     13      NA 10.3   76     6  20
#> 52     NA      NA  6.3   77     6  21
#> 53     NA      NA  1.7   76     6  22
#> 54     NA      NA  4.6   76     6  23
#> 55     NA      NA  6.3   76     6  24
#> 56     NA      NA  8.0   75     6  25
#> 57     NA      NA  8.0   78     6  26
#> 58     NA      NA 10.3   73     6  27
#> 59     NA      NA 11.5   80     6  28
#> 60     NA      NA 14.9   77     6  29
#> 61     NA      NA  8.0   83     6  30
#> 62    135      NA  4.1   84     7   1
#> 63     49      NA  9.2   85     7   2
#> 64     32      NA  9.2   81     7   3
#> 65     NA      NA 10.9   84     7   4
#> 66     64      NA  4.6   83     7   5
#> 67     40      NA 10.9   83     7   6
#> 68     77      NA  5.1   88     7   7
#> 69     97      NA  6.3   92     7   8
#> 70     97      NA  5.7   92     7   9
#> 71     85      NA  7.4   89     7  10
#> 72     NA      NA  8.6   82     7  11
#> 73     10      NA 14.3   73     7  12
#> 74     27      NA 14.9   81     7  13
#> 75     NA      NA 14.9   91     7  14
#> 76      7      NA 14.3   80     7  15
#> 77     48      NA  6.9   81     7  16
#> 78     35      NA 10.3   82     7  17
#> 79     61      NA  6.3   84     7  18
#> 80     79      NA  5.1   87     7  19
#> 81     63      NA 11.5   85     7  20
#> 82     16       7  6.9   74     7  21
#> 83     NA      NA  9.7   81     7  22
#> 84     NA      NA 11.5   82     7  23
#> 85     80      NA  8.6   86     7  24
#> 86    108      NA  8.0   85     7  25
#> 87     20      NA  8.6   82     7  26
#> 88     52      NA 12.0   86     7  27
#> 89     82      NA  7.4   88     7  28
#> 90     50      NA  7.4   86     7  29
#> 91     64      NA  7.4   83     7  30
#> 92     59      NA  9.2   81     7  31
#> 93     39      NA  6.9   81     8   1
#> 94      9      24 13.8   81     8   2
#> 95     16      NA  7.4   82     8   3
#> 96     78      NA  6.9   86     8   4
#> 97     35      NA  7.4   85     8   5
#> 98     66      NA  4.6   87     8   6
#> 99    122      NA  4.0   89     8   7
#> 100    89      NA 10.3   90     8   8
#> 101   110      NA  8.0   90     8   9
#> 102    NA      NA  8.6   92     8  10
#> 103    NA      NA 11.5   86     8  11
#> 104    44      NA 11.5   86     8  12
#> 105    28      NA 11.5   82     8  13
#> 106    65      NA  9.7   80     8  14
#> 107    NA      NA 11.5   79     8  15
#> 108    22      NA 10.3   77     8  16
#> 109    59      NA  6.3   79     8  17
#> 110    23      NA  7.4   76     8  18
#> 111    31      NA 10.9   78     8  19
#> 112    44      NA 10.3   78     8  20
#> 113    21      NA 15.5   77     8  21
#> 114     9      NA 14.3   72     8  22
#> 115    NA      NA 12.6   75     8  23
#> 116    45      NA  9.7   79     8  24
#> 117   168      NA  3.4   81     8  25
#> 118    73      NA  8.0   86     8  26
#> 119    NA      NA  5.7   88     8  27
#> 120    76      NA  9.7   97     8  28
#> 121   118      NA  2.3   94     8  29
#> 122    84      NA  6.3   96     8  30
#> 123    85      NA  6.3   94     8  31
#> 124    96      NA  6.9   91     9   1
#> 125    78      NA  5.1   92     9   2
#> 126    73      NA  2.8   93     9   3
#> 127    91      NA  4.6   93     9   4
#> 128    47      NA  7.4   87     9   5
#> 129    32      NA 15.5   84     9   6
#> 130    20      NA 10.9   80     9   7
#> 131    23      NA 10.3   78     9   8
#> 132    21      NA 10.9   75     9   9
#> 133    24      NA  9.7   73     9  10
#> 134    44      NA 14.9   81     9  11
#> 135    21      NA 15.5   76     9  12
#> 136    28      NA  6.3   77     9  13
#> 137     9      24 10.9   71     9  14
#> 138    13      NA 11.5   71     9  15
#> 139    46      NA  6.9   78     9  16
#> 140    18      NA 13.8   67     9  17
#> 141    13      NA 10.3   76     9  18
#> 142    24      NA 10.3   68     9  19
#> 143    16      NA  8.0   82     9  20
#> 144    13      NA 12.6   64     9  21
#> 145    23      14  9.2   71     9  22
#> 146    36      NA 10.3   81     9  23
#> 147     7      NA 10.3   69     9  24
#> 148    14      20 16.6   63     9  25
#> 149    30      NA  6.9   70     9  26
#> 150    NA      NA 13.2   77     9  27
#> 151    14      NA 14.3   75     9  28
#> 152    18      NA  8.0   76     9  29
#> 153    20      NA 11.5   68     9  30
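
As a rough cross-check of the subset_cols call above, the following base R sketch performs the same recoding without mde. It assumes, based on the output shown (row 23 keeps its Solar.R value of 25), that criteria = "gt" means strictly greater than value. The name aq is illustrative only and is not part of mde.

```r
# Base R sketch: recode Solar.R values greater than 25 as NA.
# This does not use mde internals; it only mirrors the output shown above.
aq <- airquality

# which() drops comparisons that are already NA, so only true matches are replaced
over_limit <- which(aq$Solar.R > 25)
aq$Solar.R[over_limit] <- NA

# Every remaining Solar.R value is at most 25 ...
all(aq$Solar.R <= 25, na.rm = TRUE)

# ... and all other columns are unchanged
identical(aq[names(aq) != "Solar.R"], airquality[names(airquality) != "Solar.R"])
```

Both checks should return TRUE if the sketch matches the behaviour described above.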

To raise an issue, please do so at https://www.github.com/Nelson-Gon/mde/issues

Thank you, feedback is always welcome :)