R: Quickest way to create dataframe with an alternative to IFELSE

I have a question similar to the one in this thread: Using R, replace all values in a matrix <0.1 with 0?

But in my case the dataset is potentially much larger and the thresholds vary. I need to add columns to a dataframe, where each value is chosen by a condition on the first columns of the same dataframe. These values are different for each row.

Here is an example of the dataframe:

SNP        A1  A2   MAF
rs3094315  G   A   0.172
rs7419119  G   T   0.240
rs13302957 G   A   0.081
rs6696609  T   C   0.393

Here is a sample of my code:

seqIndividuals = seq_len(201)
for(i in seqIndividuals) {
  alFrequ[paste("IND",i,"a",sep="")] = ifelse(runif(length(alFrequ$SNP),0.00,1.00) < alFrequ$MAF, alFrequ$A1, alFrequ$A2)
  alFrequ[paste("IND",i,"b",sep="")] = ifelse(runif(length(alFrequ$SNP),0.00,1.00) < alFrequ$MAF, alFrequ$A1, alFrequ$A2)
}

I am creating two new columns for each individual "i" in "seqIndividuals", taking the value from column "A1" if a random value is lower than column "MAF", or from "A2" otherwise. The code works, but as the dataset grows in rows and columns (individuals), the run time also grows significantly.

Is there a way to avoid using ifelse in this situation, since, as I understand it, it works like a loop? I tried generating a matrix of random values and then replacing them, but it takes the same time or even longer.

mtxAlFrequ = matrix(runif(length(alFrequ$SNP)*(201)),nrow=length(alFrequ$SNP),ncol=201)
mtxAlFrequ[mtxAlFrequ < alFrequ$MAF] = alFrequ$A1

Thanks!

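One option is data.table, which assigns each new column by reference: fill the column with A2, then overwrite the rows where the random draw is below MAF with A1.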

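library(data.table)
# Column names come out as INDa1, INDb1, INDa2, ... (letter before the number)
nm1 <- paste0("IND", rep(letters[1:2], length(seqIndividuals)),
              rep(seqIndividuals, each = 2))
setDT(alFrequ)
for(j in seq_along(nm1)) {
  # Fill with A2, then overwrite rows where the draw is below MAF with A1
  alFrequ[, nm1[j] := A2
          ][runif(.N, 0, 1) < MAF , nm1[j] := A1][]
}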
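
The same two-step idea (start from A2, then overwrite only the rows where the draw falls below MAF) can also be written in base R with logical indexing instead of ifelse. The following is a minimal sketch on the four example rows from the question, not something that has been timed; `drawAllele` and `nInd` are hypothetical names introduced only for illustration.

```r
# Example rows copied from the question.
alFrequ <- data.frame(
  SNP = c("rs3094315", "rs7419119", "rs13302957", "rs6696609"),
  A1  = c("G", "G", "G", "T"),
  A2  = c("A", "T", "A", "C"),
  MAF = c(0.172, 0.240, 0.081, 0.393),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one simulated allele per SNP.
drawAllele <- function(df) {
  out  <- df$A2                      # default: second allele
  pick <- runif(nrow(df)) < df$MAF   # TRUE where the draw falls below MAF
  out[pick] <- df$A1[pick]           # overwrite only those rows with A1
  out
}

set.seed(42)
nInd <- 3                            # assumed small count, just for the demo
for (i in seq_len(nInd)) {
  alFrequ[[paste0("IND", i, "a")]] <- drawAllele(alFrequ)
  alFrequ[[paste0("IND", i, "b")]] <- drawAllele(alFrequ)
}
```

Each new column contains, row by row, either that row's A1 or its A2, and the column names follow the question's IND1a, IND1b, ... pattern.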

Benchmarks

set.seed(24)
seqIndividuals <- seq_len(201)  # as defined in the question
alFrequ <- data.frame(SNP= paste0('rs', sample(600000, 340000, replace=FALSE)),
                      A1 = sample(c("G", "T", "A", "C"), 340000, replace=TRUE),
                      A2 = sample(c("G", "T", "A", "C"), 340000, replace=TRUE),
                      MAF = runif(340000, 0, 1), stringsAsFactors=FALSE)
nm1 <- paste0("IND", rep(letters[1:2], length(seqIndividuals)),
              rep(seqIndividuals, each = 2))

system.time({
  setDT(alFrequ)
  for(j in seq_along(nm1)){
    alFrequ[, nm1[j] := A2][runif(.N, 0, 1) < MAF , nm1[j] := A1][]
  }
})
#   user  system elapsed
#  10.72    1.05   11.76

And using the OP's code on the original dataset:

system.time({
  for(i in seqIndividuals) {
    alFrequ[paste("IND",i,"a",sep="")] = ifelse(runif(length(alFrequ$SNP),0.00,1.00) <
                                                  alFrequ$MAF, alFrequ$A1, alFrequ$A2)
    alFrequ[paste("IND",i,"b",sep="")] = ifelse(runif(length(alFrequ$SNP),0.00,1.00) <
                                                  alFrequ$MAF, alFrequ$A1, alFrequ$A2)
  }
})
#    user  system elapsed
#   72.16    6.82   79.33
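
For comparison, the matrix approach attempted in the question can be made to work in base R. Assigning `alFrequ$A1` directly to the selected cells recycles A1 over those cells in order, so the alleles no longer line up with their rows; indexing A1 by the row number of each selected cell avoids that. This is only a sketch on the question's four example rows and is not part of the timings above; `nInd`, `pickA1`, `alleles`, and `result` are illustrative names.

```r
# Example rows copied from the question.
alFrequ <- data.frame(
  SNP = c("rs3094315", "rs7419119", "rs13302957", "rs6696609"),
  A1  = c("G", "G", "G", "T"),
  A2  = c("A", "T", "A", "C"),
  MAF = c(0.172, 0.240, 0.081, 0.393),
  stringsAsFactors = FALSE
)

nInd  <- 3                 # assumed small count, just for the demo
nSnp  <- nrow(alFrequ)
nCols <- 2 * nInd          # two allele columns (a, b) per individual

set.seed(1)
# One uniform draw per SNP per allele column. Matrices are stored column by
# column, so MAF (length nSnp) recycles down each column and matches by row.
pickA1 <- matrix(runif(nSnp * nCols), nrow = nSnp) < alFrequ$MAF

# Start from A2 in every column, then overwrite the selected cells with the
# A1 value of the row each cell belongs to.
alleles <- matrix(alFrequ$A2, nrow = nSnp, ncol = nCols)
rowIdx  <- row(pickA1)[pickA1]
alleles[pickA1] <- alFrequ$A1[rowIdx]

colnames(alleles) <- paste0("IND", rep(seq_len(nInd), each = 2), c("a", "b"))
result <- data.frame(alFrequ, alleles,
                     stringsAsFactors = FALSE, check.names = FALSE)
```

Whether this beats the column-by-column versions depends on memory: the full matrix of random draws has to exist at once, which may matter at 340000 rows by 402 columns.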
The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .
