Strange behavior when using apply with rank and order on a data.frame with ordered factors

I've found some weird behavior with `apply`.

Assume I have an arbitrary data frame of ordered factors:

``````set.seed(4)
x <- ordered(sample(1:10, size = 4, replace = TRUE))
y <- ordered(sample(1:10, size = 4, replace = TRUE))
z <- ordered(sample(1:10, size = 4, replace = TRUE))

data1 <- data.frame(x,y,z)
``````

Now I want to get the ranks for each variable. I could do this two ways:

With a for loop:

``````rankmat1 <- data1
# Loop over the columns; dim(data1) has two elements, so 1:dim(data1) is wrong
for (i in seq_len(ncol(data1))) {
  rankmat1[, i] <- rank(data1[, i])
}
``````

Or with `apply`:

``````rankmat2 <- apply(data1, 2, rank)
``````

So, here are the original levels:

``````data1
  x  y  z
1 6  9 10
2 1  3  1
3 3  8  8
4 3 10  3
``````

And here are the correct rankings:

``````rankmat1
    x y z
1 4.0 3 4
2 1.0 1 1
3 2.5 2 3
4 2.5 4 2
``````

But why are these rankings from `apply` permuted differently?

``````rankmat2
       x y z
[1,] 4.0 4 2
[2,] 1.0 2 1
[3,] 2.5 3 4
[4,] 2.5 1 3
``````

This happens with `order` too:

``````ordermat1 <- data1
# Loop over the columns; dim(data1) has two elements, so 1:dim(data1) is wrong
for (i in seq_len(ncol(data1))) {
  ordermat1[, i] <- order(data1[, i])
}
ordermat2 <- apply(data1, 2, order)

ordermat1
  x y z
1 2 2 2
2 3 3 4
3 4 1 3
4 1 4 1

ordermat2
     x y z
[1,] 2 4 2
[2,] 3 2 1
[3,] 4 3 4
[4,] 1 1 3
``````

As requested by the OP, here is a detailed explanation which may help other R users avoid these traps.

### Trap 1

As joran has pointed out, `apply` coerces the data frame into a matrix, which replaces the ordered factors with character strings. So, the original data frame

``````data1
  x  y  z
1 6  9 10
2 1  3  1
3 3  8  8
4 3 10  3
``````

becomes

``````as.matrix(data1)
     x   y    z
[1,] "6" "9"  "10"
[2,] "1" "3"  "1"
[3,] "3" "8"  "8"
[4,] "3" "10" "3"
``````
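The coercion can be checked directly. The following is a minimal sketch that rebuilds `data1` from the values printed above (rather than sampling them, so the result does not depend on the random number generator) and inspects what `as.matrix` hands over:

```r
# Rebuild the example data explicitly from the printed values
data1 <- data.frame(
  x = ordered(c(6, 1, 3, 3)),
  y = ordered(c(9, 3, 8, 10)),
  z = ordered(c(10, 1, 8, 3))
)

m <- as.matrix(data1)        # the coercion apply() performs on a data frame

class_before <- class(data1$y)  # "ordered" "factor"
type_after   <- typeof(m)       # "character": the level ordering is gone

print(class_before)
print(type_after)
```

Once the columns are plain character strings, any function called by `apply` has no way of knowing that they were ordered factors.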

### Trap 2

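Characters are sorted lexically, that is, character by character from the left, so `"10"` comes before `"3"`. Thus, sorting the `y` column as character returns

``````sort(c("9", "3", "8", "10"))
[1] "10" "3"  "8"  "9"
``````

whereas sorting the same values as numbers returns

``````sort(c(9, 3, 8, 10))
[1]  3  8  9 10
``````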

This explains why `apply` returns a different result for the `rank` operation here.
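To see the effect on `rank` itself, here is a small sketch comparing the two inputs for the `y` column. The names `y_chr` and `y_ord` are introduced here for illustration only:

```r
y_chr <- c("9", "3", "8", "10")    # what rank() receives via apply()
y_ord <- ordered(c(9, 3, 8, 10))   # what rank() receives in the for loop

rank_chr <- rank(y_chr)  # string comparison: 4 2 3 1
rank_ord <- rank(y_ord)  # level order:       3 1 2 4

print(rank_chr)
print(rank_ord)
```

The first result matches the `y` column of `rankmat2`, the second matches the `y` column of `rankmat1`.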

### Solution

You can use `lapply` to compute the rank of each column of the data frame.

``````as.data.frame(lapply(data1, rank))
    x y z
1 4.0 3 4
2 1.0 1 1
3 2.5 2 3
4 2.5 4 2
``````

`lapply` returns a list and a data frame is a special kind of list.

Avoid `sapply` because `sapply` takes the output of `lapply` and "simplifies" it to whatever it thinks is appropriate. Here,

``````sapply(data1, rank)
       x y z
[1,] 4.0 3 4
[2,] 1.0 1 1
[3,] 2.5 2 3
[4,] 2.5 4 2
``````

returns a matrix (again!) which needs to be coerced to a data frame. (See chapter 8.3.20 of The R Inferno by Patrick Burns. The text is a good read anyway.)
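If the aim is to keep the data frame structure without a separate `as.data.frame()` call, one common idiom is to assign the `lapply` result into the empty-bracket form `rankdf[]`. This is only a sketch: `rankdf` is a name introduced here for illustration, and `data1` is rebuilt from the values printed above instead of being sampled.

```r
# Rebuild the example data explicitly from the printed values
data1 <- data.frame(
  x = ordered(c(6, 1, 3, 3)),
  y = ordered(c(9, 3, 8, 10)),
  z = ordered(c(10, 1, 8, 3))
)

rankdf <- data1
# Replace every column by its ranks; row names and column names are kept
rankdf[] <- lapply(data1, rank)

print(rankdf)
```

Because each column is handed to `rank` on its own, the ordered factors are never coerced to character.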

### Alternative Solution

The OP has not indicated why ordered factors are needed. If factors, ordered or not, are not essential to the OP's underlying problem, then `apply` would have worked as expected on plain numeric columns.

``````set.seed(4)
x2 <- sample(1:10, size = 4, replace = TRUE)
y2 <- sample(1:10, size = 4, replace = TRUE)
z2 <- sample(1:10, size = 4, replace = TRUE)
data2 <- data.frame(x2, y2, z2)
data2
  x2 y2 z2
1  6  9 10
2  1  3  1
3  3  8  8
4  3 10  3
apply(data2, 2, rank)
      x2 y2 z2
[1,] 4.0  3  4
[2,] 1.0  1  1
[3,] 2.5  2  3
[4,] 2.5  4  2
``````

(Nevertheless, it is better to use `lapply` instead of `apply` with a data frame.)
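If the data already arrives as factors, the numbers can be recovered first. The sketch below assumes the factor levels are numeric labels, as in this example; the name `f` is hypothetical. Note that `as.integer` on a factor returns the internal level codes, not the original values.

```r
f <- ordered(c(9, 3, 8, 10))

codes  <- as.integer(f)                # internal level codes: 3 1 2 4
values <- as.numeric(as.character(f))  # original numbers:     9 3 8 10

print(codes)
print(values)
```

Going through `as.character` first is what turns the level labels back into the numbers they represent.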

### Trap 3

When I started to learn `R`, I was misled by the name of the function `ordered()`. It took me a while to understand that it creates a special kind of factor. Likewise, it took me some time to figure out the difference between `sort()` and `order()` and when to use which function appropriately.
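A short sketch of the difference, using the values of the `y` column (the names `v` and `f` are introduced here for illustration): `sort()` returns the sorted values, `order()` returns the positions that would sort them, and `ordered()` does not sort anything but creates a factor whose levels have an order.

```r
v <- c(9, 3, 8, 10)

sorted    <- sort(v)       # the values in increasing order: 3 8 9 10
positions <- order(v)      # indices that sort v:            2 3 1 4
via_order <- v[positions]  # indexing by order() reproduces sort(v)

f <- ordered(v)            # an ordered factor; the data keep their original order
lev <- levels(f)           # "3" "8" "9" "10"

print(sorted)
print(positions)
print(f)
```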

I am not sure of the exact reason this happens with the `apply` function, but you could try `sapply` to solve the problem.

```rankmat3 <- as.data.frame(sapply(data1, rank))
```
The result looks like this:
```rankmat3
    x y z
1 4.0 3 4
2 1.0 1 1
3 2.5 2 3
4 2.5 4 2
```

The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
