Hi there

Bug

Yonghui Dong / 2018-04-01


Spring is coming, and so are the bugs!

Last night I found several bugs in the woolen carpet in my living room. I vacuumed the carpet and sprayed some indoor pesticide, hoping to get rid of them.

When I was playing with R this morning, I encountered another ‘interesting bug’.

I wanted to split a number and extract its decimal part, so I wrote a simple function:

split <- function(x){
  t = unlist(strsplit(as.character(x),"\\."))[2]
  return(t)
}

I tested this function and I got what I expected.

split(pi)
## [1] "14159265358979"
split(1.00001)
## [1] "00001"

However, when I ran another test, I got some weird results:

split(0.0001)
## [1] NA
split(0.00001)
## [1] NA

I got some ‘NAs’. At first I did not understand what had happened. It seemed that this bug appears only when the integer part of the number is 0, the number has more than 3 decimal places, and only the last decimal place is non-zero (what a strict condition!).

Finally I found the reason. By default, R may switch to scientific notation when it converts a number to a string. So 0.0001 is formatted as 1e-04, and strsplit("1e-04", "\\.") cannot split it because the string contains no decimal separator.
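The conversion is easy to check at the console. Here is a small sketch of what happens step by step; the variable names (s_sci, pieces, s_fixed) are only for illustration, and format(..., scientific = FALSE) is shown as one way to force fixed notation.

```r
# 0.0001 is rendered in scientific notation when converted to a string
s_sci <- as.character(0.0001)
s_sci
## [1] "1e-04"

# there is no "." in that string, so strsplit() returns a single piece
pieces <- unlist(strsplit(s_sci, "\\."))
pieces
## [1] "1e-04"

# asking for a second element that does not exist gives NA
pieces[2]
## [1] NA

# forcing fixed notation brings the decimal point back
s_fixed <- format(0.0001, scientific = FALSE)
s_fixed
## [1] "0.0001"
```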

Naturally, the way to fix this bug is to disable scientific notation, and this time I got what I expected.

split <- function(x){
  # disable scientific notation
  options(scipen=999)
  t = unlist(strsplit(as.character(x),"\\."))[2]
  return(t)
}

split(0.00001)
## [1] "00001"
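One caveat: options(scipen=999) changes a global setting, so it stays in effect for the rest of the session after the function has run. Below is a sketch of an alternative that leaves the options alone. The name decimal_part and the choice of digits = 15 are my own assumptions, and it is meant for a single number, not a vector.

```r
# decimal_part() is a hypothetical helper, not part of the original function
decimal_part <- function(x) {
  # scientific = FALSE asks format() for fixed notation;
  # digits = 15 allows up to 15 significant digits
  s <- format(x, scientific = FALSE, digits = 15)
  # fixed = TRUE treats "." as a literal dot, so no regex escaping is needed
  strsplit(s, ".", fixed = TRUE)[[1]][2]
}

decimal_part(0.00001)
## [1] "00001"
decimal_part(0.0001)
## [1] "0001"
```

Whole numbers such as 2.000 still give NA with this version, because the formatted string is just "2".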

However, the bug is not fully fixed. For instance, when I tried split(2.000), I got ‘NA’ again.

split(2.000)
## [1] NA

The reason is that as.character(2.00) returns “2”: a numeric value does not keep its trailing zeros. So the once-and-for-all solution is to pass a character input rather than a numeric one.

split(2.00)
## [1] NA
split('2.00')
## [1] "00"
split('0.0001')
## [1] "0001"
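The trailing zeros only exist in the text that was typed; once parsed, 2.00 and 2 are the same number. If the number of decimal places is known in advance, sprintf() can put them back before splitting. This is only a sketch, and the two decimal places used here are an arbitrary choice for illustration.

```r
# 2.00 and 2 are the same double, so the zeros are gone after parsing
identical(2.00, 2)
## [1] TRUE
as.character(2.00)
## [1] "2"

# sprintf() formats with a requested number of decimals (2 here)
s <- sprintf("%.2f", 2)
s
## [1] "2.00"
unlist(strsplit(s, "\\."))[2]
## [1] "00"
```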

oops ~~