Ruby CSV gem returns Infinity instead of double

I have a method that reads a CSV file and loads it into Postgres.

CSV.foreach(path, converters: :all)

      

When it encounters a number like "2.02E+17" it loads "2.0150519e+17", but when it encounters "20150515E000590" it loads "Infinity".

If I instead use

CSV.foreach(path)

      

When it encounters "2.02E+17", it loads "20150519E000010", and it loads "20150515E000590" as "20150515E000590".

I want to load exactly what is shown in Excel. So in the case of "2.02E+17" I want to load "2.02E+17", but in the case of "20150515E000590" I want to load "20150515E000590" and not "Infinity". My question is: how do I stop CSV from replacing "20150515E000590" with "Infinity"?



1 answer


First of all, Postgres might be able to handle CSV uploads without Ruby. As for your question ...


CSV doesn't define data types, so whenever you read CSV data into something that expects data types (like Excel or Ruby), the program has to guess.

When Excel sees 20150519E000010, it guesses that this is scientific notation, 20150519e10, i.e. 20150519 x 10^10. Excel distinguishes between the underlying data in the spreadsheet and the way it is displayed, and in this case it picks the shortest way to display the number: 2.02E+17. So even though Excel shows you 2.02E+17, the actual data in the file is 20150519E000010.



When you read the CSV in Ruby and tell it to convert fields to Ruby data types, it makes the same guess (scientific notation), but you get a different display: 2.0150519e+17. This is to be expected, because 2.02E+17 is just Excel's way of shortening what it displays, and Ruby's data types don't match Excel's. This also explains why 20150515E000590 becomes Infinity. 20150515 x 10^590 is far too large for Ruby's floating point type (the largest finite Float is about 1.8 x 10^308), so the conversion overflows to Infinity.
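The behaviour described here can be reproduced in a few lines. This is a minimal sketch using the two values from the question; the variable names are illustrative only, and the `format` call is just one way to imitate Excel's shortened display.

```ruby
require 'csv'

line = "20150519E000010,20150515E000590"

# With converters: :all, any field that parses as a number becomes a number.
small, huge = CSV.parse_line(line, converters: :all)

# Without converters, every field stays a String, exactly as in the file.
raw = CSV.parse_line(line)

# An Excel-style display is only formatting; the stored value is unchanged.
excel_like = format("%.2E", small)

puts small       # Ruby's default Float display
puts huge        # an exponent of 590 is beyond Float::MAX (about 1.8e308)
puts excel_like
p raw
```

The point of the sketch is that the file content never changes; only the guess about its type, and the way the guessed value is printed, differ between tools.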

However, I strongly suspect that both Excel and Ruby are wrong. To me, 20150515E000059 looks like 2015-05-15 00:00:59. It's not a number in scientific notation, it's a timestamp! You can define a custom converter to handle the format:

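require 'csv'
require 'date'

# Swap the "E" separator for ISO 8601's "T" and parse; fall back to the raw string.
CSV::Converters[:mytime] = lambda do |s|
  DateTime.parse(s.tr("E", "T")) rescue s
end

CSV.parse("20150515E000019", converters: :mytime)
# [[#<DateTime: 2015-05-15T00:00:19+00:00 ((2457158j,19s,0n),+0s,2299161j)>]]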
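If the values should simply be kept as text rather than parsed as timestamps, one alternative is a single converter that still converts ordinary numbers but leaves these fields alone. The sketch below is only an illustration: the converter name `:numeric_except_stamps` is hypothetical (it is not built into CSV), and the pattern assumes the values are always eight digits, an "E", then six digits.

```ruby
require 'csv'

# Assumed shape of the timestamp-like values: 8 digits, "E", 6 digits.
stamp_pattern = /\A\d{8}E\d{6}\z/

# Hypothetical converter name; it is not part of the CSV library.
CSV::Converters[:numeric_except_stamps] = lambda do |field|
  # Leave timestamp-like fields exactly as they appear in the file.
  next field if stamp_pattern.match?(field)

  begin
    Integer(field, 10)
  rescue ArgumentError
    begin
      Float(field)
    rescue ArgumentError
      field # not numeric: keep the original string
    end
  end
end

row = CSV.parse_line("20150515E000590,42,3.5,abc",
                     converters: :numeric_except_stamps)
p row
```

Everything is done in one converter because CSV keeps applying later converters to any field that is still a String, so a separate "leave it alone" converter placed before `:all` would not protect the value.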

      
