Julia: Replacing a number with a string in an array
I have an array of numbers (integer or float; it is actually a column in a DataFrame) and would like to replace, for example, all instances of 0 with "NaN" or some other text (or convert 1 → "M" and 2 → "F").
I ran into the problem that when I write array[i] = "text", I get the error:
`convert` has no method matching convert(::Type{Int64}, ::ASCIIString)
How do I get around this? Also, what is the most efficient way to do the equivalent of pandas' df.column.replace({1:"M", 2:"F"}, inplace=True)?
I tried this:
df[:sex] = [ {1 => "M", 2 => "F"}[i] for i in df[:sex] ]
... but this leads to a problem when I only replace some of the values: I then get the "X key not found" error, since I am passing in a value from df[:sex] that is not in my dict.
You might be better off with a PooledDataArray:
PooledDataArray{T}: a variant of DataArray{T} optimized for representing arrays that contain many repetitions of a small number of unique values, as is usually the case with categorical data.
This is the equivalent of categorical data in pandas (or a factor in R).
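As a rough illustration of the idea (not how DataArrays implements it internally), a pooled array can be thought of as a small pool of unique labels plus an array of integer references into that pool. The names pool, refs and decoded below are made up for this sketch, which uses plain Julia arrays only:

```julia
# Hypothetical stand-ins for the two parts of a pooled array.
pool = [:Male, :Female]   # the unique labels, stored once
refs = [1, 2, 1]          # one integer reference per row

# Decoding looks each reference up in the pool.
# This only works if every reference lies in 1:length(pool).
decoded = [pool[r] for r in refs]
```

Storing the labels once and keeping only small integers per row is what makes this representation compact for categorical columns.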
julia> df = DataFrame([1 3; 2 4; 1 6])
3x2 DataFrames.DataFrame
| Row | x1 | x2 |
|-----|----|----|
| 1 | 1 | 3 |
| 2 | 2 | 4 |
| 3 | 1 | 6 |
julia> PooledDataArray(DataArrays.RefArray(df[:x1]), [:Male, :Female])
3-element DataArrays.PooledDataArray{Symbol,Int64,1}:
:Male
:Female
:Male
julia> df[:x1] = PooledDataArray(DataArrays.RefArray(df[:x1]), [:Male, :Female])
3-element DataArrays.PooledDataArray{Symbol,Int64,1}:
:Male
:Female
:Male
julia> df
3x2 DataFrames.DataFrame
| Row | x1 | x2 |
|-----|--------|----|
| 1 | Male | 3 |
| 2 | Female | 4 |
| 3 | Male | 6 |
Note: this works because the reference array contains only values from 1 to the number of labels (2).
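If some values fall outside that range, or only some of them should be replaced, a dictionary lookup with a default is one possible alternative. The sketch below is an assumption-laden example rather than part of the approach above: it uses a plain array instead of a DataFrame column, the Dict(...) constructor syntax of Julia 0.4 and later, and made-up names (codes, labels, relabeled). The original assignment fails because an Int64 array cannot hold a string, so the result is built as an Any array instead:

```julia
# Hypothetical data: code 3 has no label in the mapping.
codes = [1, 2, 1, 3]
labels = Dict(1 => "M", 2 => "F")

# get(dict, key, default) returns the default instead of raising a
# "key not found" error, so unmapped codes pass through unchanged.
# The Any[...] prefix lets the result hold both integers and strings.
relabeled = Any[get(labels, c, c) for c in codes]
```

The result could then be assigned back to the column, at the cost of losing the concrete element type.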