Convert a character array that contains asterisks to numbers in MATLAB
I am trying to convert character arrays containing asterisks ('*') to numbers.
I have a cell array of character vectors built from data imported from a .dat file. For example, a cell array C contains a column of cells (C{1,1}, C{2,1}, ... C{n,1}), each holding a character vector. For example, C{1,1} contains:
'23.000 * * 1.000 1.000 1.000 34.000 5.065 6.719'
When I try to convert C{1,1} to a double, MATLAB returns an empty array:
new_double = str2num(C{1,1})
new_double =
[]
When I remove the asterisks manually, the code works:
new_double = str2num(C{1,1})
new_double =
23.0000 1.0000 1.0000 1.0000 34.0000 5.0650 6.7190
All I want to do is read the data into a double array for further processing. I don't care whether the command ignores the asterisks or replaces them with NaNs; the fields with asterisks are not important to me. What matters is that I can read the data in the last two columns, for example 5.065 and 6.719. Unfortunately, I cannot index them because they are embedded in a character vector.
I have also tried using:
c2 = C{1,1};
new_double = sscanf(c2,'%f%');
But it stops reading at the first asterisk, for example:
new_double = 23
I have searched extensively, and the only helpful post I found is: https://uk.mathworks.com/matlabcentral/answers/127847-how-to-read-csv-file-with-asterix However, I cannot use that method because I am working with a character vector, not a delimited file.
Here's another way:
C{1,1} = '23.000 * * 1.000 1.000 1.000 34.000 5.065 6.719';
result = str2double(strsplit(C{1}));
This gives
result =
23.0000 NaN NaN 1.0000 1.0000 1.0000 34.0000 5.0650 6.7190
It works like this:
- strsplit splits the string at whitespace. This gives a cell array of substrings, each formed by consecutive non-whitespace characters.
- str2double converts each of those cells to a number and returns a numeric vector, with NaN for entries that cannot be interpreted as numbers.
The advantage of using str2double over str2num is that the former doesn't use eval internally, so it can't run potentially dangerous code.
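The same idea can be extended to the whole cell array, since the question is ultimately about reading the last two columns of every row. The following is only a sketch: the second row of C is made-up sample data, the variable names rows, M and lastTwo are illustrative, and it assumes every row has the same number of whitespace-separated fields.

```matlab
% Sample data standing in for the imported cell array C.
% The second row is hypothetical and only there to show several rows.
C = {'23.000 * * 1.000 1.000 1.000 34.000 5.065 6.719';
     '24.000 * * 2.000 1.000 1.000 35.000 5.100 6.800'};

% Split each row at whitespace and convert the tokens to doubles.
% Tokens that are not numbers (here '*') become NaN.
rows = cellfun(@(s) str2double(strsplit(strtrim(s))), C, ...
               'UniformOutput', false);

% Stack the row vectors into one matrix.
% This assumes every row has the same number of fields.
M = vertcat(rows{:});

% Pick out the last two columns for further processing.
lastTwo = M(:, end-1:end);
```

strtrim is there only to guard against leading or trailing whitespace, which would otherwise produce empty tokens that str2double turns into extra NaN entries.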
Let's do both. In the first case, where you want to ignore the asterisks, you can remove them from the string and call str2num as usual. Defining your data:
C{1,1} = '23.000 * * 1.000 1.000 1.000 34.000 5.065 6.719';
... you can use a regular expression to match runs of one or more consecutive asterisks (for example *, **, *** etc.) and replace them with an empty string using regexprep. The asterisk must be escaped as \* because an unescaped * is a regex quantifier:
out = regexprep(C, '\*+', '');
This says that, for every string in the cell array C, we replace any run of * characters with an empty string.
In this case, we get:
>> out = regexprep(C, '\*+', '')
out =
cell
'23.000 1.000 1.000 1.000 34.000 5.065 6.719'
You can then call str2num on the resulting string (for example out{1}). If you would rather replace the asterisks with NaN, just use regexprep again, with NaN as the replacement instead of an empty string:
out = regexprep(C, '\*+', 'NaN');
We get:
>> out = regexprep(C, '\*+', 'NaN')
out =
cell
'23.000 NaN NaN 1.000 1.000 1.000 34.000 5.065 6.719'
The point is to replace the offending parts of your string with something else, and regexprep can definitely help.
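Putting both variants together, a minimal sketch of the full round trip from string to numbers might look like this. The variable names removed, replaced, valsRemoved, valsNaN and lastTwo are illustrative, and the pattern is written with an escaped asterisk ('\*+') so that it matches literal * characters.

```matlab
% Data from the question
C = {'23.000 * * 1.000 1.000 1.000 34.000 5.065 6.719'};

% Variant 1: drop every run of asterisks, then convert.
removed = regexprep(C, '\*+', '');
valsRemoved = str2num(removed{1});   % 7 values, asterisk fields are gone

% Variant 2: turn every run of asterisks into NaN, then convert.
replaced = regexprep(C, '\*+', 'NaN');
valsNaN = str2num(replaced{1});      % 9 values, NaN where the asterisks were

% Either way, the last two values are the columns of interest.
lastTwo = valsNaN(end-1:end);
```

Note that str2num is applied to the character vector inside the cell (removed{1}), not to the cell array itself. Variant 2 keeps the column positions intact, which may matter if other rows have asterisks in different places.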