Convert a character array that includes asterisks to numeric values in MATLAB

I am trying to convert character arrays containing asterisks ('*') to numeric values.

I have a cell array of character vectors built from data imported from a .dat file. For example, a cell array C contains a column of cells (C{1,1}, C{2,1}, ... C{n,1}), each holding a character vector. For example, C{1,1} contains:

'23.000          *          *      1.000      1.000      1.000     34.000      5.065      6.719'

      

When I try to convert C{1,1} to a numeric double, MATLAB returns an empty double array:

new_double = str2num(C{1,1})

new_double =

     []

      

When I remove the asterisks manually, the code works:

 new_double = str2num(C{1,1})

 new_double =

   23.0000    1.0000    1.0000    1.0000   34.0000    5.0650    6.7190

      

All I want to do is read the data into a double array for further processing. I don't care whether the command ignores the asterisks or replaces them with NaNs; the data with asterisks is not important to me. What matters is reading the data in the last two columns, for example 5.065 and 6.719. Unfortunately, I cannot index them while they are embedded in a character vector.

I have also tried using:

c2 = C{1,1};
new_double = sscanf(c2,'%f%'); 

      

But it stops reading at the first asterisk, for example:

new_double =

    23

      

I have searched extensively, and the only helpful post I found is: https://uk.mathworks.com/matlabcentral/answers/127847-how-to-read-csv-file-with-asterix However, I cannot use that method because I am working with a character vector, not a delimited file.



2 answers


Here's another way:

C{1,1} = '23.000          *          *      1.000      1.000      1.000     34.000      5.065      6.719';
result = str2double(strsplit(C{1}));

      

This gives

result =
   23.0000       NaN       NaN    1.0000    1.0000    1.0000   34.0000    5.0650    6.7190

      



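It works like this: strsplit splits the character vector at whitespace into a cell array of substrings, and str2double converts each substring to a number, returning NaN for any substring it cannot convert, such as '*'.

The advantage of using str2double over str2num is that the former doesn't use eval internally, so it can't run potentially dangerous code.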
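To apply this to every row of the cell array and pull out the last two columns the question asks about, something like the following sketch could work. It assumes every row has the same number of whitespace-separated fields (nine here). The second sample row and the names rows, data and lastTwo are made up for illustration.

```matlab
% Hypothetical sample data: a column cell array of character vectors.
C = {'23.000          *          *      1.000      1.000      1.000     34.000      5.065      6.719';
     '24.000      2.500          *      1.000      1.000      1.000     35.000      5.112      6.801'};

% Split each row at whitespace and convert the pieces; '*' becomes NaN.
% strtrim guards against leading/trailing blanks, which would otherwise
% produce empty fields (and therefore extra NaNs).
rows = cellfun(@(s) str2double(strsplit(strtrim(s))), C, 'UniformOutput', false);

% Stack the row vectors into an n-by-9 matrix (requires equal field counts).
data = vertcat(rows{:});

% Keep only the last two columns.
lastTwo = data(:, end-1:end);
```

Because the asterisks become NaN rather than disappearing, every value stays in its original column, so the same column index refers to the same field in every row.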



Let's do both. In the first case, where you want to ignore the asterisks, you can remove them from the string and run str2num as usual. Defining your data:

C{1,1} = '23.000          *          *      1.000      1.000      1.000     34.000      5.065      6.719';

      

... you can use a regular expression to remove the asterisks, including runs of several in a row (for example **, ***, etc.), replacing them with an empty string via regexprep:

out = regexprep(C, '\*+', '');

      

This says that, for every string in the cell array C, any run of one or more * characters is replaced with an empty string.

In this case, we get:



>> out = regexprep(C, '\*+', '')

out =

  cell

    '23.000                          1.000      1.000      1.000     34.000      5.065      6.719'

      

You can then call str2num on the result. If you would rather replace the asterisks with NaN, just use regexprep again, with NaN in place of the empty string:

out = regexprep(C, '\*+', 'NaN');

      

We get:

>> out = regexprep(C, '\*+', 'NaN')

out =

  cell

    '23.000          NaN          NaN      1.000      1.000      1.000     34.000      5.065      6.719'

      

The point is to replace the offending parts of your string with something else, and regexprep can definitely help.
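Putting the pieces together, a minimal sketch of the NaN variant followed by the conversion might look like this. The asterisk is escaped in the pattern (\*+), since an unescaped * is a regex quantifier rather than a literal character. The names vals and lastTwo are illustrative only.

```matlab
C{1,1} = '23.000          *          *      1.000      1.000      1.000     34.000      5.065      6.719';

% Replace each run of literal asterisks with the text NaN.
out = regexprep(C, '\*+', 'NaN');

% str2num evaluates the text, so the token NaN is parsed as a numeric NaN.
vals = str2num(out{1});

% The last two values, which are the ones of interest in the question.
lastTwo = vals(end-1:end);
```

If the asterisks are replaced with an empty string instead, str2num returns only seven values here, so the column positions shift. Indexing relative to end, as above, still picks out the last two.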
