Problems reading a UTF-8 encoded file into a MATLAB table

I know MATLAB may not be the ideal tool for this, but I want to do a little work with my tabular file data.dat, which looks like this:

ID,Name,Surname,Y,M,D,Num,Loc
1672399390,A,DULKINAS,1993,01,31,3019,Šiauliai
4157844163,D,SKARBALIUS,1993,12,08,3019,Tauragė
5541091033,E,LUKOŠEVIČIUS,1992,10,25,3019,Panevėžys
2005609387,M,DUBINSKAS,1991,03,31,3019,Kaunas
2716651285,P,ŽIEDELIS,1992,02,28,3019,Vilnius

      

Since the data is neatly formatted and separated by commas, I decided to just use readtable('data.dat') and work from there.

Problem 1. MATLAB does not indicate where the faulty line is. Since there were a couple of redundant commas in the file, it just threw the error "Each line of the text file must have the same number of delimiters". I solved this by counting the commas on each line with other tools and correcting the lines manually.

Problem 2. For some reason, it renames the first variable, ID (which is AFAIK a valid, non-reserved variable name), to x__ID and gives the warning "Variable names were modified to make them valid MATLAB identifiers". It doesn't bother me, but it's weird.

Problem 3. UTF-8 characters are displayed incorrectly. Moreover, after I consulted the documentation and ran readtable('data.dat','FileEncoding','UTF-8'), it gave me the error "Invalid parameter name: FileEncoding". I am confused.

How should I approach this situation?

2 answers


This is probably because you are using a version of MATLAB older than R2014b. The FileEncoding option was added in R2014b. If you check the documentation in your installation with doc readtable, you will likely find it missing.
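On a release without FileEncoding, one possible workaround is to open the file yourself with an explicit encoding and parse it with textscan. The following is only a sketch under assumptions: it builds a small two-column demo file (the temporary file name and the reduced column set are made up for illustration), and for the real data.dat the format string would need one specifier per column.

```matlab
% Demo file: header plus one row, written as raw UTF-8 bytes.
% 197 160 is the UTF-8 encoding of the first letter of "Šiauliai".
fname = [tempname '.dat'];                 % hypothetical temporary file
fid = fopen(fname, 'w');
bytes = [double(sprintf('ID,Loc\n1672399390,')) 197 160 ...
         double(sprintf('iauliai\n'))];
fwrite(fid, bytes, 'uint8');
fclose(fid);

% Open with an explicit encoding, then parse the comma-separated fields.
fid = fopen(fname, 'r', 'n', 'UTF-8');
C = textscan(fid, '%f %s', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(fname);

% Assemble a table with explicitly chosen variable names.
T = table(C{1}, C{2}, 'VariableNames', {'ID', 'Loc'});
```

Skipping the header with 'HeaderLines' and naming the variables by hand also sidesteps the renamed first column, since the header line is never parsed.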



The identifier is renamed because readtable interprets the byte order mark (BOM) at the beginning of your Unicode file as part of the name.
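To confirm that a BOM is present, you can inspect the first three bytes of the file; the UTF-8 BOM is the byte sequence EF BB BF (239 187 191 in decimal). A minimal sketch, using a made-up temporary file in place of data.dat:

```matlab
% Demo file that starts with the UTF-8 byte order mark.
fname = [tempname '.dat'];                 % hypothetical temporary file
fid = fopen(fname, 'w');
fwrite(fid, [239 187 191 double('ID,Name')], 'uint8');
fclose(fid);

% Read the first three bytes and compare them with EF BB BF.
fid = fopen(fname, 'r');
head = fread(fid, 3, 'uint8')';            % row vector of byte values
fclose(fid);
delete(fname);

hasBOM = isequal(head, [239 187 191]);
```

If the renamed column is the only symptom, another option is to restore the name after reading, for example with T.Properties.VariableNames{1} = 'ID';.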



Also, regarding Problem 1: as of R2015a, lines with extra commas are flagged in the error message. I added an extra comma to the data file on line 4, and here is the result:



>> readtable('data.dat', 'FileEncoding', 'UTF-8')
Error using readtable (line 129)
Reading failed at line 4. All lines of a text file must have the same number of delimiters. 
Line 4 has 8 delimiters, while preceding lines have 7. 
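On older releases, the offending lines can be located from within MATLAB by counting commas per line. This is a rough sketch, not what readtable does internally: it assumes no quoted fields that contain commas, and the demo file and variable names are invented for illustration (point fname at data.dat for real use).

```matlab
% Demo input: a small temporary file with one extra comma on line 3.
fname = [tempname '.dat'];                 % hypothetical temporary file
fid = fopen(fname, 'w');
fprintf(fid, 'ID,Name,Loc\n1,A,Kaunas\n2,B,,Vilnius\n');
fclose(fid);

fid = fopen(fname, 'r');
lineNo = 0;
expected = NaN;      % comma count of the header line
bad = [];            % numbers of the lines that do not match
txt = fgetl(fid);
while ischar(txt)    % fgetl returns -1 at end of file
    lineNo = lineNo + 1;
    n = sum(txt == ',');
    if lineNo == 1
        expected = n;                      % header defines the expected count
    elseif n ~= expected
        bad(end+1) = lineNo;
        fprintf('Line %d has %d commas, expected %d\n', lineNo, n, expected);
    end
    txt = fgetl(fid);
end
fclose(fid);
delete(fname);
```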

      
