Big data: how to parse PST / e-mail data?

I have PST or e-mail files in HDFS. Now I want to do text analysis on them with whichever Hadoop component works best. How should I start?
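One common way to start text analysis on Hadoop without writing Java is Hadoop Streaming, which runs any executable as the mapper and reducer over line-oriented text. The sketch below is a plain word count, assuming the messages are already available as one line of text per message. The script name `wordcount.py` and its `map`/`reduce` arguments are hypothetical choices for this example, not part of Hadoop.

```python
import re
import sys
from itertools import groupby

# Lower-cased tokens of letters, digits and apostrophes (a deliberately simple tokenizer).
WORD_RE = re.compile(r"[a-z0-9']+")


def mapper(lines):
    """Emit 'word<TAB>1' for every token in every input line."""
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts per word.

    Relies on the input being sorted by key, which Hadoop guarantees
    between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: `wordcount.py map` or `wordcount.py reduce`
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

With the Hadoop Streaming jar, the same script would be passed via the `-mapper` and `-reducer` options (for example `-mapper "wordcount.py map"`); the jar's location depends on your distribution. The phases can also be tested locally by piping mapper output through `sort` into the reducer.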

Should I first extract the actual content from these files, save it somewhere (for example as plain text files), and then run the analysis on those text files?

Please advise.

PS: I came across this approach when I started searching on Google. Is this the only option, or are other solutions available?
