Getting Plain Text in Yahoo Pipes

I have a Yahoo Pipe that takes an Atom feed from a Google Group, and I want to process the full body of each post (running various regexes to extract data). I can get the body of a post as plain text from Google using a URL like:

http://groups.google.com/group/(group_name)/msg/(message_id)?dmode=source&output=gplain
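
For reference, the URL pattern above could be assembled like this. It is only a sketch: the function name message_source_url and the plain flag are made up for illustration, and the only assumption about Google Groups is the URL shape shown in this post.

```python
from urllib.parse import quote, urlencode


def message_source_url(group_name, message_id, plain=True):
    """Build the 'message source' URL for one Google Groups post.

    Hypothetical helper: it only fills in the URL pattern quoted in the
    question and does not fetch anything.
    """
    params = {"dmode": "source"}
    if plain:
        # output=gplain asks for the body as plain text
        params["output"] = "gplain"
    return "http://groups.google.com/group/{}/msg/{}?{}".format(
        quote(group_name, safe=""),
        quote(message_id, safe=""),
        urlencode(params),
    )


print(message_source_url("haml", "0f78eda2f5ef802d"))
```

Passing plain=False leaves out output=gplain and keeps only dmode=source.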


However, I am having trouble getting it into Yahoo Pipes as a string value. The Fetch Page module fails on non-HTML pages. Using YQL with the html table seems to work: it wraps the plain text inside a p element, whose text I can extract like this:

select * from html where url="..." and xpath="//p"


However, if the body of the message contains HTML tags, YQL returns an HTML subtree instead of a string. Is there a way to flatten it back into its HTML source?

1 answer


The trick is to remove "output=gplain" from the URL and grab the content of the pre element instead.

select content from html 
where url="http://groups.google.com/group/haml/msg/0f78eda2f5ef802d?dmode=source" 
and xpath='//div[contains(@class,"maincontbox")]/pre'
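
If you want to try the same extraction outside YQL, here is a rough Python sketch using only the standard library. It assumes the page layout implied by the XPath above (a div whose class contains "maincontbox" wrapping a pre element with the escaped message source). The names PreTextExtractor and extract_message_source are made up for illustration, and unlike the XPath it accepts a pre at any depth inside the div, not only a direct child.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collect the text of <pre> elements found inside a marked <div>."""

    def __init__(self, marker="maincontbox"):
        # convert_charrefs=True turns &lt;b&gt; back into <b> in the text
        super().__init__(convert_charrefs=True)
        self.marker = marker
        self.div_depth = 0  # > 0 while inside the matching div
        self.pre_depth = 0  # > 0 while inside a pre within that div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = dict(attrs).get("class") or ""
            # Count nested divs so we know when the matching div closes
            if self.div_depth > 0 or self.marker in classes:
                self.div_depth += 1
        elif tag == "pre" and self.div_depth > 0:
            self.pre_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "pre" and self.pre_depth > 0:
            self.pre_depth -= 1

    def handle_data(self, data):
        if self.pre_depth > 0:
            self.chunks.append(data)


def extract_message_source(page_html, marker="maincontbox"):
    """Return the message source as one string (empty if nothing matched)."""
    parser = PreTextExtractor(marker)
    parser.feed(page_html)
    parser.close()
    return "".join(parser.chunks)
```

Because the message source is HTML-escaped inside the pre element, its text content comes back as a single string even when the message itself contains HTML tags, which is why this route avoids the subtree problem from the question.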

As a demo, I created a pipe that takes the Google Group name and the message ID as inputs:

http://pipes.yahoo.com/pipes/pipe.info?_id=3d345e162405e7dbd47d73b95c21f102
