Split pandas Series of lines containing multiline strings into separate lines

I have a pandas series filled with lines like this:

In:    
s = pd.Series(['This is a single line.', 'This is another one.', 'This is a string\nwith more than one line.'])

Out:
0                        This is a single line.
1                          This is another one.
2    This is a string\nwith more than one line.
dtype: object

      

How can I split every entry in this Series that contains the linebreak character \n into its own rows? I would expect:

0      This is a single line.
1        This is another one.
2            This is a string
3    with more than one line.
dtype: object

      

I know that I can split each entry on the linebreak character with

s = s.str.split('\n')

      

which gives

0                        [This is a single line.]
1                          [This is another one.]
2    [This is a string, with more than one line.]

      

but this only splits each string into a list inside its original row; it does not give each piece a row of its own.



1 answer


You can loop over every piece of every split string to build a new Series:

pd.Series([j for i in s.str.split('\n') for j in i])
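
For reference, here is the same comprehension as a self-contained sketch using the Series from the question. The names `flat`, `parts` and `piece` are only illustrative, and the sketch assumes every element is a string (a missing value such as NaN would make the inner loop fail).

```python
import pandas as pd

s = pd.Series([
    'This is a single line.',
    'This is another one.',
    'This is a string\nwith more than one line.',
])

# s.str.split('\n') yields one list per row; the nested comprehension
# flattens those lists so that every piece becomes its own element.
flat = pd.Series([piece for parts in s.str.split('\n') for piece in parts])
print(flat)
```

The result gets a fresh 0..3 index, so the original row labels are not preserved.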

      



It might make more sense to do this on the input list directly, rather than creating a temporary Series first, for example:

strings = ['This is a single line.', 'This is another one.', 'This is a string\nwith more than one line.']
pd.Series([j for i in strings for j in i.split('\n')])
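
As a possible alternative that is not part of the answer above: if your pandas version is 0.25 or newer, `Series.explode` should do the flattening without a Python-level loop. This is a minimal sketch under that version assumption; `exploded` is just an illustrative name.

```python
import pandas as pd

s = pd.Series([
    'This is a single line.',
    'This is another one.',
    'This is a string\nwith more than one line.',
])

# explode() gives each list element its own row but repeats the original
# index label, so reset_index(drop=True) renumbers the rows from 0.
exploded = s.str.split('\n').explode().reset_index(drop=True)
print(exploded)
```

Dropping the `reset_index` call keeps the original labels, which can be useful if you need to know which row each piece came from.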

      
