Split pandas Series of lines containing multiline strings into separate lines
I have a pandas series filled with lines like this:
In:
s = pd.Series(['This is a single line.', 'This is another one.', 'This is a string\nwith more than one line.'])
Out:
0 This is a single line.
1 This is another one.
2 This is a string\nwith more than one line.
dtype: object
How can I split all lines in this Series that contain the linebreak character \n
into their own lines? I would expect:
0 This is a single line.
1 This is another one.
2 This is a string
3 with more than one line.
dtype: object
I know that I can split each string on the linebreak character with
s = s.str.split('\n')
which gives
0 [This is a single line.]
1 [This is another one.]
2 [This is a string, with more than one line.]
but this only splits each string into a list inside its original row; it does not give each piece its own row in the Series.
1 answer
You can loop over each piece of each split string to build a new Series:
pd.Series([j for i in s.str.split('\n') for j in i])
It might make more sense to do this on the input list, rather than creating an intermediate Series, for example:
strings = ['This is a single line.', 'This is another one.', 'This is a string\nwith more than one line.']
pd.Series([j for i in strings for j in i.split('\n')])
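If the data already lives in a Series, another option is to stay within pandas. The following is a minimal sketch, assuming pandas 0.25 or newer, where Series.explode is available; the name result is only illustrative:

```python
import pandas as pd

s = pd.Series(['This is a single line.',
               'This is another one.',
               'This is a string\nwith more than one line.'])

# Split each string into a list of pieces, then give every list element its own row.
# explode() repeats the original index label for pieces that came from the same row,
# so reset_index(drop=True) is used here to renumber the rows 0..n-1.
result = s.str.split('\n').explode().reset_index(drop=True)
print(result)
```

This should yield four rows, with the two halves of the multiline string at positions 2 and 3. Dropping the reset_index call keeps the original labels instead, which can be useful for tracing each piece back to its source row.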