Removing newlines from cluttered rows in pandas dataframe cells?

Question

I have tried several ways of splitting and stripping the strings in my pandas DataFrame to remove all the "\n" characters, but for some reason the ones attached to other words are never removed, even after I split the text. The DataFrame has a column that holds text scraped from web pages with BeautifulSoup. The text had already been cleaned up somewhat, but the newlines attached to other characters were left in place. My strings look something like this:

". We will explore various software technologies\nrelevant to games, including programming languages, scripting\nlanguages, operating systems, file systems, networking, simulation\nengines, and multimedia design systems. We will also explore some of the fundamental scientific concepts from computer science and related areas including "

Is there a simple Python way to remove these "\n" characters?

Thanks in advance!

+7

python string split pandas

Calvin May 28 '17 at 13:18

3 answers

In messy data, it may be a good idea to remove all whitespace. Note that this deletes every whitespace character, including the spaces between words:

df.replace(r'\s', '', regex=True, inplace=True)
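If the words should stay separated, a gentler variant is to collapse each run of whitespace into a single space instead of deleting it. This is only a minimal sketch: the one-column frame and the column name `text` are assumptions made for illustration, not part of the original question.

```python
import pandas as pd

# Hypothetical one-column frame; the column name "text" is an assumption.
df = pd.DataFrame({"text": ["scripting \nlanguages,   operating\tsystems\n"]})

# Replace every run of whitespace (spaces, tabs, newlines) with one space,
# then trim the ends so the words stay separated.
df["text"] = df["text"].str.replace(r"\s+", " ", regex=True).str.strip()

print(df.loc[0, "text"])  # scripting languages, operating systems
```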

+1

Pawel piela 29 oct. 17 at 12:31

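   import pandas as pd
   # Raw string, so each cell holds a literal backslash followed by "n"; the column name 'text' is illustrative
   df = pd.DataFrame({'text': [r'Sarah Marie Wimberly So so beautiful!!!\nAbram Staten You guys look good man.\nTJ Sloan I miss you guys\n']})

   df = df.replace(r'\\n',' ', regex=True)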

This worked for the dirty data I had.
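Whether `'\n'` or `r'\\n'` is the right pattern depends on what the cells actually contain: a real newline character, or a literal backslash followed by the letter n (which scraped or re-escaped text often has). One possible way to check is sketched below; the sample Series is made up for illustration.

```python
import pandas as pd

# Hypothetical data: row 0 holds a real newline, row 1 a literal backslash + "n".
s = pd.Series(["simulation\nengines", r"simulation\nengines"], name="text")

# regex=False makes str.contains do a plain substring search.
has_real_newline = s.str.contains("\n", regex=False)
has_literal_backslash_n = s.str.contains("\\n", regex=False)

print(has_real_newline.tolist())         # [True, False]
print(has_literal_backslash_n.tolist())  # [False, True]
```

Rows flagged by the second check are the ones that need the `r'\\n'` pattern.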

0

Harshini Kanukuntla June 12. 19 at 21:56

jezrael · Accepted Answer · 2017-05-28T13:22:25+0000

EDIT: The correct answer to this was:

df = df.replace(r'\\n',' ', regex=True)

I think you need replace:

df = df.replace('\n','', regex=True)

Or:

df = df.replace('\n',' ', regex=True)

Or:

df = df.replace(r'\\n',' ', regex=True)

Sample:

import pandas as pd

text = '''hands-on\ndev nologies\nrelevant scripting\nlang
'''
df = pd.DataFrame({'A':[text]})
print (df)
                                                   A
0  hands-on\ndev nologies\nrelevant scripting\nla...

df = df.replace('\n',' ', regex=True)
print (df)
                                                A
0  hands-on dev nologies relevant scripting lang
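
When it is unclear which form the data uses, or when both forms are mixed, a single alternation pattern can cover both cases. This is a hedged sketch on made-up two-row data, not the data from the question:

```python
import pandas as pd

# Row 0 contains a real newline; row 1 contains a literal backslash + "n".
df = pd.DataFrame({"A": ["hands-on\ndev", r"scripting\nlang"]})

# The pattern matches either a literal backslash + "n" or a real newline.
cleaned = df.replace(r"\\n|\n", " ", regex=True)

print(cleaned["A"].tolist())  # ['hands-on dev', 'scripting lang']
```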
