Removing newlines from messy strings in pandas DataFrame cells?

I have tried several ways of splitting and stripping the strings in my pandas DataFrame to remove all "\n" characters, but for some reason the ones attached to other words are not removed, even though I split on them. I have a pandas DataFrame with a column that holds text scraped from web pages using BeautifulSoup. The text has already been cleaned up fairly well, but the cleanup could not remove the newlines attached to other characters. My strings look something like this:

". We will explore various software technologies \n relevant to games, including programming languages, scripting \nlanguages, operating systems, file systems, networking, simulation \nengines, and multimedia design systems. We will also explore some of the fundamental scientific concepts from computer science and related areas including "

Is there a simple Python way to remove these "\n" characters?

Thanks in advance!



3 answers


EDIT: The correct answer to this was:

df = df.replace(r'\\n',' ', regex=True) 

      

I think you need DataFrame.replace:

df = df.replace('\n','', regex=True)

      

Or:



df = df.replace('\n',' ', regex=True)

      

Or:



df = df.replace(r'\\n',' ', regex=True)

      

Sample:

text = '''hands-on\ndev nologies\nrelevant scripting\nlang
'''
df = pd.DataFrame({'A':[text]})
print (df)
                                                   A
0  hands-on\ndev nologies\nrelevant scripting\nla...

df = df.replace('\n',' ', regex=True)
print (df)
                                                A
0  hands-on dev nologies relevant scripting lang 
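
The alternatives above differ in what they match: '\n' targets a real newline character, while r'\\n' targets a literal backslash followed by the letter n, which is what the question's edit suggests the scraped text contained. The following is only a sketch with made-up sample cells (not the asker's data) showing how to tell the two cases apart and how one pattern could cover both:

```python
import pandas as pd

# Hypothetical sample: one cell with a real newline character,
# one cell with a literal backslash followed by the letter "n".
df = pd.DataFrame({
    "A": [
        "scripting\nlanguages",    # real newline (1 character)
        r"scripting\nlanguages",   # backslash + n (2 characters)
    ]
})

# repr() makes the difference visible: '\n' versus '\\n'.
for value in df["A"]:
    print(repr(value))

# '\n' as a regex matches only the real newline;
# r'\\n' matches only the literal backslash-n sequence.
only_real = df.replace("\n", " ", regex=True)
only_literal = df.replace(r"\\n", " ", regex=True)

# An alternation handles both kinds in a single pass.
both = df.replace(r"\\n|\n", " ", regex=True)
print(both["A"].tolist())
```

If repr() of a cell shows a doubled backslash, the text holds the two-character sequence and the r'\\n' pattern is the one that applies.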

      



With messy data, it may be a good idea to remove all whitespace: df.replace(r'\s', '', regex=True, inplace=True). Note that this also removes the ordinary spaces between words.
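
Since removing every whitespace character also joins the words together, a gentler variant is to collapse each run of whitespace into a single space. This is only a sketch on an invented sample string; the variable names are illustrative and the trimming pattern is one possible choice, not part of the answer above:

```python
import pandas as pd

# Hypothetical sample with mixed whitespace: newlines, tabs, repeated spaces.
df = pd.DataFrame({"A": ["  file systems,\n networking,\t\tsimulation \nengines  "]})

# Deleting every whitespace character also glues the words together.
squeezed = df.replace(r"\s", "", regex=True)

# Collapsing each whitespace run to one space keeps the words separate;
# the second pattern trims what is left at the start and end of the cell.
collapsed = df.replace(r"\s+", " ", regex=True).replace(r"^ | $", "", regex=True)

print(squeezed.loc[0, "A"])
print(collapsed.loc[0, "A"])
```

Note that \s matches real whitespace only, so it would not touch a literal backslash-n sequence like the one in the question.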





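   import pandas as pd

   # Raw string: the text holds a literal backslash followed by "n", not real newlines
   df = pd.DataFrame({'A': [r'Sarah Marie Wimberly So so beautiful!!!\nAbram Staten You guys look good man.\nTJ Sloan I miss you guys\n']})

   # r'\\n' matches the two-character sequence backslash + n
   df = df.replace(r'\\n',' ', regex=True)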

      

This worked for the dirty data I had.
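
When only one column holds the scraped text, the same pattern can be applied to that column alone so that other columns stay untouched. A minimal sketch, assuming hypothetical column names "url" and "text" that do not come from the question:

```python
import pandas as pd

# Hypothetical frame: "text" holds scraped strings, "url" must stay untouched.
df = pd.DataFrame({
    "url": [r"http://example.com/\new"],
    "text": [r"I miss you guys\nYou guys look good"],
})

# Series.str.replace works on one column; regex=True is passed explicitly
# because the default for this argument changed between pandas versions.
df["text"] = df["text"].str.replace(r"\\n", " ", regex=True)

print(df)
```

A frame-wide df.replace with the same pattern would also have altered the "url" value here, because it happens to contain a backslash followed by n.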
