Removing newlines from messy strings in pandas DataFrame cells?
I have tried several ways of splitting and stripping the strings in my pandas DataFrame to remove all "\n" characters, but for some reason the ones attached to other words are not removed, even after I split on them. I have a pandas DataFrame with a column that holds text scraped from web pages using BeautifulSoup. The text has already been cleaned up, but that cleanup failed to remove the newlines attached to other characters. My strings look something like this:
". We will explore various software technologies\nrelevant to games, including programming languages, scripting\nlanguages, operating systems, file systems, networking, simulation\nengines, and multimedia design systems. We will also explore some of the fundamental scientific concepts from computer science. and related areas including "
Is there a simple Python way to remove these "\n" characters?
Thanks in advance!
EDIT: The correct answer to this was:
df = df.replace(r'\\n',' ', regex=True)
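A likely reason the doubled backslash was needed is that the cells held a literal backslash followed by the letter n (two characters) rather than a real newline character. The sketch below shows one way to check which case applies; the column name text and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the first cell holds a real newline, the second holds
# a literal backslash followed by the letter n (two separate characters).
df = pd.DataFrame({"text": ["scripting\nlanguages", "scripting\\nlanguages"]})

# regex=False makes str.contains search for the exact substring.
has_real_newline = df["text"].str.contains("\n", regex=False)
has_literal_backslash_n = df["text"].str.contains("\\n", regex=False)

print(has_real_newline.tolist())         # [True, False]
print(has_literal_backslash_n.tolist())  # [False, True]
```

If the second check is True for your rows, a plain '\n' pattern will not match them, and the r'\\n' pattern is the one to use.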
I think you need replace:
df = df.replace('\n','', regex=True)
Or:
df = df.replace('\n',' ', regex=True)
Or:
df = df.replace(r'\\n',' ', regex=True)
Sample:
import pandas as pd

text = '''hands-on\ndev nologies\nrelevant scripting\nlang
'''
df = pd.DataFrame({'A':[text]})
print (df)
A
0 hands-on\ndev nologies\nrelevant scripting\nla...
df = df.replace('\n',' ', regex=True)
print (df)
A
0 hands-on dev nologies relevant scripting lang
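If only one column needs cleaning, or if the data might mix real newlines with literal backslash-n sequences, a single pattern can cover both. This is a minimal sketch, assuming a hypothetical column named text; it uses Series.str.replace rather than DataFrame.replace so that only that column is touched.

```python
import pandas as pd

# Hypothetical column containing both kinds of "newline".
df = pd.DataFrame({"text": ["simulation\nengines", "scripting\\nlanguages"]})

# r"\\n|\n" matches either a literal backslash followed by n,
# or a real newline character.
df["text"] = df["text"].str.replace(r"\\n|\n", " ", regex=True)

print(df["text"].tolist())  # ['simulation engines', 'scripting languages']
```

The literal alternative is listed first only for readability; the two alternatives match different character sequences, so their order does not change the result.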