Remove full URLs from a text file with Unix awk / sed / grep
I have a text file of tweets and I am having trouble removing the full URLs. Sample text file:
this is a tweet that has info. http://google.com
this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq
I would like to create a new file that only has:
this is a tweet that has info.
this is a tweet that has an image.
I am working with grep now, and so far I have:
grep -oP "http://\K[^']+" final.txt
Thanks!
+3
Michael Vieth
2 answers
sed 's/http[^ ]*//g' YourFile
[^ ]* matches any run of non-space characters, so everything from http up to the next space is removed.
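To see what this substitution does on the sample from the question, here is a minimal sketch. The file names tweets.txt and no_http.txt are made up for illustration, and the second -e expression (trimming the space left behind where the URL used to be) is an addition, not part of the answer above.

```shell
# Hypothetical sample file reproducing the two tweets from the question.
printf '%s\n' \
  'this is a tweet that has info. http://google.com' \
  'this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq' > tweets.txt

# 1st expression: delete every token that starts with "http".
# 2nd expression (an addition): strip the trailing spaces left behind.
sed -e 's/http[^ ]*//g' -e 's/ *$//' tweets.txt > no_http.txt

cat no_http.txt
```

Note that the second tweet keeps its pic.twitter.com/a2y4H1b2Jq link, because that token does not start with http. Removing it as well needs a broader pattern.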
+1
josifoski
It depends on how restrictive you want to be.
Full URLs that start with http and contain a dot, delimited by word boundaries:
sed -e 's|\bhttp[^ ]*\.[^ ]*\b||g' test.html
Any space-delimited token with a dot inside it, delimited by word boundaries:
sed -e 's|\b[^ ]*\.[^ ]*\b||g' test.html
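The \b word-boundary escape used above is a GNU sed extension and may not work in other sed implementations. As a rough alternative, the same idea can be sketched in awk by dropping whole fields. This is only an illustration under assumptions: the file names tweets.txt and clean.txt are hypothetical, and the rule "a dot with a non-dot character on both sides" is one heuristic for what counts as a URL.

```shell
# Hypothetical sample file reproducing the two tweets from the question.
printf '%s\n' \
  'this is a tweet that has info. http://google.com' \
  'this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq' > tweets.txt

awk '{
  out = ""
  for (i = 1; i <= NF; i++) {
    # Skip fields that start with http:// or https://, or that have a dot
    # sitting between two non-dot characters (e.g. pic.twitter.com/...).
    if ($i ~ /^https?:\/\// || $i ~ /[^.]\.[^.]/) continue
    # Rebuild the line from the remaining fields, joined by single spaces.
    out = (out == "" ? $i : out " " $i)
  }
  print out
}' tweets.txt > clean.txt

cat clean.txt
```

A sentence-ending word such as "info." survives because nothing follows its dot, but the heuristic also drops tokens like "3.5" or "e.g.", and it collapses runs of whitespace into single spaces.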
+1
nullman