Remove full URLs from a text file with Unix awk / sed / grep

I have a text file of tweets and I am having trouble removing the full URLs. Sample text file:

index.html:

this is a tweet that has info. http://google.com
this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq

      

I would like to create a new file that only has:

this is a tweet that has info.
this is a tweet that has an image.

      

I am working with grep now, and so far I have:

grep -oP "http://\K[^']+" final.txt

      

Thanks!



2 answers


sed 's/http[^ ]*//g' YourFile  

      



[^ ]* matches any run of characters that are not spaces, so the substitution deletes everything from http up to the next space.
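As a rough sketch of how this plays out on the sample from the question: the http pattern alone leaves the pic.twitter.com link and a trailing space behind, so the version below adds two more expressions. The file names tweets.txt and cleaned.txt and the function name clean_tweets are made up for illustration, and the pic.twitter.com rule is an assumption about which scheme-less links need to go.

```shell
# Recreate the two sample tweets from the question in a scratch file.
printf '%s\n' \
  'this is a tweet that has info. http://google.com' \
  'this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq' > tweets.txt

# 1st expression: drop any run of non-space characters starting at "http".
# 2nd expression: drop pic.twitter.com links, which carry no scheme.
# 3rd expression: trim the spaces left behind at the end of each line.
clean_tweets() {
  sed -e 's/http[^ ]*//g' \
      -e 's/pic\.twitter\.com[^ ]*//g' \
      -e 's/ *$//' "$1"
}

clean_tweets tweets.txt > cleaned.txt
cat cleaned.txt
```

Note that the first expression matches "http" anywhere in a token, not only at its start, so a word that happens to contain "http" would be cut as well.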



It depends on how restrictive you want to be.

Full URLs that start with http and contain a dot, delimited by word boundaries (\b):

sed -e 's|\bhttp[^ ]*\.[^ ]*\b||g' test.html

      



Any space-delimited token with a dot in it, again bounded by \b:

sed -e 's|\b[^ ]*\.[^ ]*\b||g' test.html
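Both patterns rely on \b, which as far as I know is a GNU sed extension and may not work in other sed implementations. If that is a concern, the same "drop any token that looks like a link" idea can be sketched with awk, which the question also mentions. The file names and the function name strip_links are hypothetical, and the rule for what counts as a link (an http prefix, or a dot followed by a letter or digit) is an assumption, not something taken from the answers.

```shell
# Recreate the two sample tweets from the question in a scratch file.
printf '%s\n' \
  'this is a tweet that has info. http://google.com' \
  'this is a tweet that has an image. pic.twitter.com/a2y4H1b2Jq' > tweets.txt

# Rebuild each line from its whitespace-separated tokens, skipping tokens
# that look like links. A sentence-ending dot ("info.") is kept because
# nothing follows the dot inside that token.
strip_links() {
  awk '{
    out = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^https?:\/\// || $i ~ /\.[A-Za-z0-9]/) continue
      out = (out == "" ? $i : out " " $i)
    }
    print out
  }' "$1"
}

strip_links tweets.txt > cleaned_awk.txt
cat cleaned_awk.txt
```

Because the line is rebuilt from fields, runs of spaces collapse to a single space and no trailing space is left. The dot rule has the same weakness as the looser sed pattern: tokens such as "3.5" or "e.g." would be removed too.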

      
