How to remove Arabic hashtags?
I am reading tweets from Twitter with Twitter4j and trying to strip the hashtags from them. After extracting the tweet text as a string, I have this line: "892698363371638784: RT @hikids_ksa: ุงููุนุจุฉ ุฎุทูุฑุฉ ู ุฑุง ููุจู ููุง ู ุฑุง ููุจู ููุง ู ุฑุง ููุจู ููุง ู ุฑุง ููุจู ููุง ู ู ุชูููุฑ ู ู ูุงุฑุฉ๐๐ป๐ก ู ุชููุฑุฉ ูู # ู ุชุฌุฑ_ูุงู_ููุฏุฒ_ุงูุงููุชุฑููู .. "
Using Java, I want to remove ู ุชุฌุฑ_ูุงู_ููุฏุฒ_ุงูุงููุชุฑููู because it has a hashtag sign (#) after it.
The problem is that my code did not work on this input: "@kaskasomar ููุฏุง ุจูุง ู ุฎ ู ุชู ู ุชู ุบูุฑู ุจูุฎูู ุงูุดุนุจ ุงููุจูุงูู ูุจูุชูู ู ุจุงูุงุฑูุงุจ ุจุณ ูุงูุฑุฃูู ุจููุงุช"
The ุณุฎูู part was not removed for some reason. This is my method:
static String removeHashtags(String in)
{
    in = in.replaceAll("#[A-Za-z]+", ""); // remove English hashtags
    in = in.replaceAll("[ุฃ-ู]#+", ""); // remove Arabic hashtags that have # before it
    return in.replaceAll("#[ุฃ-ู]+", ""); // remove Arabic hashtags that have # after it
}
If you are just trying to remove all hashtags in any language, you can write:
in = in.replaceAll("#\\p{IsAlphabetic}+", "");
If you specifically want to remove Arabic hashtags, you can write:
in = in.replaceAll("#\\p{IsArabic}+", "");
This way you don't have to worry about writing separate left-to-right and right-to-left regexes, which also makes your code more readable.
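A single pattern is enough because Java strings hold text in logical (typing) order, so the # comes before the tag's letters in the string even when it is displayed to the right of an Arabic word. Note that the two patterns above stop at the first underscore or digit, so a tag made of several words joined by underscores would only be partly removed. The following is a minimal sketch that extends them, assuming a hashtag body may contain letters, combining marks, decimal digits and underscores; the class and method names are hypothetical, and Twitter's actual hashtag rules may differ.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not part of Twitter4j.
final class HashtagRemover {

    // '#' followed by letters, combining marks, decimal digits or underscores in any script.
    // Assumption: this character set approximates what a hashtag may contain.
    private static final Pattern ANY_HASHTAG =
            Pattern.compile("#[\\p{L}\\p{M}\\p{Nd}_]+");

    // Same idea, but the first character after '#' must belong to the Arabic script,
    // so hashtags that start with a Latin letter are left alone.
    private static final Pattern ARABIC_HASHTAG =
            Pattern.compile("#\\p{IsArabic}[\\p{IsArabic}\\p{M}\\p{Nd}_]*");

    private HashtagRemover() {
    }

    // Removes every hashtag, whatever its script.
    static String removeAllHashtags(String in) {
        return ANY_HASHTAG.matcher(in).replaceAll("");
    }

    // Removes only hashtags whose text starts with an Arabic-script character.
    static String removeArabicHashtags(String in) {
        return ARABIC_HASHTAG.matcher(in).replaceAll("");
    }
}
```

Calling HashtagRemover.removeArabicHashtags(tweetText) leaves English hashtags in place, while HashtagRemover.removeAllHashtags(tweetText) strips both kinds. Neither method collapses the whitespace left behind, so you may want to normalize spaces afterwards.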