Replacing numbers with zeros in golang

I want to replace all numbers in a string with zeros, and ideally each run of consecutive digits should be replaced with a single zero. For example,
abc826def47

should become
abc0def0

I tried two methods:
Using regex:

import "regexp"

var numbersRegExp = regexp.MustCompile("[0-9]+")

func normalizeNumbers(str string) string {
    return numbersRegExp.ReplaceAllString(str, "0")
}

Using strings.Replace:

import s "strings"
func normalizeNumbers(str string) string{
    str = s.Replace(str, "1", "0", -1)
    str = s.Replace(str, "2", "0", -1)
    str = s.Replace(str, "3", "0", -1)
    str = s.Replace(str, "4", "0", -1)
    str = s.Replace(str, "5", "0", -1)
    str = s.Replace(str, "6", "0", -1)
    str = s.Replace(str, "7", "0", -1)
    str = s.Replace(str, "8", "0", -1)
    str = s.Replace(str, "9", "0", -1)
    // Replace matches are non-overlapping, so this only halves each run of zeros.
    str = s.Replace(str, "00", "0", -1)
    return str
}

The second method, which does not use a regex, seems to be a little faster, but it is still very slow when working with about 100k lines, and it does not fully collapse runs of consecutive digits. Is there a better way to do this?


1 answer


The fastest solution is (always) to build the output on the fly. This requires looping over the runes of the input only once, and with a properly sized initial output "buffer" (a []rune in this case), you can also avoid reallocation.

Here's the implementation:

func repNums(s string) string {
    out := make([]rune, len(s)) // len(s) is the byte count, not the rune count; it is an upper bound

    i, added := 0, false
    for _, r := range s {
        if r >= '0' && r <= '9' {
            if added {
                continue
            }
            added, out[i] = true, '0'
        } else {
            added, out[i] = false, r
        }
        i++
    }
    return string(out[:i])
}

      

Testing:



fmt.Printf("%q\n", repNums("abc826def47")) // "abc0def0"
fmt.Printf("%q\n", repNums("1234"))        // "0"
fmt.Printf("%q\n", repNums("asdf"))        // "asdf"
fmt.Printf("%q\n", repNums(""))            // ""
fmt.Printf("%q\n", repNums("a12b34c9d"))   // "a0b0c0d"

      

Try it on the Go Playground.
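To check the speed difference on your own data, a rough benchmark along these lines may help. It is only a sketch: the sample line, the normalizeRegexp name, and calling testing.Benchmark from main (rather than writing a _test.go file) are choices made here for illustration, and no particular timings are implied.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

var numbersRegExp = regexp.MustCompile("[0-9]+")

// normalizeRegexp is the regex approach from the question (renamed here).
func normalizeRegexp(s string) string {
	return numbersRegExp.ReplaceAllString(s, "0")
}

// repNums is the single-pass approach from the answer.
func repNums(s string) string {
	out := make([]rune, len(s)) // byte count: an upper bound on the rune count
	i, added := 0, false
	for _, r := range s {
		if r >= '0' && r <= '9' {
			if added {
				continue // still inside a run of digits
			}
			added, out[i] = true, '0'
		} else {
			added, out[i] = false, r
		}
		i++
	}
	return string(out[:i])
}

func main() {
	// Assumed sample input; substitute lines from your real data.
	line := strings.Repeat("abc826def47 ", 20)

	// Sanity check: both approaches should agree before timing them.
	if normalizeRegexp(line) != repNums(line) {
		panic("implementations disagree")
	}

	cases := []struct {
		name string
		fn   func(string) string
	}{
		{"regexp", normalizeRegexp},
		{"repNums", repNums},
	}
	for _, c := range cases {
		res := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				c.fn(line)
			}
		})
		fmt.Printf("%-8s %s\n", c.name, res)
	}
}
```

Each output line reports the iteration count and the time per operation for one approach; results depend heavily on the machine and on how digit-heavy the input is.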

Notes:

  • I estimated the size of the output buffer (the rune count) with len(s), which is the byte count of the input, not its rune count. This is an upper estimate, but it costs nothing to compute. You can use utf8.RuneCountInString() to get the exact number of runes in the input string if you want to (but this decodes and iterates over the runes of the input string, so it's not really worth it).
  • I test for digits with the condition r >= '0' && r <= '9'. Alternatively, you can use unicode.IsDigit(), which also accepts non-ASCII decimal digits.
  • Depending on the nature of your input strings, if inputs that contain no digits are frequent (so the output equals the input), you can improve performance by first checking whether the input contains a digit, and if not, simply returning the input string (which is safe because strings are immutable).
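The early-return idea from the last note could look roughly like this. It is a minimal sketch: the wrapper name repNumsFast is hypothetical, and strings.ContainsAny is just one way to do the digit check. Whether the extra scan pays off depends on how many of your inputs are digit-free.

```go
package main

import (
	"fmt"
	"strings"
)

// repNums is the single-pass function from the answer: each run of ASCII
// digits is collapsed into a single '0'.
func repNums(s string) string {
	out := make([]rune, len(s)) // byte count: an upper bound on the rune count
	i, added := 0, false
	for _, r := range s {
		if r >= '0' && r <= '9' {
			if added {
				continue // still inside a run of digits
			}
			added, out[i] = true, '0'
		} else {
			added, out[i] = false, r
		}
		i++
	}
	return string(out[:i])
}

// repNumsFast (hypothetical name) returns the input unchanged when it
// contains no ASCII digit, skipping the buffer allocation entirely.
func repNumsFast(s string) string {
	if !strings.ContainsAny(s, "0123456789") {
		return s // strings are immutable, so returning the input is safe
	}
	return repNums(s)
}

func main() {
	fmt.Printf("%q\n", repNumsFast("asdf"))        // "asdf"
	fmt.Printf("%q\n", repNumsFast("abc826def47")) // "abc0def0"
}
```

Inputs that do contain digits are scanned twice (once by the check, once by repNums), so this only helps when digit-free inputs are common.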