Google Safe Browsing API URL encoding (canonicalization)

In my application, I check user-entered URLs for malware by submitting them to Google.

To check that I was getting a "malware detected" response, I used the test URL http://malware.testing.google.test/testing/malware

To my surprise, this URL was not flagged as malware.

I then found that when I add a trailing slash, it is flagged as malware.

The documentation says the URL must be canonicalized.

Does anyone know how to meet this requirement (preferably in C#)?

+3




2 answers


Using the link ForguesR provided, I created this C# implementation.

It passes 26 of the 33 tests from the Google test suite found at: https://developers.google.com/safe-browsing/developers_guide_v3#Canonicalization



This was considered good enough for production, since the cases it does not handle are the more unusual URLs.

Code: https://dotnetfiddle.net/xO9sWl
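
For readers who want the gist without opening the fiddle, here is a minimal sketch of the main canonicalization steps described in the v3 guide, written in Python for brevity (the same steps should port to C# fairly directly). It is not the code from the fiddle. The names `canonicalize`, `_unescape_fully` and `_escape` are made up for this illustration, and it deliberately skips IP-address normalization, user info and other edge cases, so it will not pass the full test suite either.

```python
import re
from urllib.parse import unquote_to_bytes


def _unescape_fully(data: bytes) -> bytes:
    """Percent-decode repeatedly until the value stops changing."""
    while True:
        decoded = unquote_to_bytes(data)
        if decoded == data:
            return data
        data = decoded


def _escape(data: bytes) -> str:
    """Escape bytes <= 0x20, >= 0x7F, '#' and '%' using uppercase hex."""
    return "".join(
        f"%{b:02X}" if b <= 0x20 or b >= 0x7F or b in b"#%" else chr(b)
        for b in data
    )


def canonicalize(url: str) -> str:
    """Simplified sketch; IP normalization and user info are NOT handled."""
    # Strip surrounding spaces, remove tab/CR/LF, drop the fragment.
    url = re.sub(r"[\t\r\n]", "", url.strip())
    url = url.split("#", 1)[0]
    # Assumption: a URL without a scheme is treated as http.
    if "://" not in url:
        url = "http://" + url
    scheme, rest = url.split("://", 1)
    rest, has_query, query = rest.partition("?")
    host, _, path = rest.partition("/")

    # Host: unescape, lowercase, trim dots, collapse runs of dots.
    host_b = _unescape_fully(host.encode("utf-8")).lower()
    host_b = re.sub(rb"\.+", b".", host_b.strip(b"."))

    # Path: unescape, resolve "." and "..", collapse repeated slashes.
    parts = _unescape_fully(("/" + path).encode("utf-8")).split(b"/")
    is_dir = parts[-1] in (b"", b".", b"..")
    kept = []
    for part in parts:
        if part == b"..":
            if kept:
                kept.pop()
        elif part not in (b"", b"."):
            kept.append(part)
    path_b = b"/" + b"/".join(kept)
    if is_dir and kept:
        path_b += b"/"  # keep the trailing slash of a directory-style path

    result = f"{scheme.lower()}://{_escape(host_b)}{_escape(path_b)}"
    if has_query:
        # The query is unescaped and re-escaped, but not path-normalized.
        result += "?" + _escape(_unescape_fully(query.encode("utf-8")))
    return result


print(canonicalize("http://www.GOOgle.com/blah/.."))
print(canonicalize("http://host/%25%32%35"))
```

Note that a sketch like this only adds a trailing slash when the path is empty, so `/testing/malware` and `/testing/malware/` remain two different URLs after canonicalization.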

+2




I am working on the same problem right now, and the only implementation I have found is the Java one in the jGoogleSafeBrowsing library. Unfortunately, it is tied to the v2 API.

Anyway, you can look at the canonicalization code here. Please be aware that:



+3








