Valid character range for base 64 encoding

I am interested in the following:
Is there a list of characters that will never occur as part of a Base64-encoded string?

For example, '*'. I'm not sure whether it can appear or not. If the original input actually contained '*', would it be encoded as something else?

+30




4 answers


Here's what I can find: RFC 4648

It includes this handy table:

                  Table 1: The Base 64 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 A            17 R            34 i            51 z
     1 B            18 S            35 j            52 0
     2 C            19 T            36 k            53 1
     3 D            20 U            37 l            54 2
     4 E            21 V            38 m            55 3
     5 F            22 W            39 n            56 4
     6 G            23 X            40 o            57 5
     7 H            24 Y            41 p            58 6
     8 I            25 Z            42 q            59 7
     9 J            26 a            43 r            60 8
    10 K            27 b            44 s            61 9
    11 L            28 c            45 t            62 +
    12 M            29 d            46 u            63 /
    13 N            30 e            47 v
    14 O            31 f            48 w         (pad) =
    15 P            32 g            49 x
    16 Q            33 h            50 y

      



Thus, a regex that matches any character that should never appear in Base 64 encodings would be:

[^A-Za-z0-9+/=]
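
As a minimal sketch, the same character class can be applied in Python to list any characters that fall outside the standard alphabet. The helper name invalid_base64_chars is made up for illustration, and the pattern assumes the RFC 4648 alphabet from Table 1 rather than one of the variants.

```python
import re

# Matches any character outside the standard RFC 4648 alphabet (plus padding).
NOT_BASE64 = re.compile(r"[^A-Za-z0-9+/=]")

def invalid_base64_chars(text):
    """Return the characters in text that cannot occur in standard Base64."""
    return NOT_BASE64.findall(text)

print(invalid_base64_chars("Kg=="))   # []
print(invalid_base64_chars("Kg*=="))  # ['*']
```

An empty result only means that every character belongs to the alphabet. It does not prove the string is well-formed Base64; a misplaced = would still pass.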

      

However, as kappa's answer points out, this is only a recommendation. Specific implementations may choose a different set of 64 characters. (In fact, the same RFC contains an alternate "URL and Filename safe" table that replaces the characters for values 62 and 63 with - and _ respectively.) So I think it really depends on the implementation that created the encoding.
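
To illustrate the difference between the two RFC 4648 tables, here is a short sketch using Python's standard base64 module. The input bytes are an arbitrary choice, picked only because their bit pattern produces values 62 and 63.

```python
import base64

# 0xFB 0xFF splits into the 6-bit groups 62, 63, 60,
# so both of the variant-specific characters show up.
raw = bytes([0xFB, 0xFF])

print(base64.b64encode(raw))          # b'+/8='  (standard alphabet)
print(base64.urlsafe_b64encode(raw))  # b'-_8='  (URL and filename safe alphabet)
```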

+64




You are probably safe with the other answers in most situations, but according to the Wikipedia article on Base64, there is no single list you can rely on:

The exact choice of character set chosen for the 64 characters required for the base is implementation dependent.



RFC 4648 mentions other alphabets, such as the "URL and Filename safe" Base 64 alphabet, where + and / are replaced with - and _.

Here's a table of Base64 variants that use different characters. Keep in mind that there are also implementation-specific rules for line separators, which you can find in the same table. Some implementations, like MIME, even allow (and ignore) characters that are not in the alphabet.
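
As a rough illustration of the "allow and ignore" behaviour, Python's base64.b64decode discards non-alphabet characters by default and rejects them only when validate=True is passed. This is a sketch of one decoder's behaviour, not a statement about how any particular MIME implementation works; the helper try_strict is hypothetical.

```python
import base64
import binascii

noisy = b"K*g=="  # "Kg==" with a stray '*' inserted

# Default (lenient) mode: characters outside the alphabet are skipped.
print(base64.b64decode(noisy))  # b'*'

def try_strict(data):
    """Decode with validation; return None if a non-alphabet character is found."""
    try:
        return base64.b64decode(data, validate=True)
    except binascii.Error:
        return None

print(try_strict(noisy))    # None
print(try_strict(b"Kg=="))  # b'*'
```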

+12




Base64 only contains A–Z, a–z, 0–9, +, / and =. So the list of characters that will not be used is all possible characters minus the ones above.

For special purposes, . and _ are also possible.

+7




https://en.wikipedia.org/wiki/Base64#Design

The MIME Base64 implementation uses A–Z, a–z and 0–9 for the first 62 values.

So, for the most part, you should only expect alphanumeric characters. The sample table in that article also shows "+" and "/"; you are unlikely to see '*'.

You can use http://www.motobit.com/util/base64-decoder-encoder.asp to convert to Base64. For example, for '*' it returns "Kg==".
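
The same result can be reproduced locally. The sketch below uses Python's standard base64 module (an assumption, since the answer only names the online converter) and also checks that encoding every possible byte value yields nothing outside the standard alphabet.

```python
import base64
import string

encoded = base64.b64encode(b"*")
print(encoded)                    # b'Kg=='
print(base64.b64decode(encoded))  # b'*'

# Encode all 256 byte values and collect the characters that appear.
allowed = set(string.ascii_letters + string.digits + "+/=")
seen = set(base64.b64encode(bytes(range(256))).decode("ascii"))
print(seen <= allowed)            # True
print("*" in seen)                # False
```

So a '*' in the input is carried by ordinary alphabet characters in the output and never appears literally.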

+2

