Variable char encoding of bytes

23 07 2010

I may not be looking in the right place, but often I want to take a byte[] and convert it into a char[] where the char representation comes from a set of chars that I decide on. This is not for false encryption or obfuscation, I just want a safe compact representation of keys. Hex is ok, but bulky and does no give much flexibility. I couldn’t find anything off the shelf that would work with random encoding sequences and random byte[] lengths so here is what I quickly put together. Published for any one who also couldn’t find anything off the shelf. It benchmarks at about 110K encodings per second (9ns) on a Mac Book Pro Java16 and is reasonable memory efficient. It could be made more efficient, but that would require more cpu.

    /**
   * Generate an encoded array of chars using as few chars as possible
   *
   * @param hash
   *          the hash to encode
   * @param encode
   *          a char array of encodings any length you lik but probably but the shorter it
   *          is the longer the result. Dont be dumb and use an encoding size of < 2.
   * @return
   */
  public static String encode(byte[] hash, char[] encode) {
    StringBuilder sb = new StringBuilder(hash.length);
    int x = (int)(hash[0]+128);
    int xt = 0;
    int i = 0;
    while(i < hash.length) {
      if ( x < encode.length) {
        i++;
        if ( i < hash.length ) {
          if ( x == 0 ) {
            x = (int)(hash[i]+128);
          } else {
            x = (x+1)*(int)(hash[i]+128);
          }
        } else {
          sb.append(encode[x]);
          break;
        }
      }
      xt = x%encode.length;
      x = x/encode.length;
      sb.append(encode[xt]);
    }
    return sb.toString();
  }

And a unit test that checks the operation, and checks for collisions (AFAICT the algorithm is complete, but I could have missed something).

  @Test
  public void testEncoding() {
    SecureRandom sr = new SecureRandom();
    String encoding = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
    for (int j = 10; j < encoding.length(); j++) {
      for (int i = 1; i < 100; i++) {
        byte[] b = new byte[i];
        sr.nextBytes(b);
        System.err.println(StringUtils.encode(b, encoding.substring(0, j).toCharArray()));
      }
    }
  }

  /**
   *  This is a very long running test, do not enable unless you want to wait a long time.
   */
  @Test
  public void testEncodingCollision() {
    SecureRandom sr = new SecureRandom();
    String encoding = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    long l = 200000;
    long m = 1000;
    Set<String> check = new HashSet<String>((int)l);
    for (long j = 0; j < m; j++) {
      long s = System.currentTimeMillis();
      check.clear();
      for (long i = 0; i < l; i++) {
        byte[] b = new byte[20];
        sr.nextBytes(b);

        String e = StringUtils.encode(b, encoding.toCharArray());

        if (check.contains(e)) {
          Assert.fail();
        }
        check.add(e);
      }
      long e = System.currentTimeMillis();
      long t = (1000*(e-s+1))/l;
      long o = (1000*l)/(e-s+1);
      long tleft = (t*l*(m-j))/1000000;
      System.err.println("No Collisions after "+l*j+" operations of "+l*m+" "+((100*l*j)/(l*m))+" % at "+t+" ns/op "+o+" ops/s eta "+tleft);
    }
  }

If you want to use this, feel free and are worried about license consider the code Apache 2 Licensed (c) Ian Boston. If you do use this and find a problem or if you know of something out there in a standard lib that does the same, please comment.

About these ads

Actions

Information

3 responses

23 07 2010
Ian

The encoded length appears to converge on about 110% of the byte length for encoding keys of > 50. Obviously for small hash sizes and key lengths (eg 10) its closer to 200%

23 07 2010
Odi

The initial StringBuilder size could be initialized a bit smarter such that no growing (allocation, copy) will be necessary.

23 07 2010
Ian

good point, looking at the runs 1.2x the byte length should cover most cases. Thanks

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 103 other followers

%d bloggers like this: