Variable char encoding of bytes

23 07 2010

I may not be looking in the right place, but often I want to take a byte[] and convert it into a char[] where the char representation comes from a set of chars that I decide on. This is not for false encryption or obfuscation, I just want a safe compact representation of keys. Hex is ok, but bulky and does no give much flexibility. I couldn’t find anything off the shelf that would work with random encoding sequences and random byte[] lengths so here is what I quickly put together. Published for any one who also couldn’t find anything off the shelf. It benchmarks at about 110K encodings per second (9ns) on a Mac Book Pro Java16 and is reasonable memory efficient. It could be made more efficient, but that would require more cpu.

    /**
   * Generate an encoded array of chars using as few chars as possible
   *
   * @param hash
   *          the hash to encode
   * @param encode
   *          a char array of encodings any length you lik but probably but the shorter it
   *          is the longer the result. Dont be dumb and use an encoding size of < 2.
   * @return
   */
  public static String encode(byte[] hash, char[] encode) {
    StringBuilder sb = new StringBuilder(hash.length);
    int x = (int)(hash[0]+128);
    int xt = 0;
    int i = 0;
    while(i < hash.length) {
      if ( x < encode.length) {
        i++;
        if ( i < hash.length ) {
          if ( x == 0 ) {
            x = (int)(hash[i]+128);
          } else {
            x = (x+1)*(int)(hash[i]+128);
          }
        } else {
          sb.append(encode[x]);
          break;
        }
      }
      xt = x%encode.length;
      x = x/encode.length;
      sb.append(encode[xt]);
    }
    return sb.toString();
  }

And a unit test that checks the operation, and checks for collisions (AFAICT the algorithm is complete, but I could have missed something).

  @Test
  public void testEncoding() {
    SecureRandom sr = new SecureRandom();
    String encoding = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
    for (int j = 10; j < encoding.length(); j++) {
      for (int i = 1; i < 100; i++) {
        byte[] b = new byte[i];
        sr.nextBytes(b);
        System.err.println(StringUtils.encode(b, encoding.substring(0, j).toCharArray()));
      }
    }
  }

  /**
   *  This is a very long running test, do not enable unless you want to wait a long time.
   */
  @Test
  public void testEncodingCollision() {
    SecureRandom sr = new SecureRandom();
    String encoding = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    long l = 200000;
    long m = 1000;
    Set<String> check = new HashSet<String>((int)l);
    for (long j = 0; j < m; j++) {
      long s = System.currentTimeMillis();
      check.clear();
      for (long i = 0; i < l; i++) {
        byte[] b = new byte[20];
        sr.nextBytes(b);

        String e = StringUtils.encode(b, encoding.toCharArray());

        if (check.contains(e)) {
          Assert.fail();
        }
        check.add(e);
      }
      long e = System.currentTimeMillis();
      long t = (1000*(e-s+1))/l;
      long o = (1000*l)/(e-s+1);
      long tleft = (t*l*(m-j))/1000000;
      System.err.println("No Collisions after "+l*j+" operations of "+l*m+" "+((100*l*j)/(l*m))+" % at "+t+" ns/op "+o+" ops/s eta "+tleft);
    }
  }

If you want to use this, feel free and are worried about license consider the code Apache 2 Licensed (c) Ian Boston. If you do use this and find a problem or if you know of something out there in a standard lib that does the same, please comment.

Advertisements

Actions

Information

3 responses

23 07 2010
Ian

The encoded length appears to converge on about 110% of the byte length for encoding keys of > 50. Obviously for small hash sizes and key lengths (eg 10) its closer to 200%

23 07 2010
Odi

The initial StringBuilder size could be initialized a bit smarter such that no growing (allocation, copy) will be necessary.

23 07 2010
Ian

good point, looking at the runs 1.2x the byte length should cover most cases. Thanks




%d bloggers like this: