0001: /*
0002: * Copyright 2001-2005 Sun Microsystems, Inc. All Rights Reserved.
0003: * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
0004: *
0005: * This code is free software; you can redistribute it and/or modify it
0006: * under the terms of the GNU General Public License version 2 only, as
0007: * published by the Free Software Foundation. Sun designates this
0008: * particular file as subject to the "Classpath" exception as provided
0009: * by Sun in the LICENSE file that accompanied this code.
0010: *
0011: * This code is distributed in the hope that it will be useful, but WITHOUT
0012: * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
0013: * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
0014: * version 2 for more details (a copy is included in the LICENSE file that
0015: * accompanied this code).
0016: *
0017: * You should have received a copy of the GNU General Public License version
0018: * 2 along with this work; if not, write to the Free Software Foundation,
0019: * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
0020: *
0021: * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
0022: * CA 95054 USA or visit www.sun.com if you need additional information or
0023: * have any questions.
0024: */
0025:
0026: package com.sun.java.util.jar.pack;
0027:
0028: import java.io.*;
0029: import java.util.*;
0030:
0031: /**
0032: * Define the conversions between sequences of small integers and raw bytes.
0033: * This is a schema of encodings which incorporates varying lengths,
0034: * varying degrees of length variability, and varying amounts of signed-ness.
0035: * @author John Rose
0036: * @version 1.21, 05/05/07
0037: */
0038: class Coding implements Constants, Comparable, CodingMethod,
0039: Histogram.BitMetric {
0040: /*
0041: Coding schema for single integers, parameterized by (B,H,S):
0042:
0043: Let B in [1,5], H in [1,256], S in [0,3].
0044: (S limit is arbitrary. B follows the 32-bit limit. H is byte size.)
0045:
0046: A given (B,H,S) code varies in length from 1 to B bytes.
0047:
0048: The 256 values a byte may take on are divided into L=(256-H) and H
0049: values, with all the H values larger than the L values.
0050: (That is, the L values are [0,L) and the H are [L,256).)
0051:
0052: The last byte is always either the B-th byte, a byte with "L value"
0053: (<L), or both. There is no other byte that satisfies these conditions.
0054: All bytes before the last always have "H values" (>=L).
0055:
0056: Therefore, if L==0, the code always has the full length of B bytes.
0057: The coding then becomes a classic B-byte little-endian unsigned integer.
0058: (Also, if L==128, the high bit of each byte acts signals the presence
0059: of a following byte, up to the maximum length.)
0060:
0061: In the unsigned case (S==0), the coding is compact and monotonic
0062: in the ordering of byte sequences defined by appending zero bytes
0063: to pad them to a common length B, reversing them, and ordering them
0064: lexicographically. (This agrees with "little-endian" byte order.)
0065:
0066: Therefore, the unsigned value of a byte sequence may be defined as:
0067: <pre>
0068: U(b0) == b0
0069: in [0..L)
0070: or [0..256) if B==1 (**)
0071:
0072: U(b0,b1) == b0 + b1*H
0073: in [L..L*(1+H))
0074: or [L..L*(1+H) + H^2) if B==2
0075:
0076: U(b0,b1,b2) == b0 + b1*H + b2*H^2
0077: in [L*(1+H)..L*(1+H+H^2))
0078: or [L*(1+H)..L*(1+H+H^2) + H^3) if B==3
0079:
0080: U(b[i]: i<n) == Sum[i<n]( b[i] * H^i )
0081: up to L*Sum[i<n]( H^i )
0082: or to L*Sum[i<n]( H^i ) + H^n if n==B
0083: </pre>
0084:
0085: (**) If B==1, the values H,L play no role in the coding.
0086: As a convention, we require that any (1,H,S) code must always
0087: encode values less than H. Thus, a simple unsigned byte is coded
0088: specifically by the code (1,256,0).
0089:
0090: (Properly speaking, the unsigned case should be parameterized as
0091: S==Infinity. If the schema were regular, the case S==0 would really
0092: denote a numbering in which all coded values are negative.)
0093:
0094: If S>0, the unsigned value of a byte sequence is regarded as a binary
0095: integer. If any of the S low-order bits are zero, the corresponding
0096: signed value will be non-negative. If all of the S low-order bits
0097: (S>0) are one, the the corresponding signed value will be negative.
0098:
0099: The non-negative signed values are compact and monotonically increasing
0100: (from 0) in the ordering of the corresponding unsigned values.
0101:
0102: The negative signed values are compact and monotonically decreasing
0103: (from -1) in the ordering of the corresponding unsigned values.
0104:
0105: In essence, the low-order S bits function as a collective sign bit
0106: for negative signed numbers, and as a low-order base-(2^S-1) digit
0107: for non-negative signed numbers.
0108:
0109: Therefore, the signed value corresponding to an unsigned value is:
0110: <pre>
0111: Sgn(x) == x if S==0
0112: Sgn(x) == (x / 2^S)*(2^S-1) + (x % 2^S), if S>0, (x % 2^S) < 2^S-1
0113: Sgn(x) == -(x / 2^S)-1, if S>0, (x % 2^S) == 2^S-1
0114: </pre>
0115:
0116: Finally, the value of a byte sequence, given the coding parameters
0117: (B,H,S), is defined as:
0118: <pre>
0119: V(b[i]: i<n) == Sgn(U(b[i]: i<n))
0120: </pre>
0121:
0122: The extremal positive and negative signed value for a given range
0123: of unsigned values may be found by sign-encoding the largest unsigned
0124: value which is not 2^S-1 mod 2^S, and that which is, respectively.
0125:
0126: Because B,H,S are variable, this is not a single coding but a schema
0127: of codings. For optimal compression, it is necessary to adaptively
0128: select specific codings to the data being compressed.
0129:
0130: For example, if a sequence of values happens never to be negative,
0131: S==0 is the best choice. If the values are equally balanced between
0132: negative and positive, S==1. If negative values are rare, then S>1
0133: is more appropriate.
0134:
0135: A (B,H,S) encoding is called a "subrange" if it does not encode
0136: the largest 32-bit value, and if the number R of values it does
0137: encode can be expressed as a positive 32-bit value. (Note that
0138: B=1 implies R<=256, B=2 implies R<=65536, etc.)
0139:
0140: A delta version of a given (B,H,S) coding encodes an array of integers
0141: by writing their successive differences in the (B,H,S) coding.
0142: The original integers themselves may be recovered by making a
0143: running accumulation of sum of the differences as they are read.
0144:
0145: As a special case, if a (B,H,S) encoding is a subrange, its delta
0146: version will only encode arrays of numbers in the coding's unsigned
0147: range, [0..R-1]. The coding of deltas is still in the normal signed
0148: range, if S!=0. During delta encoding, all subtraction results are
0149: reduced to the signed range, by adding multiples of R. Likewise,
0150: . during encoding, all addition results are reduced to the unsigned range.
0151: This special case for subranges allows the benefits of wraparound
0152: when encoding correlated sequences of very small positive numbers.
0153: */
0154:
0155: // Code-specific limits:
0156: private static int saturate32(long x) {
0157: if (x > Integer.MAX_VALUE)
0158: return Integer.MAX_VALUE;
0159: if (x < Integer.MIN_VALUE)
0160: return Integer.MIN_VALUE;
0161: return (int) x;
0162: }
0163:
0164: private static long codeRangeLong(int B, int H) {
0165: return codeRangeLong(B, H, B);
0166: }
0167:
0168: private static long codeRangeLong(int B, int H, int nMax) {
0169: // Code range for a all (B,H) codes of length <=nMax (<=B).
0170: // n < B: L*Sum[i<n]( H^i )
0171: // n == B: L*Sum[i<B]( H^i ) + H^B
0172: assert (nMax >= 0 && nMax <= B);
0173: assert (B >= 1 && B <= 5);
0174: assert (H >= 1 && H <= 256);
0175: if (nMax == 0)
0176: return 0; // no codes of zero length
0177: if (B == 1)
0178: return H; // special case; see (**) above
0179: int L = 256 - H;
0180: long sum = 0;
0181: long H_i = 1;
0182: for (int n = 1; n <= nMax; n++) {
0183: sum += H_i;
0184: H_i *= H;
0185: }
0186: sum *= L;
0187: if (nMax == B)
0188: sum += H_i;
0189: return sum;
0190: }
0191:
0192: /** Largest int representable by (B,H,S) in up to nMax bytes. */
0193: public static int codeMax(int B, int H, int S, int nMax) {
0194: //assert(S >= 0 && S <= S_MAX);
0195: long range = codeRangeLong(B, H, nMax);
0196: if (range == 0)
0197: return -1; // degenerate max value for empty set of codes
0198: if (S == 0 || range >= (long) 1 << 32)
0199: return saturate32(range - 1);
0200: long maxPos = range - 1;
0201: while (isNegativeCode(maxPos, S))
0202: --maxPos;
0203: if (maxPos < 0)
0204: return -1; // No positive codings at all.
0205: int smax = decodeSign32(maxPos, S);
0206: // check for 32-bit wraparound:
0207: if (smax < 0)
0208: return Integer.MAX_VALUE;
0209: return smax;
0210: }
0211:
0212: /** Smallest int representable by (B,H,S) in up to nMax bytes.
0213: Returns Integer.MIN_VALUE if 32-bit wraparound covers
0214: the entire negative range.
0215: */
0216: public static int codeMin(int B, int H, int S, int nMax) {
0217: //assert(S >= 0 && S <= S_MAX);
0218: long range = codeRangeLong(B, H, nMax);
0219: if (range >= (long) 1 << 32 && nMax == B) {
0220: // Can code negative values via 32-bit wraparound.
0221: return Integer.MIN_VALUE;
0222: }
0223: if (S == 0) {
0224: return 0;
0225: }
0226: int Smask = (1 << S) - 1;
0227: long maxNeg = range - 1;
0228: while (!isNegativeCode(maxNeg, S))
0229: --maxNeg;
0230: if (maxNeg < 0)
0231: return 0; // No negative codings at all.
0232: return decodeSign32(maxNeg, S);
0233: }
0234:
0235: // Some of the arithmetic below is on unsigned 32-bit integers.
0236: // These must be represented in Java as longs in the range [0..2^32-1].
0237: // The conversion to a signed int is just the Java cast (int), but
0238: // the conversion to an unsigned int is the following little method:
0239: private static long toUnsigned32(int sx) {
0240: return ((long) sx << 32) >>> 32;
0241: }
0242:
0243: // Sign encoding:
0244: private static boolean isNegativeCode(long ux, int S) {
0245: assert (S > 0);
0246: assert (ux >= -1); // can be out of 32-bit range; who cares
0247: int Smask = (1 << S) - 1;
0248: return (((int) ux + 1) & Smask) == 0;
0249: }
0250:
0251: private static boolean hasNegativeCode(int sx, int S) {
0252: assert (S > 0);
0253: // If S>=2 very low negatives are coded by 32-bit-wrapped positives.
0254: // The lowest negative representable by a negative coding is
0255: // ~(umax32 >> S), and the next lower number is coded by wrapping
0256: // the highest positive:
0257: // CodePos(umax32-1) -> (umax32-1)-((umax32-1)>>S)
0258: // which simplifies to ~(umax32 >> S)-1.
0259: return (0 > sx) && (sx >= ~(-1 >>> S));
0260: }
0261:
0262: private static int decodeSign32(long ux, int S) {
0263: assert (ux == toUnsigned32((int) ux)) // must be unsigned 32-bit number
0264: : (Long.toHexString(ux));
0265: if (S == 0) {
0266: return (int) ux; // cast to signed int
0267: }
0268: int sx;
0269: if (isNegativeCode(ux, S)) {
0270: // Sgn(x) == -(x / 2^S)-1
0271: sx = ~((int) ux >>> S);
0272: } else {
0273: // Sgn(x) == (x / 2^S)*(2^S-1) + (x % 2^S)
0274: sx = (int) ux - ((int) ux >>> S);
0275: }
0276: // Assert special case of S==1:
0277: assert (!(S == 1) || sx == (((int) ux >>> 1) ^ -((int) ux & 1)));
0278: return sx;
0279: }
0280:
0281: private static long encodeSign32(int sx, int S) {
0282: if (S == 0) {
0283: return toUnsigned32(sx); // unsigned 32-bit int
0284: }
0285: int Smask = (1 << S) - 1;
0286: long ux;
0287: if (!hasNegativeCode(sx, S)) {
0288: // InvSgn(sx) = (sx / (2^S-1))*2^S + (sx % (2^S-1))
0289: ux = sx + (toUnsigned32(sx) / Smask);
0290: } else {
0291: // InvSgn(sx) = (-sx-1)*2^S + (2^S-1)
0292: ux = (-sx << S) - 1;
0293: }
0294: ux = toUnsigned32((int) ux);
0295: assert (sx == decodeSign32(ux, S)) : (Long.toHexString(ux)
0296: + " -> " + Integer.toHexString(sx) + " != " + Integer
0297: .toHexString(decodeSign32(ux, S)));
0298: return ux;
0299: }
0300:
0301: // Top-level coding of single integers:
0302: public static void writeInt(byte[] out, int[] outpos, int sx,
0303: int B, int H, int S) {
0304: long ux = encodeSign32(sx, S);
0305: assert (ux == toUnsigned32((int) ux));
0306: assert (ux < codeRangeLong(B, H)) : Long.toHexString(ux);
0307: int L = 256 - H;
0308: long sum = ux;
0309: int pos = outpos[0];
0310: for (int i = 0; i < B - 1; i++) {
0311: if (sum < L)
0312: break;
0313: sum -= L;
0314: int b_i = (int) (L + (sum % H));
0315: sum /= H;
0316: out[pos++] = (byte) b_i;
0317: }
0318: out[pos++] = (byte) sum;
0319: // Report number of bytes written by updating outpos[0]:
0320: outpos[0] = pos;
0321: // Check right away for mis-coding.
0322: //assert(sx == readInt(out, new int[1], B, H, S));
0323: }
0324:
0325: public static int readInt(byte[] in, int[] inpos, int B, int H,
0326: int S) {
0327: // U(b[i]: i<n) == Sum[i<n]( b[i] * H^i )
0328: int L = 256 - H;
0329: long sum = 0;
0330: long H_i = 1;
0331: int pos = inpos[0];
0332: for (int i = 0; i < B; i++) {
0333: int b_i = in[pos++] & 0xFF;
0334: sum += b_i * H_i;
0335: H_i *= H;
0336: if (b_i < L)
0337: break;
0338: }
0339: //assert(sum >= 0 && sum < codeRangeLong(B, H));
0340: // Report number of bytes read by updating inpos[0]:
0341: inpos[0] = pos;
0342: return decodeSign32(sum, S);
0343: }
0344:
0345: // The Stream version doesn't fetch a byte unless it is needed for coding.
0346: public static int readIntFrom(InputStream in, int B, int H, int S)
0347: throws IOException {
0348: // U(b[i]: i<n) == Sum[i<n]( b[i] * H^i )
0349: int L = 256 - H;
0350: long sum = 0;
0351: long H_i = 1;
0352: for (int i = 0; i < B; i++) {
0353: int b_i = in.read();
0354: if (b_i < 0)
0355: throw new RuntimeException("unexpected EOF");
0356: sum += b_i * H_i;
0357: H_i *= H;
0358: if (b_i < L)
0359: break;
0360: }
0361: assert (sum >= 0 && sum < codeRangeLong(B, H));
0362: return decodeSign32(sum, S);
0363: }
0364:
0365: public static final int B_MAX = 5; /* B: [1,5] */
0366: public static final int H_MAX = 256; /* H: [1,256] */
0367: public static final int S_MAX = 2; /* S: [0,2] */
0368:
0369: // END OF STATICS.
0370:
0371: private final int B; /*1..5*/// # bytes (1..5)
0372: private final int H; /*1..256*/// # codes requiring a higher byte
0373: private final int L; /*0..255*/// # codes requiring a higher byte
0374: private final int S; /*0..3*/// # low-order bits representing sign
0375: private final int del; /*0..2*/// type of delta encoding (0 == none)
0376: private final int min; // smallest representable value
0377: private final int max; // largest representable value
0378: private final int umin; // smallest representable uns. value
0379: private final int umax; // largest representable uns. value
0380: private final int[] byteMin; // smallest repr. value, given # bytes
0381: private final int[] byteMax; // largest repr. value, given # bytes
0382:
0383: private Coding(int B, int H, int S) {
0384: this (B, H, S, 0);
0385: }
0386:
0387: private Coding(int B, int H, int S, int del) {
0388: this .B = B;
0389: this .H = H;
0390: this .L = 256 - H;
0391: this .S = S;
0392: this .del = del;
0393: this .min = codeMin(B, H, S, B);
0394: this .max = codeMax(B, H, S, B);
0395: this .umin = codeMin(B, H, 0, B);
0396: this .umax = codeMax(B, H, 0, B);
0397: this .byteMin = new int[B];
0398: this .byteMax = new int[B];
0399:
0400: for (int nMax = 1; nMax <= B; nMax++) {
0401: byteMin[nMax - 1] = codeMin(B, H, S, nMax);
0402: byteMax[nMax - 1] = codeMax(B, H, S, nMax);
0403: }
0404: }
0405:
0406: public boolean equals(Object x) {
0407: if (!(x instanceof Coding))
0408: return false;
0409: Coding that = (Coding) x;
0410: if (this .B != that.B)
0411: return false;
0412: if (this .H != that.H)
0413: return false;
0414: if (this .S != that.S)
0415: return false;
0416: if (this .del != that.del)
0417: return false;
0418: return true;
0419: }
0420:
0421: public int hashCode() {
0422: return (del << 14) + (S << 11) + (B << 8) + (H << 0);
0423: }
0424:
0425: private static HashMap codeMap;
0426:
0427: private static synchronized Coding of(int B, int H, int S, int del) {
0428: if (codeMap == null)
0429: codeMap = new HashMap();
0430: Coding x0 = new Coding(B, H, S, del);
0431: Coding x1 = (Coding) codeMap.get(x0);
0432: if (x1 == null)
0433: codeMap.put(x0, x1 = x0);
0434: return x1;
0435: }
0436:
0437: public static Coding of(int B, int H) {
0438: return of(B, H, 0, 0);
0439: }
0440:
0441: public static Coding of(int B, int H, int S) {
0442: return of(B, H, S, 0);
0443: }
0444:
0445: public boolean canRepresentValue(int x) {
0446: if (isSubrange())
0447: return canRepresentUnsigned(x);
0448: else
0449: return canRepresentSigned(x);
0450: }
0451:
0452: /** Can this coding represent a single value, possibly a delta?
0453: * This ignores the D property. That is, for delta codings,
0454: * this tests whether a delta value of 'x' can be coded.
0455: * For signed delta codings which produce unsigned end values,
0456: * use canRepresentUnsigned.
0457: */
0458: public boolean canRepresentSigned(int x) {
0459: return (x >= min && x <= max);
0460: }
0461:
0462: /** Can this coding, apart from its S property,
0463: * represent a single value? (Negative values
0464: * can only be represented via 32-bit overflow,
0465: * so this returns true for negative values
0466: * if isFullRange is true.)
0467: */
0468: public boolean canRepresentUnsigned(int x) {
0469: return (x >= umin && x <= umax);
0470: }
0471:
0472: // object-oriented code/decode
0473: public int readFrom(byte[] in, int[] inpos) {
0474: return readInt(in, inpos, B, H, S);
0475: }
0476:
0477: public void writeTo(byte[] out, int[] outpos, int x) {
0478: writeInt(out, outpos, x, B, H, S);
0479: }
0480:
0481: // Stream versions
0482: public int readFrom(InputStream in) throws IOException {
0483: return readIntFrom(in, B, H, S);
0484: }
0485:
0486: public void writeTo(OutputStream out, int x) throws IOException {
0487: byte[] buf = new byte[B];
0488: int[] pos = new int[1];
0489: writeInt(buf, pos, x, B, H, S);
0490: out.write(buf, 0, pos[0]);
0491: }
0492:
0493: // Stream/array versions
0494: public void readArrayFrom(InputStream in, int[] a, int start,
0495: int end) throws IOException {
0496: // %%% use byte[] buffer
0497: for (int i = start; i < end; i++)
0498: a[i] = readFrom(in);
0499: for (int dstep = 0; dstep < del; dstep++) {
0500: long state = 0;
0501: for (int i = start; i < end; i++) {
0502: state += a[i];
0503: // Reduce array values to the required range.
0504: if (isSubrange()) {
0505: state = reduceToUnsignedRange(state);
0506: }
0507: a[i] = (int) state;
0508: }
0509: }
0510: }
0511:
0512: public void writeArrayTo(OutputStream out, int[] a, int start,
0513: int end) throws IOException {
0514: if (end <= start)
0515: return;
0516: for (int dstep = 0; dstep < del; dstep++) {
0517: int[] deltas;
0518: if (!isSubrange())
0519: deltas = makeDeltas(a, start, end, 0, 0);
0520: else
0521: deltas = makeDeltas(a, start, end, min, max);
0522: a = deltas;
0523: start = 0;
0524: end = deltas.length;
0525: }
0526: // The following code is a buffered version of this loop:
0527: // for (int i = start; i < end; i++)
0528: // writeTo(out, a[i]);
0529: byte[] buf = new byte[1 << 8];
0530: final int bufmax = buf.length - B;
0531: int[] pos = { 0 };
0532: for (int i = start; i < end;) {
0533: while (pos[0] <= bufmax) {
0534: writeTo(buf, pos, a[i++]);
0535: if (i >= end)
0536: break;
0537: }
0538: out.write(buf, 0, pos[0]);
0539: pos[0] = 0;
0540: }
0541: }
0542:
0543: /** Tell if the range of this coding (number of distinct
0544: * representable values) can be expressed in 32 bits.
0545: */
0546: boolean isSubrange() {
0547: return max < Integer.MAX_VALUE
0548: && ((long) max - (long) min + 1) <= Integer.MAX_VALUE;
0549: }
0550:
0551: /** Tell if this coding can represent all 32-bit values.
0552: * Note: Some codings, such as unsigned ones, can be neither
0553: * subranges nor full-range codings.
0554: */
0555: boolean isFullRange() {
0556: return max == Integer.MAX_VALUE && min == Integer.MIN_VALUE;
0557: }
0558:
0559: /** Return the number of values this coding (a subrange) can represent. */
0560: int getRange() {
0561: assert (isSubrange());
0562: return (max - min) + 1; // range includes both min & max
0563: }
0564:
0565: Coding setB(int B) {
0566: return Coding.of(B, H, S, del);
0567: }
0568:
0569: Coding setH(int H) {
0570: return Coding.of(B, H, S, del);
0571: }
0572:
0573: Coding setS(int S) {
0574: return Coding.of(B, H, S, del);
0575: }
0576:
0577: Coding setL(int L) {
0578: return setH(256 - L);
0579: }
0580:
0581: Coding setD(int del) {
0582: return Coding.of(B, H, S, del);
0583: }
0584:
0585: Coding getDeltaCoding() {
0586: return setD(del + 1);
0587: }
0588:
0589: /** Return a coding suitable for representing summed, modulo-reduced values. */
0590: Coding getValueCoding() {
0591: if (isDelta())
0592: return Coding.of(B, H, 0, del - 1);
0593: else
0594: return this ;
0595: }
0596:
0597: /** Reduce the given value to be within this coding's unsigned range,
0598: * by adding or subtracting a multiple of (max-min+1).
0599: */
0600: int reduceToUnsignedRange(long value) {
0601: if (value == (int) value && canRepresentUnsigned((int) value))
0602: // already in unsigned range
0603: return (int) value;
0604: int range = getRange();
0605: assert (range > 0);
0606: value %= range;
0607: if (value < 0)
0608: value += range;
0609: assert (canRepresentUnsigned((int) value));
0610: return (int) value;
0611: }
0612:
0613: int reduceToSignedRange(int value) {
0614: if (canRepresentSigned(value))
0615: // already in signed range
0616: return value;
0617: return reduceToSignedRange(value, min, max);
0618: }
0619:
0620: static int reduceToSignedRange(int value, int min, int max) {
0621: int range = (max - min + 1);
0622: assert (range > 0);
0623: int value0 = value;
0624: value -= min;
0625: if (value < 0 && value0 >= 0) {
0626: // 32-bit overflow, but the next '%=' op needs to be unsigned
0627: value -= range;
0628: assert (value >= 0);
0629: }
0630: value %= range;
0631: if (value < 0)
0632: value += range;
0633: value += min;
0634: assert (min <= value && value <= max);
0635: return value;
0636: }
0637:
0638: /** Does this coding support at least one negative value?
0639: Includes codings that can do so via 32-bit wraparound.
0640: */
0641: boolean isSigned() {
0642: return min < 0;
0643: }
0644:
0645: /** Does this coding code arrays by making successive differences? */
0646: boolean isDelta() {
0647: return del != 0;
0648: }
0649:
0650: public int B() {
0651: return B;
0652: }
0653:
0654: public int H() {
0655: return H;
0656: }
0657:
0658: public int L() {
0659: return L;
0660: }
0661:
0662: public int S() {
0663: return S;
0664: }
0665:
0666: public int del() {
0667: return del;
0668: }
0669:
0670: public int min() {
0671: return min;
0672: }
0673:
0674: public int max() {
0675: return max;
0676: }
0677:
0678: public int umin() {
0679: return umin;
0680: }
0681:
0682: public int umax() {
0683: return umax;
0684: }
0685:
0686: public int byteMin(int b) {
0687: return byteMin[b - 1];
0688: }
0689:
0690: public int byteMax(int b) {
0691: return byteMax[b - 1];
0692: }
0693:
0694: public int compareTo(Object x) {
0695: Coding that = (Coding) x;
0696: int dkey = this .del - that.del;
0697: if (dkey == 0)
0698: dkey = this .B - that.B;
0699: if (dkey == 0)
0700: dkey = this .H - that.H;
0701: if (dkey == 0)
0702: dkey = this .S - that.S;
0703: return dkey;
0704: }
0705:
0706: /** Heuristic measure of the difference between two codings. */
0707: public int distanceFrom(Coding that) {
0708: int diffdel = this .del - that.del;
0709: if (diffdel < 0)
0710: diffdel = -diffdel;
0711: int diffS = this .S - that.S;
0712: if (diffS < 0)
0713: diffS = -diffS;
0714: int diffB = this .B - that.B;
0715: if (diffB < 0)
0716: diffB = -diffB;
0717: int diffHL;
0718: if (this .H == that.H) {
0719: diffHL = 0;
0720: } else {
0721: // Distance in log space of H (<=128) and L (<128).
0722: int this HL = this .getHL();
0723: int thatHL = that.getHL();
0724: // Double the accuracy of the log:
0725: this HL *= this HL;
0726: thatHL *= thatHL;
0727: if (this HL > thatHL)
0728: diffHL = ceil_lg2(1 + (this HL - 1) / thatHL);
0729: else
0730: diffHL = ceil_lg2(1 + (thatHL - 1) / this HL);
0731: }
0732: int norm = 5 * (diffdel + diffS + diffB) + diffHL;
0733: assert (norm != 0 || this .compareTo(that) == 0);
0734: return norm;
0735: }
0736:
0737: private int getHL() {
0738: // Follow H in log space by the multiplicative inverse of L.
0739: if (H <= 128)
0740: return H;
0741: if (L >= 1)
0742: return 128 * 128 / L;
0743: return 128 * 256;
0744: }
0745:
0746: /** ceiling(log[2](x)): {1->0, 2->1, 3->2, 4->2, ...} */
0747: static int ceil_lg2(int x) {
0748: assert (x - 1 >= 0); // x in range (int.MIN_VALUE -> 32)
0749: x -= 1;
0750: int lg = 0;
0751: while (x != 0) {
0752: lg++;
0753: x >>= 1;
0754: }
0755: return lg;
0756: }
0757:
0758: static private final byte[] byteBitWidths = new byte[0x100];
0759: static {
0760: for (int b = 0; b < byteBitWidths.length; b++) {
0761: byteBitWidths[b] = (byte) ceil_lg2(b + 1);
0762: }
0763: for (int i = 10; i >= 0; i = (i << 1) - (i >> 3)) {
0764: assert (bitWidth(i) == ceil_lg2(i + 1));
0765: }
0766: }
0767:
0768: /** Number of significant bits in i, not counting sign bits.
0769: * For positive i, it is ceil_lg2(i + 1).
0770: */
0771: static int bitWidth(int i) {
0772: if (i < 0)
0773: i = ~i; // change sign
0774: int w = 0;
0775: int lo = i;
0776: if (lo < byteBitWidths.length)
0777: return byteBitWidths[lo];
0778: int hi;
0779: hi = (lo >>> 16);
0780: if (hi != 0) {
0781: lo = hi;
0782: w += 16;
0783: }
0784: hi = (lo >>> 8);
0785: if (hi != 0) {
0786: lo = hi;
0787: w += 8;
0788: }
0789: w += byteBitWidths[lo];
0790: //assert(w == ceil_lg2(i + 1));
0791: return w;
0792: }
0793:
0794: /** Create an array of successive differences.
0795: * If min==max, accept any and all 32-bit overflow.
0796: * Otherwise, avoid 32-bit overflow, and reduce all differences
0797: * to a value in the given range, by adding or subtracting
0798: * multiples of the range cardinality (max-min+1).
0799: * Also, the values are assumed to be in the range [0..(max-min)].
0800: */
0801: static int[] makeDeltas(int[] values, int start, int end, int min,
0802: int max) {
0803: assert (max >= min);
0804: int count = end - start;
0805: int[] deltas = new int[count];
0806: int state = 0;
0807: if (min == max) {
0808: for (int i = 0; i < count; i++) {
0809: int value = values[start + i];
0810: deltas[i] = value - state;
0811: state = value;
0812: }
0813: } else {
0814: for (int i = 0; i < count; i++) {
0815: int value = values[start + i];
0816: assert (value >= 0 && value + min <= max);
0817: int delta = value - state;
0818: assert (delta == (long) value - (long) state); // no overflow
0819: state = value;
0820: // Reduce delta values to the required range.
0821: delta = reduceToSignedRange(delta, min, max);
0822: deltas[i] = delta;
0823: }
0824: }
0825: return deltas;
0826: }
0827:
0828: boolean canRepresent(int minValue, int maxValue) {
0829: assert (minValue <= maxValue);
0830: if (del > 0) {
0831: if (isSubrange()) {
0832: // We will force the values to reduce to the right subrange.
0833: return canRepresentUnsigned(maxValue)
0834: && canRepresentUnsigned(minValue);
0835: } else {
0836: // Huge range; delta values must assume full 32-bit range.
0837: return isFullRange();
0838: }
0839: } else
0840: // final values must be representable
0841: return canRepresentSigned(maxValue)
0842: && canRepresentSigned(minValue);
0843: }
0844:
0845: boolean canRepresent(int[] values, int start, int end) {
0846: int len = end - start;
0847: if (len == 0)
0848: return true;
0849: if (isFullRange())
0850: return true;
0851: // Calculate max, min:
0852: int max = values[start];
0853: int min = max;
0854: for (int i = 1; i < len; i++) {
0855: int value = values[start + i];
0856: if (max < value)
0857: max = value;
0858: if (min > value)
0859: min = value;
0860: }
0861: return canRepresent(min, max);
0862: }
0863:
0864: public double getBitLength(int value) { // implements BitMetric
0865: return (double) getLength(value) * 8;
0866: }
0867:
0868: /** How many bytes are in the coding of this value?
0869: * Returns Integer.MAX_VALUE if the value has no coding.
0870: * The coding must not be a delta coding, since there is no
0871: * definite size for a single value apart from its context.
0872: */
0873: public int getLength(int value) {
0874: if (isDelta() && isSubrange()) {
0875: if (!canRepresentUnsigned(value))
0876: return Integer.MAX_VALUE;
0877: value = reduceToSignedRange(value);
0878: }
0879: if (value >= 0) {
0880: for (int n = 0; n < B; n++) {
0881: if (value <= byteMax[n])
0882: return n + 1;
0883: }
0884: } else {
0885: for (int n = 0; n < B; n++) {
0886: if (value >= byteMin[n])
0887: return n + 1;
0888: }
0889: }
0890: return Integer.MAX_VALUE;
0891: }
0892:
0893: public int getLength(int[] values, int start, int end) {
0894: int len = end - start;
0895: if (B == 1)
0896: return len;
0897: if (L == 0)
0898: return len * B;
0899: if (isDelta()) {
0900: int[] deltas;
0901: if (!isSubrange())
0902: deltas = makeDeltas(values, start, end, 0, 0);
0903: else
0904: deltas = makeDeltas(values, start, end, min, max);
0905: //return Coding.of(B, H, S).getLength(deltas, 0, len);
0906: values = deltas;
0907: start = 0;
0908: end = values.length;
0909: }
0910: int sum = len; // at least 1 byte per
0911: // add extra bytes for extra-long values
0912: for (int n = 1; n <= B; n++) {
0913: // what is the coding interval [min..max] for n bytes?
0914: int max = byteMax[n - 1];
0915: int min = byteMin[n - 1];
0916: int longer = 0; // count of guys longer than n bytes
0917: for (int i = 0; i < len; i++) {
0918: int value = values[start + i];
0919: if (value >= 0) {
0920: if (value > max)
0921: longer++;
0922: } else {
0923: if (value < min)
0924: longer++;
0925: }
0926: }
0927: if (longer == 0)
0928: break; // no more passes needed
0929: if (n == B)
0930: return Integer.MAX_VALUE; // cannot represent!
0931: sum += longer;
0932: }
0933: return sum;
0934: }
0935:
0936: public byte[] getMetaCoding(Coding dflt) {
0937: if (dflt == this )
0938: return new byte[] { (byte) _meta_default };
0939: int canonicalIndex = BandStructure.indexOf(this );
0940: if (canonicalIndex > 0)
0941: return new byte[] { (byte) canonicalIndex };
0942: return new byte[] { (byte) _meta_arb,
0943: (byte) (del + 2 * S + 8 * (B - 1)), (byte) (H - 1) };
0944: }
0945:
0946: public static int parseMetaCoding(byte[] bytes, int pos,
0947: Coding dflt, CodingMethod res[]) {
0948: int op = bytes[pos++] & 0xFF;
0949: if (_meta_canon_min <= op && op <= _meta_canon_max) {
0950: Coding c = BandStructure.codingForIndex(op);
0951: assert (c != null);
0952: res[0] = c;
0953: return pos;
0954: }
0955: if (op == _meta_arb) {
0956: int dsb = bytes[pos++] & 0xFF;
0957: int H_1 = bytes[pos++] & 0xFF;
0958: int del = dsb % 2;
0959: int S = (dsb / 2) % 4;
0960: int B = (dsb / 8) + 1;
0961: int H = H_1 + 1;
0962: if (!((1 <= B && B <= B_MAX) && (0 <= S && S <= S_MAX)
0963: && (1 <= H && H <= H_MAX) && (0 <= del && del <= 1))
0964: || (B == 1 && H != 256) || (B == 5 && H == 256)) {
0965: throw new RuntimeException("Bad arb. coding: (" + B
0966: + "," + H + "," + S + "," + del);
0967: }
0968: res[0] = Coding.of(B, H, S, del);
0969: return pos;
0970: }
0971: return pos - 1; // backup
0972: }
0973:
0974: public String keyString() {
0975: return "(" + B + "," + H + "," + S + "," + del + ")";
0976: }
0977:
0978: public String toString() {
0979: String str = "Coding" + keyString();
0980: // If -ea, print out more informative strings!
0981: //assert((str = stringForDebug()) != null);
0982: return str;
0983: }
0984:
0985: static boolean verboseStringForDebug = false;
0986:
0987: String stringForDebug() {
0988: String minS = (min == Integer.MIN_VALUE ? "min" : "" + min);
0989: String maxS = (max == Integer.MAX_VALUE ? "max" : "" + max);
0990: String str = keyString() + " L=" + L + " r=[" + minS + ","
0991: + maxS + "]";
0992: if (isSubrange())
0993: str += " subrange";
0994: else if (!isFullRange())
0995: str += " MIDRANGE";
0996: if (verboseStringForDebug) {
0997: str += " {";
0998: int prev_range = 0;
0999: for (int n = 1; n <= B; n++) {
1000: int range_n = saturate32((long) byteMax[n - 1]
1001: - byteMin[n - 1] + 1);
1002: assert (range_n == saturate32(codeRangeLong(B, H, n)));
1003: range_n -= prev_range;
1004: prev_range = range_n;
1005: String rngS = (range_n == Integer.MAX_VALUE ? "max"
1006: : "" + range_n);
1007: str += " #" + n + "=" + rngS;
1008: }
1009: str += " }";
1010: }
1011: return str;
1012: }
1013: }
|