public class GroupingShuffleRangeTracker extends Object implements RangeTracker<ByteArrayShufflePosition>
RangeTracker for positions used by GroupingShuffleReader
(ByteArrayShufflePosition).
These positions roughly correspond to hashes of keys. In case of hash collisions, multiple groups can have the same position. In that case, the first group at a particular position is considered a split point (because it is the first to be returned when reading a position range starting at this position), others are not.
| Constructor and Description |
|---|
GroupingShuffleRangeTracker(ByteArrayShufflePosition startPosition,
ByteArrayShufflePosition stopPosition) |
| Modifier and Type | Method and Description |
|---|---|
double |
getFractionConsumed()
Returns the approximate fraction of positions in the source that have been consumed by
successful
RangeTracker.tryReturnRecordAt(boolean, PositionT) calls, or 0.0 if no such calls have happened. |
ByteArrayShufflePosition |
getLastGroupStart() |
ByteArrayShufflePosition |
getStartPosition()
Returns the starting position of the current range, inclusive.
|
ByteArrayShufflePosition |
getStopPosition()
Returns the ending position of the current range, exclusive.
|
String |
toString() |
boolean |
tryReturnRecordAt(boolean isAtSplitPoint,
ByteArrayShufflePosition groupStart)
Atomically determines whether a record at the given position can be returned and updates
internal state.
|
boolean |
trySplitAtPosition(ByteArrayShufflePosition splitPosition)
Atomically splits the current range [
RangeTracker.getStartPosition(), RangeTracker.getStopPosition())
into a "primary" part [RangeTracker.getStartPosition(), splitPosition)
and a "residual" part [splitPosition, RangeTracker.getStopPosition()), assuming the current
last-consumed position is within [RangeTracker.getStartPosition(), splitPosition)
(i.e., splitPosition has not been consumed yet). |
public GroupingShuffleRangeTracker(@Nullable ByteArrayShufflePosition startPosition, @Nullable ByteArrayShufflePosition stopPosition)
public ByteArrayShufflePosition getStartPosition()
RangeTrackergetStartPosition in interface RangeTracker<ByteArrayShufflePosition>public ByteArrayShufflePosition getStopPosition()
RangeTrackergetStopPosition in interface RangeTracker<ByteArrayShufflePosition>public ByteArrayShufflePosition getLastGroupStart()
public boolean tryReturnRecordAt(boolean isAtSplitPoint,
ByteArrayShufflePosition groupStart)
RangeTrackerisAtSplitPoint is true, and recordStart is outside the current
range, returns false;
recordStart and returns
true.
This method MUST be called on all split point records. It may be called on every record.
tryReturnRecordAt in interface RangeTracker<ByteArrayShufflePosition>public boolean trySplitAtPosition(ByteArrayShufflePosition splitPosition)
RangeTrackerRangeTracker.getStartPosition(), RangeTracker.getStopPosition())
into a "primary" part [RangeTracker.getStartPosition(), splitPosition)
and a "residual" part [splitPosition, RangeTracker.getStopPosition()), assuming the current
last-consumed position is within [RangeTracker.getStartPosition(), splitPosition)
(i.e., splitPosition has not been consumed yet).
Updates the current range to be the primary and returns true. This means that
all further calls on the current object will interpret their arguments relative to the
primary range.
If the split position has already been consumed, or if no RangeTracker.tryReturnRecordAt(boolean, PositionT) call
was made yet, returns false. The second condition is to prevent dynamic splitting
during reader start-up.
trySplitAtPosition in interface RangeTracker<ByteArrayShufflePosition>public double getFractionConsumed()
RangeTrackerRangeTracker.tryReturnRecordAt(boolean, PositionT) calls, or 0.0 if no such calls have happened.getFractionConsumed in interface RangeTracker<ByteArrayShufflePosition>