public final class HadoopDataInputStream
extends org.apache.flink.core.fs.FSDataInputStream
Concrete implementation of the FSDataInputStream for Hadoop's input streams. This supports all file systems supported by Hadoop, such as HDFS and S3 (S3a/S3n).

Field Summary

| Modifier and Type | Field and Description |
|---|---|
| static int | MIN_SKIP_BYTES: Minimum amount of bytes to skip forward before we issue a seek instead of discarding read. |
Constructor Summary

| Constructor and Description |
|---|
| HadoopDataInputStream(org.apache.hadoop.fs.FSDataInputStream fsDataInputStream): Creates a new data input stream from the given Hadoop input stream. |
Method Summary

| Modifier and Type | Method and Description |
|---|---|
| int | available() |
| void | close() |
| void | forceSeek(long seekPos): Positions the stream to the given location. |
| org.apache.hadoop.fs.FSDataInputStream | getHadoopInputStream(): Gets the wrapped Hadoop input stream. |
| long | getPos() |
| int | read() |
| int | read(byte[] buffer, int offset, int length) |
| void | seek(long seekPos) |
| long | skip(long n) |
| void | skipFully(long bytes): Skips over a given amount of bytes in the stream. |
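The skipFully(long bytes) method listed above guarantees that the full amount is skipped, unlike java.io.InputStream.skip(long), which may legally skip fewer bytes than requested. A minimal stand-alone sketch of that "skip fully or fail" contract, using a plain ByteArrayInputStream in place of a real Hadoop stream (the loop and EOF handling here are an illustration of the contract, not Flink's actual implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class SkipFullySketch {

    /**
     * Keeps calling skip(long) until exactly {@code bytes} have been skipped,
     * or fails with an EOFException if the stream ends first.
     */
    static void skipFully(InputStream in, long bytes) throws IOException {
        while (bytes > 0) {
            long skipped = in.skip(bytes);
            if (skipped <= 0) {
                // skip() may legally return 0; fall back to read() to detect EOF
                if (in.read() < 0) {
                    throw new EOFException("stream ended before all bytes were skipped");
                }
                skipped = 1;
            }
            bytes -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[100]);
        skipFully(in, 42);
        System.out.println(in.available()); // 58 bytes remain of the original 100
    }
}
```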
Methods inherited from class java.io.InputStream: mark, markSupported, read, reset

Field Detail

public static final int MIN_SKIP_BYTES
Minimum amount of bytes to skip forward before we issue a seek instead of discarding read.
The current value is just a magic number. In the long run, this value could become configurable, but for now it is a conservative, relatively small value that should bring safe improvements for small skips (e.g., in reading metadata) that would hurt the most with frequent seeks.
The optimal value depends on the DFS implementation and configuration, plus the underlying file system. For now, this number is chosen "big enough" to provide improvements for smaller seeks, and "small enough" to avoid disadvantages over real seeks. While the minimum should be the page size, a true optimum per system would be the amount of bytes that can be consumed sequentially within the seek time. Unfortunately, seek time is not constant, and devices, OS, and DFS potentially also use read buffers and read-ahead.
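The skip-versus-seek trade-off described above can be sketched as a small decision helper: forward moves shorter than the threshold are served by discarding bytes, everything else by a real seek. This is a hypothetical illustration of the heuristic, not Flink's actual implementation; the method name and the constant's value are assumptions:

```java
public class SeekHeuristicSketch {

    // Illustrative threshold only; stands in for HadoopDataInputStream.MIN_SKIP_BYTES
    static final int MIN_SKIP_BYTES = 1024 * 1024;

    /**
     * Decides whether a position change should be served by skipping bytes
     * (cheap for short forward distances) or by issuing a real seek to the DFS.
     * Backward moves can never be skipped, so they always seek.
     */
    static boolean shouldSkipInsteadOfSeek(long currentPos, long targetPos) {
        long delta = targetPos - currentPos;
        return delta > 0 && delta <= MIN_SKIP_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(shouldSkipInsteadOfSeek(0, 4096));       // small forward move: skip
        System.out.println(shouldSkipInsteadOfSeek(0, 10_000_000)); // large forward move: seek
        System.out.println(shouldSkipInsteadOfSeek(100, 50));       // backward move: seek
    }
}
```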
Constructor Detail

public HadoopDataInputStream(org.apache.hadoop.fs.FSDataInputStream fsDataInputStream)
Creates a new data input stream from the given Hadoop input stream.
Parameters: fsDataInputStream - The Hadoop input stream

Method Detail

public void seek(long seekPos)
throws IOException
Specified by: seek in class org.apache.flink.core.fs.FSDataInputStream
Throws: IOException

public long getPos()
throws IOException
Specified by: getPos in class org.apache.flink.core.fs.FSDataInputStream
Throws: IOException

public int read()
throws IOException
Specified by: read in class InputStream
Throws: IOException

public void close()
throws IOException
Specified by: close in interface Closeable
Specified by: close in interface AutoCloseable
Overrides: close in class InputStream
Throws: IOException

public int read(@Nonnull byte[] buffer, int offset, int length)
         throws IOException
Overrides: read in class InputStream
Throws: IOException

public int available()
throws IOException
Overrides: available in class InputStream
Throws: IOException

public long skip(long n)
throws IOException
Overrides: skip in class InputStream
Throws: IOException

public org.apache.hadoop.fs.FSDataInputStream getHadoopInputStream()
Gets the wrapped Hadoop input stream.
public void forceSeek(long seekPos)
throws IOException
Positions the stream to the given location. In contrast to seek(long), this method will always issue a "seek" command to the DFS and will not replace it by skip(long) for small seeks. Notice that the underlying DFS implementation can still decide to do skip instead of seek.
Parameters: seekPos - the position to seek to.
Throws: IOException

public void skipFully(long bytes)
throws IOException
Skips over a given amount of bytes in the stream.
Parameters: bytes - the number of bytes to skip.
Throws: IOException

Copyright © 2014–2018 The Apache Software Foundation. All rights reserved.