Object
  org.apache.spark.rdd.RDD<scala.Tuple2<K,V>>
    org.apache.spark.rdd.HadoopRDD<K,V>

public class HadoopRDD<K,V>
extends RDD<scala.Tuple2<K,V>>
implements Logging
:: DeveloperApi ::
An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS,
sources in HBase, or S3), using the older MapReduce API (org.apache.hadoop.mapred).
Note: Instantiating this class directly is not recommended; use
org.apache.spark.SparkContext.hadoopRDD() instead.
param: sc The SparkContext to associate the RDD with.
param: broadcastedConf A general Hadoop Configuration, or a subclass of it. If the enclosed variable references an instance of JobConf, then that JobConf will be used for the Hadoop job. Otherwise, a new JobConf will be created on each slave using the enclosed Configuration.
param: initLocalJobConfFuncOpt Optional closure used to initialize any JobConf that HadoopRDD creates.
param: inputFormatClass Storage format of the data to be read.
param: keyClass Class of the key associated with the inputFormatClass.
param: valueClass Class of the value associated with the inputFormatClass.
param: minPartitions Minimum number of HadoopRDD partitions (Hadoop Splits) to generate.
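For illustration, a minimal Scala sketch of the recommended route through SparkContext.hadoopRDD() (the application name and input path are placeholders, not values from this page):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("HadoopRDDExample").setMaster("local[*]"))

// Point a JobConf at the input; the path is a placeholder.
val jobConf = new JobConf(sc.hadoopConfiguration)
FileInputFormat.setInputPaths(jobConf, "hdfs:///path/to/input")

// Returns an RDD[(LongWritable, Text)] backed by a HadoopRDD.
val rdd = sc.hadoopRDD(
  jobConf,
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  minPartitions = 2)

rdd.map(_._2.toString).take(5).foreach(println)
```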
| Constructor Summary |
|---|
| HadoopRDD(SparkContext sc, Broadcast<SerializableWritable<org.apache.hadoop.conf.Configuration>> broadcastedConf, scala.Option<scala.Function1<org.apache.hadoop.mapred.JobConf,scala.runtime.BoxedUnit>> initLocalJobConfFuncOpt, Class<? extends org.apache.hadoop.mapred.InputFormat<K,V>> inputFormatClass, Class<K> keyClass, Class<V> valueClass, int minPartitions) |
| HadoopRDD(SparkContext sc, org.apache.hadoop.mapred.JobConf conf, Class<? extends org.apache.hadoop.mapred.InputFormat<K,V>> inputFormatClass, Class<K> keyClass, Class<V> valueClass, int minPartitions) |
| Method Summary | |
|---|---|
| static void | addLocalConfiguration(String jobTrackerId, int jobId, int splitId, int attemptId, org.apache.hadoop.mapred.JobConf conf) Add Hadoop configuration specific to a single partition and attempt. |
| void | checkpoint() Mark this RDD for checkpointing. |
| InterruptibleIterator<scala.Tuple2<K,V>> | compute(Partition theSplit, TaskContext context) :: DeveloperApi :: Implemented by subclasses to compute a given partition. |
| static Object | CONFIGURATION_INSTANTIATION_LOCK() Configuration's constructor is not threadsafe (see SPARK-1097 and HADOOP-10456). |
| static boolean | containsCachedMetadata(String key) |
| static Object | getCachedMetadata(String key) The three methods below are helpers for accessing the local map, a property of the SparkEnv of the local process. |
| org.apache.hadoop.conf.Configuration | getConf() |
| Partition[] | getPartitions() Implemented by subclasses to return the set of partitions in this RDD. |
| scala.collection.Seq<String> | getPreferredLocations(Partition split) Optionally overridden by subclasses to specify placement preferences. |
| <U> RDD<U> | mapPartitionsWithInputSplit(scala.Function2<org.apache.hadoop.mapred.InputSplit,scala.collection.Iterator<scala.Tuple2<K,V>>,scala.collection.Iterator<U>> f, boolean preservesPartitioning, scala.reflect.ClassTag<U> evidence$1) Maps over a partition, providing the InputSplit that was used as the base of the partition. |
| HadoopRDD<K,V> | persist(StorageLevel storageLevel) Set this RDD's storage level to persist its values across operations after the first time it is computed. |
| static int | RECORDS_BETWEEN_BYTES_READ_METRIC_UPDATES() Update the input bytes read metric each time this number of records has been read. |
| static scala.Option<org.apache.spark.rdd.HadoopRDD.SplitInfoReflections> | SPLIT_INFO_REFLECTIONS() |
| Methods inherited from class Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |

| Methods inherited from interface org.apache.spark.Logging |
|---|
| initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning |
| Constructor Detail |
|---|
public HadoopRDD(SparkContext sc,
Broadcast<SerializableWritable<org.apache.hadoop.conf.Configuration>> broadcastedConf,
scala.Option<scala.Function1<org.apache.hadoop.mapred.JobConf,scala.runtime.BoxedUnit>> initLocalJobConfFuncOpt,
Class<? extends org.apache.hadoop.mapred.InputFormat<K,V>> inputFormatClass,
Class<K> keyClass,
Class<V> valueClass,
int minPartitions)
public HadoopRDD(SparkContext sc,
org.apache.hadoop.mapred.JobConf conf,
Class<? extends org.apache.hadoop.mapred.InputFormat<K,V>> inputFormatClass,
Class<K> keyClass,
Class<V> valueClass,
int minPartitions)
| Method Detail |
|---|
public static Object CONFIGURATION_INSTANTIATION_LOCK()
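The summary above notes that Configuration's constructor is not threadsafe, which is why construction has to be serialized under a shared monitor. A minimal sketch of that pattern, with ConfLock as a hypothetical stand-in for the real lock object:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

// Hypothetical stand-in for HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.
object ConfLock

// Configuration's constructor mutates shared static state, so concurrent
// `new JobConf(...)` calls can corrupt it (SPARK-1097, HADOOP-10456);
// holding one monitor across every construction avoids the race.
def newJobConf(base: Configuration): JobConf =
  ConfLock.synchronized {
    new JobConf(base)
  }
```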
public static int RECORDS_BETWEEN_BYTES_READ_METRIC_UPDATES()
public static Object getCachedMetadata(String key)
Parameters:
key - (undocumented)
public static boolean containsCachedMetadata(String key)
public static void addLocalConfiguration(String jobTrackerId,
int jobId,
int splitId,
int attemptId,
org.apache.hadoop.mapred.JobConf conf)
public static scala.Option<org.apache.spark.rdd.HadoopRDD.SplitInfoReflections> SPLIT_INFO_REFLECTIONS()
public Partition[] getPartitions()
Description copied from class: RDD
Implemented by subclasses to return the set of partitions in this RDD.
public InterruptibleIterator<scala.Tuple2<K,V>> compute(Partition theSplit,
                                                        TaskContext context)
Description copied from class: RDD
:: DeveloperApi :: Implemented by subclasses to compute a given partition.
Specified by:
compute in class RDD<scala.Tuple2<K,V>>
Parameters:
theSplit - (undocumented)
context - (undocumented)
public <U> RDD<U> mapPartitionsWithInputSplit(scala.Function2<org.apache.hadoop.mapred.InputSplit,scala.collection.Iterator<scala.Tuple2<K,V>>,scala.collection.Iterator<U>> f,
boolean preservesPartitioning,
scala.reflect.ClassTag<U> evidence$1)
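A hedged usage sketch from Scala, tagging each record with the file its split came from (it reuses rdd from the earlier sketch; the downcast is needed because sc.hadoopRDD is typed as a plain RDD):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileSplit, InputSplit}
import org.apache.spark.rdd.HadoopRDD

// `rdd` comes from the sc.hadoopRDD(...) sketch near the top of this page.
val hadoopRdd = rdd.asInstanceOf[HadoopRDD[LongWritable, Text]]

// Pair every line with the path of the file its InputSplit covers.
val linesWithFile = hadoopRdd.mapPartitionsWithInputSplit(
  (split: InputSplit, iter: Iterator[(LongWritable, Text)]) => {
    val path = split.asInstanceOf[FileSplit].getPath.toString
    iter.map { case (_, text) => (path, text.toString) }
  },
  preservesPartitioning = true)
```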
public scala.collection.Seq<String> getPreferredLocations(Partition split)
Description copied from class: RDD
Optionally overridden by subclasses to specify placement preferences.
Parameters:
split - (undocumented)
public void checkpoint()
Description copied from class: RDD
Mark this RDD for checkpointing.
Overrides:
checkpoint in class RDD<scala.Tuple2<K,V>>
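A brief usage sketch (the checkpoint directory is a placeholder): checkpointing must be requested before the first job runs on the RDD, and the data is written when that job finishes.

```scala
sc.setCheckpointDir("hdfs:///tmp/checkpoints") // placeholder path
rdd.checkpoint()
rdd.count() // first action materializes the RDD and writes the checkpoint
```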
public HadoopRDD<K,V> persist(StorageLevel storageLevel)
Description copied from class: RDD
Set this RDD's storage level to persist its values across operations after the first time it is computed.
Overrides:
persist in class RDD<scala.Tuple2<K,V>>
Parameters:
storageLevel - (undocumented)
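One caveat worth a sketch, though it is not stated on this page: Hadoop's RecordReader reuses the same Writable instance for every record, so caching a HadoopRDD deserialized without copying the values can produce duplicated rows. Reusing rdd from the earlier sketch:

```scala
import org.apache.spark.storage.StorageLevel

// Copy out of the reused Writables before caching deserialized in memory.
val cached = rdd
  .map { case (k, v) => (k.get, v.toString) } // immutable copies
  .persist(StorageLevel.MEMORY_ONLY)
```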
public org.apache.hadoop.conf.Configuration getConf()