public class HadoopTableReader extends Object implements TableReader
| Constructor and Description |
|---|
| HadoopTableReader(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> attributes, MetastoreRelation relation, HiveContext sc, org.apache.hadoop.hive.conf.HiveConf hiveExtraConf) |
| Modifier and Type | Method and Description |
|---|---|
| static scala.collection.Iterator<Row> | fillObject(scala.collection.Iterator<org.apache.hadoop.io.Writable> iterator, org.apache.hadoop.hive.serde2.Deserializer deserializer, scala.collection.Seq<scala.Tuple2<org.apache.spark.sql.catalyst.expressions.Attribute,Object>> nonPartitionKeyAttrs, org.apache.spark.sql.catalyst.expressions.MutableRow mutableRow) Transform all given raw Writables into Rows. |
| static void | initializeLocalJobConfFunc(String path, org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, org.apache.hadoop.mapred.JobConf jobConf) Curried. |
| RDD<Row> | makeRDDForPartitionedTable(scala.collection.immutable.Map<org.apache.hadoop.hive.ql.metadata.Partition,Class<? extends org.apache.hadoop.hive.serde2.Deserializer>> partitionToDeserializer, scala.Option<org.apache.hadoop.fs.PathFilter> filterOpt) Create a HadoopRDD for every partition key specified in the query. |
| RDD<Row> | makeRDDForPartitionedTable(scala.collection.Seq<org.apache.hadoop.hive.ql.metadata.Partition> partitions) |
| RDD<Row> | makeRDDForTable(org.apache.hadoop.hive.ql.metadata.Table hiveTable) |
| RDD<Row> | makeRDDForTable(org.apache.hadoop.hive.ql.metadata.Table hiveTable, Class<? extends org.apache.hadoop.hive.serde2.Deserializer> deserializerClass, scala.Option<org.apache.hadoop.fs.PathFilter> filterOpt) Creates a Hadoop RDD to read data from the target table's data directory. |
public HadoopTableReader(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> attributes,
MetastoreRelation relation,
HiveContext sc,
org.apache.hadoop.hive.conf.HiveConf hiveExtraConf)
public static void initializeLocalJobConfFunc(String path,
                                              org.apache.hadoop.hive.ql.plan.TableDesc tableDesc,
                                              org.apache.hadoop.mapred.JobConf jobConf)

Curried.
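The "Curried." note reflects the underlying Scala definition, which presumably takes its parameters in two lists (path and tableDesc first, jobConf second): partially applying the first list yields a JobConf => Unit closure that can be handed to a HadoopRDD and applied later, on the task side. The following is a minimal, self-contained sketch of that pattern; JobConfFuncSketch, its single-parameter analogue, and the config key are illustrative, not Spark's actual code.

```scala
import org.apache.hadoop.mapred.JobConf

object JobConfFuncSketch {
  // Simplified analogue of a curried JobConf initializer: the first parameter
  // list is applied up front, the JobConf is supplied later on the task side.
  def initializeLocalJobConfFunc(path: String)(jobConf: JobConf): Unit = {
    // Set the input directory for this (hypothetical) table scan.
    jobConf.set("mapred.input.dir", path)
  }

  def main(args: Array[String]): Unit = {
    // Partial application: we now hold a JobConf => Unit closure.
    val initFunc: JobConf => Unit = initializeLocalJobConfFunc("/warehouse/db/table") _
    val conf = new JobConf()
    initFunc(conf) // applied later, e.g. when an RDD initializes its local JobConf
    println(conf.get("mapred.input.dir")) // prints /warehouse/db/table
  }
}
```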
public static scala.collection.Iterator<Row> fillObject(scala.collection.Iterator<org.apache.hadoop.io.Writable> iterator,
                                                        org.apache.hadoop.hive.serde2.Deserializer deserializer,
                                                        scala.collection.Seq<scala.Tuple2<org.apache.spark.sql.catalyst.expressions.Attribute,Object>> nonPartitionKeyAttrs,
                                                        org.apache.spark.sql.catalyst.expressions.MutableRow mutableRow)

Transform all given raw Writables into Rows.

Parameters:
iterator - Iterator of all Writables to be transformed
deserializer - The Deserializer associated with the input Writable
nonPartitionKeyAttrs - Attributes that should be filled together with their corresponding positions in the output schema
mutableRow - A reusable MutableRow that should be filled
Returns:
Iterator[Row] transformed from iterator
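fillObject walks the raw Writable iterator once, deserializing each record and writing the selected field values into a single reusable MutableRow instead of allocating a fresh row per record. Below is a minimal sketch of that fill pattern using plain arrays in place of Spark's internal MutableRow and Hive's Deserializer; FillObjectSketch, fillRows, and the CSV-style deserializer are illustrative stand-ins, not Spark's actual implementation.

```scala
import org.apache.hadoop.io.{Text, Writable}

object FillObjectSketch {
  // Analogue of fillObject: deserialize each Writable and copy its fields into
  // a reusable row buffer, so per-record allocation stays constant.
  def fillRows(iterator: Iterator[Writable],
               deserialize: Writable => Array[Any],
               mutableRow: Array[Any]): Iterator[Array[Any]] =
    iterator.map { writable =>
      val fields = deserialize(writable)
      var i = 0
      while (i < fields.length && i < mutableRow.length) {
        mutableRow(i) = fields(i)
        i += 1
      }
      mutableRow // the same buffer is handed back each time, as with MutableRow
    }

  def main(args: Array[String]): Unit = {
    val raw: Iterator[Writable] = Iterator(new Text("1,a"), new Text("2,b"))
    val row = new Array[Any](2)
    // Consume eagerly: the shared buffer is only valid until the next element.
    fillRows(raw, w => w.toString.split(",").toArray[Any], row)
      .foreach(r => println(r.mkString("|")))
  }
}
```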
public RDD<Row> makeRDDForTable(org.apache.hadoop.hive.ql.metadata.Table hiveTable)

Specified by:
makeRDDForTable in interface TableReader

public RDD<Row> makeRDDForTable(org.apache.hadoop.hive.ql.metadata.Table hiveTable,
                                Class<? extends org.apache.hadoop.hive.serde2.Deserializer> deserializerClass,
                                scala.Option<org.apache.hadoop.fs.PathFilter> filterOpt)

Creates a Hadoop RDD to read data from the target table's data directory.
Parameters:
hiveTable - Hive metadata for the table being scanned.
deserializerClass - Class of the SerDe used to deserialize Writables read from Hadoop.
filterOpt - If defined, then the filter is used to reject files contained in the data directory being read. If None, then all files are accepted.
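For illustration, here is the kind of PathFilter a caller could wrap in filterOpt to skip bookkeeping files in the table's data directory; the filter itself is an example, not something this API supplies.

```scala
import org.apache.hadoop.fs.{Path, PathFilter}

object VisibleFilesFilter {
  // Reject Hadoop/Hive bookkeeping files such as _SUCCESS and .crc sidecars.
  val instance: PathFilter = new PathFilter {
    override def accept(path: Path): Boolean = {
      val name = path.getName
      !name.startsWith("_") && !name.startsWith(".")
    }
  }

  def main(args: Array[String]): Unit = {
    val filterOpt: Option[PathFilter] = Some(instance)
    // Passing None instead would accept every file, matching the documented behavior.
    println(filterOpt.exists(_.accept(new Path("/warehouse/t/_SUCCESS"))))   // false
    println(filterOpt.exists(_.accept(new Path("/warehouse/t/part-00000")))) // true
  }
}
```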
public RDD<Row> makeRDDForPartitionedTable(scala.collection.Seq<org.apache.hadoop.hive.ql.metadata.Partition> partitions)

Specified by:
makeRDDForPartitionedTable in interface TableReader

public RDD<Row> makeRDDForPartitionedTable(scala.collection.immutable.Map<org.apache.hadoop.hive.ql.metadata.Partition,Class<? extends org.apache.hadoop.hive.serde2.Deserializer>> partitionToDeserializer,
                                           scala.Option<org.apache.hadoop.fs.PathFilter> filterOpt)

Create a HadoopRDD for every partition key specified in the query.
Parameters:
partitionToDeserializer - Mapping from a Hive Partition metadata object to the SerDe class to use to deserialize input Writables from the corresponding partition.
filterOpt - If defined, then the filter is used to reject files contained in the data subdirectory of each partition being read. If None, then all files are accepted.
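As a sketch, the partitionToDeserializer argument could be built by pairing each metastore Partition with its declared SerDe class. This assumes Hive's Partition.getDeserializer accessor and is illustrative rather than how Spark's planner actually assembles the map.

```scala
import org.apache.hadoop.hive.ql.metadata.Partition
import org.apache.hadoop.hive.serde2.Deserializer

object PartitionDeserializerSketch {
  // Pair each partition with the SerDe class declared in its metastore entry.
  // Partitions of one table may use different SerDes, hence the per-partition map.
  def toDeserializerMap(
      partitions: Seq[Partition]): Map[Partition, Class[_ <: Deserializer]] =
    partitions.map { p =>
      p -> p.getDeserializer.getClass
    }.toMap
}
```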