public class HadoopTableReader extends Object implements TableReader
| Constructor and Description |
|---|
| HadoopTableReader(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> attributes, MetastoreRelation relation, HiveContext sc, org.apache.hadoop.hive.conf.HiveConf hiveExtraConf) |
| Modifier and Type | Method and Description |
|---|---|
| static scala.collection.Iterator<Row> | fillObject(scala.collection.Iterator<org.apache.hadoop.io.Writable> iterator, org.apache.hadoop.hive.serde2.Deserializer deserializer, scala.collection.Seq<scala.Tuple2<org.apache.spark.sql.catalyst.expressions.Attribute,Object>> nonPartitionKeyAttrs, org.apache.spark.sql.catalyst.expressions.MutableRow mutableRow) Transform all given raw Writables into Rows. |
| static void | initializeLocalJobConfFunc(String path, org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, org.apache.hadoop.mapred.JobConf jobConf) Curried. |
| RDD<Row> | makeRDDForPartitionedTable(scala.collection.immutable.Map<org.apache.hadoop.hive.ql.metadata.Partition,Class<? extends org.apache.hadoop.hive.serde2.Deserializer>> partitionToDeserializer, scala.Option<org.apache.hadoop.fs.PathFilter> filterOpt) Create a HadoopRDD for every partition key specified in the query. |
| RDD<Row> | makeRDDForPartitionedTable(scala.collection.Seq<org.apache.hadoop.hive.ql.metadata.Partition> partitions) |
| RDD<Row> | makeRDDForTable(org.apache.hadoop.hive.ql.metadata.Table hiveTable) |
| RDD<Row> | makeRDDForTable(org.apache.hadoop.hive.ql.metadata.Table hiveTable, Class<? extends org.apache.hadoop.hive.serde2.Deserializer> deserializerClass, scala.Option<org.apache.hadoop.fs.PathFilter> filterOpt) Creates a Hadoop RDD to read data from the target table's data directory. |
public HadoopTableReader(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> attributes,
MetastoreRelation relation,
HiveContext sc,
org.apache.hadoop.hive.conf.HiveConf hiveExtraConf)
public static void initializeLocalJobConfFunc(String path,
                                              org.apache.hadoop.hive.ql.plan.TableDesc tableDesc,
                                              org.apache.hadoop.mapred.JobConf jobConf)

Curried.
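The "Curried." note reflects the underlying Scala definition, which presumably takes its parameters in two lists (path and tableDesc first, jobConf second): partially applying the first list yields a JobConf => Unit closure that can be handed to a HadoopRDD and applied later, on the task side. The following is a minimal, self-contained sketch of that pattern; JobConfFuncSketch, its single-parameter analogue, and the config key are illustrative, not Spark's actual code.

```scala
import org.apache.hadoop.mapred.JobConf

object JobConfFuncSketch {
  // Simplified analogue of a curried JobConf initializer: the first parameter
  // list is applied up front, the JobConf is supplied later on the task side.
  def initializeLocalJobConfFunc(path: String)(jobConf: JobConf): Unit = {
    // Set the input directory for this (hypothetical) table scan.
    jobConf.set("mapred.input.dir", path)
  }

  def main(args: Array[String]): Unit = {
    // Partial application: we now hold a JobConf => Unit closure.
    val initFunc: JobConf => Unit = initializeLocalJobConfFunc("/warehouse/db/table") _
    val conf = new JobConf()
    initFunc(conf) // applied later, e.g. when an RDD initializes its local JobConf
    println(conf.get("mapred.input.dir")) // prints /warehouse/db/table
  }
}
```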
public static scala.collection.Iterator<Row> fillObject(scala.collection.Iterator<org.apache.hadoop.io.Writable> iterator,
                                                        org.apache.hadoop.hive.serde2.Deserializer deserializer,
                                                        scala.collection.Seq<scala.Tuple2<org.apache.spark.sql.catalyst.expressions.Attribute,Object>> nonPartitionKeyAttrs,
                                                        org.apache.spark.sql.catalyst.expressions.MutableRow mutableRow)

Transform all given raw Writables into Rows.

Parameters:
iterator - Iterator of all Writables to be transformed
deserializer - The Deserializer associated with the input Writable
nonPartitionKeyAttrs - Attributes that should be filled together with their corresponding positions in the output schema
mutableRow - A reusable MutableRow that should be filled
Returns:
Iterator[Row] transformed from iterator
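fillObject walks the raw Writable iterator once, deserializing each record and writing the selected field values into a single reusable MutableRow instead of allocating a fresh row per record. Below is a minimal sketch of that fill pattern using plain arrays in place of Spark's internal MutableRow and Hive's Deserializer; FillObjectSketch, fillRows, and the CSV-style deserializer are illustrative stand-ins, not Spark's actual implementation.

```scala
import org.apache.hadoop.io.{Text, Writable}

object FillObjectSketch {
  // Analogue of fillObject: deserialize each Writable and copy its fields into
  // a reusable row buffer, so per-record allocation stays constant.
  def fillRows(iterator: Iterator[Writable],
               deserialize: Writable => Array[Any],
               mutableRow: Array[Any]): Iterator[Array[Any]] =
    iterator.map { writable =>
      val fields = deserialize(writable)
      var i = 0
      while (i < fields.length && i < mutableRow.length) {
        mutableRow(i) = fields(i)
        i += 1
      }
      mutableRow // the same buffer is handed back each time, as with MutableRow
    }

  def main(args: Array[String]): Unit = {
    val raw: Iterator[Writable] = Iterator(new Text("1,a"), new Text("2,b"))
    val row = new Array[Any](2)
    // Consume eagerly: the shared buffer is only valid until the next element.
    fillRows(raw, w => w.toString.split(",").toArray[Any], row)
      .foreach(r => println(r.mkString("|")))
  }
}
```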
public RDD<Row> makeRDDForTable(org.apache.hadoop.hive.ql.metadata.Table hiveTable)

Specified by:
makeRDDForTable in interface TableReader

public RDD<Row> makeRDDForTable(org.apache.hadoop.hive.ql.metadata.Table hiveTable,
                                Class<? extends org.apache.hadoop.hive.serde2.Deserializer> deserializerClass,
                                scala.Option<org.apache.hadoop.fs.PathFilter> filterOpt)

Creates a Hadoop RDD to read data from the target table's data directory.
Parameters:
hiveTable - Hive metadata for the table being scanned.
deserializerClass - Class of the SerDe used to deserialize Writables read from Hadoop.
filterOpt - If defined, then the filter is used to reject files contained in the data directory being read. If None, then all files are accepted.
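For illustration, here is the kind of PathFilter a caller could wrap in filterOpt to skip bookkeeping files in the table's data directory; the filter itself is an example, not something this API supplies.

```scala
import org.apache.hadoop.fs.{Path, PathFilter}

object VisibleFilesFilter {
  // Reject Hadoop/Hive bookkeeping files such as _SUCCESS and .crc sidecars.
  val instance: PathFilter = new PathFilter {
    override def accept(path: Path): Boolean = {
      val name = path.getName
      !name.startsWith("_") && !name.startsWith(".")
    }
  }

  def main(args: Array[String]): Unit = {
    val filterOpt: Option[PathFilter] = Some(instance)
    // Passing None instead would accept every file, matching the documented behavior.
    println(filterOpt.exists(_.accept(new Path("/warehouse/t/_SUCCESS"))))   // false
    println(filterOpt.exists(_.accept(new Path("/warehouse/t/part-00000")))) // true
  }
}
```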
public RDD<Row> makeRDDForPartitionedTable(scala.collection.Seq<org.apache.hadoop.hive.ql.metadata.Partition> partitions)

Specified by:
makeRDDForPartitionedTable in interface TableReader

public RDD<Row> makeRDDForPartitionedTable(scala.collection.immutable.Map<org.apache.hadoop.hive.ql.metadata.Partition,Class<? extends org.apache.hadoop.hive.serde2.Deserializer>> partitionToDeserializer,
                                           scala.Option<org.apache.hadoop.fs.PathFilter> filterOpt)

Create a HadoopRDD for every partition key specified in the query.
Parameters:
partitionToDeserializer - Mapping from a Hive Partition metadata object to the SerDe class to use to deserialize input Writables from the corresponding partition.
filterOpt - If defined, then the filter is used to reject files contained in the data subdirectory of each partition being read. If None, then all files are accepted.
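As a sketch, the partitionToDeserializer argument could be built by pairing each metastore Partition with its declared SerDe class. This assumes Hive's Partition.getDeserializer accessor and is illustrative rather than how Spark's planner actually assembles the map.

```scala
import org.apache.hadoop.hive.ql.metadata.Partition
import org.apache.hadoop.hive.serde2.Deserializer

object PartitionDeserializerSketch {
  // Pair each partition with the SerDe class declared in its metastore entry.
  // Partitions of one table may use different SerDes, hence the per-partition map.
  def toDeserializerMap(
      partitions: Seq[Partition]): Map[Partition, Class[_ <: Deserializer]] =
    partitions.map { p =>
      p -> p.getDeserializer.getClass
    }.toMap
}
```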