:: DeveloperApi ::
The data type for collections of multiple values.
Internally these are represented as columns that contain a scala.collection.Seq.
An ArrayType object comprises two fields, elementType: DataType and
containsNull: Boolean. The elementType field specifies the type of the
array elements. The containsNull field specifies whether the array may contain null values.
:: DeveloperApi ::
The base type of all Spark SQL data types.
:: DeveloperApi ::
The data type representing scala.math.BigDecimal values.
TODO(matei): explain precision and scale
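The note above leaves precision and scale undocumented. As a hedged reminder of what the two terms mean for scala.math.BigDecimal itself (this sketch makes no claim about how DecimalType stores or enforces them): precision is the total number of significant digits, and scale is the number of digits to the right of the decimal point. The object name PrecisionScaleSketch is purely illustrative.

```scala
// Plain Scala with no Spark dependency: inspects a BigDecimal value directly.
object PrecisionScaleSketch {
  val value: BigDecimal = BigDecimal("123.45")

  // Total number of significant digits: 1, 2, 3, 4, 5.
  val precision: Int = value.precision // 5

  // Number of digits after the decimal point: 4, 5.
  val scale: Int = value.scale // 2

  def main(args: Array[String]): Unit =
    println(s"value=$value precision=$precision scale=$scale")
}
```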
:: DeveloperApi ::
The data type representing Maps. A MapType object comprises three fields,
keyType: DataType, valueType: DataType and valueContainsNull: Boolean.
The keyType field specifies the type of the keys in the map.
The valueType field specifies the type of the values in the map.
The valueContainsNull field specifies whether the values of the map may be null.
For values of a MapType column, keys are not allowed to be null.
:: DeveloperApi ::
Metadata is a wrapper over Map[String, Any] that limits the value type to simple ones: Boolean, Long, Double, String, Metadata, Array[Boolean], Array[Long], Array[Double], Array[String], and Array[Metadata]. JSON is used for serialization.
The default constructor is private. Users should use either MetadataBuilder or Metadata.fromJson to create Metadata instances.
:: DeveloperApi :: Builder for Metadata. If there is a key collision, the latter will overwrite the former.
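A minimal sketch of the builder and its key-collision rule. It assumes the import layout used by the other examples in this document (org.apache.spark.sql._); later Spark releases expose MetadataBuilder from org.apache.spark.sql.types instead. The keys "comment" and "maxLength" and the object name are hypothetical.

```scala
// Assumption: MetadataBuilder is visible through this import, as in the
// examples elsewhere in this document. Adjust the import for other releases.
import org.apache.spark.sql._

object MetadataSketch {
  // "comment" is written twice; the later put replaces the earlier one.
  val metadata = new MetadataBuilder()
    .putString("comment", "first draft")
    .putLong("maxLength", 64L)
    .putString("comment", "final wording")
    .build()

  val comment: String = metadata.getString("comment") // "final wording"
  val maxLength: Long = metadata.getLong("maxLength") // 64

  // The JSON form used for serialization.
  val json: String = metadata.json
}
```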
:: DeveloperApi ::
Represents one row of output from a relational operator.
:: AlphaComponent :: The entry point for running relational queries using Spark.
:: AlphaComponent :: An RDD of Row objects that has an associated schema.
Converts a logical plan into zero or more SparkPlans.
:: DeveloperApi ::
A StructField object represents a field in a StructType object.
A StructField object comprises three fields, name: String, dataType: DataType,
and nullable: Boolean. The name field is the name of the StructField. The
dataType field specifies the data type of the StructField.
The nullable field specifies whether values of the StructField may be null.
:: DeveloperApi ::
The data type representing Rows. A StructType object comprises a Seq of StructFields.
:: DeveloperApi ::
An ArrayType object can be constructed in two ways:
ArrayType(elementType: DataType, containsNull: Boolean)
and
ArrayType(elementType: DataType)
For ArrayType(elementType), the containsNull field is set to false.
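As an illustration of the two-argument form, the sketch below passes containsNull explicitly, so the result does not depend on the default chosen by the one-argument form. It assumes the import layout used by the other examples in this document; the object and value names are hypothetical.

```scala
// Assumption: the data types are visible through this import, as in the
// examples elsewhere in this document.
import org.apache.spark.sql._

object ArrayTypeSketch {
  // String elements; individual elements may be null.
  val nullableStrings = ArrayType(StringType, true)

  // Integer elements; elements are declared to never be null.
  val strictInts = ArrayType(IntegerType, false)
}
```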
:: DeveloperApi ::
The data type representing Array[Byte] values.
:: DeveloperApi ::
The data type representing Boolean values.
:: DeveloperApi ::
The data type representing Byte values.
:: DeveloperApi ::
The data type representing java.sql.Date values.
:: DeveloperApi ::
The data type representing scala.math.BigDecimal values.
TODO(matei): explain precision and scale
:: DeveloperApi ::
The data type representing Double values.
:: DeveloperApi ::
The data type representing Float values.
:: DeveloperApi ::
The data type representing Int values.
:: DeveloperApi ::
The data type representing Long values.
:: DeveloperApi ::
A MapType object can be constructed in two ways:
MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
and
MapType(keyType: DataType, valueType: DataType)
For MapType(keyType: DataType, valueType: DataType),
the valueContainsNull field is set to true.
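The two construction forms can be sketched as follows, assuming the import layout used by the other examples in this document. The object and value names are hypothetical.

```scala
// Assumption: the data types are visible through this import, as in the
// examples elsewhere in this document.
import org.apache.spark.sql._

object MapTypeSketch {
  // Three-argument form: values are declared to never be null.
  val scores = MapType(StringType, DoubleType, false)

  // Two-argument form: valueContainsNull takes the default described above.
  val labels = MapType(IntegerType, StringType)
}
```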
:: DeveloperApi ::
The data type representing NULL values.
:: DeveloperApi ::
A Row object can be constructed by providing field values. Example:
    import org.apache.spark.sql._

    // Create a Row from values.
    Row(value1, value2, value3, ...)
    // Create a Row from a Seq of values.
    Row.fromSeq(Seq(value1, value2, ...))
A value in a row can be accessed either generically by ordinal, which incurs boxing overhead for primitives, or through native primitive access. An example of generic access by ordinal:
    import org.apache.spark.sql._

    val row = Row(1, true, "a string", null)
    // row: Row = [1,true,a string,null]
    val firstValue = row(0)
    // firstValue: Any = 1
    val fourthValue = row(3)
    // fourthValue: Any = null
For native primitive access, it is invalid to use the native primitive interface to retrieve
a value that is null. Instead, a user must check isNullAt before attempting to retrieve a
value that might be null.
An example of native primitive access:
    // Using the row from the previous example.
    val firstValue = row.getInt(0)
    // firstValue: Int = 1
    val isNull = row.isNullAt(3)
    // isNull: Boolean = true
Interfaces related to native primitive access are:
isNullAt(i: Int): Boolean
getInt(i: Int): Int
getLong(i: Int): Long
getDouble(i: Int): Double
getFloat(i: Int): Float
getBoolean(i: Int): Boolean
getShort(i: Int): Short
getByte(i: Int): Byte
getString(i: Int): String
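One way to follow the isNullAt rule consistently is to wrap the check and the primitive getter together. The helper intAt below is hypothetical and not part of the Row API; it is only a sketch of the check-then-read pattern described above.

```scala
import org.apache.spark.sql.Row

object RowAccessSketch {
  // Hypothetical helper: returns None for a null cell instead of calling
  // getInt on it, which the text above describes as invalid.
  def intAt(row: Row, i: Int): Option[Int] =
    if (row.isNullAt(i)) None else Some(row.getInt(i))

  val row = Row(1, true, "a string", null)

  val first: Option[Int] = intAt(row, 0)  // Some(1)
  val fourth: Option[Int] = intAt(row, 3) // None
}
```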
Fields in a Row object can be extracted in a pattern match. Example:
    import org.apache.spark.sql._

    val pairs = sql("SELECT key, value FROM src").rdd.map {
      case Row(key: Int, value: String) =>
        key -> value
    }
:: DeveloperApi ::
The data type representing Short values.
:: DeveloperApi ::
The data type representing String values.
:: DeveloperApi ::
A StructField object can be constructed by
StructField(name: String, dataType: DataType, nullable: Boolean)
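For illustration, the sketch below builds two fields with this constructor and combines them into a schema. It assumes the import layout used by the other examples in this document; the field names "id" and "name" and the object name are hypothetical.

```scala
// Assumption: the data types are visible through this import, as in the
// examples elsewhere in this document.
import org.apache.spark.sql._

object StructFieldSketch {
  // A required 64-bit identifier: nullable is false.
  val id = StructField("id", LongType, false)

  // An optional display name: nullable is true.
  val name = StructField("name", StringType, true)

  // Fields are combined into a schema in declaration order.
  val schema = StructType(Seq(id, name))
}
```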
:: DeveloperApi ::
A StructType object can be constructed by
StructType(fields: Seq[StructField])
For a StructType object, one or more StructFields can be extracted by name.
If multiple StructFields are extracted, a StructType object is returned, and
any provided name that does not have a matching field is ignored. When a single
StructField is extracted and no field matches the name, null is returned.
Example:
    import org.apache.spark.sql._

    val struct =
      StructType(
        StructField("a", IntegerType, true) ::
        StructField("b", LongType, false) ::
        StructField("c", BooleanType, false) :: Nil)

    // Extract a single StructField.
    val singleField = struct("b")
    // singleField: StructField = StructField(b,LongType,false)

    // This struct does not have a field called "d". null will be returned.
    val nonExisting = struct("d")
    // nonExisting: StructField = null

    // Extract multiple StructFields. Field names are provided in a set.
    // A StructType object will be returned.
    val twoFields = struct(Set("b", "c"))
    // twoFields: StructType =
    //   StructType(List(StructField(b,LongType,false), StructField(c,BooleanType,false)))

    // Names that do not have matching fields are ignored.
    // For the case shown below, "d" will be ignored and
    // it is treated as struct(Set("b", "c")).
    val ignoreNonExisting = struct(Set("b", "c", "d"))
    // ignoreNonExisting: StructType =
    //   StructType(List(StructField(b,LongType,false), StructField(c,BooleanType,false)))
A Row object is used as a value of the StructType. Example:
    import org.apache.spark.sql._

    val innerStruct =
      StructType(
        StructField("f1", IntegerType, true) ::
        StructField("f2", LongType, false) ::
        StructField("f3", BooleanType, false) :: Nil)

    val struct = StructType(
      StructField("a", innerStruct, true) :: Nil)

    // Create a Row with the schema defined by struct
    val row = Row(Row(1, 2, true))
    // row: Row = [[1,2,true]]
:: DeveloperApi ::
The data type representing java.sql.Timestamp values.
:: DeveloperApi :: An execution engine for relational query plans that runs on top of Spark and returns RDDs.
A set of APIs for adding data sources to Spark SQL.
Allows the execution of relational queries, including those expressed in SQL using Spark.