Reading from a Snowflake Data Store
You can use the Qubole Dataframe API for Apache Spark to read data from any virtual warehouse that is associated with the Snowflake data store from the Qubole Analyze page, notebooks, or APIs.
You can use either Scala or Python code snippets in a Spark command or a Spark application to read data from the Snowflake data store.
Prerequisites
You must ensure that you are using Spark version 2.x.x or later.
Before running any Spark command, you must ensure that the Snowflake virtual warehouse is running.
Note
If you want to use an already running Spark cluster to read data from a newly added Snowflake data store, restart the cluster so that the Snowflake jars are installed on it.
DML queries (such as INSERT, UPDATE, and DELETE) are not supported.
Before you begin, you should review and understand the parameters required for reading data from a Snowflake data store. See Understanding the parameters used for Snowflake.
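For quick orientation, the following Scala sketch annotates each parameter in place; the placeholder names are illustrative, and the linked reference remains the authoritative description (in particular, the catalog name is assumed here to be the name of the Snowflake data store in Qubole).
val df = spark.read
// sfDatabase: the Snowflake database that contains the data to read
.option("sfDatabase", "<database-name>")
// positional arguments: the catalog name, the Snowflake virtual warehouse
// that executes the query, and the SQL query to run
.snowflake("<catalog-name>", "<snowflake-virtual-warehouse-name>", "<query>")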
Note
If the database name contains a hyphen (-), you must add escape characters around the database name ("\"database-name\"") when composing the Spark command or Spark application.
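For example, a minimal Scala sketch for a hypothetical database named test-db:
val df = spark.read
// escape the hyphenated database name with literal double quotes
.option("sfDatabase", "\"test-db\"")
.snowflake("<catalog-name>", "<snowflake-virtual-warehouse-name>", "<query>")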
Reading from the Analyze Page
From the Home menu, navigate to the Analyze page and click Compose.
Select Spark Command from the Command Type drop-down list.
Depending on the language you want to use, perform the appropriate actions:
Scala:
By default, Scala is selected. Use the following code snippet and compose the Spark application in the query editor.
val df = spark.read
.option("sfDatabase","<database-name>")
.snowflake("<catalog-name>","<snowflake-virtual-warehouse-name>", "<query>")
Python:
Select Python from the drop-down list.
Use the following code snippet and compose the Spark application in Python in the query editor.
df = spark.read \
.option("sfDatabase", "<database-name>") \
.snowflake("<catalog-name>", "<snowflake-virtual-warehouse-name>", "<query>")
Click Run to execute the query. Click Save if you want to run the same query later.
The query result is displayed in the Results tab, and the query logs are displayed in the Logs tab. The Logs tab has an Errors and Warnings filter. For more information on how to download command results and logs, see Downloading Results and Logs.
Example
The following example code shows how to read data from a Snowflake data store using Scala.
val query = "select id,name,class from test_table1 "
val read_df = spark.read.option("sfDatabase", "TEST_DB").snowflake("snowflake_catalog", "load_wh", query)
The following example code shows how to read data from a Snowflake data store using Python.
query = "select id,name,class from test_table1 "
read_df = spark.read.option("sfDatabase", "TEST_DB").snowflake("snowflake_catalog", "load_wh", query)
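The returned DataFrame behaves like any other Spark DataFrame. For instance, a minimal Scala follow-up to the example above, using only the standard Spark API:
// inspect the inferred schema and preview the first rows
read_df.printSchema()
read_df.show(10)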
Reading from Notebooks
Note
You should have configured a Spark notebook and associated an interpreter setting with the notebook before running any Spark command or Spark application. See Composing a Spark Command and Running Spark Applications for more information.
From the Home menu, navigate to the Notebooks page.
Select a Spark notebook from the list of notebooks and ensure that its assigned cluster is up before using it to run queries.
Use the following code snippet and compose the Spark application in the paragraph.
val df = spark.read
.option("sfDatabase","<database-name>")
.snowflake("<catalog-name>","<snowflake-virtual-warehouse-name>", "<query>")
Run the paragraph.
Example
The following example code shows how to write data to and then read it back from a Snowflake data store using Notebooks.
case class Person(name: String, age: Int)
val persons = Array(Person("snow_user1", 1), Person("snow_user2", 2), Person("snow_user3", 3), Person("snow_user4", 4))
val df = spark.createDataFrame(persons)
// write the DataFrame into a Snowflake table
df.write.option("sfDatabase", "TEST_DB").mode("overwrite").snowflake("snowflake-test", "load_wh", "new_persons_221")
// read back and check the data written into the table
val query = "select * from new_persons_221 order by name"
val read_people = spark.read.option("sfDatabase", "TEST_DB").snowflake("snowflake-test", "load_wh", query)
read_people.collect().foreach(println)
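Once read back, the result can be combined with the rest of the Spark API. For example, a small Scala sketch that registers the DataFrame as a temporary view (the view name people is hypothetical) and queries it with Spark SQL:
// register the DataFrame as a temporary view for Spark SQL
read_people.createOrReplaceTempView("people")
// run a standard Spark SQL query against the view
spark.sql("select name from people where age > 2").show()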
For more information, see Composing a Spark Command and Running Spark Applications.
For REST API-related information, see Submit a Spark Command.