Connect SparkThriftServer with Tableau/PowerBI
Reference: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-use-bi-tools
Use Power BI for Spark data visualization
Note
This section is applicable only for Spark 1.6 on HDInsight 3.4 and Spark 2.0 on HDInsight 3.5.
Once you have saved the data as a table, you can use Power BI to connect to the data and visualize it to create reports, dashboards, etc.
1. Make sure you have access to Power BI. You can get a free preview subscription of Power BI from http://www.powerbi.com/.
2. Sign in to Power BI.
3. From the bottom of the left pane, click Get Data.
4. On the Get Data page, under Import or Connect to Data, for Databases, click Get.
5. On the next screen, click Spark on Azure HDInsight and then click Connect. When prompted, enter the cluster URL (mysparkcluster.azurehdinsight.net) and the credentials to connect to the cluster. After the connection is established, Power BI starts importing data from the Spark cluster on HDInsight.
6. Power BI imports the data and adds a Spark dataset under the Datasets heading. Click the data set to open a new worksheet to visualize the data. You can also save the worksheet as a report. To save a worksheet, from the File menu, click Save.
7. Notice that the Fields list on the right lists the hvac table you created earlier. Expand the table to see its fields, as you defined them in the notebook earlier.
8. Build a visualization to show the variance between target temperature and actual temperature for each building. To visualize your data, select Area Chart (shown in red box). To define the axis, drag and drop the BuildingID field under Axis, and the ActualTemp and TargetTemp fields under Value.
9. By default, the visualization shows the sum for ActualTemp and TargetTemp. For both fields, from the drop-down, select Average to get an average of actual and target temperatures for both buildings.
10. Your data visualization should be similar to the one in the screenshot. Move your cursor over the visualization to get tool tips with relevant data.
11. Click Save from the top menu and provide a report name. You can also pin the visual. When you pin a visualization, it is stored on your dashboard so you can track the latest value at a glance.
You can add as many visualizations as you want for the same dataset and pin them to the dashboard for a snapshot of your data. Also, Spark clusters on HDInsight are connected to Power BI with direct connect. This ensures that Power BI always has the most up-to-date data from your cluster so you do not need to schedule refreshes for the dataset.
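The switch from Sum to Average in steps 8 and 9 can be sketched as a plain SQL aggregation. The example below is only an illustration: it uses Python's built-in SQLite engine as a stand-in for the Spark Thrift Server, and the four sample rows are hypothetical values, not data from the tutorial. Only the table name hvac and the column names BuildingID, TargetTemp, and ActualTemp come from the steps above.

```python
import sqlite3

# Hypothetical sample rows (BuildingID, TargetTemp, ActualTemp).
# Real values would come from the hvac table on the Spark cluster.
rows = [
    (1, 66, 58),
    (1, 69, 68),
    (2, 70, 73),
    (2, 67, 63),
]

# Assumption: an in-memory SQLite database stands in for the Spark Thrift Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hvac (BuildingID INTEGER, TargetTemp INTEGER, ActualTemp INTEGER)"
)
conn.executemany("INSERT INTO hvac VALUES (?, ?, ?)", rows)

# SUM is what the visualization shows by default; AVG is what step 9 selects.
query = """
    SELECT BuildingID,
           SUM(ActualTemp) AS sum_actual,
           AVG(ActualTemp) AS avg_actual,
           AVG(TargetTemp) AS avg_target
    FROM hvac
    GROUP BY BuildingID
    ORDER BY BuildingID
"""
result = conn.execute(query).fetchall()
conn.close()

for building_id, sum_actual, avg_actual, avg_target in result:
    print(building_id, sum_actual, avg_actual, avg_target)
```

With these sample rows, building 1 has a summed actual temperature of 126 but an average of 63.0, which is why the default Sum view is hard to compare against a target and the Average view is easier to read.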
Use Tableau Desktop for Spark data visualization
Note
This section is applicable only for Spark 1.5.2 clusters created in Azure HDInsight.
1. Install Tableau Desktop on the computer where you are running this Apache Spark BI tutorial.
2. Make sure that computer also has the Microsoft Spark ODBC driver installed. You can install the driver from here.
3. Launch Tableau Desktop. In the left pane, from the list of servers to connect to, click Spark SQL. If Spark SQL is not listed by default in the left pane, you can find it by clicking More Servers.
4. In the Spark SQL connection dialog box, provide the values as shown in the screenshot, and then click OK. The authentication drop-down lists Microsoft Azure HDInsight Service as an option only if you installed the Microsoft Spark ODBC Driver on the computer.
5. On the next screen, from the Schema drop-down, click the Find icon, and then click default.
6. For the Table field, click the Find icon again to list all the Hive tables available in the cluster. You should see the hvac table you created earlier using the notebook.
7. Drag and drop the table to the top box on the right. Tableau imports the data and displays the schema as highlighted by the red box.
8. Click the Sheet1 tab at the bottom left. Make a visualization that shows the average target and actual temperatures for all buildings for each date. Drag Date and Building ID to Columns and Actual Temp/Target Temp to Rows. Under Marks, select Area to use an area map for Spark data visualization.
9. By default, the temperature fields are shown as aggregate. If you want to show the average temperatures instead, you can do so from the drop-down, as shown below.
10. You can also superimpose one temperature map over the other to get a better feel for the difference between target and actual temperatures. Move the mouse to the corner of the lower area map until you see the handle shape highlighted in a red circle. Drag the map to the other map on the top and release the mouse when you see the shape highlighted in the red rectangle.
11. Click Save to save the worksheet. You can create dashboards and add one or more sheets to them.
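The Tableau sheet built in steps 8 to 10 groups readings by date and building, averages the two temperature fields, and lets you compare them visually. A rough equivalent in plain Python might look like the sketch below. The helper name average_gap and the sample readings are hypothetical and only illustrate the grouping; the field names follow the steps above.

```python
from collections import defaultdict

# Hypothetical readings: (Date, BuildingID, TargetTemp, ActualTemp).
readings = [
    ("6/1/13", 1, 66, 58),
    ("6/1/13", 1, 70, 64),
    ("6/1/13", 2, 69, 71),
    ("6/2/13", 1, 68, 70),
]


def average_gap(readings):
    """Return {(date, building_id): (avg_target, avg_actual, avg_actual - avg_target)}."""
    # Collect the (target, actual) pairs for each date and building.
    groups = defaultdict(list)
    for date, building_id, target, actual in readings:
        groups[(date, building_id)].append((target, actual))

    # Average each group and record how far actual is from target.
    summary = {}
    for key, pairs in groups.items():
        avg_target = sum(t for t, _ in pairs) / len(pairs)
        avg_actual = sum(a for _, a in pairs) / len(pairs)
        summary[key] = (avg_target, avg_actual, avg_actual - avg_target)
    return summary


summary = average_gap(readings)
for (date, building_id), (avg_target, avg_actual, gap) in sorted(summary.items()):
    print(date, building_id, avg_target, avg_actual, gap)
```

A negative gap means the building ran cooler than its target on that date, which is the same difference the superimposed area maps in step 10 make visible.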