
Connect SparkThriftServer with Tableau/PowerBI



Reference: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-use-bi-tools
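The title of this post refers to the Spark Thrift Server, the HiveServer2-compatible JDBC/ODBC endpoint that BI tools query. As a minimal sketch, assuming the HDInsight HTTPS gateway conventions of port 443 and the `/sparkhive2` HTTP path (verify both for your cluster), the Python function below builds a JDBC URL for a cluster such as mysparkcluster. The function name `thrift_jdbc_url` is hypothetical.

```python
def thrift_jdbc_url(cluster_name: str, database: str = "default",
                    port: int = 443, http_path: str = "/sparkhive2") -> str:
    """Build a jdbc:hive2 URL for a Spark Thrift Server behind the HDInsight gateway.

    The port and http_path defaults are assumptions about the HDInsight gateway.
    """
    if "." in cluster_name:
        # Expect the short name, e.g. "mysparkcluster", not the full host name.
        raise ValueError("pass the cluster name without the domain suffix")
    host = f"{cluster_name}.azurehdinsight.net"
    # jdbc:hive2 session settings are appended as ';key=value' pairs.
    settings = {"ssl": "true", "transportMode": "http", "httpPath": http_path}
    session = ";".join(f"{key}={value}" for key, value in settings.items())
    return f"jdbc:hive2://{host}:{port}/{database};{session}"


# Prints jdbc:hive2://mysparkcluster.azurehdinsight.net:443/default;ssl=true;transportMode=http;httpPath=/sparkhive2
print(thrift_jdbc_url("mysparkcluster"))
```

A URL like this can be handed to a JDBC client such as Beeline (`beeline -u '<url>' -n <user> -p <password>`) to check that the endpoint answers before you configure Power BI or Tableau.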

Use Power BI for Spark data visualization

Note

This section is applicable only for Spark 1.6 on HDInsight 3.4 and Spark 2.0 on HDInsight 3.5. 

Once you have saved the data as a table, you can use Power BI to connect to the data and visualize it to create reports, dashboards, etc. 

1.      Make sure you have access to Power BI. You can get a free preview subscription of Power BI from http://www.powerbi.com/.

2.      Sign in to Power BI.

3.      From the bottom of the left pane, click Get Data.

4.      On the Get Data page, under Import or Connect to Data, for Databases, click Get.

5.      On the next screen, click Spark on Azure HDInsight and then click Connect. When prompted, enter the cluster URL (mysparkcluster.azurehdinsight.net) and the credentials to connect to the cluster.

After the connection is established, Power BI starts importing data from the Spark cluster on HDInsight.

6.      Power BI imports the data and adds a Spark dataset under the Datasets heading. Click the dataset to open a new worksheet and visualize the data. You can also save the worksheet as a report: from the File menu, click Save.

7.      Notice that the Fields list on the right shows the hvac table you created earlier. Expand the table to see its fields, as you defined them in the notebook earlier.

 

8.      Build a visualization that shows the variance between target temperature and actual temperature for each building. To visualize your data, select Area Chart (shown in the red box). To define the axes, drag the BuildingID field under Axis, and the ActualTemp and TargetTemp fields under Value.

9.      By default, the visualization shows the sum of ActualTemp and TargetTemp. For both fields, select Average from the drop-down to get the average actual and target temperatures for both buildings.

 

10.      Your data visualization should be similar to the one in the screenshot. Move your cursor over the visualization to see tooltips with the relevant data.

11.      Click Save from the top menu and provide a report name. You can also pin the visual. When you pin a visualization, it is stored on your dashboard so you can track the latest value at a glance.
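Steps 8 and 9 switch the aggregation from Sum to Average. To make the difference concrete, here is a small Python sketch over a few hypothetical hvac rows (the values are invented for illustration; only the field names BuildingID, TargetTemp and ActualTemp come from this post). It computes per-building sums and averages, roughly what `SELECT BuildingID, AVG(TargetTemp), AVG(ActualTemp) FROM hvac GROUP BY BuildingID` would return, plus the actual-minus-target gap that the area chart is meant to show. The function name `aggregate_by_building` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample of hvac rows: (BuildingID, TargetTemp, ActualTemp).
ROWS = [
    (1, 68, 70),
    (1, 70, 66),
    (2, 66, 69),
    (2, 69, 71),
]


def aggregate_by_building(rows, agg):
    """Apply agg (for example sum or mean) to each temperature column per BuildingID."""
    grouped = defaultdict(lambda: ([], []))  # BuildingID -> (targets, actuals)
    for building_id, target, actual in rows:
        targets, actuals = grouped[building_id]
        targets.append(target)
        actuals.append(actual)
    return {
        building_id: {"TargetTemp": agg(targets), "ActualTemp": agg(actuals)}
        for building_id, (targets, actuals) in sorted(grouped.items())
    }


sums = aggregate_by_building(ROWS, sum)       # what the chart shows by default
averages = aggregate_by_building(ROWS, mean)  # after choosing Average in step 9
# A positive gap means the building ran warmer than its target on average.
gap = {b: v["ActualTemp"] - v["TargetTemp"] for b, v in averages.items()}
print(gap)
```

Sums grow with the number of readings per building, so averages are the fairer comparison when buildings report different numbers of rows.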

You can add as many visualizations as you want for the same dataset and pin them to the dashboard for a snapshot of your data. Also, Spark clusters on HDInsight are connected to Power BI with direct connect, so Power BI always has the most up-to-date data from your cluster and you do not need to schedule refreshes for the dataset.


Use Tableau Desktop for Spark data visualization


Note

This section is applicable only for Spark 1.5.2 clusters created in Azure HDInsight. 

1.      Install Tableau Desktop on the computer where you are running this Apache Spark BI tutorial.

2.      Make sure that computer also has Microsoft Spark ODBC driver installed. You can install the driver from here.

3.      Launch Tableau Desktop. In the left pane, from the list of servers to connect to, click Spark SQL. If Spark SQL is not listed by default in the left pane, you can find it by clicking More Servers.

4.      In the Spark SQL connection dialog box, provide the values as shown in the screenshot, and then click OK.

The authentication drop-down lists Microsoft Azure HDInsight Service as an option only if you have installed the Microsoft Spark ODBC Driver on the computer.

5.      On the next screen, from the Schema drop-down, click the Find icon, and then click default.

6.      For the Table field, click the Find icon again to list all the Hive tables available in the cluster. You should see the hvac table you created earlier using the notebook.

7.      Drag and drop the table to the top box on the right. Tableau imports the data and displays the schema as highlighted by the red box.

8.      Click the Sheet1 tab at the bottom left. Make a visualization that shows the average target and actual temperatures for all buildings for each date. Drag Date and Building ID to Columns and Actual Temp/Target Temp to Rows. Under Marks, select Area to use an area map for Spark data visualization.

9.      By default, the temperature fields are shown as aggregates (sums). To show the average temperatures instead, select Average from each field's drop-down.

10.      You can also superimpose one temperature map over the other to get a better feel for the difference between target and actual temperatures. Move the mouse to the corner of the lower area map until you see the handle shape highlighted in a red circle. Drag the map onto the other map at the top and release the mouse when you see the shape highlighted in a red rectangle.

11.      Click Save to save the worksheet. You can create dashboards and add one or more sheets to them.
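Steps 8 to 10 plot the average target and actual temperatures per date and building, then overlay the two area maps to expose the gap between them. The same numbers can be sketched in Python over a few hypothetical rows (invented values, with dates written in ISO format so they sort chronologically). The function name `averages_by_date_and_building` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hvac rows: (Date, BuildingID, TargetTemp, ActualTemp).
ROWS = [
    ("2013-06-01", 1, 66, 58),
    ("2013-06-01", 1, 68, 64),
    ("2013-06-01", 2, 70, 73),
    ("2013-06-02", 1, 67, 69),
]


def averages_by_date_and_building(rows):
    """Return {(Date, BuildingID): (avg TargetTemp, avg ActualTemp, actual - target)}."""
    groups = defaultdict(list)
    for date, building_id, target, actual in rows:
        groups[(date, building_id)].append((target, actual))
    result = {}
    # Sorting by (Date, BuildingID) mirrors Date and Building ID on Columns.
    for key, readings in sorted(groups.items()):
        avg_target = mean(t for t, _ in readings)
        avg_actual = mean(a for _, a in readings)
        result[key] = (avg_target, avg_actual, avg_actual - avg_target)
    return result


for (date, building_id), (target, actual, gap) in averages_by_date_and_building(ROWS).items():
    print(date, building_id, target, actual, gap)
```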





Thank you!! Please provide your valuable feedback.
