Welcome!
Create Spark+Jupyter Application on BlueData App Workbench
Hello! Today we will teach you how to create an application image on the BlueData EPIC platform. We will create an image consisting of Spark and Jupyter using the BlueData EPIC Application Workbench, built on a CentOS base container image.
Prerequisites:
- Basic knowledge of containers
- Linux administration
- Git
- Spark, Hadoop, and Jupyter

Steps
Step 1 - Preparing the Environment
To create an image on the BlueData EPIC platform, you need the BlueData EPIC App Workbench installed.
We have already installed the BlueData EPIC App Workbench for you; for more information, click here.
To check the version of the BlueData App Workbench, execute the following command:
bdwb --version
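If the command is not found, a quick generic shell check (not BlueData-specific) can confirm whether the workbench is on your PATH:
# Verify the bdwb binary is reachable before proceeding
which bdwb || echo "bdwb not found - check the App Workbench installation"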
Step 2 - Getting Started
To begin application development, we first need to create a directory called “Spark” (or any name of your choice). This directory will house all the files and components necessary to create the application image.
To create a directory, execute the following command:
mkdir ~/Spark
Navigate to the newly created directory:
cd ~/Spark
Next, we need to create a skeleton file structure. To do so execute the following command:
bdwb --init
The bdwb --init command sets up all the base directories and files you need to develop an app image.
Execute the below command to list all the files and folders created by the bdwb command:
ls -R
Make sure you see the following files and folders:
Now, we will create a folder called “centos” inside the empty image directory.
Execute the following command to do so:
mkdir ~/Spark/image/centos
Step 3 - Create the Dockerfile
The next step is to create a Dockerfile.
"A docker file is a text file that the Docker engine understands to automatically build an image by reading the file. The Dockerfile consists of all the commands a user would call to assemble the desired image."
Let’s go ahead and create a Dockerfile inside the newly created centos folder.
For your reference, we have already created a ready-made Dockerfile in the ~/test directory. We will copy that file into the centos folder using the following command:
cp ~/test/Dockerfile ~/Spark/image/centos
Execute the following command to view the content of the Dockerfile:
cat ~/Spark/image/centos/Dockerfile
FROM bluedata/centos7:latest
You can also create and use your own base image.
e.g.: in FROM ubuntu:12.04, ubuntu:12.04 is the base image used.
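Purely for illustration, a minimal Dockerfile for an image like this one might take the shape below; the COPY paths and script names are assumptions for this sketch, not the contents of the Dockerfile provided in ~/test:
FROM bluedata/centos7:latest
# Illustrative sketch: bring in the configuration scripts (paths assumed)
COPY configure_java8.sh configure_jupyter.sh /tmp/
# Run them at build time so Java 8 and Jupyter are baked into the image
RUN chmod +x /tmp/configure_java8.sh /tmp/configure_jupyter.sh && \
    /tmp/configure_java8.sh && \
    /tmp/configure_jupyter.sh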
Step 4 - Creating Spark, Jupyter, and Java Configuration Files
Set up the additional dependencies that Spark needs.
In this step, we will show you the additional steps needed to create a working Spark image.
We have already prepared the files that are needed; all you need to do is copy them into the appropriate locations.
The dependencies to configure are Jupyter, for which we provide a shell script, and the correct version of Java for Spark.
It is always good to check which version of Spark you are using so you understand the dependencies you may need to set up in your base image; a quick version check is sketched below.
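For example, on a machine where Spark and Java are already installed, the standard version commands confirm what you are working with (assuming spark-submit and java are on the PATH):
# Report the Spark build and the Java runtime it will use
spark-submit --version
java -version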
Task 1:
Copy the pre-made Jupyter script to the centos folder:
cp ~/test/configure_jupyter.sh ~/Spark/image/centos
Feel free to run the following command to see what is being executed in the shell script:
cat ~/Spark/image/centos/configure_jupyter.sh
You will see the steps required to set up Jupyter.
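As a purely illustrative sketch of what such a setup script typically does (an assumption, not the actual contents of configure_jupyter.sh):
#!/bin/bash
# Illustrative only: install Jupyter and generate a default config
pip install jupyter
jupyter notebook --generate-config
# Commonly the notebook is set to listen on all interfaces inside a container
echo "c.NotebookApp.ip = '0.0.0.0'" >> ~/.jupyter/jupyter_notebook_config.py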
Next, copy over the Java configuration script:
cp ~/test/configure_java8.sh ~/Spark/image/centos
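Again as an illustrative sketch only (not the provided script), configuring Java 8 on CentOS usually amounts to installing the JDK and exporting JAVA_HOME:
#!/bin/bash
# Illustrative only: install OpenJDK 8, the Java version Spark 2.x expects
yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
# Point JAVA_HOME at the installed JDK for all shells
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' >> /etc/profile.d/java.sh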
Task 2:
We need to add additional configuration files under the appconfig directory. We have already created these files for you; to add them, execute the following commands.
Remove the existing appconfig folder from the Spark folder:
rm -rf appconfig
Install wget:
yum install wget -y
Add the appconfig reference files using the below command (due to space constraints in Katacoda, we have uploaded the required appconfig files to Dropbox):
wget https://www.dropbox.com/s/wbnr83q26przbs6/appconfig.zip
Install unzip:
yum install unzip -y
Unzip the file:
unzip appconfig.zip
Check the files under the appconfig directory:
ls appconfig
Make sure you see the following files under the appconfig folder:
Remove the zip file from the folder:
rm -rf appconfig.zip
Task 3:
When our image is ready to deploy to the EPIC Application Catalog, we need to include a picture that represents it. For your reference, we have already created a .png file for your use:
cp ~/test/Logo_Spark.png ~/Spark
The Logo_Spark.png file is a logo (a 400px x 200px .png) used to visually identify each application in the App Store.
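If you want to confirm the logo's dimensions before packaging, the stock file utility prints them for PNG images:
# file reports PNG dimensions, e.g. "PNG image data, 400 x 200, ..."
file ~/Spark/Logo_Spark.png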
Step 5 - Building the Bin File Using BlueData App Workbench
In this step, we will create the bin file using the BlueData App Workbench by executing the following commands.
A .wb file contains a series of App Workbench commands that control the creation of the Catalog Image.
Please review the following link on .wb files before proceeding: Link
Task 1:
For your reference, we have already created a ready-made .wb file in the ~/test directory. We will copy that file into the Spark directory using the following command:
cp ~/test/Spark.wb ~/Spark
To check the files in the Spark folder:
ls
Task 2:
To view the content of the file, execute the following:
cat ~/Spark/Spark.wb
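What cat prints is the authoritative content. Purely as a hedged illustration of the general shape such a file can take (the directive names below are assumptions drawn from typical App Workbench examples, not the contents of Spark.wb):
#!/usr/bin/env bdwb
# Hypothetical sketch only; compare against the real Spark.wb
builder organization --name BlueData
logo file --filepath "Logo_Spark.png"
image build --basedir image/centos
appconfig package --dir appconfig
catalog save --force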
Task 3:
Execute the .wb file:
./Spark.wb
After executing the above command, wait for some time until the bin file is created.
Step 6 - Finalizing the Build for Spark with Jupyter
Checking the Bin File for Spark
Let's see what you have built.
The newly built application package (or bundle) is saved in the deliverables directory.
cd deliverables
ls
Make it executable using the below command:
chmod +x bdcatalog-centos7-bluedata-spark221-1.0.bin
To make the new image appear in the App Store, copy the bin file to /srv/bluedata/catalog.
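For example (the bin filename is the one produced by the build step above):
# Copy the package into the EPIC catalog directory so the App Store can see it
cp bdcatalog-centos7-bluedata-spark221-1.0.bin /srv/bluedata/catalog/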
Go to the App Store in the EPIC GUI and click the Refresh button to bring the image into the App Store.
Once the image appears in the App Store, click the Install button to install it.