Welcome!
Creating a CDH application image
Hello. Today we will be teaching you how to create an image on the BlueData EPIC platform. In this scenario you will learn how to create a CDH image using the BlueData EPIC Application Workbench on a CentOS base image container.
Prerequisites:
- Basic knowledge of containers
- Linux administration
- Git
- Hadoop and CDH
This scenario is developed by:

Step 1 - Creating the base directory
To create an image on the BlueData EPIC platform, you need the BlueData EPIC App Workbench installed.
The App Workbench is already installed in this environment.
To check the version of the App Workbench, execute the following command:
bdwb --version
To begin, create the directories that will house the files:
mkdir ~/Source
mkdir ~/Source/CDH
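The setup in this step can also be written so that it is safe to re-run. The sketch below is only an illustration: the function names have_tool and prepare_workspace are hypothetical helpers, not part of the App Workbench.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: create <root>/CDH in one call.
# mkdir -p also creates <root> and does not fail if the directories already exist.
prepare_workspace() {
    mkdir -p "$1/CDH"
}

# Usage in this scenario (uncomment to run):
# have_tool bdwb || echo "bdwb not found on PATH" >&2
# prepare_workspace "$HOME/Source"
```

Running prepare_workspace twice leaves the directories untouched the second time, which is convenient if you restart the scenario.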
Step 2 - Copying the bin file
Here we will use the .bin file of an existing, already developed image and place it in the directory created in the previous step.
Before that, install the wget package by executing the following command:
yum install wget -y
Now copy the existing CDH .bin file into the Source directory:
cp bdcatalog-centos7-bluedata-cdh514_centos7x-1.7.bin /root/Source
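Now remove the pre-installed folders to save space. The following command deletes everything in the current directory except entries whose names contain Source or test: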
rm -rf $(ls | grep -v Source | grep -v test)
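The cleanup command above parses the output of ls, which can misbehave when a name contains spaces. A possible alternative, shown here only as a sketch, uses find with the same name filters. The function name prune_except_source_and_test is hypothetical, and -mindepth/-maxdepth are extensions available in GNU and BSD find rather than strict POSIX.

```shell
# Hypothetical helper: delete every non-hidden entry directly inside
# directory $1 unless its name contains "Source" or "test".
# Hidden entries are skipped to mirror what plain `ls` would list.
prune_except_source_and_test() {
    find "$1" -mindepth 1 -maxdepth 1 \
        ! -name '.*' ! -name '*Source*' ! -name '*test*' \
        -exec rm -rf -- {} +
}

# Usage in this scenario (uncomment to run; this deletes files):
# prune_except_source_and_test .
```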
Now, navigate to the Source directory.
cd ~/Source
Now make the .bin file executable by executing the following command:
chmod +x bdcatalog-centos7-bluedata-cdh514_centos7x-1.7.bin
Step 3 - Extracting the contents of the bin file
In this step we will extract the contents of the .bin file.
For this, execute the following command.
./bdcatalog-centos7-bluedata-cdh514_centos7x-1.7.bin --payload
This creates two files in the directory. To view them, execute the following command:
ls
Untar the payload.tar file by executing the following command
tar xvf payload.tar
This creates new directories. To view them, execute the following command:
ls
Delete the .bin file to save space on the device:
rm -rf bdcatalog-centos7-bluedata-cdh514_centos7x-1.7.bin
Navigate to the new directory and untar the source archive by executing the following commands:
cd bdcatalog-centos7-bluedata-cdh514_centos7x-1.7
tar xvf bdcatalog-centos7-bluedata-cdh514_centos7x-1.7-src.tgz
Now navigate to the directory created by the previous command:
cd bdcatalog-centos7-bluedata-cdh514_centos7x-1.7-src
Copy the contents of this directory to the directory you created initially:
cp -R * ~/Source/CDH
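The unpacking commands in this step (everything after the --payload extraction) can be collected into one function. This is a sketch under the assumption, taken from the commands above, that payload.tar expands to a directory NAME containing NAME-src.tgz, which in turn expands to NAME-src. The function name extract_catalog_sources is hypothetical.

```shell
# Hypothetical helper: extract_catalog_sources PAYLOAD_TAR NAME DEST
# Assumed layout: PAYLOAD_TAR -> NAME/NAME-src.tgz -> NAME/NAME-src/
extract_catalog_sources() {
    payload=$1
    name=$2
    dest=$3
    workdir=$(dirname "$payload")

    # Unpack payload.tar next to itself, which yields the directory NAME.
    tar xf "$payload" -C "$workdir" || return 1
    # Unpack the source tarball inside NAME, which yields NAME/NAME-src.
    tar xf "$workdir/$name/$name-src.tgz" -C "$workdir/$name" || return 1
    # Copy the contents; the trailing /. also copies hidden files,
    # which "cp -R *" would skip.
    mkdir -p "$dest" && cp -R "$workdir/$name/$name-src/." "$dest"
}

# Usage in this scenario (uncomment to run from ~/Source):
# extract_catalog_sources payload.tar \
#     bdcatalog-centos7-bluedata-cdh514_centos7x-1.7 "$HOME/Source/CDH"
```

Each tar call is checked, so the copy is not attempted if an archive is missing or corrupt.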
Step 4 - Configuration files for CDH
Here we look at the configuration scripts. All of them, including startscript and the default scripts, are in the directory named cdh5-cm-setup.
This directory contains all of the necessary files pre-created for standard services, including startscript and appjob.
You may change the name of this directory if desired.
Execute the command below to list all the files in the cdh5-cm-setup directory:
ls cdh5-cm-setup
setup_cluster.py is responsible for setting up the CDH cluster.
add_remove_node.py is responsible for scaling the nodes in the CDH cluster up or down.
Enable_Kerberos.py enables Kerberos in the CDH cluster.
Macros.sh contains the built-in BlueData macros that are executed during image creation.
Logging.sh provides the logging facilities for a catalog configuration bundle.
Utils.sh contains utility functions that provide information on the Docker ID, CPU shares, memory status, and FQDN of the current container.
appjob provides information on the type of job to be launched. Application-specific jobs can also be added to it.
The startscript file can be automatically generated by invoking the appropriate AppWorkbench appconfig autogen --generate commands from the .wb file.
Let's look into the startscript.
The startscript contains sections such as:
- Cluster creation metadata for the CDH image
- Cluster config choice selections for the CDH image
- Tenant-level settings for Kerberos
- Tenant-level settings for CDH parcels and repo
To view the startscript, execute the following command:
cat /root/Source/CDH/cdh5-cm-setup/startscript
The startscript executes on every host once all hosts have been created.
After the hosts are created, Cloudera Manager and its agent are registered on each host. Then the MySQL database is deployed on the cmserver role and the Cloudera database is created.
After that, the main setup script is executed.
The main script contains the code for creating the cluster, activating the license, and deploying ZooKeeper, HDFS, and other services.
Step 5 - Content of the .wb file
Let's look at the content of the .wb file.
Move to the CDH directory using the following command:
cd ~/Source/CDH
The .wb file performs the following configuration:
- Loading the JSON file, generating the scripts (logging.sh, appjob, etc.) inside the cdh5-cm-setup directory, and adding the logo to the CDH image
- Creating the Docker image from the Dockerfile under “image/centos” and packaging the Docker image
To view the content of the .wb file, execute the following command:
cat ~/Source/CDH/cdh514.wb
The cdh514.wb file contains the following sections.
Section 1:
The line below sets the organization name:
builder organization --name BlueData
The lines below load the metadata JSON file and package the setup files from the cdh5-cm-setup directory:
catalog load --filepath cdh514.json
appconfig package --dir cdh5-cm-setup
The line below adds the logo file to the image:
logo file --filepath Logo_Cloudera.png
Section 2:
This section of the .wb file pulls all of the pieces together and builds the .bin file:
image build --basedir image/centos --image-repotag bluedata/cdh514
image package --image-repotag bluedata/cdh514 --os centos7
catalog save --filepath staging/cdh514.json --force
sources package
catalog package
Step 6 - Content of the Dockerfile
The next step is to look at the Dockerfile.
"A docker file is a text file that the Docker engine understands to automatically build an image by reading the file. The Dockerfile consists of all the commands a user would call to assemble the desired image."
To view the contents of the Dockerfile, you can open it in vi or vim, or print it with cat. To view the contents in the terminal console, execute the following command:
cat ~/Source/CDH/image/centos/Dockerfile
You will now see many commands in your terminal. These are the commands you would use if you were to install your application manually on a host.
The first line of the Dockerfile determines the “base” image on which your application will be installed.
BlueData provides its own base image, which you can use by adding the following command at the top of your Dockerfile:
FROM bluedata/centos7:latest
You can also use a different base image.
For example, in FROM ubuntu:12.04, ubuntu:12.04 is the base image.
All the commands following the base image are the commands used to set up the application.
These files and commands are set up on top of the base image from BlueData and are eventually packaged into a .bin file for use on the EPIC platform.
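To confirm which base image a Dockerfile builds on without reading the whole file, a small helper like the one below can be used. It is a sketch only: the function name dockerfile_base_image is hypothetical, and it does not handle FROM lines that carry extra flags such as --platform.

```shell
# Hypothetical helper: print the image named by the first FROM
# instruction in the Dockerfile at $1.
dockerfile_base_image() {
    awk 'toupper($1) == "FROM" { print $2; exit }' "$1"
}

# Usage in this scenario (uncomment to run):
# dockerfile_base_image ~/Source/CDH/image/centos/Dockerfile
```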
Step 7 - Content of the JSON file
Now we will look at our .json file.
To view the content of the file, execute the following command:
cat ~/Source/CDH/cdh514.json
The JSON file contains the application registration and deployment information.
The JSON file performs the following configuration:
- Setting the cardinality for the different roles (cmserver, controller, standby, etc.)
- Exporting and defining the endpoint for a service
- Enabling the GUI service
- Providing the name, description, and distro_id for the CDH image, and deploying selected services in a particular role
The JSON file contains the following sections:
- Identification
- Components
- Services
- Node Roles
- Configuration
Below is the example snippet for identification:
"distro_id": "bluedata/cdh514_centos7x",
"label": {
    "name": "CDH 5.14.0 with Cloudera manager",
    "description": "CDH 5.14.0 with Cloudera manager with YARN support. Includes Pig, Hive, Oozie and HBase"
},
"version": "1.0",
"epic_compatible_versions": ["3.4"],
"categories": [ "Hadoop", "HBase" ],
distro_id is a unique identifier for either a Catalog entry or a versioned set of Catalog entries.
label is a property that contains the following parameters:
- name, which is the "short name" of the Catalog entry. The Catalog API does not allow entries with different distro IDs to share the same name.
- description, which is a longer, more detailed blurb about the entry.
version is a discriminator between multiple Catalog entries that share the same distro ID.
epic_compatible_versions lists the EPIC platform versions where this Catalog entry may be used.
categories is a list of strings used by the EPIC interface to group Catalog entries during cluster creation.
Note: When upgrading the CDH image, update name and distro_id in the JSON file to match the version of the new image, so that the App Store reflects the upgraded image.
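After editing the identification fields, it can be useful to check which distro_id the file actually declares. The helper below is a minimal sketch that relies on the one-key-per-line layout of the snippet above. The function name json_distro_id is hypothetical, and a real JSON parser would be more robust.

```shell
# Hypothetical helper: print the first distro_id value found in file $1.
# Assumes the key and its value sit on the same line.
json_distro_id() {
    sed -n 's/.*"distro_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Usage in this scenario (uncomment to run):
# json_distro_id ~/Source/CDH/cdh514.json
```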
Below is the example snippet for components:
"image": {
    "checksum": "",
    "source_file": ""
},
"setup_package": {
    "config_api_version": 7,
    "checksum": "",
    "source_file": ""
},
image is a property that identifies the location of the image used to launch virtual nodes for this Catalog entry.
setup_package is similar to the image property, except that it identifies the configuration scripts package that runs inside the launched virtual node.
Below is the example snippet for services:
"services": [
    {
        "id": "hbase_master",
        "exported_service": "hbase",
        "label": {
            "name": "HMaster"
        },
        "endpoint": {
            "url_scheme": "http",
            "port": "60010",
            "path": "/",
            "is_dashboard": true
        }
    },
In this example, services is a list of service objects.
The defined services will be referenced by other elements of this JSON file to determine which services are active on which nodes within the cluster.
Below is the example snippet for node roles:
"node_roles": [
    {
        "id": "controller",
        "cardinality": "1",
        "anti_affinity_group_id": "CM",
        "min_cores": "4",
        "min_memory": "12288"
    },
In this example, node_roles is a list of objects describing roles that may be deployed for this Catalog entry. Each role is a particular configuration instantiated from the entry's virtual node image and configured by the setup scripts.
The Configuration section contains the following:
- Selected Roles: lists the IDs of roles that will be deployed.
- Node Services: lists the IDs of services that will be present on nodes of a given role, if that role is deployed.
- Config Metadata: lists string key/value pairs that can be referenced by the setup scripts.
- Config Choices: lists both the choices available to the UI/API user and the possible selections for each choice.
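A misplaced comma or quote in the JSON file is easy to introduce while editing, so it may be worth confirming that the file still parses before building. The sketch below assumes that either python3 or jq is installed, which is not something this scenario guarantees. The function name json_is_valid is hypothetical.

```shell
# Hypothetical helper: succeed if file $1 parses as JSON.
# Returns 2 if neither python3 nor jq is installed.
json_is_valid() {
    if command -v python3 >/dev/null 2>&1; then
        python3 -m json.tool "$1" >/dev/null 2>&1
    elif command -v jq >/dev/null 2>&1; then
        jq empty "$1" >/dev/null 2>&1
    else
        echo "json_is_valid: need python3 or jq" >&2
        return 2
    fi
}

# Usage in this scenario (uncomment to run):
# json_is_valid ~/Source/CDH/cdh514.json || echo "cdh514.json does not parse" >&2
```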
Step 8 - Creating the image
Building the .bin file.
In this step, we create our image using the BlueData Application Workbench.
The .bin file is used by the EPIC platform to install the image you created onto the platform.
Execute the .wb file to build the .bin file by running the following command:
./cdh514.wb
Building the .bin file will take some time.
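Because the build takes a while, a quick check that the inputs named in the earlier steps are present can save a failed run. The sketch below assumes that the paths referenced by cdh514.wb are relative to the CDH directory. The function name check_build_inputs is hypothetical.

```shell
# Hypothetical helper: report any missing build inputs under directory $1.
# Returns non-zero if at least one is missing.
check_build_inputs() {
    missing=0
    for entry in cdh514.wb cdh514.json cdh5-cm-setup Logo_Cloudera.png image/centos/Dockerfile; do
        if [ ! -e "$1/$entry" ]; then
            echo "missing: $1/$entry" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage in this scenario (uncomment to run):
# check_build_inputs "$HOME/Source/CDH" && (cd "$HOME/Source/CDH" && ./cdh514.wb)
```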
Step 9 - Finalising your build for CDH image
Let's see what you have built.
The newly built application package (or bundle) is saved in the deliverables directory.
cd deliverables
ls
Congratulations, you have successfully created the CDH image.
To make the new image appear in the App Store, copy the .bin file to /srv/bluedata/catalog using the following command:
cp bdcatalog-centos7-bluedata-cdh514_centos7x-1.7.bin /srv/bluedata/catalog
Make the copied file executable using the command below:
chmod +x /srv/bluedata/catalog/bdcatalog-centos7-bluedata-cdh514_centos7x-1.7.bin
Go to the App Store in the EPIC GUI and click the Refresh button to bring the image into the App Store.
Once the image appears in the App Store, click the Install button to install the image.
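The copy and permission commands from this step can be combined so that the executable bit is always set on the copy inside the catalog directory. This is a sketch only, and the function name install_catalog_bin is hypothetical.

```shell
# Hypothetical helper: install_catalog_bin BIN_FILE CATALOG_DIR
# Copies the .bin file into the catalog directory and marks the copy executable.
install_catalog_bin() {
    cp "$1" "$2/" || return 1
    chmod +x "$2/$(basename "$1")"
}

# Usage in this scenario (uncomment to run from the deliverables directory):
# install_catalog_bin bdcatalog-centos7-bluedata-cdh514_centos7x-1.7.bin /srv/bluedata/catalog
```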