
Guide to Hadoop HDFS data access

This article is part of Cerexio's Enterprise Big Data Management series. It covers basic Hadoop commands for working with an existing Hadoop cluster: copying files, moving files, creating directories, and so on. Hadoop is an umbrella term for a collection of tools and technologies that power big data management for decision-making processes. All of these tools read and process data through HDFS, the Hadoop Distributed File System. We assume you are already familiar with the Hadoop environment and the related Hadoop ecosystem.

HDFS access requires the administrator to enable Hadoop authentication, which can be done either with Kerberos or with simple user-name-based authentication. This article explains how to set up simple user-name-based authentication.
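
A minimal sketch of the relevant core-site.xml property, assuming a default (non-Kerberized) cluster; Hadoop selects its authentication mode through hadoop.security.authentication:

<!-- core-site.xml: "simple" means user-name-based authentication -->
<property>
  <name>hadoop.security.authentication</name>
  <value>simple</value>
</property>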

There are two ways to create a user directory within HDFS: first, by logging directly into the NameNode and performing the operations below; second, for better ease, by using Cerexio's DataXio environment to create users interactively. Assume that your Hadoop environment is set up as follows:

master.local - master node running all the master components (NameNode, ResourceManager, SparkServer, HiveMetaServer, etc.)
node1.local, node2.local - worker nodes running DataNode, NodeManager, etc.

Let's create a user directory by following the easy commands given below. First, log in to the master.local server, then switch to the hdfs user to execute hdfs commands.
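
A minimal sketch of what these commands typically look like; the chown group and the 755 permission bits are assumptions chosen to match the read/write/execute access described below:

ssh master.local    # log in to the master node
su - hdfs           # switch to the hdfs superuser

hdfs dfs -mkdir -p /user/srijeyanthan                          # create the user's home directory
hdfs dfs -chown srijeyanthan:srijeyanthan /user/srijeyanthan   # hand ownership to the user
hdfs dfs -chmod 755 /user/srijeyanthan                         # owner gets read/write/execute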

Now the user 'srijeyanthan' is created with read/write/execute permission on their home directory. Our next step is to perform HDFS file system operations from your own computer, without logging into the Hadoop cluster.

In order to access HDFS as srijeyanthan@cerexio, two cluster configuration files are needed: hdfs-site.xml and core-site.xml. You can copy these directly from your Hadoop cluster's NameNode, or download them using the DataXio environment.
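
For example, a sketch of copying them over SSH; the source path /etc/hadoop/conf is an assumption and varies by distribution:

mkdir -p ~/hadoop/conf
scp srijeyanthan@master.local:/etc/hadoop/conf/core-site.xml ~/hadoop/conf/
scp srijeyanthan@master.local:/etc/hadoop/conf/hdfs-site.xml ~/hadoop/conf/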

Now we will start the process of accessing HDFS from your local computer. First, download a Hadoop binary compatible with the version installed on your Hadoop cluster:
1. Download it from the following URL and save it in /home/srijeyanthan/hadoop: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.1.4/hadoop-3.1.4.tar.gz

2. Extract all the files into the /home/srijeyanthan/hadoop directory.

3. Open ~/.bashrc and add HADOOP_HOME=/home/srijeyanthan/hadoop/hadoop-3.1.4, then add the PATH variable as export PATH=$PATH:$HADOOP_HOME/bin (see the sketch after this list).

4. Now we are set with all the configuration and ready to access the HDFS environment. When we execute an hdfs command from the terminal, the Hadoop binary resolves the target cluster using the cluster's core-site.xml and hdfs-site.xml.
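
A sketch of the resulting ~/.bashrc additions, plus a quick sanity check. HADOOP_CONF_DIR is not part of the steps above, but it is one common way to point the client at the configuration files copied earlier (placing them in $HADOOP_HOME/etc/hadoop works too):

export HADOOP_HOME=/home/srijeyanthan/hadoop/hadoop-3.1.4
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HOME/hadoop/conf   # holds core-site.xml and hdfs-site.xml

source ~/.bashrc    # reload the shell configuration
hadoop version      # confirm the client binary is on the PATH
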
We will start with our first hdfs command, ls, which lists all the files and directories under a given path. For example, listing the user directory created earlier:
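
hdfs dfs -ls /user/srijeyanthan    # list the contents of the user's home directory
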
Here are some useful hdfs commands that we can use to interact with the HDFS file system:
hdfs dfs -put                         # copy a local file into HDFS
hdfs dfs -copyFromLocal               # copy a file from the local file system into HDFS
hdfs dfs -get                         # copy a file from HDFS to the local file system
hdfs dfs -copyToLocal                 # copy a file from HDFS to the local file system
hdfs dfs -cat /path_to_file_in_hdfs   # print a file's contents to stdout
hdfs dfs -mv                          # move a file within HDFS
hdfs dfs -cp                          # copy a file within HDFS
hdfs dfs -moveFromLocal               # move a local file into HDFS (removes the local copy)
hdfs dfs -tail [-f]                   # show the last kilobyte of a file (-f to follow)
hdfs dfs -rm                          # delete a file from HDFS
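
As a quick usage sketch (the file names here are hypothetical), a round trip through a few of these commands might look like this:

echo "hello hdfs" > notes.txt                         # create a local file (hypothetical name)
hdfs dfs -put notes.txt /user/srijeyanthan/           # upload it into the user directory
hdfs dfs -cat /user/srijeyanthan/notes.txt            # print its contents from HDFS
hdfs dfs -get /user/srijeyanthan/notes.txt copy.txt   # download it back as copy.txt
hdfs dfs -rm /user/srijeyanthan/notes.txt             # remove it from HDFS
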
You can now conveniently explore your Hadoop cluster environment. Please leave a comment describing any issues you face during file operations.
