We are switching over to Linux this month, and the image used is Ali Hadi's Linux image from an HDFS cluster.
This week's question:
Had-A-Loop Around the Block
What is the original filename for block 1073741825?
Well, well, once again I was stumped at the start of the question. Where do we start looking? Three sets of E01 images were provided and, unsure of what to expect, I loaded each of them in FTK Imager and satisfied myself that all three were images of Linux systems. (They almost looked like clones of each other too!) But with three system images, where do I start looking? Taking a hint from the image names, I decided to research what HDFS is.
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. According to the HDFS architecture guide, HDFS has a master/slave architecture with a single NameNode (master server) that manages the file system namespace, together with a number of DataNodes that manage the storage. In particular, I noted the following regarding the persistence of file system metadata on an HDFS cluster:
The entire file system namespace, including the mapping of blocks to files and file system properties, is stored in a file called the FsImage. The FsImage is stored as a file in the NameNode’s local file system.
Could this FsImage file contain the secrets we are looking for? First we had to try and locate the file on our NameNode, so I mounted the HDFS-Master.E01 image at /mnt/hdfs to commence the search. Note also that this image appeared to have a dirty log and required the "norecovery" option to be mounted.
First, I tried searching for the FsImage file, as well as the EditLog. I used a case-insensitive regex search with the find command, as my initial searches did not turn up anything, and piped the output to grep to filter out the Hadoop HTML documentation files.
# find /mnt/hdfs -iregex ".*FsImage.*" -print | grep -v ".html"
Ignoring the .md5 files and those in the /tmp/ directory for the time being, I focused my search on the three fsimage files found in the /usr/local/hadoop and /opt/hadoop directories and peeked at their contents.
It quickly became apparent that some help was needed to decode the contents of the files, and I thankfully chanced upon this answer by Jing Wang on Stack Overflow that pointed me to the HDFS Offline Image Viewer utility. I downloaded and unpacked the Hadoop 2.x release and queried the fsimage files. (Note that the HDFS utilities require the JAVA_HOME variable to be configured.)
# /opt/hadoop/bin/hdfs oiv -p XML -i /mnt/hdfs/opt/hadoop/hadoop/dfs/name/current/fsimage_0000000000000000000 -o fsimage_00.xml
# /opt/hadoop/bin/hdfs oiv -p XML -i /mnt/hdfs/usr/local/hadoop/hadoop2_data/hdfs/namenode/current/fsimage_0000000000000000024 -o fsimage_24.xml
# /opt/hadoop/bin/hdfs oiv -p XML -i /mnt/hdfs/usr/local/hadoop/hadoop2_data/hdfs/namenode/current/fsimage_0000000000000000026 -o fsimage_26.xml
Looking through the resultant XML files, I found the name of the file occupying block 1073741825 present in both fsimage_0000000000000000024 and fsimage_0000000000000000026.
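Rather than eyeballing the XML, the block lookup can also be scripted. The following is only a minimal sketch: it assumes the oiv XML output nests each file as an inode element with a name child and a blocks list of block elements carrying an id, which is how the Hadoop 2.x output appeared to be laid out. The sample document, the file name example.txt and the helper find_names_for_block are made up for illustration and are not taken from the challenge image.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for oiv XML output (assumed layout).
SAMPLE_XML = """<fsimage>
  <INodeSection>
    <inode><id>16385</id><type>DIRECTORY</type><name></name></inode>
    <inode>
      <id>16386</id><type>FILE</type><name>example.txt</name>
      <blocks>
        <block><id>1073741825</id><genstamp>1001</genstamp><numBytes>42</numBytes></block>
      </blocks>
    </inode>
  </INodeSection>
</fsimage>"""


def find_names_for_block(root, block_id):
    """Return the names of all inodes that list block_id among their blocks."""
    wanted = str(block_id)
    names = []
    for inode in root.iter("inode"):
        for block in inode.iter("block"):
            # Compare against the block's own <id> child, not the inode's.
            if block.findtext("id") == wanted:
                names.append(inode.findtext("name"))
    return names


root = ET.fromstring(SAMPLE_XML)
print(find_names_for_block(root, 1073741825))
```

To run this against a real export, swap the sample for something like ET.parse("fsimage_24.xml").getroot() and pass the block number from the question.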
Answer: AptSource
Update 24 Nov 2020: From the answer reveal by Magnet, it appears that the HDFS EditLog files are named edits_* and can be parsed with Hadoop's oev tool. No wonder I couldn't find them previously.