Chapter 5: NameNode - File System Directory Tree


李延 LV6 2022-03-07


# 0. NameNode overview

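The NameNode manages the directory tree of the entire HDFS namespace: every directory, every file, and the metadata of each. This chapter looks at how that directory tree is stored.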

# 1. INode

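An INode is either an INodeFile, which represents a file, or an INodeDirectory, which represents a directory. Both inherit from the abstract classes INodeWithAdditionalFields and INode. We go through them in turn below.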

## 1.1 INode

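The top-level parent class of all file system objects. It defines a single field, the parent node; everything else is an abstract method: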

```java
private INode parent = null;

INode(INode parent) {
   this.parent = parent;
}
```

## 1.2 INodeWithAdditionalFields

Most of what INode declares is implemented here. The fields are:

- id
- name: the file name, stored as a byte array
- modificationTime
- accessTime
- permission

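The one that needs explanation is permission. It is a long that packs three pieces of information: the permission mode, the user, and the group. Going by the PermissionStatusFormat enum shown next, in which each field starts where the previous one ends:

- the lowest 16 bits hold the mode, i.e. the familiar Linux permission bits such as 777
- the next 24 bits hold the group
- the highest 24 bits hold the user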

The layout is declared in the PermissionStatusFormat enum:

```java
enum PermissionStatusFormat implements LongBitFormat.Enum {
  MODE(null, 16),
  GROUP(MODE.BITS, 24),
  USER(GROUP.BITS, 24);
```

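Each getter extracts its field with bit operations. User and group are stored as serial numbers, which SerialNumberManager maps back to strings: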

```java
static String getUser(long permission) {
  // extract the 24-bit USER field from permission
  final int n = (int)USER.BITS.retrieve(permission);
  String s = SerialNumberManager.USER.getString(n);
  assert s != null;
  return s;
}

static String getGroup(long permission) {
  // extract the 24-bit GROUP field from permission
  final int n = (int)GROUP.BITS.retrieve(permission);
  return SerialNumberManager.GROUP.getString(n);
}

static short getMode(long permission) {
  // extract the 16-bit MODE field from permission
  return (short)MODE.BITS.retrieve(permission);
}
```

The bit operations themselves live in the LongBitFormat class and are not covered in detail here.
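To make the packing concrete, here is a minimal sketch of the same idea. `BitField` and `PermissionBitsDemo` are hypothetical names invented for this illustration, not Hadoop classes; the sketch only assumes the 16/24/24 widths and the mode, group, user order from the enum above, and uses plain integers where Hadoop stores serial numbers.

```java
/** Hypothetical stand-in for one field of a packed long; not Hadoop's LongBitFormat. */
final class BitField {
  private final int offset;  // position of the field's lowest bit
  private final int length;  // width of the field in bits
  private final long mask;   // field bits set, all other bits clear

  BitField(BitField previous, int length) {
    // A field starts right after the previous one; the first field starts at bit 0.
    this.offset = (previous == null) ? 0 : previous.offset + previous.length;
    this.length = length;
    this.mask = ((1L << length) - 1) << offset;
  }

  /** Extracts this field's value from the packed record. */
  long retrieve(long record) {
    return (record & mask) >>> offset;
  }

  /** Returns a copy of the record with this field set to value. */
  long combine(long value, long record) {
    if (value < 0 || value > (mask >>> offset)) {
      throw new IllegalArgumentException("value does not fit in " + length + " bits");
    }
    return (record & ~mask) | (value << offset);
  }
}

final class PermissionBitsDemo {
  // Same widths and order as the enum above: mode lowest, then group, then user.
  static final BitField MODE = new BitField(null, 16);
  static final BitField GROUP = new BitField(MODE, 24);
  static final BitField USER = new BitField(GROUP, 24);

  /** Packs a mode and two made-up serial numbers into one long. */
  static long pack(int mode, int groupId, int userId) {
    long p = MODE.combine(mode, 0L);
    p = GROUP.combine(groupId, p);
    return USER.combine(userId, p);
  }

  public static void main(String[] args) {
    long p = pack(0755, 7, 42);
    System.out.println("mode  = " + Long.toOctalString(MODE.retrieve(p)));
    System.out.println("group = " + GROUP.retrieve(p));
    System.out.println("user  = " + USER.retrieve(p));
  }
}
```

Storing three values in one long keeps the per-inode memory footprint small, which matters when every file and directory in the namespace has its own object.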

## 1.3 INodeDirectory

The directory object. On top of what it inherits from its parent classes, it adds a children field that holds its child nodes:

```java
private List<INode> children = null;
```

It also provides the corresponding methods.

Looking up a child:

```java
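// Binary-search the children list for the index of the child with this name;
// a negative result means no such child exists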
int searchChildren(byte[] name) {
  return children == null? -1: Collections.binarySearch(children, name);
}
```

Adding a child:

```java
public boolean addChild(INode node) {
  // look the child up first; if it already exists, do not add it
  final int low = searchChildren(node.getLocalNameBytes());
  if (low >= 0) {
    return false;
  }
  // not found: insert it at the position encoded by low
  addChild(node, low);
  return true;
}
  
```

The remaining methods are not listed one by one; they are all operations on this list.
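The lookup-then-insert pattern above relies on how `Collections.binarySearch` reports a miss: it returns `-(insertionPoint) - 1`, so a negative result also says where the new element belongs. A minimal sketch of that pattern, using plain `String` names instead of INode objects (`SortedChildrenDemo` is a hypothetical class written for this illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SortedChildrenDemo {
  // Child names, always kept in sorted order so binary search stays valid.
  private final List<String> children = new ArrayList<>();

  /** Returns the index of name, or -(insertionPoint) - 1 if it is absent. */
  int searchChildren(String name) {
    return Collections.binarySearch(children, name);
  }

  /** Adds name unless it is already present; returns whether it was added. */
  boolean addChild(String name) {
    final int low = searchChildren(name);
    if (low >= 0) {
      return false;                 // already exists
    }
    children.add(-low - 1, name);   // decode the insertion point
    return true;
  }

  List<String> names() {
    return Collections.unmodifiableList(children);
  }

  public static void main(String[] args) {
    SortedChildrenDemo dir = new SortedChildrenDemo();
    for (String n : new String[] {"tmp", "data", "user", "data"}) {
      System.out.println(n + " added: " + dir.addChild(n));
    }
    System.out.println(dir.names());
  }
}
```

Because every insert goes to its sorted position, the list never needs re-sorting and lookups stay logarithmic in the number of children.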

## 1.4 INodeFile

The file object. It has two main fields:

```java
// file header
private long header = 0L;
// data blocks
private BlockInfo[] blocks;
```

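Like permission in the parent class, header uses a single long to carry several pieces of information:

the highest 4 bits hold the storage policy, the middle 12 bits hold the replication factor, and the lowest 48 bits hold the block size.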

BlockInfo holds the information about the file's data blocks, including which DataNodes store the file's content. We cover it in a later chapter.
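As a rough illustration of the header layout, the sketch below encodes and decodes the three values with plain shifts and masks. It assumes the 4/12/48 split described above, with the block size in the lowest bits; `FileHeaderDemo` and its method names are made up for this example and are not the Hadoop implementation.

```java
final class FileHeaderDemo {
  static final int BLOCK_SIZE_BITS = 48;
  static final int REPLICATION_BITS = 12;
  static final long BLOCK_SIZE_MASK = (1L << BLOCK_SIZE_BITS) - 1;
  static final long REPLICATION_MASK = (1L << REPLICATION_BITS) - 1;

  /** Packs the three values; callers must pass values that fit their field widths. */
  static long toHeader(long blockSize, int replication, int storagePolicy) {
    return ((long) storagePolicy << (BLOCK_SIZE_BITS + REPLICATION_BITS))
        | ((long) replication << BLOCK_SIZE_BITS)
        | blockSize;
  }

  static long blockSize(long header) {
    return header & BLOCK_SIZE_MASK;
  }

  static int replication(long header) {
    return (int) ((header >>> BLOCK_SIZE_BITS) & REPLICATION_MASK);
  }

  static int storagePolicy(long header) {
    // Unsigned shift, so a policy that sets the top bit still decodes correctly.
    return (int) (header >>> (BLOCK_SIZE_BITS + REPLICATION_BITS));
  }

  public static void main(String[] args) {
    long header = toHeader(128L * 1024 * 1024, 3, 7);
    System.out.println("block size   = " + blockSize(header));
    System.out.println("replication  = " + replication(header));
    System.out.println("policy       = " + storagePolicy(header));
  }
}
```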

# 2. FSDirectory

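The previous section covered the objects that represent HDFS files and directories in memory. FSDirectory assembles them into a directory tree and provides the basic operations on that tree.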

It has two main member variables:

```java
// The root directory: the INodeDirectory whose name is "",
// which is also the root node of the whole directory tree
INodeDirectory rootDir;
// A map from INode id to INode object; every INode is also kept in this map
private final INodeMap inodeMap; // Synchronized by dirLock
```
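The point of keeping both structures is that a node can be reached either by walking the tree or directly by id. The following is a simplified sketch of that arrangement under assumed names (`DirectoryIndexDemo`, `Node`, and the id numbering are all hypothetical); it also shows how the parent pointer from section 1.1 lets a node reconstruct its own path.

```java
import java.util.HashMap;
import java.util.Map;

final class DirectoryIndexDemo {
  static final class Node {
    final long id;
    final String name;
    Node parent;                                   // null only for the root
    final Map<String, Node> children = new HashMap<>();

    Node(long id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  final Node root = new Node(1L, "");              // the root has an empty name
  private final Map<Long, Node> byId = new HashMap<>();
  private long nextId = 2L;

  DirectoryIndexDemo() {
    byId.put(root.id, root);
  }

  /** Adds a child to the tree and to the id map together; null if the name is taken. */
  Node addChild(Node parent, String name) {
    if (parent.children.containsKey(name)) {
      return null;
    }
    Node child = new Node(nextId++, name);
    child.parent = parent;
    parent.children.put(name, child);
    byId.put(child.id, child);
    return child;
  }

  Node getById(long id) {
    return byId.get(id);
  }

  /** Rebuilds the absolute path by following parent pointers up to the root. */
  static String fullPath(Node node) {
    if (node.parent == null) {
      return "/";
    }
    StringBuilder sb = new StringBuilder();
    for (Node n = node; n.parent != null; n = n.parent) {
      sb.insert(0, "/" + n.name);
    }
    return sb.toString();
  }
}
```

A caller that only has an id can call `getById` and then `fullPath` without touching the rest of the tree.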

Path lookup goes through getINode:

```java
  @VisibleForTesting // should be removed after a lot of tests are updated
  public INode getINode(String src) throws UnresolvedLinkException,
      AccessControlException, ParentNotDirectoryException {
    return getINode(src, DirOp.READ);
  }
  
  public INode getINode(String src, DirOp dirOp) throws UnresolvedLinkException,
      AccessControlException, ParentNotDirectoryException {
    return getINodesInPath(src, dirOp).getLastINode();
  }

  public INodesInPath getINodesInPath(String src, DirOp dirOp)
      throws UnresolvedLinkException, AccessControlException,
      ParentNotDirectoryException {
    // INode.getPathComponents(src) splits the path on "/"
    // and converts each String component to byte[]
    return getINodesInPath(INode.getPathComponents(src), dirOp);
  }

  // INodesInPath.resolve does the actual walk, starting from startingDir
  // (the root directory for the lookups above)
  static INodesInPath resolve(final INodeDirectory startingDir,
      final byte[][] components) {
    return resolve(startingDir, components, false);
  }

  static INodesInPath resolve(final INodeDirectory startingDir,
      byte[][] components, final boolean isRaw) {
    Preconditions.checkArgument(startingDir.compareTo(components[0]) == 0);

    INode curNode = startingDir;
    int count = 0;
    int inodeNum = 0;
    INode[] inodes = new INode[components.length];
    boolean isSnapshot = false;
    int snapshotId = CURRENT_STATE_ID;

    while (count < components.length && curNode != null) {
      final boolean lastComp = (count == components.length - 1);
      inodes[inodeNum++] = curNode;
      final boolean isRef = curNode.isReference();
      final boolean isDir = curNode.isDirectory();
      final INodeDirectory dir = isDir? curNode.asDirectory(): null;
      if (!isRef && isDir && dir.isWithSnapshot()) {
        //if the path is a non-snapshot path, update the latest snapshot.
        if (!isSnapshot && shouldUpdateLatestId(
            dir.getDirectoryWithSnapshotFeature().getLastSnapshotId(),
            snapshotId)) {
          snapshotId = dir.getDirectoryWithSnapshotFeature().getLastSnapshotId();
        }
      } else if (isRef && isDir && !lastComp) {
        // If the curNode is a reference node, need to check its dstSnapshot:
        // 1. if the existing snapshot is no later than the dstSnapshot (which
        // is the latest snapshot in dst before the rename), the changes 
        // should be recorded in previous snapshots (belonging to src).
        // 2. however, if the ref node is already the last component, we still 
        // need to know the latest snapshot among the ref node's ancestors, 
        // in case of processing a deletion operation. Thus we do not overwrite
        // the latest snapshot if lastComp is true. In case of the operation is
        // a modification operation, we do a similar check in corresponding 
        // recordModification method.
        if (!isSnapshot) {
          int dstSnapshotId = curNode.asReference().getDstSnapshotId();
          if (snapshotId == CURRENT_STATE_ID || // no snapshot in dst tree of rename
              (dstSnapshotId != CURRENT_STATE_ID &&
               dstSnapshotId >= snapshotId)) { // the above scenario
            int lastSnapshot = CURRENT_STATE_ID;
            DirectoryWithSnapshotFeature sf;
            if (curNode.isDirectory() && 
                (sf = curNode.asDirectory().getDirectoryWithSnapshotFeature()) != null) {
              lastSnapshot = sf.getLastSnapshotId();
            }
            snapshotId = lastSnapshot;
          }
        }
      }
      if (lastComp || !isDir) {
        break;
      }

      final byte[] childName = components[++count];
      // check if the next byte[] in components is for ".snapshot"
      if (isDotSnapshotDir(childName) && dir.isSnapshottable()) {
        isSnapshot = true;
        // check if ".snapshot" is the last element of components
        if (count == components.length - 1) {
          break;
        }
        // Resolve snapshot root
        final Snapshot s = dir.getSnapshot(components[count + 1]);
        if (s == null) {
          curNode = null; // snapshot not found
        } else {
          curNode = s.getRoot();
          snapshotId = s.getId();
        }
        // combine .snapshot & name into 1 component element to ensure
        // 1-to-1 correspondence between components and inodes arrays is
        // preserved so a path can be reconstructed.
        byte[][] componentsCopy =
            Arrays.copyOf(components, components.length - 1);
        componentsCopy[count] = DFSUtil.string2Bytes(
            DFSUtil.byteArray2PathString(components, count, 2));
        // shift the remaining components after snapshot name
        int start = count + 2;
        System.arraycopy(components, start, componentsCopy, count + 1,
            components.length - start);
        components = componentsCopy;
        // reduce the inodes array to compensate for reduction in components
        inodes = Arrays.copyOf(inodes, components.length);
      } else {
        // normal case, and also for resolving file/dir under snapshot root
        curNode = dir.getChild(childName,
            isSnapshot ? snapshotId : CURRENT_STATE_ID);
      }
    }
    return new INodesInPath(inodes, components, isRaw, isSnapshot, snapshotId);
  }
```

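As the code shows, looking up a path means starting at the root node and descending the directory tree one level per path component.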
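Stripped of snapshot and reference handling, the walk reduces to a short loop. The sketch below keeps only that core; `PathWalkDemo` and `Node` are hypothetical types written for this illustration, and it uses `String` components where the real code uses `byte[]`.

```java
import java.util.TreeMap;

final class PathWalkDemo {
  static final class Node {
    final String name;
    final boolean isDir;
    final TreeMap<String, Node> children = new TreeMap<>();

    Node(String name, boolean isDir) {
      this.name = name;
      this.isDir = isDir;
    }

    Node add(Node child) {
      children.put(child.name, child);
      return child;
    }
  }

  /** Splits "/a/b" into ["", "a", "b"]; the leading "" stands for the root. */
  static String[] components(String path) {
    if (path.equals("/")) {
      return new String[] {""};
    }
    return path.split("/");
  }

  /**
   * Walks from root, one component per level. The result has one slot per
   * component; slots after a missing child or a non-directory stay null.
   */
  static Node[] resolve(Node root, String[] comps) {
    Node[] nodes = new Node[comps.length];
    Node cur = root;
    int count = 0;
    while (count < comps.length && cur != null) {
      nodes[count] = cur;
      boolean lastComp = (count == comps.length - 1);
      if (lastComp || !cur.isDir) {
        break;
      }
      cur = cur.children.get(comps[++count]);   // descend one level
    }
    return nodes;
  }

  /** Returns the node the path points at, or null if it does not exist. */
  static Node lastNode(Node root, String path) {
    Node[] nodes = resolve(root, components(path));
    return nodes[nodes.length - 1];
  }
}
```

The cost of a lookup therefore depends on the depth of the path, not on the total number of nodes in the tree.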

# 3. What's next

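So far we have only looked at how the directory tree is stored. Topics still to cover include:

- Snapshots.
- Other operations on the directory tree.

We will cover these when we walk through the corresponding client-side directory operations.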
