The Google File System

wangfans · posted 2013-6-22 12:58

On the other hand, a large chunk size, even with lazy space
allocation, has its disadvantages. A small file consists of a
small number of chunks, perhaps just one. The chunkservers
storing those chunks may become hot spots if many clients
are accessing the same file. In practice, hot spots have not
been a major issue because our applications mostly read
large multi-chunk files sequentially.

However, hot spots did develop when GFS was first used
by a batch-queue system: an executable was written to GFS
as a single-chunk file and then started on hundreds of machines
at the same time. The few chunkservers storing this
executable were overloaded by hundreds of simultaneous requests.

We fixed this problem by storing such executables
with a higher replication factor and by making the batch-queue
system stagger application start times. A potential
long-term solution is to allow clients to read data from other
clients in such situations.
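
To see why both fixes help, here is a rough back-of-the-envelope sketch. It assumes readers spread evenly across replicas and that staggering splits the start-up into equal waves; the function name and the sample numbers are illustrative and do not come from the paper.

```python
def peak_requests_per_replica(num_clients, replication_factor, num_waves=1):
    """Approximate simultaneous readers hitting one replica.

    Assumes (hypothetically) an even spread of clients over replicas and
    over start-up waves; real load would be less uniform.
    """
    clients_per_wave = -(-num_clients // num_waves)        # ceiling division
    return -(-clients_per_wave // replication_factor)      # ceiling division


# 600 machines start at once, executable stored on 3 replicas.
print(peak_requests_per_replica(600, 3))        # 200
# Same job with 12 replicas and start times staggered into 5 waves.
print(peak_requests_per_replica(600, 12, 5))    # 10
```

Raising the replication factor and staggering start times multiply together, which is why the two measures combined relieve the hot spot far more than either alone.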

2.6 Metadata
The master stores three major types of metadata: the file
and chunk namespaces, the mapping from files to chunks,
and the locations of each chunk’s replicas. All metadata is
kept in the master’s memory. The first two types (namespaces
and file-to-chunk mapping) are also kept persistent by
logging mutations to an operation log stored on the master’s
local disk and replicated on remote machines. Using
a log allows us to update the master state simply, reliably,
and without risking inconsistencies in the event of a master
crash. The master does not store chunk location information
persistently. Instead, it asks each chunkserver about its
chunks at master startup and whenever a chunkserver joins
the cluster.
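
The split between logged and unlogged metadata can be sketched as follows. This is a minimal illustration, not the GFS implementation: the class names, the JSON record format, and the two operations (`create`, `add_chunk`) are all assumptions, and an in-memory list stands in for a log that would really be flushed to local disk and replicated remotely.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MasterState:
    namespace: set = field(default_factory=set)          # rebuilt from the log
    file_chunks: dict = field(default_factory=dict)      # rebuilt from the log
    chunk_locations: dict = field(default_factory=dict)  # memory only, never logged

    def apply(self, record):
        """Apply one logged mutation to the in-memory state."""
        op = record["op"]
        if op == "create":
            self.namespace.add(record["path"])
            self.file_chunks[record["path"]] = []
        elif op == "add_chunk":
            self.file_chunks[record["path"]].append(record["chunk"])
        else:
            raise ValueError(f"unknown operation: {op}")


class Master:
    def __init__(self, log_lines=()):
        self.log = list(log_lines)
        self.state = MasterState()
        for line in self.log:                 # recovery: replay the log
            self.state.apply(json.loads(line))

    def mutate(self, record):
        self.log.append(json.dumps(record))   # record the mutation first
        self.state.apply(record)              # then update memory


master = Master()
master.mutate({"op": "create", "path": "/logs/a"})
master.mutate({"op": "add_chunk", "path": "/logs/a", "chunk": "c1"})
master.state.chunk_locations["c1"] = {"cs1", "cs2"}   # learned from chunkservers

recovered = Master(master.log)                 # simulate a restart
print(recovered.state.file_chunks)             # {'/logs/a': ['c1']}
print(recovered.state.chunk_locations)         # {}
```

After the simulated restart the namespace and file-to-chunk mapping come back from the log, while the location map is empty until chunkservers report in.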

2.6.1 In-Memory Data Structures
Since metadata is stored in memory, master operations are
fast. Furthermore, it is easy and efficient for the master to
periodically scan through its entire state in the background.
This periodic scanning is used to implement chunk garbage
collection, re-replication in the presence of chunkserver failures,
and chunk migration to balance load and disk space
usage across chunkservers. Sections 4.3 and 4.4 will discuss
these activities further.
One potential concern for this memory-only approach is
that the number of chunks and hence the capacity of the
whole system is limited by how much memory the master
has. This is not a serious limitation in practice. The master
maintains less than 64 bytes of metadata for each 64 MB
chunk. Most chunks are full because most files contain many
chunks, only the last of which may be partially filled. Similarly,
the file namespace data typically requires less than
64 bytes per file because it stores file names compactly using
prefix compression.
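
A quick calculation shows how small the chunk metadata is relative to the data it describes. The sketch below treats 64 bytes per chunk as an upper bound (the text says "less than") and assumes that 64 MB means 64 * 2^20 bytes and that all chunks are full; the function name is illustrative.

```python
CHUNK_SIZE = 64 * 2**20        # assumed: 64 MB taken as 64 MiB
META_PER_CHUNK = 64            # upper bound in bytes, per the text


def chunk_metadata_upper_bound(total_bytes):
    """Upper bound on master memory for chunk metadata, in bytes."""
    num_chunks = -(-total_bytes // CHUNK_SIZE)   # ceiling division
    return num_chunks * META_PER_CHUNK


one_pib = 2**50
print(chunk_metadata_upper_bound(one_pib) / 2**30)   # 1.0 (GiB)
```

Under these assumptions the ratio of data to chunk metadata is about a million to one, so a petabyte of file data needs at most about a gigabyte of master memory for chunk metadata.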

If necessary to support even larger file systems, the cost
of adding extra memory to the master is a small price to pay
for the simplicity, reliability, performance, and flexibility we
gain by storing the metadata in memory.

2.6.2 Chunk Locations
The master does not keep a persistent record of which
chunkservers have a replica of a given chunk. It simply polls
chunkservers for that information at startup. The master
can keep itself up-to-date thereafter because it controls all
chunk placement and monitors chunkserver status with regular
HeartBeat messages.

We initially attempted to keep chunk location information
persistently at the master, but we decided that it was much
simpler to request the data from chunkservers at startup,
and periodically thereafter. This eliminated the problem of
keeping the master and chunkservers in sync as chunkservers
join and leave the cluster, change names, fail, restart, and
so on. In a cluster with hundreds of servers, these events
happen all too often.
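
One way to picture this design is a location table that is derived entirely from what chunkservers report, so there is nothing on the master to reconcile when a server joins, leaves, or restarts. The sketch below is an assumption-laden illustration: the `LocationTable` class, its method names, and the idea that each report carries a server's full chunk list are hypothetical and not taken from the paper.

```python
from collections import defaultdict


class LocationTable:
    """In-memory chunk -> servers map built only from chunkserver reports."""

    def __init__(self):
        self.by_chunk = defaultdict(set)   # chunk id -> set of server names
        self.by_server = {}                # server name -> set of chunk ids

    def report(self, server, chunks):
        """Replace everything known about a server with its latest report."""
        self.remove_server(server)
        self.by_server[server] = set(chunks)
        for chunk in chunks:
            self.by_chunk[chunk].add(server)

    def remove_server(self, server):
        """Forget a server, e.g. after it fails or leaves the cluster."""
        for chunk in self.by_server.pop(server, set()):
            holders = self.by_chunk[chunk]
            holders.discard(server)
            if not holders:
                del self.by_chunk[chunk]

    def locations(self, chunk):
        return set(self.by_chunk.get(chunk, ()))


table = LocationTable()
table.report("cs1", ["a", "b"])
table.report("cs2", ["b"])
print(sorted(table.locations("b")))   # ['cs1', 'cs2']

table.remove_server("cs1")            # cs1 fails
print(sorted(table.locations("a")))   # []
print(sorted(table.locations("b")))   # ['cs2']
```

Because a report overwrites the previous view of that server, a restarted or renamed chunkserver simply reports again and the table converges without any persistent state on the master.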