|
A reader can
identify and discard extra padding and record fragments
using the checksums. If it cannot tolerate the occasional
duplicates (e.g., if they would trigger non-idempotent operations),
it can filter them out using unique identifiers in
the records, which are often needed anyway to name corresponding
application entities such as web documents. These
functionalities for record I/O (except duplicate removal) are
in library code shared by our applications and applicable to
other file interface implementations at Google. With that,
the same sequence of records, plus rare duplicates, is always
delivered to the record reader. |
|