Distributed Transactional NoSQL for the Cloud


Similar Projects

To the best of our knowledge, no other ZODB Storage interface implementation offers both scalability and fault tolerance the way NEO does. Related projects include:

  • FileStorage: single-file storage
  • RelStorage: relational database storage
  • DirectoryStorage: multi-file storage
  • ZEO: networked, multi-storage RPC
  • ZEORaid: fault-tolerant clustering of ZEO servers
  • Ceph: although not an object database, its design is very close to NEO's

Pages on related topics

Some interesting pages on topics related to NEO, but not written for/about NEO:

  • Wikipedia page on object databases
  • xrootd/Scalla, a petascale object database system
  • (Brewer's) CAP theorem (Wikipedia), and a very well-written article by Daniel Abadi on what CAP does not cover
  • Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions by Atul Adya
  • The Multi-Queue Replacement Algorithm for Second Level Buffer Caches by Yuanyuan Zhou and James F. Philbin, used for ZEO-level caching on client nodes
  • The C10K Problem, by Dan Kegel
  • Lessons Learned From Managing A Petabyte
    Some notes from this paper, with NEO in mind:
    • Thick client, thin server
      status in NEO: designed this way
    • Client-side compression
      status in NEO: implemented
    • Limiting file descriptor count
      status in NEO: implemented on client nodes; still needs consideration on master and storage nodes
    • Load balancing
      status in NEO: static balancing is done, dynamic is not
    • Local disk cache, geographical proximity of data to users
      status in NEO: not implemented; for now, NEO is not expected to suit long-distance distribution.
      RAM caching is done, though (at both the NEO and ZODB levels), the NEO level using a dedicated
      intermediate caching algorithm
    • Fine-grained locks are better
      status in NEO: designed this way, and
      the optimistic concurrency control used by ZODB also helps
    • Data flushed to disk only during commit
      status in NEO: designed this way
    • No need for central index updates for object changes
      status in NEO: designed this way
    • Large updates & update pooling
      status in NEO: not expected to be handled at the NEO level, but at the application level (e.g. Zope)
    • Data duplication
      status in NEO: designed this way
    • Sequential storage support (e.g. tapes) handled the same way as random-access storage (e.g. disks)
      status in NEO: not implemented; for now, NEO requires all data (historical and current) to be accessible, to fit the needs of ZODB. It is unclear whether this can be implemented at all, and it heavily depends on application-level behaviour (e.g. Zope)
    • Deferral system
      status in NEO: not implemented; for the moment, NEO focuses on interactive use (short transactions) rather than heavy data processing (long transactions), so such a feature is not a top priority
    Finally, the biggest difference between the described systems and NEO/ZODB seems to lie in the meaning of "transaction" and in the expected application behaviour inside a transaction: NEO provides the same isolation level as ZODB, which is (supposed to be) PL-2+ in the terminology of Atul Adya's thesis (see above), and which looks stricter than the transaction isolation briefly described in the paper.
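The Multi-Queue (MQ) replacement algorithm by Zhou and Philbin keeps several LRU queues: an entry accessed f times sits in queue min(floor(log2(f)), m-1), entries not accessed within a given lifetime are demoted one queue, and the access counts of evicted entries are remembered in a bounded history queue. The following is a minimal Python sketch of that idea, assuming a simple key/value interface with a caller-supplied load function; the class MQCache and its parameters (capacity, num_queues, life_time, history_size) are hypothetical and do not describe NEO's actual client cache code.

```python
from collections import OrderedDict


class MQCache:
    """Hypothetical Multi-Queue (MQ) cache sketch; not NEO's real implementation."""

    def __init__(self, capacity, num_queues=8, life_time=16, history_size=None):
        self.capacity = capacity
        self.num_queues = num_queues
        self.life_time = life_time
        # Bounded FIFO history ("Qout") remembering access counts of evicted keys.
        self.history_size = history_size if history_size is not None else 4 * capacity
        self.queues = [OrderedDict() for _ in range(num_queues)]  # oldest key first
        self.entries = {}  # key -> [value, frequency, expire_time, queue_index]
        self.history = OrderedDict()  # key -> frequency
        self.current_time = 0

    def _queue_num(self, frequency):
        # floor(log2(frequency)), capped at the highest queue.
        return min(frequency.bit_length() - 1, self.num_queues - 1)

    def _evict(self):
        # Victim: least recently used key of the lowest non-empty queue.
        for queue in self.queues:
            if queue:
                key, _ = queue.popitem(last=False)
                _, frequency, _, _ = self.entries.pop(key)
                if len(self.history) >= self.history_size:
                    self.history.popitem(last=False)
                self.history[key] = frequency
                return

    def _adjust(self):
        # Advance the logical clock and demote expired queue heads by one level.
        self.current_time += 1
        for k in range(1, self.num_queues):
            queue = self.queues[k]
            if not queue:
                continue
            key = next(iter(queue))
            entry = self.entries[key]
            if entry[2] < self.current_time:
                del queue[key]
                self.queues[k - 1][key] = None
                entry[2] = self.current_time + self.life_time
                entry[3] = k - 1

    def access(self, key, load):
        """Return the value for key, calling load(key) on a cache miss."""
        entry = self.entries.get(key)
        if entry is not None:
            del self.queues[entry[3]][key]
        else:
            if len(self.entries) >= self.capacity:
                self._evict()
            # A key seen recently gets its old access count back.
            frequency = self.history.pop(key, 0)
            entry = [load(key), frequency, 0, 0]
            self.entries[key] = entry
        entry[1] += 1
        k = self._queue_num(entry[1])
        self.queues[k][key] = None  # append at the most recently used end
        entry[2] = self.current_time + self.life_time
        entry[3] = k
        self._adjust()
        return entry[0]
```

Because eviction always starts from the lowest non-empty queue, an object read many times tends to survive a burst of one-off reads, which is the main motivation for MQ over plain LRU in a second-level cache.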
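ZODB's optimistic approach lets transactions read and modify objects without locking, and only checks at commit time that each modified object still carries the serial (the ID of the transaction that last wrote it) that was read; otherwise the commit fails with a conflict. The toy sketch below combines that check with fine-grained, per-object locks held only during commit and taken in a fixed order to avoid deadlocks. It assumes an in-memory store, and every name in it (ToyStorage, commit, a locally defined ConflictError) is hypothetical rather than NEO's or ZODB's real API.

```python
import itertools
import threading


class ConflictError(Exception):
    """Raised when an object was modified after the transaction read it."""


class ToyStorage:
    """Hypothetical in-memory store: optimistic checks plus per-object commit locks."""

    def __init__(self):
        self._data = {}  # oid -> (serial, value); serial = ID of the last writing transaction
        self._locks = {}  # oid -> threading.Lock, one per object (fine-grained)
        self._registry_lock = threading.Lock()  # guards _locks and the TID counter
        self._tids = itertools.count(1)

    def _lock_for(self, oid):
        with self._registry_lock:
            return self._locks.setdefault(oid, threading.Lock())

    def load(self, oid):
        """Return (serial, value); serial 0 means the object does not exist yet."""
        return self._data.get(oid, (0, None))

    def commit(self, changes):
        """Apply {oid: (read_serial, new_value)} all at once, or raise ConflictError."""
        oids = sorted(changes)  # one global lock order prevents deadlocks between commits
        locks = [self._lock_for(oid) for oid in oids]
        for lock in locks:
            lock.acquire()
        try:
            # Optimistic validation: every object must still have the serial that was read.
            for oid in oids:
                current_serial, _ = self.load(oid)
                if current_serial != changes[oid][0]:
                    raise ConflictError(oid)
            with self._registry_lock:
                tid = next(self._tids)
            for oid in oids:
                self._data[oid] = (tid, changes[oid][1])
            return tid
        finally:
            for lock in reversed(locks):
                lock.release()
```

Commits touching disjoint sets of objects never wait on each other here, and no lock is held while a transaction is merely reading, which is why optimistic validation and fine-grained locking fit well together.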
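Client-side compression keeps the CPU cost on the (thick) client, while storage nodes only handle opaque bytes. A minimal sketch using Python's standard zlib module follows; the function names and the choice to keep the raw data, with a flag, when compression does not make it smaller are assumptions for illustration, not a description of NEO's wire format.

```python
import zlib


def compress_for_storage(data: bytes, level: int = 6) -> tuple[bool, bytes]:
    """Hypothetical helper: return (compressed_flag, payload) to send to storage."""
    compressed = zlib.compress(data, level)
    # Keep the original bytes when compression does not pay off (e.g. tiny or random data).
    if len(compressed) < len(data):
        return True, compressed
    return False, data


def decompress_from_storage(compressed: bool, payload: bytes) -> bytes:
    """Hypothetical helper: undo compress_for_storage on the client after loading."""
    return zlib.decompress(payload) if compressed else payload
```

Storing the flag next to the payload lets the client decide per object, so already-compressed data such as images is not inflated by zlib's header and checksum.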