    NEO

    NEO is a distributed, redundant and transactional storage designed to be an alternative to ZEO and FileStorage. It is developed and maintained by Nexedi, and it is used in several applications maintained by Nexedi as well as in Nexedi's internal production instances, including all its websites.

    Features

    NEO implements ZODB's Storage interface, and supports the following standard extensions:

    • Revisions (optional in BaseStorage) with MVCC support.
    • Undo (optional in BaseStorage) - see the sketch below.
    • Pack (both pruning old object revisions and orphan objects).
    • Conflict resolution.
    • Iterator (allows exporting storage data).

    Note: There is no plan to support the "version" extension in NEO, because it is pending deprecation in ZODB.
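
    Because NEO implements the standard Storage interface, the usual ZODB API works unchanged on top of it. Below is a minimal sketch of the undo feature listed above; it uses FileStorage for brevity, but a NEO client storage can be substituted for it:

        import transaction
        from ZODB import DB
        from ZODB.FileStorage import FileStorage

        # Any undo-capable Storage works here; NEO is meant to be a
        # drop-in replacement for FileStorage.
        db = DB(FileStorage('data.fs'))
        conn = db.open()
        root = conn.root()

        root['counter'] = 1
        transaction.commit()

        # Undo the last committed transaction through the standard ZODB API.
        undo_id = db.undoLog(0, 1)[0]['id']
        db.undo(undo_id)
        transaction.commit()

        db.close()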

    NEO adds the following features:

    • No central lock: no more storage-cluster-wide commit lock.
    • Increased scalability: load balancing over multiple machines.
    • Fault tolerance: data replication over multiple machines.
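
    Conceptually, NEO achieves this by splitting the object-id space into partitions and assigning each partition to one or more storage nodes. The sketch below is a simplified, hypothetical illustration of such a partition table, not NEO's actual implementation:

        # Simplified illustration of partitioned, replicated object placement.
        # Real NEO maintains a dynamic partition table with per-cell states;
        # this sketch only shows the basic idea.

        NUM_PARTITIONS = 12
        REPLICAS = 1  # each partition is stored on REPLICAS + 1 nodes

        storage_nodes = ['storage1', 'storage2', 'storage3']

        def nodes_for_oid(oid):
            """Map an object id to the storage nodes holding its partition."""
            partition = oid % NUM_PARTITIONS
            first = partition % len(storage_nodes)
            return [storage_nodes[(first + i) % len(storage_nodes)]
                    for i in range(REPLICAS + 1)]

        # Reads may be served by any replica (load balancing), while writes
        # go to all replicas of the partition (fault tolerance).
        print(nodes_for_oid(42))  # -> ['storage1', 'storage2']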

    Releases

    • April 28, 2019 - Version 1.12

    See Changelog for a complete and detailed list of releases.

    Why use NEO?

    Architecture and Characteristics

    With growing data volumes and lessons learned from managing close to a petabyte of data, below you can find additional characteristics of NEO:

    • Thick client, thin server
      NEO: designed this way.
    • Client-side compression
      NEO: implemented.
    • Load balancing
      NEO: static balancing is done, dynamic is not.
    • Local disk cache, geographical proximity of data to the user
      NEO: not implemented. For now, NEO is not expected to fit long-distance distribution. RAM caching is done, though (at both the NEO and ZODB levels), the NEO-level cache being implemented with a special intermediate caching algorithm.
    • Fine-grained locks are better
      NEO: designed this way, and the optimistic transaction consistency used in ZODB also helps.
    • Data flushed to disk only during commit
      NEO: designed this way.
    • No need for central index updates for object changes
      NEO: designed this way.
    • Large updates & update pooling
      NEO: this is not expected to happen at the NEO level, but at the application level (e.g. Zope).
    • Data duplication
      NEO: designed this way.
    • Sequential storage support (e.g. tapes) the same way as random-access storage (e.g. disks)
      NEO: not implemented. For now, NEO requires all data (historical and current) to be accessible, to fit the needs of ZODB. It is unclear whether this can be implemented at all, and it heavily depends on application-level behaviour (e.g. Zope).
    • Deferral system
      NEO: not implemented. NEO focuses on interactive use (short transactions) rather than heavy data processing (long transactions) for the moment, so this feature is not at the top of the priority list.

    Finally, the biggest difference between the systems described above and NEO/ZODB seems to lie in the meaning of "transaction" and in the expected application behaviour inside a transaction: NEO provides the same level of isolation as ZODB, which is (supposed to be) PL-2+ in the terminology of Atul Adya's thesis (see below), and which is stricter than the transaction isolation levels briefly described here.
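
    As an illustration of this optimistic concurrency model, ZODB lets application classes merge concurrent writes instead of failing with a ConflictError. The following hypothetical conflict-resolving counter uses the standard _p_resolveConflict hook; it is an application-level example, not part of NEO itself:

        from persistent import Persistent

        class Counter(Persistent):
            """A counter whose concurrent increments are merged through
            ZODB's standard conflict-resolution hook."""

            def __init__(self):
                self.value = 0

            def inc(self, n=1):
                self.value += n

            def _p_resolveConflict(self, old_state, saved_state, new_state):
                # old_state:   state when this transaction started
                # saved_state: state committed meanwhile by another transaction
                # new_state:   state this transaction is trying to commit
                resolved = dict(saved_state)
                resolved['value'] = (saved_state['value']
                                     + new_state['value'] - old_state['value'])
                return resolved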

    Reliability

    The reliability of a data storage - such as NEO - is critical. To ensure the quality of NEO's design, its protocol is in the process of being formally proven through model checking. To ensure code quality, the NEO project relies on automated testing:

    • unit tests checking individual method behaviour
    • functional tests checking node and cluster behaviour
    • standard ZODB test suites

    Getting Started

    Source Code

    You can get the source code from the following Git repository: https://lab.nexedi.com/nexedi/neoppod.git (GitHub mirror), or browse it online.

    It is also published on PyPI.

    Requirements

    The following software is required:

    • Linux 2.6 or later.
      Note: the actual requirement is epoll, which was integrated in Linux 2.5.44. There are plans to add support for other platforms, but this is not implemented yet.
    • Python 2.7 (2.7.9 or later for SSL support).
    • MySQLdb
      Note: a MySQL server is currently used as a backend for NEO, with the InnoDB, RocksDB or TokuDB storage engine. This was chosen as an early approach to take advantage of existing features (basically, transactional persistent storage), and it will be replaced with a leaner key/value storage later.
    • ZODB >= 3.10.x (Zope 2.13 or later)
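
    Once a cluster is running, a client connects to it through the master nodes and uses it like any other ZODB storage. A minimal sketch, assuming a cluster named 'main' whose master node listens on 127.0.0.1:10000:

        import transaction
        from ZODB import DB
        from neo.client.Storage import Storage

        # 'main' and 127.0.0.1:10000 are example values; use your own
        # cluster name and master address here.
        storage = Storage(master_nodes='127.0.0.1:10000', name='main')
        db = DB(storage)
        conn = db.open()
        root = conn.root()

        root['greeting'] = 'hello NEO'
        transaction.commit()
        db.close()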

    Documentation

    Scientific Publications

    Tips and Tricks

    Tests

    Automated test results are published on stack.nexedi.com.

    FAQ

    Q: How does NEO scale compared to ZODB?

    A: For "normal" database use (1+ TB), NEO runs very stably. Scalability topics being worked on include pruning of old data, which is currently too slow, and reshaping of a cluster (NEO moving data when the cluster changes).

    We ran a number of scalability tests going up to 150 TB to find bottlenecks. Issues found and being investigated:

    • replication is too slow: on a 20 TB database, when one storage node becomes OUT_OF_DATE, bringing it back in sync takes too long with MariaDB's RocksDB
    • read/write speed (uploading 1.5 TB/day through fluentd) was slower than disk/Ethernet speed; this is being investigated

    One NEO server currently in production, connected to 60 Zope processes, stores 83724 GB (in the ERP5 data stream module on top of NEO; disk consumption is smaller due to compression). We want to gradually increase these roughly 80 TB stored in ZODB to 1 PB, and we ran one test storing an array of 4.9 million rows using a wendelin.core persistent NumPy array in ZODB without consuming too much memory. Doing something similar with ZODB and FileStorage would be difficult in terms of disk space and performance.
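
    The sketch below shows what such a persistent array looks like with wendelin.core's ZBigArray; the names and shape are illustrative, and the exact API may differ between versions. It assumes 'conn' is an open ZODB connection backed by NEO, as in the Getting Started example:

        import numpy as np
        import transaction
        from wendelin.bigarray.array_zodb import ZBigArray

        root = conn.root()

        # Create a large persistent array; its pages live in the ZODB,
        # so RAM usage stays bounded regardless of the array size.
        root['measurements'] = A = ZBigArray((4900000,), np.float64)
        transaction.commit()

        a = A[:]           # ndarray view over the persistent data
        a[:1000] = 1.0     # only the touched pages are loaded and dirtied
        transaction.commit()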

    Licence

    NEO is Free Software, licensed under the terms of the GNU GPL v3 (or later). For rationale, please see Nexedi licensing.

    Example Projects

    Projects directly related to NEO, but not actually touching its core. They might or might not strictly depend on NEO.

    FUSE module

    Idea: write a FUSE wrapper using NEO as a storage back-end, instead of a hard disk partition, a CD, etc.

    Goal: stress-testing with filesystem benchmark suites.

    Progress: started in May 2010, stalled since then. Code to be cleaned up and published when I (Vincent Pelletier) find time. Unstable.

    API: ZODB

    Memcached implementation

    Idea: write a memcached server using NEO as a storage back-end, instead of RAM (original memcached) or other back-ends (kumofs, etc.).

    Goal: benchmark with memcached-oriented tools.

    Progress: Not started, assigned.

    API: Storage preferred, otherwise ZODB

    Similar Projects

    To the best of our knowledge, there is no other Storage interface implementation offering both scalability and fault tolerance the way NEO does:

    • FileStorage - Single-file storage
    • RelStorage - Relational database storage
    • DirectoryStorage - Multi-file storage
    • ZEO - Networked, multi-storage RPC
    • ZEORaid - Fault-tolerant clustering of ZEO servers
    • Ceph - Although not an object database, its design is very close to NEO's

    Related Projects

    Some interesting pages on topics related to NEO, but not written for/about NEO:

    Project

    History

    The NEO project was initiated in 2005 by Nexedi, a French company that has been developing ERP5 - a Free Software ERP for small to large enterprises built on top of Zope - since 2001. NEO was then endorsed in 2009 by the System@tic competitiveness cluster, by the Paris Region, and by the FEDER programme of the European Union.

    Members

    • Nexedi
    • PilotSystems
    • Paris 6
    • Paris Nord
    • ParisTech
    • University of Dakar