serialize — Serialisation Infrastructure

Warning

Internal module — subject to change without notice. Do not import these functions directly. Use the public serialisation methods on NestedDictionary instead (to_json, from_json, to_pickle, from_pickle).

Private serialization infrastructure for ndict_tools.

This module is not part of the public API. It provides the low-level helpers used by the serialization methods on _StackedDict (to_json, from_json, to_pickle, from_pickle).

Contents

  • _encode_key / _decode_key : JSON key encoding via type-tagged string prefix

  • NestedDictionaryEncoder : json.JSONEncoder subclass

  • _make_decoder_hook : factory for object_pairs_hook

  • _pickle_dump / _pickle_load : pickle helpers with SHA-256 verification

Design decisions

  • JSON key encoding (design decision #87): JSON mandates string keys; Python supports arbitrary hashable keys. Non-string keys are encoded as __type__:value tagged strings (e.g., __int__:42, __tuple__:(1, 2)). Decoding uses ast.literal_eval for safe reconstruction of tuple and frozenset values. Known limitation: string keys that already start with a __type__: prefix are indistinguishable from encoded keys.

  • API placement (design decision #94): Serialization methods (to_json, from_json, to_pickle, from_pickle) are defined on _StackedDict and delegate to the private helpers below via lazy imports. This creates an accepted import cycle between tools.py and this module.
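The known limitation noted under design decision #87 can be illustrated with a minimal sketch. The helpers encode_sketch and decode_sketch are hypothetical stand-ins written for this illustration only; they cover just bool and int keys and are not the module's implementation.

```python
from typing import Any


def encode_sketch(key: Any) -> str:
    """Hypothetical type-tagged encoder (bool and int only)."""
    if isinstance(key, str):
        return key  # plain strings pass through unchanged
    if isinstance(key, bool):  # tested before int: bool subclasses int
        return f"__bool__:{key}"
    if isinstance(key, int):
        return f"__int__:{key}"
    raise TypeError(f"unsupported key type: {type(key).__name__}")


def decode_sketch(encoded: str) -> Any:
    """Hypothetical inverse of encode_sketch."""
    if encoded.startswith("__bool__:"):
        return encoded[len("__bool__:"):] == "True"
    if encoded.startswith("__int__:"):
        return int(encoded[len("__int__:"):])
    return encoded


# An integer key survives the round trip.
print(decode_sketch(encode_sketch(42)))  # 42

# A string key that happens to carry the prefix comes back as an int.
print(repr(decode_sketch(encode_sketch("__int__:42"))))  # 42
```

The second call shows the collision: the original key was the string "__int__:42", but the decoder cannot tell it apart from an encoded integer.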

Key encoding (DD-021)

ndict_tools.serialize._encode_key(key: Any) → str

Encode a _StackedDict key to a JSON-safe string.

JSON mandates string keys. This function maps any supported hashable Python key to a unique, reversible string using a type-tagged prefix of the form __type__:value (e.g., integer 42 → "__int__:42", tuple (1, 2) → "__tuple__:(1, 2)"). Plain string keys are passed through unchanged. Known limitation: a string key that already starts with a recognised prefix (e.g. "__int__:42") is indistinguishable from an encoded integer key after a round-trip.

Parameters:

key (Any) – The key to encode. Supported types: str, int, float, bool, flat tuple of scalars, flat frozenset of scalars.

Returns:

A JSON-safe string representing the key.

Return type:

str

Raises:

StackedTypeError – If key is of an unsupported type.

Examples

>>> _encode_key("hello")
'hello'
>>> _encode_key("[42]")
'\\[42]'
>>> _encode_key(42)
'[42]'
>>> _encode_key(3.14)
'[3.14]'
>>> _encode_key(True)
'[True]'
>>> _encode_key((1, 2))
'[(1, 2)]'
>>> _encode_key(frozenset({1, 2}))
'[frozenset{1, 2}]'

Notes

  • bool is tested before int because bool is a subclass of int.

  • float values use repr() to preserve full precision.

  • frozenset is unordered: element order in the encoded form is not guaranteed. Round-trip preserves set equality, not element ordering.
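The notes above can be sketched in code. The function below, encode_key_sketch, is a hypothetical reconstruction that reproduces the bracketed output shown in the Examples; it raises TypeError where the documented function raises StackedTypeError, and it does not check that tuples or frozensets are flat. It is an assumption about how such an encoder might look, not the module's source.

```python
from typing import Any


def encode_key_sketch(key: Any) -> str:
    """Hypothetical encoder matching the bracketed examples above."""
    if isinstance(key, str):
        # Escape strings that would otherwise look like an encoded key.
        return "\\" + key if key.startswith("[") else key
    if isinstance(key, bool):
        # Must come before the int test: isinstance(True, int) is True.
        return f"[{key}]"
    if isinstance(key, int):
        return f"[{key}]"
    if isinstance(key, float):
        # repr() keeps enough digits to round-trip the float exactly.
        return f"[{key!r}]"
    if isinstance(key, tuple):
        return f"[{key!r}]"
    if isinstance(key, frozenset):
        # Iteration order of a frozenset is not guaranteed.
        inner = ", ".join(repr(item) for item in key)
        return "[frozenset{" + inner + "}]"
    raise TypeError(f"unsupported key type: {type(key).__name__}")


print(encode_key_sketch(True))    # [True]
print(encode_key_sketch(1))       # [1]
print(encode_key_sketch(3.14))    # [3.14]
print(encode_key_sketch((1, 2)))  # [(1, 2)]
```

If the int test ran first, True would still produce "[True]" here, but any encoder that formats integers differently from booleans (for example with a type tag) would misclassify it, which is why the ordering matters.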

ndict_tools.serialize._decode_key(encoded: str) → Any

Decode an encoded JSON key back to its original Python type.

Applies the following decoding rules in priority order:

  1. __bool__: prefix → bool (checked before int to avoid misclassification).

  2. __int__: prefix → int.

  3. __float__: prefix → float.

  4. __frozenset__: prefix → frozenset (via ast.literal_eval).

  5. __tuple__: prefix → tuple (via ast.literal_eval).

  6. No recognised prefix → plain str (identity).

No eval() is used. ast.literal_eval() is used only for flat tuples of Python scalars, which are valid Python literals by definition.

Parameters:

encoded (str) – The encoded JSON key string produced by _encode_key.

Returns:

The original Python key.

Return type:

Any

Examples

>>> _decode_key("hello")
'hello'
>>> _decode_key('\\[42]')
'[42]'
>>> _decode_key("[42]")
42
>>> _decode_key("[3.14]")
3.14
>>> _decode_key("[True]")
True
>>> _decode_key("[(1, 2)]")
(1, 2)
>>> _decode_key("[frozenset{1, 2}]")
frozenset({1, 2})

Notes

Decoding rules (applied in order):

  1. Starts with \[ → strip \, return as str.

  2. Starts with [frozenset{ and ends with }] → parse inner CSV of scalars, return as frozenset.

  3. Starts with [( and ends with )] → ast.literal_eval(), return as tuple.

  4. Starts with [ and ends with ] → infer scalar type (True/False → bool; . or e → float; else int).

  5. Otherwise → return as str unchanged.
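The five rules listed in these notes can be sketched as follows. decode_key_sketch is a hypothetical reconstruction, not the module's source. It assumes frozenset elements are scalars whose text contains no commas, and it does not handle special floats such as inf or nan.

```python
import ast
from typing import Any

_FROZENSET_OPEN = "[frozenset{"


def decode_key_sketch(encoded: str) -> Any:
    """Hypothetical decoder following the five rules in order."""
    # Rule 1: escaped string, strip the leading backslash.
    if encoded.startswith("\\["):
        return encoded[1:]
    # Rule 2: frozenset, parse the comma-separated scalars.
    if encoded.startswith(_FROZENSET_OPEN) and encoded.endswith("}]"):
        inner = encoded[len(_FROZENSET_OPEN):-2]
        if not inner.strip():
            return frozenset()
        return frozenset(ast.literal_eval(part.strip()) for part in inner.split(","))
    # Rule 3: tuple, a valid Python literal once the brackets are removed.
    if encoded.startswith("[(") and encoded.endswith(")]"):
        return ast.literal_eval(encoded[1:-1])
    # Rule 4: scalar, infer bool, float or int from the text.
    if encoded.startswith("[") and encoded.endswith("]"):
        body = encoded[1:-1]
        if body in ("True", "False"):
            return body == "True"
        if "." in body or "e" in body.lower():
            return float(body)
        return int(body)
    # Rule 5: anything else is a plain string.
    return encoded


print(decode_key_sketch("[42]"))      # 42
print(decode_key_sketch("[(1, 2)]"))  # (1, 2)
print(decode_key_sketch("\\[42]"))    # [42]
```

The order matters: the frozenset and tuple checks must run before the generic bracket check, since both forms also start with [ and end with ].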

JSON encoder

class ndict_tools.serialize.NestedDictionaryEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

JSON encoder for _StackedDict instances.

Encodes _StackedDict (and subclasses) as plain nested dict, applying _encode_key to all non-string keys so that the JSON output is valid and fully reversible.

Used internally by _StackedDict.to_json. Not part of the public API.

Examples

>>> import json
>>> from ndict_tools import NestedDictionary
>>> nd = NestedDictionary({"a": {"b": 1}})
>>> json.dumps(nd, cls=NestedDictionaryEncoder)
'{"a": {"b": 1}}'
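How the encoder rewrites keys is not shown in this reference. The standard json encoder converts int, float, bool and None keys on its own, raises TypeError for other key types such as tuples, and never passes keys to default(). A subclass therefore has to rewrite keys before encoding. The sketch below shows one plausible way to do that; KeyTaggingEncoder and _tag_key are hypothetical names, and _tag_key is a simplified stand-in for _encode_key that skips string escaping.

```python
import json
from typing import Any


def _tag_key(key: Any) -> str:
    # Hypothetical, simplified stand-in for _encode_key (no string escaping).
    return key if isinstance(key, str) else f"[{key!r}]"


class KeyTaggingEncoder(json.JSONEncoder):
    """Hypothetical encoder that rewrites non-string keys before encoding."""

    def _prepare(self, o: Any) -> Any:
        # Recursively copy the structure into plain dicts and lists
        # whose keys are all strings.
        if isinstance(o, dict):
            return {_tag_key(k): self._prepare(v) for k, v in o.items()}
        if isinstance(o, (list, tuple)):
            return [self._prepare(v) for v in o]
        return o

    def iterencode(self, o: Any, _one_shot: bool = False):
        # encode() calls iterencode() for non-string input, so
        # overriding it here covers both entry points.
        return super().iterencode(self._prepare(o), _one_shot)


print(json.dumps({1: {(1, 2): "x"}}, cls=KeyTaggingEncoder))
# {"[1]": {"[(1, 2)]": "x"}}
```

Without the key rewrite, json.dumps would raise TypeError on the tuple key.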

Constructor for JSONEncoder, with sensible defaults.

If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.

If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.

If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause a RecursionError). Otherwise, no such check takes place.

If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.

If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.

If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.

If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.

If specified, default is a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.

default(o: Any) → Any

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)

encode(o: Any) → str

Return a JSON string representation of a Python data structure.

>>> from json.encoder import JSONEncoder
>>> JSONEncoder().encode({"foo": ["bar", "baz"]})
'{"foo": ["bar", "baz"]}'

iterencode(o: Any, _one_shot: bool = False)

Encode the given object and yield each string representation as available.

For example:

for chunk in JSONEncoder().iterencode(bigobject):
    mysocket.write(chunk)

ndict_tools.serialize._make_decoder_hook(cls: type, class_options: dict[str, Any]) → Callable[[...], Any]

Return an object_pairs_hook that reconstructs a _StackedDict (or subclass) from JSON key-value pairs.

Parameters:
  • cls (type) – The _StackedDict subclass to instantiate.

  • class_options (dict) – Keyword arguments forwarded to cls.from_dict; must include default_setup.

Returns:

A hook suitable for json.load(..., object_pairs_hook=hook).

Return type:

Callable
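
The factory pattern described here can be sketched with the standard library alone. In the sketch below, make_hook_sketch and decode_int_key are hypothetical names; a plain dict stands in for the _StackedDict subclass, and the class_options forwarding to cls.from_dict is omitted because its details are not given in this reference.

```python
import json
from typing import Any, Callable


def make_hook_sketch(cls: type, decode_key: Callable[[str], Any]) -> Callable[[list], Any]:
    """Return a hook that builds cls from decoded (key, value) pairs."""

    def hook(pairs: list) -> Any:
        # json calls the hook once per JSON object, innermost first,
        # so nested values have already been converted.
        return cls((decode_key(k), v) for k, v in pairs)

    return hook


def decode_int_key(encoded: str) -> Any:
    # Hypothetical, simplified stand-in for _decode_key (ints only).
    if encoded.startswith("[") and encoded.endswith("]"):
        return int(encoded[1:-1])
    return encoded


hook = make_hook_sketch(dict, decode_int_key)
result = json.loads('{"[1]": {"[2]": "x"}, "name": "n"}', object_pairs_hook=hook)
print(result)  # {1: {2: 'x'}, 'name': 'n'}
```

Using object_pairs_hook rather than object_hook lets the hook see the pairs in document order before any dict is built.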

Pickle helpers

ndict_tools.serialize._pickle_dump(nd: Any, path: str | Path, protocol: int | None = None) → None

Write a _StackedDict to a pickle file with a SHA-256 sidecar.

Writes two files:

  • <path>: the pickle file

  • <path>.sha256: hex digest of the pickle bytes

Parameters:
  • nd (_StackedDict) – The object to pickle.

  • path (str or Path) – Destination file path.

  • protocol (int, optional) – Pickle protocol (default: pickle.DEFAULT_PROTOCOL).

Warns:

UserWarning – Always emits a warning reminding callers that pickle is unsafe with untrusted files.
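
The behaviour described above could look roughly like this. pickle_dump_sketch is a hypothetical name, the warning text is invented for the illustration, and the sidecar is assumed to contain only the hex digest as text.

```python
import hashlib
import pickle
import warnings
from pathlib import Path
from typing import Any, Optional, Union


def pickle_dump_sketch(obj: Any, path: Union[str, Path], protocol: Optional[int] = None) -> None:
    """Hypothetical sketch: write a pickle file plus a .sha256 sidecar."""
    warnings.warn(
        "pickle files are unsafe to load from untrusted sources",
        UserWarning,
        stacklevel=2,
    )
    path = Path(path)
    # Serialise once so the digest covers exactly the bytes written.
    data = pickle.dumps(obj, protocol=protocol)
    path.write_bytes(data)
    sidecar = Path(str(path) + ".sha256")
    sidecar.write_text(hashlib.sha256(data).hexdigest(), encoding="ascii")
```

Passing protocol=None to pickle.dumps selects pickle.DEFAULT_PROTOCOL, which matches the documented default.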

ndict_tools.serialize._pickle_load(path: str | Path, verify: bool = True) → Any

Load a _StackedDict from a pickle file, optionally verifying its SHA-256 sidecar.

Parameters:
  • path (str or Path) – Path to the pickle file.

  • verify (bool, optional) – If True (default), read the .sha256 sidecar and raise StackedValueError if the digest does not match or the sidecar is absent.

Returns:

The unpickled object.

Return type:

Any

Raises:

StackedValueError – If verify=True and the digest mismatches or the sidecar is absent.

Warns:

UserWarning – Always emits a warning reminding callers that pickle is unsafe with untrusted files.
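
A verification step of this kind might be sketched as follows. pickle_load_sketch is a hypothetical name, ValueError stands in for StackedValueError, and the sidecar is assumed to hold the hex digest as text. Note that a sidecar stored next to the pickle file detects accidental corruption, but offers no protection against someone who can rewrite both files.

```python
import hashlib
import hmac
import pickle
import warnings
from pathlib import Path
from typing import Any, Union


def pickle_load_sketch(path: Union[str, Path], verify: bool = True) -> Any:
    """Hypothetical sketch: load a pickle after checking its sidecar."""
    warnings.warn(
        "pickle files are unsafe to load from untrusted sources",
        UserWarning,
        stacklevel=2,
    )
    path = Path(path)
    data = path.read_bytes()
    if verify:
        sidecar = Path(str(path) + ".sha256")
        if not sidecar.exists():
            raise ValueError(f"missing SHA-256 sidecar: {sidecar}")
        expected = sidecar.read_text(encoding="ascii").strip()
        actual = hashlib.sha256(data).hexdigest()
        # Compare before unpickling so corrupted bytes are never loaded.
        if not hmac.compare_digest(expected, actual):
            raise ValueError(f"SHA-256 mismatch for {path}")
    return pickle.loads(data)
```

The digest is checked against the bytes already read into memory, so the data that is verified is the same data that is unpickled.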