We’re going to JSONify and objectify Fyodor Dostoyevsky’s great novel The Idiot.
The result will be well-structured data, ready for further experiments.

Dostoyevsky Idiot JSON

We start from the plain text of “The Idiot”.
Here is the top-level function that objectifies it:

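import re

import requests
import roman  # converts integers to Roman numerals (pip install roman)

# TEXT_URL (the URL of the plain text) and objectify_part() are defined
# elsewhere, in the boorstat package.

def objectify_idiot(url=TEXT_URL):
    # Download the full plain text of the novel.
    text = requests.get(url).text

    idiot = {
        'title': 'The Idiot',
        'author': 'Fyodor Dostoyevsky',
        'text': text}

    # 'PART IV', 'PART III', 'PART II', 'PART I': 'PART I' is tried last,
    # so it cannot match just the prefix of a longer part heading.
    part_seps = ['PART {}'.format(roman.toRoman(i)) for i in range(4, 0, -1)]
    # Everything after 'Copyright' is the copyright notice.
    part_seps.append('Copyright')

    # parts[0] is the text before PART I, parts[1:-1] are the four parts,
    # parts[-1] is the copyright notice.
    parts = re.split('|'.join(part_seps), text)
    idiot['copyright'] = parts[-1].strip()

    parts = [p.strip() for p in parts[1:-1]]

    # Objectify each part, numbering the parts from 1.
    idiot['parts'] = [objectify_part(p, n + 1) for n, p in enumerate(parts)]

    return idiot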
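
The separators are built in descending order, PART IV down to PART I, and this seems to matter: Python's re module tries the alternatives of a `|` pattern from left to right, so if 'PART I' came first it would also match the beginning of 'PART II', 'PART III' and 'PART IV'. The sketch below checks this on a made-up sample string. SAMPLE and split_parts() are hypothetical names used only for illustration, the numerals are written out by hand instead of calling roman.toRoman(), and re.escape() is added only as a precaution.

```python
import re

# A tiny made-up stand-in for the novel's text (not the real book).
SAMPLE = (
    "Front matter\n"
    "PART I\nFirst part text.\n"
    "PART II\nSecond part text.\n"
    "PART III\nThird part text.\n"
    "PART IV\nFourth part text.\n"
    "Copyright\nSample copyright notice."
)

# Hand-written equivalents of the separator lists, in both orders.
DESCENDING = ['PART IV', 'PART III', 'PART II', 'PART I', 'Copyright']
ASCENDING = ['PART I', 'PART II', 'PART III', 'PART IV', 'Copyright']


def split_parts(text, seps):
    """Split text on any separator; return (parts, copyright notice)."""
    pieces = re.split('|'.join(map(re.escape, seps)), text)
    # Drop the front matter (pieces[0]) and keep the notice separately.
    return [p.strip() for p in pieces[1:-1]], pieces[-1].strip()


good_parts, good_copyright = split_parts(SAMPLE, DESCENDING)
bad_parts, _ = split_parts(SAMPLE, ASCENDING)

print(repr(good_parts[1]))  # 'Second part text.'
print(repr(bad_parts[1]))   # 'I\nSecond part text.' (leftover 'I' from 'PART II')
```

With the ascending order every later part keeps a stray numeral ('I', 'II', 'V') at its start, which is exactly what the descending order avoids.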

The implementation of objectify_part() and the other low-level functions can be found here.
Run pip install git+https://github.com/boorstat/python-boorstat.git to install the Python package with these functions ready to use:

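from boorstat.lit.dostoyevsky.idiot import idiot

# Download and parse the text with objectify_idiot()
idiot_as_dict = idiot.objectify_idiot()
# Load the structure from the pregenerated JSON instead
idiot_as_dict_from_pregenerated_json = idiot.from_json()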
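
If you would rather not install the package, here is a hypothetical sketch of what a function in the role of objectify_part() might do: split one part into numbered chapters. It assumes each chapter starts with a line holding only a Roman numeral and a period (for example "II."). That heading format, the name objectify_part_sketch() and the 'number' and 'chapters' keys are guesses for illustration, not the boorstat implementation.

```python
import re

# ASSUMPTION: a chapter heading is a line containing only a Roman numeral
# followed by a period, e.g. "IV.". This is a guess, not the package's format.
CHAPTER_HEADING = re.compile(r'^[IVXLC]+\.[ \t]*$', re.MULTILINE)


def objectify_part_sketch(text, number):
    """Hypothetical stand-in for objectify_part(): split a part into chapters."""
    # Anything before the first heading is discarded by the [1:] slice.
    chapters = [c.strip() for c in CHAPTER_HEADING.split(text)[1:]]
    return {
        'number': number,
        'chapters': [
            {'number': i + 1, 'text': c} for i, c in enumerate(chapters)
        ],
    }


part = objectify_part_sketch("I.\nFirst chapter.\n\nII.\nSecond chapter.", 1)
print(len(part['chapters']))        # 2
print(part['chapters'][1]['text'])  # Second chapter.
```

The empty string before the first heading is dropped by the slice, much as objectify_idiot() discards parts[0].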

The final structure can be seen in the pregenerated JSON.
This debugger screenshot also clarifies the hierarchy inside the object for “The Idiot”:

[Screenshot: Dostoyevsky Idiot Python object structure]
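
The pregenerated JSON mentioned above is presumably this kind of nested dict serialized to a file. As a minimal sketch, the standard json module can write such a dict and read it back. The dict below is a placeholder with the top-level keys that objectify_idiot() sets, and the ensure_ascii=False and indent=2 settings are assumptions, not necessarily what the package uses.

```python
import json

# Placeholder dict with the top-level keys set by objectify_idiot();
# the values are made up, not the real book data.
idiot_like = {
    'title': 'The Idiot',
    'author': 'Fyodor Dostoyevsky',
    'text': '“Sample” text.',
    'copyright': 'Sample copyright notice.',
    'parts': [],
}

# ensure_ascii=False keeps non-ASCII characters such as curly quotes as-is
# instead of escaping them as \u201c-style sequences; indent=2 pretty-prints.
as_json = json.dumps(idiot_like, ensure_ascii=False, indent=2)

# Loading the JSON back yields an equal dict.
restored = json.loads(as_json)
print(restored == idiot_like)  # True
```

Keeping the characters unescaped makes the file easier to read and diff; json.loads() accepts either form, so the choice only affects the file's appearance.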