ckipnlp.container.util.parse_tree module

This module provides tree containers for parsed sentences.

class ckipnlp.container.util.parse_tree.ParseNodeData(role: str = None, pos: str = None, word: str = None)[source]

Bases: ckipnlp.container.base.BaseTuple, ckipnlp.container.util.parse_tree._ParseNodeData

A parse node.

Variables
  • role (str) – the semantic role.

  • pos (str) – the POS-tag.

  • word (str) – the text term.

Note

This class is a subclass of tuple. To change an attribute, create a new instance instead.
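
The note above can be illustrated with a plain named tuple. This is a minimal stand-alone sketch: NodeData is a hypothetical stand-in that only mirrors the three documented fields, and it is not the library's class.

```python
from typing import NamedTuple, Optional


class NodeData(NamedTuple):
    """Hypothetical stand-in with the three documented fields."""
    role: Optional[str] = None
    pos: Optional[str] = None
    word: Optional[str] = None


original = NodeData(role='Head', pos='Na', word='中文字')

# Tuples are immutable, so assigning to an attribute raises AttributeError.
try:
    original.pos = 'Nab'
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# Build a new instance instead; _replace() is the standard namedtuple helper.
updated = original._replace(pos='Nab')
```

The original instance is left untouched, and updated carries the new POS-tag.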

Data Structure Examples

Text format

Used for from_text() and to_text().

'Head:Na:中文字'  # role / POS-tag / text-term
List format

Not implemented.

Dict format

Used for from_dict() and to_dict().

{
    'role': 'Head',   # role
    'pos': 'Na',      # POS-tag
    'word': '中文字',  # text term
}
classmethod from_text(data)[source]

Construct an instance from text format.

Parameters

data (str) – text such as 'Head:Na:中文字'.

Note

  • 'Head:Na:中文字' -> role = 'Head', pos = 'Na', word = '中文字'

  • 'Head:Na' -> role = 'Head', pos = 'Na', word = None

  • 'Na' -> role = None, pos = 'Na', word = None
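
The three cases above can be reproduced with a short stand-alone sketch. NodeData and node_from_text are hypothetical names used for illustration only, and the splitting rule is an assumption inferred from the examples rather than the library's actual implementation.

```python
from typing import NamedTuple, Optional


class NodeData(NamedTuple):
    """Hypothetical stand-in with the three documented fields."""
    role: Optional[str] = None
    pos: Optional[str] = None
    word: Optional[str] = None


def node_from_text(data: str) -> NodeData:
    """Split 'role:pos:word' text, filling missing fields with None."""
    # Assumption: split at most twice so a ':' inside the word is preserved.
    parts = data.split(':', 2)
    if len(parts) == 3:
        return NodeData(role=parts[0], pos=parts[1], word=parts[2])
    if len(parts) == 2:
        return NodeData(role=parts[0], pos=parts[1])
    # A single field is treated as the POS-tag, as in the 'Na' case.
    return NodeData(pos=parts[0])


full = node_from_text('Head:Na:中文字')
no_word = node_from_text('Head:Na')
pos_only = node_from_text('Na')
```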

class ckipnlp.container.util.parse_tree.ParseNode(tag=None, identifier=None, expanded=True, data=None)[source]

Bases: ckipnlp.container.base.Base, treelib.node.Node

A parse node for a tree.

Variables

data (ParseNodeData) –

See also

treelib.node.Node

Please refer to https://treelib.readthedocs.io/ for built-in usage.

Data Structure Examples

Text format

Not implemented.

List format

Not implemented.

Dict format

Used for to_dict().

{
    'role': 'Head',   # role
    'pos': 'Na',      # POS-tag
    'word': '中文字',  # text term
}
data_class

alias of ParseNodeData

class ckipnlp.container.util.parse_tree.ParseRelation(head: ckipnlp.container.util.parse_tree.ParseNode, tail: ckipnlp.container.util.parse_tree.ParseNode, relation: ckipnlp.container.util.parse_tree.ParseNode)[source]

Bases: ckipnlp.container.base.Base, ckipnlp.container.util.parse_tree._ParseRelation

A parse relation.

Variables
  • head (ParseNode) – the head node.

  • tail (ParseNode) – the tail node.

  • relation (ParseNode) – the relation node (its semantic role is the relation).

Notes

The parent of the relation node is always the common ancestor of the head node and tail node.

Data Structure Examples

Text format

Not implemented.

List format

Not implemented.

Dict format

Used for to_dict().

{
    'head': { 'role': 'Head', 'pos': 'Nab', 'word': '中文字' }, # head node
    'tail': { 'role': 'particle', 'pos': 'Td', 'word': '耶' }, # tail node
    'relation': 'particle',  # relation
}
class ckipnlp.container.util.parse_tree.ParseTree(tree=None, deep=False, node_class=None, identifier=None)[source]

Bases: ckipnlp.container.base.Base, treelib.tree.Tree

A parse tree.

See also

treelib.tree.Tree

Please refer to https://treelib.readthedocs.io/ for built-in usage.

Data Structure Examples

Text format

Used for from_text() and to_text().

'S(Head:Nab:中文字|particle:Td:耶)'
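
To make the bracket structure of the text format concrete, here is a minimal recursive-descent sketch. parse_text is a hypothetical helper, not the library's parser. It assumes well-formed input in which '(', '|' and ')' never occur inside a label, and it keeps each 'role:pos:word' label as an unsplit string.

```python
def parse_text(text):
    """Parse 'label(child|child|...)' text into nested dictionaries."""
    pos = 0

    def parse_node():
        nonlocal pos
        start = pos
        # A label runs until the next structural character.
        while pos < len(text) and text[pos] not in '(|)':
            pos += 1
        label = text[start:pos]
        children = []
        if pos < len(text) and text[pos] == '(':
            pos += 1  # skip '('
            children.append(parse_node())
            while text[pos] == '|':
                pos += 1  # skip '|'
                children.append(parse_node())
            pos += 1  # skip ')'
        return {'label': label, 'children': children}

    return parse_node()


parsed = parse_text('S(Head:Nab:中文字|particle:Td:耶)')
child_labels = [child['label'] for child in parsed['children']]
```

Here parsed['label'] is 'S' and child_labels holds the two leaf labels in their original order.
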
List format

Not implemented.

Dict format

Used for from_dict() and to_dict(). A dictionary such as { 'id': 0, 'data': { ... }, 'children': [ ... ] }, where 'data' is a dictionary with the same format as ParseNodeData.to_dict(), and 'children' is a list of dictionaries of subtrees with the same format as this tree.

{
    'id': 0,
    'data': {
        'role': None,
        'pos': 'S',
        'word': None,
    },
    'children': [
        {
            'id': 1,
            'data': {
                'role': 'Head',
                'pos': 'Nab',
                'word': '中文字',
            },
            'children': [],
        },
        {
            'id': 2,
            'data': {
                'role': 'particle',
                'pos': 'Td',
                'word': '耶',
            },
            'children': [],
        },
    ],
}
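
Because every subtree has the same shape as the whole tree, the dictionary format lends itself to recursion. The sketch below walks a plain dictionary of this shape and collects the words in order. iter_words is a hypothetical helper written for illustration and is not part of the library.

```python
def iter_words(tree):
    """Yield every non-empty word in depth-first, left-to-right order."""
    word = tree['data']['word']
    if word is not None:
        yield word
    for child in tree['children']:
        yield from iter_words(child)


tree = {
    'id': 0,
    'data': {'role': None, 'pos': 'S', 'word': None},
    'children': [
        {'id': 1,
         'data': {'role': 'Head', 'pos': 'Nab', 'word': '中文字'},
         'children': []},
        {'id': 2,
         'data': {'role': 'particle', 'pos': 'Td', 'word': '耶'},
         'children': []},
    ],
}

words = list(iter_words(tree))
```
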
Penn Treebank format

Used for from_penn() and to_penn().

[
    'S',
    [ 'Head:Nab', '中文字', ],
    [ 'particle:Td', '耶', ],
]

Note

One may use to_penn() together with SvgLing to generate SVG tree graphs.
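
The relationship between the dictionary format and the Penn Treebank format can be sketched as a small recursive conversion. dict_to_penn is a hypothetical helper. The labelling rule (role and POS-tag joined by sep, role omitted when it is None) is an assumption inferred from the example above, not a description of how to_penn() is implemented.

```python
def dict_to_penn(tree, sep=':'):
    """Convert a dictionary-format tree into a nested Penn-style list."""
    data = tree['data']
    # Assumption: a node without a role is labelled by its POS-tag alone.
    if data['role'] is None:
        label = data['pos']
    else:
        label = data['role'] + sep + data['pos']
    if data['word'] is not None:
        return [label, data['word']]
    return [label] + [dict_to_penn(child, sep) for child in tree['children']]


tree = {
    'id': 0,
    'data': {'role': None, 'pos': 'S', 'word': None},
    'children': [
        {'id': 1,
         'data': {'role': 'Head', 'pos': 'Nab', 'word': '中文字'},
         'children': []},
        {'id': 2,
         'data': {'role': 'particle', 'pos': 'Td', 'word': '耶'},
         'children': []},
    ],
}

penn = dict_to_penn(tree)
```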

node_class

alias of ParseNode

classmethod from_text(data)[source]

Construct an instance from text format.

Parameters

data (str) – A parse tree in text format (ParseClause.clause).

to_text(node_id=None)[source]

Transform to plain text.

Parameters

node_id (int) – Output the plain text format for the subtree under node_id.

Returns

str

classmethod from_dict(data)[source]

Construct an instance from python built-in containers.

Parameters

data (dict) – A parse tree in dictionary format.

to_dict(node_id=None)[source]

Transform to python built-in containers.

Parameters

node_id (int) – Output the dictionary format for the subtree under node_id.

Returns

dict

classmethod from_penn(data)[source]

Construct an instance from Penn Treebank format.

to_penn(node_id=None, *, with_role=True, with_word=True, sep=':')[source]

Transform to Penn Treebank format.

Parameters
  • node_id (int) – Output the Penn Treebank format for the subtree under node_id.

  • with_role (bool) – Whether to include the role tag.

  • with_word (bool) – Whether to include the word.

  • sep (str) – The separator between role and POS-tag.

Returns

list

show(*, key=<function ParseTree.<lambda>>, idhidden=False, **kwargs)[source]

Pretty-print the tree.

get_children(node_id, *, role)[source]

Get children of a node with given role.

Parameters
  • node_id (int) – ID of target node.

  • role (str) – the target role.

Yields

ParseNode – the children nodes with given role.
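
The lookup described above can be mimicked on the dictionary format. This is a stand-alone sketch: find_node and get_children_by_role are hypothetical helpers that operate on plain dictionaries, whereas the real method works on the tree's own nodes.

```python
def find_node(tree, node_id):
    """Return the subtree whose 'id' equals node_id, or None if absent."""
    if tree['id'] == node_id:
        return tree
    for child in tree['children']:
        found = find_node(child, node_id)
        if found is not None:
            return found
    return None


def get_children_by_role(tree, node_id, *, role):
    """Yield the direct children of node_id whose role matches."""
    node = find_node(tree, node_id)
    if node is None:
        return
    for child in node['children']:
        if child['data']['role'] == role:
            yield child


tree = {
    'id': 0,
    'data': {'role': None, 'pos': 'S', 'word': None},
    'children': [
        {'id': 1,
         'data': {'role': 'Head', 'pos': 'Nab', 'word': '中文字'},
         'children': []},
        {'id': 2,
         'data': {'role': 'particle', 'pos': 'Td', 'word': '耶'},
         'children': []},
    ],
}

head_ids = [child['id'] for child in get_children_by_role(tree, 0, role='Head')]
```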

get_heads(root_id=None, *, semantic=True, deep=True)[source]

Get all head nodes of a subtree.

Parameters
  • root_id (int) – ID of the root node of target subtree.

  • semantic (bool) – whether to use the semantic policy instead of the syntactic one. In semantic mode, return DUMMY or head instead of the syntactic Head.

  • deep (bool) – find heads recursively.

Yields

ParseNode – the head nodes.

get_relations(root_id=None, *, semantic=True)[source]

Get all relations of a subtree.

Parameters
  • root_id (int) – ID of the subtree root node.

  • semantic (bool) – please refer to get_heads() for policy details.

Yields

ParseRelation – the relations.

get_subjects(root_id=None, *, semantic=True, deep=True)[source]

Get the subject node of a subtree.

Parameters
  • root_id (int) – ID of the root node of target subtree.

  • semantic (bool) – please refer to get_heads() for policy details.

  • deep (bool) – please refer to get_heads() for policy details.

Yields

ParseNode – the subject node.

Notes

A node can be a subject if it is any of the following:

  1. a head of an NP

  2. a head of a subnode (N) of S with the subject role

  3. a head of a subnode (N) of S with a neutral role that comes before the head (V) of S