CKIP CoreNLP

Introduction

CKIP CoreNLP Toolkit

Features

  • Sentence Segmentation

  • Word Segmentation

  • Part-of-Speech Tagging

  • Named-Entity Recognition

  • Constituency Parsing

  • Coreference Resolution

Contributors

Installation

Requirements

Driver Requirements

Driver                   | Built-in | CkipTagger | CkipClassic
-------------------------|----------|------------|------------
Sentence Segmentation    | ✔        |            |
Word Segmentation†       |          | ✔          | ✔
Part-of-Speech Tagging†  |          | ✔          | ✔
Constituency Parsing     |          |            | ✔
Named-Entity Recognition |          | ✔          |
Coreference Resolution‡  | ✔        |            |

  • † These drivers require only one of the two backends.

  • ‡ The coreference implementation does not require any backend, but it needs the results of word segmentation, part-of-speech tagging, constituency parsing, and named-entity recognition.

Installation via Pip

Usage

See https://ckipnlp.readthedocs.io/ for API details.

License

CC BY-NC-SA 4.0

Copyright (c) 2018-2020 CKIP Lab under the CC BY-NC-SA 4.0 License.

Usage

CkipNLP provides a set of human language technology tools, including

  • Sentence Segmentation

  • Word Segmentation

  • Part-of-Speech Tagging

  • Named-Entity Recognition

  • Constituency Parsing

  • Coreference Resolution

The library is built around three types of classes:

Containers

Container Prototypes

All the container objects can be converted from/to other formats:

  • from_text(), to_text() for plain-text conversions;

  • from_list(), to_list() for list-like python object conversions;

  • from_dict(), to_dict() for dictionary-like python object (key-value mappings) conversions;

  • from_json(), to_json() for JSON format conversions (based on the dictionary-like format conversions).

Here are the interfaces, where CONTAINER_CLASS refers to the container class.

obj = CONTAINER_CLASS.from_text(plain_text)
plain_text = obj.to_text()

obj = CONTAINER_CLASS.from_list([ value1, value2 ])
list_obj = obj.to_list()

obj = CONTAINER_CLASS.from_dict({ key: value })
dict_obj = obj.to_dict()

obj = CONTAINER_CLASS.from_json(json_str)
json_str = obj.to_json()

Note that not all containers provide all of the above conversions. The table below lists the implemented methods; please refer to the documentation of each container for format details.

Container      | Item          | from/to text | from/to list, dict, json
---------------|---------------|--------------|-------------------------
TextParagraph  | str           | ✔            | ✔
SegSentence    | str           | ✔            | ✔
SegParagraph   | SegSentence   | ✔            | ✔
NerToken       |               |              | ✔
NerSentence    | NerToken      |              | ✔
NerParagraph   | NerSentence   |              | ✔
ParseClause    |               | only to      | ✔
ParseSentence  | ParseClause   | only to      | ✔
ParseParagraph | ParseSentence | only to      | ✔
CorefToken     |               | only to      | ✔
CorefSentence  | CorefToken    | only to      | ✔
CorefParagraph | CorefSentence | only to      | ✔

There are also joint conversion routines for word-segmentation and part-of-speech containers. For example, WsPosToken provides routines for a word (str) with a POS-tag (str):

ws_obj, pos_obj = WsPosToken.from_text('中文字(Na)')
plain_text = WsPosToken.to_text(ws_obj, pos_obj)

ws_obj, pos_obj = WsPosToken.from_list([ '中文字', 'Na' ])
list_obj = WsPosToken.to_list(ws_obj, pos_obj)

ws_obj, pos_obj = WsPosToken.from_dict({ 'word': '中文字', 'pos': 'Na', })
dict_obj = WsPosToken.to_dict(ws_obj, pos_obj)

ws_obj, pos_obj = WsPosToken.from_json(json_str)
json_str = WsPosToken.to_json(ws_obj, pos_obj)

Similarly, WsPosSentence/WsPosParagraph provide routines for word-segmented and POS-tagged sentences/paragraphs (SegSentence/SegParagraph) respectively.
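The joint text format writes each word with its POS-tag in parentheses and separates tokens with U+3000 (full-width space). A minimal sketch of this round trip in plain Python (an illustration of the format only, not the actual WsPosSentence implementation):

```python
import re

def wspos_from_text(data):
    """Split '中文字(Na)\u3000耶(T)' into a word list and a POS-tag list."""
    words, tags = [], []
    for token in data.split('\u3000'):
        match = re.fullmatch(r'(.*)\((.*?)\)', token)
        word, pos = match.groups() if match else (token, None)  # no tag -> pos is None
        words.append(word)
        tags.append(pos)
    return words, tags

def wspos_to_text(words, tags):
    """Inverse conversion: join word/POS pairs back into the text format."""
    return '\u3000'.join(f'{w}({p})' for w, p in zip(words, tags))
```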

Parse Tree

In addition to ParseClause, there are also tree utilities based on TreeLib.

ParseTree is the tree structure of a parse clause. One may use from_text() and to_text() for plain-text conversion; from_dict(), to_dict() for dictionary-like object conversion; and also from_json(), to_json() for JSON string conversion.

ParseTree also provides from_penn() and to_penn() methods for Penn Treebank conversion. One may use to_penn() together with SvgLing to generate SVG tree graphs.

ParseTree is a TreeLib tree with ParseNode as its nodes. The data of these nodes is stored in a ParseNodeData (accessed via node.data), which is a tuple of role (the semantic role), pos (the POS-tag), and word (the text term).

ParseTree provides useful methods: get_heads() finds the head words of the clause; get_relations() extracts all relations in the clause; get_subjects() returns the subjects of the clause.

from ckipnlp.container import ParseClause, ParseTree

# 我的早餐、午餐和晚餐都在那場比賽中被吃掉了
clause = ParseClause('S(goal:NP(possessor:N‧的(head:Nhaa:我|Head:DE:的)|Head:Nab(DUMMY1:Nab(DUMMY1:Nab:早餐|Head:Caa:、|DUMMY2:Naa:午餐)|Head:Caa:和|DUMMY2:Nab:晚餐))|quantity:Dab:都|condition:PP(Head:P21:在|DUMMY:GP(DUMMY:NP(Head:Nac:比賽)|Head:Ng:中))|agent:PP(Head:P02:被)|Head:VC31:吃掉|aspect:Di:了)')

tree = clause.to_tree()

print('Show Tree')
tree.show()

print('Get Heads of {}'.format(tree[5]))
print('-- Semantic --')
for head in tree.get_heads(5, semantic=True): print(repr(head))
print('-- Syntactic --')
for head in tree.get_heads(5, semantic=False): print(repr(head))
print()

print('Get Relations of {}'.format(tree[0]))
print('-- Semantic --')
for rel in tree.get_relations(0, semantic=True): print(repr(rel))
print('-- Syntactic --')
for rel in tree.get_relations(0, semantic=False): print(repr(rel))
print()

# 我和食物真的都很不開心
tree_text = 'S(theme:NP(DUMMY1:NP(Head:Nhaa:我)|Head:Caa:和|DUMMY2:NP(Head:Naa:食物))|evaluation:Dbb:真的|quantity:Dab:都|degree:Dfa:很|negation:Dc:不|Head:VH21:開心)'

tree = ParseTree.from_text(tree_text)

print('Show Tree')
tree.show()

print('Get Subjects of {}'.format(tree[0]))
print('-- Semantic --')
for subject in tree.get_subjects(0, semantic=True): print(repr(subject))
print('-- Syntactic --')
for subject in tree.get_subjects(0, semantic=False): print(repr(subject))
print()

Drivers

class Driver(*, lazy=False, ...)

The prototype of CkipNLP Drivers.

Parameters

lazy (bool) – Lazily initialize the driver (call init() at the first invocation of __call__() instead of at construction).
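The lazy flag defers the expensive initialization (e.g. model loading) until the driver is first called. A minimal sketch of this pattern (a hypothetical class, not the actual Driver internals):

```python
class LazyDriverSketch:
    """Defer initialization until the first __call__ when lazy=True."""

    def __init__(self, *, lazy=False):
        self._initialized = False
        if not lazy:
            self.init()  # eager mode initializes immediately

    def init(self):
        # A real driver would load its models here (the expensive part).
        self._initialized = True

    def __call__(self, data):
        if not self._initialized:
            self.init()  # the first call triggers the deferred initialization
        return data      # a real driver would run its _call() on the inputs
```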

driver_type: str

The type of this driver.

driver_family: str

The family of this driver.

driver_inputs: Tuple[str, ...]

The inputs of this driver.

init()

Initialize the driver (by calling the _init() function).

__call__(*, ...)

Call the driver (by calling the _call() function).

Here is the list of available drivers:

Driver Type \ Family | 'default'             | 'tagger'                | 'classic'
---------------------|-----------------------|-------------------------|---------------------------
Sentence Segmenter   | CkipSentenceSegmenter |                         |
Word Segmenter       |                       | CkipTaggerWordSegmenter | CkipClassicWordSegmenter
Pos Tagger           |                       | CkipTaggerPosTagger     | CkipClassicWordSegmenter†
Ner Chunker          |                       | CkipTaggerNerChunker    |
Constituency Parser  |                       |                         | CkipClassicConParser
Coref Chunker        | CkipCorefChunker      |                         |

† Not compatible with CkipCorefPipeline.

Pipelines

Kernel Pipeline

The CkipPipeline connects the drivers for sentence segmentation, word segmentation, part-of-speech tagging, named-entity recognition, and constituency parsing.

The CkipDocument is the workspace of CkipPipeline, holding its input/output data. Note that CkipPipeline stores its results into the CkipDocument in-place.

The CkipPipeline will compute all necessary dependencies. For example, if one calls get_ner() with only raw-text input, the pipeline will automatically call get_text(), get_ws(), and get_pos() first.
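This dependency behavior can be sketched as memoized getters, where each getter first ensures its prerequisites are computed and caches its result on the document (a simplified illustration with stand-in computations, not the actual CkipPipeline code):

```python
class PipelineSketch:
    """Each getter computes its prerequisites first and caches results in doc."""

    def get_text(self, doc):
        if doc.get('text') is None:
            doc['text'] = doc['raw'].split(',')        # stand-in for sentence segmentation
        return doc['text']

    def get_ws(self, doc):
        if doc.get('ws') is None:
            self.get_text(doc)                          # dependency: sentence segmentation
            doc['ws'] = [list(s) for s in doc['text']]  # stand-in for word segmentation
        return doc['ws']

    def get_ner(self, doc):
        if doc.get('ner') is None:
            self.get_ws(doc)                            # dependency: word segmentation
            doc['ner'] = []                             # stand-in for NER
        return doc['ner']
```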

from ckipnlp.pipeline import CkipPipeline, CkipDocument

pipeline = CkipPipeline()
doc = CkipDocument(raw='中文字耶,啊哈哈哈')

# Word Segmentation
pipeline.get_ws(doc)
print(doc.ws)
for line in doc.ws:
    print(line.to_text())

# Part-of-Speech Tagging
pipeline.get_pos(doc)
print(doc.pos)
for line in doc.pos:
    print(line.to_text())

# Named-Entity Recognition
pipeline.get_ner(doc)
print(doc.ner)

# Constituency Parsing
pipeline.get_conparse(doc)
print(doc.conparse)

################################################################

from ckipnlp.container.util.wspos import WsPosParagraph

# Word Segmentation & Part-of-Speech Tagging
for line in WsPosParagraph.to_text(doc.ws, doc.pos):
    print(line)

Co-Reference Pipeline

The CkipCorefPipeline is an extension of CkipPipeline that adds coreference resolution. The pipeline first performs named-entity recognition as CkipPipeline does, then applies alignment algorithms to fix the word-segmentation and part-of-speech tagging outputs, and finally performs coreference resolution based on the constituency parsing results.

The CkipCorefDocument is the workspace of CkipCorefPipeline, holding its input/output data. Note that CkipCorefPipeline stores its results into the CkipCorefDocument.

from ckipnlp.pipeline import CkipCorefPipeline, CkipDocument

pipeline = CkipCorefPipeline()
doc = CkipDocument(raw='畢卡索他想,完蛋了')

# Co-Reference
corefdoc = pipeline(doc)
print(corefdoc.coref)
for line in corefdoc.coref:
    print(line.to_text())

Tables of Tags

Part-of-Speech Tags

Tag                  | Description
---------------------|----------------------------------------
A                    | non-predicative adjective
Caa                  | coordinating conjunction
Cab                  | conjunction, e.g. 等等 ("and so on")
Cba                  | conjunction, e.g. 的話 ("if")
Cbb                  | correlative conjunction
D                    | adverb
Da                   | quantitative adverb
Dfa                  | pre-verbal adverb of degree
Dfb                  | post-verbal adverb of degree
Di                   | aspectual marker
Dk                   | sentential adverb
DM                   | determiner-measure compound
I                    | interjection
Na                   | common noun
Nb                   | proper noun
Nc                   | place noun
Ncd                  | localizer
Nd                   | time noun
Nep                  | anaphoric determiner
Neqa                 | classifying determiner
Neqb                 | postposed classifying determiner
Nes                  | specific determiner
Neu                  | numeral determiner
Nf                   | measure word (classifier)
Ng                   | postposition
Nh                   | pronoun
Nv                   | nominalized verb
P                    | preposition
T                    | particle
VA                   | active intransitive verb
VAC                  | active causative verb
VB                   | active pseudo-transitive verb
VC                   | active transitive verb
VCL                  | active verb taking a locative object
VD                   | ditransitive verb
VF                   | active verb taking a verbal object
VE                   | active verb taking a sentential object
VG                   | classificatory verb
VH                   | stative intransitive verb
VHC                  | stative causative verb
VI                   | stative pseudo-transitive verb
VJ                   | stative transitive verb
VK                   | stative verb taking a sentential object
VL                   | stative verb taking a verbal object
V_2                  | 有 ("to have")
DE                   | the structural particles 的/之/得/地
SHI                  | 是 ("to be")
FW                   | foreign word
COLONCATEGORY        | colon
COMMACATEGORY        | comma
DASHCATEGORY         | dash
DOTCATEGORY          | dot
ETCCATEGORY          | ellipsis
EXCLAMATIONCATEGORY  | exclamation mark
PARENTHESISCATEGORY  | parentheses
PAUSECATEGORY        | enumeration comma (、)
PERIODCATEGORY       | period
QUESTIONCATEGORY     | question mark
SEMICOLONCATEGORY    | semicolon
SPCHANGECATEGORY     | double vertical lines (speaker change)
WHITESPACE           | whitespace

Constituency Parsing Tags

Tag | Description
----|------------------------------------------------------------
S   | Sentence; headed by a predicate. When the subject, or the object or complement of a predicate, is itself a sentence or clause, the phrase is labeled S rather than NP.
VP  | Predicate phrase, headed by a predicate (V).
NP  | Noun phrase, headed by a noun (N).
GP  | Locational phrase, headed by a localizer (Ng); its argument takes the role DUMMY1.
PP  | Prepositional phrase, headed by a preposition (P); its argument likewise takes a DUMMY role.
XP  | Conjunctive phrase, headed by a conjunction (C). X is a variable: the actual category of an XP is determined by its conjuncts, e.g. if the conjuncts are predicate phrases (VP) it is a VP; if they are noun phrases, it is an NP.
DM  | Determiner-measure phrase.

Constituency Parsing Roles

Role         | Description
-------------|------------------------------------------------------------

# Modifiers of object nouns

apposition   | An appositive of the object, referring to the same entity.
possessor    | The possessor of the object; members, creators, owners, wholes, etc. all count as possessors.
predication  | An event modifying the object: a relative clause of the noun, bearing an argument relation with the event head.
property     | The features and attributes of the object, including related spatio-temporal information; a coarse, superordinate semantic role.
quantifier   | A quantity modifier of the noun, such as classifying determiners and determiner-measure compounds.

# Modifiers of event verbs – participant roles

agent        | The initiator of the event; the actor of an action verb.
benefactor   | The beneficiary, but not the main object.
causer       | The initiator of the event, who does not actively bring the event about.
companion    | The party accompanying the subject.
comparison   | The object of comparison, mostly appearing in comparative sentences.
experiencer  | The one experiencing the described emotional or perceptual state; the subject of a psychological predicate.
goal         | The entity affected by the action, or the patient of a mental action; in object-transfer events, the recipient or endpoint.
range        | The category of a classification or the extent of a result; the main semantic role of classificatory verbs and comparative sentences.
source       | The starting point of an object transfer.
target       | The addressee of what the predicate expresses, or the direction of a transfer.
theme        | The entity described by stative and classificatory predicates, the entity whose existence or movement a dynamic event describes, or the patient whose state the event brings into being.
topic        | The topic the event is about.

# Modifiers of event verbs – adjunct roles

aspect       | The aspect of the action.
degree       | The degree of the state.
deixis       | A deictic element attached to the action.
deontics     | The speaker's attitude toward whether the event should come true; marked on modal adverbs of this type.
duration     | How long the event lasts.
evaluation   | An evaluative mood element.
epistemics   | The speaker's conjecture about whether the event is true; marked on modal adverbs of this type.
frequency    | The frequency of the event.
instrument   | The instrument used in the action.
interjection | An interjection within the sentence.
location     | Where the event takes place.
manner       | The manner of the subject's action.
negation     | Negation.
particle     | The speaker's sentence-final mood.
quantity     | The quantity of things.
standard     | The basis or criterion.
time         | When the event takes place.

# Modifiers of event verbs – subordinate roles

addition     | Addition.
alternative  | The mood of choice in a coordinate compound sentence.
avoidance    | A situation to be avoided.
complement   | A supplementary explanation elaborating on the preceding event.
conclusion   | An introduced conclusion.
condition    | A conditional sentence or event situation.
concession   | A concessive connection.
contrast     | An adversative mood.
conversion   | Introduces the result under changed conditions.
exclusion    | The excluded entity.
hypothesis   | A hypothetical mood.
listing      | Enumerated items.
purpose      | A purpose.
reason       | The reason for the event.
rejection    | The rejected part of a selection relation.
result       | The result of the event.
restriction  | The first half of a progressive construction.
selection    | The chosen part of a selection relation.
uncondition  | A hypothesis contrary to the current situation.
whatever     | Regardless of the condition.

# Grammatical function markers

DUMMY        | An undetermined role, resolved by the head of the parent phrase.
DUMMY1       | An undetermined role, resolved by the head of the parent phrase.
DUMMY2       | An undetermined role, resolved by the head of the parent phrase.
Head         | The syntactic head, usually also the semantic core; both sentences and phrases carry a Head role.
head         | In 的-constructions where the semantic and syntactic heads differ, marks the semantic head, as distinct from the syntactic head.
nominal      | Marks a nominalized construction: labels the 的 in a noun phrase headed by a nominalized verb.

ckipnlp package

The Official CKIP CoreNLP Toolkit.

Subpackages

ckipnlp.container package

This module implements specialized container datatypes for CKIPNLP.

Subpackages

ckipnlp.container.util package

This module implements specialized utilities for CKIPNLP containers.

Submodules

ckipnlp.container.util.parse_tree module

This module provides tree containers for parsed sentences.

class ckipnlp.container.util.parse_tree.ParseNodeData(role: str = None, pos: str = None, word: str = None)[source]

Bases: ckipnlp.container.base.BaseTuple, ckipnlp.container.util.parse_tree._ParseNodeData

A parse node.

Variables
  • role (str) – the semantic role.

  • pos (str) – the POS-tag.

  • word (str) – the text term.

Note

This class is a subclass of tuple. To change an attribute, please create a new instance instead.

Data Structure Examples

Text format

Used for from_text() and to_text().

'Head:Na:中文字'  # role / POS-tag / text-term
List format

Not implemented.

Dict format

Used for from_dict() and to_dict().

{
    'role': 'Head',   # role
    'pos': 'Na',      # POS-tag
    'word': '中文字',  # text term
}
classmethod from_text(data)[source]

Construct an instance from text format.

Parameters

data (str) – text such as 'Head:Na:中文字'.

Note

  • 'Head:Na:中文字' -> role = 'Head', pos = 'Na', word = '中文字'

  • 'Head:Na' -> role = 'Head', pos = 'Na', word = None

  • 'Na' -> role = None, pos = 'Na', word = None
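The splitting rule above can be sketched in plain Python (an illustration of the format, not the library implementation):

```python
def parse_node_text(data):
    """Split 'role:pos:word' text into a (role, pos, word) triple per the rules above."""
    fields = data.split(':', maxsplit=2)
    if len(fields) == 3:
        return tuple(fields)                  # role, pos, and word all present
    if len(fields) == 2:
        return (fields[0], fields[1], None)   # role and pos only
    return (None, fields[0], None)            # a bare POS-tag
```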

class ckipnlp.container.util.parse_tree.ParseNode(tag=None, identifier=None, expanded=True, data=None)[source]

Bases: ckipnlp.container.base.Base, treelib.node.Node

A parse node for tree.

Variables

data (ParseNodeData) –

See also

treelib.node.Node

Please refer to https://treelib.readthedocs.io/ for built-in usage.

Data Structure Examples

Text format

Not implemented.

List format

Not implemented.

Dict format

Used for to_dict().

{
    'role': 'Head',   # role
    'pos': 'Na',      # POS-tag
    'word': '中文字',  # text term
}
data_class

alias of ParseNodeData

class ckipnlp.container.util.parse_tree.ParseRelation(head: ckipnlp.container.util.parse_tree.ParseNode, tail: ckipnlp.container.util.parse_tree.ParseNode, relation: ckipnlp.container.util.parse_tree.ParseNode)[source]

Bases: ckipnlp.container.base.Base, ckipnlp.container.util.parse_tree._ParseRelation

A parse relation.

Variables
  • head (ParseNode) – the head node.

  • tail (ParseNode) – the tail node.

  • relation (ParseNode) – the relation node. (the semantic role of this node is the relation.)

Notes

The parent of the relation node is always the common ancestor of the head node and tail node.
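The common-ancestor placement can be illustrated with a plain lowest-common-ancestor lookup over parent pointers (the node names here are made up for the example; the real ParseTree relies on treelib's tree structure):

```python
def common_ancestor(parent, a, b):
    """Return the nearest node that is an ancestor of both a and b."""
    ancestors = set()
    while a is not None:          # collect a's chain of ancestors up to the root
        ancestors.add(a)
        a = parent.get(a)
    while b not in ancestors:     # walk up from b until the two chains meet
        b = parent[b]
    return b

# parent map for a toy tree S(NP(Head1, particle), VP(Head2))
parent = {'NP': 'S', 'VP': 'S', 'Head1': 'NP', 'particle': 'NP', 'Head2': 'VP'}
```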

Data Structure Examples

Text format

Not implemented.

List format

Not implemented.

Dict format

Used for to_dict().

{
    'head': { 'role': 'Head', 'pos': 'Nab', 'word': '中文字' }, # head node
    'tail': { 'role': 'particle', 'pos': 'Td', 'word': '耶' }, # tail node
    'relation': 'particle',  # relation
}
class ckipnlp.container.util.parse_tree.ParseTree(tree=None, deep=False, node_class=None, identifier=None)[source]

Bases: ckipnlp.container.base.Base, treelib.tree.Tree

A parse tree.

See also

treelib.tree.Tree

Please refer to https://treelib.readthedocs.io/ for built-in usage.

Data Structure Examples

Text format

Used for from_text() and to_text().

'S(Head:Nab:中文字|particle:Td:耶)'
List format

Not implemented.

Dict format

Used for from_dict() and to_dict(). A dictionary such as { 'id': 0, 'data': { ... }, 'children': [ ... ] }, where 'data' is a dictionary with the same format as ParseNodeData.to_dict(), and 'children' is a list of dictionaries of subtrees with the same format as this tree.

{
    'id': 0,
    'data': {
        'role': None,
        'pos': 'S',
        'word': None,
    },
    'children': [
        {
            'id': 1,
            'data': {
                'role': 'Head',
                'pos': 'Nab',
                'word': '中文字',
            },
            'children': [],
        },
        {
            'id': 2,
            'data': {
                'role': 'particle',
                'pos': 'Td',
                'word': '耶',
            },
            'children': [],
        },
    ],
}
Penn Treebank format

Used for from_penn() and to_penn().

[
    'S',
    [ 'Head:Nab', '中文字', ],
    [ 'particle:Td', '耶', ],
]

Note

One may use to_penn() together with SvgLing to generate SVG tree graphs.
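Since the dict and Penn Treebank formats above carry the same information, the dict-to-Penn direction can be sketched as a small recursive conversion (an illustration mirroring to_penn()'s default output, not the library code itself):

```python
def dict_to_penn(node, sep=':'):
    """Convert the dictionary tree format to the nested-list Penn Treebank format."""
    data = node['data']
    # Join role and POS-tag with the separator, skipping missing parts.
    tag = sep.join(part for part in (data['role'], data['pos']) if part)
    children = [dict_to_penn(child, sep) for child in node['children']]
    if data['word'] is not None:
        return [tag, data['word']]   # leaf: [role:pos, word]
    return [tag, *children]          # internal node: [tag, child, ...]
```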

node_class

alias of ParseNode

classmethod from_text(data)[source]

Construct an instance from text format.

Parameters

data (str) – A parse tree in text format (ParseClause.clause).

to_text(node_id=None)[source]

Transform to plain text.

Parameters

node_id (int) – Output the plain text format for the subtree under node_id.

Returns

str

classmethod from_dict(data)[source]

Construct an instance from python built-in containers.

Parameters

data (dict) – A parse tree in dictionary format.

to_dict(node_id=None)[source]

Transform to python built-in containers.

Parameters

node_id (int) – Output the dictionary format for the subtree under node_id.

Returns

dict

classmethod from_penn(data)[source]

Construct an instance from Penn Treebank format.

to_penn(node_id=None, *, with_role=True, with_word=True, sep=':')[source]

Transform to Penn Treebank format.

Parameters
  • node_id (int) – Output the Penn Treebank format for the subtree under node_id.

  • with_role (bool) – Contains role-tag or not.

  • with_word (bool) – Contains word or not.

  • sep (str) – The separator between role and POS-tag.

Returns

list

show(*, key=<function ParseTree.<lambda>>, idhidden=False, **kwargs)[source]

Show pretty tree.

get_children(node_id, *, role)[source]

Get children of a node with given role.

Parameters
  • node_id (int) – ID of target node.

  • role (str) – the target role.

Yields

ParseNode – the children nodes with given role.

get_heads(root_id=None, *, semantic=True, deep=True)[source]

Get all head nodes of a subtree.

Parameters
  • root_id (int) – ID of the root node of target subtree.

  • semantic (bool) – use semantic/syntactic policy. For semantic mode, return DUMMY or head instead of syntactic Head.

  • deep (bool) – find heads recursively.

Yields

ParseNode – the head nodes.

get_relations(root_id=None, *, semantic=True)[source]

Get all relations of a subtree.

Parameters
  • root_id (int) – ID of the subtree root node.

  • semantic (bool) – please refer to get_heads() for policy details.

Yields

ParseRelation – the relations.

get_subjects(root_id=None, *, semantic=True, deep=True)[source]

Get the subject node of a subtree.

Parameters
  • root_id (int) – ID of the root node of target subtree.

  • semantic (bool) – please refer to get_heads() for policy details.

  • deep (bool) – please refer to get_heads() for policy details.

Yields

ParseNode – the subject node.

Notes

A node is considered a subject if it:

  1. is the head of an NP; or

  2. is the head of a subnode (N) of an S with a subject role; or

  3. is the head of a subnode (N) of an S with a neutral role, appearing before the head (V) of the S.

ckipnlp.container.util.wspos module

This module provides containers for word-segmented sentences with part-of-speech-tags.

class ckipnlp.container.util.wspos.WsPosToken(word: str = None, pos: str = None)[source]

Bases: ckipnlp.container.base.BaseTuple, ckipnlp.container.util.wspos._WsPosToken

A word with POS-tag.

Variables
  • word (str) – the word.

  • pos (str) – the POS-tag.

Note

This class is a subclass of tuple. To change an attribute, please create a new instance instead.

Data Structure Examples

Text format

Used for from_text() and to_text().

'中文字(Na)'  # word / POS-tag
List format

Used for from_list() and to_list().

[
    '中文字', # word
    'Na',    # POS-tag
]
Dict format

Used for from_dict() and to_dict().

{
    'word': '中文字', # word
    'pos': 'Na',     # POS-tag
}
classmethod from_text(data)[source]

Construct an instance from text format.

Parameters

data (str) – text such as '中文字(Na)'.

Note

  • '中文字(Na)' -> word = '中文字', pos = 'Na'

  • '中文字' -> word = '中文字', pos = None

class ckipnlp.container.util.wspos.WsPosSentence[source]

Bases: object

A helper class for data conversion of word-segmented and part-of-speech sentences.

classmethod from_text(data)[source]

Convert text format to word-segmented and part-of-speech sentences.

Parameters

data (str) – text such as '中文字(Na)\u3000耶(T)'.

Returns

static to_text(word, pos)[source]

Convert word-segmented and part-of-speech sentences to text format.

Parameters
Returns

str – text such as '中文字(Na)\u3000耶(T)'.

class ckipnlp.container.util.wspos.WsPosParagraph[source]

Bases: object

A helper class for data conversion of word-segmented and part-of-speech sentence lists.

classmethod from_text(data)[source]

Convert text format to word-segmented and part-of-speech sentence lists.

Parameters

data (Sequence[str]) – list of sentences such as '中文字(Na)\u3000耶(T)'.

Returns

static to_text(word, pos)[source]

Convert word-segmented and part-of-speech sentence lists to text format.

Parameters
Returns

List[str] – list of sentences such as '中文字(Na)\u3000耶(T)'.

Submodules

ckipnlp.container.base module

This module provides base containers.

class ckipnlp.container.base.Base[source]

Bases: object

The base CKIPNLP container.

abstract classmethod from_text(data)[source]

Construct an instance from text format.

Parameters

data (str) –

abstract to_text()[source]

Transform to plain text.

Returns

str

abstract classmethod from_list(data)[source]

Construct an instance from python built-in containers.

abstract to_list()[source]

Transform to python built-in containers.

abstract classmethod from_dict(data)[source]

Construct an instance from python built-in containers.

abstract to_dict()[source]

Transform to python built-in containers.

classmethod from_json(data, **kwargs)[source]

Construct an instance from JSON format.

Parameters

data (str) – please refer to from_dict() for format details.

to_json(**kwargs)[source]

Transform to JSON format.

Returns

str

class ckipnlp.container.base.BaseTuple[source]

Bases: ckipnlp.container.base.Base

The base CKIPNLP tuple.

classmethod from_list(data)[source]

Construct an instance from python built-in containers.

Parameters

data (list) –

to_list()[source]

Transform to python built-in containers.

Returns

list

classmethod from_dict(data)[source]

Construct an instance from python built-in containers.

Parameters

data (dict) –

to_dict()[source]

Transform to python built-in containers.

Returns

dict

class ckipnlp.container.base.BaseList(initlist=None)[source]

Bases: ckipnlp.container.base._BaseList, ckipnlp.container.base._InterfaceItem

The base CKIPNLP list.

item_class = Not Implemented

Must be a CKIPNLP container class.

class ckipnlp.container.base.BaseList0(initlist=None)[source]

Bases: ckipnlp.container.base._BaseList, ckipnlp.container.base._InterfaceBuiltInItem

The base CKIPNLP list with built-in item class.

item_class = Not Implemented

Must be a built-in type.

class ckipnlp.container.base.BaseSentence(initlist=None)[source]

Bases: ckipnlp.container.base._BaseSentence, ckipnlp.container.base._InterfaceItem

The base CKIPNLP sentence.

item_class = Not Implemented

Must be a CKIPNLP container class.

class ckipnlp.container.base.BaseSentence0(initlist=None)[source]

Bases: ckipnlp.container.base._BaseSentence, ckipnlp.container.base._InterfaceBuiltInItem

The base CKIPNLP sentence with built-in item class.

item_class = Not Implemented

Must be a built-in type.

ckipnlp.container.coref module

This module provides containers for coreference sentences.

class ckipnlp.container.coref.CorefToken(word, coref, idx, **kwargs)[source]

Bases: ckipnlp.container.base.BaseTuple, ckipnlp.container.coref._CorefToken

A coreference token.

Variables
  • word (str) – the token word.

  • coref (Tuple[int, str]) –

    the coreference ID and type. None if not a coreference source or target.

    • type:
      • 'source': coreference source.

      • 'target': coreference target.

      • 'zero': null element coreference target.

  • idx (Tuple[int, int]) – the node indexes (clause index, token index) in the parse tree. idx[1] = None if this node is a null element or punctuation.

Note

This class is a subclass of tuple. To change an attribute, please create a new instance instead.

Data Structure Examples

Text format

Used for to_text().

'畢卡索_0'
List format

Used for from_list() and to_list().

[
    '畢卡索',       # token word
    (0, 'source'), # coref ID and type
    (2, 2),        # node index
]
Dict format

Used for from_dict() and to_dict().

{
    'word': '畢卡索',        # token word
    'coref': (0, 'source'), # coref ID and type
    'idx': (2, 2),          # node index
}
class ckipnlp.container.coref.CorefSentence(initlist=None)[source]

Bases: ckipnlp.container.base.BaseSentence

A coreference sentence (a list of coreference tokens).

Data Structure Examples

Text format

Used for to_text().

# Token segmented by \u3000 (full-width space)
'「 完蛋 了 !」 , 畢卡索_0 他_0 想'
List format

Used for from_list() and to_list().

[
    [ '「', None, (0, None,), ],
    [ '完蛋', None, (1, 0,), ],
    [ '了', None, (1, 1,), ],
    [ '!」', None, (1, None,), ],
    [ '畢卡索', (0, 'source'), (2, 2,), ],
    [ '他', (0, 'target'), (2, 3,), ],
    [ '想', None, (2, 4,), ],
]
Dict format

Used for from_dict() and to_dict().

[
    { 'word': '「', 'coref': None, 'idx': (0, None,), },
    { 'word': '完蛋', 'coref': None, 'idx': (1, 0,), },
    { 'word': '了', 'coref': None, 'idx': (1, 1,), },
    { 'word': '!」', 'coref': None, 'idx': (1, None,), },
    { 'word': '畢卡索', 'coref': (0, 'source'), 'idx': (2, 2,), },
    { 'word': '他', 'coref': (0, 'target'), 'idx': (2, 3,), },
    { 'word': '想', 'coref': None, 'idx': (2, 4,), },
]
item_class

alias of CorefToken

class ckipnlp.container.coref.CorefParagraph(initlist=None)[source]

Bases: ckipnlp.container.base.BaseList

A list of coreference sentences.

Data Structure Examples

Text format

Used for to_text().

[
    '「 完蛋 了 !」 , 畢卡索_0 他_0 想', # Sentence 1
    '但是 None_0 也 沒有 辦法', # Sentence 2
]
List format

Used for from_list() and to_list().

[
    [ # Sentence 1
        [ '「', None, (0, None,), ],
        [ '完蛋', None, (1, 0,), ],
        [ '了', None, (1, 1,), ],
        [ '!」', None, (1, None,), ],
        [ '畢卡索', (0, 'source'), (2, 2,), ],
        [ '他', (0, 'target'), (2, 3,), ],
        [ '想', None, (2, 4,), ],
    ],
    [ # Sentence 2
        [ '但是', None, (0, 1,), ],
        [ None, (0, 'zero'), (0, None,), ],
        [ '也', None, (0, 2,), ],
        [ '沒有', None, (0, 3,), ],
        [ '辦法', None, (0, 5,), ],
    ],
]
Dict format

Used for from_dict() and to_dict().

[
    [ # Sentence 1
        { 'word': '「', 'coref': None, 'idx': (0, None,), },
        { 'word': '完蛋', 'coref': None, 'idx': (1, 0,), },
        { 'word': '了', 'coref': None, 'idx': (1, 1,), },
        { 'word': '!」', 'coref': None, 'idx': (1, None,), },
        { 'word': '畢卡索', 'coref': (0, 'source'), 'idx': (2, 2,), },
        { 'word': '他', 'coref': (0, 'target'), 'idx': (2, 3,), },
        { 'word': '想', 'coref': None, 'idx': (2, 4,), },
    ],
    [ # Sentence 2
        { 'word': '但是', 'coref': None, 'idx': (0, 1,), },
        { 'word': None, 'coref': (0, 'zero'), 'idx': (0, None,), },
        { 'word': '也', 'coref': None, 'idx': (0, 2,), },
        { 'word': '沒有', 'coref': None, 'idx': (0, 3,), },
        { 'word': '辦法', 'coref': None, 'idx': (0, 5,), },
    ],
]
item_class

alias of CorefSentence

ckipnlp.container.ner module

This module provides containers for NER sentences.

class ckipnlp.container.ner.NerToken(word, ner, idx, **kwargs)[source]

Bases: ckipnlp.container.base.BaseTuple, ckipnlp.container.ner._NerToken

A named-entity recognition token.

Variables
  • word (str) – the token word.

  • ner (str) – the NER-tag.

  • idx (Tuple[int, int]) – the starting / ending index.

Note

This class is a subclass of tuple. To change an attribute, please create a new instance instead.

Data Structure Examples

Text format

Not implemented

List format

Used for from_list() and to_list().

[
    '中文字',    # token word
    'LANGUAGE', # NER-tag
    (0, 3),     # starting / ending index.
]
Dict format

Used for from_dict() and to_dict().

{
    'word': '中文字',   # token word
    'ner': 'LANGUAGE', # NER-tag
    'idx': (0, 3),     # starting / ending index.
}
CkipTagger format

Used for from_tagger() and to_tagger().

(
    0,          # starting index
    3,          # ending index
    'LANGUAGE', # NER-tag
    '中文字',    # token word
)
classmethod from_tagger(data)[source]

Construct an instance from CkipTagger format.

to_tagger()[source]

Transform to CkipTagger format.

class ckipnlp.container.ner.NerSentence(initlist=None)[source]

Bases: ckipnlp.container.base.BaseSentence

A named-entity recognition sentence.

Data Structure Examples

Text format

Not implemented

List format

Used for from_list() and to_list().

[
    [ '美國', 'GPE', (0, 2), ],   # named-entity 1
    [ '參議院', 'ORG', (3, 5), ], # named-entity 2
]
Dict format

Used for from_dict() and to_dict().

[
    { 'word': '美國', 'ner': 'GPE', 'idx': (0, 2), },   # named-entity 1
    { 'word': '參議院', 'ner': 'ORG', 'idx': (3, 5), }, # named-entity 2
]
CkipTagger format

Used for from_tagger() and to_tagger().

[
    ( 0, 2, 'GPE', '美國', ),   # named-entity 1
    ( 3, 5, 'ORG', '參議院', ), # named-entity 2
]
item_class

alias of NerToken

classmethod from_tagger(data)[source]

Construct an instance from CkipTagger format.

to_tagger()[source]

Transform to CkipTagger format.

class ckipnlp.container.ner.NerParagraph(initlist=None)[source]

Bases: ckipnlp.container.base.BaseList

A list of named-entity recognition sentences.

Data Structure Examples

Text format

Not implemented

List format

Used for from_list() and to_list().

[
    [ # Sentence 1
        [ '中文字', 'LANGUAGE', (0, 3), ],
    ],
    [ # Sentence 2
        [ '美國', 'GPE', (0, 2), ],
        [ '參議院', 'ORG', (3, 5), ],
    ],
]
Dict format

Used for from_dict() and to_dict().

[
    [ # Sentence 1
        { 'word': '中文字', 'ner': 'LANGUAGE', 'idx': (0, 3), },
    ],
    [ # Sentence 2
        { 'word': '美國', 'ner': 'GPE', 'idx': (0, 2), },
        { 'word': '參議院', 'ner': 'ORG', 'idx': (3, 5), },
    ],
]
CkipTagger format

Used for from_tagger() and to_tagger().

[
    [ # Sentence 1
        ( 0, 3, 'LANGUAGE', '中文字', ),
    ],
    [ # Sentence 2
        ( 0, 2, 'GPE', '美國', ),
        ( 3, 5, 'ORG', '參議院', ),
    ],
]
item_class

alias of NerSentence

classmethod from_tagger(data)[source]

Construct an instance from CkipTagger format.

to_tagger()[source]

Transform to CkipTagger format.

ckipnlp.container.parse module

This module provides containers for parsed sentences.

class ckipnlp.container.parse.ParseClause(clause: str = None, delim: str = '')[source]

Bases: ckipnlp.container.base.BaseTuple, ckipnlp.container.parse._ParseClause

A parse clause.

Variables
  • clause (str) – the parse clause.

  • delim (str) – the punctuations after this clause.

Note

This class is a subclass of tuple. To change an attribute, please create a new instance instead.

Data Structure Examples

Text format

Used for to_text().

'S(Head:Nab:中文字|particle:Td:耶)' # delim is ignored
List format

Used for from_list(), and to_list().

[
    'S(Head:Nab:中文字|particle:Td:耶)', # parse clause
    ',',                               # punctuations
]
Dict format

Used for from_dict() and to_dict().

{
    'clause': 'S(Head:Nab:中文字|particle:Td:耶)', # parse clause
    'delim': ',',                                # punctuations
}
to_tree()[source]

Transform to tree format.

Returns

ParseTree – the tree format of this clause. (None if clause is None)

class ckipnlp.container.parse.ParseSentence(initlist=None)[source]

Bases: ckipnlp.container.base.BaseList

A parse sentence.

Data Structure Examples

Text format

Used for to_text().

[ # delim is ignored
    'S(Head:Nab:中文字|particle:Td:耶)',                    # Clause 1
    '%(particle:I:啊|manner:Dh:哈|manner:Dh:哈|time:Dh:哈)', # Clause 2
]
List format

Used for from_list(), and to_list().

[
    [ # Clause 1
        'S(Head:Nab:中文字|particle:Td:耶)',
        ',',
    ],
    [ # Clause 2
        '%(particle:I:啊|manner:Dh:哈|manner:Dh:哈|time:Dh:哈)',
        '。',
    ],
]
Dict format

Used for from_dict() and to_dict().

[
    { # Clause 1
        'clause': 'S(Head:Nab:中文字|particle:Td:耶)',
        'delim': ',',
    },
    { # Clause 2
        'clause': '%(particle:I:啊|manner:Dh:哈|manner:Dh:哈|time:Dh:哈)',
        'delim': '。',
    },
]
item_class

alias of ParseClause

class ckipnlp.container.parse.ParseParagraph(initlist=None)[source]

Bases: ckipnlp.container.base.BaseList

A list of parse sentences.

Data Structure Examples

Text format

Used for to_text().

[ # delims are ignored
    [ # Sentence 1
        'S(Head:Nab:中文字|particle:Td:耶)',
        '%(particle:I:啊|manner:Dh:哈|manner:Dh:哈|time:Dh:哈)',
    ],
    [ # Sentence 2
        None,
        'VP(Head:VH11:完蛋|particle:Ta:了)',
        'S(agent:NP(apposition:Nba:畢卡索|Head:Nhaa:他)|Head:VE2:想)',
    ],
]
List format

Used for from_list(), and to_list().

[
    [ # Sentence 1
        [
            'S(Head:Nab:中文字|particle:Td:耶)',
            ',',
        ],
        [
            '%(particle:I:啊|manner:Dh:哈|manner:Dh:哈|time:Dh:哈)',
            '。',
        ],
    ],
    [ # Sentence 2
        [
            None,
            '「',
        ],
        [
            'VP(Head:VH11:完蛋|particle:Ta:了)',
            '!」',
        ],
        [
            'S(agent:NP(apposition:Nba:畢卡索|Head:Nhaa:他)|Head:VE2:想)',
            '',
        ],
    ],
]
Dict format

Used for from_dict(), and to_dict().

[
    [ # Sentence 1
        {
            'clause': 'S(Head:Nab:中文字|particle:Td:耶)',
            'delim': ',',
        },
        {
            'clause': '%(particle:I:啊|manner:Dh:哈|manner:Dh:哈|time:Dh:哈)',
            'delim': '。',
        },
    ],
    [ # Sentence 2
        {
            'clause': None,
            'delim': '「',
        },
        {
            'clause': 'VP(Head:VH11:完蛋|particle:Ta:了)',
            'delim': '!」',
        },
        {
            'clause': 'S(agent:NP(apposition:Nba:畢卡索|Head:Nhaa:他)|Head:VE2:想)',
            'delim': '',
        },
    ],
]
item_class

alias of ParseSentence

ckipnlp.container.seg module

This module provides containers for word-segmented sentences.

class ckipnlp.container.seg.SegSentence(initlist=None)[source]

Bases: ckipnlp.container.base.BaseSentence0

A word-segmented sentence.

Data Structure Examples

Text format

Used for from_text() and to_text().

'中文字 耶 , 啊 哈 哈哈 。' # Words segmented by \u3000 (full-width space)
List/Dict format

Used for from_list(), to_list(), from_dict(), and to_dict().

[ '中文字', '耶', ',', '啊', '哈', '哈哈', '。', ]

Note

This class is also used for part-of-speech tagging.

item_class

alias of builtins.str
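The text format above joins words with U+3000 (full-width space). A minimal sketch of that conversion, as an illustration rather than the library implementation:

```python
# SegSentence's text format: words joined/split on U+3000 (full-width space).
FULL_WIDTH_SPACE = '\u3000'

def seg_to_text(words):
    return FULL_WIDTH_SPACE.join(words)

def seg_from_text(text):
    return text.split(FULL_WIDTH_SPACE)

words = ['中文字', '耶', ',', '啊', '哈', '哈哈', '。']
assert seg_from_text(seg_to_text(words)) == words  # lossless round-trip
```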

class ckipnlp.container.seg.SegParagraph(initlist=None)[source]

Bases: ckipnlp.container.base.BaseList

A list of word-segmented sentences.

Data Structure Examples

Text format

Used for from_text() and to_text().

[
    '中文字 耶 , 啊 哈 哈哈 。',      # Sentence 1
    '「 完蛋 了 ! 」 , 畢卡索 他 想', # Sentence 2
]
List/Dict format

Used for from_list(), to_list(), from_dict(), and to_dict().

[
    [ '中文字', '耶', ',', '啊', '哈', '哈哈', '。', ],            # Sentence 1
    [ '「', '完蛋', '了', '!', '」', ',', '畢卡索', '他', '想', ], # Sentence 2
]

Note

This class is also used for part-of-speech tagging.

item_class

alias of SegSentence

ckipnlp.container.text module

This module provides containers for text sentences.

class ckipnlp.container.text.TextParagraph(initlist=None)[source]

Bases: ckipnlp.container.base.BaseList0

A list of text sentences.

Data Structure Examples

Text/List/Dict format

Used for from_text(), to_text(), from_list(), to_list(), from_dict(), and to_dict().

[
    '中文字耶,啊哈哈哈。',    # Sentence 1
    '「完蛋了!」畢卡索他想', # Sentence 2
]
item_class

alias of builtins.str

ckipnlp.driver package

This module implements CKIPNLP drivers.

Submodules

ckipnlp.driver.base module

This module provides base drivers.

class ckipnlp.driver.base.DriverRegister[source]

Bases: object

The driver registering utility.

class ckipnlp.driver.base.BaseDriver(*, lazy=False)[source]

Bases: object

The base CKIPNLP driver.

class ckipnlp.driver.base.DummyDriver(*, lazy=False)[source]

Bases: ckipnlp.driver.base.BaseDriver

The dummy driver.

ckipnlp.driver.classic module

This module provides drivers with CkipClassic backend.

class ckipnlp.driver.classic.CkipClassicWordSegmenter(*, lazy=False, do_pos=False, lexicons=None)[source]

Bases: ckipnlp.driver.base.BaseDriver

The CKIP word segmentation driver with CkipClassic backend.

Parameters
  • lazy (bool) – Lazy initialize the driver.

  • do_pos (bool) – Whether to return POS-tags as well.

  • lexicons (Iterable[Tuple[str, str]]) – A list of the lexicon words and their POS-tags.

__call__(*, text)

Apply word segmentation.

Parameters

text (TextParagraph) — The sentences.

Returns
  • ws (SegParagraph) — The word-segmented sentences.

  • pos (SegParagraph) — The part-of-speech sentences. (Returned only if do_pos is set.)
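The do_pos switch changes the return shape: a single value without it, a (ws, pos) pair with it. A rough sketch of that convention, with a hypothetical stand-in function rather than the CkipClassic driver itself:

```python
def word_segment(text, *, do_pos=False):
    # Hypothetical stand-in: split pre-spaced sentences into words and, when
    # do_pos is set, also return a parallel list of (dummy) POS tags per word.
    ws = [sentence.split() for sentence in text]
    if not do_pos:
        return ws
    pos = [['X'] * len(sentence) for sentence in ws]  # placeholder tags
    return ws, pos

ws = word_segment(['中文字 耶'])
assert ws == [['中文字', '耶']]
ws, pos = word_segment(['中文字 耶'], do_pos=True)
assert len(pos[0]) == len(ws[0])  # pos parallels ws, word by word
```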

class ckipnlp.driver.classic.CkipClassicConParser(*, lazy=False)[source]

Bases: ckipnlp.driver.base.BaseDriver

The CKIP constituency parsing driver with CkipClassic backend.

Parameters

lazy (bool) – Lazy initialize the driver.

__call__(*, ws, pos)

Apply constituency parsing.

Parameters
  • ws (SegParagraph) — The word-segmented sentences.

  • pos (SegParagraph) — The part-of-speech sentences.

Returns

conparse (ParseParagraph) — The constituency-parsing sentences.

ckipnlp.driver.coref module

This module provides built-in coreference resolution driver.

class ckipnlp.driver.coref.CkipCorefChunker(*, lazy=False)[source]

Bases: ckipnlp.driver.base.BaseDriver

The CKIP coreference resolution driver.

Parameters

lazy (bool) – Lazy initialize the driver.

__call__(*, conparse)

Apply coreference resolution.

Parameters

conparse (ParseParagraph) — The constituency-parsing sentences.

Returns

coref (CorefParagraph) — The coreference results.

static transform_ws(*, text, ws, ner)[source]

Transform word-segmented sentence lists (create a new instance).

static transform_pos(*, ws, pos, ner)[source]

Transform POS-tag sentence lists (modify in-place).

ckipnlp.driver.ss module

This module provides built-in sentence segmentation driver.

class ckipnlp.driver.ss.CkipSentenceSegmenter(*, lazy=False, delims='\n', keep_delims=False)[source]

Bases: ckipnlp.driver.base.BaseDriver

The CKIP sentence segmentation driver.

Parameters
  • lazy (bool) – Lazy initialize the driver.

  • delims (str) – The delimiters.

  • keep_delims (bool) – Keep the delimiters.

__call__(*, raw, keep_all=True)

Apply sentence segmentation.

Parameters

raw (str) — The raw text.

Returns

text (TextParagraph) — The sentences.
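A rough sketch of what delimiter-based splitting with the keep_delims option might look like. This is an illustrative re-implementation under those assumptions, not the library code:

```python
import re

def segment_sentences(raw, delims='\n', keep_delims=False):
    # Split raw text on any character in delims; optionally re-attach each
    # delimiter to the sentence it terminates.
    pattern = '[' + re.escape(delims) + ']'
    if keep_delims:
        parts = re.split('(' + pattern + ')', raw)
        sents = [''.join(parts[i:i + 2]) for i in range(0, len(parts), 2)]
    else:
        sents = re.split(pattern, raw)
    return [s for s in sents if s]  # drop empty sentences

assert segment_sentences('中文字耶。\n完蛋了!') == ['中文字耶。', '完蛋了!']
assert segment_sentences('中文字耶。\n完蛋了!', keep_delims=True) == \
    ['中文字耶。\n', '完蛋了!']
```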

ckipnlp.driver.tagger module

This module provides drivers with CkipTagger backend.

class ckipnlp.driver.tagger.CkipTaggerWordSegmenter(*, lazy=False, disable_cuda=True, recommend_lexicons={}, coerce_lexicons={}, **opts)[source]

Bases: ckipnlp.driver.base.BaseDriver

The CKIP word segmentation driver with CkipTagger backend.

Parameters
  • lazy (bool) – Lazy initialize the driver.

  • disable_cuda (bool) – Disable GPU usage.

  • recommend_lexicons (Mapping[str, float]) – A mapping of recommended lexicon words to their relative weights.

  • coerce_lexicons (Mapping[str, float]) – A mapping of forced lexicon words to their relative weights.

Other Parameters

**opts – Extra options for ckiptagger.WS.__call__(). (Please refer to https://github.com/ckiplab/ckiptagger#4-run-the-ws-pos-ner-pipeline for details.)

__call__(*, text)

Apply word segmentation.

Parameters

text (TextParagraph) — The sentences.

Returns

ws (SegParagraph) — The word-segmented sentences.

class ckipnlp.driver.tagger.CkipTaggerPosTagger(*, lazy=False, disable_cuda=True, **opts)[source]

Bases: ckipnlp.driver.base.BaseDriver

The CKIP part-of-speech tagging driver with CkipTagger backend.

Parameters
  • lazy (bool) – Lazy initialize the driver.

  • disable_cuda (bool) – Disable GPU usage.

Other Parameters

**opts – Extra options for ckiptagger.POS.__call__(). (Please refer to https://github.com/ckiplab/ckiptagger#4-run-the-ws-pos-ner-pipeline for details.)

__call__(*, ws)

Apply part-of-speech tagging.

Parameters

ws (SegParagraph) — The word-segmented sentences.

Returns

pos (SegParagraph) — The part-of-speech sentences.

class ckipnlp.driver.tagger.CkipTaggerNerChunker(*, lazy=False, disable_cuda=True, **opts)[source]

Bases: ckipnlp.driver.base.BaseDriver

The CKIP named-entity recognition driver with CkipTagger backend.

Parameters
  • lazy (bool) – Lazy initialize the driver.

  • disable_cuda (bool) – Disable GPU usage.

Other Parameters

**opts – Extra options for ckiptagger.NER.__call__(). (Please refer to https://github.com/ckiplab/ckiptagger#4-run-the-ws-pos-ner-pipeline for details.)

__call__(*, ws, pos)

Apply named-entity recognition.

Parameters
  • ws (SegParagraph) — The word-segmented sentences.

  • pos (SegParagraph) — The part-of-speech sentences.

Returns

ner (NerParagraph) — The named-entity recognition results.

ckipnlp.pipeline package

This module implements CKIPNLP pipelines.

Submodules

ckipnlp.pipeline.coref module

This module provides coreference resolution pipeline.

class ckipnlp.pipeline.coref.CkipCorefDocument(*, ws=None, pos=None, conparse=None, coref=None)[source]

Bases: collections.abc.Mapping

The coreference document.

Variables
  • ws (SegParagraph) – The word-segmented sentences.

  • pos (SegParagraph) – The part-of-speech sentences.

  • conparse (ParseParagraph) – The constituency-parsing sentences.

  • coref (CorefParagraph) – The coreference results.
class ckipnlp.pipeline.coref.CkipCorefPipeline(*, coref_chunker='default', lazy=True, opts={}, **kwargs)[source]

Bases: ckipnlp.pipeline.kernel.CkipPipeline

The coreference resolution pipeline.

Parameters
  • sentence_segmenter (str) – The type of sentence segmenter.

  • word_segmenter (str) – The type of word segmenter.

  • pos_tagger (str) – The type of part-of-speech tagger.

  • ner_chunker (str) – The type of named-entity recognition chunker.

  • con_parser (str) – The type of constituency parser.

  • coref_chunker (str) – The type of coreference resolution chunker.

Other Parameters
  • lazy (bool) – Lazy initialize the drivers.

  • opts (Dict[str, Dict]) – The driver options. Key: driver name (e.g. ‘sentence_segmenter’); Value: a dictionary of options.

__call__(doc)[source]

Apply coreference resolution.

Parameters

doc (CkipDocument) – The input document.

Returns

corefdoc (CkipCorefDocument) – The coreference document.

Note

doc is also modified if the necessary dependencies (ws, pos, ner) are not computed yet.

get_coref(doc, corefdoc)[source]

Apply coreference resolution.

Parameters
  • doc (CkipDocument) – The input document.

  • corefdoc (CkipCorefDocument) – The output coreference document.

Returns

corefdoc.coref (CorefParagraph) – The coreference results.

Note

This routine modifies corefdoc in-place.

doc is also modified if the necessary dependencies (ws, pos, ner) are not computed yet.

ckipnlp.pipeline.kernel module

This module provides kernel CKIPNLP pipeline.

class ckipnlp.pipeline.kernel.CkipDocument(*, raw=None, text=None, ws=None, pos=None, ner=None, conparse=None)[source]

Bases: collections.abc.Mapping

The kernel document.

Variables
  • raw (str) – The unsegmented text input.

  • text (TextParagraph) – The sentences.

  • ws (SegParagraph) – The word-segmented sentences.

  • pos (SegParagraph) – The part-of-speech sentences.

  • ner (NerParagraph) – The named-entity recognition results.

  • conparse (ParseParagraph) – The constituency-parsing sentences.

class ckipnlp.pipeline.kernel.CkipPipeline(*, sentence_segmenter='default', word_segmenter='tagger', pos_tagger='tagger', con_parser='classic', ner_chunker='tagger', lazy=True, opts={})[source]

Bases: object

The kernel pipeline.

Parameters
  • sentence_segmenter (str) – The type of sentence segmenter.

  • word_segmenter (str) – The type of word segmenter.

  • pos_tagger (str) – The type of part-of-speech tagger.

  • ner_chunker (str) – The type of named-entity recognition chunker.

  • con_parser (str) – The type of constituency parser.

Other Parameters
  • lazy (bool) – Lazy initialize the drivers.

  • opts (Dict[str, Dict]) – The driver options. Key: driver name (e.g. ‘sentence_segmenter’); Value: a dictionary of options.
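The opts mapping pairs each driver name with the keyword arguments forwarded to that driver. A sketch of its shape, with illustrative option values:

```python
# Key: driver name; value: keyword arguments forwarded to that driver.
# The concrete option values below are illustrative, not defaults.
opts = {
    'sentence_segmenter': {'delims': '\n。', 'keep_delims': True},
    'word_segmenter': {'disable_cuda': False},
}

# Keys are drawn from the driver-name parameters listed above.
assert set(opts) <= {
    'sentence_segmenter', 'word_segmenter', 'pos_tagger',
    'ner_chunker', 'con_parser',
}
```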

get_text(doc)[source]

Apply sentence segmentation.

Parameters

doc (CkipDocument) – The input document.

Returns

doc.text (TextParagraph) – The sentences.

Note

This routine modifies doc in-place.

get_ws(doc)[source]

Apply word segmentation.

Parameters

doc (CkipDocument) – The input document.

Returns

doc.ws (SegParagraph) – The word-segmented sentences.

Note

This routine modifies doc in-place.

get_pos(doc)[source]

Apply part-of-speech tagging.

Parameters

doc (CkipDocument) – The input document.

Returns

doc.pos (SegParagraph) – The part-of-speech sentences.

Note

This routine modifies doc in-place.

get_ner(doc)[source]

Apply named-entity recognition.

Parameters

doc (CkipDocument) – The input document.

Returns

doc.ner (NerParagraph) – The named-entity recognition results.

Note

This routine modifies doc in-place.

get_conparse(doc)[source]

Apply constituency parsing.

Parameters

doc (CkipDocument) – The input document.

Returns

doc.conparse (ParseParagraph) – The constituency-parsing sentences.

Note

This routine modifies doc in-place.
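Each get_* method computes its result once, stores it on the document, and reuses the cached field afterwards. A minimal sketch of that in-place caching pattern, with hypothetical names rather than the pipeline code:

```python
class DocSketch:
    # Hypothetical stand-in for CkipDocument: fields start out as None.
    def __init__(self, raw=None):
        self.raw = raw
        self.text = None

def get_text(doc):
    # Compute once, cache the result on the document, return the cached field.
    if doc.text is None:
        doc.text = doc.raw.split('\n')
    return doc.text

doc = DocSketch(raw='中文字耶。\n完蛋了!')
assert get_text(doc) == ['中文字耶。', '完蛋了!']
assert doc.text is not None          # doc was modified in-place
assert get_text(doc) is doc.text     # subsequent calls reuse the cache
```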

ckipnlp.util package

This module implements extra utilities for CKIPNLP.

Submodules

ckipnlp.util.data module

This module implements data loading utilities for CKIPNLP.

ckipnlp.util.data.get_tagger_data()

Get CkipTagger data directory.

ckipnlp.util.data.install_tagger_data(src_dir, *, copy=False)

Link/Copy CkipTagger data directory.

ckipnlp.util.data.download_tagger_data()

Download CkipTagger data directory.

ckipnlp.util.logger module

This module implements logging utilities for CKIPNLP.

ckipnlp.util.logger.get_logger()[source]

Get the CKIPNLP logger.
