CKIP CoreNLP¶
Introduction¶
CKIP CoreNLP Toolkit¶
Features¶
Sentence Segmentation
Word Segmentation
Part-of-Speech Tagging
Named-Entity Recognition
Constituency Parsing
Coreference Resolution
Online Demo¶
Installation¶
Requirements¶
Python 3.6+
TreeLib 1.5+
CkipTagger 0.1.1+ [Optional, Recommended]
CkipClassic 1.0+ [Optional]
TensorFlow / TensorFlow-GPU 1.13.1+, <2 [Required by CkipTagger]
Driver Requirements¶
Driver | Built-in | CkipTagger | CkipClassic
---|---|---|---
Sentence Segmentation | ✔ | |
Word Segmentation† | | ✔ | ✔
Part-of-Speech Tagging† | | ✔ | ✔
Constituency Parsing | | | ✔
Named-Entity Recognition | | ✔ |
Coreference Resolution‡ | ✔ | ✔ | ✔

† These drivers require only one of the two backends.
‡ The coreference implementation requires no backend itself, but needs the results of word segmentation, part-of-speech tagging, constituency parsing, and named-entity recognition.
Installation via Pip¶
No backend (not recommended):
pip install ckipnlp
With CkipTagger backend (recommended):
pip install ckipnlp[tagger]
or
pip install ckipnlp[tagger-gpu]
With CkipClassic backend: please refer to https://ckip-classic.readthedocs.io/en/latest/main/readme.html#installation for the CkipClassic installation guide.
Usage¶
See https://ckipnlp.readthedocs.io/ for API details.
Usage¶
CkipNLP provides a set of human language technology tools, including
Sentence Segmentation
Word Segmentation
Part-of-Speech Tagging
Named-Entity Recognition
Constituency Parsing
Coreference Resolution
The library is built around three types of classes:
Containers, such as SegParagraph, are the basic data structures for inputs and outputs.
Drivers, such as CkipTaggerWordSegmenter, apply a specific tool to the inputs.
Pipelines, such as CkipPipeline, are collections of drivers that automatically handle the dependencies between inputs and outputs.
Containers¶
Containers Prototypes¶
All the container objects can be converted from/to other formats:
from_text(), to_text() for plain-text conversions;
from_list(), to_list() for conversions to/from list-like Python objects;
from_dict(), to_dict() for conversions to/from dictionary-like Python objects (key-value mappings);
from_json(), to_json() for JSON-format conversions (based on the dictionary-like conversions).
Here are the interfaces, where CONTAINER_CLASS
refers to the container class.
obj = CONTAINER_CLASS.from_text(plain_text)
plain_text = obj.to_text()
obj = CONTAINER_CLASS.from_list([ value1, value2 ])
list_obj = obj.to_list()
obj = CONTAINER_CLASS.from_dict({ key: value })
dict_obj = obj.to_dict()
obj = CONTAINER_CLASS.from_json(json_str)
json_str = obj.to_json()
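As a hedged illustration of this protocol (a toy class, not part of ckipnlp itself), the JSON conversions can be built directly on top of the dictionary conversions:

```python
import json

# Toy container following the same conversion protocol as CKIPNLP's
# container base classes: from_json()/to_json() delegate to
# from_dict()/to_dict().
class MiniToken:
    def __init__(self, word=None, pos=None):
        self.word, self.pos = word, pos

    @classmethod
    def from_dict(cls, data):
        return cls(**data)

    def to_dict(self):
        return {'word': self.word, 'pos': self.pos}

    @classmethod
    def from_json(cls, data, **kwargs):
        return cls.from_dict(json.loads(data, **kwargs))

    def to_json(self, **kwargs):
        return json.dumps(self.to_dict(), **kwargs)

obj = MiniToken.from_json('{"word": "中文字", "pos": "Na"}')
assert obj.to_dict() == {'word': '中文字', 'pos': 'Na'}
```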
Note that not all containers provide all of the above conversions. Here is the table of implemented methods. Please refer to the documentation of each container for format details.
Container | Item | from/to text | from/to list, dict, json
---|---|---|---
TextParagraph | str | ✔ | ✔
SegSentence | str | ✔ | ✔
SegParagraph | SegSentence | ✔ | ✔
NerToken | — | ✘ | ✔
NerSentence | NerToken | ✘ | ✔
NerParagraph | NerSentence | ✘ | ✔
ParseClause | — | only to | ✔
ParseSentence | ParseClause | only to | ✔
ParseParagraph | ParseSentence | only to | ✔
CorefToken | — | only to | ✔
CorefSentence | CorefToken | only to | ✔
CorefParagraph | CorefSentence | only to | ✔
WS with POS¶
There are also conversion routines that treat the word-segmentation and part-of-speech containers jointly. For example, WsPosToken provides routines for a word (str) together with its POS-tag (str):

ws_obj, pos_obj = WsPosToken.from_text('中文字(Na)')
plain_text = WsPosToken.to_text(ws_obj, pos_obj)
ws_obj, pos_obj = WsPosToken.from_list([ '中文字', 'Na' ])
list_obj = WsPosToken.to_list(ws_obj, pos_obj)
ws_obj, pos_obj = WsPosToken.from_dict({ 'word': '中文字', 'pos': 'Na', })
dict_obj = WsPosToken.to_dict(ws_obj, pos_obj)
ws_obj, pos_obj = WsPosToken.from_json(json_str)
json_str = WsPosToken.to_json(ws_obj, pos_obj)

Similarly, WsPosSentence and WsPosParagraph provide routines for word-segmented and POS sentences and paragraphs (SegSentence/SegParagraph), respectively.
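For illustration only (a standalone sketch, not the library implementation), the joint sentence-level text format pairs each word with its tag and joins the tokens with a full-width space:

```python
# Standalone sketch of the '中文字(Na)\u3000耶(T)' text format used by
# WsPosSentence: each word carries its POS-tag in parentheses, and tokens
# are joined by U+3000 (full-width space).
def wspos_sentence_to_text(words, tags):
    return '\u3000'.join(f'{w}({p})' for w, p in zip(words, tags))

def wspos_sentence_from_text(text):
    # rsplit on the last '(' so parentheses inside the word are preserved
    pairs = [tok.rsplit('(', 1) for tok in text.split('\u3000')]
    return [w for w, _ in pairs], [p[:-1] for _, p in pairs]

text = wspos_sentence_to_text(['中文字', '耶'], ['Na', 'T'])
assert text == '中文字(Na)\u3000耶(T)'
assert wspos_sentence_from_text(text) == (['中文字', '耶'], ['Na', 'T'])
```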
Parse Tree¶
In addition to ParseClause, there are also tree utilities based on TreeLib.
ParseTree is the tree structure of a parse clause. One may use from_text() and to_text() for plain-text conversion; from_dict() and to_dict() for dictionary-like object conversion; and from_json() and to_json() for JSON string conversion.
ParseTree also provides from_penn() and to_penn() methods for Penn Treebank conversion. One may use to_penn() together with SvgLing to generate SVG tree graphs.
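As a hedged sketch (plain Python, independent of both ckipnlp and svgling), the nested lists produced by to_penn() can be rendered as a bracketed Penn-style string:

```python
# Illustrative helper: flatten a Penn-style nested list
# (label followed by children) into a bracketed string.
def penn_to_str(node):
    if isinstance(node, str):
        return node
    label, *children = node
    return '({} {})'.format(label, ' '.join(penn_to_str(c) for c in children))

penn = ['S', ['Head:Nab', '中文字'], ['particle:Td', '耶']]
assert penn_to_str(penn) == '(S (Head:Nab 中文字) (particle:Td 耶))'
```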
ParseTree is a TreeLib tree with ParseNode as its nodes. The data of these nodes is stored in a ParseNodeData (accessed by node.data), which is a tuple of role (semantic role), pos (POS-tag), and word (text term).
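A minimal sketch of that node-data text layout (the 'role:POS-tag:word' triple such as 'Head:Na:中文字'; a standalone helper, not the library code):

```python
# Hedged sketch: split a node-data string into (role, pos, word).
# maxsplit=2 keeps any further colons inside the word intact.
def parse_node_data(text):
    role, pos, word = text.split(':', 2)
    return role, pos, word

assert parse_node_data('Head:Na:中文字') == ('Head', 'Na', '中文字')
```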
ParseTree provides several useful methods: get_heads() finds the head words of the clause; get_relations() extracts all relations in the clause; get_subjects() returns the subjects of the clause.
from ckipnlp.container import ParseClause, ParseTree

# 我的早餐、午餐和晚餐都在那場比賽中被吃掉了
clause = ParseClause('S(goal:NP(possessor:N‧的(head:Nhaa:我|Head:DE:的)|Head:Nab(DUMMY1:Nab(DUMMY1:Nab:早餐|Head:Caa:、|DUMMY2:Naa:午餐)|Head:Caa:和|DUMMY2:Nab:晚餐))|quantity:Dab:都|condition:PP(Head:P21:在|DUMMY:GP(DUMMY:NP(Head:Nac:比賽)|Head:Ng:中))|agent:PP(Head:P02:被)|Head:VC31:吃掉|aspect:Di:了)')
tree = clause.to_tree()

print('Show Tree')
tree.show()

print('Get Heads of {}'.format(tree[5]))
print('-- Semantic --')
for head in tree.get_heads(5, semantic=True):
    print(repr(head))
print('-- Syntactic --')
for head in tree.get_heads(5, semantic=False):
    print(repr(head))
print()

print('Get Relations of {}'.format(tree[0]))
print('-- Semantic --')
for rel in tree.get_relations(0, semantic=True):
    print(repr(rel))
print('-- Syntactic --')
for rel in tree.get_relations(0, semantic=False):
    print(repr(rel))
print()

# 我和食物真的都很不開心
tree_text = 'S(theme:NP(DUMMY1:NP(Head:Nhaa:我)|Head:Caa:和|DUMMY2:NP(Head:Naa:食物))|evaluation:Dbb:真的|quantity:Dab:都|degree:Dfa:很|negation:Dc:不|Head:VH21:開心)'
tree = ParseTree.from_text(tree_text)

print('Show Tree')
tree.show()

print('Get Subjects of {}'.format(tree[0]))
print('-- Semantic --')
for subject in tree.get_subjects(0, semantic=True):
    print(repr(subject))
print('-- Syntactic --')
for subject in tree.get_subjects(0, semantic=False):
    print(repr(subject))
print()
Drivers¶
class Driver(*, lazy=False, ...)
The prototype of CkipNLP drivers.
- Parameters
lazy (bool) – Lazily initialize the driver (call init() at the first call of __call__() instead).
- driver_type: str¶ The type of this driver.
- driver_family: str¶ The family of this driver.
- driver_inputs: Tuple[str, …]¶ The inputs of this driver.
- init()¶ Initialize the driver (by calling the _init() function).
- __call__(*, ...)¶ Call the driver (by calling the _call() function).
Here is the list of the drivers:
Driver Type \ Family | Built-in | CkipTagger | CkipClassic
---|---|---|---
Sentence Segmenter | CkipSentenceSegmenter | |
Word Segmenter | | CkipTaggerWordSegmenter | CkipClassicWordSegmenter†
Pos Tagger | | CkipTaggerPosTagger | CkipClassicWordSegmenter†
Ner Chunker | | CkipTaggerNerChunker |
Constituency Parser | | | CkipClassicConParser
Coref Chunker | CkipCorefChunker | |

† Not compatible with CkipCorefPipeline.
Pipelines¶
Kernel Pipeline¶
The CkipPipeline connects the drivers for sentence segmentation, word segmentation, part-of-speech tagging, named-entity recognition, and constituency parsing.
The CkipDocument is the workspace of CkipPipeline with input/output data. Note that CkipPipeline stores its results into the CkipDocument in-place.
The CkipPipeline computes all necessary dependencies. For example, if one calls get_ner() with only raw-text input, the pipeline will automatically call get_text(), get_ws(), and get_pos() first.
from ckipnlp.pipeline import CkipPipeline, CkipDocument

pipeline = CkipPipeline()
doc = CkipDocument(raw='中文字耶,啊哈哈哈')

# Word Segmentation
pipeline.get_ws(doc)
print(doc.ws)
for line in doc.ws:
    print(line.to_text())

# Part-of-Speech Tagging
pipeline.get_pos(doc)
print(doc.pos)
for line in doc.pos:
    print(line.to_text())

# Named-Entity Recognition
pipeline.get_ner(doc)
print(doc.ner)

# Constituency Parsing
pipeline.get_conparse(doc)
print(doc.conparse)

################################################################

from ckipnlp.container.util.wspos import WsPosParagraph

# Word Segmentation & Part-of-Speech Tagging
for line in WsPosParagraph.to_text(doc.ws, doc.pos):
    print(line)
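The dependency resolution described above can be sketched as a small recursion (illustrative only; the names and the DEPS map are assumptions for the sketch, not the actual CkipPipeline internals):

```python
# Toy model of pipeline dependency resolution: each output names its
# inputs, and a request recursively computes whatever is still missing
# in the document workspace.
DEPS = {'text': ['raw'], 'ws': ['text'], 'pos': ['ws'], 'ner': ['ws', 'pos']}

def compute(doc, key):
    if key in doc:            # already computed (or given as input)
        return doc[key]
    for dep in DEPS[key]:     # compute missing prerequisites first
        compute(doc, dep)
    doc[key] = '{}({})'.format(key, ','.join(doc[d] for d in DEPS[key]))
    return doc[key]

doc = {'raw': 'raw'}
compute(doc, 'ner')           # pulls in text, ws, and pos automatically
assert set(doc) == {'raw', 'text', 'ws', 'pos', 'ner'}
```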
Co-Reference Pipeline¶
The CkipCorefPipeline is an extension of CkipPipeline that adds coreference resolution. The pipeline first performs named-entity recognition, as CkipPipeline does, then applies alignment algorithms to refine the word-segmentation and part-of-speech tagging outputs, and finally performs coreference resolution based on the constituency-parsing results.
The CkipCorefDocument is the workspace of CkipCorefPipeline with input/output data. Note that CkipCorefPipeline stores its results into the CkipCorefDocument.
from ckipnlp.pipeline import CkipCorefPipeline, CkipDocument

pipeline = CkipCorefPipeline()
doc = CkipDocument(raw='畢卡索他想,完蛋了')

# Co-Reference
corefdoc = pipeline(doc)
print(corefdoc.coref)
for line in corefdoc.coref:
    print(line.to_text())
Tables of Tags¶
Part-of-Speech Tags¶
Tag | Description
---|---
A | non-predicative adjective
Caa | coordinating conjunction
Cab | conjunction, e.g. 等等
Cba | conjunction, e.g. 的話
Cbb | correlative conjunction
D | adverb
Da | quantity adverb
Dfa | pre-verbal adverb of degree
Dfb | post-verbal adverb of degree
Di | aspectual marker
Dk | sentential adverb
DM | determiner-measure compound
I | interjection
Na | common noun
Nb | proper noun
Nc | place noun
Ncd | localizer
Nd | time noun
Nep | anaphoric determiner
Neqa | quantitative determiner
Neqb | postposed quantitative determiner
Nes | specific determiner
Neu | numeral determiner
Nf | measure word (classifier)
Ng | postposition
Nh | pronoun
Nv | nominalized verb
P | preposition
T | particle
VA | active intransitive verb
VAC | active causative verb
VB | active pseudo-transitive verb
VC | active transitive verb
VCL | active transitive verb taking a locative object
VD | ditransitive verb
VF | active verb with a verbal object
VE | active verb with a sentential object
VG | classificatory verb
VH | stative intransitive verb
VHC | stative causative verb
VI | stative pseudo-transitive verb
VJ | stative transitive verb
VK | stative verb with a sentential object
VL | stative verb with a verbal object
V_2 | 有 (to have)
DE | 的, 之, 得, 地 (structural particles)
SHI | 是 (to be)
FW | foreign word
COLONCATEGORY | colon
COMMACATEGORY | comma
DASHCATEGORY | dash
DOTCATEGORY | dot
ETCCATEGORY | ellipsis
EXCLAMATIONCATEGORY | exclamation mark
PARENTHESISCATEGORY | parentheses
PAUSECATEGORY | enumeration comma (、)
PERIODCATEGORY | period
QUESTIONCATEGORY | question mark
SEMICOLONCATEGORY | semicolon
SPCHANGECATEGORY | double vertical line (speaker change)
WHITESPACE | whitespace
Constituency Parsing Tags¶
Tag | Description
---|---
S | The tree is a sentence, headed by a predicate. Additionally, when the subject, or the object or complement of a predicate, is itself a sentence or clause, the phrase is labeled S rather than NP.
VP | Predicate phrase, headed by a predicate (V).
NP | Noun phrase, headed by a noun (N).
GP | Locational phrase, headed by a postposed localizer (Ng); its argument takes the role DUMMY1.
PP | Prepositional phrase, headed by a preposition (P); its argument likewise takes the role DUMMY.
XP | Conjunctive phrase, headed by a conjunction (C). X is a variable: the actual category of XP is determined by the conjuncts, e.g. it is a VP if the conjuncts are predicate phrases (VP), and an NP if they are noun phrases.
DM | Determiner-measure phrase.
Constituency Parsing Roles¶
Role | Description
---|---
# Modifiers of object nouns |
apposition | An appositive of the object, i.e. an expression referring to the same object.
possessor | The possessor of the object; members, creators, owners, wholes, etc. all count as possessors.
predication | An event modifying the object: a relative clause of the noun, bearing an argument relation with the event head.
property | Features and properties of the object, including related spatio-temporal information; a rather coarse, high-level semantic role.
quantifier | A quantity modifier of the noun, such as a quantitative determiner or determiner-measure compound.
# Modifiers of event verbs – participant roles |
agent | The initiator of the event; the actor of an action verb.
benefactor | The benefiting party, but not the main object.
causer | The initiator of the event, who did not actively cause the event to happen.
companion | The companion of the subject.
comparison | The object of comparison, mostly appearing in comparative sentences.
experiencer | The one experiencing the described emotional or perceptual state; the subject of mental predicates.
goal | The object affected by the action, or the patient of a mental action; in transfer events, the recipient or endpoint.
range | The category of a classification or the extent of a result; the main semantic role of classificatory verbs and comparative sentences.
source | The starting point of an object transfer.
target | The party the predicate content is addressed to, or the direction of a transfer.
theme | The object described by stative and classificatory predicates; the entity whose existence or movement is described in dynamic events; or the patient brought into existence by the event.
topic | The topic the event is about.
# Modifiers of event verbs – adjunct roles |
aspect | The aspect of the action.
degree | The degree of the state.
deixis | A deictic component attached to the action.
deontics | The speaker's attitude toward whether the event should come true; marked on modal adverbs of this type.
duration | The length of time the event lasts.
evaluation | An evaluative mood component.
epistemics | The speaker's conjecture about whether the event is true; marked on modal adverbs of this type.
frequency | The frequency of the event.
instrument | The instrument used for the action.
interjection | An interjection within the sentence.
location | The place where the event happens.
manner | The manner of the subject's action.
negation | Negation.
particle | The sentence-final particle expressing the speaker's mood.
quantity | The quantity of things.
standard | The basis or criterion.
time | The time when the event happens.
# Modifiers of event verbs – subordinate semantic roles |
addition | Addition.
alternative | The mood of choice in a coordinate complex sentence.
avoidance | A situation to be avoided.
complement | A supplementary remark that further elaborates the preceding event.
conclusion | An introduced conclusion.
condition | A conditional clause or event situation.
concession | A concessive connection.
contrast | An adversative mood.
conversion | Introduces the result under a changed condition.
exclusion | An excluded object.
hypothesis | A hypothetical mood.
listing | Listed items.
purpose | Purpose.
reason | The reason for the event.
rejection | The rejected part of a selection relation.
result | The result of the event.
restriction | The first half of a progressive construction.
selection | The selected part of a selection relation.
uncondition | A hypothesis contrary to the current situation.
whatever | Regardless of the condition.
# Grammatical function markers |
DUMMY | An undetermined role, resolved by the head of the parent phrase.
DUMMY1 | An undetermined role, resolved by the head of the parent phrase.
DUMMY2 | An undetermined role, resolved by the head of the parent phrase.
Head | The syntactic head, usually also the semantic center; every sentence or phrase has a Head role.
head | In 的-constructions where the semantic and syntactic heads differ, marks the semantic head, as opposed to the syntactic Head.
nominal | Marks a nominalization structure: the 的 in a noun phrase headed by a nominalized verb.
ckipnlp package¶
The Official CKIP CoreNLP Toolkit.
Subpackages
ckipnlp.container package¶
This module implements specialized container datatypes for CKIPNLP.
Subpackages
ckipnlp.container.util package¶
This module implements specialized utilities for CKIPNLP containers.
Submodules
ckipnlp.container.util.parse_tree module¶
This module provides tree containers for parsed sentences.
-
class
ckipnlp.container.util.parse_tree.
ParseNodeData
(role: str = None, pos: str = None, word: str = None)[source]¶ Bases:
ckipnlp.container.base.BaseTuple
,ckipnlp.container.util.parse_tree._ParseNodeData
A parse node.
- Variables
role (str) – the semantic role.
pos (str) – the POS-tag.
word (str) – the text term.
Note
This class is a subclass of tuple. To change an attribute, please create a new instance instead.
Data Structure Examples
- Text format
Used for from_text() and to_text().
'Head:Na:中文字' # role / POS-tag / text-term
- List format
Not implemented.
- Dict format
Used for from_dict() and to_dict().
{ 'role': 'Head', # role 'pos': 'Na', # POS-tag 'word': '中文字', # text term }
-
class
ckipnlp.container.util.parse_tree.
ParseNode
(tag=None, identifier=None, expanded=True, data=None)[source]¶ Bases:
ckipnlp.container.base.Base
,treelib.node.Node
A parse node for tree.
- Variables
data (ParseNodeData) – the node data.
See also
treelib.tree.Node
Please refer https://treelib.readthedocs.io/ for built-in usages.
Data Structure Examples
- Text format
Not implemented.
- List format
Not implemented.
- Dict format
Used for to_dict().
{ 'role': 'Head', # role 'pos': 'Na', # POS-tag 'word': '中文字', # text term }
-
data_class
¶ alias of
ParseNodeData
-
class
ckipnlp.container.util.parse_tree.
ParseRelation
(head: ckipnlp.container.util.parse_tree.ParseNode, tail: ckipnlp.container.util.parse_tree.ParseNode, relation: ckipnlp.container.util.parse_tree.ParseNode)[source]¶ Bases:
ckipnlp.container.base.Base
,ckipnlp.container.util.parse_tree._ParseRelation
A parse relation.
- Variables
head (ParseNode) – the head node.
tail (ParseNode) – the tail node.
relation (ParseNode) – the relation node.
Notes
The parent of the relation node is always the common ancestor of the head node and tail node.
Data Structure Examples
- Text format
Not implemented.
- List format
Not implemented.
- Dict format
Used for to_dict().
{ 'head': { 'role': 'Head', 'pos': 'Nab', 'word': '中文字' }, # head node 'tail': { 'role': 'particle', 'pos': 'Td', 'word': '耶' }, # tail node 'relation': 'particle', # relation }
-
class
ckipnlp.container.util.parse_tree.
ParseTree
(tree=None, deep=False, node_class=None, identifier=None)[source]¶ Bases:
ckipnlp.container.base.Base
,treelib.tree.Tree
A parse tree.
See also
treelib.tree.Tree
Please refer https://treelib.readthedocs.io/ for built-in usages.
Data Structure Examples
- Text format
Used for from_text() and to_text().
'S(Head:Nab:中文字|particle:Td:耶)'
- List format
Not implemented.
- Dict format
Used for from_dict() and to_dict(). A dictionary such as { 'id': 0, 'data': { ... }, 'children': [ ... ] }, where 'data' is a dictionary with the same format as ParseNodeData.to_dict(), and 'children' is a list of dictionaries of subtrees with the same format as this tree.
{ 'id': 0, 'data': { 'role': None, 'pos': 'S', 'word': None, }, 'children': [ { 'id': 1, 'data': { 'role': 'Head', 'pos': 'Nab', 'word': '中文字', }, 'children': [], }, { 'id': 2, 'data': { 'role': 'particle', 'pos': 'Td', 'word': '耶', }, 'children': [], }, ], }
- Penn Treebank format
Used for from_penn() and to_penn().
[ 'S', [ 'Head:Nab', '中文字', ], [ 'particle:Td', '耶', ], ]
-
classmethod
from_text
(data)[source]¶ Construct an instance from text format.
- Parameters
data (str) – A parse tree in text format (ParseClause.clause).
See also
-
to_text
(node_id=None)[source]¶ Transform to plain text.
- Parameters
node_id (int) – Output the plain text format for the subtree under node_id.
- Returns
str
-
classmethod
from_dict
(data)[source]¶ Construct an instance from python built-in containers.
- Parameters
data (dict) – A parse tree in dictionary format.
-
to_dict
(node_id=None)[source]¶ Transform to python built-in containers.
- Parameters
node_id (int) – Output the dictionary format for the subtree under node_id.
- Returns
dict
-
to_penn
(node_id=None, *, with_role=True, with_word=True, sep=':')[source]¶ Transform to Penn Treebank format.
- Parameters
node_id (int) – Output the plain text format for the subtree under node_id.
with_role (bool) – Contains role-tag or not.
with_word (bool) – Contains word or not.
sep (str) – The separator between role and POS-tag.
- Returns
list
-
get_children
(node_id, *, role)[source]¶ Get children of a node with given role.
- Parameters
node_id (int) – ID of target node.
role (str) – the target role.
- Yields
ParseNode
– the children nodes with given role.
-
get_heads
(root_id=None, *, semantic=True, deep=True)[source]¶ Get all head nodes of a subtree.
- Parameters
root_id (int) – ID of the root node of target subtree.
semantic (bool) – use semantic/syntactic policy. In semantic mode, return DUMMY or head instead of the syntactic Head.
deep (bool) – find heads recursively.
- Yields
ParseNode
– the head nodes.
-
get_relations
(root_id=None, *, semantic=True)[source]¶ Get all relations of a subtree.
- Parameters
root_id (int) – ID of the subtree root node.
semantic (bool) – please refer to get_heads() for policy details.
- Yields
ParseRelation
– the relations.
-
get_subjects
(root_id=None, *, semantic=True, deep=True)[source]¶ Get the subject node of a subtree.
- Parameters
root_id (int) – ID of the root node of target subtree.
semantic (bool) – please refer to get_heads() for policy details.
deep (bool) – please refer to get_heads() for policy details.
- Yields
ParseNode
– the subject node.
Notes
A node can be a subject if it either:
- is the head of an NP;
- is the head of a subnode (N) of S with a subject role; or
- is the head of a subnode (N) of S with a neutral role, appearing before the head (V) of S.
ckipnlp.container.util.wspos module¶
This module provides containers for word-segmented sentences with part-of-speech-tags.
-
class
ckipnlp.container.util.wspos.
WsPosToken
(word: str = None, pos: str = None)[source]¶ Bases:
ckipnlp.container.base.BaseTuple
,ckipnlp.container.util.wspos._WsPosToken
A word with POS-tag.
- Variables
word (str) – the word.
pos (str) – the POS-tag.
Note
This class is a subclass of tuple. To change an attribute, please create a new instance instead.
Data Structure Examples
- Text format
Used for from_text() and to_text().
'中文字(Na)' # word / POS-tag
- List format
Used for from_list() and to_list().
[ '中文字', # word 'Na', # POS-tag ]
- Dict format
Used for from_dict() and to_dict().
{ 'word': '中文字', # word 'pos': 'Na', # POS-tag }
-
class
ckipnlp.container.util.wspos.
WsPosSentence
[source]¶ Bases:
object
A helper class for data conversion of word-segmented and part-of-speech sentences.
-
classmethod
from_text
(data)[source]¶ Convert text format to word-segmented and part-of-speech sentences.
- Parameters
data (str) – text such as '中文字(Na)\u3000耶(T)'.
- Returns
SegSentence – the word sentence
SegSentence – the POS-tag sentence
-
static
to_text
(word, pos)[source]¶ Convert word-segmented and part-of-speech sentences to text format.
- Parameters
word (SegSentence) – the word sentence
pos (SegSentence) – the POS-tag sentence
- Returns
str – text such as '中文字(Na)\u3000耶(T)'.
-
class
ckipnlp.container.util.wspos.
WsPosParagraph
[source]¶ Bases:
object
A helper class for data conversion of word-segmented and part-of-speech sentence lists.
-
classmethod
from_text
(data)[source]¶ Convert text format to word-segmented and part-of-speech sentence lists.
- Parameters
data (Sequence[str]) – list of sentences such as '中文字(Na)\u3000耶(T)'.
- Returns
SegParagraph – the word sentence list
SegParagraph – the POS-tag sentence list
-
static
to_text
(word, pos)[source]¶ Convert word-segmented and part-of-speech sentence lists to text format.
- Parameters
word (SegParagraph) – the word sentence list
pos (SegParagraph) – the POS-tag sentence list
- Returns
List[str] – list of sentences such as '中文字(Na)\u3000耶(T)'.
Submodules
ckipnlp.container.base module¶
This module provides base containers.
-
class
ckipnlp.container.base.
Base
[source]¶ Bases:
object
The base CKIPNLP container.
-
abstract classmethod
from_text
(data)[source]¶ Construct an instance from text format.
- Parameters
data (str) –
-
abstract classmethod
from_list
(data)[source]¶ Construct an instance from python built-in containers.
-
abstract classmethod
from_dict
(data)[source]¶ Construct an instance from python built-in containers.
-
classmethod
from_json
(data, **kwargs)[source]¶ Construct an instance from JSON format.
- Parameters
data (str) – please refer
from_dict()
for format details.
-
class
ckipnlp.container.base.
BaseTuple
[source]¶ Bases:
ckipnlp.container.base.Base
The base CKIPNLP tuple.
-
classmethod
from_list
(data)[source]¶ Construct an instance from python built-in containers.
- Parameters
data (list) –
-
class
ckipnlp.container.base.
BaseList
(initlist=None)[source]¶ Bases:
ckipnlp.container.base._BaseList
,ckipnlp.container.base._InterfaceItem
The base CKIPNLP list.
-
item_class
= Not Implemented¶ Must be a CKIPNLP container class.
-
-
class
ckipnlp.container.base.
BaseList0
(initlist=None)[source]¶ Bases:
ckipnlp.container.base._BaseList
,ckipnlp.container.base._InterfaceBuiltInItem
The base CKIPNLP list with built-in item class.
-
item_class
= Not Implemented¶ Must be a built-in type.
-
ckipnlp.container.coref module¶
This module provides containers for coreference sentences.
-
class
ckipnlp.container.coref.
CorefToken
(word, coref, idx, **kwargs)[source]¶ Bases:
ckipnlp.container.base.BaseTuple
,ckipnlp.container.coref._CorefToken
A coreference token.
- Variables
word (str) – the token word.
coref (Tuple[int, str]) – the coreference ID and type; None if the token is neither a coreference source nor a target. The type is one of:
'source': coreference source.
'target': coreference target.
'zero': null-element coreference target.
idx (Tuple[int, int]) – the node indexes (clause index, token index) in the parse tree. idx[1] is None if this node is a null element or a punctuation mark.
Note
This class is a subclass of tuple. To change an attribute, please create a new instance instead.
Data Structure Examples
- Text format
Used for to_text().
'畢卡索_0'
- List format
Used for from_list() and to_list().
[ '畢卡索', # token word (0, 'source'), # coref ID and type (2, 2), # node index ]
- Dict format
Used for from_dict() and to_dict().
{ 'word': '畢卡索', # token word 'coref': (0, 'source'), # coref ID and type 'idx': (2, 2), # node index }
-
class
ckipnlp.container.coref.
CorefSentence
(initlist=None)[source]¶ Bases:
ckipnlp.container.base.BaseSentence
A coreference sentence (a list of coreference tokens).
Data Structure Examples
- Text format
Used for to_text().
'「 完蛋 了 !」 , 畢卡索_0 他_0 想' # tokens segmented by \u3000 (full-width space)
- List format
Used for
from_list()
andto_list()
.[ [ '「', None, (0, None,), ], [ '完蛋', None, (1, 0,), ], [ '了', None, (1, 1,), ], [ '!」', None, (1, None,), ], [ '畢卡索', (0, 'source'), (2, 2,), ], [ '他', (0, 'target'), (2, 3,), ], [ '想', None, (2, 4,), ], ]
- Dict format
Used for
from_dict()
andto_dict()
.[ { 'word': '「', 'coref': None, 'idx': (0, None,), }, { 'word': '完蛋', 'coref': None, 'idx': (1, 0,), }, { 'word': '了', 'coref': None, 'idx': (1, 1,), }, { 'word': '!」', 'coref': None, 'idx': (1, None,), }, { 'word': '畢卡索', 'coref': (0, 'source'), 'idx': (2, 2,), }, { 'word': '他', 'coref': (0, 'target'), 'idx': (2, 3,), }, { 'word': '想', 'coref': None, 'idx': (2, 4,), }, ]
-
item_class
¶ alias of
CorefToken
-
class
ckipnlp.container.coref.
CorefParagraph
(initlist=None)[source]¶ Bases:
ckipnlp.container.base.BaseList
A list of coreference sentences.
Data Structure Examples
- Text format
Used for to_text().
[ '「 完蛋 了 !」 , 畢卡索_0 他_0 想', # Sentence 1 '但是 None_0 也 沒有 辦法', # Sentence 2 ]
- List format
Used for
from_list()
andto_list()
.[ [ # Sentence 1 [ '「', None, (0, None,), ], [ '完蛋', None, (1, 0,), ], [ '了', None, (1, 1,), ], [ '!」', None, (1, None,), ], [ '畢卡索', (0, 'source'), (2, 2,), ], [ '他', (0, 'target'), (2, 3,), ], [ '想', None, (2, 4,), ], ], [ # Sentence 2 [ '但是', None, (0, 1,), ], [ None, (0, 'zero'), (0, None,), ], [ '也', None, (0, 2,), ], [ '沒有', None, (0, 3,), ], [ '辦法', None, (0, 5,), ], ], ]
- Dict format
Used for
from_dict()
andto_dict()
.[ [ # Sentence 1 { 'word': '「', 'coref': None, 'idx': (0, None,), }, { 'word': '完蛋', 'coref': None, 'idx': (1, 0,), }, { 'word': '了', 'coref': None, 'idx': (1, 1,), }, { 'word': '!」', 'coref': None, 'idx': (1, None,), }, { 'word': '畢卡索', 'coref': (0, 'source'), 'idx': (2, 2,), }, { 'word': '他', 'coref': (0, 'target'), 'idx': (2, 3,), }, { 'word': '想', 'coref': None, 'idx': (2, 4,), }, ], [ # Sentence 2 { 'word': '但是', 'coref': None, 'idx': (0, 1,), }, { 'word': None, 'coref': (0, 'zero'), 'idx': (0, None,), }, { 'word': '也', 'coref': None, 'idx': (0, 2,), }, { 'word': '沒有', 'coref': None, 'idx': (0, 3,), }, { 'word': '辦法', 'coref': None, 'idx': (0, 5,), }, ], ]
-
item_class
¶ alias of
CorefSentence
ckipnlp.container.ner module¶
This module provides containers for NER sentences.
-
class
ckipnlp.container.ner.
NerToken
(word, ner, idx, **kwargs)[source]¶ Bases:
ckipnlp.container.base.BaseTuple
,ckipnlp.container.ner._NerToken
A named-entity recognition token.
- Variables
word (str) – the token word.
ner (str) – the NER-tag.
idx (Tuple[int, int]) – the starting / ending index.
Note
This class is a subclass of tuple. To change an attribute, please create a new instance instead.
Data Structure Examples
- Text format
Not implemented
- List format
Used for from_list() and to_list().
[ '中文字', # token word 'LANGUAGE', # NER-tag (0, 3), # starting / ending index ]
- Dict format
Used for from_dict() and to_dict().
{ 'word': '中文字', # token word 'ner': 'LANGUAGE', # NER-tag 'idx': (0, 3), # starting / ending index }
- CkipTagger format
Used for from_tagger() and to_tagger().
( 0, # starting index 3, # ending index 'LANGUAGE', # NER-tag '中文字', # token word )
-
class
ckipnlp.container.ner.
NerSentence
(initlist=None)[source]¶ Bases:
ckipnlp.container.base.BaseSentence
A named-entity recognition sentence.
Data Structure Examples
- Text format
Not implemented
- List format
Used for from_list() and to_list().
[ [ '美國', 'GPE', (0, 2), ], # named-entity 1 [ '參議院', 'ORG', (3, 5), ], # named-entity 2 ]
- Dict format
Used for from_dict() and to_dict().
[ { 'word': '美國', 'ner': 'GPE', 'idx': (0, 2), }, # named-entity 1 { 'word': '參議院', 'ner': 'ORG', 'idx': (3, 5), }, # named-entity 2 ]
- CkipTagger format
Used for from_tagger() and to_tagger().
[ ( 0, 2, 'GPE', '美國', ), # named-entity 1 ( 3, 5, 'ORG', '參議院', ), # named-entity 2 ]
-
class
ckipnlp.container.ner.
NerParagraph
(initlist=None)[source]¶ Bases:
ckipnlp.container.base.BaseList
A list of named-entity recognition sentences.
Data Structure Examples
- Text format
Not implemented
- List format
Used for
from_list()
andto_list()
.[ [ # Sentence 1 [ '中文字', 'LANGUAGE', (0, 3), ], ], [ # Sentence 2 [ '美國', 'GPE', (0, 2), ], [ '參議院', 'ORG', (3, 5), ], ], ]
- Dict format
Used for
from_dict()
andto_dict()
.[ [ # Sentence 1 { 'word': '中文字', 'ner': 'LANGUAGE', 'idx': (0, 3), }, ], [ # Sentence 2 { 'word': '美國', 'ner': 'GPE', 'idx': (0, 2), }, { 'word': '參議院', 'ner': 'ORG', 'idx': (3, 5), }, ], ]
- CkipTagger format
Used for
from_tagger()
andto_tagger()
.[ [ # Sentence 1 ( 0, 3, 'LANGUAGE', '中文字', ), ], [ # Sentence 2 ( 0, 2, 'GPE', '美國', ), ( 3, 5, 'ORG', '參議院', ), ], ]
-
item_class
¶ alias of
NerSentence
ckipnlp.container.parse module¶
This module provides containers for parsed sentences.
-
class
ckipnlp.container.parse.
ParseClause
(clause: str = None, delim: str = '')[source]¶ Bases:
ckipnlp.container.base.BaseTuple
,ckipnlp.container.parse._ParseClause
A parse clause.
- Variables
clause (str) – the parse clause.
delim (str) – the punctuations after this clause.
Note
This class is a subclass of tuple. To change an attribute, please create a new instance instead.
Data Structure Examples
- Text format
Used for to_text().
'S(Head:Nab:中文字|particle:Td:耶)' # delim is ignored
- List format
Used for from_list() and to_list().
[ 'S(Head:Nab:中文字|particle:Td:耶)', # parse clause ',', # punctuations ]
- Dict format
Used for from_dict() and to_dict().
{ 'clause': 'S(Head:Nab:中文字|particle:Td:耶)', # parse clause 'delim': ',', # punctuations }
-
class
ckipnlp.container.parse.
ParseSentence
(initlist=None)[source]¶ Bases:
ckipnlp.container.base.BaseList
A parse sentence.
Data Structure Examples
- Text format
Used for to_text().
[ # delims are ignored 'S(Head:Nab:中文字|particle:Td:耶)', # Clause 1 '%(particle:I:啊|manner:Dh:哈|manner:Dh:哈|time:Dh:哈)', # Clause 2 ]
- List format
Used for from_list() and to_list().
[ [ # Clause 1 'S(Head:Nab:中文字|particle:Td:耶)', ',', ], [ # Clause 2 '%(particle:I:啊|manner:Dh:哈|manner:Dh:哈|time:Dh:哈)', '。', ], ]
- Dict format
Used for from_dict() and to_dict().
[ { # Clause 1 'clause': 'S(Head:Nab:中文字|particle:Td:耶)', 'delim': ',', }, { # Clause 2 'clause': '%(particle:I:啊|manner:Dh:哈|manner:Dh:哈|time:Dh:哈)', 'delim': '。', }, ]
-
item_class
¶ alias of
ParseClause
-
class
ckipnlp.container.parse.
ParseParagraph
(initlist=None)[source]¶ Bases:
ckipnlp.container.base.BaseList
A list of parse sentence.
Data Structure Examples
- Text format
Used for to_text().
[ # delims are ignored [ # Sentence 1 'S(Head:Nab:中文字|particle:Td:耶)', '%(particle:I:啊|manner:Dh:哈|manner:Dh:哈|time:Dh:哈)', ], [ # Sentence 2 None, 'VP(Head:VH11:完蛋|particle:Ta:了)', 'S(agent:NP(apposition:Nba:畢卡索|Head:Nhaa:他)|Head:VE2:想)', ], ]
- List format
Used for from_list() and to_list().
[ [ # Sentence 1 [ 'S(Head:Nab:中文字|particle:Td:耶)', ',', ], [ '%(particle:I:啊|manner:Dh:哈|manner:Dh:哈|time:Dh:哈)', '。', ], ], [ # Sentence 2 [ None, '「', ], [ 'VP(Head:VH11:完蛋|particle:Ta:了)', '!」', ], [ 'S(agent:NP(apposition:Nba:畢卡索|Head:Nhaa:他)|Head:VE2:想)', '', ], ], ]
- Dict format
Used for from_dict() and to_dict().
[ [ # Sentence 1 { 'clause': 'S(Head:Nab:中文字|particle:Td:耶)', 'delim': ',', }, { 'clause': '%(particle:I:啊|manner:Dh:哈|manner:Dh:哈|time:Dh:哈)', 'delim': '。', }, ], [ # Sentence 2 { 'clause': None, 'delim': '「', }, { 'clause': 'VP(Head:VH11:完蛋|particle:Ta:了)', 'delim': '!」', }, { 'clause': 'S(agent:NP(apposition:Nba:畢卡索|Head:Nhaa:他)|Head:VE2:想)', 'delim': '', }, ], ]
-
item_class
¶ alias of
ParseSentence
ckipnlp.container.seg module¶
This module provides containers for word-segmented sentences.
-
class
ckipnlp.container.seg.
SegSentence
(initlist=None)[source]¶ Bases:
ckipnlp.container.base.BaseSentence0
A word-segmented sentence.
Data Structure Examples
- Text format
Used for from_text() and to_text().
'中文字 耶 , 啊 哈 哈哈 。' # words segmented by \u3000 (full-width space)
- List/Dict format
Used for from_list(), to_list(), from_dict(), and to_dict().
[ '中文字', '耶', ',', '啊', '哈', '哈哈', '。', ]
Note
This class is also used for part-of-speech tagging.
-
item_class
¶ alias of
builtins.str
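A standalone sketch of the sentence-level text format above (not the library implementation): segmentation is a join/split on U+3000 (full-width space).

```python
# Illustrative helpers mirroring SegSentence's text format, where words
# are separated by U+3000 (full-width space).
def seg_from_text(text):
    return text.split('\u3000')

def seg_to_text(words):
    return '\u3000'.join(words)

words = seg_from_text('中文字\u3000耶')
assert words == ['中文字', '耶']
assert seg_to_text(words) == '中文字\u3000耶'
```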
-
class
ckipnlp.container.seg.
SegParagraph
(initlist=None)[source]¶ Bases:
ckipnlp.container.base.BaseList
A list of word-segmented sentences.
Data Structure Examples
- Text format
Used for from_text() and to_text().
[ '中文字 耶 , 啊 哈 哈 。', # Sentence 1 '「 完蛋 了 ! 」 , 畢卡索 他 想', # Sentence 2 ]
- List/Dict format
Used for
from_list()
,to_list()
,from_dict()
, andto_dict()
.[ [ '中文字', '耶', ',', '啊', '哈', '哈哈', '。', ], # Sentence 1 [ '「', '完蛋', '了', '!', '」', ',', '畢卡索', '他', '想', ], # Sentence 2 ]
Note
This class is also used for part-of-speech tagging.
-
item_class
¶ alias of
SegSentence
ckipnlp.container.text module¶
This module provides containers for text sentences.
-
class
ckipnlp.container.text.
TextParagraph
(initlist=None)[source]¶ Bases:
ckipnlp.container.base.BaseList0
A list of text sentence.
Data Structure Examples
- Text/List/Dict format
Used for from_text(), to_text(), from_list(), to_list(), from_dict(), and to_dict().
[ '中文字耶,啊哈哈哈。', # Sentence 1 '「完蛋了!」畢卡索他想', # Sentence 2 ]
-
item_class
¶ alias of
builtins.str
ckipnlp.driver package¶
This module implements CKIPNLP drivers.
Submodules
ckipnlp.driver.base module¶
This module provides base drivers.
-
class
ckipnlp.driver.base.
DummyDriver
(*, lazy=False)[source]¶ Bases:
ckipnlp.driver.base.BaseDriver
The dummy driver.
ckipnlp.driver.classic module¶
This module provides drivers with CkipClassic backend.
-
class
ckipnlp.driver.classic.
CkipClassicWordSegmenter
(*, lazy=False, do_pos=False, lexicons=None)[source]¶ Bases:
ckipnlp.driver.base.BaseDriver
The CKIP word segmentation driver with CkipClassic backend.
- Parameters
lazy (bool) – Lazy initialize the driver.
do_pos (bool) – Whether to also return POS-tags.
lexicons (Iterable[Tuple[str, str]]) – A list of the lexicon words and their POS-tags.
-
__call__
(*, text)¶ Apply word segmentation.
- Parameters
text (
TextParagraph
) — The sentences.- Returns
ws (
SegParagraph
) — The word-segmented sentences.pos (
SegParagraph
) — The part-of-speech sentences. (Returned only if do_pos is set.)
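A hedged usage sketch for this driver. It requires the optional CkipClassic backend, so the whole call is guarded; the sample sentence is illustrative.

```python
# Word segmentation (plus POS tags) with the CkipClassic backend.
try:
    from ckipnlp.driver.classic import CkipClassicWordSegmenter
    from ckipnlp.container.text import TextParagraph

    segmenter = CkipClassicWordSegmenter(do_pos=True)
    text = TextParagraph.from_list(['中文字耶,啊哈哈哈。'])
    ws, pos = segmenter(text=text)  # do_pos=True yields a (ws, pos) pair
    print(ws.to_text())
except Exception:
    ws = pos = None  # CkipClassic backend (or its data) not available
```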
-
class
ckipnlp.driver.classic.
CkipClassicConParser
(*, lazy=False)[source]¶ Bases:
ckipnlp.driver.base.BaseDriver
The CKIP constituency parsing driver with CkipClassic backend.
- Parameters
lazy (bool) – Lazy initialize the driver.
-
__call__
(*, ws, pos)¶ Apply constituency parsing.
- Parameters
ws (
SegParagraph
) — The word-segmented sentences.pos (
SegParagraph
) — The part-of-speech sentences.
- Returns
conparse (
ParseParagraph
) — The constituency-parsing sentences.
ckipnlp.driver.coref module¶
This module provides built-in coreference resolution driver.
-
class
ckipnlp.driver.coref.
CkipCorefChunker
(*, lazy=False)[source]¶ Bases:
ckipnlp.driver.base.BaseDriver
The CKIP coreference resolution driver.
- Parameters
lazy (bool) – Lazy initialize the driver.
-
__call__
(*, conparse)¶ Apply coreference resolution.
- Parameters
conparse (
ParseParagraph
) — The constituency-parsing sentences.- Returns
coref (
CorefParagraph
) — The coreference results.
ckipnlp.driver.ss module¶
This module provides built-in sentence segmentation driver.
-
class
ckipnlp.driver.ss.
CkipSentenceSegmenter
(*, lazy=False, delims='\n', keep_delims=False)[source]¶ Bases:
ckipnlp.driver.base.BaseDriver
The CKIP sentence segmentation driver.
- Parameters
lazy (bool) – Lazy initialize the driver.
delims (str) – The delimiters.
keep_delims (bool) – Keep the delimiters.
-
__call__
(*, raw, keep_all=True)¶ Apply sentence segmentation.
- Parameters
raw (str) — The raw text.
- Returns
text (
TextParagraph
) — The sentences.
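The sentence segmenter is a built-in driver with no backend requirement, so a sketch needs only ckipnlp itself. The delimiter set below (newline plus the full-width period) is an illustrative assumption, and the call is guarded in case the package is absent.

```python
# Split raw text into sentences, keeping the ending delimiters.
try:
    from ckipnlp.driver.ss import CkipSentenceSegmenter

    segmenter = CkipSentenceSegmenter(delims='\n。', keep_delims=True)
    text = segmenter(raw='中文字耶。啊哈哈哈。')
    sentences = text.to_list()
    print(sentences)
except Exception:
    sentences = None  # ckipnlp is not installed
```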
ckipnlp.driver.tagger module¶
This module provides drivers with CkipTagger backend.
-
class
ckipnlp.driver.tagger.
CkipTaggerWordSegmenter
(*, lazy=False, disable_cuda=True, recommend_lexicons={}, coerce_lexicons={}, **opts)[source]¶ Bases:
ckipnlp.driver.base.BaseDriver
The CKIP word segmentation driver with CkipTagger backend.
- Parameters
lazy (bool) – Lazy initialize the driver.
disable_cuda (bool) – Disable GPU usage.
recommend_lexicons (Mapping[str, float]) – A mapping of lexicon words to their relative weights.
coerce_lexicons (Mapping[str, float]) – A mapping of lexicon words to their relative weights. Unlike recommend_lexicons, these words are forced into the segmentation output.
- Other Parameters
**opts – Extra options for
ckiptagger.WS.__call__()
. (Please refer to https://github.com/ckiplab/ckiptagger#4-run-the-ws-pos-ner-pipeline for details.)
-
__call__
(*, text)¶ Apply word segmentation.
- Parameters
text (
TextParagraph
) — The sentences.- Returns
ws (
SegParagraph
) — The word-segmented sentences.
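A hedged sketch of the tagger-backed segmenter. It needs the optional CkipTagger backend and its downloaded model data, so everything is guarded; the lexicon word and its weight of 1.0 are illustrative assumptions.

```python
# Word segmentation with the CkipTagger backend, nudging the model
# toward a preferred lexicon entry via recommend_lexicons.
try:
    from ckipnlp.driver.tagger import CkipTaggerWordSegmenter
    from ckipnlp.container.text import TextParagraph

    segmenter = CkipTaggerWordSegmenter(
        disable_cuda=True,
        recommend_lexicons={'中文字': 1.0},  # illustrative weight
    )
    text = TextParagraph.from_list(['中文字耶,啊哈哈哈。'])
    ws = segmenter(text=text)
    print(ws.to_text())
except Exception:
    ws = None  # CkipTagger backend or its model data not available
```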
-
class
ckipnlp.driver.tagger.
CkipTaggerPosTagger
(*, lazy=False, disable_cuda=True, **opts)[source]¶ Bases:
ckipnlp.driver.base.BaseDriver
The CKIP part-of-speech tagging driver with CkipTagger backend.
- Parameters
lazy (bool) – Lazy initialize the driver.
disable_cuda (bool) – Disable GPU usage.
- Other Parameters
**opts – Extra options for
ckiptagger.POS.__call__()
. (Please refer to https://github.com/ckiplab/ckiptagger#4-run-the-ws-pos-ner-pipeline for details.)
-
__call__
(*, ws)¶ Apply part-of-speech tagging.
- Parameters
ws (
SegParagraph
) — The word-segmented sentences.- Returns
pos (
SegParagraph
) — The part-of-speech sentences.
-
class
ckipnlp.driver.tagger.
CkipTaggerNerChunker
(*, lazy=False, disable_cuda=True, **opts)[source]¶ Bases:
ckipnlp.driver.base.BaseDriver
The CKIP named-entity recognition driver with CkipTagger backend.
- Parameters
lazy (bool) – Lazy initialize the driver.
disable_cuda (bool) – Disable GPU usage.
- Other Parameters
**opts – Extra options for
ckiptagger.NER.__call__()
. (Please refer to https://github.com/ckiplab/ckiptagger#4-run-the-ws-pos-ner-pipeline for details.)
-
__call__
(*, ws, pos)¶ Apply named-entity recognition.
- Parameters
ws (
SegParagraph
) — The word-segmented sentences.pos (
SegParagraph
) — The part-of-speech sentences.
- Returns
ner (
NerParagraph
) — The named-entity recognition results.
ckipnlp.pipeline package¶
This module implements CKIPNLP pipelines.
Submodules
ckipnlp.pipeline.coref module¶
This module provides coreference resolution pipeline.
-
class
ckipnlp.pipeline.coref.
CkipCorefDocument
(*, ws=None, pos=None, conparse=None, coref=None)[source]¶ Bases:
collections.abc.Mapping
The coreference document.
- Variables
ws (
SegParagraph
) – The word-segmented sentences.pos (
SegParagraph
) – The part-of-speech sentences.conparse (
ParseParagraph
) – The constituency-parsing sentences.coref (
CorefParagraph
) – The coreference resolution results.
-
class
ckipnlp.pipeline.coref.
CkipCorefPipeline
(*, coref_chunker='default', lazy=True, opts={}, **kwargs)[source]¶ Bases:
ckipnlp.pipeline.kernel.CkipPipeline
The coreference resolution pipeline.
- Parameters
sentence_segmenter (str) – The type of sentence segmenter.
word_segmenter (str) – The type of word segmenter.
pos_tagger (str) – The type of part-of-speech tagger.
ner_chunker (str) – The type of named-entity recognition chunker.
con_parser (str) – The type of constituency parser.
coref_chunker (str) – The type of coreference resolution chunker.
- Other Parameters
lazy (bool) – Lazy initialize the drivers.
opts (Dict[str, Dict]) – The driver options. Key: driver name (e.g. ‘sentence_segmenter’); Value: a dictionary of options.
-
__call__
(doc)[source]¶ Apply coreference resolution.
- Parameters
doc (
CkipDocument
) – The input document.- Returns
corefdoc (
CkipCorefDocument
) – The coreference document.
Note
doc is also modified if the necessary dependencies (ws, pos, ner) are not computed yet.
-
get_coref
(doc, corefdoc)[source]¶ Apply coreference resolution.
- Parameters
doc (
CkipDocument
) – The input document.corefdoc (
CkipCorefDocument
) – The input document for coreference.
- Returns
corefdoc.coref (
CorefParagraph
) – The coreference results.
Note
This routine modifies corefdoc in-place.
doc is also modified if the necessary dependencies (ws, pos, ner) are not computed yet.
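A hedged end-to-end sketch of the coreference pipeline. Running it for real requires the backends and model data for every dependency (ws, pos, ner, constituency parsing), so the whole flow is guarded; the sample sentence is illustrative.

```python
# Build a coreference pipeline with default drivers and run it on a
# document; the pipeline computes missing dependencies automatically.
try:
    from ckipnlp.pipeline import CkipCorefPipeline, CkipDocument

    pipeline = CkipCorefPipeline()
    doc = CkipDocument(raw='「完蛋了!」畢卡索他想')
    corefdoc = pipeline(doc)  # doc is also filled in as a side effect
    print(corefdoc.coref)
except Exception:
    corefdoc = None  # a backend or its model data is missing
```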
ckipnlp.pipeline.kernel module¶
This module provides kernel CKIPNLP pipeline.
-
class
ckipnlp.pipeline.kernel.
CkipDocument
(*, raw=None, text=None, ws=None, pos=None, ner=None, conparse=None)[source]¶ Bases:
collections.abc.Mapping
The kernel document.
- Variables
raw (str) – The unsegmented text input.
text (
TextParagraph
) – The sentences.ws (
SegParagraph
) – The word-segmented sentences.pos (
SegParagraph
) – The part-of-speech sentences.ner (
NerParagraph
) – The named-entity recognition results.conparse (
ParseParagraph
) – The constituency-parsing sentences.
-
class
ckipnlp.pipeline.kernel.
CkipPipeline
(*, sentence_segmenter='default', word_segmenter='tagger', pos_tagger='tagger', con_parser='classic', ner_chunker='tagger', lazy=True, opts={})[source]¶ Bases:
object
The kernel pipeline.
- Parameters
sentence_segmenter (str) – The type of sentence segmenter.
word_segmenter (str) – The type of word segmenter.
pos_tagger (str) – The type of part-of-speech tagger.
ner_chunker (str) – The type of named-entity recognition chunker.
con_parser (str) – The type of constituency parser.
- Other Parameters
lazy (bool) – Lazy initialize the drivers.
opts (Dict[str, Dict]) – The driver options. Key: driver name (e.g. ‘sentence_segmenter’); Value: a dictionary of options.
-
get_text
(doc)[source]¶ Apply sentence segmentation.
- Parameters
doc (
CkipDocument
) – The input document.- Returns
doc.text (
TextParagraph
) – The sentences.
Note
This routine modifies doc in-place.
-
get_ws
(doc)[source]¶ Apply word segmentation.
- Parameters
doc (
CkipDocument
) – The input document.- Returns
doc.ws (
SegParagraph
) – The word-segmented sentences.
Note
This routine modifies doc in-place.
-
get_pos
(doc)[source]¶ Apply part-of-speech tagging.
- Parameters
doc (
CkipDocument
) – The input document.- Returns
doc.pos (
SegParagraph
) – The part-of-speech sentences.
Note
This routine modifies doc in-place.
-
get_ner
(doc)[source]¶ Apply named-entity recognition.
- Parameters
doc (
CkipDocument
) – The input document.- Returns
doc.ner (
NerParagraph
) – The named-entity recognition results.
Note
This routine modifies doc in-place.
-
get_conparse
(doc)[source]¶ Apply constituency parsing.
- Parameters
doc (
CkipDocument
) – The input document.- Returns
doc.conparse (
ParseParagraph
) – The constituency parsing sentences.
Note
This routine modifies doc in-place.
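A hedged sketch of the kernel pipeline's in-place workflow. It assumes the CkipTagger backend and its model data are installed, so the calls are guarded; the sample sentence is illustrative.

```python
# The get_* methods fill fields of a CkipDocument in-place, computing
# any missing dependencies first (get_pos runs get_ws if needed).
try:
    from ckipnlp.pipeline import CkipPipeline, CkipDocument

    pipeline = CkipPipeline(word_segmenter='tagger', pos_tagger='tagger')
    doc = CkipDocument(raw='中文字耶,啊哈哈哈。')
    pipeline.get_ws(doc)   # fills doc.ws
    pipeline.get_pos(doc)  # fills doc.pos
    print(doc.ws, doc.pos)
except Exception:
    doc = None  # backend or model data not available
```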
ckipnlp.util package¶
This module implements extra utilities for CKIPNLP.
Submodules
ckipnlp.util.data module¶
This module implements data loading utilities for CKIPNLP.
-
ckipnlp.util.data.
get_tagger_data
()¶ Get CkipTagger data directory.
-
ckipnlp.util.data.
install_tagger_data
(src_dir, *, copy=False)¶ Link/Copy CkipTagger data directory.
-
ckipnlp.util.data.
download_tagger_data
()¶ Download CkipTagger data directory.
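A short sketch of the data utilities above. It only queries the expected model-data location; the download and install calls are shown as comments because they fetch or link a multi-gigabyte model directory. The guard covers the case where ckipnlp is not installed.

```python
# Locate (and optionally set up) the CkipTagger model data.
try:
    from ckipnlp.util.data import get_tagger_data

    data_dir = get_tagger_data()  # where the models are expected
    # install_tagger_data('/path/to/downloaded/data')  # link an existing copy
    # download_tagger_data()  # fetch the models (large download)
    print(data_dir)
except Exception:
    data_dir = None  # ckipnlp is not installed
```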