CKIP CoreNLP Wrappers¶
Introduction¶
External Links¶
Requirements¶
Attention
For Python 2 users, please use PyCkip 0.4.2 instead.
CKIPWS (Optional)¶
CKIP Word Segmentation Linux version 20190524+
CKIPParser (Optional)¶
CKIP Parser Linux version 20190506+ (20190725+ recommended)
Installation¶
Let <ckipws-linux-root> denote the root path of the CKIPWS Linux version, and <ckipparser-linux-root> the root path of the CKIPParser Linux version.
Install Using Pip¶
pip install --upgrade ckipnlp
pip install --no-deps --force-reinstall --upgrade ckipnlp \
--install-option='--ws' \
--install-option='--ws-dir=<ckipws-linux-root>' \
--install-option='--parser' \
--install-option='--parser-dir=<ckipparser-linux-root>'
Omit the ws/parser options if you do not have CKIPWS/CKIPParser.
Installation Options¶
| Option | Detail | Default Value |
|---|---|---|
| | Enable/disable CKIPWS. | False |
| | Enable/disable CKIPParser. | False |
| | CKIPWS root directory. | |
| | CKIPWS libraries directory. | |
| | CKIPWS share directory. | |
| | CKIPParser root directory. | |
| | CKIPParser libraries directory. | |
| | CKIPParser share directory. | |
| | “Data2” directory. | |
| | “Rule” directory. | |
| | “RDB” directory. | |
Usage¶
See http://ckipnlp.readthedocs.io/ for API details.
CKIPWS¶
import ckipnlp.ws
print(ckipnlp.__name__, ckipnlp.__version__)
ws = ckipnlp.ws.CkipWs(logger=False)
print(ws('中文字喔'))
for line in ws.apply_list(['中文字喔', '啊哈哈哈']):
    print(line)
ws.apply_file(ifile='sample/sample.txt', ofile='output/sample.tag', uwfile='output/sample.uw')
with open('output/sample.tag') as fin:
    print(fin.read())
with open('output/sample.uw') as fin:
    print(fin.read())
CKIPParser¶
import ckipnlp.parser
print(ckipnlp.__name__, ckipnlp.__version__)
ps = ckipnlp.parser.CkipParser(logger=False)
print(ps('中文字喔'))
for line in ps.apply_list(['中文字喔', '啊哈哈哈']):
    print(line)
ps.apply_file(ifile='sample/sample.txt', ofile='output/sample.tree')
with open('output/sample.tree') as fin:
    print(fin.read())
Utilities¶
import ckipnlp
print(ckipnlp.__name__, ckipnlp.__version__)
from ckipnlp.util.ws import *
from ckipnlp.util.parser import *
# Format CkipWs output
ws_text = ['中文字(Na) 喔(T)', '啊哈(I) 哈哈(D)']
# Show Sentence List
ws_sents = WsSentenceList.from_text(ws_text)
print(repr(ws_sents))
print(ws_sents.to_text())
# Show Each Sentence
for ws_sent in ws_sents: print(repr(ws_sent))
for ws_sent in ws_sents: print(ws_sent.to_text())
# Show CkipParser output as tree
tree_text = '#1:1.[0] S(theme:NP(possessor:N‧的(head:Nhaa:我|Head:DE:的)|Head:Nab(DUMMY1:Nab(DUMMY1:Nab:早餐|Head:Caa:、|DUMMY2:Naa:午餐)|Head:Caa:和|DUMMY2:Nab:晚餐))|quantity:Dab:都|target:PP(Head:P30:往|DUMMY:NP(property:Ncb:天|Head:Ncda:上))|Head:VA11:飛|aspect:Di:了)#'
tree = ParserTree.from_text(tree_text)
tree.show()
# Get heads of tree
for node in tree.get_heads(): print(node)
# Get heads of node 1
for node in tree.get_heads(1): print(node)
# Get heads of node 2
for node in tree.get_heads(2): print(node)
# Get heads of node 13
for node in tree.get_heads(13): print(node)
# Get relations
for rel in tree.get_relations(): print(rel)
FAQ¶
Danger
Due to the C code implementation, both CkipWs and CkipParser can be instantiated only once.
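One way to guard against accidental re-instantiation is to route all construction through a cached factory function. The sketch below uses a stand-in class, `FakeDriver` (hypothetical, not part of ckipnlp), so that it runs without CKIPWS installed; in real code the factory body would presumably construct `ckipnlp.ws.CkipWs` or `ckipnlp.parser.CkipParser` instead.

```python
import functools


class FakeDriver:
    """Hypothetical stand-in for a driver that must be created only once."""

    instances = 0  # counts how many times the constructor has run

    def __init__(self):
        type(self).instances += 1

    def __call__(self, text):
        return text  # a real driver would return processed text


@functools.lru_cache(maxsize=None)
def get_driver():
    """Create the driver on first use and return the same object afterwards."""
    return FakeDriver()


first = get_driver()
second = get_driver()
print(first is second)       # both names refer to the same object
print(FakeDriver.instances)  # number of times the constructor ran
```

Caching the factory keeps the single-instance rule in one place, so callers never invoke the constructor directly.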
Tip
The CKIPWS throws “what(): locale::facet::_S_create_c_locale name not valid”. What should I do?
Install locale data.
apt-get install locales-all
Tip
The CKIPParser throws “ImportError: libCKIPParser.so: cannot open shared object file: No such file or directory”. What should I do?
Add the following command to ~/.bashrc:
export LD_LIBRARY_PATH=<ckipparser-linux-root>/lib:$LD_LIBRARY_PATH
ckipnlp package¶
Subpackages¶
ckipnlp.parser package¶
class ckipnlp.parser.CkipParser(*, logger=False, ini_file=None, ws_ini_file=None, lex_list=None, **kwargs)
Bases: object
The CKIP sentence parsing driver.
- Parameters
logger (bool) – enable logger.
lex_list (Iterable) – passed to ckipnlp.util.ini.create_ws_lex(); overrides lex_file for ckipnlp.util.ini.create_ws_ini().
ini_file (str) – the path to the INI file.
ws_ini_file (str) – the path to the INI file for CKIPWS.
- Other Parameters
** – the configs for CKIPParser, passed to ckipnlp.util.ini.create_parser_ini(), ignored if ini_file is set.
** – the configs for CKIPWS, passed to ckipnlp.util.ini.create_ws_ini(), ignored if ws_ini_file is set.
Danger
Never instantiate more than one object of this class!
apply(text)
Parse a sentence.
- Parameters
text (str) – the input sentence.
- Returns
str – the output sentence.
Hint
One may also call this method as __call__().
ckipnlp.util package¶
Submodules¶
ckipnlp.util.ini module¶
ckipnlp.util.ini.create_ws_lex(*lex_list)
Generate CKIP word segmentation lexicon file.
- Parameters
*lex_list (Tuple[str, str]) – the lexicon word and its POS-tag.
- Returns
lex_file (str) – the name of the lexicon file.
f_lex (TextIO) – the file object.
Attention
Remember to close f_lex manually.
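Since the caller is responsible for closing the returned file object, a `try`/`finally` block is a simple way to make sure that happens. The sketch below does not call ckipnlp; `make_lex_file` is a hypothetical stand-in that only mimics the `(name, file object)` return shape using a temporary file, and the tab-separated line layout is an assumption, not the documented lexicon format.

```python
import os
import tempfile


def make_lex_file(*lex_list):
    """Hypothetical stand-in that mimics the (name, file object) return shape."""
    f_lex = tempfile.NamedTemporaryFile(mode='w+', suffix='.lex', encoding='utf-8')
    for word, pos in lex_list:
        f_lex.write(f'{word}\t{pos}\n')  # tab-separated layout is an assumption
    f_lex.flush()
    return f_lex.name, f_lex


lex_file, f_lex = make_lex_file(('中文字', 'Na'), ('喔', 'T'))
try:
    existed_while_open = os.path.exists(lex_file)
    # ... pass lex_file to whatever needs it here ...
finally:
    f_lex.close()  # closing a NamedTemporaryFile also deletes it

print(existed_while_open, os.path.exists(lex_file))
```

The same pattern applies to the `f_ini` objects returned by the other helpers in this module.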
ckipnlp.util.ini.create_ws_ini(*, data2_dir=None, lex_file=None, new_style_format=False, show_category=True, sentence_max_word_num=80, **options)
Generate CKIP word segmentation config.
- Parameters
data2_dir (str) – the path to the folder “Data2/”.
lex_file (str) – the path to the user-defined lexicon file.
new_style_format (bool) – split sentences by newline characters (“\n”) rather than by punctuation.
show_category (bool) – show part-of-speech tags.
sentence_max_word_num (int) – maximum number of words per sentence.
- Returns
ini_file (str) – the name of the config file.
f_ini (TextIO) – the file object.
Attention
Remember to close f_ini manually.
ckipnlp.util.ini.create_parser_ini(*, ws_ini_file, rule_dir=None, rdb_dir=None, do_ws=True, do_parse=True, do_role=True, sentence_delim=',, ;。!?', **options)
Generate CKIP parser config.
- Parameters
rule_dir (str) – the path to “Rule/”.
rdb_dir (str) – the path to “RDB/”.
do_ws (bool) – do word-segmentation.
do_parse (bool) – do parsing.
do_role (bool) – do role.
sentence_delim (str) – the sentence delimiters.
- Returns
ini_file (str) – the name of the config file.
f_ini (TextIO) – the file object.
Attention
Remember to close f_ini manually.
ckipnlp.util.parser module¶
class ckipnlp.util.parser.ParserNodeData
Bases: tuple
A parser node.
property role
str – the role.
property pos
str – the POS-tag.
property term
str – the text term.
classmethod from_text(text)
Construct an instance from ckipnlp.parser.CkipParser output.
- Parameters
text (str) – text such as 'Head:Na:中文字'.
Notes
'Head:Na:中文字' -> role = 'Head', pos = 'Na', term = '中文字'
'Head:Na' -> role = 'Head', pos = 'Na', term = None
'Na' -> role = None, pos = 'Na', term = None
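The three cases in the notes above can be reproduced with a short standalone sketch. It is not the library's implementation: `NodeData` and `parse_node_text` are hypothetical names, and the handling of extra colons inside the term is an assumption of this sketch.

```python
from typing import NamedTuple, Optional


class NodeData(NamedTuple):
    """Hypothetical stand-in with the same three fields as ParserNodeData."""
    role: Optional[str]
    pos: Optional[str]
    term: Optional[str]


def parse_node_text(text):
    """Split 'role:pos:term' text, filling missing fields as in the notes."""
    parts = text.split(':')
    if len(parts) == 1:
        return NodeData(None, parts[0], None)
    if len(parts) == 2:
        return NodeData(parts[0], parts[1], None)
    # Keep any further colons inside the term (an assumption of this sketch).
    return NodeData(parts[0], parts[1], ':'.join(parts[2:]))


for sample in ('Head:Na:中文字', 'Head:Na', 'Na'):
    print(sample, '->', parse_node_text(sample))
```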
classmethod from_dict(data)
Construct an instance from Python built-in containers.
- Parameters
data (dict) – dictionary such as { 'role': 'Head', 'pos': 'Na', 'term': '中文字' }.
classmethod from_json(data, **kwargs)
Construct an instance from JSON format.
- Parameters
data (str) – refer to from_dict() for format details.
class ckipnlp.util.parser.ParserNode(tag=None, identifier=None, expanded=True, data=None)
Bases: treelib.node.Node
A parser node for tree.
data
- Type
See also
treelib.tree.Node
Please refer to https://treelib.readthedocs.io/ for built-in usages.
data_class
alias of ParserNodeData
class ckipnlp.util.parser.ParserRelation
Bases: tuple
A parser relation.
property head
ParserNode – the head node.
property tail
ParserNode – the tail node.
property relation
str – the relation.
class ckipnlp.util.parser.ParserTree(tree=None, deep=False, node_class=None)
Bases: treelib.tree.Tree
A parsed tree.
See also
treelib.tree.Tree
Please refer to https://treelib.readthedocs.io/ for built-in usages.
node_class
alias of ParserNode
static normalize_text(tree_text)
Text normalization for ckipnlp.parser.CkipParser output.
Remove leading number and trailing #.
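As a rough illustration of what removing the leading number and trailing `#` means, here is a standalone sketch. It is not the library's code: `strip_wrapping` is a hypothetical name, the prefix pattern `#<n>:<n>.[<n>] ` is inferred from the sample tree text in the Utilities section, and the input is a shortened fragment of that sample.

```python
import re

# Assumed shape of the raw output: '#<id>:<n>.[<k>] <tree>#'
_PREFIX = re.compile(r'^#\d+:\d+\.\[\d+\]\s+')


def strip_wrapping(tree_text):
    """Drop the leading numbering and the trailing '#' from parser output."""
    text = _PREFIX.sub('', tree_text.strip())
    return text[:-1] if text.endswith('#') else text


raw = '#1:1.[0] PP(Head:P30:往|DUMMY:NP(property:Ncb:天|Head:Ncda:上))#'
print(strip_wrapping(raw))
```

Text that carries neither the prefix nor the trailing `#` passes through unchanged in this sketch.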
classmethod from_text(tree_text, *, normalize=True)
Create a ParserTree object from ckipnlp.parser.CkipParser output.
- Parameters
tree_text (str) – A parsed tree from ckipnlp.parser.CkipParser output.
normalize (bool) – Do text normalization using normalize_text().
classmethod from_dict(data)
Construct an instance from Python built-in containers.
- Parameters
data (dict) – dictionary such as { 'id': 0, 'data': { ... }, 'children': [ ... ] }, where 'data' is a dictionary with the same format as ParserNodeData.to_dict(), and 'children' is a list of dictionaries of subtrees with the same format as this tree.
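To make the recursive layout concrete, the following sketch builds a small dictionary in that shape and walks it depth first. It uses only plain Python containers; `iter_terms` is a hypothetical helper, the IDs and labels are hand-written for illustration, and the empty `'children'` lists on leaves are an assumption.

```python
def iter_terms(node):
    """Yield (id, role, pos, term) for a nested tree dictionary, depth first."""
    data = node['data']
    yield node['id'], data.get('role'), data.get('pos'), data.get('term')
    for child in node.get('children', []):
        yield from iter_terms(child)


# Hand-written example; the IDs and labels are illustrative only.
tree_dict = {
    'id': 0,
    'data': {'role': None, 'pos': 'NP', 'term': None},
    'children': [
        {'id': 1, 'data': {'role': 'property', 'pos': 'Ncb', 'term': '天'}, 'children': []},
        {'id': 2, 'data': {'role': 'Head', 'pos': 'Ncda', 'term': '上'}, 'children': []},
    ],
}

for row in iter_terms(tree_dict):
    print(row)
```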
classmethod from_json(data, **kwargs)
Construct an instance from JSON format.
- Parameters
data (str) – refer to from_dict() for format details.
get_children(node_id, *, role)
Get children of a node with given role.
- Parameters
node_id (int) – ID of target node.
role (str) – the target role.
- Yields
ParserNode – the children nodes with given role.
get_heads(root_id=0, *, semantic=True, deep=True)
Get all head nodes of a subtree.
- Parameters
root_id (int) – ID of the root node of target subtree.
semantic (bool) – use semantic/syntactic policy. For semantic mode, return DUMMY or head instead of syntactic Head.
deep (bool) – find heads recursively.
- Yields
ParserNode – the head nodes.
get_relations(root_id=0, *, semantic=True)
Get all relations of a subtree.
- Parameters
root_id (int) – ID of the subtree root node.
semantic (bool) – refer to get_heads() for policy details.
- Yields
ParserRelation – the relations.
ckipnlp.util.ws module¶
class ckipnlp.util.ws.WsWord
Bases: tuple
A word-segmented word.
property word
str – the word.
property pos
str – the POS-tag.
classmethod from_text(data)
Construct an instance from ckipnlp.ws.CkipWs output.
- Parameters
data (str) – text such as '中文字(Na)'.
Notes
'中文字(Na)' -> word = '中文字', pos = 'Na'
'中文字' -> word = '中文字', pos = None
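The two cases in the notes above can be mirrored by a small standalone sketch. It is not the library's implementation: `Word` and `parse_word_text` are hypothetical names, and the regular expression is just one possible way to peel an optional trailing `(POS)` suffix off a word.

```python
import re
from typing import NamedTuple, Optional


class Word(NamedTuple):
    """Hypothetical stand-in with the same two fields as WsWord."""
    word: str
    pos: Optional[str]


# A word followed by an optional '(POS)' suffix at the very end.
_WORD_POS = re.compile(r'^(?P<word>.*?)(?:\((?P<pos>[^()]*)\))?$', re.DOTALL)


def parse_word_text(text):
    """Split 'word(pos)' text; pos is None when no tag is attached."""
    match = _WORD_POS.match(text)
    return Word(match.group('word'), match.group('pos'))


for sample in ('中文字(Na)', '中文字'):
    print(sample, '->', parse_word_text(sample))
```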
classmethod from_dict(data)
Construct an instance from Python built-in containers.
- Parameters
data (dict) – dictionary such as { 'word': '中文字', 'pos': 'Na' }.
classmethod from_json(data, **kwargs)
Construct an instance from JSON format.
- Parameters
data (str) – refer to from_dict() for format details.
class ckipnlp.util.ws.WsSentence(initlist=None)
Bases: collections.UserList
A word-segmented sentence.
classmethod from_text(data)
Construct an instance from ckipnlp.ws.CkipWs output.
- Parameters
data (str) – text such as '中文字(Na)\u3000喔(T)'.
classmethod from_dict(data)
Construct an instance from Python built-in containers.
- Parameters
data (Sequence[dict]) – list of objects as WsWord.from_dict() input.
classmethod from_json(data, **kwargs)
Construct an instance from JSON format.
- Parameters
data (str) – refer to from_dict() for format details.
class ckipnlp.util.ws.WsSentenceList(initlist=None)
Bases: collections.UserList
A list of word-segmented sentences.
item_class
alias of WsSentence
classmethod from_text(data)
Construct an instance from ckipnlp.ws.CkipWs output.
- Parameters
data (Sequence[str]) – list of texts as WsSentence.from_text() input.
classmethod from_dict(data)
Construct an instance from Python built-in containers.
- Parameters
data (Sequence[Sequence[dict]]) – list of objects as WsSentence.from_dict() input.
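The nested input shape (a list of sentences, each a list of word dictionaries) may be easier to see written out. The sketch below uses only the standard `json` module and the sample words from the Utilities section; `render` is a hypothetical helper, and joining words as `word(pos)` with U+3000 is an assumption based on the `WsSentence.from_text()` example.

```python
import json

# Two sentences in the nested list-of-lists-of-dicts layout described above.
sentences = [
    [{'word': '中文字', 'pos': 'Na'}, {'word': '喔', 'pos': 'T'}],
    [{'word': '啊哈', 'pos': 'I'}, {'word': '哈哈', 'pos': 'D'}],
]

payload = json.dumps(sentences, ensure_ascii=False)  # keep CJK readable
restored = json.loads(payload)


def render(sentence):
    """Join words as 'word(pos)' separated by U+3000 (assumed text layout)."""
    return '\u3000'.join(f"{w['word']}({w['pos']})" for w in sentence)


for sentence in restored:
    print(render(sentence))
```

A string like `payload` is the kind of value the JSON-based constructors would be expected to accept, given that they refer back to the `from_dict()` format.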
classmethod from_json(data, **kwargs)
Construct an instance from JSON format.
- Parameters
data (str) – refer to from_dict() for format details.
ckipnlp.ws package¶
class ckipnlp.ws.CkipWs(*, logger=False, ini_file=None, lex_list=None, **kwargs)
Bases: object
The CKIP word segmentation driver.
- Parameters
logger (bool) – enable logger.
lex_list (Iterable) – passed to ckipnlp.util.ini.create_ws_lex(); overrides lex_file for ckipnlp.util.ini.create_ws_ini().
ini_file (str) – the path to the INI file.
- Other Parameters
** – the configs for CKIPWS, passed to ckipnlp.util.ini.create_ws_ini(), ignored if ini_file is set.
Danger
Never instantiate more than one object of this class!
apply(text)
Segment a sentence.
- Parameters
text (str) – the input sentence.
- Returns
str – the output sentence.
Hint
One may also call this method as __call__().