CKIP CoreNLP Wrappers
Introduction
Author
- Mu Yang <emfomy@gmail.com>
Documentation
Requirements
CkipWs (Optional)
- CKIP Word Segmentation Linux version (20190524+)
CkipParser (Optional)
- CKIP Parser Linux version (20190506+)
- Boost C++ Libraries 1.54.0
Installation
Denote <ckipws-linux-root> as the root path of CKIPWS Linux Version, and <ckipparser-linux-root> as the root path of CKIP-Parser Linux Version.
Step 1: Set up the CKIPWS & CKIP-Parser environment
Add the following command to ~/.bashrc:
export LD_LIBRARY_PATH=<ckipws-linux-root>/lib:<ckipparser-linux-root>/lib:$LD_LIBRARY_PATH
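To confirm that a new shell picked up this setting, you can compare LD_LIBRARY_PATH against the two lib directories in Python before importing ckipnlp. The following is a minimal sketch. The helper name `missing_lib_dirs` and the example paths under /opt are hypothetical placeholders, so substitute your real roots. The helper only inspects the variable. It does not check that the libraries are actually present.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_lib_dirs(required: Iterable[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the entries of `required` that are not on LD_LIBRARY_PATH.

    Hypothetical helper: it only inspects the variable, it does not check
    that the directories exist or contain the CKIP libraries.
    """
    env = os.environ if env is None else env
    # LD_LIBRARY_PATH is colon-separated; skip empty entries, normalise paths
    entries = {os.path.normpath(p)
               for p in env.get('LD_LIBRARY_PATH', '').split(':') if p}
    return [d for d in required if os.path.normpath(d) not in entries]


if __name__ == '__main__':
    # Placeholder paths: replace with <ckipws-linux-root>/lib and
    # <ckipparser-linux-root>/lib
    wanted = ['/opt/ckipws/lib', '/opt/ckipparser/lib']
    missing = missing_lib_dirs(wanted)
    print('missing:', missing if missing else 'none')
```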
Step 2: Install using pip
pip install ckipnlp \
--install-option='--ws' \
--install-option='--ws-dir=<ckipws-linux-root>' \
--install-option='--parser' \
--install-option='--parser-dir=<ckipparser-linux-root>'
Omit the --ws/--parser options (and their directory options) if you do not have CKIPWS/CKIP-Parser.
Installation Options
Option | Detail | Default Value |
---|---|---|
--[no-]ws | Enable/disable CKIPWS. | False |
--[no-]parser | Enable/disable CKIP-Parser. | False |
--ws-dir=<ws-dir> | CKIPWS root directory. | |
--ws-lib-dir=<ws-lib-dir> | CKIPWS libraries directory. | <ws-dir>/lib |
--ws-share-dir=<ws-share-dir> | CKIPWS share directory. | <ws-dir> |
--parser-dir=<parser-dir> | CKIP-Parser root directory. | |
--parser-lib-dir=<parser-lib-dir> | CKIP-Parser libraries directory. | <parser-dir>/lib |
--parser-share-dir=<parser-share-dir> | CKIP-Parser share directory. | <parser-dir> |
--data2-dir=<data2-dir> | “Data2” directory. | <ws-share-dir>/Data2 |
--rule-dir=<rule-dir> | “Rule” directory. | <parser-share-dir>/Rule |
--rdb-dir=<rdb-dir> | “RDB” directory. | <parser-share-dir>/RDB |
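The defaults in the table chain together. For example, `--data2-dir` falls back to a path under `--ws-share-dir`, which in turn falls back to `--ws-dir`. The minimal sketch below shows how those fallbacks resolve. `resolve_install_dirs` is a hypothetical helper, not part of ckipnlp's setup script. Its keys simply mirror the option names with dashes replaced by underscores.

```python
import os
from typing import Dict, Optional


def resolve_install_dirs(ws_dir: Optional[str] = None,
                         parser_dir: Optional[str] = None,
                         **overrides: Optional[str]) -> Dict[str, Optional[str]]:
    """Fill in unset directory options following the defaults in the table.

    Hypothetical helper for illustration only.
    """
    d: Dict[str, Optional[str]] = {'ws_dir': ws_dir, 'parser_dir': parser_dir}
    d.update(overrides)

    def pick(key: str, base: Optional[str], *parts: str) -> None:
        # Keep an explicit value; otherwise derive it from `base` when known
        if d.get(key) is None:
            d[key] = os.path.join(base, *parts) if base is not None else None

    # Order matters: share directories must be resolved before Data2/Rule/RDB
    pick('ws_lib_dir', d['ws_dir'], 'lib')
    pick('ws_share_dir', d['ws_dir'])
    pick('parser_lib_dir', d['parser_dir'], 'lib')
    pick('parser_share_dir', d['parser_dir'])
    pick('data2_dir', d['ws_share_dir'], 'Data2')
    pick('rule_dir', d['parser_share_dir'], 'Rule')
    pick('rdb_dir', d['parser_share_dir'], 'RDB')
    return d
```

As a usage example, `resolve_install_dirs(ws_dir='/w', ws_share_dir='/s')` yields `/s/Data2` for the Data2 directory. Overriding the share directory also moves the Data2 lookup.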
Usage
See http://ckipnlp.readthedocs.io/ for API details.
CKIPWS
import os
import ckipnlp.ws
print(ckipnlp.__name__, ckipnlp.__version__)
ws = ckipnlp.ws.CkipWs(logger=False)
# Segment a single sentence
print(ws('中文字喔'))
# Segment a list of sentences
for line in ws.apply_list(['中文字喔', '啊哈哈哈']):
    print(line)
# Segment a file; make sure the output directory exists first
os.makedirs('output', exist_ok=True)
ws.apply_file(ifile='sample/sample.txt', ofile='output/sample.tag', uwfile='output/sample.uw')
with open('output/sample.tag') as fin:
    print(fin.read())
with open('output/sample.uw') as fin:
    print(fin.read())
CKIP-Parser
import os
import ckipnlp.parser
print(ckipnlp.__name__, ckipnlp.__version__)
ps = ckipnlp.parser.CkipParser(logger=False)
# Parse a single sentence
print(ps('中文字喔'))
# Parse a list of sentences
for line in ps.apply_list(['中文字喔', '啊哈哈哈']):
    print(line)
# Parse a file; make sure the output directory exists first
os.makedirs('output', exist_ok=True)
ps.apply_file(ifile='sample/sample.txt', ofile='output/sample.tree')
with open('output/sample.tree') as fin:
    print(fin.read())
Utilities
import ckipnlp
print(ckipnlp.__name__, ckipnlp.__version__)
from ckipnlp.util.ws import *
from ckipnlp.util.parser import *
# Format CkipWs output
ws_text = ['中文字(Na) 喔(T)', '啊哈(I) 哈哈(D)']
for text in ws_text: print(ckipnlp.util.ws.WsSentence.from_text(text))
for text in ws_text: print(repr(ckipnlp.util.ws.WsSentence.from_text(text)))
# Show CkipParser output as tree
tree_text = 'S(theme:NP(property:N‧的(head:Nhaa:我|Head:DE:的)|Head:Nad(DUMMY1:Nab:早餐|Head:Caa:和|DUMMY2:Naa:午餐))|quantity:Dab:都|Head:VC31:吃完|aspect:Di:了)'
tree = ParserTree.from_text(tree_text)
tree.show()
# Get dummies of node 5
for node in tree.get_dummies(5): print(node)
# Get heads of node 1
for node in tree.get_heads(1): print(node)
# Get relations
for r in tree.get_relations(0): print(r)
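ParserTree.from_text() turns bracketed text like tree_text into a treelib tree. To see the structure of that format, here is a minimal stand-alone sketch of a recursive-descent parser. The grammar it assumes (label, then an optional parenthesised list of children separated by "|") is inferred from the sample string and is not taken from ckipnlp's code. The nested (label, children) tuples and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical nested representation: (label, children); leaves have no children
Tree = Tuple[str, List['Tree']]


def parse_tree(text: str) -> Tree:
    """Parse CkipParser-style tree text such as 'S(a:Na:x|b:VP(c:VC:y))'.

    Assumed grammar, inferred from the sample above:
        node := label [ '(' node ( '|' node )* ')' ]
    Labels are kept as raw strings; '(', ')' and '|' are treated as syntax.
    """
    pos = 0

    def node() -> Tree:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in '()|':
            pos += 1
        label = text[start:pos]
        children: List[Tree] = []
        if pos < len(text) and text[pos] == '(':
            pos += 1  # consume '('
            children.append(node())
            while pos < len(text) and text[pos] == '|':
                pos += 1  # consume '|'
                children.append(node())
            if pos >= len(text) or text[pos] != ')':
                raise ValueError(f'missing ")" at position {pos}')
            pos += 1  # consume ')'
        return label, children

    tree = node()
    if pos != len(text):
        raise ValueError(f'unexpected {text[pos]!r} at position {pos}')
    return tree


def leaves(tree: Tree) -> List[str]:
    """Return the leaf labels from left to right."""
    label, children = tree
    if not children:
        return [label]
    return [leaf for child in children for leaf in leaves(child)]
```

Joining the last colon-separated field of each leaf label reconstructs the input sentence. That field is the term.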
FAQ
- CKIPWS throws “what(): locale::facet::_S_create_c_locale name not valid”. What should I do?
  Install the missing locales:
  apt-get install locales-all
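This C++ runtime error most commonly means that the locale named by the environment (for example through LANG or LC_ALL) is not installed. A minimal sketch for checking that from Python, before starting CKIPWS, uses the standard `locale` module. The helper name `locale_available` is hypothetical. Passing an empty name asks the C library for the environment's locale, which is an assumption about what CKIPWS itself requests.

```python
import locale
from typing import Optional


def locale_available(name: Optional[str] = None) -> bool:
    """Return True if the C library accepts `name` for LC_ALL.

    name=None tries the locale named by the environment (LANG/LC_ALL).
    The previous process locale is always restored.
    """
    saved = locale.setlocale(locale.LC_ALL)  # query without changing
    try:
        locale.setlocale(locale.LC_ALL, name if name is not None else '')
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # restore the original locale


if __name__ == '__main__':
    print('environment locale usable:', locale_available())
```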
License
ckipnlp.ws package
- class ckipnlp.ws.CkipWs(*, logger=False, inifile=None, **options)
  Bases: object
  The CKIP word segmentation driver.
  Parameters:
  - logger (bool) – whether to enable the logger.
  - inifile (str) – the path to the INI file.
  - options – the options; see ckipnlp.util.ini.create_ws_ini().
  - apply(text)
    Segment a sentence.
    Parameters: text (str) – the input sentence.
    Returns: str – the output sentence.
    Notes: One may also call this method as __call__().
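The note that apply() may also be invoked as __call__() means that the driver object itself is callable: ws(text) and ws.apply(text) do the same thing. The pattern can be sketched with a stand-in class. `FakeWs` and its character-splitting behaviour are hypothetical placeholders and do not perform real segmentation. Binding `__call__` to the same function is one simple way to get this alias. The library's own implementation may differ.

```python
class FakeWs:
    """Hypothetical stand-in that mimics the call pattern of CkipWs."""

    def apply(self, text: str) -> str:
        # Placeholder "segmentation": separate every character with a space
        return ' '.join(text)

    # Calling the instance runs apply(), so ws(text) == ws.apply(text)
    __call__ = apply


ws = FakeWs()
print(ws('中文字喔'))  # same result as ws.apply('中文字喔')
```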
ckipnlp.parser package
- class ckipnlp.parser.CkipParser(*, logger=False, inifile=None, wsinifile=None, **options)
  Bases: object
  The CKIP sentence parsing driver.
  Parameters:
  - logger (bool) – whether to enable the logger.
  - inifile (str) – the path to the INI file.
  - wsinifile (str) – the path to the INI file for CKIPWS.
  - options – the options; see ckipnlp.util.ini.create_ws_ini() and ckipnlp.util.ini.create_parser_ini().
  - apply(text)
    Parse a sentence.
    Parameters: text (str) – the input sentence.
    Returns: str – the output sentence.
    Notes: One may also call this method as __call__().
ckipnlp.util package
Submodules
ckipnlp.util.ini module
- ckipnlp.util.ini.create_ws_ini(*, data2dir=None, lexfile=None, new_style_format=False, show_category=True, sentence_max_word_num=80, **options)
  Generate the CKIP word segmentation config.
  Parameters:
  - data2dir (str) – the path to the folder “Data2/”.
  - lexfile (str) – the path to the user-defined lexicon file.
  - new_style_format (bool) – split sentences by newline characters (“\n”) rather than by punctuation.
  - show_category (bool) – show part-of-speech tags.
  - sentence_max_word_num (int) – the maximum number of words per sentence.
- ckipnlp.util.ini.create_parser_ini(*, wsinifile, ruledir=None, rdbdir=None, do_ws=True, do_parse=True, do_role=True, sentence_delim=',, ;。!?', **options)
  Generate the CKIP parser config.
  Parameters:
  - wsinifile (str) – the path to the INI file for CKIPWS.
  - ruledir (str) – the path to the folder “Rule/”.
  - rdbdir (str) – the path to the folder “RDB/”.
  - do_ws (bool) – perform word segmentation.
  - do_parse (bool) – perform parsing.
  - do_role (bool) – assign roles.
  - sentence_delim (str) – the sentence delimiters.
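sentence_delim is a string of delimiter characters. To illustrate what such a set expresses, the minimal sketch below splits text after each delimiter and keeps the delimiter attached to its sentence. This is an assumption made for illustration and is not a description of how CKIP-Parser splits its input internally. The function name `split_sentences` and the full-width delimiters used in the usage note are hypothetical examples.

```python
import re
from typing import List


def split_sentences(text: str, delims: str) -> List[str]:
    """Split `text` after every character in `delims`, keeping the delimiter.

    Hypothetical illustration of a delimiter set; not CKIP-Parser's own logic.
    """
    if not delims:
        return [text] if text else []
    # Build a character class; re.escape guards characters such as '-' or ']'
    pattern = '[' + ''.join(re.escape(c) for c in delims) + ']'
    pieces: List[str] = []
    start = 0
    for m in re.finditer(pattern, text):
        pieces.append(text[start:m.end()])
        start = m.end()
    if start < len(text):
        pieces.append(text[start:])  # trailing text without a delimiter
    return pieces
```

For example, `split_sentences('你好，世界。再見', '，。')` returns the three pieces '你好，', '世界。' and '再見'.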
ckipnlp.util.parser module
- class ckipnlp.util.parser.ParserNode
  Bases: ckipnlp.util.parser._ParserNode
  A parser node.
  Fields:
  - role (str): the role.
  - pos (str): the POS tag.
  - term (str): the text term.
  - classmethod from_text(text)
    Create a ParserNode object from ckipnlp.parser.CkipParser output.
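In the sample tree from the Utilities section, node labels come in three shapes: 'head:Nhaa:我' (role, POS tag, term), 'theme:NP' (role and POS tag) and a bare 'S'. The minimal sketch below splits such a label into the three ParserNode fields. The label shapes are inferred from that sample, and `NodeLabel` and `parse_label` are hypothetical names, not ParserNode.from_text itself.

```python
from typing import NamedTuple, Optional


class NodeLabel(NamedTuple):
    """Hypothetical mirror of the ParserNode fields."""
    role: Optional[str]
    pos: str
    term: Optional[str]


def parse_label(text: str) -> NodeLabel:
    """Parse 'role:pos:term', 'role:pos' or a bare 'pos' (assumed shapes)."""
    parts = text.split(':', 2)  # at most three fields; a term may contain ':'
    if len(parts) == 3:
        return NodeLabel(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return NodeLabel(parts[0], parts[1], None)
    return NodeLabel(None, parts[0], None)
```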
- class ckipnlp.util.parser.ParserRelationNode
  Bases: ckipnlp.util.parser._ParserRelationNode
  A parser relation node.
  Fields:
  - node (treelib.Node): the node.
  - role (str): the relation role.
- class ckipnlp.util.parser.ParserRelation
  Bases: ckipnlp.util.parser._ParserRelation
  A parser relation.
  Fields:
  - head (ParserRelationNode): the head node.
  - tail (ParserRelationNode): the tail node.
- class ckipnlp.util.parser.ParserTree(tree=None, deep=False, node_class=None)
  Bases: treelib.tree.Tree
  A parsed tree.
  - classmethod from_text(tree_text)
    Create a ParserTree object from ckipnlp.parser.CkipParser output.
  - has_dummies(node_id)
    Determine whether a node has dummies.
    Parameters: node_id (int) – ID of the target node.
    Returns: bool – whether the target node has dummies.
  - get_dummies(node_id, deep=True, _check=True)
    Get the dummies of a node.
    Parameters:
    - node_id (int) – ID of the target node.
    - deep (bool) – find dummies recursively.
    Returns: tuple – the dummies (ParserNode).
    Raises: LookupError – when the target node has no dummy (only when _check is set).
  - get_heads(root_id=0, deep=True)
    Get all head nodes of a subtree.
    Parameters:
    - root_id (int) – ID of the root node of the target subtree.
    - deep (bool) – find heads recursively.
    Returns:
    - list – the head nodes (ParserNode).
    - ParserNode – the head node (when deep is set).
    Todo: Get information of nodes with pos type PP or GP.
  - get_relations(root_id=0)
    Get all relations of a subtree.
    Parameters: root_id (int) – ID of the root node of the subtree.
    Yields: ParserRelation – the relation.
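get_dummies() and get_heads() both select nodes by their role. In the sample tree, the coordination 'Head:Nad' has children with the roles DUMMY1, Head and DUMMY2. The minimal sketch below selects direct children by role prefix on a hand-built copy of that phrase. It assumes that heads carry the role 'Head' and dummies carry 'DUMMY1', 'DUMMY2', and so on, as in the sample. It does not reproduce ParserTree's recursion (deep=True) or its other rules. The names `coordination`, `role` and `children_with_role` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical (label, children) tree; labels follow 'role:pos[:term]'
Tree = Tuple[str, List['Tree']]

coordination: Tree = ('Head:Nad', [
    ('DUMMY1:Nab:早餐', []),
    ('Head:Caa:和', []),
    ('DUMMY2:Naa:午餐', []),
])


def role(label: str) -> str:
    """Return the role field of a 'role:pos[:term]' label."""
    return label.split(':', 1)[0]


def children_with_role(tree: Tree, prefix: str) -> List[str]:
    """Labels of direct children whose role starts with `prefix` (non-recursive)."""
    _, children = tree
    return [label for label, _ in children if role(label).startswith(prefix)]


print(children_with_role(coordination, 'DUMMY'))
print(children_with_role(coordination, 'Head'))
```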
ckipnlp.util.ws module
- class ckipnlp.util.ws.WsWord
  Bases: ckipnlp.util.ws._WsWord
  A word-segmented word.
  Fields:
  - word (str): the word.
  - pos (str): the POS tag.
  - classmethod from_text(text)
    Create a WsWord object from ckipnlp.ws.CkipWs output.
- class ckipnlp.util.ws.WsSentence(initlist=None)
  Bases: collections.UserList
  A word-segmented sentence.
  Items: WsWord – the words.
  - classmethod from_text(text)
    Create a WsSentence object from ckipnlp.ws.CkipWs output.
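CKIPWS output lines such as '中文字(Na) 喔(T)' are what WsWord.from_text and WsSentence.from_text consume. The minimal sketch below parses that format into (word, pos) pairs. It assumes tokens are separated by whitespace and that each token ends with a parenthesised POS tag, as in the Utilities example. `Word` and `parse_ws_sentence` are hypothetical names and are not ckipnlp's implementation.

```python
import re
from typing import List, NamedTuple


class Word(NamedTuple):
    """Hypothetical mirror of the WsWord fields."""
    word: str
    pos: str


# Assumed token shape 'word(POS)'; the greedy group lets a word itself be '('
_TOKEN = re.compile(r'^(.*)\(([^()]*)\)$')


def parse_ws_sentence(text: str) -> List[Word]:
    """Split a segmented line on whitespace and parse each 'word(POS)' token."""
    words: List[Word] = []
    for token in text.split():
        m = _TOKEN.match(token)
        if m is None:
            raise ValueError(f'not a word(POS) token: {token!r}')
        words.append(Word(m.group(1), m.group(2)))
    return words
```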
Todo List
Todo: Get information of nodes with pos type PP or GP.
(The original entry is in the docstring of ckipnlp.util.parser.ParserTree.get_heads.)