CKIP CoreNLP Wrappers¶
Introduction¶
Author¶
- Mu Yang <emfomy@gmail.com>
Requirements¶
Note
For Python 2 users, please use PyCkip 0.4.2 instead.
CkipWs (Optional)¶
- CKIP Word Segmentation Linux version 20190524+
CkipParser (Optional)¶
- CKIP Parser Linux version 20190506+ (20190725+ recommended)
Installation¶
Let <ckipws-linux-root> denote the root path of the CKIPWS Linux version, and <ckipparser-linux-root> the root path of the CKIP-Parser Linux version.
Install Using Pip¶
pip install --upgrade ckipnlp
pip install --no-deps --force-reinstall --upgrade ckipnlp \
--install-option='--ws' \
--install-option='--ws-dir=<ckipws-linux-root>' \
--install-option='--parser' \
--install-option='--parser-dir=<ckipparser-linux-root>'
Omit the ws/parser options for any component (CKIPWS or CKIP-Parser) that is not installed.
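For scripted installs, the second pip command above can be assembled programmatically. The following is a minimal sketch and not part of ckipnlp: build_pip_command is a hypothetical helper, /opt/ckipws is a placeholder path, and only the flags shown above are emitted.

```python
import shlex


def build_pip_command(ws_dir=None, parser_dir=None):
    """Assemble the pip invocation shown above as an argument list.

    Hypothetical helper: pass None for a component that is not installed
    and its options are left out.
    """
    cmd = ["pip", "install", "--no-deps", "--force-reinstall",
           "--upgrade", "ckipnlp"]
    if ws_dir is not None:
        cmd += ["--install-option=--ws",
                f"--install-option=--ws-dir={ws_dir}"]
    if parser_dir is not None:
        cmd += ["--install-option=--parser",
                f"--install-option=--parser-dir={parser_dir}"]
    return cmd


# Placeholder path; replace with the real <ckipws-linux-root>.
cmd = build_pip_command(ws_dir="/opt/ckipws")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list form can be passed directly to subprocess.run(cmd, check=True) without going through a shell.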
Installation Options¶
Option | Detail | Default Value |
---|---|---|
--[no-]ws | Enable/disable CKIPWS. | False |
--[no-]parser | Enable/disable CKIP-Parser. | False |
--ws-dir=<ws-dir> | CKIPWS root directory. | |
--ws-lib-dir=<ws-lib-dir> | CKIPWS libraries directory. | <ws-dir>/lib |
--ws-share-dir=<ws-share-dir> | CKIPWS share directory. | <ws-dir> |
--parser-dir=<parser-dir> | CKIP-Parser root directory. | |
--parser-lib-dir=<parser-lib-dir> | CKIP-Parser libraries directory. | <parser-dir>/lib |
--parser-share-dir=<parser-share-dir> | CKIP-Parser share directory. | <parser-dir> |
--data2-dir=<data2-dir> | “Data2” directory. | <ws-share-dir>/Data2 |
--rule-dir=<rule-dir> | “Rule” directory. | <parser-share-dir>/Rule |
--rdb-dir=<rdb-dir> | “RDB” directory. | <parser-share-dir>/RDB |
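The defaults in the table cascade: the Data2, Rule and RDB directories fall back to the share directories, which in turn fall back to the root directories. The sketch below illustrates that cascade only. resolve_install_dirs is a hypothetical helper, not the installer's actual code, and the paths are placeholders.

```python
from pathlib import PurePosixPath


def resolve_install_dirs(ws_dir=None, parser_dir=None, **given):
    """Fill in unspecified directories using the defaults from the table.

    Hypothetical helper; keyword names mirror the option names with
    dashes replaced by underscores (e.g. ws_share_dir).
    """
    def pick(key, default):
        # An explicitly given directory wins over the default.
        value = given.get(key)
        return PurePosixPath(value) if value is not None else default

    def child(base, name):
        # A default can only be derived when its base directory is known.
        return base / name if base is not None else None

    ws = PurePosixPath(ws_dir) if ws_dir is not None else None
    ps = PurePosixPath(parser_dir) if parser_dir is not None else None
    dirs = {
        "ws_lib_dir": pick("ws_lib_dir", child(ws, "lib")),
        "ws_share_dir": pick("ws_share_dir", ws),
        "parser_lib_dir": pick("parser_lib_dir", child(ps, "lib")),
        "parser_share_dir": pick("parser_share_dir", ps),
    }
    dirs["data2_dir"] = pick("data2_dir", child(dirs["ws_share_dir"], "Data2"))
    dirs["rule_dir"] = pick("rule_dir", child(dirs["parser_share_dir"], "Rule"))
    dirs["rdb_dir"] = pick("rdb_dir", child(dirs["parser_share_dir"], "RDB"))
    return dirs


# Placeholder paths for illustration.
for name, path in resolve_install_dirs(ws_dir="/opt/ckipws",
                                       ws_share_dir="/srv/ckipws").items():
    print(name, path)
```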
Usage¶
See http://ckipnlp.readthedocs.io/ for API details.
CKIPWS¶
import ckipnlp.ws
print(ckipnlp.__name__, ckipnlp.__version__)
# Create the word segmentation driver
ws = ckipnlp.ws.CkipWs(logger=False)
# Segment a single sentence
print(ws('中文字喔'))
# Segment a list of sentences
for l in ws.apply_list(['中文字喔', '啊哈哈哈']): print(l)
# Segment a file and print the output files
ws.apply_file(ifile='sample/sample.txt', ofile='output/sample.tag', uwfile='output/sample.uw')
with open('output/sample.tag') as fin:
    print(fin.read())
with open('output/sample.uw') as fin:
    print(fin.read())
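When only word/tag pairs are needed, the segmented text can be post-processed without extra dependencies. The sketch below assumes the word(POS) token format that appears in the Utilities example (tokens separated by whitespace, tag in trailing parentheses). parse_ws_line is a hypothetical helper, not part of ckipnlp.

```python
def parse_ws_line(line):
    """Split one segmented line into (word, pos) pairs.

    Assumes tokens look like word(POS) and are separated by whitespace.
    """
    pairs = []
    for token in line.split():
        # Split on the last '(' so that a word containing '(' still works.
        word, sep, rest = token.rpartition("(")
        if not sep or not rest.endswith(")"):
            raise ValueError(f"not a word(POS) token: {token!r}")
        pairs.append((word, rest[:-1]))
    return pairs


print(parse_ws_line("中文字(Na) 喔(T)"))
# [('中文字', 'Na'), ('喔', 'T')]
```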
CKIP-Parser¶
import ckipnlp.parser
print(ckipnlp.__name__, ckipnlp.__version__)
# Create the parser driver
ps = ckipnlp.parser.CkipParser(logger=False)
# Parse a single sentence
print(ps('中文字喔'))
# Parse a list of sentences
for l in ps.apply_list(['中文字喔', '啊哈哈哈']): print(l)
# Parse a file and print the output file
ps.apply_file(ifile='sample/sample.txt', ofile='output/sample.tree')
with open('output/sample.tree') as fin:
    print(fin.read())
Utilities¶
import ckipnlp
print(ckipnlp.__name__, ckipnlp.__version__)
from ckipnlp.util.ws import *
from ckipnlp.util.parser import *
# Format CkipWs output
ws_text = ['中文字(Na) 喔(T)', '啊哈(I) 哈哈(D)']
for text in ws_text: print(ckipnlp.util.ws.WsSentence.from_text(text))
for text in ws_text: print(repr(ckipnlp.util.ws.WsSentence.from_text(text)))
# Show CkipParser output as tree
tree_text = 'S(theme:NP(property:N‧的(head:Nhaa:我|Head:DE:的)|Head:Nad(DUMMY1:Nab:早餐|Head:Caa:和|DUMMY2:Naa:午餐))|quantity:Dab:都|Head:VC31:吃完|aspect:Di:了)'
tree = ParserTree.from_text(tree_text)
tree.show()
# Get dummies of node 5
for node in tree.get_dummies(5): print(node)
# Get heads of node 1
for node in tree.get_heads(1): print(node)
# Get relations
for r in tree.get_relations(0): print(r)
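To make the tree text format easier to read, here is a small recursive-descent parser that turns it into nested dictionaries. It is a sketch under assumptions, not the implementation behind ParserTree.from_text: the grammar is inferred from the example string above, and parse_tree and leaf_terms are hypothetical names.

```python
def parse_tree(text):
    """Parse CkipParser-style tree text into nested dicts.

    Assumed grammar (inferred from the example, not an official spec):
      internal node:  [role:]pos(child|child|...)
      leaf node:      role:pos:term
    Terms containing '(', ')', '|' or ':' are not handled.
    """
    i = 0  # current offset into text

    def parse_node():
        nonlocal i
        start = i
        while i < len(text) and text[i] not in "(|)":
            i += 1
        label = text[start:i]
        if i < len(text) and text[i] == "(":
            role, _, tag = label.rpartition(":")
            i += 1  # consume '('
            children = [parse_node()]
            while i < len(text) and text[i] == "|":
                i += 1  # consume '|'
                children.append(parse_node())
            if i >= len(text) or text[i] != ")":
                raise ValueError(f"expected ')' at offset {i}")
            i += 1  # consume ')'
            return {"role": role or None, "pos": tag,
                    "term": None, "children": children}
        parts = label.split(":")
        if len(parts) != 3:
            raise ValueError(f"malformed leaf: {label!r}")
        role, tag, term = parts
        return {"role": role, "pos": tag, "term": term, "children": []}

    root = parse_node()
    if i != len(text):
        raise ValueError(f"trailing text at offset {i}")
    return root


def leaf_terms(node):
    """Return the terms of all leaves, left to right."""
    if not node["children"]:
        return [node["term"]]
    return [t for child in node["children"] for t in leaf_terms(child)]


tree_text = 'S(theme:NP(property:N‧的(head:Nhaa:我|Head:DE:的)|Head:Nad(DUMMY1:Nab:早餐|Head:Caa:和|DUMMY2:Naa:午餐))|quantity:Dab:都|Head:VC31:吃完|aspect:Di:了)'
tree = parse_tree(tree_text)
print(tree["pos"], [child["role"] for child in tree["children"]])
print(leaf_terms(tree))
```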
FAQ¶
Warning
The CKIPWS throws “what(): locale::facet::_S_create_c_locale name not valid”. What should I do?
Install locale data.
apt-get install locales-all
Warning
The CKIPParser throws “ImportError: libCKIPParser.so: cannot open shared object file: No such file or directory”. What should I do?
Add the following command to ~/.bashrc:
export LD_LIBRARY_PATH=<ckipparser-linux-root>/lib:$LD_LIBRARY_PATH
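To check whether the export took effect, one can look for the library in the directories listed in LD_LIBRARY_PATH. The snippet below is a diagnostic sketch only: find_shared_library is a hypothetical helper, and it inspects LD_LIBRARY_PATH alone, not the other locations the dynamic loader may search.

```python
import os


def find_shared_library(name, search_path=None):
    """Return the directories of an LD_LIBRARY_PATH-style string
    that contain a file called `name`."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries produced by leading/trailing separators.
        if directory and os.path.isfile(os.path.join(directory, name)):
            hits.append(directory)
    return hits


hits = find_shared_library("libCKIPParser.so")
print(hits if hits else "libCKIPParser.so not found via LD_LIBRARY_PATH")
```

Note that the dynamic loader reads LD_LIBRARY_PATH when the process starts, so the variable has to be exported before Python is launched; changing os.environ inside a running interpreter does not help.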
ckipnlp.ws package¶
class ckipnlp.ws.CkipWs(*, logger=False, inifile=None, **options)¶
Bases: object
The CKIP word segmentation driver.
Parameters:
- logger (bool) – enable logger.
- inifile (str) – the path to the INI file.
- options – the options, see ckipnlp.util.ini.create_ws_ini().
apply(text)¶
Segment a sentence.
Parameters: text (str) – the input sentence.
Returns: str – the output sentence.
Notes
One may also call this method as __call__().
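The signature has two properties worth illustrating: the bare * makes every constructor argument keyword-only, and calling the instance is equivalent to calling apply(). The class below is a hypothetical stand-in that mimics only this calling convention. It does not use CKIPWS, and its "segmentation" is a placeholder.

```python
class EchoDriver:
    """Hypothetical stand-in for a driver; it does not call CKIPWS."""

    def __init__(self, *, logger=False, inifile=None, **options):
        # The bare `*` makes every argument keyword-only.
        self.logger = logger
        self.inifile = inifile
        self.options = options

    def apply(self, text):
        # Placeholder "segmentation": put a space between characters.
        return " ".join(text)

    def __call__(self, text):
        # Calling the instance delegates to apply().
        return self.apply(text)


# lexfile is forwarded through **options; the file name is a placeholder.
driver = EchoDriver(logger=False, lexfile="my_lexicon.txt")
print(driver("abc"), "|", driver.apply("abc"))
print(driver.options)
```

Passing an argument positionally, as in EchoDriver(True), raises TypeError.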
ckipnlp.parser package¶
class ckipnlp.parser.CkipParser(*, logger=False, inifile=None, wsinifile=None, **options)¶
Bases: object
The CKIP sentence parsing driver.
Parameters:
- logger (bool) – enable logger.
- inifile (str) – the path to the INI file.
- wsinifile (str) – the path to the INI file for CKIPWS.
- options – the options, see ckipnlp.util.ini.create_ws_ini() and ckipnlp.util.ini.create_parser_ini().
apply(text)¶
Parse a sentence.
Parameters: text (str) – the input sentence.
Returns: str – the output sentence.
Notes
One may also call this method as __call__().
ckipnlp.util package¶
Submodules¶
ckipnlp.util.ini module¶
ckipnlp.util.ini.create_ws_ini(*, data2dir=None, lexfile=None, new_style_format=False, show_category=True, sentence_max_word_num=80, **options)¶
Generate CKIP word segmentation config.
Parameters:
- data2dir (str) – the path to the folder “Data2/”.
- lexfile (str) – the path to the user-defined lexicon file.
- new_style_format (bool) – split sentences by newline characters (“\n”) rather than punctuation.
- show_category (bool) – show part-of-speech tags.
- sentence_max_word_num (int) – maximum number of words per sentence.
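The difference between the two sentence-splitting modes described by new_style_format can be pictured with plain Python. This is an illustration of the idea only: split_sentences is a hypothetical function, the delimiter set is an assumption, and the real segmenter may treat delimiters differently (for example, keep them in the output).

```python
import re


def split_sentences(text, new_style_format=False, delims=",。!?;"):
    """Split text into sentences.

    new_style_format=True : split on newline characters only.
    new_style_format=False: split on the punctuation in `delims`
                            (assumed set) and on newlines.
    """
    if new_style_format:
        parts = text.split("\n")
    else:
        parts = re.split("[" + re.escape(delims) + "\n]", text)
    # Drop empty pieces left by consecutive delimiters.
    return [p.strip() for p in parts if p.strip()]


text = "今天天氣很好,我們去散步。\n明天呢?"
print(split_sentences(text))
print(split_sentences(text, new_style_format=True))
```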
ckipnlp.util.ini.create_parser_ini(*, wsinifile, ruledir=None, rdbdir=None, do_ws=True, do_parse=True, do_role=True, sentence_delim=',, ;。!?', **options)¶
Generate CKIP parser config.
Parameters:
- ruledir (str) – the path to “Rule/”.
- rdbdir (str) – the path to “RDB/”.
- do_ws (bool) – do word-segmentation.
- do_parse (bool) – do parsing.
- do_role (bool) – do role.
- sentence_delim (str) – the sentence delimiters.
ckipnlp.util.parser module¶
class ckipnlp.util.parser.ParserNodeData¶
Bases: ckipnlp.util.parser._ParserNodeData
A parser node.
Fields:
- role (str): the role.
- pos (str): the POS-tag.
- term (str): the text term.
classmethod from_text(text)¶
Create ParserNodeData object from ckipnlp.parser.CkipParser output.
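The three fields map naturally onto a named tuple. The following sketch shows one way such a record could be built from a colon-separated node label. NodeData is a hypothetical class, not the library's ParserNodeData, and the handling of one- and two-part labels is an assumption based on labels such as S and theme:NP in the usage example.

```python
from typing import NamedTuple, Optional


class NodeData(NamedTuple):
    """Hypothetical record with the same three fields."""
    role: Optional[str]
    pos: str
    term: Optional[str]

    @classmethod
    def from_text(cls, text):
        """Build a record from 'pos', 'role:pos' or 'role:pos:term'."""
        parts = text.split(":")
        if len(parts) == 1:
            return cls(None, parts[0], None)
        if len(parts) == 2:
            return cls(parts[0], parts[1], None)
        if len(parts) == 3:
            return cls(*parts)
        raise ValueError(f"unexpected node label: {text!r}")


print(NodeData.from_text("Head:VC31:吃完"))
print(NodeData.from_text("theme:NP"))
print(NodeData.from_text("S"))
```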
class ckipnlp.util.parser.ParserNode(tag=None, identifier=None, expanded=True, data=None)¶
Bases: treelib.node.Node
A parser node for tree.
class ckipnlp.util.parser.ParserRelation¶
Bases: ckipnlp.util.parser._ParserRelation
A parser relation.
Fields:
- head (ParserNode): the head node.
- tail (ParserNode): the tail node.
- relation (str): the relation.
class ckipnlp.util.parser.ParserTree(tree=None, deep=False, node_class=None)¶
Bases: treelib.tree.Tree
A parsed tree.
classmethod from_text(tree_text)¶
Create ParserTree object from ckipnlp.parser.CkipParser output.
has_dummies(node_id)¶
Determine if a node has dummies.
Parameters: node_id (int) – ID of target node.
Returns: bool – whether or not target node has dummies.
get_dummies(node_id, deep=True, _check=True)¶
Get dummies of a node.
Parameters:
- node_id (int) – ID of target node.
- deep (bool) – find dummies recursively.
Returns: tuple – the dummies (ParserNode).
Raises: LookupError – when target node has no dummy (only when _check is set).
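The idea of collecting dummies can be sketched on a plain nested-dict tree. This is not the library's algorithm: get_dummies below is a hypothetical function, the rule "a child is a dummy if its role starts with DUMMY" is inferred from the DUMMY1/DUMMY2 roles in the usage example, and the meaning given to deep (replace a dummy by its own dummies when it has any) is an assumption.

```python
def leaf(role, pos, term):
    """Build a leaf node for the illustrative dict-based tree."""
    return {"role": role, "pos": pos, "term": term, "children": []}


def get_dummies(node, deep=True, check=True):
    """Return the children of `node` whose role starts with 'DUMMY'."""
    found = [c for c in node["children"]
             if (c["role"] or "").startswith("DUMMY")]
    if check and not found:
        raise LookupError("node has no dummy")
    if deep:
        expanded = []
        for dummy in found:
            # Assumed meaning of `deep`: descend into dummies that
            # themselves have dummies.
            nested = get_dummies(dummy, deep=True, check=False)
            expanded.extend(nested if nested else [dummy])
        found = expanded
    return tuple(found)


# The Nad subtree from the usage example, built by hand.
nad = {"role": "Head", "pos": "Nad", "term": None, "children": [
    leaf("DUMMY1", "Nab", "早餐"),
    leaf("Head", "Caa", "和"),
    leaf("DUMMY2", "Naa", "午餐"),
]}
print([d["term"] for d in get_dummies(nad)])
# ['早餐', '午餐']
```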
get_heads(root_id=0, deep=True)¶
Get all head nodes of a subtree.
Parameters:
- root_id (int) – ID of the root node of target subtree.
- deep (bool) – find heads recursively.
Returns:
- list – the head nodes (ParserNode).
- ParserNode – the head node (when deep is set).
Todo
Get information of nodes with pos type PP or GP.
get_relations(root_id=0)¶
Get all relations of a subtree.
Parameters: root_id (int) – ID of the subtree root node.
Yields: ParserRelation – the relation.
Todo List¶
Todo
Get information of nodes with pos type PP or GP.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/ckipnlp/checkouts/0.6.0/ckipnlp/util/parser.py:docstring of ckipnlp.util.parser.ParserTree.get_heads, line 11.)