ckipnlp.pipeline.core module
This module provides the core CKIPNLP pipeline.
-
class ckipnlp.pipeline.core.CkipDocument(*, raw=None, text=None, ws=None, pos=None, ner=None, parsed=None)
Bases: collections.abc.Mapping
The core document.
- Variables
raw (str) – The unsegmented text input.
text (TextParagraph) – The sentences.
ws (SegParagraph) – The word-segmented sentences.
pos (SegParagraph) – The part-of-speech tagged sentences.
ner (NerParagraph) – The named-entity recognition results.
parsed (ParsedParagraph) – The parsed sentences.
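To illustrate what a keyword-only, Mapping-based document can look like in practice, here is a minimal stand-in. `DocumentSketch` is a hypothetical class, not the library's implementation, and it assumes the mapping keys are the six field names listed above, which this page does not confirm.

```python
from collections.abc import Mapping


class DocumentSketch(Mapping):
    """Hypothetical stand-in for a Mapping-based document (not CkipDocument itself)."""

    # Assumption: the mapping keys are exactly the documented field names.
    _FIELDS = ("raw", "text", "ws", "pos", "ner", "parsed")

    def __init__(self, *, raw=None, text=None, ws=None, pos=None, ner=None, parsed=None):
        # The bare * makes every argument keyword-only, as in the signature above.
        self.raw = raw
        self.text = text
        self.ws = ws
        self.pos = pos
        self.ner = ner
        self.parsed = parsed

    def __getitem__(self, key):
        if key not in self._FIELDS:
            raise KeyError(key)
        return getattr(self, key)

    def __iter__(self):
        return iter(self._FIELDS)

    def __len__(self):
        return len(self._FIELDS)


doc = DocumentSketch(raw="中文字耶，啊哈哈哈")
fields = dict(doc)  # Mapping supplies keys(), items(), and dict() conversion
```

Subclassing `collections.abc.Mapping` only requires `__getitem__`, `__iter__`, and `__len__`; the remaining read-only dictionary methods come from the base class.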
-
class ckipnlp.pipeline.core.CkipPipeline(*, sentence_segmenter=<DriverFamily.BUILTIN: 1>, word_segmenter=<DriverFamily.TAGGER: 2>, pos_tagger=<DriverFamily.TAGGER: 2>, sentence_parser=<DriverFamily.CLASSIC: 3>, ner_chunker=<DriverFamily.TAGGER: 2>, lazy=True, opts={})
Bases: object
The core pipeline.
- Parameters
sentence_segmenter (DriverFamily) – The type of sentence segmenter.
word_segmenter (DriverFamily) – The type of word segmenter.
pos_tagger (DriverFamily) – The type of part-of-speech tagger.
ner_chunker (DriverFamily) – The type of named-entity recognition chunker.
sentence_parser (DriverFamily) – The type of sentence parser.
- Other Parameters
lazy (bool) – Initialize the drivers lazily.
opts (Dict[str, Dict]) – The driver options. Key: driver name (e.g. 'sentence_segmenter'); Value: a dictionary of options for that driver.
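The interaction between `lazy` and `opts` can be sketched as follows. This is only an illustration of the general lazy-initialization pattern, not the library's code: `LazyDriversSketch`, its `factories` argument, the `driver` method, and the option name `some_option` are all hypothetical.

```python
class LazyDriversSketch:
    """Hypothetical illustration of lazy driver construction (not CkipPipeline itself)."""

    def __init__(self, *, factories, lazy=True, opts=None):
        # factories: driver name -> callable that builds the driver (assumed shape).
        # opts: driver name -> dictionary of options, mirroring the `opts` parameter.
        self._factories = dict(factories)
        self._opts = dict(opts or {})
        self._drivers = {}
        if not lazy:
            # Eager mode: build every driver immediately.
            for name in self._factories:
                self.driver(name)

    def driver(self, name):
        # Build the driver on first request, then reuse the same instance.
        if name not in self._drivers:
            options = self._opts.get(name, {})
            self._drivers[name] = self._factories[name](**options)
        return self._drivers[name]


built = []  # records the options each driver was built with


def make_word_segmenter(**options):
    built.append(options)
    return ("word_segmenter", options)


pipeline = LazyDriversSketch(
    factories={"word_segmenter": make_word_segmenter},
    opts={"word_segmenter": {"some_option": 1}},  # hypothetical option name
)
```

With `lazy=True` nothing is built in the constructor; the first call to `pipeline.driver("word_segmenter")` builds the driver with its own option dictionary, and later calls return the cached instance.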
-
get_text(doc)
Apply sentence segmentation.
- Parameters
doc (CkipDocument) – The input document.
- Returns
doc.text (TextParagraph) – The sentences.
Note
This routine modifies doc in place.
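The "modifies in place and returns the field" contract can be illustrated with a small stand-in. `get_text_sketch` and the toy splitter are hypothetical, and skipping the work when the field is already filled is an assumption, not something this page states.

```python
from types import SimpleNamespace


def get_text_sketch(doc, segment):
    """Hypothetical in-place stage: fill doc.text, then return that same object."""
    if doc.text is None:  # assumption: an already-filled field is left untouched
        doc.text = segment(doc.raw)
    return doc.text


def toy_segment(raw):
    # A toy splitter standing in for a real sentence segmenter.
    return [part + "。" for part in raw.split("。") if part]


doc = SimpleNamespace(raw="第一句。第二句。", text=None)
result = get_text_sketch(doc, toy_segment)
```

The returned value and `doc.text` are the same object, so callers may either use the return value directly or read the attribute from the document afterwards.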
-
get_ws(doc)
Apply word segmentation.
- Parameters
doc (CkipDocument) – The input document.
- Returns
doc.ws (SegParagraph) – The word-segmented sentences.
Note
This routine modifies doc in place.
-
get_pos(doc)
Apply part-of-speech tagging.
- Parameters
doc (CkipDocument) – The input document.
- Returns
doc.pos (SegParagraph) – The part-of-speech tagged sentences.
Note
This routine modifies doc in place.
-
get_ner(doc)
Apply named-entity recognition.
- Parameters
doc (CkipDocument) – The input document.
- Returns
doc.ner (NerParagraph) – The named-entity recognition results.
Note
This routine modifies doc in place.
-
get_parsed(doc)
Apply sentence parsing.
- Parameters
doc (CkipDocument) – The input document.
- Returns
doc.parsed (ParsedParagraph) – The parsed sentences.
Note
This routine modifies doc in place.
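Because every `get_*` method takes the document and fills it in place, several stages can be driven by one small helper. The sketch below is duck-typed and uses a toy pipeline so that it runs without the library or its models; `annotate` and `RecordingPipeline` are hypothetical names, and the commented call at the end shows how the helper would presumably be used with the real classes.

```python
from types import SimpleNamespace


def annotate(pipeline, doc, stages=("ws", "pos", "ner")):
    """Call pipeline.get_<stage>(doc) for each stage and return the filled doc."""
    for stage in stages:
        getattr(pipeline, "get_" + stage)(doc)
    return doc


class RecordingPipeline:
    """Toy pipeline that only records which stages were requested."""

    def __init__(self):
        self.calls = []

    def _fill(self, doc, name):
        # Mimic the documented contract: set doc.<name> and return it.
        self.calls.append(name)
        setattr(doc, name, name + " result")
        return getattr(doc, name)

    def get_ws(self, doc):
        return self._fill(doc, "ws")

    def get_pos(self, doc):
        return self._fill(doc, "pos")

    def get_ner(self, doc):
        return self._fill(doc, "ner")


toy = RecordingPipeline()
toy_doc = annotate(toy, SimpleNamespace(raw="中文字耶，啊哈哈哈"))

# With the real library this would presumably look like (not run here):
#   from ckipnlp.pipeline.core import CkipPipeline, CkipDocument
#   doc = annotate(CkipPipeline(), CkipDocument(raw="中文字耶，啊哈哈哈"))
```

Passing the stage names as a tuple keeps the helper independent of which drivers are installed; a caller who only needs word segmentation can pass `stages=("ws",)`.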