ckipnlp.pipeline.core module

This module provides the core CKIPNLP pipeline.

class ckipnlp.pipeline.core.CkipDocument(*, raw=None, text=None, ws=None, pos=None, ner=None, parsed=None)

Bases: collections.abc.Mapping

The core document.

Variables

raw (str) – The unsegmented text input.
text (TextParagraph) – The sentences.
ws (SegParagraph) – The word-segmented sentences.
pos (SegParagraph) – The part-of-speech tags of the sentences.
ner (NerParagraph) – The named-entity recognition results.
parsed (ParsedParagraph) – The parsed sentences.
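
Because the class derives from collections.abc.Mapping, a document can be read like a read-only dictionary. The following is a minimal sketch of how such a Mapping-based document could be put together, not the library's implementation. The class name SketchDocument is hypothetical, and exposing every field as a key (even while its value is still None) is an assumption.

```python
from collections.abc import Mapping


class SketchDocument(Mapping):
    """Hypothetical stand-in for a Mapping-based document (not the ckipnlp class)."""

    # Field names copied from the variable list above.
    FIELDS = ("raw", "text", "ws", "pos", "ner", "parsed")

    def __init__(self, *, raw=None, text=None, ws=None, pos=None, ner=None, parsed=None):
        self.raw = raw
        self.text = text
        self.ws = ws
        self.pos = pos
        self.ner = ner
        self.parsed = parsed

    def __getitem__(self, key):
        # Assumption: every field is a valid key, even while its value is None.
        if key not in self.FIELDS:
            raise KeyError(key)
        return getattr(self, key)

    def __iter__(self):
        return iter(self.FIELDS)

    def __len__(self):
        return len(self.FIELDS)


doc = SketchDocument(raw="今天天氣很好。")

# Mapping supplies keys(), items(), get() and friends from the three methods above.
filled = {name: value for name, value in doc.items() if value is not None}
```

Only __getitem__, __iter__ and __len__ have to be written by hand; the remaining dictionary-style methods come from the Mapping base class.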

class ckipnlp.pipeline.core.CkipPipeline(*, sentence_segmenter=<DriverFamily.BUILTIN: 1>, word_segmenter=<DriverFamily.TAGGER: 2>, pos_tagger=<DriverFamily.TAGGER: 2>, sentence_parser=<DriverFamily.CLASSIC: 3>, ner_chunker=<DriverFamily.TAGGER: 2>, lazy=True, opts={})

Bases: object

The core pipeline.

Parameters

sentence_segmenter (DriverFamily) – The type of sentence segmenter.
word_segmenter (DriverFamily) – The type of word segmenter.
pos_tagger (DriverFamily) – The type of part-of-speech tagger.
ner_chunker (DriverFamily) – The type of named-entity recognition chunker.
sentence_parser (DriverFamily) – The type of sentence parser.

Other Parameters

lazy (bool) – Whether to initialize the drivers lazily.
opts (Dict[str, Dict]) – The driver options, keyed by driver name (e.g. 'sentence_segmenter'); each value is a dictionary of options for that driver.
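
To illustrate how lazy and opts could interact, here is a minimal sketch under stated assumptions; it does not reproduce the library's code. SketchFamily, SketchDriver, SketchPipeline and the option name example_option are all hypothetical. The sketch assumes that a lazy pipeline builds each driver on first use and passes that driver's entry in opts to its constructor.

```python
from enum import Enum


class SketchFamily(Enum):
    """Hypothetical mirror of the enum values shown in the signature above."""
    BUILTIN = 1
    TAGGER = 2
    CLASSIC = 3


class SketchDriver:
    """Hypothetical driver that simply records its family and options."""

    def __init__(self, family, **options):
        self.family = family
        self.options = options


class SketchPipeline:
    """Hypothetical pipeline holding two of the five drivers, for brevity."""

    def __init__(self, *, sentence_segmenter=SketchFamily.BUILTIN,
                 word_segmenter=SketchFamily.TAGGER, lazy=True, opts=None):
        self._families = {
            "sentence_segmenter": sentence_segmenter,
            "word_segmenter": word_segmenter,
        }
        self._opts = opts or {}
        self._drivers = {}
        if not lazy:
            # Eager mode: build every driver right away.
            for name in self._families:
                self.driver(name)

    def driver(self, name):
        # Build the driver on first request, then reuse the cached instance.
        if name not in self._drivers:
            options = self._opts.get(name, {})
            self._drivers[name] = SketchDriver(self._families[name], **options)
        return self._drivers[name]


lazy_pipeline = SketchPipeline(opts={"word_segmenter": {"example_option": 8}})
built_before = len(lazy_pipeline._drivers)   # nothing built yet
segmenter = lazy_pipeline.driver("word_segmenter")
eager_pipeline = SketchPipeline(lazy=False)
```

In this sketch the lazy pipeline pays the construction cost only for the drivers it actually uses, while the eager one pays for all of them up front.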

get_text(doc)

Apply sentence segmentation.

Parameters: doc (CkipDocument) – The input document.
Returns: doc.text (TextParagraph) – The sentences.
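
The "Returns: doc.text" wording suggests that the result is stored on the document as well as returned. A minimal sketch of that pattern follows; it is not the library's method. The function get_text_sketch and the splitting rule (break after 。, ！ or ？) are hypothetical, a SimpleNamespace stands in for the document, and a plain list stands in for TextParagraph.

```python
import re
from types import SimpleNamespace


def get_text_sketch(doc):
    """Fill doc.text from doc.raw if it is missing, then return doc.text."""
    if doc.text is None:
        # Hypothetical rule: a sentence ends at 。, ！ or ？ (delimiter kept).
        doc.text = re.findall(r"[^。！？]+[。！？]?", doc.raw)
    return doc.text


doc = SimpleNamespace(raw="今天天氣很好。我們去公園吧！", text=None)
sentences = get_text_sketch(doc)

# A second call reuses the stored result instead of segmenting again.
same_object = get_text_sketch(doc) is sentences
```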

Note