ckipnlp.pipeline.kernel module

This module provides the kernel CKIPNLP pipeline.

class ckipnlp.pipeline.kernel.CkipDocument(*, raw=None, text=None, ws=None, pos=None, ner=None, conparse=None)

Bases: collections.abc.Mapping

The kernel document.

Variables
  • raw (str) – The unsegmented text input.

  • text (TextParagraph) – The sentences.

  • ws (SegParagraph) – The word-segmented sentences.

  • pos (SegParagraph) – The part-of-speech tags of the sentences.

  • ner (NerParagraph) – The named-entity recognition results.

  • conparse (ParseParagraph) – The constituency-parsing sentences.
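CkipDocument derives from collections.abc.Mapping, so a document's fields can be read by key as well as by attribute. The minimal sketch below shows how a class like this can be built on Mapping. ToyDocument is a hypothetical name. Exposing all six fields as keys, with unset fields reading as None, is an assumption and does not describe CkipDocument's actual behavior.

```python
from collections.abc import Mapping


class ToyDocument(Mapping):
    """Hypothetical Mapping-based document (not ckipnlp's implementation)."""

    _FIELDS = ('raw', 'text', 'ws', 'pos', 'ner', 'conparse')

    def __init__(self, *, raw=None, text=None, ws=None, pos=None, ner=None, conparse=None):
        # Keyword-only fields, mirroring the documented constructor signature.
        self.raw = raw
        self.text = text
        self.ws = ws
        self.pos = pos
        self.ner = ner
        self.conparse = conparse

    def __getitem__(self, key):
        # Key access delegates to attribute access, but only for known field names.
        if key not in self._FIELDS:
            raise KeyError(key)
        return getattr(self, key)

    def __iter__(self):
        # Assumption: every field is a key, even when its value is still None.
        return iter(self._FIELDS)

    def __len__(self):
        return len(self._FIELDS)


doc = ToyDocument(raw='中文字喔，啊哈哈哈')
print(doc['raw'] == doc.raw)                                       # True
print(doc['ws'])                                                   # None
print([key for key, value in doc.items() if value is not None])   # ['raw']
```

Mapping provides `get`, `items`, `keys`, `values`, and `in` for free once `__getitem__`, `__iter__`, and `__len__` are defined.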

class ckipnlp.pipeline.kernel.CkipPipeline(*, sentence_segmenter='default', word_segmenter='tagger', pos_tagger='tagger', con_parser='classic-client', ner_chunker='tagger', lazy=True, opts={})

Bases: object

The kernel pipeline.

Parameters
  • sentence_segmenter (str) – The type of sentence segmenter.

  • word_segmenter (str) – The type of word segmenter.

  • pos_tagger (str) – The type of part-of-speech tagger.

  • ner_chunker (str) – The type of named-entity recognition chunker.

  • con_parser (str) – The type of constituency parser.

Other Parameters
  • lazy (bool) – Initialize the drivers lazily.

  • opts (Dict[str, Dict]) – The driver options. Key: driver name (e.g. ‘sentence_segmenter’); Value: a dictionary of options.
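The lazy and opts parameters describe a familiar pattern. Each driver is built either up front or on first use, and it receives only the options stored under its own name in opts. The sketch below illustrates that pattern in isolation. ToyPipeline, its driver() method, the factory dictionary, and the 'delims' option are all hypothetical and do not reflect how CkipPipeline constructs its drivers internally.

```python
class ToyPipeline:
    """Hypothetical sketch of lazy driver construction (not ckipnlp's code)."""

    def __init__(self, factories, *, lazy=True, opts=None):
        # factories maps a driver name to a callable that builds that driver.
        self._factories = factories
        # Default to None and copy, so no mutable default is shared between instances.
        self._opts = dict(opts or {})
        self._drivers = {}
        if not lazy:
            # Eager mode: build every driver right away.
            for name in factories:
                self.driver(name)

    def driver(self, name):
        # Build on first request, passing only the options stored under this name.
        if name not in self._drivers:
            self._drivers[name] = self._factories[name](**self._opts.get(name, {}))
        return self._drivers[name]


built = []


def make_segmenter(**options):
    # Hypothetical factory that records each construction.
    built.append(('sentence_segmenter', options))
    return object()


pipe = ToyPipeline({'sentence_segmenter': make_segmenter},
                   opts={'sentence_segmenter': {'delims': '，。'}})
print(built)  # [] (nothing built yet in lazy mode)
pipe.driver('sentence_segmenter')
pipe.driver('sentence_segmenter')
print(built)  # [('sentence_segmenter', {'delims': '，。'})]
```

The sketch defaults opts to None and copies it. This is a common Python idiom for avoiding a mutable default that is shared across calls.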

get_text(doc)

Apply sentence segmentation.

Parameters

doc (CkipDocument) – The input document.

Returns

doc.text (TextParagraph) – The sentences.

Note

This routine modifies doc in place.
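get_text both stores its result on the document and returns it, so the return value and doc.text are the same object. The sketch below demonstrates that contract with a toy splitter that breaks raw after a few full-width punctuation marks. The following are all assumptions and do not describe CKIPNLP's default sentence segmenter:

- the delimiter set
- the function name toy_get_text
- the SimpleNamespace stand-in for CkipDocument
- the plain list in place of TextParagraph
- the choice to skip work when doc.text is already set

```python
import re
from types import SimpleNamespace

# Hypothetical delimiter set; the real default segmenter may use different rules.
_DELIMS = re.compile(r'(?<=[，。！？；])')


def toy_get_text(doc):
    """Fill doc.text in place and return it (hypothetical sketch)."""
    # Assumption: an existing result is kept rather than recomputed.
    if doc.text is None:
        # Split right after each delimiter (keeping it attached) and drop empty pieces.
        doc.text = [piece for piece in _DELIMS.split(doc.raw) if piece]
    return doc.text


doc = SimpleNamespace(raw='中文字喔，啊哈哈哈。', text=None)
result = toy_get_text(doc)
print(result)              # ['中文字喔，', '啊哈哈哈。']
print(result is doc.text)  # True
```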

get_ws(doc)

Apply word segmentation.

Parameters

doc (CkipDocument) – The input document.

Returns

doc.ws (SegParagraph) – The word-segmented sentences.

Note

This routine modifies doc in place.
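Word segmentation turns each sentence of doc.text into a list of words stored in doc.ws. The 'tagger' driver is not reproduced here. As a self-contained stand-in, the sketch below uses a different, classic technique: forward maximum matching against a tiny hypothetical lexicon. Only the in-place shape of the result, one word list per sentence, is meant to carry over. The lexicon, the function names, and the SimpleNamespace document are illustrative assumptions.

```python
from types import SimpleNamespace

# Hypothetical toy lexicon; not the resource behind CKIPNLP's segmenter.
LEXICON = {'中文字', '中文', '文字', '哈哈'}
MAX_LEN = max(len(word) for word in LEXICON)


def forward_max_match(sentence):
    """Greedy forward maximum matching: take the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(sentence):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(MAX_LEN, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in LEXICON:
                words.append(candidate)
                i += size
                break
    return words


def toy_get_ws(doc):
    """Fill doc.ws in place from doc.text and return it (hypothetical sketch)."""
    if doc.ws is None:  # skipped if already filled (an assumption)
        doc.ws = [forward_max_match(sentence) for sentence in doc.text]
    return doc.ws


doc = SimpleNamespace(text=['中文字喔，', '啊哈哈哈。'], ws=None)
print(toy_get_ws(doc))  # [['中文字', '喔', '，'], ['啊', '哈哈', '哈', '。']]
```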

get_pos(doc)

Apply part-of-speech tagging.

Parameters

doc (CkipDocument) – The input document.

Returns

doc.pos (SegParagraph) – The part-of-speech tags of the sentences.

Note

This routine modifies doc in place.
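The part-of-speech result is expected to run parallel to the word segmentation. Each sentence in doc.pos holds one tag per word in the corresponding sentence of doc.ws. The sketch below makes that alignment explicit with a dictionary-lookup tagger. The lookup table, the 'UNK' fallback tag, and the function name are hypothetical. The tag strings merely imitate the look of CKIP tags, and a real tagger chooses tags from context rather than from a fixed table.

```python
from types import SimpleNamespace

# Hypothetical lookup table; real tags come from a trained tagger.
TAGS = {'中文字': 'Na', '喔': 'T', '啊': 'I', '哈哈': 'I',
        '，': 'COMMACATEGORY', '。': 'PERIODCATEGORY'}
FALLBACK = 'UNK'  # hypothetical tag for words missing from the table


def toy_get_pos(doc):
    """Fill doc.pos in place with exactly one tag per word of doc.ws."""
    if doc.pos is None:  # skipped if already filled (an assumption)
        doc.pos = [[TAGS.get(word, FALLBACK) for word in sentence] for sentence in doc.ws]
    return doc.pos


doc = SimpleNamespace(ws=[['中文字', '喔', '，'], ['啊', '哈哈', '哈', '。']], pos=None)
print(toy_get_pos(doc))  # [['Na', 'T', 'COMMACATEGORY'], ['I', 'I', 'UNK', 'PERIODCATEGORY']]
```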

get_ner(doc)

Apply named-entity recognition.

Parameters

doc (CkipDocument) – The input document.

Returns

doc.ner (NerParagraph) – The named-entity recognition results.

Note

This routine modifies doc in place.
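Named-entity results are commonly reported as spans: the entity string, its type, and its character offsets within the sentence. The sketch below produces spans of that form by scanning each sentence against a small gazetteer. This is a swapped-in technique, not a description of the 'tagger' chunker. The following are assumptions made for illustration:

- the (word, type, (start, end)) tuple layout
- the entity-type labels
- the gazetteer
- reading doc.text (the real chunker may consume other fields)

```python
from types import SimpleNamespace

# Hypothetical gazetteer: entity string -> entity type.
GAZETTEER = {'中央研究院': 'ORG', '台北市': 'GPE'}


def find_entities(sentence):
    """Return (word, type, (start, end)) for each gazetteer hit, preferring longer names."""
    names = sorted(GAZETTEER, key=len, reverse=True)
    hits, i = [], 0
    while i < len(sentence):
        for name in names:
            if sentence.startswith(name, i):
                hits.append((name, GAZETTEER[name], (i, i + len(name))))
                i += len(name)
                break
        else:
            i += 1  # no entity starts here; move on by one character
    return hits


def toy_get_ner(doc):
    """Fill doc.ner in place with one list of entity spans per sentence."""
    if doc.ner is None:  # skipped if already filled (an assumption)
        doc.ner = [find_entities(sentence) for sentence in doc.text]
    return doc.ner


doc = SimpleNamespace(text=['中央研究院位於台北市。'], ner=None)
print(toy_get_ner(doc))  # [[('中央研究院', 'ORG', (0, 5)), ('台北市', 'GPE', (7, 10))]]
```

The offsets are half-open, so `sentence[start:end]` recovers the entity string.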

get_conparse(doc)

Apply constituency parsing.

Parameters

doc (CkipDocument) – The input document.

Returns

doc.conparse (ParseParagraph) – The constituency-parsing sentences.

Note

This routine modifies doc in place.
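In this sketch the getters form a chain: parsing needs tagged words, tagging needs segmented words, and segmentation needs sentences. One way to express this is for each getter to fill its prerequisites on demand and skip any field that is already set. That behavior is an assumption here, not something documented above. The sketch records which steps run, showing that a single get_conparse call triggers the earlier steps exactly once. ChainSketch, its placeholder step bodies, and the flat S(tag:word|...) string are hypothetical and are not meant to match the output of the 'classic-client' parser.

```python
from types import SimpleNamespace


class ChainSketch:
    """Hypothetical on-demand dependency chain with placeholder steps (not real NLP)."""

    def __init__(self):
        self.calls = []  # records which steps actually ran, in order

    def get_text(self, doc):
        if doc.text is None:
            self.calls.append('text')
            doc.text = doc.raw.split('，')
        return doc.text

    def get_ws(self, doc):
        if doc.ws is None:
            self.get_text(doc)  # prerequisite
            self.calls.append('ws')
            doc.ws = [list(sentence) for sentence in doc.text]  # one character per "word"
        return doc.ws

    def get_pos(self, doc):
        if doc.pos is None:
            self.get_ws(doc)  # prerequisite
            self.calls.append('pos')
            doc.pos = [['X'] * len(words) for words in doc.ws]  # placeholder tags
        return doc.pos

    def get_conparse(self, doc):
        if doc.conparse is None:
            self.get_pos(doc)  # prerequisite (which in turn fills ws and text)
            self.calls.append('conparse')
            # Placeholder flat "tree" per sentence: S(tag:word|tag:word|...).
            doc.conparse = [
                'S(' + '|'.join(f'{tag}:{word}' for word, tag in zip(words, tags)) + ')'
                for words, tags in zip(doc.ws, doc.pos)
            ]
        return doc.conparse


doc = SimpleNamespace(raw='你好，謝謝', text=None, ws=None, pos=None, conparse=None)
pipe = ChainSketch()
print(pipe.get_conparse(doc))  # ['S(X:你|X:好)', 'S(X:謝|X:謝)']
print(pipe.calls)              # ['text', 'ws', 'pos', 'conparse']
pipe.get_conparse(doc)
print(pipe.calls)              # unchanged, because every field is already filled
```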