spaCy basics
Basic operations
A memo of basic spaCy operations, kept here for future reference.
import spacy

# 'en' is the model shortcut used by older spaCy releases;
# recent versions load a model by its full name, e.g. spacy.load('en_core_web_sm')
nlp = spacy.load('en')
doc = nlp(u'Jeffrey Navin saw the girl with the telescope. She looked very strong.')
A spaCy token is not a plain string but a special object that carries part-of-speech and other annotations.
doc[0]
>> Jeffrey
type(doc[0])
>> spacy.tokens.token.Token
Splitting the document into sentences.
sentences = list(doc.sents)
sentences
>> [Jeffrey Navin saw the girl with the telescope., She looked very strong.]
len(sentences)
>> 2
Extracting part-of-speech information.
(doc[0].pos_, doc[0].pos)
>> (u'PROPN', 94)
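The trailing-underscore convention holds throughout spaCy: `pos_` is the human-readable string, while `pos` is an integer ID. As a sketch, those IDs are also exposed as constants in `spacy.symbols` (the exact integer values vary between spaCy versions, so 94 above is not universal):

```python
from spacy.symbols import PROPN, VERB

# pos IDs are plain integers; doc[0].pos equals PROPN for a proper noun
print(PROPN, VERB)  # exact values depend on the spaCy version
```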
Extracting noun chunks.
list(doc.noun_chunks)[0].text
>> u'Jeffrey Navin'
Getting lemmas and part-of-speech tags.
for sent in doc.sents:
    for token in sent:
        print(token.text, token.lemma_, token.tag_)
Jeffrey jeffrey NNP
Navin navin NNP
saw see VBD
the the DT
girl girl NN
with with IN
the the DT
telescope telescope NN
. . .
She -PRON- PRP
looked look VBD
very very RB
strong strong JJ
. . .
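Tag codes like NNP and VBD are Penn Treebank tags. If you are on a spaCy version that ships `spacy.explain` (v2 and later), it returns a plain-English description for a tag or label code without needing a loaded model:

```python
import spacy

# decode tag codes into human-readable descriptions
print(spacy.explain('NNP'))
print(spacy.explain('VBD'))
```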
Extracting named entities.
for sent in doc.sents:
    for token in sent:
        print(token.text, token.ent_type_ if token.ent_type_ else 'O')
Jeffrey PERSON
Navin PERSON
saw O
the O
girl O
with O
the O
telescope O
. O
She O
looked O
very O
strong O
. O
Slightly more detailed named-entity extraction.
doc = nlp(u'San Francisco considers banning sidewalk delivery robots')
ents = [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
ents
>> [(u'San Francisco', 0, 13, u'GPE')]

doc = nlp(u'Tom goes to New York')
ents = [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
ents
>> [(u'Tom', 0, 3, u'PERSON'), (u'New York', 12, 20, u'GPE')]
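Since `doc.ents` unpacks into plain (text, start_char, end_char, label) tuples like the ones above, downstream filtering is ordinary Python. A small sketch on the sample output (the tuples are copied from the run above, not re-computed):

```python
# (text, start_char, end_char, label) tuples as produced above
ents = [('Tom', 0, 3, 'PERSON'), ('New York', 12, 20, 'GPE')]

# keep only the PERSON entities
people = [text for text, start, end, label in ents if label == 'PERSON']
print(people)  # → ['Tom']
```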
That covers the basics.