How to tag a text file with hunpos in nltk?

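Can someone help me with the syntax for tagging a corpus with hunpos in nltk? What do I import for the hunpos.HunPosTagger module? How do I HunPos-tag the corpus? See the code below.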

import nltk 
from nltk.corpus import PlaintextCorpusReader  
from nltk.corpus.util import LazyCorpusLoader  

corpus_root = './'  
reader = PlaintextCorpusReader (corpus_root, '.*')  

ntuen = LazyCorpusLoader ('ntumultien', PlaintextCorpusReader, reader)  
ntuen.fileids()  
isinstance (ntuen, PlaintextCorpusReader)  


# So how do I hunpos tag `ntuen`? I can't get the following code to work.
# please help me to correct my python syntax errors, I'm new to python 
# but i really need this to work. sorry
##from nltk.tag import hunpos.HunPosTagger
ht = HunPosTagger('english.model')
for sentence in ntu.sent() ##looping through the no. of sentence
     ht.tag(ntusent()[i])


1 reply

峨躬坎抬焚

from nltk.tag.hunpos import HunposTagger
from nltk.tokenize import word_tokenize

corpus = "so how do i hunpos tag my ntuen ? i can't get the following code to work."

# The class is HunposTagger (not HunPosTagger) and it lives in nltk.tag.hunpos.
# 'en_wsj.model' is the path to the English HunPos model file.
ht = HunposTagger('en_wsj.model')

# tag() expects a list of tokens, so split the string into words first.
print(ht.tag(word_tokenize(corpus)))
ht.close()  # shut down the hunpos-tag subprocess

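I think the problem is that you weren't tokenizing the words, but there were other things keeping your code from working as well (the class is HunposTagger, not HunPosTagger). I made this simplified example from your question. If you have other questions, please leave a comment. I got everything from here: http://code.google.com/p/hunpos/

python hunpos.py
[('so', 'RB'), ('how', 'WRB'), ('do', 'VBP'), ('i', 'FW'), ('hunpos', 'NN'), ('tag', 'NN'), ('my', 'PRP$'), ('ntuen', 'NN'), ('?', '.'), ('i', 'FW'), ('ca', 'MD'), ("n't", 'RB'), ('get', 'VB'), ('the', 'DT'), ('following', 'JJ'), ('code', 'NN'), ('to', 'TO'), ('work', 'VB'), ('.', '.')]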
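The answer above tags one hard-coded string, while the question asks about tagging the files of a corpus reader. A minimal sketch of that, assuming the reader behaves like NLTK's PlaintextCorpusReader (`fileids()`, `sents(fileid)`) and the tagger like `HunposTagger` (`tag(tokens)` returning `(word, tag)` pairs): loop over the files and sentences and pass each already-tokenized sentence to `tag()`. The helper names `tag_corpus` and `to_slash_format` are hypothetical, and the reader and tagger are passed in as arguments so the loop itself does not depend on a HunPos install.

```python
# Minimal sketch: tag every sentence of every file in a corpus reader.
# Assumption: `reader` behaves like nltk's PlaintextCorpusReader
# (fileids() and sents(fileid) returning lists of token lists), and
# `tagger` like nltk's HunposTagger (tag(tokens) -> list of (word, tag)).
# The function names are hypothetical, chosen for this example.


def tag_corpus(reader, tagger):
    """Return {fileid: [tagged sentence, ...]} for every file in `reader`."""
    tagged = {}
    for fileid in reader.fileids():
        # sents() already tokenizes, so each sentence is a list of words,
        # which is what tag() expects (a raw string would be split into
        # single characters when the tagger iterates over it).
        tagged[fileid] = [tagger.tag(sentence) for sentence in reader.sents(fileid)]
    return tagged


def to_slash_format(tagged_sentence):
    """Render [('so', 'RB'), ('how', 'WRB')] as 'so/RB how/WRB'."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged_sentence)
```

With NLTK and HunPos set up, a call might look like `tag_corpus(PlaintextCorpusReader('./', r'.*\.txt'), HunposTagger('en_wsj.model'))`. The `.txt` file pattern is an assumption (the question's `'.*'` would also pick up non-text files such as the script itself), and `sents()` relies on NLTK's punkt tokenizer data having been downloaded.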

Tags: corpus, pos_tagger