Replacing synonyms in a corpus using WordNet and NLTK - python

I am trying to write a simple Python script that uses NLTK to find and replace synonyms in a txt file. The following code raises an error:
Traceback (most recent call last):
  File "C:\Users\Nedim\Documents\sinon2.py", line 21, in <module>
    change(word)
  File "C:\Users\Nedim\Documents\sinon2.py", line 4, in change
    synonym = wn.synset(word + ".n.01").lemma_names
TypeError: can only concatenate list (not "str") to list
Here is the code:
from nltk.corpus import wordnet as wn

def change(word):
    synonym = wn.synset(word + ".n.01").lemma_names

    if word in synonym:

            filename = open("C:/Users/tester/Desktop/test.txt").read()
            writeSynonym = filename.replace(str(word), str(synonym[0]))
            f = open("C:/Users/tester/Desktop/test.txt", 'w')
            f.write(writeSynonym)
            f.close()

f = open("C:/Users/tester/Desktop/test.txt")
lines = f.readlines()

for i in range(len(lines)):

    word = lines[i].split()
    change(word)
    
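The following is not very efficient, and it does not substitute a single synonym, because each word may have several synonyms. It collects the candidates so that you can choose among them: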
from nltk.corpus import wordnet as wn
from nltk.corpus.reader.plaintext import PlaintextCorpusReader

# Directory that contains test.txt
corpus_root = 'C:/Users/tester/Desktop/'
wordlists = PlaintextCorpusReader(corpus_root, '.*')

for word in wordlists.words('test.txt'):
    # Collect every lemma name from every synset of the word
    synonymList = set()
    for synSet in wn.synsets(word):
        # lemma_names is a method in NLTK 3; older releases exposed it as an attribute
        for synWords in synSet.lemma_names():
            synonymList.add(synWords)
    print(synonymList)
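
The loop above only prints the candidate sets. One possible next step is to pick a single candidate per word and substitute it in the text. The following is a minimal sketch, not part of the original answer: `replace_words`, `pick_synonym`, and `wordnet_pick` are hypothetical names, and `wordnet_pick` assumes NLTK 3 with the WordNet corpus installed.

```python
import re


def replace_words(text, pick_synonym):
    """Replace each alphabetic word in text with pick_synonym(word).

    pick_synonym is a hypothetical callback: it returns a replacement
    string, or None to leave the word unchanged.
    """
    def substitute(match):
        word = match.group(0)
        replacement = pick_synonym(word)
        return word if replacement is None else replacement

    # [A-Za-z]+ matches maximal runs of letters, so punctuation and
    # whitespace are preserved and only whole words are looked up.
    return re.sub(r"[A-Za-z]+", substitute, text)


def wordnet_pick(word):
    """Return the first WordNet lemma that differs from word, or None.

    Assumption: NLTK 3 and its WordNet corpus are installed. The import
    is done lazily so that replace_words itself does not need NLTK.
    """
    from nltk.corpus import wordnet as wn
    for synset in wn.synsets(word):
        for lemma in synset.lemma_names():
            if lemma.lower() != word.lower():
                # Multi-word lemmas use underscores, e.g. "motor_vehicle"
                return lemma.replace("_", " ")
    return None
```

With NLTK available, `replace_words(text, wordnet_pick)` would rewrite a whole string in one pass. Any other callable, such as the `get` method of a hand-written dictionary, can be passed instead, which leaves the choice of synonym up to you.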
    
Two things. First, you can change the file-reading part to:
for line in open("C:/Users/tester/Desktop/test.txt"):
    word = line.split()
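Second, .split() returns a list of strings, while your change function appears to operate on only one word at a time. That is what causes the exception: your word is actually a list. If you want to process every word on the line, write it like this: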
for line in open("C:/Users/tester/Desktop/test.txt"):
    words = line.split()
    for word in words:
        change(word)
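
The cause of the exception can be reproduced without NLTK. A small sketch (the sample line and the variable names are illustrative, not taken from the question):

```python
line = "the dog barks\n"
words = line.split()  # ['the', 'dog', 'barks']: a list, not a string

try:
    words + ".n.01"  # same operation as in the traceback: list + str
except TypeError as exc:
    message = str(exc)

# Iterating over the list yields one string per word,
# so the concatenation inside change() would succeed.
synset_names = [word + ".n.01" for word in words]
```

Here `message` holds the TypeError text, and `synset_names` contains one synset name per word, which is the form `wn.synset` expects.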
    
