Stanford Parser and NLTK

Question 1

Is it possible to use the Stanford Parser in NLTK? (I am not talking about the Stanford POS tagger.)

Question 2

Note that this answer applies to NLTK v3.0, and not to more recent versions.

Sure, try the following in Python:

import os
from nltk.parse import stanford
os.environ['STANFORD_PARSER'] = '/path/to/stanford/jars'
os.environ['STANFORD_MODELS'] = '/path/to/stanford/jars'

parser = stanford.StanfordParser(model_path="/location/of/the/englishPCFG.ser.gz")
sentences = parser.raw_parse_sents(("Hello, My name is Melroy.", "What is your name?"))
print(sentences)

# GUI
for line in sentences:
    for sentence in line:
        sentence.draw()

Output:

[Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]), Tree('.', ['.'])])]), Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), Tree('.', ['?'])])])]

Note 1: In this example, both the parser and model jars are in the same folder.

Note 2:

The filename of the Stanford parser is: stanford-parser.jar
The filename of the Stanford models is: stanford-parser-xxx-models.jar

Note 3: The englishPCFG.ser.gz file can be found inside the models.jar file (/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz). Please use some archive manager to 'unzip' the models.jar file.

Note 4: Make sure you are using Java JRE (Runtime Environment) 1.8, also known as Oracle JDK 8. Otherwise you will get: Unsupported major.minor version 52.0.
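The "52.0" in that error is the class-file major version for Java 8, so the message means the jars were compiled for Java 8 but an older JVM is loading them. A minimal sketch for checking this from Python, assuming `java` is on your PATH; the helper names are my own, not part of NLTK:

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output.

    Handles the old "1.8.0_102" scheme (major = 8) as well as the newer
    "11.0.2" / "17" scheme. Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        # Old scheme: "1.8" means Java 8
        return int(match.group(2))
    return first


def installed_java_major_version(java_bin="java"):
    """Run `java -version` (it prints to stderr) and parse the result."""
    result = subprocess.run([java_bin, "-version"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    return java_major_version(result.stderr + result.stdout)
```

If `installed_java_major_version()` returns something below 8 (or None), the Stanford jars from 2015 onwards will not load.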

Installation

Download NLTK v3 from: https://github.com/nltk/nltk. And install NLTK:

sudo python setup.py install
You can use the NLTK downloader to get the Stanford Parser, using Python:
```
import nltk
nltk.download()
```
Try my example! (don't forget to change the jar paths and change the model path to the ser.gz location)

OR:

Download and install NLTK v3, same as above.
Download the latest version from (the current version's filename is stanford-parser-full-2015-01-29.zip): http://nlp.stanford.edu/software/lex-parser.shtml#Download
Extract the stanford-parser-full-20xx-xx-xx.zip.
Create a new folder ('jars' in my example). Place the extracted files into this jars folder: stanford-parser-3.xx-models.jar and stanford-parser.jar.

As shown above, you can use the environment variables (STANFORD_PARSER & STANFORD_MODELS) to point to this 'jars' folder. I'm using Linux, so if you use Windows, please use something like: C://folder//jars.
Open stanford-parser-3.xx-models.jar with an archive manager (7zip).
Browse inside the jar file to edu/stanford/nlp/models/lexparser. Again, extract the file called 'englishPCFG.ser.gz'. Remember the location where you extracted this ser.gz file.
When creating a StanfordParser instance, you can provide the model path as a parameter. This is the complete path to the model, in our case /location/of/englishPCFG.ser.gz.
Try my example! (don't forget to change the jar paths and change the model path to the ser.gz location)

Question 3

Deprecated Answer

The answer below is deprecated; please use the solution at https://stackoverflow.com/a/51981566/610569 for NLTK v3.3 and above.

EDITED

Note: The following answer will only work on:

NLTK version >= 3.2.4
Stanford Tools compiled since 2015-04-20
Python 2.7, 3.4 and 3.5 (Python 3.6 is not yet officially supported)

Since both tools change rather quickly and the API might look very different 3-6 months later, please treat the following answer as a temporary workaround, not a permanent fix.

Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instructions on how to interface Stanford NLP tools using NLTK!!

TL;DR

cd $HOME

# Update / Install NLTK
pip install -U nltk

# Download the Stanford NLP tools
wget http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-parser-full-2015-04-20.zip
# Extract the zip file.
unzip stanford-ner-2015-04-20.zip 
unzip stanford-parser-full-2015-04-20.zip 
unzip stanford-postagger-full-2015-04-20.zip


export STANFORDTOOLSDIR=$HOME

export CLASSPATH=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/stanford-postagger.jar:$STANFORDTOOLSDIR/stanford-ner-2015-04-20/stanford-ner.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar

export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/models:$STANFORDTOOLSDIR/stanford-ner-2015-04-20/classifiers

Then:

>>> from nltk.tag.stanford import StanfordPOSTagger
>>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
>>> st.tag('What is the airspeed of an unladen swallow ?'.split())
[(u'What', u'WP'), (u'is', u'VBZ'), (u'the', u'DT'), (u'airspeed', u'NN'), (u'of', u'IN'), (u'an', u'DT'), (u'unladen', u'JJ'), (u'swallow', u'VB'), (u'?', u'.')]

>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]


>>> from nltk.parse.stanford import StanfordParser
>>> parser=StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> list(parser.raw_parse("the quick brown fox jumps over the lazy dog"))
[Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']), Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])])]

>>> from nltk.parse.stanford import StanfordDependencyParser
>>> dep_parser=StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> print([parse.tree() for parse in dep_parser.raw_parse("The quick brown fox jumps over the lazy dog.")])
[Tree('jumps', [Tree('fox', ['The', 'quick', 'brown']), Tree('dog', ['over', 'the', 'lazy'])])]

In long:

Firstly, one must note that the Stanford NLP tools are written in Java and NLTK is written in Python. NLTK interfaces with the tools by calling the Java tools through their command-line interface.
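To make that mechanism concrete (this is an illustration, not NLTK's actual internals), a Python wrapper essentially builds a `java -cp <classpath> <main class> <args>` command and reads the tool's stdout. A minimal sketch; `build_java_command` and `run_java` are hypothetical helper names, and the `-mx1000m` heap flag is just an example value:

```python
import os
import subprocess


def build_java_command(jars, main_class, args, memory="-mx1000m"):
    """Build the argv list for running a Java main class from Python.

    The jars are joined with os.pathsep (":" on Unix, ";" on Windows)
    to form the -cp classpath.
    """
    classpath = os.pathsep.join(jars)
    return ["java", memory, "-cp", classpath, main_class] + list(args)


def run_java(jars, main_class, args):
    """Run the Java command and return its stdout as text (raises on failure)."""
    cmd = build_java_command(jars, main_class, args)
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout
```

For example, `run_java(["stanford-parser.jar", "stanford-parser-3.5.2-models.jar"], "edu.stanford.nlp.parser.lexparser.LexicalizedParser", ["edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz", "input.txt"])` would run the lexicalized parser on `input.txt`, assuming both jars are in the current directory.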

Secondly, the NLTK API to the Stanford NLP tools has changed quite a lot since version 3.1, so it is advisable to update your NLTK package to v3.1.

Thirdly, the NLTK API to the Stanford NLP tools wraps around the individual NLP tools, e.g. the Stanford POS tagger, the Stanford NER Tagger and the Stanford Parser.

For the POS and NER taggers, it DOES NOT wrap around the Stanford Core NLP package.

The Stanford Parser is a special case: it wraps around both the Stanford Parser and Stanford Core NLP (personally, I have not used the latter through NLTK; I would rather follow @dimazest's demonstration at http://www.eecs.qmul.ac.uk/~dm303/stanford-dependency-parser-nltk-and-anaconda.html)

Note that as of NLTK v3.1, the STANFORD_JAR and STANFORD_PARSER variables are deprecated and NO LONGER used.

In longer:

STEP 1

This assumes that you have installed Java properly on your OS.

Now, install/update your NLTK version (see http://www.nltk.org/install.html):

Using pip: sudo pip install -U nltk
Debian distro (using apt-get): sudo apt-get install python-nltk

For Windows (use the 32-bit binary installation):

Install Python 3.4: http://www.python.org/downloads/ (avoid the 64-bit versions)
Install Numpy (optional): http://sourceforge.net/projects/numpy/files/NumPy/ (the version that specifies python3.4)
Install NLTK: http://pypi.python.org/pypi/nltk
Test the installation: Start > Python34, then type import nltk

(Why not 64-bit? See https://github.com/nltk/nltk/issues/1079)

Then, out of paranoia, recheck your nltk version inside Python:

from __future__ import print_function
import nltk
print(nltk.__version__)

Or on the command line:

python3 -c "import nltk; print(nltk.__version__)"

Make sure that you see 3.1 (or a later version) as the output.

For even more paranoia, check that all your favorite Stanford NLP tool APIs are available:

from nltk.parse.stanford import StanfordParser
from nltk.parse.stanford import StanfordDependencyParser
from nltk.parse.stanford import StanfordNeuralDependencyParser
from nltk.tag.stanford import StanfordPOSTagger, StanfordNERTagger
from nltk.tokenize.stanford import StanfordTokenizer

(Note: The imports above will ONLY ensure that you are using an NLTK version that contains these APIs. Not seeing errors during the imports does not mean that you have successfully configured the NLTK API to use the Stanford Tools.)

STEP 2

Now that you have checked that you have the correct version of NLTK with the necessary Stanford NLP tools interface, you need to download and extract all the necessary Stanford NLP tools.

TL;DR, in Unix:

cd $HOME

# Download the Stanford NLP tools
wget http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-parser-full-2015-04-20.zip
# Extract the zip file.
unzip stanford-ner-2015-04-20.zip 
unzip stanford-parser-full-2015-04-20.zip 
unzip stanford-postagger-full-2015-04-20.zip

In Windows / Mac:

Download and unzip the parser from http://nlp.stanford.edu/software/lex-parser.shtml#Download
Download and unzip the FULL VERSION tagger from http://nlp.stanford.edu/software/tagger.shtml#Download
Download and unzip the NER tagger from http://nlp.stanford.edu/software/CRF-NER.shtml#Download

STEP 3

Set up the environment variables so that NLTK can find the relevant file paths automatically. You have to set the following variables:

Add the appropriate Stanford NLP .jar file to the CLASSPATH environment variable.
- e.g. for the NER, it will be stanford-ner-2015-04-20/stanford-ner.jar
- e.g. for the POS, it will be stanford-postagger-full-2015-04-20/stanford-postagger.jar
- e.g. for the parser, it will be stanford-parser-full-2015-04-20/stanford-parser.jar and the parser's model jar file, stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar
Add the appropriate model directory to the STANFORD_MODELS variable (i.e. the directory where the pre-trained models are saved)
- e.g. for the NER, it will be in stanford-ner-2015-04-20/classifiers/
- e.g. for the POS, it will be in stanford-postagger-full-2015-04-20/models/
- e.g. for the parser, there won't be a model directory.

In the code, you can see that it searches the STANFORD_MODELS directory before appending the model name. You can also see that the API automatically searches the OS environment for the `CLASSPATH`.
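A minimal sketch of that lookup order, assuming STANFORD_MODELS holds one or more directories separated by `os.pathsep` (the `export STANFORD_MODELS=...` lines on this page use `:`). `find_model` is a hypothetical helper that only mimics the idea; it is not NLTK's actual function:

```python
import os


def find_model(model_name, env_var="STANFORD_MODELS"):
    """Return the path of `model_name`, searching the env_var directories.

    A path that already exists is returned unchanged; otherwise each
    directory listed in the environment variable is tried in order.
    Returns None when the model cannot be found.
    """
    if os.path.isfile(model_name):
        return model_name
    for directory in os.environ.get(env_var, "").split(os.pathsep):
        if not directory:
            continue  # skip empty entries, e.g. from a trailing separator
        candidate = os.path.join(directory, model_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

This is why a bare name such as `'english-bidirectional-distsim.tagger'` is enough once STANFORD_MODELS points at the tagger's `models/` directory.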

Note that as of NLTK v3.1, the STANFORD_JAR variable is deprecated and NO LONGER used. Code snippets found in the following Stack Overflow questions might not work:

TL;DR for STEP 3 on Ubuntu

export STANFORDTOOLSDIR=/home/path/to/stanford/tools/

export CLASSPATH=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/stanford-postagger.jar:$STANFORDTOOLSDIR/stanford-ner-2015-04-20/stanford-ner.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar

export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/models:$STANFORDTOOLSDIR/stanford-ner-2015-04-20/classifiers

(For Windows: see https://stackoverflow.com/a/17176423/610569 for instructions on setting environment variables)

You MUST set the variables as above before starting Python, then:

>>> from nltk.tag.stanford import StanfordPOSTagger
>>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
>>> st.tag('What is the airspeed of an unladen swallow ?'.split())
[(u'What', u'WP'), (u'is', u'VBZ'), (u'the', u'DT'), (u'airspeed', u'NN'), (u'of', u'IN'), (u'an', u'DT'), (u'unladen', u'JJ'), (u'swallow', u'VB'), (u'?', u'.')]

>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]


>>> from nltk.parse.stanford import StanfordParser
>>> parser=StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> list(parser.raw_parse("the quick brown fox jumps over the lazy dog"))
[Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']), Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])])]

Alternatively, you could try adding the environment variables inside Python, as the previous answers have suggested, but you can also directly tell the parser/tagger to initialize with the direct paths where you keep the .jar files and your models.
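A minimal sketch of the "inside Python" variant, assuming the 2015-04-20 tools were unzipped under your home directory as in the TL;DR above. It mirrors the `export CLASSPATH=...` / `export STANFORD_MODELS=...` lines, but joins the entries with `os.pathsep`, so the same code works on Unix (`:`) and Windows (`;`). It has to run before the Stanford classes are instantiated; `set_stanford_env` is a hypothetical helper name:

```python
import os


def set_stanford_env(tools_dir):
    """Set CLASSPATH and STANFORD_MODELS for the 2015-04-20 Stanford tools."""
    join = os.path.join
    jars = [
        join(tools_dir, "stanford-postagger-full-2015-04-20", "stanford-postagger.jar"),
        join(tools_dir, "stanford-ner-2015-04-20", "stanford-ner.jar"),
        join(tools_dir, "stanford-parser-full-2015-04-20", "stanford-parser.jar"),
        join(tools_dir, "stanford-parser-full-2015-04-20", "stanford-parser-3.5.2-models.jar"),
    ]
    model_dirs = [
        join(tools_dir, "stanford-postagger-full-2015-04-20", "models"),
        join(tools_dir, "stanford-ner-2015-04-20", "classifiers"),
    ]
    os.environ["CLASSPATH"] = os.pathsep.join(jars)
    os.environ["STANFORD_MODELS"] = os.pathsep.join(model_dirs)


set_stanford_env(os.path.expanduser("~"))
```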

There is NO need to set the environment variables if you use the following method, BUT when the API changes its parameter names, you will need to change your code accordingly. That is why it is MORE advisable to set the environment variables than to modify your Python code to suit each NLTK version.

For example (without setting any environment variables):

# POS tagging:

from nltk.tag import StanfordPOSTagger

stanford_pos_dir = '/home/alvas/stanford-postagger-full-2015-04-20/'
eng_model_filename= stanford_pos_dir + 'models/english-left3words-distsim.tagger'
my_path_to_jar= stanford_pos_dir + 'stanford-postagger.jar'

st = StanfordPOSTagger(model_filename=eng_model_filename, path_to_jar=my_path_to_jar) 
st.tag('What is the airspeed of an unladen swallow ?'.split())


# NER Tagging:
from nltk.tag import StanfordNERTagger

stanford_ner_dir = '/home/alvas/stanford-ner/'
eng_model_filename= stanford_ner_dir + 'classifiers/english.all.3class.distsim.crf.ser.gz'
my_path_to_jar= stanford_ner_dir + 'stanford-ner.jar'

st = StanfordNERTagger(model_filename=eng_model_filename, path_to_jar=my_path_to_jar) 
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())

# Parsing:
from nltk.parse.stanford import StanfordParser

stanford_parser_dir = '/home/alvas/stanford-parser/'
eng_model_path = stanford_parser_dir  + "edu/stanford/nlp/models/lexparser/englishRNN.ser.gz"
my_path_to_models_jar = stanford_parser_dir  + "stanford-parser-3.5.2-models.jar"
my_path_to_jar = stanford_parser_dir  + "stanford-parser.jar"

parser=StanfordParser(model_path=eng_model_path, path_to_models_jar=my_path_to_models_jar, path_to_jar=my_path_to_jar)

Question 4

Deprecated Answer

The answer below is deprecated; please use the solution at https://stackoverflow.com/a/51981566/610569 for NLTK v3.3 and above.

Edited

As of the current Stanford parser (2015-04-20), the default output of lexparser.sh has changed, so the script below will not work.

But this answer is kept for legacy's sake; it will still work with http://nlp.stanford.edu/software/stanford-parser-2012-11-12.zip.

Original Answer

I suggest you don't mess with Jython or JPype. Let Python do Python stuff and let Java do Java stuff; get the Stanford Parser output through the console.

After you've installed the Stanford Parser in your home directory ~/, just use this Python recipe to get the flat bracketed parse:

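import os
sentence = "this is a foo bar i want to parse."

# Write the sentence to the temp file from Python (an echo command breaks on quotes in the sentence)
with open(os.path.expanduser("~/stanfordtemp.txt"), "w") as f:
    f.write(sentence + "\n")
parser_out = os.popen("~/stanford-parser-2012-11-12/lexparser.sh ~/stanfordtemp.txt").readlines()

# Keep only the bracketed-parse lines (startswith also safely skips empty lines)
bracketed_parse = " ".join([i.strip() for i in parser_out if i.strip().startswith("(")])
print(bracketed_parse)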
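A slightly more defensive sketch of the same recipe, assuming the 2012-11-12 parser is unzipped in your home directory as above (Unix only, since it calls `lexparser.sh`). It writes the sentence to a temporary file, runs the script with `subprocess` instead of going through a shell, and keeps the line filtering in its own function so it can be tested without Java. Both function names are hypothetical:

```python
import os
import subprocess
import tempfile


def extract_bracketed(lines):
    """Join the lexparser output lines that belong to a bracketed parse."""
    return " ".join(line.strip() for line in lines if line.strip().startswith("("))


def parse_with_lexparser(sentence, script="~/stanford-parser-2012-11-12/lexparser.sh"):
    """Write `sentence` to a temp file, run lexparser.sh on it, return the parse."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(sentence + "\n")
        path = tmp.name
    try:
        result = subprocess.run([os.path.expanduser(script), path],
                                stdout=subprocess.PIPE, universal_newlines=True,
                                check=True)
    finally:
        os.remove(path)  # clean up the temp file even if the parser fails
    return extract_bracketed(result.stdout.splitlines())
```

Lines such as `nsubj(foo-3, this-1)` (typed dependencies) do not start with `(`, so only the tree lines survive the filter.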

Question 5

As of NLTK v3.3, users should avoid the Stanford NER or POS taggers from nltk.tag, and avoid the Stanford tokenizer/segmenter from nltk.tokenize.

Instead, use the new nltk.parse.corenlp.CoreNLPParser API.

Please see https://github.com/nltk/nltk/wiki/Stanford-CoreNLP-API-in-NLTK

(To avoid a link-only answer, I've pasted the docs from the NLTK GitHub wiki below)

First, update your NLTK

pip3 install -U nltk # Make sure it is >=3.3

Then download the necessary CoreNLP packages:

cd ~
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip
unzip stanford-corenlp-full-2018-02-27.zip
cd stanford-corenlp-full-2018-02-27

# Get the Chinese model 
wget http://nlp.stanford.edu/software/stanford-chinese-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties 

# Get the Arabic model
wget http://nlp.stanford.edu/software/stanford-arabic-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-arabic.properties 

# Get the French model
wget http://nlp.stanford.edu/software/stanford-french-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-french.properties 

# Get the German model
wget http://nlp.stanford.edu/software/stanford-german-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-german.properties 


# Get the Spanish model
wget http://nlp.stanford.edu/software/stanford-spanish-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-spanish.properties

English

Still in the stanford-corenlp-full-2018-02-27 directory, start the server:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,ner,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000 &
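Because the server is started in the background (`&`) and preloads several annotators, it may not accept requests immediately. A minimal stdlib sketch that polls until the server answers, assuming it responds to a plain HTTP GET on the `-port` you chose (9000 above); `wait_for_server` is a hypothetical helper, not part of NLTK:

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:9000", timeout=60.0, interval=1.0):
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass.

    Returns True as soon as the server responds (with any HTTP status),
    False if it is still unreachable when the timeout expires.
    """
    # Ignore any HTTP proxy configured in the environment: the server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server is up but returned an error status
        except OSError:
            pass  # connection refused or timed out: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_server()` before creating `CoreNLPParser(url='http://localhost:9000')` avoids connection errors on the first request.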

Then in Python:

>>> from nltk.parse import CoreNLPParser

# Lexical Parser
>>> parser = CoreNLPParser(url='http://localhost:9000')

# Parse tokenized text.
>>> list(parser.parse('What is the airspeed of an unladen swallow ?'.split()))
[Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('NN', ['airspeed'])]), Tree('PP', [Tree('IN', ['of']), Tree('NP', [Tree('DT', ['an']), Tree('JJ', ['unladen'])])]), Tree('S', [Tree('VP', [Tree('VB', ['swallow'])])])])]), Tree('.', ['?'])])])]

# Parse raw string.
>>> list(parser.raw_parse('What is the airspeed of an unladen swallow ?'))
[Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('NN', ['airspeed'])]), Tree('PP', [Tree('IN', ['of']), Tree('NP', [Tree('DT', ['an']), Tree('JJ', ['unladen'])])]), Tree('S', [Tree('VP', [Tree('VB', ['swallow'])])])])]), Tree('.', ['?'])])])]

# Neural Dependency Parser
>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> parses = dep_parser.parse('What is the airspeed of an unladen swallow ?'.split())
>>> [[(governor, dep, dependent) for governor, dep, dependent in parse.triples()] for parse in parses]
[[(('What', 'WP'), 'cop', ('is', 'VBZ')), (('What', 'WP'), 'nsubj', ('airspeed', 'NN')), (('airspeed', 'NN'), 'det', ('the', 'DT')), (('airspeed', 'NN'), 'nmod', ('swallow', 'VB')), (('swallow', 'VB'), 'case', ('of', 'IN')), (('swallow', 'VB'), 'det', ('an', 'DT')), (('swallow', 'VB'), 'amod', ('unladen', 'JJ')), (('What', 'WP'), 'punct', ('?', '.'))]]


# Tokenizer
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> list(parser.tokenize('What is the airspeed of an unladen swallow?'))
['What', 'is', 'the', 'airspeed', 'of', 'an', 'unladen', 'swallow', '?']

# POS Tagger
>>> pos_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
>>> list(pos_tagger.tag('What is the airspeed of an unladen swallow ?'.split()))
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]

# NER Tagger
>>> ner_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> list(ner_tagger.tag(('Rami Eid is studying at Stony Brook University in NY'.split())))
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'STATE_OR_PROVINCE')]

Chinese

Starting the server is a little different, still from the `stanford-corenlp-full-2018-02-27` directory:

java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-chinese.properties \
-preload tokenize,ssplit,pos,lemma,ner,parse \
-status_port 9001  -port 9001 -timeout 15000

In Python:

>>> parser = CoreNLPParser('http://localhost:9001')
>>> list(parser.tokenize(u'我家没有电脑。'))
['我家', '没有', '电脑', '。']

>>> list(parser.parse(parser.tokenize(u'我家没有电脑。')))
[Tree('ROOT', [Tree('IP', [Tree('IP', [Tree('NP', [Tree('NN', ['我家'])]), Tree('VP', [Tree('VE', ['没有']), Tree('NP', [Tree('NN', ['电脑'])])])]), Tree('PU', ['。'])])])]

Arabic

Start the server:

java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-arabic.properties \
-preload tokenize,ssplit,pos,parse \
-status_port 9005  -port 9005 -timeout 15000

In Python:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser('http://localhost:9005')
>>> text = u'انا حامل'

# Parser.
>>> parser.raw_parse(text)
<list_iterator object at 0x7f0d894c9940>
>>> list(parser.raw_parse(text))
[Tree('ROOT', [Tree('S', [Tree('NP', [Tree('PRP', ['انا'])]), Tree('NP', [Tree('NN', ['حامل'])])])])]
>>> list(parser.parse(parser.tokenize(text)))
[Tree('ROOT', [Tree('S', [Tree('NP', [Tree('PRP', ['انا'])]), Tree('NP', [Tree('NN', ['حامل'])])])])]

# Tokenizer / Segmenter.
>>> list(parser.tokenize(text))
['انا', 'حامل']

# POS tag
>>> pos_tagger = CoreNLPParser('http://localhost:9005', tagtype='pos')
>>> list(pos_tagger.tag(parser.tokenize(text)))
[('انا', 'PRP'), ('حامل', 'NN')]


# NER tag
>>> ner_tagger = CoreNLPParser('http://localhost:9005', tagtype='ner')
>>> list(ner_tagger.tag(parser.tokenize(text)))
[('انا', 'O'), ('حامل', 'O')]

French

Start the server:

java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-french.properties \
-preload tokenize,ssplit,pos,parse \
-status_port 9004  -port 9004 -timeout 15000

In Python:

>>> parser = CoreNLPParser('http://localhost:9004')
>>> list(parser.parse('Je suis enceinte'.split()))
[Tree('ROOT', [Tree('SENT', [Tree('NP', [Tree('PRON', ['Je']), Tree('VERB', ['suis']), Tree('AP', [Tree('ADJ', ['enceinte'])])])])])]
>>> pos_tagger = CoreNLPParser('http://localhost:9004', tagtype='pos')
>>> pos_tagger.tag('Je suis enceinte'.split())
[('Je', 'PRON'), ('suis', 'VERB'), ('enceinte', 'ADJ')]

German

Start the server:

java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-german.properties \
-preload tokenize,ssplit,pos,ner,parse \
-status_port 9002  -port 9002 -timeout 15000

In Python:

>>> parser = CoreNLPParser('http://localhost:9002')
>>> list(parser.raw_parse('Ich bin schwanger'))
[Tree('ROOT', [Tree('NUR', [Tree('S', [Tree('PPER', ['Ich']), Tree('VAFIN', ['bin']), Tree('AP', [Tree('ADJD', ['schwanger'])])])])])]
>>> list(parser.parse('Ich bin schwanger'.split()))
[Tree('ROOT', [Tree('NUR', [Tree('S', [Tree('PPER', ['Ich']), Tree('VAFIN', ['bin']), Tree('AP', [Tree('ADJD', ['schwanger'])])])])])]


>>> pos_tagger = CoreNLPParser('http://localhost:9002', tagtype='pos')
>>> pos_tagger.tag('Ich bin schwanger'.split())
[('Ich', 'PPER'), ('bin', 'VAFIN'), ('schwanger', 'ADJD')]

>>> ner_tagger = CoreNLPParser('http://localhost:9002', tagtype='ner')
>>> ner_tagger.tag('Donald Trump besuchte Angela Merkel in Berlin.'.split())
[('Donald', 'PERSON'), ('Trump', 'PERSON'), ('besuchte', 'O'), ('Angela', 'PERSON'), ('Merkel', 'PERSON'), ('in', 'O'), ('Berlin', 'LOCATION'), ('.', 'O')]

Spanish

Start the server:

java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-spanish.properties \
-preload tokenize,ssplit,pos,ner,parse \
-status_port 9003  -port 9003 -timeout 15000

In Python:

>>> pos_tagger = CoreNLPParser('http://localhost:9003', tagtype='pos')
>>> pos_tagger.tag(u'Barack Obama salió con Michael Jackson .'.split())
[('Barack', 'PROPN'), ('Obama', 'PROPN'), ('salió', 'VERB'), ('con', 'ADP'), ('Michael', 'PROPN'), ('Jackson', 'PROPN'), ('.', 'PUNCT')]
>>> ner_tagger = CoreNLPParser('http://localhost:9003', tagtype='ner')
>>> ner_tagger.tag(u'Barack Obama salió con Michael Jackson .'.split())
[('Barack', 'PERSON'), ('Obama', 'PERSON'), ('salió', 'O'), ('con', 'O'), ('Michael', 'PERSON'), ('Jackson', 'PERSON'), ('.', 'O')]
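Since each language runs its own server on its own port, a small launcher can keep the properties files, annotators and ports from the commands above in one place. A minimal sketch, assuming it is run from inside `stanford-corenlp-full-2018-02-27` with `java` on the PATH; the `LANGUAGES` table and both helpers are illustrative names, not part of CoreNLP or NLTK. Passing `"*"` in an argv list (no shell) is fine because the JVM itself expands classpath wildcards:

```python
import subprocess

# (properties file, annotators to preload, port), copied from the commands above.
LANGUAGES = {
    "english": (None, "tokenize,ssplit,pos,lemma,ner,parse,depparse", 9000),
    "chinese": ("StanfordCoreNLP-chinese.properties", "tokenize,ssplit,pos,lemma,ner,parse", 9001),
    "german": ("StanfordCoreNLP-german.properties", "tokenize,ssplit,pos,ner,parse", 9002),
    "spanish": ("StanfordCoreNLP-spanish.properties", "tokenize,ssplit,pos,ner,parse", 9003),
    "french": ("StanfordCoreNLP-french.properties", "tokenize,ssplit,pos,parse", 9004),
    "arabic": ("StanfordCoreNLP-arabic.properties", "tokenize,ssplit,pos,parse", 9005),
}


def corenlp_server_command(language, memory="-mx4g", timeout_ms=15000):
    """Build the java argv that starts a CoreNLP server for `language`."""
    props, annotators, port = LANGUAGES[language]
    cmd = ["java", memory, "-cp", "*", "edu.stanford.nlp.pipeline.StanfordCoreNLPServer"]
    if props is not None:
        cmd += ["-serverProperties", props]
    cmd += ["-preload", annotators,
            "-status_port", str(port), "-port", str(port),
            "-timeout", str(timeout_ms)]
    return cmd


def start_server(language):
    """Start the server in the background and return the Popen handle."""
    return subprocess.Popen(corenlp_server_command(language))
```

The matching client is then, e.g., `CoreNLPParser('http://localhost:%d' % LANGUAGES['german'][2])`.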

Question 6

There is a Python interface for the Stanford parser:

http://projects.csail.mit.edu/spatial/Stanford_Parser

Question 7

The Stanford Core NLP software page has a list of Python wrappers:

http://nlp.stanford.edu/software/corenlp.shtml#Extensions

Question 8

If I remember correctly, the Stanford parser is a Java library, so you must have a Java interpreter running on your server/computer.

I used it once as a server, combined with a PHP script. The script used PHP's exec() function to make a command-line call to the parser like so:

<?php

exec( "java -cp /pathTo/stanford-parser.jar -mx100m edu.stanford.nlp.process.DocumentPreprocessor /pathTo/fileToParse > /pathTo/resultFile 2>/dev/null" );

?>

I don't remember all the details of this command; basically it opened fileToParse, parsed it, and wrote the output to resultFile. PHP would then open the result file for further use.

The end of the command redirects the parser's verbose output to NULL, to prevent unnecessary command-line information from disturbing the script.

I don't know much about Python, but there might be a way to make command-line calls.
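There is: the standard `subprocess` module. A minimal sketch of the Python counterpart of the PHP `exec()` call above, with stdout written to a result file and stderr sent to the null device (what `2>/dev/null` does). The `/pathTo/...` paths are the placeholders from the PHP snippet, and the Java arguments are copied unchanged rather than verified; `run_to_file` is a hypothetical helper:

```python
import subprocess


def run_to_file(cmd, result_path):
    """Run `cmd` (an argv list), write its stdout to `result_path` and
    discard stderr. Returns the process exit code."""
    with open(result_path, "w") as out:
        return subprocess.call(cmd, stdout=out, stderr=subprocess.DEVNULL)


# The PHP command from above as an argv list (placeholder paths).
STANFORD_CMD = ["java", "-cp", "/pathTo/stanford-parser.jar", "-mx100m",
                "edu.stanford.nlp.process.DocumentPreprocessor",
                "/pathTo/fileToParse"]
```

Usage: `run_to_file(STANFORD_CMD, "/pathTo/resultFile")`, then open the result file as the PHP script did.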

It might not be exactly the approach you were expecting, but hopefully it will give you some inspiration. Best of luck.

Question 9

Note that this answer applies to NLTK v3.0, and not to more recent versions.

Here is an adaptation of danger89's code that works with nltk 3.0.0 on windoze, and probably on other platforms as well; adjust the directory names to fit your setup:

import os
from nltk.parse import stanford
os.environ['STANFORD_PARSER'] = 'd:/stanford-parser'
os.environ['STANFORD_MODELS'] = 'd:/stanford-parser'
os.environ['JAVAHOME'] = 'c:/Program Files/java/jre7/bin'

parser = stanford.StanfordParser(model_path="d:/stanford-grammars/englishPCFG.ser.gz")
sentences = parser.raw_parse_sents(("Hello, My name is Melroy.", "What is your name?"))
print(sentences)

Note that the parsing command has changed (see the source code at www.nltk.org/_modules/nltk/parse/stanford.html) and that you need to define the JAVAHOME variable. I tried to get it to read the grammar file in situ inside the jar, but have so far failed to do so.

Question 10

You can use the Stanford Parser's output to create a Tree in nltk (nltk.tree.Tree).

Assuming the Stanford parser gives you a file with exactly one parse tree per sentence, this example works, though it might not look very pythonic:

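import sys
import nltk

parse_trees_text = []
tree = ""
with open(sys.argv[1] + ".output" + ".30" + ".stp", "r") as f:
    for line in f:
        if line.isspace():
            # A blank line ends the current tree (skip empty accumulations)
            if tree:
                parse_trees_text.append(tree)
            tree = ""
        elif "(. ...))" in line:
            # Line containing "(. ...))": close the tree with ')' and store it
            tree = tree + ')'
            parse_trees_text.append(tree)
            tree = ""
        else:
            tree = tree + line

parse_trees = []
for t in parse_trees_text:
    tree = nltk.Tree.fromstring(t)  # nltk.Tree(t) in older NLTK versions
    del tree[-1]  # delete "(. .))" from tree (you don't need that)
    # The original also called traverse(tree), a helper of the author's that is not shown here
    parse_trees.append(tree)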
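If the output file does not reliably separate trees with blank lines, a different technique from the code above is to split on balanced parentheses: a tree ends when the nesting depth returns to zero. A minimal stdlib sketch; `split_trees` is a hypothetical helper, and each returned string can then be passed to `nltk.Tree.fromstring`. It assumes the tokens themselves contain no raw parentheses (the Stanford parser writes them as `-LRB-` / `-RRB-`):

```python
def split_trees(text):
    """Split a string holding several bracketed parse trees into one
    whitespace-normalized string per top-level tree."""
    trees = []
    depth = 0
    current = []
    for char in text:
        if depth == 0 and char != "(":
            continue  # skip whitespace or stray text between trees
        current.append(char)
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                # Collapse newlines/indentation into single spaces
                trees.append(" ".join("".join(current).split()))
                current = []
    return trees
```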

Question 11

Note that this answer applies to NLTK v3.0, and not to more recent versions.

Since nobody really mentioned it, and it somehow troubled me a lot, here is an alternative way to use the Stanford parser in Python:

from nltk.parse.stanford import StanfordParser

stanford_parser_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser.jar'
stanford_model_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar'
parser = StanfordParser(path_to_jar=stanford_parser_jar,
                        path_to_models_jar=stanford_model_jar)

This way, you don't need to worry about the path settings anymore.

This is for those who cannot get it to work properly on Ubuntu or when running the code in Eclipse.

Question 12

I'm on a Windows machine, and you can simply run the parser normally as you would from the command line, just from a different directory, so you don't need to edit the lexparser.bat file. Just put in the full path.

import os

cmd = r'java -cp \Documents\stanford_nlp\stanford-parser-full-2015-01-30 edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "typedDependencies" \Documents\stanford_nlp\stanford-parser-full-2015-01-30\stanford-parser-3.5.1-models\edu\stanford\nlp\models\lexparser\englishFactored.ser.gz stanfordtemp.txt'
parse_out = os.popen(cmd).readlines()

The tricky part for me was figuring out how to run a Java program from a different path. There must be a better way, but this works.

Question 13

Note that this answer applies to NLTK v3.0, and not to more recent versions.

A slight update to (or simply an alternative to) danger89's comprehensive answer on using the Stanford Parser in NLTK and Python

With stanford-parser-full-2015-04-20, JRE 1.8 and nltk 3.0.4 (python 2.7.6), it appears that you no longer need to extract englishPCFG.ser.gz from stanford-parser-xxx-models.jar or set up any environment variables:

from nltk.parse.stanford import StanfordParser

english_parser = StanfordParser('path/stanford-parser.jar', 'path/stanford-parser-3.5.2-models.jar')

s = "The real voyage of discovery consists not in seeking new landscapes, but in having new eyes."

sentences = english_parser.raw_parse_sents((s,))
print(sentences)  # only prints <listiterator object> for this version

#draw the tree
for line in sentences:
    for sentence in line:
        sentence.draw()

Question 14

Note that this answer applies to NLTK v3.0, and not to more recent versions.

Here is the Windows version of alvas's answer:

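import os
from nltk.tree import ParentedTree

# Join the sentences with ". " and drop any non-ASCII characters
sentences = ('. '.join(['this is sentence one without a period',
                        'this is another foo bar sentence ']) + '.').encode('ascii', errors='ignore')
catpath = r"YOUR CURRENT FILE PATH"

# Write the input text to a temporary file for lexparser.bat
with open('stanfordtemp.txt', 'wb') as f:
    f.write(sentences)

# Run lexparser.bat and capture its output
parse_out = os.popen(catpath + r"\nlp_tools\stanford-parser-2010-08-20\lexparser.bat " + catpath + r"\stanfordtemp.txt").readlines()

# Keep only the bracketed-parse lines, then split them into one string per sentence
bracketed_parse = " ".join([i.strip() for i in parse_out if i.strip() and i.strip()[0] == "("])
bracketed_parse = "\n(ROOT".join(bracketed_parse.split(" (ROOT")).split('\n')
aa = [ParentedTree.fromstring(x) for x in bracketed_parse]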
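The two `bracketed_parse` lines above are the fiddly part: they glue all tree lines together and then re-split on `(ROOT`. A hedged, stdlib-only sketch of the same grouping in a more readable form; `split_bracketed_parses` is a hypothetical helper, and it assumes (as the code above does) that every sentence's tree starts with a line beginning `(ROOT` and that only parse lines start with `(`.

```python
def split_bracketed_parses(output_lines):
    """Group lexparser output lines into one bracketed parse string per sentence."""
    parses = []
    for raw in output_lines:
        line = raw.strip()
        if not line.startswith("("):
            continue  # skip blank lines and any non-parse output
        if line.startswith("(ROOT") or not parses:
            parses.append(line)  # a new sentence's tree begins here
        else:
            parses[-1] += " " + line  # continuation of the current tree
    return parses


# Usage with the output captured above:
# aa = [ParentedTree.fromstring(s) for s in split_bracketed_parses(parse_out)]
```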

NOTES:

In lexparser.bat you need to change all the paths to absolute paths to avoid Java errors such as "class not found".
I strongly recommend this method on Windows, since I tried several of the answers on this page and all the methods that make Python communicate with Java failed.
I'd like to hear from you if you succeed on Windows, and I hope you can tell me how you got past all these problems.
Search for a Python wrapper for Stanford CoreNLP to get a Python version.

Question 15

It took me many hours, but I finally found a simple solution for Windows users. It is basically a summarized version of an existing answer by alvas, but made easy to follow (hopefully) for those who are new to Stanford NLP and are Windows users.

1) Download the module you want to use, such as NER, POS, etc. In my case I wanted to use NER, so I downloaded the module from http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip

2) Unzip the file.

3) Set the environment variables (CLASSPATH and STANFORD_MODELS) from the unzipped folder.

import os
os.environ['CLASSPATH'] = "C:/Users/Downloads/stanford-ner-2015-04-20/stanford-ner.jar"
os.environ['STANFORD_MODELS'] = "C:/Users/Downloads/stanford-ner-2015-04-20/classifiers/"

4) Set the environment variable for Java, i.e. where you installed Java. For me it is the following:

os.environ['JAVAHOME'] = "C:/Program Files/Java/jdk1.8.0_102/bin/java.exe"

5) Import the module you want:

from nltk.tag import StanfordNERTagger

6) Load the pre-trained model, which is in the classifiers folder of the unzipped directory. Add ".gz" at the end for the file extension. For me, the model I wanted to use was english.all.3class.distsim.crf.ser

st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')

7) Now run the tagger, and we're done!!

st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
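Most of the failures people hit with steps 3 and 4 are paths that don't exist. A hedged sanity check to run before creating the tagger; `missing_stanford_paths` is a hypothetical helper that only checks the three variables set above, and it assumes a variable may hold several entries separated by `os.pathsep` (`;` on Windows).

```python
import os

# Variable names taken from steps 3 and 4 above
STANFORD_VARS = ("CLASSPATH", "STANFORD_MODELS", "JAVAHOME")


def missing_stanford_paths(env=None):
    """Return the names of variables that are unset or point to missing paths."""
    env = os.environ if env is None else env
    missing = []
    for name in STANFORD_VARS:
        # Split multi-entry values and ignore empty pieces
        entries = [p for p in env.get(name, "").split(os.pathsep) if p]
        if not entries or not all(os.path.exists(p) for p in entries):
            missing.append(name)
    return missing


# Usage: an empty list means every configured path exists on disk
# print(missing_stanford_paths())
```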

Question 16

Deprecated answer

The answer below is deprecated; please use the solution at https://stackoverflow.com/a/51981566/610569 for NLTK v3.3 and above.

EDITED

Note: The following answer will only work on:

NLTK version == 3.2.5
Stanford tools compiled since 2016-10-31
Python 2.7, 3.5 and 3.6

Since both tools change rather quickly and the API might look very different 3-6 months later, please treat the following answer as a temporary workaround and not a permanent fix.

Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instructions on how to interface Stanford NLP tools using NLTK!!

TL;DR

The following code comes from https://github.com/nltk/nltk/pull/1735#issuecomment-306091826

In the terminal:

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000
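Before creating any of the CoreNLP objects below, it can help to confirm that the server started above is actually listening. A minimal stdlib sketch, assuming the host and the `-port 9000` value from the command above; `is_port_open` is a hypothetical helper that only checks that something accepts TCP connections, not that it is CoreNLP.

```python
import socket


def is_port_open(host="localhost", port=9000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unknown host, ...
        return False


# Usage: wait until this prints True before calling CoreNLPParser
# print(is_port_open("localhost", 9000))
```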

In Python:

>>> from nltk.tag.stanford import CoreNLPPOSTagger, CoreNLPNERTagger
>>> from nltk.parse.corenlp import CoreNLPParser

>>> stpos, stner = CoreNLPPOSTagger(), CoreNLPNERTagger()

>>> stpos.tag('What is the airspeed of an unladen swallow ?'.split())
[(u'What', u'WP'), (u'is', u'VBZ'), (u'the', u'DT'), (u'airspeed', u'NN'), (u'of', u'IN'), (u'an', u'DT'), (u'unladen', u'JJ'), (u'swallow', u'VB'), (u'?', u'.')]

>>> stner.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]


>>> parser = CoreNLPParser(url='http://localhost:9000')

>>> next(
...     parser.raw_parse('The quick brown fox jumps over the lazy dog.')
... ).pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
The quick brown fox jumps over the     lazy dog  .

>>> (parse_fox, ), (parse_wolf, ) = parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )

>>> parse_fox.pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
The quick brown fox jumps over the     lazy dog  .

>>> parse_wolf.pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|_________      |    |     _______|____    |
 DT   JJ   JJ   NN   VBZ   IN   DT      JJ   NN  .
 |    |    |    |     |    |    |       |    |   |
The quick grey wolf jumps over the     lazy fox  .

>>> (parse_dog, ), (parse_friends, ) = parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )

>>> parse_dog.pretty_print()  # doctest: +NORMALIZE_WHITESPACE
        ROOT
         |
         S
  _______|____
 |            VP
 |    ________|___
 NP  |            NP
 |   |         ___|___
PRP VBP       DT      NN
 |   |        |       |
 I   'm       a      dog

Please see http://www.nltk.org/_modules/nltk/parse/corenlp.html for more information on the Stanford API. Take a look at the docstrings!
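`CoreNLPParser` talks to the server over HTTP. To see what such a request looks like, here is a hedged stdlib sketch that builds (but does not send) a POST to the server started above. It assumes the CoreNLP server's `properties` URL parameter, a JSON object with `annotators` and `outputFormat`; `build_corenlp_request` is a hypothetical helper, and NLTK's own request may differ in detail.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_corenlp_request(text, url="http://localhost:9000",
                          annotators="tokenize,ssplit,pos,parse"):
    """Build a POST request asking the server to annotate `text` as JSON."""
    properties = {"annotators": annotators, "outputFormat": "json"}
    full_url = url + "/?" + urlencode({"properties": json.dumps(properties)})
    # The raw text goes in the request body, UTF-8 encoded
    return Request(full_url, data=text.encode("utf-8"), method="POST")


# Usage, with the server running:
# from urllib.request import urlopen
# with urlopen(build_corenlp_request("The quick brown fox jumps."), timeout=30) as resp:
#     annotations = json.load(resp)
# print(annotations["sentences"][0]["parse"])
```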

Question 17

Note that this answer applies to NLTK v3.0 and not to more recent versions.

I can't leave this as a comment because of my reputation, but since I spent (wasted?) some time solving this, I'd like to share my problem/solution for getting this parser to work in NLTK.

In the excellent answer from alvas, it is mentioned that:

e.g. for the Parser, there won't be a model directory.

This wrongly led me to:

not be careful about the value I set for STANFORD_MODELS (and only care about my CLASSPATH)
leave the ../path/to/stanford-parser-full-2015-12-09/models directory *virtually empty* (or with a jar file whose name did not match the NLTK regex)!

If the OP, like me, only wanted to use the parser, it can be confusing that, without downloading anything else (no POS tagger, no NER, ...) and after following all these instructions, we still get an error.

Eventually, for any CLASSPATH given (following the examples and explanations in the answers in this thread), I would still get the error:

NLTK was unable to find stanford-parser-(\d+)(.(\d+))+-models.jar! Set the CLASSPATH environment variable. For more information on stanford-parser-(\d+)(.(\d+))+-models.jar, see: http://nlp.stanford.edu/software/lex-parser.shtml

OR:

NLTK was unable to find stanford-parser.jar! Set the CLASSPATH environment variable. For more information on stanford-parser.jar, see: http://nlp.stanford.edu/software/lex-parser.shtml

Importantly, though, I could load and use the parser correctly if I called the function with all arguments and paths fully specified, as in:

from nltk.parse.stanford import StanfordParser

stanford_parser_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser.jar'
stanford_model_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar'
parser = StanfordParser(path_to_jar=stanford_parser_jar,
                        path_to_models_jar=stanford_model_jar)

Solution for the Parser alone:

Therefore, the error came from NLTK and how it looks for jars using the supplied STANFORD_MODELS and CLASSPATH environment variables. To solve this, the *-models.jar, named in the correct format (so that it matches the regex in the NLTK code, i.e. not a -corenlp-....jar), must be located in the folder designated by STANFORD_MODELS.

Namely, I first created:

mkdir stanford-parser-full-2015-12-09/models

Then added to .bashrc:

export STANFORD_MODELS=/path/to/stanford-parser-full-2015-12-09/models

And finally, copied stanford-parser-3.6.0-models.jar (or the corresponding version) into:

path/to/stanford-parser-full-2015-12-09/models/

With that, I could get StanfordParser to load smoothly in Python, with the classic CLASSPATH pointing to stanford-parser.jar. In fact, set up this way, you can call StanfordParser with no parameters and the defaults will just work.
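To check whether a jar in the STANFORD_MODELS folder would be picked up, you can test its file name against the pattern printed in the error message above. A hedged sketch: the pattern below is that message's regex with the dots escaped, and matching from the start of the file name with `re.match` is an assumption about how NLTK applies it; `matching_model_jars` is a hypothetical helper.

```python
import re

# Regex from the NLTK error message above, with literal dots escaped
MODEL_JAR_PATTERN = re.compile(r"stanford-parser-(\d+)(\.(\d+))+-models\.jar")


def matching_model_jars(filenames):
    """Return the file names that the models-jar pattern accepts."""
    return [name for name in filenames if MODEL_JAR_PATTERN.match(name)]


# Usage: list the folder that STANFORD_MODELS points to
# import os
# print(matching_model_jars(os.listdir(os.environ["STANFORD_MODELS"])))
```

A CoreNLP models jar (or a misspelled name such as `stanfor-parser-...`) does not match, which reproduces the "unable to find" error even though a models jar is present.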

Question 18

I'm using nltk version 3.2.4, and the following code worked for me.

import os

from nltk.internals import find_jars_within_path
from nltk.tag import StanfordPOSTagger
from nltk import word_tokenize

# Alternatively to setting the CLASSPATH, add the jar and model via their path:
jar = '/home/ubuntu/stanford-postagger-full-2017-06-09/stanford-postagger.jar'
model = '/home/ubuntu/stanford-postagger-full-2017-06-09/models/english-left3words-distsim.tagger'

pos_tagger = StanfordPOSTagger(model, jar)

# Add the other jars from the Stanford directory to the tagger's classpath
stanford_dir = os.path.dirname(pos_tagger._stanford_jar)
stanford_jars = find_jars_within_path(stanford_dir)
# os.pathsep is ':' on Linux/macOS and ';' on Windows
pos_tagger._stanford_jar = os.pathsep.join(stanford_jars)

text = pos_tagger.tag(word_tokenize("Open app and play movie"))
print(text)

Output:

[('Open', 'VB'), ('app', 'NN'), ('and', 'CC'), ('play', 'VB'), ('movie', 'NN')]
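For readers who want to see what the "add other jars" step does without NLTK internals, here is a hedged stdlib sketch that walks a folder, collects every `.jar`, and joins them with the platform's classpath separator. `build_classpath` is a hypothetical helper; the claim that `find_jars_within_path` does roughly the same walk is an assumption.

```python
import os


def build_classpath(root):
    """Collect every .jar under root into one classpath string (sorted)."""
    jars = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".jar"):
                jars.append(os.path.join(dirpath, name))
    # ';' on Windows, ':' elsewhere
    return os.pathsep.join(sorted(jars))


# Usage:
# print(build_classpath('/home/ubuntu/stanford-postagger-full-2017-06-09'))
```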

Question 19

A new Stanford parser based on a neural model, trained using TensorFlow, has recently been made available for use as a Python API. This model is supposed to be far more accurate than the Java-based one. You can certainly integrate it with an NLTK pipeline.

Link to the parser. Their repository contains pre-trained parsing models for 53 languages.