Fastest Python code to find a set of winning words in this game

This is a word game from a set of activity cards for children. Below the rules is code to find the best triplet of words using /usr/share/dict/words. I thought it was an interesting optimization problem, and I wonder whether people can find improvements.

Rules

1. Choose one letter from each of the sets below.
2. Choose a word using the chosen letters (and any others).
3. Score the word.
   - Each chosen letter gets the number shown with its set (repeats included).
   - AEIOU count 0.
   - All other letters are -2.
4. Repeat steps 1-3 above two more times (without reusing letters chosen in step 1).
5. The final score is the sum of the three word scores.

Sets

(set 1 scores 1 point, set 2 scores 2 points, etc.)

LTN
RDS
GBM
CHP
FWV
YKJ
QXZ
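To make the scoring rule concrete, here is a minimal Python 3 sketch that scores a single word once one letter has been picked from each set. The names `SETS` and `score_word`, and the convention that `chosen[i]` is the letter taken from set i + 1, are illustrative assumptions rather than part of the original code.

```python
# Sketch: score one word after a letter has been picked from each set.
SETS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']
VOWELS = set('AEIOU')


def score_word(word, chosen):
    """Score an upper-case word; chosen[i] is the letter picked from SETS[i]."""
    if len(chosen) != len(SETS):
        raise ValueError('pick exactly one letter per set')
    # Each chosen letter is worth the number of the set it came from.
    value = {}
    for points, (group, letter) in enumerate(zip(SETS, chosen), start=1):
        if letter not in group:
            raise ValueError('%s is not in set %s' % (letter, group))
        value[letter] = points
    total = 0
    for ch in word:
        if ch in value:
            total += value[ch]   # counted once per occurrence
        elif ch not in VOWELS:
            total -= 2           # any other consonant costs 2
    return total                 # vowels add nothing


print(score_word('KNICKKNACK', 'NDGCFKQ'))
```

For example, picking N, D, G, C, F, K and Q scores KNICKKNACK as 4 × 6 + 2 × 1 + 2 × 4 = 34.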

Code:

from itertools import permutations
import numpy as np

points = {'LTN' : 1,
          'RDS' : 2,
          'GBM' : 3,
          'CHP' : 4,
          'FWV' : 5,
          'YKJ' : 6,
          'QXZ' : 7}

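def tonum(word):
    # Letter-count column vector (26 x 1) for an upper-case word.
    # dtype=int instead of np.int, which was removed in NumPy 1.24.
    word_array = np.zeros(26, dtype=int)
    for l in word:
        word_array[ord(l) - ord('A')] += 1
    return word_array.reshape((26, 1))

def to_score_array(letters):
    # Score row vector (1 x 26): -2 for every letter, 0 for vowels, and
    # idx + 1 for the letter chosen from set idx + 1.
    score_array = np.zeros(26, dtype=int) - 2
    for v in 'AEIOU':
        score_array[ord(v) - ord('A')] = 0
    for idx, l in enumerate(letters):
        score_array[ord(l) - ord('A')] = idx + 1
    return np.matrix(score_array.reshape(1, 26))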

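def find_best_words():
    # Keep lower-case entries (skipping proper nouns) longer than four letters.
    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    wlist = [l for l in wlist if len(l) > 4]
    orig = [l for l in wlist]
    # Vowels always score 0, so strip them before building the count matrix.
    for rep in 'AEIOU':
        wlist = [l.replace(rep, '') for l in wlist]
    # 26 x N matrix of letter counts, one column per word.
    wlist = np.hstack([tonum(w) for w in wlist])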

    best = 0
    ct = 0
    bestwords = ()
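    # Set 1 is not permuted: the three words are interchangeable, so permuting
    # the other six sets already covers every distinct assignment (6**6 loops).
    for c1 in ['LTN']: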
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
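                                # zip pairs up the k-th letter of every set: one letter choice per word.
                                vals = [to_score_array(''.join(s)) for s in zip(c1, c2, c3, c4, c5, c6, c7)]
                                ct += 1
                                print ct, 6**6
                                # (1 x 26) * (26 x N) scores every word for each of the three choices.
                                scores1 = (vals[0] * wlist).A.flatten()
                                scores2 = (vals[1] * wlist).A.flatten()
                                scores3 = (vals[2] * wlist).A.flatten()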
                                m1 = max(scores1)
                                m2 = max(scores2)
                                m3 = max(scores3)
                                if m1 + m2 + m3 > best:
                                    print orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()], m1 + m2 + m3
                                    best = m1 + m2 + m3
                                    bestwords = (orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()])
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

The matrix version is what I came up with after writing one in pure Python (using dictionaries and scoring each word independently), and another in numpy that used indexing instead of matrix multiplication.

The next optimization would be to remove the vowels from the scoring entirely (and use a modified ord() function), but I wonder whether there are faster approaches.
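One way to read the proposed vowel-free optimization: vowels always score 0, so they can be dropped from both the count vectors and the score vector, shrinking them from 26 to 21 slots. The sketch below is Python 3 with plain lists instead of numpy; `CONSONANT_INDEX` plays the role of the "modified ord()", and all names are hypothetical. The same 21-slot layout could back the matrix version.

```python
import string

# 21 consonants; vowels are left out because they always score 0.
CONSONANTS = [c for c in string.ascii_uppercase if c not in 'AEIOU']
# Stand-in for ord(l) - ord('A'): consonant -> slot 0..20.
CONSONANT_INDEX = {c: i for i, c in enumerate(CONSONANTS)}


def consonant_counts(word):
    """Count vector over the 21 consonants; vowels and other characters are skipped."""
    counts = [0] * len(CONSONANTS)
    for ch in word:
        i = CONSONANT_INDEX.get(ch)
        if i is not None:
            counts[i] += 1
    return counts


def consonant_scores(chosen):
    """Score vector: -2 everywhere, then the letter chosen from set k gets k."""
    scores = [-2] * len(CONSONANTS)
    for points, letter in enumerate(chosen, start=1):
        scores[CONSONANT_INDEX[letter]] = points
    return scores


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


print(dot(consonant_scores('NDGCFKQ'), consonant_counts('KNICKKNACK')))
```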

EDIT: added timeit.timeit code

EDIT: I'm adding a bounty, which I'll award to whichever improvement I like best (or possibly to several answers, but I'll have to accumulate some more reputation if that's the case).

fastest-code python optimization

— bạn

BTW, I wrote the code for my eight-year-old while he was playing the game with his mom. Now I know what xylopyrography means.

This is an interesting question. I think you might get more responses if you provided the following: (1) A link to an online word list, so everyone works with the same data set. (2) Your solution in a single function. (3) A run of that function with the timeit module to show the timing. (4) The dictionary loading placed outside the function, so we aren't testing disk speed. People could then use the existing code as a framework for comparing their solutions.
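Points (2) to (4) could look roughly like the Python 3 harness below: the word list is cleaned once, outside the timed call, and timeit only measures the solver. `load_words` and `solve` are hypothetical names, and `solve` is a placeholder to be replaced by the function under test; a tiny inline list stands in for the dictionary file.

```python
import timeit


def load_words(lines):
    """Clean the raw word list once, outside the code being timed."""
    words = []
    for line in lines:
        w = line.strip().upper()
        if w.isalpha():          # drops blank lines and entries like "it's"
            words.append(w)
    return words


def solve(words):
    """Placeholder solver: swap in the function under test."""
    return max(words, key=len)


# Real use: with open('/usr/share/dict/words') as f: WORDS = load_words(f)
WORDS = load_words(['alpha\n', 'beta\n', "it's\n", '\n', 'gamma\n'])

runs = 10
elapsed = timeit.timeit(lambda: solve(WORDS), number=runs)
print('%.6f s per run' % (elapsed / runs))
```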

I'll rewrite it to use timeit, but for a fair comparison I'd have to use my own machine (which I'm happy to do for people who post solutions). A word list should be available on most systems, but if not, there are several here: wordlist.sourceforge.net

A fair comparison is possible if each user times your solution and any other posted solutions against their own on their own machine. There will be some cross-platform differences, but in general this method works.

Hmm, in that case I wonder whether this is the right site. I think SO would be the most appropriate.

— Joey

Answers:

Using Keith's idea of precomputing the best possible score for each word, I reduced the execution time to about 0.7 seconds on my computer (using a list of 75,228 words).

The trick is to go through the combinations of words to be played instead of all the combinations of letters chosen. We can skip all but a few word combinations (203 with my word list) because they cannot score higher than the best we have already found. Nearly all of the execution time is spent precomputing the word scores.
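The pruning idea can be sketched in isolation as a small branch-and-bound in Python 3. This is a simplified variant, not the code below: it bounds a partial combination by the sum of its words' individual bounds, while the answer uses the exact best score of the words placed so far, which is tighter. `best_play`, `toy_score` and the four-word toy data are hypothetical; in the game, the bounds would be per-word best scores and `exact_score` a function like `best_total_score`. Repeats are allowed, as in the answer's code.

```python
def best_play(bounded, exact_score, words_per_round=3):
    """Branch-and-bound over word combinations (repeats allowed).

    bounded: list of (upper_bound, word) sorted by upper_bound, highest first.
    exact_score(words): true best score of a combination; it must never
    exceed the sum of the words' upper bounds.
    """
    best = [float('-inf'), None]

    def extend(start, chosen, bound_so_far):
        if len(chosen) == words_per_round:
            score = exact_score(chosen)
            if score > best[0]:
                best[0], best[1] = score, list(chosen)
            return
        remaining = words_per_round - len(chosen)
        for i in range(start, len(bounded)):
            bound, word = bounded[i]
            # Later words have bounds <= bound, so this is an upper limit for
            # every combination continuing from here: stop early.
            if bound_so_far + bound * remaining <= best[0]:
                break
            chosen.append(word)
            extend(i, chosen, bound_so_far + bound)
            chosen.pop()

    extend(0, [], 0)
    return best[0], best[1]


BOUNDS = {'A': 10, 'B': 9, 'C': 8, 'D': 1}


def toy_score(words):
    # Toy scorer: exact when all words differ, 10 points lower otherwise.
    penalty = 10 if len(set(words)) < len(words) else 0
    return sum(BOUNDS[w] for w in words) - penalty


ranked = sorted(((b, w) for w, b in BOUNDS.items()), reverse=True)
print(best_play(ranked, toy_score))  # (27, ['A', 'B', 'C'])
```

Because the candidates are sorted by bound, the first failed check ends the whole loop at that depth, which is why so few combinations are actually scored.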

Python 2.7:

import collections
import itertools


WORDS_SOURCE = '../word lists/wordsinf.txt'

WORDS_PER_ROUND = 3
LETTER_GROUP_STRS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']
LETTER_GROUPS = [list(group) for group in LETTER_GROUP_STRS]
GROUP_POINTS = [(group, i+1) for i, group in enumerate(LETTER_GROUPS)]
POINTS_IF_NOT_CHOSEN = -2


def best_word_score(word):
    """Return the best possible score for a given word."""

    word_score = 0

    # Score the letters that are in groups, choosing the best letter for each
    # group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts_sum = 0
        max_letter_count = 0
        for letter in group:
            if letter in word:
                count = word.count(letter)
                letter_counts_sum += count
                if count > max_letter_count:
                    max_letter_count = count
        if letter_counts_sum:
            word_score += points_if_chosen * max_letter_count
            total_not_chosen += letter_counts_sum - max_letter_count
    word_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return word_score

def best_total_score(words):
    """Return the best score possible for a given list of words.

    It is fine if the number of words provided is not WORDS_PER_ROUND. Only the
    words provided are scored."""

    num_words = len(words)
    total_score = 0

    # Score the letters that are in groups, choosing the best permutation of
    # letters for each group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts = []
        # Structure:  letter_counts[word_index][letter] = count
        letter_counts_sum = 0
        for word in words:
            this_word_letter_counts = {}
            for letter in group:
                count = word.count(letter)
                this_word_letter_counts[letter] = count
                letter_counts_sum += count
            letter_counts.append(this_word_letter_counts)

        max_chosen = None
        for letters in itertools.permutations(group, num_words):
            num_chosen = 0
            for word_index, letter in enumerate(letters):
                num_chosen += letter_counts[word_index][letter]
            if num_chosen > max_chosen:
                max_chosen = num_chosen

        total_score += points_if_chosen * max_chosen
        total_not_chosen += letter_counts_sum - max_chosen
    total_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return total_score


def get_words():
    """Return the list of valid words."""
    with open(WORDS_SOURCE, 'r') as source:
        return [line.rstrip().upper() for line in source]

def get_words_by_score():
    """Return a dictionary mapping each score to a list of words.

    The key is the best possible score for each word in the corresponding
    list."""

    words = get_words()
    words_by_score = collections.defaultdict(list)
    for word in words:
        words_by_score[best_word_score(word)].append(word)
    return words_by_score


def get_winning_words():
    """Return a list of words for an optimal play."""

    # A word's position is a tuple of its score's index and the index of the
    # word within the list of words with this score.
    # 
    # word played: A word in the context of a combination of words to be played
    # word chosen: A word in the context of the list it was picked from

    words_by_score = get_words_by_score()
    num_word_scores = len(words_by_score)
    word_scores = sorted(words_by_score, reverse=True)
    words_by_position = []
    # Structure:  words_by_position[score_index][word_index] = word
    num_words_for_scores = []
    for score in word_scores:
        words = words_by_score[score]
        words_by_position.append(words)
        num_words_for_scores.append(len(words))

    # Go through the combinations of words in lexicographic order by word
    # position to find the best combination.
    best_score = None
    positions = [(0, 0)] * WORDS_PER_ROUND
    words = [words_by_position[0][0]] * WORDS_PER_ROUND
    scores_before_words = []
    for i in xrange(WORDS_PER_ROUND):
        scores_before_words.append(best_total_score(words[:i]))
    while True:
        # Keep track of the best possible combination of words so far.
        score = best_total_score(words)
        if score > best_score:
            best_score = score
            best_words = words[:]

        # Go to the next combination of words that could get a new best score.
        for word_played_index in reversed(xrange(WORDS_PER_ROUND)):
            # Go to the next valid word position.
            score_index, word_chosen_index = positions[word_played_index]
            word_chosen_index += 1
            if word_chosen_index == num_words_for_scores[score_index]:
                score_index += 1
                if score_index == num_word_scores:
                    continue
                word_chosen_index = 0

            # Check whether the new combination of words could possibly get a
            # new best score.
            num_words_changed = WORDS_PER_ROUND - word_played_index
            score_before_this_word = scores_before_words[word_played_index]
            further_points_limit = word_scores[score_index] * num_words_changed
            score_limit = score_before_this_word + further_points_limit
            if score_limit <= best_score:
                continue

            # Update to the new combination of words.
            position = score_index, word_chosen_index
            positions[word_played_index:] = [position] * num_words_changed
            word = words_by_position[score_index][word_chosen_index]
            words[word_played_index:] = [word] * num_words_changed
            for i in xrange(word_played_index+1, WORDS_PER_ROUND):
                scores_before_words[i] = best_total_score(words[:i])
            break
        else:
            # None of the remaining combinations of words can get a new best
            # score.
            break

    return best_words


def main():
    winning_words = get_winning_words()
    print winning_words
    print best_total_score(winning_words)

if __name__ == '__main__':
    main()

This returns the solution ['KNICKKNACK', 'RAZZMATAZZ', 'POLYSYLLABLES'] with a score of 95. With the words from Keith's solution added to the word list, I get the same result as he does. With your "xylopyrography" added, I get ['XYLOPYROGRAPHY', 'KNICKKNACKS', 'RAZZMATAZZ'] with a score of 105.
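A quick way to double-check a reported play is a brute-force scorer in Python 3. It relies on the fact that each set's letter assignment affects only that set's letters, so the sets can be optimized independently, similar in spirit to `best_total_score` above. `best_play_score` is a hypothetical name, and words are assumed to be upper-case A to Z.

```python
from itertools import permutations

SETS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']
VOWELS = set('AEIOU')


def best_play_score(words):
    """Best total for a play of up to three words (one letter per set per word)."""
    # Start from the worst case: every consonant costs 2, vowels are free.
    total = -2 * sum(1 for w in words for c in w if c not in VOWELS)
    for points, group in enumerate(SETS, start=1):
        # Give each word a different letter of this set; each use of a chosen
        # letter turns -2 into `points`, i.e. gains points + 2.
        hits = max(
            sum(w.count(letter) for w, letter in zip(words, letters))
            for letters in permutations(group, len(words))
        )
        total += (points + 2) * hits
    return total


print(best_play_score(['KNICKKNACK', 'RAZZMATAZZ', 'POLYSYLLABLES']))
```

For the play above it reproduces the reported 95.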

— flornquake

Here's an idea: you can avoid checking many words by noticing that most words have terrible scores. Say you have found a fairly good play worth 50 points. Then any play worth more than 50 points must contain a word worth at least ceil(51/3) = 17 points. So any word that cannot score 17 points can be ignored.
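The same reasoning extends to the second and third words once the first ones are fixed. The Python 3 helper below is a sketch under that assumption; `min_word_score` is a hypothetical name. With integer division it matches the thresholds `(best + 3) / 3`, `(best - s1 + 2) / 2` and `best - s1 - s2 + 1` that the code below passes to `best_word`.

```python
def min_word_score(best, score_so_far=0, words_left=3):
    """Smallest score the next word needs for the play to beat `best`.

    Assumes the remaining words are examined best-first: together they must
    reach best - score_so_far + 1, so the highest-scoring of them needs at
    least the ceiling of that amount divided by words_left.
    """
    needed = best - score_so_far + 1
    return -(-needed // words_left)   # integer ceiling division


print(min_word_score(50))  # 17, the ceil(51/3) from the example above
```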

Here is some code that does the above. We compute the best possible score for each word in the dictionary and store the words in an array indexed by that score. Then we use the array to check only the words that can reach the required minimum score.
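The bucketing step on its own might look like this Python 3 sketch. As in `load_words` below, a word's optimistic bound is the sum of its letters' set values (as if every consonant were its set's chosen letter). A dict replaces the fixed 100-slot list, and the names `LETTER_VALUE`, `upper_bound`, `bucket_by_bound` and `candidates` are hypothetical. Words are assumed to contain only A to Z.

```python
from collections import defaultdict

SETS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']
# Best case per letter: its set's number; vowels are 0.
LETTER_VALUE = {c: k for k, group in enumerate(SETS, start=1) for c in group}
LETTER_VALUE.update({v: 0 for v in 'AEIOU'})


def upper_bound(word):
    """Score the word would get if every consonant were its set's chosen letter."""
    return sum(LETTER_VALUE[c] for c in word)


def bucket_by_bound(words):
    """Map each upper bound to the words that have it."""
    buckets = defaultdict(list)
    for w in words:
        buckets[upper_bound(w)].append(w)
    return buckets


def candidates(buckets, min_score):
    """Yield only the words whose upper bound reaches min_score, best first."""
    for bound in sorted(buckets, reverse=True):
        if bound < min_score:
            break
        yield from buckets[bound]


buckets = bucket_by_bound(['KNICKKNACK', 'TEA', 'ZZZ'])
print(list(candidates(buckets, 20)))  # ['KNICKKNACK', 'ZZZ']
```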

from itertools import permutations
import time

S={'A':0,'E':0,'I':0,'O':0,'U':0,
   'L':1,'T':1,'N':1,
   'R':2,'D':2,'S':2,
   'G':3,'B':3,'M':3,
   'C':4,'H':4,'P':4,
   'F':5,'W':5,'V':5,
   'Y':6,'K':6,'J':6,
   'Q':7,'X':7,'Z':7,
   }

def best_word(min, s):
    global score_to_words
    best_score = 0
    best_word = ''
    for i in xrange(min, 100):
        for w in score_to_words[i]:
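            # Every letter starts at -2; vowels get +2 back (net 0), and the letter
            # chosen from set k gets k + 2 back (net k points).
            score = (-2*len(w)+2*(w.count('A')+w.count('E')+w.count('I')+w.count('O')+w.count('U')) +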
                      3*w.count(s[0])+4*w.count(s[1])+5*w.count(s[2])+6*w.count(s[3])+7*w.count(s[4])+
                      8*w.count(s[5])+9*w.count(s[6]))
            if score > best_score:
                best_score = score
                best_word = w
    return (best_score, best_word)

def load_words():
    global score_to_words
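    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    # Keep only entries made of A-Z; S has no key for "'" in possessives like "aardvark's".
    wlist = [w for w in wlist if w and all(c in S for c in w)]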
    score_to_words = [[] for i in xrange(100)]
    for w in wlist: score_to_words[sum(S[c] for c in w)].append(w)
    for i in xrange(100):
        if score_to_words[i]: print i, len(score_to_words[i])

def find_best_words():
    load_words()
    best = 0
    bestwords = ()
    for c1 in permutations('LTN'):
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
                print time.ctime(),c1,c2,c3
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
                                sets = zip(c1, c2, c3, c4, c5, c6, c7)
                                (s1, w1) = best_word((best + 3) / 3, sets[0])
                                (s2, w2) = best_word((best - s1 + 2) / 2, sets[1])
                                (s3, w3) = best_word(best - s1 - s2 + 1, sets[2])
                                score = s1 + s2 + s3
                                if score > best:
                                    best = score
                                    bestwords = (w1, w2, w3)
                                    print score, w1, w2, w3
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

The score to beat quickly reaches 100, which means we only need to consider words worth 33+ points, a very small fraction of the total (my /usr/share/dict/words has 208662 valid words, only 1723 of which are worth 33+ points, i.e. 0.8%). It runs in about half an hour on my machine and produces:

(('MAXILLOPREMAXILLARY', 'KNICKKNACKED', 'ZIGZAGWISE'), 101)

— Keith Randall

Nice. I'll add this to the matrix solution (dropping words once their score gets too low), but this is significantly better than any pure Python solution I came up with.

— năm11 at 18:28

I'm not sure I've ever seen that many nested loops before.

— Peter Olson

Combining your idea with the matrix scoring (and a tighter upper bound on the best possible score) brings the time down to about 80 seconds on my machine (from about an hour). Code here

— năm11

A good part of that time goes into precomputing the best possible scores, which could be done a lot faster.
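One possible direction for that precomputation, sketched in Python 3: count each word's letters once with `collections.Counter` and take, for every set, the most frequent letter as the chosen one. The result is never above the sum-of-set-values bound used in `load_words` above, and it is strictly lower whenever a word uses two letters from the same set, so it also serves as a tighter bound. `best_word_score` here is a hypothetical stand-alone version; whether it beats repeated `str.count` calls depends on word lengths and the interpreter, so it should be timed rather than assumed faster.

```python
from collections import Counter

SETS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']


def best_word_score(word):
    """Best score a single word can get, choosing the best letter in each set."""
    counts = Counter(word)                           # one pass over the word
    score = 0
    for points, group in enumerate(SETS, start=1):
        group_counts = [counts[c] for c in group]    # missing letters count 0
        top = max(group_counts)
        # The most frequent letter of the set is chosen; the set's other
        # letters in the word cost 2 each. Vowels add nothing.
        score += points * top - 2 * (sum(group_counts) - top)
    return score


print(best_word_score('KNICKKNACK'))
```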

— năm11