Shuffle two lists at once with the same order

Question 1

I'm using the nltk library's movie_reviews corpus, which contains a large number of documents. My task is to measure the predictive performance on these reviews both with and without pre-processing of the data. But there is a problem: the lists documents and documents2 contain the same documents, and I need to shuffle them while keeping the same order in both lists. I cannot shuffle them separately, because each time I shuffle a list I get a different result. That is why I need to shuffle both at once with the same order: in the end I need to compare them, and the comparison depends on the order. I'm using Python 2.7.

Example (in reality the strings are tokenized, but that is not relevant here):

documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]

documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

And I need to get this result after shuffling both lists:

documents = [(['one of the guys dies'], 'neg'),
             (['they get into an accident . '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['plot : two teen couples go to a church party , '], 'neg')]

documents2 = [(['one guys dies'], 'neg'),
              (['they get accident . '], 'neg'),
              (['drink then drive . '], 'pos'),
              (['plot two teen couples church party'], 'neg')]

I have this code:

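import random

import nltk
from nltk.corpus import movie_reviews, stopwords

def cleanDoc(doc):
    # Lowercase, drop stopwords and tokens of 2 characters or fewer, then stem
    stopset = set(stopwords.words('english'))
    stemmer = nltk.PorterStemmer()
    clean = [token.lower() for token in doc if token.lower() not in stopset and len(token) > 2]
    final = [stemmer.stem(word) for word in clean]
    return final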

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

documents2 = [(list(cleanDoc(movie_reviews.words(fileid))), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

# TODO: shuffle documents and documents2 here, applying the same order to both

Question 2

You can do it as:

import random

a = ['a', 'b', 'c']
b = [1, 2, 3]

c = list(zip(a, b))

random.shuffle(c)

a, b = zip(*c)  # a and b are now tuples; wrap them in list() if lists are needed

print a
print b

[OUTPUT]
('a', 'c', 'b')
(1, 3, 2)

Of course, this was an example with simpler lists, but the adaptation will be the same for your case.
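To make the adaptation concrete, here is a minimal sketch applied to the two lists from the question. It uses the four-item example data rather than the real corpus, and it is written so that it also runs on Python 3 (where zip() returns an iterator, hence the list() call). The name paired is only an illustrative choice.

```python
import random

# Small stand-ins for the real corpora, copied from the question's example.
documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]
documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

# Pair each raw document with its cleaned counterpart, shuffle the pairs,
# then split them back into two lists that share the same order.
paired = list(zip(documents, documents2))
random.shuffle(paired)
documents, documents2 = (list(column) for column in zip(*paired))

for raw, cleaned in zip(documents, documents2):
    print(raw[0][0], '<->', cleaned[0][0])
```

Because the pairs are shuffled as units, the item at any index of documents still corresponds to the item at the same index of documents2.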

Hope it helps. Good Luck.

Question 3

Here is an easy way to do this:

import numpy as np
a = np.array([0,1,2,3,4])
b = np.array([5,6,7,8,9])

indices = np.arange(a.shape[0])
np.random.shuffle(indices)

a = a[indices]
b = b[indices]
# Example result (varies per run):
# a -> array([3, 4, 1, 2, 0])
# b -> array([8, 9, 6, 7, 5])
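The same index-permutation idea also works without NumPy, which may be convenient when the items are tuples of token lists rather than numbers. The following is a sketch under that assumption; the names a_shuffled and b_shuffled are illustrative.

```python
import random

a = ['a', 'b', 'c', 'd', 'e']
b = [1, 2, 3, 4, 5]

# Build one random permutation of the positions and reuse it for both lists.
indices = list(range(len(a)))
random.shuffle(indices)

a_shuffled = [a[i] for i in indices]
b_shuffled = [b[i] for i in indices]

print(a_shuffled)
print(b_shuffled)
```

Unlike the in-place approaches, this leaves the original lists untouched, and the indices list can be kept to apply the same order to further lists later.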

Question 4

import numpy as np
from sklearn.utils import shuffle

a = ['a', 'b', 'c','d','e']
b = [1, 2, 3, 4, 5]

a_shuffled, b_shuffled = shuffle(np.array(a), np.array(b))
print(a_shuffled, b_shuffled)

#random output
#['e' 'c' 'b' 'd' 'a'] [5 3 2 4 1]

Question 5

Shuffle an arbitrary number of lists simultaneously.

from random import shuffle

def shuffle_list(*ls):
    # Zip the lists into rows, shuffle the rows, then unzip back into columns
    l = list(zip(*ls))
    shuffle(l)
    return zip(*l)

a = [0,1,2,3,4]
b = [5,6,7,8,9]

a1,b1 = shuffle_list(a,b)
print(a1,b1)

a = [0,1,2,3,4]
b = [5,6,7,8,9]
c = [10,11,12,13,14]
a1,b1,c1 = shuffle_list(a,b,c)
print(a1,b1,c1)

Output:

$ (0, 2, 4, 3, 1) (5, 7, 9, 8, 6)
$ (4, 3, 0, 2, 1) (9, 8, 5, 7, 6) (14, 13, 10, 12, 11)

Note:
objects returned by shuffle_list() are tuples.
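If lists are needed instead of tuples, a small variant can convert the results back. The sketch below uses a hypothetical helper name, shuffle_lists(), which is not part of the answer above. It also handles empty input, where unpacking the result of zip(*l) would otherwise fail because there is nothing to unpack.

```python
import random

def shuffle_lists(*seqs):
    """Hypothetical variant of shuffle_list() that returns lists, not tuples."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same length")
    rows = list(zip(*seqs))
    if not rows:
        # zip(*[]) yields nothing, so return one empty list per input.
        return [[] for _ in seqs]
    random.shuffle(rows)
    return [list(column) for column in zip(*rows)]

a = [0, 1, 2, 3, 4]
b = [5, 6, 7, 8, 9]

a1, b1 = shuffle_lists(a, b)
print(a1, b1)
```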

P.S. shuffle_list() can also be applied to numpy.array()

import numpy as np

a = np.array([1,2,3])
b = np.array([4,5,6])

a1,b1 = shuffle_list(a,b)
print(a1,b1)

Output:

$ (3, 1, 2) (6, 4, 5)

Question 6

An easy and fast way to do this is to use random.seed() together with random.shuffle(). Reseeding with the same value lets you reproduce the same random order as many times as you want, provided the lists have the same length. It looks like this:

import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
seed = random.random()
random.seed(seed)
random.shuffle(a)  # shuffles a in place
random.seed(seed)
random.shuffle(b)  # same seed and same length, so the same order
print(a)
print(b)

>>[3, 1, 4, 2, 5]
>>[8, 6, 9, 7, 10]

This also works when you can't work with both lists at the same time, because of memory problems.
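A variation on the same idea, sketched here as one possible option, is to use separate random.Random instances instead of reseeding the global generator. That way other code relying on the random module is not affected. The seed value 12345 is an arbitrary choice for illustration.

```python
import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]

seed = 12345  # arbitrary illustrative value; any fixed seed works

# Each Random instance starts from the same state, so two lists of equal
# length are permuted identically. The lists can be shuffled at different
# times, as long as the seed is remembered.
random.Random(seed).shuffle(a)
random.Random(seed).shuffle(b)

print(a)
print(b)
```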

Question 7

You can use the second argument of the shuffle function to fix the order of shuffling.

Specifically, you can pass a zero-argument function that returns a value in [0, 1) as the second argument of random.shuffle(). The return value of this function fixes the order of shuffling. (By default, i.e. if you do not pass a function as the second argument, it uses random.random().) Note that this second argument was deprecated in Python 3.9 and removed in Python 3.11, so the approach only works on older interpreters such as the Python 2.7 used in the question.

This example illustrates what I described:

import random

a = ['a', 'b', 'c', 'd', 'e']
b = [1, 2, 3, 4, 5]

r = random.random()            # randomly generating a real in [0,1)
random.shuffle(a, lambda : r)  # lambda : r is a zero-argument function which returns r
random.shuffle(b, lambda : r)  # using the same function as in the previous line so that the shuffling order is the same

print a
print b

Output:

['e', 'c', 'd', 'a', 'b']
[5, 3, 4, 1, 2]
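For interpreters where random.shuffle() does not accept a second argument, the effect can be reproduced by hand. The sketch below uses a hypothetical helper, shuffle_with(), which mirrors the swap loop that, as far as I can tell, CPython used when a custom function was supplied; treat that correspondence as an assumption. The value r = 0.22 is fixed here only so that the result is reproducible.

```python
def shuffle_with(x, rand):
    """Hypothetical helper: in-place shuffle driven by a zero-argument
    function rand() that returns floats in [0, 1)."""
    for i in reversed(range(1, len(x))):
        # Pick a position j in [0, i] and swap it with position i.
        j = int(rand() * (i + 1))
        x[i], x[j] = x[j], x[i]

a = ['a', 'b', 'c', 'd', 'e']
b = [1, 2, 3, 4, 5]

r = 0.22  # fixed for illustration; normally r = random.random()
shuffle_with(a, lambda: r)
shuffle_with(b, lambda: r)

print(a)  # ['e', 'c', 'd', 'a', 'b']
print(b)  # [5, 3, 4, 1, 2]
```

Since a single number r determines every swap, this can only produce a small subset of all possible orderings. If a uniformly random order matters, shuffling zipped pairs or reusing a seed is likely the better choice.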