152

I know the URL of an image on the Internet.

For example: http://www.digimouth.com/news/media/2011/09/google-logo.jpg, which contains the Google logo.

Now, how can I download this image using Python without actually opening the URL in a browser and saving the file manually?

python web-scraping

— Pankaj Vatsa

1

Possible duplicate of How do I download a file over HTTP using Python?

— Jaydev

316

Python 2

Here is a simpler way if all you want to do is save it as a file:

import urllib

urllib.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")

The second argument is the local path where the file will be saved.

Python 3

As SergO suggested, the code below should work with Python 3.

import urllib.request

urllib.request.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")

— Liquid_Fire
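The Python 3 documentation lists urlretrieve under its legacy interface. A minimal sketch of the same download using urlopen together with shutil.copyfileobj, which streams the response to disk in chunks rather than reading it all into memory first; download is a hypothetical helper name, not a standard library function:

```python
import shutil
import urllib.request


def download(url, local_path):
    """Stream the resource at url into local_path and return local_path."""
    # urlopen returns a file-like response; copyfileobj copies it chunk by chunk.
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return local_path
```

Usage would be download("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg").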

55

A good way to get the file name from the link is filename = link.split('/')[-1]

— heltonbiker
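Note that link.split('/')[-1] also keeps any query string, so ".../logo.png?size=large" becomes "logo.png?size=large". A slightly more defensive sketch using only the standard library; filename_from_url and its default fallback are hypothetical names:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Return the last path segment of url, ignoring query string and fragment."""
    path = urlsplit(url).path                  # drops scheme, host, ?query and #fragment
    name = unquote(posixpath.basename(path))   # keep the last segment, then decode %20 etc.
    return name or default                     # URLs ending in "/" have no file name
```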

2

With urlretrieve I only get a 1KB file with 404 error text inside. Why? If I enter the URL into my browser, I can get the image.

— Yebach

2

@Yebach: The site you are downloading from may be using cookies, the User-Agent, or other headers to decide what content to serve you. These will differ between your browser and Python.

— Liquid_Fire
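With urllib, one way to send a browser-like User-Agent is to wrap the URL in a urllib.request.Request. A minimal sketch, assuming the server only checks that header (cookies or other checks may still make the responses differ); the header value and both function names are hypothetical:

```python
import shutil
import urllib.request

# Assumption: a browser-like User-Agent; any value the server accepts will do.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


def build_request(url, headers=HEADERS):
    """Create a GET request that carries the given headers."""
    return urllib.request.Request(url, headers=headers)


def download_with_headers(url, local_path):
    """Download url to local_path, sending HEADERS with the request."""
    with urllib.request.urlopen(build_request(url)) as response, open(local_path, "wb") as f:
        shutil.copyfileobj(response, f)
```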

27

Python 3: import urllib.request and use urllib.request.urlretrieve() accordingly.

— SergO

1

@SergO - could you add the Python 3 part to the original answer?

— Sreejith Menon

27

import urllib
resource = urllib.urlopen("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")
output = open("file01.jpg","wb")
output.write(resource.read())
output.close()

file01.jpg will contain your image.

— Noufal Ibrahim

2

You should open the file in binary mode: open("file01.jpg", "wb"). Otherwise you may corrupt the image.

— Liquid_Fire
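On Python 2 under Windows, text mode translates newline bytes and so corrupts binary data; on Python 3, text mode expects str, so writing the bytes returned by read() fails outright. A small sketch (both helper names are hypothetical) showing that binary mode round-trips bytes such as the PNG signature, which contains \r\n, exactly:

```python
def roundtrip_binary(path, data):
    """Write data in binary mode and read it back unchanged."""
    with open(path, "wb") as f:  # "wb": bytes go to disk byte for byte
        f.write(data)
    with open(path, "rb") as f:
        return f.read()


def text_mode_accepts_bytes(path, data):
    """Return False if a text-mode file refuses a bytes object (Python 3 behavior)."""
    try:
        with open(path, "w") as f:  # text mode expects str, not bytes
            f.write(data)
    except TypeError:
        return False
    return True
```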

2

urllib.urlretrieve can save the image directly.

— heltonbiker

17

I wrote a script that does just this, and it is available on my GitHub for you to use.

I used BeautifulSoup to let me parse any website for images. If you will be doing a lot of web scraping (or intend to use my tool), I suggest you sudo pip install beautifulsoup4 (the package that provides the bs4 module imported below). Information on BeautifulSoup is available here.

For convenience, here is my code:

from bs4 import BeautifulSoup
from urllib2 import urlopen
from urlparse import urljoin
import urllib

# use this image scraper from the location that
# you want to save scraped images to

def make_soup(url):
    html = urlopen(url).read()
    # name the parser explicitly to avoid bs4's "no parser specified" warning
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    #this makes a list of bs4 element tags
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print 'Downloading images to current working directory.'
    #compile our unicode list of image links, turning relative src values into absolute URLs
    image_links = [urljoin(url, each.get('src')) for each in images if each.get('src')]
    for each in image_links:
        filename=each.split('/')[-1]
        urllib.urlretrieve(each, filename)
    return image_links

#a standard call looks like this
#get_images('http://www.wookmark.com')

— Yup
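img src attributes are often relative (for example /static/logo.png), which urlretrieve cannot fetch on its own. If installing BeautifulSoup is not an option, a minimal sketch of the same extraction with only the standard library's html.parser, resolving each src against the page URL with urljoin; ImgSrcParser and image_urls are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collect absolute URLs from the src attribute of every <img> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags without a usable src
                # Resolve relative paths like "/static/a.png" against the page URL.
                self.sources.append(urljoin(self.base_url, src))


def image_urls(html, base_url):
    """Return the absolute image URLs found in an HTML string."""
    parser = ImgSrcParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.sources
```

The page HTML itself could come from urllib.request.urlopen(url).read().decode() before calling image_urls(html, url).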

11

This can be done with requests. Load the page and dump the binary content to a file.

import os
import requests

url = 'https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg'
page = requests.get(url)

f_ext = os.path.splitext(url)[-1]
f_name = 'img{}'.format(f_ext)
with open(f_name, 'wb') as f:
    f.write(page.content)

— AlexG
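Writing the response body straight to disk saves whatever the server sent back, which may be an HTML page (for example a login or hotlink-protection page) rather than the picture. A minimal standard-library sketch that checks the Content-Type header before saving; download_image is a hypothetical name, and treating any "image/" type as acceptable is an assumption:

```python
import urllib.request


def download_image(url, local_path):
    """Save the response only if the server labels it as an image.

    Returns the Content-Type that was received.
    """
    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get_content_type()  # e.g. "image/jpeg"
        if not content_type.startswith("image/"):
            # Probably an HTML error or login page rather than the picture.
            raise ValueError("expected an image, got " + content_type)
        data = response.read()
    with open(local_path, "wb") as f:
        f.write(data)
    return content_type
```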

1

Add headers (such as a User-Agent) to the request if you get a bad request :)

— 1UC1F3R616

8

Python 3

urllib.request - Extensible library for opening URLs

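from urllib.error import HTTPError
from urllib.request import urlretrieve

# example values; replace with your own URL and destination path
image_url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"
image_local_path = "local-filename.jpg"

try:
    urlretrieve(image_url, image_local_path)
except FileNotFoundError as err:
    print(err)  # something wrong with local path
except HTTPError as err:
    print(err)  # something wrong with url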

— Sergeant
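HTTPError only covers error responses the server actually sent; a DNS failure or refused connection raises its parent class URLError, and URLError in turn derives from OSError (as does FileNotFoundError), so the order of the except clauses matters. A sketch that tells the three cases apart; try_download is a hypothetical name:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlretrieve


def try_download(image_url, image_local_path):
    """Return None on success, or a short description of what went wrong."""
    try:
        urlretrieve(image_url, image_local_path)
    except HTTPError as err:  # server answered with an error status (404, 403, ...)
        return "HTTP error {}".format(err.code)
    except URLError as err:   # no usable answer: DNS failure, refused connection, ...
        return "URL error: {}".format(err.reason)
    except OSError as err:    # local side, e.g. the target directory does not exist
        return "local error: {}".format(err)
    return None
```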

6

A solution that works with both Python 2 and Python 3:

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"
urlretrieve(url, "local-filename.jpg")

or, if the additional dependency requests is acceptable and the URL is http(s):

def load_requests(source_url, sink_path):
    """
    Load a file from a URL (e.g. http).

    Parameters
    ----------
    source_url : str
        Where to load the file from.
    sink_path : str
        Where the loaded file is stored.
    """
    import requests
    r = requests.get(source_url, stream=True)
    if r.status_code == 200:
        with open(sink_path, 'wb') as f:
            for chunk in r:
                f.write(chunk)

— Martin Thoma

5

I made a script that expands on Yup's script and fixes a few things. It now gets around "403: Forbidden" problems. It won't crash when an image fails to download. It tries to avoid corrupted previews. It builds correct absolute URLs. It prints more information. It can be run with a URL argument from the command line.

# getem.py
# python2 script to download all images in a given url
# use: python getem.py http://url.where.images.are

from bs4 import BeautifulSoup
import urllib2
import shutil
import requests
from urlparse import urljoin
import sys
import time

def make_soup(url):
    req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"}) 
    html = urllib2.urlopen(req)
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print 'Downloading images to current working directory.'
    image_links = [each.get('src') for each in images]
    for each in image_links:
        try:
            filename = each.strip().split('/')[-1].strip()
            src = urljoin(url, each)
            print 'Getting: ' + filename
            response = requests.get(src, stream=True)
            # delay to avoid corrupted previews
            time.sleep(1)
            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
        except Exception:
            print '  An error occurred. Continuing.'
    print 'Done.'

if __name__ == '__main__':
    url = sys.argv[1]
    get_images(url)

— madprops

3

Using the requests library

import os
import shutil

import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}
currentDir = os.getcwd()
path = os.path.join(currentDir, 'Images')  # saving images to Images folder
os.makedirs(path, exist_ok=True)  # create the folder if it does not exist yet

def ImageDl(url):
    attempts = 0
    while attempts < 5:  # retry 5 times
        try:
            filename = url.split('/')[-1]
            r = requests.get(url, headers=headers, stream=True, timeout=5)
            if r.status_code == 200:
                with open(os.path.join(path, filename), 'wb') as f:
                    r.raw.decode_content = True  # decompress gzip/deflate-encoded bodies
                    shutil.copyfileobj(r.raw, f)
                print(filename)  # only report files that were actually saved
            break
        except Exception as e:
            attempts += 1
            print(e)


url = 'http://www.digimouth.com/news/media/2011/09/google-logo.jpg'  # example image URL
ImageDl(url)

— Sohan
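The loop above retries immediately and, after the last attempt, only prints the error. A generic sketch of the same idea with a growing delay between attempts and the final exception re-raised; retry and its parameters are hypothetical, and OSError is the default because both urllib's URLError and requests.RequestException derive from it:

```python
import time


def retry(func, attempts=5, delay=1.0, backoff=2.0, exceptions=(OSError,)):
    """Call func() until it succeeds, sleeping longer after each failure.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the real error
            time.sleep(delay)
            delay *= backoff  # 1 s, 2 s, 4 s, ... with the defaults
```

For example: retry(lambda: urllib.request.urlretrieve(url, filename)).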

It seems the headers really mattered in my case: I was getting 403 errors. This worked.

— Ishtiyaq Husain

2

Here is a very short answer.

import urllib
urllib.urlretrieve("http://photogallery.sandesh.com/Picture.aspx?AlubumId=422040", "Abc.jpg")

— OO7

2

Version for Python 3

I adapted @madprops's code for Python 3:

# getem.py
# python3 script to download all images in a given url
# use: python getem.py http://url.where.images.are

from bs4 import BeautifulSoup
import urllib.request
import shutil
import requests
from urllib.parse import urljoin
import sys
import time

def make_soup(url):
    req = urllib.request.Request(url, headers={'User-Agent' : "Magic Browser"}) 
    html = urllib.request.urlopen(req)
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print('Downloading images to current working directory.')
    image_links = [each.get('src') for each in images]
    for each in image_links:
        try:
            filename = each.strip().split('/')[-1].strip()
            src = urljoin(url, each)
            print('Getting: ' + filename)
            response = requests.get(src, stream=True)
            # delay to avoid corrupted previews
            time.sleep(1)
            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
        except Exception:
            print('  An error occurred. Continuing.')
    print('Done.')

if __name__ == '__main__':
    get_images('http://www.wookmark.com')

— Giovanni G. PY

1

Something fresh for Python 3 using Requests:

Comments are in the code. Ready-to-use function.


import requests
from os import path

def get_image(image_url):
    """
    Get image based on url.
    :return: Image name if everything OK, False otherwise
    """
    image_name = path.split(image_url)[1]
    try:
        image = requests.get(image_url)
    except OSError:  # A little too broad, but works with no additional imports. Catches all connection problems
        return False
    if image.status_code == 200:  # we could have retrieved an error page
        base_dir = path.join(path.dirname(path.realpath(__file__)), "images")  # Use your own path or "" to use the current working directory. Folder must exist.
        with open(path.join(base_dir, image_name), "wb") as f:
            f.write(image.content)
        return image_name
    return False  # non-200 status, e.g. a 404 error page

get_image("https://apod.nasddfda.gov/apod/image/2003/S106_Mishra_1947.jpg")

— Pavel Pančocha
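The comment in that function notes that the images folder must already exist. A small sketch that creates it on demand with pathlib instead; save_bytes and the default "images" folder are hypothetical, and image.content from the function above could be passed as data:

```python
from pathlib import Path


def save_bytes(data, filename, base_dir="images"):
    """Write data to base_dir/filename, creating base_dir first if needed."""
    target_dir = Path(base_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    target = target_dir / filename
    target.write_bytes(data)
    return target
```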

0

Late answer, but for python>=3.6 you can use dload, i.e.:

import dload
dload.save("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")

If you need the image as bytes, use:

img_bytes = dload.bytes("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")

Install it with pip3 install dload

— Convid19

-2

import requests

img_data = requests.get('https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg')

with open('file_name.jpg', 'wb') as handler:
    handler.write(img_data.content)  # .content holds the response body as bytes

— Lewis Mann

4

Welcome to Stack Overflow! While you may have solved this user's problem, code-only answers are not very helpful to users who come to this question in the future. Please edit your answer to explain why your code solves the original problem.

— Joe C

1

TypeError: a bytes-like object is required, not 'Response'. It should be handler.write(img_data.content)

— TitanFighter 17/03/18

It should be handler.write(img_data.read()).

— jdhao
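The two comments describe different libraries: requests.get() returns a Response whose body is in .content (it has no .read() method), while urllib.request.urlopen() returns a file-like object read with .read(). A minimal standard-library sketch of the latter; fetch_bytes is a hypothetical name:

```python
import urllib.request


def fetch_bytes(url):
    """Return the whole response body as bytes using only the standard library."""
    # urllib responses are file-like, so .read() is the right call here;
    # with requests, the equivalent is response.content instead.
    with urllib.request.urlopen(url) as response:
        return response.read()
```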

How do I save an image locally using Python when I already know its URL?
