2025-02-07
Programming
Note: this article was written 80 days ago and last modified 50 days ago, so some of the information may be out of date.

Contents

Selenium installation
Selenium basic usage
selenium_get_cookies
selenium_make_cookies
selenium_move
selenium_slider

Selenium is a third-party Python package commonly used to automate web testing or to build web crawlers.

Selenium installation

pip install selenium
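A quick way to confirm the install worked, and to see which version you got (the API differs between Selenium 3 and Selenium 4), is to read the package metadata. This is a minimal stdlib-only sketch; the helper name `installed_selenium_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_selenium_version() -> Optional[str]:
    """Return the installed selenium version string, or None if it is not installed."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_selenium_version()
    if version is None:
        print("selenium is not installed; run: pip install selenium")
    else:
        print("selenium", version)
```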

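You need a driver whose version matches your browser. Put it in a directory on the system PATH (for example, the same directory as python.exe), or pass its location explicitly. Since Selenium 4.6, the bundled Selenium Manager can also download a matching driver automatically when none is found.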
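To check whether a driver placed this way will actually be found, the sketch below looks it up on PATH and then in the directory of the running Python interpreter. It assumes the driver executable is named `chromedriver` (on Windows, `shutil.which` also tries extensions such as `.exe`); `find_driver` is a hypothetical helper, not part of Selenium.

```python
import os
import shutil
import sys
from typing import Optional


def find_driver(name: str = "chromedriver") -> Optional[str]:
    """Look for a browser driver executable on PATH, then next to the Python interpreter.

    Returns the full path if found, otherwise None.
    """
    # shutil.which honours PATHEXT on Windows, so "chromedriver" also matches "chromedriver.exe"
    found = shutil.which(name)
    if found:
        return found
    # Fall back to the directory containing python.exe / the python binary
    python_dir = os.path.dirname(sys.executable)
    return shutil.which(name, path=python_dir)


if __name__ == "__main__":
    print(find_driver() or "chromedriver not found on PATH or next to the interpreter")
```

If it returns None, Selenium 4 lets you pass the location explicitly, e.g. `webdriver.Chrome(service=Service(executable_path=path))` with `Service` imported from `selenium.webdriver.chrome.service`.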

Selenium basic usage

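#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
@Date :2023/12/24 16:21
Forced wait: time.sleep. However long you tell it to wait, it waits exactly that long; only then does the next step run.
Smart wait: implicitly_wait. Within the time limit you set, the wait ends as soon as the condition is met and execution continues; if the limit is exceeded, an exception is raised.
implicitly_wait, also called an implicit wait, needs no extra import and is called directly on the webdriver object. It does two things: wait until the element is found, then execute the command. Once set (for example right after creating the webdriver object, as below), it applies for the whole lifetime of that browser object.
Explicit wait: unlike the implicit wait, an explicit wait states both a concrete condition and a time limit. An implicit wait is satisfied as soon as the element exists and can be found; an explicit wait can demand a specific condition, such as the element being visible.
'''
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.wait import WebDriverWait  # explicit wait
from selenium.webdriver.support import expected_conditions as EC  # explicit wait

# Options object that makes Chrome start in headless (no window) mode
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
# Start Chrome
web = webdriver.Chrome(options=chrome_options)
# Implicit wait: within the time limit, stop waiting as soon as the element is found
web.implicitly_wait(5)
# Open the Ctrip registration page
web.get('https://passport.ctrip.com/user/reg/home')
print(web.page_source)  # print the page source
print(web.get_cookies())  # print the cookies
# Explicit wait: wait until the element with id="TANGRAM__PSP_11__changeSmsCodeItem" is present in the DOM,
# with a 10 s limit, checking the condition every 0.8 s.
# driver: the browser driver; timeout: upper limit in seconds;
# poll_frequency: polling interval, default 0.5 s;
# ignored_exceptions: exceptions ignored while polling, NoSuchElementException by default.
# If the condition is still not met at the limit, until() raises TimeoutException.
WebDriverWait(web, 10, 0.8).until(EC.presence_of_element_located((By.ID, 'TANGRAM__PSP_11__changeSmsCodeItem')))
# The XPath expressions below are placeholders; replace them with locators for the real page.
# Find an element and click it
web.find_element(By.XPATH, "/html/body/div[5]/div[3]/a[2]").click()
# Find an input box, type text, then press Enter
web.find_element(By.XPATH, "/html/body/div[5]/div[3]/a[2]").send_keys("python", Keys.ENTER)
# Switch to the newest window
web.switch_to.window(web.window_handles[-1])
# Find the data elements (find_elements returns a list) and read each one's text
web_list = web.find_elements(By.XPATH, "/html/body/div[5]/div[3]/a[2]")
for li in web_list:
    result = li.text
    print(result)
# Switch back to the original window
web.switch_to.window(web.window_handles[0])
# Locate the drop-down list
sel = web.find_element(By.XPATH, "/html/body/div[5]/div[3]/a[2]")
# Wrap the element in a Select helper
sel_list = Select(sel)
for i in range(len(sel_list.options)):
    sel_list.select_by_index(i)  # switch options by index
    time.sleep(2)
# Locate the table and print its text
table = web.find_element(By.XPATH, "/html/body/div[5]/div[3]/a[2]")
print(table.text)
# Close the current window
web.close()
# Quit the browser
web.quit()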
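To make the explicit-wait parameters concrete (timeout, poll_frequency, ignored_exceptions), here is a simplified pure-Python model of the polling loop that `WebDriverWait(...).until(...)` performs. It is an illustration under stated assumptions, not Selenium's source: the name `wait_until` is hypothetical, `LookupError` stands in for `NoSuchElementException`, and the built-in `TimeoutError` stands in for Selenium's `TimeoutException`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], T],
    timeout: float = 10.0,
    poll_frequency: float = 0.5,
    ignored_exceptions: Tuple[Type[BaseException], ...] = (LookupError,),
) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Exceptions listed in `ignored_exceptions` count as "not ready yet";
    on timeout a TimeoutError is raised.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value  # condition met: stop waiting immediately
        except ignored_exceptions:
            pass  # e.g. element not in the DOM yet: keep polling
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)
```

The key difference from `time.sleep(10)` is visible in the loop: it returns as soon as the condition holds and only uses the full timeout in the failure case.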

selenium_get_cookies

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
@Date :2023/12/24 19:00
pickle is Python's own serialization tool. It stores Python data types quickly and efficiently, and deserializing gives back the original Python data type. Formats such as .txt hold only strings, so the data would have to be converted.
'''
import os
import pickle
import time
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait

brower = webdriver.Chrome()
wait = WebDriverWait(brower, 10)


def getTaobaoCookies():
    # Get the cookies of a logged-in Taobao session
    url = "https://www.taobao.com/"
    brower.get("https://login.taobao.com/member/login.jhtml")
    while True:
        print("Please log in to taobao.com!")
        time.sleep(3)
        # After a successful login the page jumps to www.taobao.com
        if brower.current_url == url:
            tbCookies = brower.get_cookies()
            # The browser is left open so the cookies can be injected into it afterwards
            cookies = {}
            for item in tbCookies:
                cookies[item['name']] = item['value']
            with open('taobaoCookies.pickle', 'wb') as outputPath:
                pickle.dump(cookies, outputPath)
            return cookies
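The reduce-and-pickle steps in `getTaobaoCookies` can be isolated into small functions and checked without a browser. In this sketch the function names and the sample cookie data are hypothetical; the list-of-dicts input mirrors what `get_cookies()` returns. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile
from typing import Dict, List


def cookies_to_dict(cookies: List[dict]) -> Dict[str, str]:
    """Reduce the list returned by driver.get_cookies() to a {name: value} mapping."""
    return {item["name"]: item["value"] for item in cookies}


def save_cookies(cookies: Dict[str, str], path: str) -> None:
    """Serialize the mapping with pickle (the file must be opened in binary mode)."""
    with open(path, "wb") as f:
        pickle.dump(cookies, f)


def load_cookies(path: str) -> Dict[str, str]:
    """Deserialize; pickle returns a real dict, so no string parsing is needed."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Hypothetical sample in the shape get_cookies() returns
    sample = [
        {"name": "session_id", "value": "abc123", "domain": ".example.com", "path": "/"},
        {"name": "lang", "value": "en", "domain": ".example.com", "path": "/"},
    ]
    path = os.path.join(tempfile.gettempdir(), "demo_cookies.pickle")
    save_cookies(cookies_to_dict(sample), path)
    print(load_cookies(path))  # {'session_id': 'abc123', 'lang': 'en'}
```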

selenium_make_cookies

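#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
@Date :2023/12/24 19:01
Continues the previous script: reuses os, pickle, brower and getTaobaoCookies() defined there.
'''
# Read the cookie information
def readTaobaoCookies():
    # If the cookie file exists, use it;
    # otherwise call getTaobaoCookies() to log in and create it
    if os.path.exists('taobaoCookies.pickle'):
        with open('taobaoCookies.pickle', 'rb') as readPath:
            tbCookies = pickle.load(readPath)
    else:
        tbCookies = getTaobaoCookies()
    return tbCookies

# Use the cookies
tbCookies = readTaobaoCookies()
# A cookie can only be added for the domain of the page currently loaded, so open the site first
brower.get("https://www.taobao.com")
for cookie in tbCookies:
    brower.add_cookie({
        "domain": ".taobao.com",
        "name": cookie,
        "value": tbCookies[cookie],
        "path": '/',
        # The WebDriver field for expiration is "expiry"; leaving it out makes a session cookie
    })
# Reload the page so the injected cookies take effect
brower.get("https://www.taobao.com")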
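Storing only `name: value` pairs throws away each cookie's domain, path, and flags, which is why the loop above has to hard-code `.taobao.com` and `/`. An alternative design, sketched below under that assumption, is to pickle the full list from `get_cookies()` and re-add each entry, keeping only the fields defined by the W3C WebDriver "Add Cookie" command. `restore_cookies` is a hypothetical helper; `driver` can be a Selenium WebDriver or any object with `get`, `add_cookie`, and `refresh`.

```python
from typing import Iterable, Mapping

# Cookie fields defined by the W3C WebDriver "Add Cookie" command
ALLOWED_KEYS = {"name", "value", "path", "domain", "secure", "httpOnly", "expiry", "sameSite"}


def restore_cookies(driver, cookies: Iterable[Mapping], url: str) -> int:
    """Re-add cookies saved from driver.get_cookies(), then reload the page.

    Returns the number of cookies added.
    """
    driver.get(url)  # cookies can only be set for the domain of the current page
    count = 0
    for cookie in cookies:
        # Drop any key the Add Cookie command does not define
        clean = {k: v for k, v in cookie.items() if k in ALLOWED_KEYS}
        driver.add_cookie(clean)
        count += 1
    driver.refresh()  # reload so the site sees the restored cookies
    return count
```

With this design the pickle file holds the raw `get_cookies()` list, and the domain and path come from the saved data rather than from constants in the code.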

selenium_move

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
@Date :2023/12/24 19:20
'''
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.baidu.com/')
driver.implicitly_wait(5)
# ------------------------------ Mouse hover ------------------------------
# Locate the "设置" (Settings) link
set_ele = driver.find_element(By.XPATH, "//div[@id='u1']//a[text()='设置']")
# Step 1: create a mouse-action object
action = ActionChains(driver)
# Step 2: move to the element
action.move_to_element(set_ele)
# Step 3: execute the queued actions
action.perform()
# The three lines above as one line (method chaining is supported)
ActionChains(driver).move_to_element(set_ele).perform()
# Wait until "高级搜索" (Advanced search) is clickable, then click it
WebDriverWait(driver, 5, 0.2).until(
    EC.element_to_be_clickable((By.XPATH, "//a[text()='高级搜索']"))
).click()
# ------------------------------ Mouse slide ------------------------------
# The next two sections assume a page containing elements with these ids (not the Baidu homepage)
# Locate the slider thumb to drag
sli_ele = driver.find_element(By.ID, 'tcaptcha_drag_thumb')
action = ActionChains(driver)
# Step 1: press and hold the left mouse button on the thumb
action.click_and_hold(sli_ele)
# Step 2: move relative to the current mouse position
action.move_by_offset(100, 0)
# Step 3: release the mouse button
action.release()
# Execute the actions
action.perform()
# ------------------------------ Drag and drop ------------------------------
s = WebDriverWait(driver, 30, 0.5).until(
    EC.visibility_of_element_located((By.ID, 'draggable'))
)
t = WebDriverWait(driver, 30, 0.5).until(
    EC.visibility_of_element_located((By.ID, 'droppable'))
)
action = ActionChains(driver)
# Drag the source element onto the target element
action.drag_and_drop(s, t)
# Execute the action
action.perform()
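The slide step above moves 100 px in a single `move_by_offset` call. To split the same gesture into several smaller moves, the queued actions can be built in a loop before a single `perform()`. The helper `drag_in_steps` is a hypothetical sketch; it only calls the ActionChains methods already used above (`click_and_hold`, `move_by_offset`, `release`, `perform`), so it accepts `ActionChains(driver)` or a stand-in object for testing.

```python
from typing import Sequence


def drag_in_steps(actions, element, offsets: Sequence[int]) -> None:
    """Press on `element`, move right by each offset in turn, release, then perform.

    `actions` is an ActionChains-like object; nothing happens in the browser
    until perform() runs the queued actions.
    """
    actions.click_and_hold(element)
    for dx in offsets:
        actions.move_by_offset(dx, 0)  # relative to the current pointer position
    actions.release()
    actions.perform()
```

For example, `drag_in_steps(ActionChains(driver), sli_ele, [40, 40, 20])` covers the same 100 px as `move_by_offset(100, 0)`, but in three moves.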

selenium_slider

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
@Date :2023/12/22 12:32
'''
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

# Start Chrome
web = webdriver.Chrome()
# Maximize the browser window
web.maximize_window()
# Implicit wait: once set at the start, it stays in effect for the whole run
web.implicitly_wait(10)
# Open the Ctrip registration page
web.get('https://passport.ctrip.com/user/reg/home')
# Click "Agree and continue"
web.find_element(By.XPATH, "/html/body/div[5]/div[3]/a[2]").click()
# Alternative locator:
# web.find_element(By.XPATH, '//div[@class="pop_footer"]/a[@class="reg_btn reg_agree"]').click()
# Locate the slider button
# ele_button = web.find_element(By.XPATH, '//div[@class="cpt-drop-btn"]')
ele_button = web.find_element(By.XPATH, '/html/body/div[1]/div[2]/div/div[1]/div/dl[2]/dd/div/div[1]/div[2]')
# Print the width and height of the slider button
print('Slider button width:', ele_button.size['width'])
print('Slider button height:', ele_button.size['height'])
# Locate the slider track
# ele = web.find_element(By.XPATH, '//div[@class="cpt-bg-bar"]')
ele = web.find_element(By.XPATH, '/html/body/div[1]/div[2]/div/div[1]/div/dl[2]/dd/div')
# Print the width and height of the slider track
print('Slider track width:', ele.size['width'])
print('Slider track height:', ele.size['height'])
# Drag the button horizontally across the track (y offset 0 keeps the pointer on the track)
ActionChains(web).drag_and_drop_by_offset(ele_button, ele.size['width'], 0).perform()
time.sleep(10)
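Dragging by the full track width from the button's centre overshoots the end of the track; many sliders simply clamp, but the exact distance is track width minus button width. The sketch below computes that distance and splits it into an ease-out sequence of integer steps that sum exactly to it, ready to feed to `move_by_offset(dx, 0)` calls between `click_and_hold` and `release`. Both function names and the quadratic easing curve are assumptions for illustration, not a Ctrip-specific recipe.

```python
from typing import List


def slider_distance(track_width: int, button_width: int) -> int:
    """Distance the button must travel so its right edge reaches the end of the track."""
    return max(track_width - button_width, 0)


def ease_out_track(distance: int, steps: int = 10) -> List[int]:
    """Split `distance` into `steps` integer moves that start large and shrink (ease-out).

    The moves are non-negative and always sum exactly to `distance`.
    """
    if steps <= 0:
        raise ValueError("steps must be positive")
    track: List[int] = []
    moved = 0
    for i in range(1, steps + 1):
        t = i / steps
        target = round(distance * (1 - (1 - t) ** 2))  # quadratic ease-out position
        track.append(target - moved)
        moved = target
    return track


if __name__ == "__main__":
    print(ease_out_track(100, 4))  # [44, 31, 19, 6]
```

Because each step is the difference between successive rounded positions and the last position is exactly `distance`, rounding errors never accumulate.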

If you find any errors in this article, please point them out in the comments.

References
For special reasons these cannot be listed here; apologies.