An Elegant Wrapper Around Selenium (Python, Basic Edition)

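This time we use Python to put a simple wrapper around Selenium~ Since it is our own wrapper, we add the shared features we need, for example logging, exception handling, and smart waits. The end result is a small crawler running on top of the wrapper~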

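  • Let's build it step by step~ First we wrap the standard logging module in a bare-bones helper (not really object-oriented, so it looks very simple). The code is below. Think of logger as a person who records events, and addHandler as telling that person how to record them. If you want two ways of recording, define two handlers (the variables file and console in the code), say from which level each one records (setLevel), and in what format (setFormatter). Seen that way it isn't hard~ Also, strings like '\033[95m' are ANSI color codes for the text~ Haha, try them out for a multicolored log~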
import logging
import os

HEADER = '\033[95m'  # ANSI color codes
OKBLUE = '\033[94m'
OKGREEN = '\033[92m'
WARNING = '\033[93m'
FAIL = '\033[91m'
ENDC = '\033[0m'

filePath = os.path.dirname(__file__)
fileNamePath = os.path.split(os.path.realpath(__file__))[0]
logPath = os.path.join(fileNamePath, 'log.txt')  # path of the log file

logger = logging.getLogger(__name__)
logger.handlers.clear()  # make sure no handlers are left over
logger.setLevel(level=logging.DEBUG)  # level of the logger itself
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')  # output format
# Write to a file ("w" truncates the file on every run)
file = logging.FileHandler(logPath, "w")
file.setLevel(logging.INFO)
file.setFormatter(formatter)
# Write to the console
console = logging.StreamHandler()
console.setLevel(logging.INFO)
console.setFormatter(formatter)

logger.addHandler(file)  # attach the handlers
logger.addHandler(console)

class Log():
    @classmethod
    def info(cls, msg):
        logger.info(msg)

    @classmethod
    def debug(cls, msg):
        logger.debug(msg)

    @classmethod
    def warning(cls, msg):
        logger.warning(msg)

    @classmethod
    def error(cls, msg):
        logger.error(WARNING + str(msg) + ENDC)  # wrap the message in a color code

if __name__ == "__main__":
    Log.info('Test')
    Log.debug('Test')
    Log.warning('Test')
    Log.error('Test')
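
To see how the logger level and the handler levels interact, here is a small self-contained sketch. It is not part of the wrapper: the logger name `demo`, the helper `build_logger`, and the in-memory `StringIO` streams are stand-ins chosen so the behavior can be inspected without touching files or the console.

```python
import io
import logging


def build_logger(name="demo"):
    """Build a logger with two in-memory handlers at different levels."""
    logger = logging.getLogger(name)
    logger.handlers.clear()          # avoid duplicate handlers on repeated calls
    logger.setLevel(logging.DEBUG)   # the logger itself lets everything through
    logger.propagate = False         # keep records away from the root logger

    formatter = logging.Formatter("%(levelname)s - %(message)s")

    verbose = io.StringIO()          # stands in for the log file
    verbose_handler = logging.StreamHandler(verbose)
    verbose_handler.setLevel(logging.DEBUG)
    verbose_handler.setFormatter(formatter)

    quiet = io.StringIO()            # stands in for the console
    quiet_handler = logging.StreamHandler(quiet)
    quiet_handler.setLevel(logging.WARNING)
    quiet_handler.setFormatter(formatter)

    logger.addHandler(verbose_handler)
    logger.addHandler(quiet_handler)
    return logger, verbose, quiet


logger, verbose, quiet = build_logger()
logger.debug("locating element")
logger.warning("element slow to appear")

print(verbose.getvalue())  # both records
print(quiet.getvalue())    # only the warning
```

A record has to pass the logger's level first and then each handler's level. In the module above both handlers are set to INFO, so `Log.debug('Test')` produces no output even though the logger is at DEBUG.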
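  • Next we combine Selenium with the log helper. Following Selenium's API, we design a reusable smart wait plus exception handling. Reuse immediately suggests a decorator~ It removes the part that would otherwise be repeated in many methods. And what is repeated? In UI automation the most frequent and most important step is locating elements~ So let's write a decorator that combines exception handling for element lookup, smart waiting, and Log~ Starting from the locator strategy (three are covered here), we get the following code~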
    if by == "id":  # locate by id
        Element = WebDriverWait(self.driver, OVER_TIME, 1).until(EC.presence_of_element_located((By.ID, value)), message)
    elif by == "xpath":  # locate by XPath
        Element = WebDriverWait(self.driver, OVER_TIME, 1).until(EC.presence_of_element_located((By.XPATH, value)), message)
    elif by == "css":  # locate by CSS selector
        Element = WebDriverWait(self.driver, OVER_TIME, 1).until(EC.presence_of_element_located((By.CSS_SELECTOR, value)), message)
    else:
        raise NameError("keyword error!")  # the strategy is not one of the supported keywords
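
What `WebDriverWait(self.driver, OVER_TIME, 1).until(...)` does in the snippet above is poll a condition once per second until it succeeds or the timeout runs out. The sketch below illustrates that polling idea in plain Python. It is a simplified illustration, not Selenium's implementation: `wait_until`, `WaitTimeout`, and `find_button` are hypothetical names, and `LookupError` stands in for Selenium's "element not found" exception.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout, poll=1.0, message=""):
    """Call condition() every `poll` seconds until it returns something truthy."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except LookupError:
            pass  # "not found yet" is a reason to retry, not to fail
        if time.monotonic() >= deadline:
            raise WaitTimeout(message)
        time.sleep(poll)


# A fake page whose element only "appears" on the third lookup.
attempts = {"count": 0}


def find_button():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("button not present yet")
    return "<button>"


found = wait_until(find_button, timeout=2, poll=0.01, message="button not found")
print(found, attempts["count"])
```

The key point is that the wait returns as soon as the element shows up, so a generous timeout only costs time when the element is really missing.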
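  • Everyone has heard the official line that CSS locators are the most efficient, but why? Hehe~ Let's look at Selenium's own locator source (the Selenium 3 code for id, for example). It is easy to see that the helper simply returns a call to find_element.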
    def find_element_by_id(self, id_):
        """Finds an element by id.

        :Args:
         - id\_ - The id of the element to be found.

        :Returns:
         - WebElement - the element if it was found

        :Raises:
         - NoSuchElementException - if the element wasn't found

        :Usage:
            element = driver.find_element_by_id('foo')
        """
        return self.find_element(by=By.ID, value=id_)
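  • Now look at find_element itself. When the driver speaks the W3C protocol, this function rewrites lookups by id, tag name, class name, and name into By.CSS_SELECTOR before sending the command. Those four strategies are CSS lookups under the hood, so they cannot beat going straight to a CSS selector, which is why CSS comes out as the most efficient of them~ (Other strategies such as XPath are passed through unchanged.)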
    def find_element(self, by=By.ID, value=None):
        """
        Find an element given a By strategy and locator. Prefer the find_element_by_* methods when
        possible.

        :Usage:
            element = driver.find_element(By.ID, 'foo')

        :rtype: WebElement
        """
        if self.w3c:
            if by == By.ID:
                by = By.CSS_SELECTOR
                value = '[id="%s"]' % value
            elif by == By.TAG_NAME:
                by = By.CSS_SELECTOR
            elif by == By.CLASS_NAME:
                by = By.CSS_SELECTOR
                value = ".%s" % value
            elif by == By.NAME:
                by = By.CSS_SELECTOR
                value = '[name="%s"]' % value
        return self.execute(Command.FIND_ELEMENT, {
            'using': by,
            'value': value})['value']
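
The rewriting step in `find_element` can be tried out in isolation. The sketch below copies the same branches into a standalone function; `normalize` is a hypothetical name, and the string constants mirror the values of `selenium.webdriver.common.by.By` so the example runs without Selenium installed.

```python
# String values mirror selenium.webdriver.common.by.By, copied here so the
# sketch has no dependency on Selenium.
ID = "id"
NAME = "name"
CLASS_NAME = "class name"
TAG_NAME = "tag name"
XPATH = "xpath"
CSS_SELECTOR = "css selector"


def normalize(by, value):
    """Return the (strategy, value) pair that would be sent to a W3C driver."""
    if by == ID:
        return CSS_SELECTOR, '[id="%s"]' % value
    if by == TAG_NAME:
        return CSS_SELECTOR, value       # a bare tag name is already valid CSS
    if by == CLASS_NAME:
        return CSS_SELECTOR, ".%s" % value
    if by == NAME:
        return CSS_SELECTOR, '[name="%s"]' % value
    return by, value                     # XPath and the rest are left untouched


for pair in [(ID, "foo"), (CLASS_NAME, "btn"), (NAME, "q"), (XPATH, "//a")]:
    print(pair, "->", normalize(*pair))
```

Note that an id lookup becomes the attribute selector `[id="foo"]` rather than `#foo`, which avoids having to escape characters that are special in the `#` form.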

     

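  • Picking up where we left off, the argument uses '=>' as a separator: for example, 'xpath=>xxxx' means "find element xxxx by XPath"~ The code is as follows (css is the argument passed in):
    by, value = css.split("=>")[0].strip(), css.split("=>")[1].strip()  # split on "=>" and strip surrounding spaces/newlines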
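
As a variation on the line above, the parsing can be pulled into a small function that splits only on the first `=>`, so a selector that itself contains `=>` is not cut short. This is a sketch under assumptions: `parse_locator` and `STRATEGIES` are hypothetical names, `ValueError` is swapped in for the `NameError` used in the post, and the strategy strings mirror Selenium's `By` constants so the example runs without Selenium.

```python
# Keyword accepted by the wrapper -> value of the matching By constant.
STRATEGIES = {
    "id": "id",
    "xpath": "xpath",
    "css": "css selector",
}


def parse_locator(locator):
    """Split 'keyword=>value' into a (strategy, value) pair."""
    if "=>" not in locator:
        raise ValueError("format error: expected 'keyword=>value'")
    # maxsplit=1 keeps any later '=>' inside the value
    keyword, value = (part.strip() for part in locator.split("=>", 1))
    if keyword not in STRATEGIES:
        raise ValueError("keyword error: %r" % keyword)
    return STRATEGIES[keyword], value


print(parse_locator(" css => #kw "))
print(parse_locator("xpath=>//a[text()='a=>b']"))
```

With the lookup table, supporting another strategy means adding one dictionary entry instead of another `elif` branch.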
  • Add the log module we wrapped earlier and we get the following code:
    if "=>" not in css:  # check that the argument follows the expected format
        raise NameError("format error!")
    by, value = css.split("=>")[0].strip(), css.split("=>")[1].strip()
    message = f'Element: {css} not found in {OVER_TIME} seconds.'
    if by == "id": 
        Element = WebDriverWait(self.driver, OVER_TIME, 1).until(EC.presence_of_element_located((By.ID, value)),message)
    elif by == "xpath": 
        Element = WebDriverWait(self.driver, OVER_TIME, 1).until(EC.presence_of_element_located((By.XPATH, value)), message)
    elif by == "css": 
        Element = WebDriverWait(self.driver, OVER_TIME, 1).until(EC.presence_of_element_located((By.CSS_SELECTOR, value)), message)
    else:
        raise NameError("keyword error!") 
    Log.info(f'--> RUN TIME: <{time.time() - s}> {func.__name__}==>{css}')  # log the run time, the method name, and the argument
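  • Notice something missing? Hehe~ Before the log call we need to run the wrapped method itself. The log line then records the run time, the function that ran, and the locator. At the same time we turn the whole thing into a decorator so it is easier to call~ The logic: take the current time, check that the argument has the right format, then use an explicit wait (an explicit wait targets one specific element or condition, while an implicit wait is a global setting on the driver). If the element is found within the time limit, it is assigned to the global variable Element. Hehe, that way the decorated function indirectly gets hold of what the decorator located~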
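    import time
    from functools import wraps

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    Element = None  # module-level slot for the most recently located element

    def catch_exception_log(func):
        @wraps(func)  # keep the name of the decorated method for the log line
        def fun(self, css):
            global Element  # declare the global Element so it can be assigned here
            s = time.time()
            result = None  # returned as-is if anything below fails
            try:
                if "=>" not in css:
                    raise NameError("format error!")
                by, value = css.split("=>")[0].strip(), css.split("=>")[1].strip()
                message = f'Element: {css} not found in {OVER_TIME} seconds.'
                if by == "id":
                    Element = WebDriverWait(self.driver, OVER_TIME, 1).until(EC.presence_of_element_located((By.ID, value)), message)
                elif by == "xpath":
                    Element = WebDriverWait(self.driver, OVER_TIME, 1).until(EC.presence_of_element_located((By.XPATH, value)), message)
                elif by == "css":
                    Element = WebDriverWait(self.driver, OVER_TIME, 1).until(EC.presence_of_element_located((By.CSS_SELECTOR, value)), message)
                else:
                    raise NameError("keyword error!")
                result = func(self, css)  # run the decorated method (self must be passed along)
                Log.info(f'--> RUN TIME: <{time.time() - s}> {func.__name__}==>{css}')
            except Exception as e:
                Log.error(f'--> RUN TIME: <{time.time() - s}> error==>{e}')  # log the exception instead of raising it
            return result  # result of the decorated method, or None on failure
        return fun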
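
The global `Element` works, but the same idea can be expressed without shared state by having the decorator pass the located element straight into the decorated method. The sketch below shows that variation with a fake page object so it runs without a browser. It is an alternative under assumptions, not the post's design: `locate_and_log`, `FakePage`, `locate`, and the logger name `wrapper_demo` are all hypothetical.

```python
import functools
import logging
import time

log = logging.getLogger("wrapper_demo")  # hypothetical logger name


def locate_and_log(func):
    """Locate the element first, then hand it to the wrapped method."""
    @functools.wraps(func)
    def wrapper(self, locator):
        start = time.time()
        result = None  # returned unchanged if the lookup or the action fails
        try:
            element = self.locate(locator)  # hypothetical lookup hook
            result = func(self, element)    # the method receives the element itself
            log.info("RUN TIME: <%.3f> %s==>%s",
                     time.time() - start, func.__name__, locator)
        except Exception as exc:
            log.error("RUN TIME: <%.3f> error==>%s", time.time() - start, exc)
        return result
    return wrapper


class FakePage:
    """Stand-in for a page object; a real one would wait on a WebDriver."""
    elements = {"id=>kw": "search box"}

    def locate(self, locator):
        return self.elements[locator]  # KeyError when the element is absent

    @locate_and_log
    def click(self, element):
        return f"clicked {element}"


page = FakePage()
ok = page.click("id=>kw")
missing = page.click("id=>nope")
print(ok, missing)
```

Because nothing is stored at module level, two page objects (or two threads) cannot overwrite each other's element between the lookup and the action.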
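  • In the same way, here is a wrapper for JS (even simpler; strictly speaking, a script that operates on elements should also wait for them, but that is skipped this time~)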
    def js_catch_exception_log(func):
        def fun(self, script):
            s = time.time()
            result = None  # returned as-is if the script raises
            try:
                result = func(self, script)
                Log.info(f'--> RUN TIME: <{time.time() - s}> {func.__name__}==>{script}')
            except Exception as e:
                Log.error(f'--> RUN TIME: <{time.time() - s}> error==>{e}')
            return result
        return fun
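  • Haha, that's both decorators done~ So let's write the base class. We use __new__ to implement a simple singleton as below (multithreading is ignored here; a thread-safe singleton needs a Lock, which is left for you to look into~). Then we decorate our methods with the decorators we just wrote~ From now on, wrapping an action only takes one short method, and the decorator brings the element wait and the logging along with it~ Simple, isn't it?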
    from selenium import webdriver

    class Driver(object):

        def __new__(cls, *args, **kwargs):  # singleton
            if not hasattr(Driver, "_instance"):
                Driver._instance = object.__new__(cls)
            return Driver._instance

        def start(self, url=BASE_URL, driver_name="Chrome"):
            try:
                # executable_path is the Selenium 3 way of pointing at a driver binary;
                # it is passed by keyword because it is not the first parameter of Firefox()
                if driver_name == "Firefox":  # pick the requested driver
                    self.driver = webdriver.Firefox(executable_path=firefox_driver)
                elif driver_name == "Ie":
                    self.driver = webdriver.Ie(executable_path=ie_driver)
                else:
                    self.driver = webdriver.Chrome(executable_path=chrome_driver)
                self.driver.get(url)
                self.driver.maximize_window()  # maximize the window
                Log.info(str(driver_name) + '==>Start Success!')
            except Exception as e:
                Log.error(e)

        @catch_exception_log
        def find_element_click(self, css=None):
            Element.click()  # act on the Element located by the decorator; if the lookup failed, the exception was caught and this never runs

        @js_catch_exception_log
        def excute_js(self, script=None):
            return self.driver.execute_script(script)
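
For the multithreaded case mentioned above, one common approach is to guard the instance creation with a lock and check twice. The sketch below is a minimal illustration with a hypothetical class name, `SingletonDriver`, and no WebDriver involved, so it runs anywhere.

```python
import threading


class SingletonDriver:
    """Hypothetical stand-in for Driver showing a lock-guarded singleton."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:            # fast path once the instance exists
            with cls._lock:
                if cls._instance is None:    # re-check while holding the lock
                    cls._instance = super().__new__(cls)
        return cls._instance


instances = []


def grab():
    instances.append(SingletonDriver())


threads = [threading.Thread(target=grab) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(obj is instances[0] for obj in instances))
```

The second check inside the lock matters: two threads can both pass the first check before either creates the instance, and without the re-check the second one would build a duplicate.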
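  • So what makes this better than a typical wrapper? Hehe~ The part that every method used to repeat is gone, and each method just acts on the element directly~ Simple, isn't it?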