CS188-pj2-Multiagent


1. Adversarial Search

  Adversarial search is also known as game search.

  In AI it is usually defined as a two-player, zero-sum, turn-taking game with perfect information and deterministic rules (chess, for example).

  Zero-sum game: one player's gain is exactly the other player's loss; for me to get the best outcome, my opponent must get the worst one.

  Since it is a game, we can model it formally (a short code sketch follows the list):

    Initial state: the state the game starts in.

    Successor function: applies a decision and maps one state to the next.

    Terminal test: decides whether the game is over; a state where the game ends is called a terminal state.

    Evaluation function: scores the outcome of the game, that is, who wins, who loses, and by how much.
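
  The four pieces above map naturally onto a small interface. Below is a minimal sketch, assuming a hypothetical Game base class (the names initial_state, successors, is_terminal and evaluate are illustrative and are not part of the CS188 framework):

class Game:
    """Minimal sketch of the adversarial-search game model (illustrative only)."""

    def initial_state(self):
        """Initial state: the state the game starts in."""
        raise NotImplementedError

    def successors(self, state, agent):
        """Successor function: yield (action, next_state) pairs reachable by `agent`."""
        raise NotImplementedError

    def is_terminal(self, state):
        """Terminal test: True when the game is over."""
        raise NotImplementedError

    def evaluate(self, state):
        """Evaluation function: who wins, who loses, and by how much (from our point of view)."""
        raise NotImplementedError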

 

2. MiniMax: a Cold and Merciless God

  The Minimax algorithm is commonly used in two-player games and programs such as chess.

  Minimax is a zero-sum algorithm: one side picks, among its options, the move that maximizes its own advantage, while the other side picks the move that minimizes the opponent's advantage.

  MiniMax is also a pessimistic algorithm: it assumes the opponent never makes a mistake and looks for the move with the smallest loss against a perfect opponent.

MiniMax pseudocode:

function minimax(node, depth)
    if node is a terminal node or depth = 0  // termination condition
        return the heuristic value of node
    if the adversary is to play at node     // the perfect opponent always picks what is best for itself
        let α := +∞
        foreach child of node
            α := min(α, minimax(child, depth-1))
    else {we are to play at node}          // our turn: pick what is best for us
        let α := -∞
        foreach child of node
            α := max(α, minimax(child, depth-1))
    return α

  How do we apply MiniMax to a game with more than two players?

    Abstract the participants into our side and the opponents.

    Having several opponents simply means that when it is the opponents' turn, several moves are made in a row.

    In each round our side moves once and the opponents move n-1 times (where n is the number of agents).

    In other words, a single minimizing value is replaced by a vector of minimizing layers, one per opponent (see the sketch below).
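
  A minimal sketch of that turn order, assuming a hypothetical helper where index 0 is the maximizer (Pacman) and indices 1..n-1 are the opponents; the search depth only increases after the last opponent has moved:

# Illustrative only: how the agent index cycles through one round of multi-agent minimax.
def next_turn(agent_index, depth, num_agents):
    """Return (next_agent, next_depth) once agent `agent_index` has moved."""
    if agent_index == num_agents - 1:    # the last opponent has moved,
        return 0, depth + 1              # so it is our turn again, one ply deeper
    return agent_index + 1, depth        # otherwise the next opponent moves at the same depth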

  CS-188 MinimaxAgent implementation:

class MinimaxAgent(MultiAgentSearchAgent):
    """
        Your minimax agent (question 2)
    """
    def getAction(self, gameState):
        """
        Returns the minimax action from the current gameState using self.depth
        and self.evaluationFunction.

        Here are some method calls that might be useful when implementing minimax.

        gameState.getLegalActions(agentIndex):
        Returns a list of legal actions for an agent
        agentIndex=0 means Pacman, Ghosts are >= 1

        gameState.generateSuccessor(agentIndex, action):
        Returns the successor game state after an agent takes an action

        gameState.getNumAgents():
        Returns the total number of agents in the game

        gameState.isWin():
        Returns whether or not the game state is a winning state

        gameState.isLose():
        Returns whether or not the game state is a losing state
        """
        "*** YOUR CODE HERE ***"

        # Indices of all ghost agents (Pacman is agent 0)
        Ghosts = [i for i in range(1, gameState.getNumAgents())]
        
        # Check whether the game is over or the depth limit has been reached
        def terminate(state, d):
            return state.isWin() or state.isLose() or d == self.depth

        # Value of a minimizing (ghost) node
        def min_value(state, d, ghost): 
            if terminate(state, d):
                return self.evaluationFunction(state)
            # A min node starts at positive infinity
            v = float('inf')    
            for action in state.getLegalActions(ghost):
                if ghost == Ghosts[-1]:
                    v = min(v, max_value(state.generateSuccessor(ghost, action), d + 1))
                else:
                    v = min(v, min_value(state.generateSuccessor(ghost, action), d, ghost + 1))
            return v
        
        # Value of a maximizing (Pacman) node
        def max_value(state, d):  
            if terminate(state, d):
                return self.evaluationFunction(state)
            # A max node starts at negative infinity
            v = -float('inf')
            for action in state.getLegalActions(0):
                v = max(v, min_value(state.generateSuccessor(0, action), d, 1))
            return v
        # Pair every legal root action with its minimax value
        res = [(action, min_value(gameState.generateSuccessor(0, action), 0, 1)) for action in gameState.getLegalActions(0)]
        # Sort by value, ascending
        res.sort(key=lambda k: k[1])
        # Return the action with the highest value
        return res[-1][0]
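
  In the standard project framework this agent can typically be exercised with commands along the lines of "python pacman.py -p MinimaxAgent -l minimaxClassic -a depth=4" and checked with "python autograder.py -q q2" (exact script names and flags as given in the course handout; adjust them to your local setup).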

 

   


3. Alpha-Beta Pruning

  Alpha-beta pruning is a way to find the optimal minimax move while avoiding the search of subtrees that can never be chosen.

    Alpha: the best (highest) lower bound found so far for the maximizing side; initialized to negative infinity.

    Beta: the best (lowest) upper bound found so far for the minimizing side; initialized to positive infinity.

  Since MiniMax assumes a perfect opponent, our choice is the largest value that still fits under the smallest upper bound the opponent will allow.

  A path is therefore only worth exploring while α ≤ value ≤ β; any path that violates this condition can be pruned and never expanded (a small worked example follows the pseudocode below).

  Pruning only affects efficiency; it never changes the final result.

function alphabeta(node, depth, α, β, maximizingPlayer)   // node = current node, depth = remaining depth, maximizingPlayer = whether the maximizing player is to move
     if depth = 0 or node is a terminal node
         return the heuristic value of the node
     if we are to play at node
         v := -∞
         foreach child of node
             v := max(v, alphabeta(child, depth - 1, α, β, FALSE)) // child = child node
             α := max(α, v)
             if β ≤ α
                 break // β cut-off
         return v
     else {the adversary is to play at node}
         v := ∞
         foreach child of node
             v := min(v, alphabeta(child, depth - 1, α, β, TRUE))
             β := min(β, v)
             if β ≤ α
                 break // α cut-off
         return v
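
  As a sanity check of the pruning rule, here is a small self-contained sketch (the hand-built tree and helper names are illustrative, not part of the CS188 codebase) that runs plain minimax and alpha-beta over the same tree and counts how many leaves each one evaluates:

# Illustrative only: compare leaf evaluations with and without alpha-beta pruning.
# A tree is either a leaf value (int) or a list of subtrees.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

def plain_minimax(node, maximizing, counter):
    if isinstance(node, int):              # leaf node
        counter[0] += 1
        return node
    values = [plain_minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, alpha, beta, counter):
    if isinstance(node, int):              # leaf node
        counter[0] += 1
        return node
    if maximizing:
        v = float('-inf')
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta, counter))
            alpha = max(alpha, v)
            if beta <= alpha:              # beta cut-off
                break
        return v
    v = float('inf')
    for child in node:
        v = min(v, alphabeta(child, True, alpha, beta, counter))
        beta = min(beta, v)
        if beta <= alpha:                  # alpha cut-off
            break
    return v

full, pruned = [0], [0]
print(plain_minimax(TREE, True, full), full[0])                               # -> 3 9
print(alphabeta(TREE, True, float('-inf'), float('inf'), pruned), pruned[0])  # -> 3 7

  Both searches return the same root value (pruning never changes the result); alpha-beta simply skips the leaves that cannot influence it.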

 CS-188 AlphaBetaAgent (MiniMax with pruning) implementation:

class AlphaBetaAgent(MultiAgentSearchAgent):
    """
    Your minimax agent with alpha-beta pruning (question 3)
    """

    def getAction(self, gameState):
        """
        Returns the minimax action using self.depth and self.evaluationFunction
        """
        "*** YOUR CODE HERE ***"
        
        now_value = -float('inf')
        # alpha starts at negative infinity
        alpha = -float('inf')
        # beta starts at positive infinity
        beta = float('inf')
        nextAction = Directions.STOP

        # Among all legal root actions, pick the one with the highest value
        legal_actions = gameState.getLegalActions(0).copy()
        for next_action in legal_actions:
            nextState = gameState.generateSuccessor(0, next_action)
            next_value = self.get_node_value(nextState, 0, 1, alpha, beta)
            if next_value > now_value:
                now_value, nextAction = next_value, next_action
            alpha = max(alpha, now_value)
        return nextAction

    def get_node_value(self, gameState, cur_depth=0, agent_index=0, alpha=-1e10, beta=1e10):
        """
        Using self-defined function, alpha_value(), beta_value() to choose the most appropriate action
        Only when it's the final state, can we get the value of each node, using the self.evaluationFunction(gameState)
        Otherwise we just get the alpha/beta value we defined here.
        """
        max_party = [0,]
        min_party = list(range(1, gameState.getNumAgents()))
        
        # Depth limit or terminal state: return the evaluation
        if cur_depth == self.depth or gameState.isLose() or gameState.isWin():
            return self.evaluationFunction(gameState)
        # Maximizing node: compute its value and tighten alpha
        elif agent_index in max_party:
            return self.alpha_value(gameState, cur_depth, agent_index, alpha, beta)
        # Minimizing node: compute its value and tighten beta
        elif agent_index in min_party:
            return self.beta_value(gameState, cur_depth, agent_index, alpha, beta)
        else:
            print('Errors occur in your party division !!! ')

    def alpha_value(self, gameState, cur_depth, agent_index, alpha=-float('inf'), beta=float('inf')):
        # Running maximum starts at negative infinity
        v = -float('inf')
        legal_actions = gameState.getLegalActions(agent_index)
        for index, action in enumerate(legal_actions):
            next_v = self.get_node_value(gameState.generateSuccessor(agent_index, action), cur_depth, agent_index + 1, alpha, beta)
            v = max(v, next_v)
            # v already exceeds beta, so the minimizer above will never allow this branch: prune
            if v > beta:
                return v
            alpha = max(alpha, v)
        return v

    def beta_value(self, gameState, cur_depth, agent_index, alpha=-float('inf'), beta=float('inf')):
        # Running minimum starts at positive infinity
        v = float('inf')
        legal_actions = gameState.getLegalActions(agent_index)
        # Multi-agent game: one Pacman, several ghosts
        for index, action in enumerate(legal_actions):
            if agent_index == gameState.getNumAgents() - 1:
                # Last ghost: the next agent to move is Pacman, so go one ply deeper
                next_v = self.get_node_value(gameState.generateSuccessor(agent_index, action), cur_depth + 1, 0, alpha, beta)
                v = min(v, next_v)
                # v has dropped below alpha, so the maximizer above will never allow this branch: prune
                if v < alpha:
                    return v
            else:
                # Otherwise the next ghost moves at the same depth
                next_v = self.get_node_value(gameState.generateSuccessor(agent_index, action), cur_depth, agent_index + 1, alpha, beta)
                v = min(v, next_v)
                # v has dropped below alpha, so the maximizer above will never allow this branch: prune
                if v < alpha:
                    return v
            beta = min(beta, v)
        return v

 

   


4. ExpectiMax

  MiniMax assumes the opponent never makes a mistake, which is not very realistic.

  Moreover, an agent that allows for the opponent making mistakes often makes better decisions.

  The Expectimax search algorithm is a game-theoretic algorithm for maximizing expected utility. It is a variant of the Minimax algorithm.

  Minimax assumes the opponent (the minimizer) plays optimally, whereas Expectimax does not.

  Expectimax introduces chance nodes and alternates them with the max nodes, so the opponent is no longer treated as "perfect".
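
  For example, suppose an opponent node has two moves whose subtrees are worth 10 and 0 to us. Minimax scores that node 0 (the worst case), while a chance node with a uniformly random opponent scores it 5 (the expectation). A tiny sketch of that calculation with illustrative values:

# Illustrative only: value of a single opponent node under minimax vs. expectimax.
child_values = [10, 0]

minimax_value = min(child_values)                          # perfect opponent -> 0
expectimax_value = sum(child_values) / len(child_values)   # uniformly random opponent -> 5.0

print(minimax_value, expectimax_value)   # -> 0 5.0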

function expectiminimax(node, depth)
    if node is a terminal node or depth = 0
        return the heuristic value of node
    if the adversary is to play at node
        // Return value of minimum-valued child node
        let α := +∞
        foreach child of node
            α := min(α, expectiminimax(child, depth-1))
    else if we are to play at node
        // Return value of maximum-valued child node
        let α := -∞
        foreach child of node
            α := max(α, expectiminimax(child, depth-1))
    else if random event at node
        // Return weighted average of all child nodes' values
        let α := 0
        foreach child of node
            α := α + (Probability[child] × expectiminimax(child, depth-1))
    return α

  CS-188 ExpectimaxAgent implementation:

class ExpectimaxAgent(MultiAgentSearchAgent):
    """
      Your expectimax agent (question 4)
    """
    def getAction(self, gameState):
        """
        Returns the expectimax action using self.depth and self.evaluationFunction

        All ghosts should be modeled as choosing uniformly at random from their
        legal moves.
        """
        "*** YOUR CODE HERE ***"

        maxValue = -float('inf')
        maxAction = Directions.STOP

        for action in gameState.getLegalActions(agentIndex=0):
            sucState = gameState.generateSuccessor(action=action, agentIndex=0)
            sucValue = self.expNode(sucState, currentDepth=0, agentIndex=1)
            if sucValue > maxValue:
                maxValue = sucValue
                maxAction = action
        return maxAction
    
    # Terminal test: game over or depth limit reached
    def terminate(self, state, d):
        return state.isWin() or state.isLose() or d == self.depth
    
    # Value of a maximizing (Pacman) node
    def maxNode(self, gameState, currentDepth):
        if self.terminate(gameState, currentDepth):
            return self.evaluationFunction(gameState)
        maxValue = -float('inf')
        # The max value is taken over the successor chance nodes
        for action in gameState.getLegalActions(agentIndex=0):
            sucState = gameState.generateSuccessor(action=action, agentIndex=0)
            sucValue = self.expNode(sucState, currentDepth=currentDepth, agentIndex=1)
            if sucValue > maxValue:
                maxValue = sucValue
        return maxValue

    # Value of a chance (ghost) node
    def expNode(self, gameState, currentDepth, agentIndex):
        if self.terminate(gameState, currentDepth):
            return self.evaluationFunction(gameState)
        numAction = len(gameState.getLegalActions(agentIndex=agentIndex))
        totalValue = 0.0
        numAgent = gameState.getNumAgents()
        for action in gameState.getLegalActions(agentIndex=agentIndex):
            sucState = gameState.generateSuccessor(agentIndex=agentIndex, action=action)
            if agentIndex == numAgent - 1:
                sucValue = self.maxNode(sucState, currentDepth=currentDepth + 1)
            # Every ghost is expanded in turn (a vector of chance nodes replaces a single min value)
            else:
                sucValue = self.expNode(sucState, currentDepth=currentDepth, agentIndex=agentIndex + 1)
            totalValue += sucValue
        # Return the average value of the successors (uniformly random ghost)
        return totalValue / numAction

 

   


 5. betterEvaluationFunction

  Besides adversarial search, is there really no other way to approach this?

  In fact, play a round of Pacman yourself and you will quickly notice that all we need to do is eat the food as fast as possible without being caught by a ghost (stay close to food, stay far from ghosts).

  So we can assign weights to food and to ghosts: when the next step leads toward food, give Pacman a small reward that tells it to move that way.

  When the next step would run into a ghost, give Pacman a penalty that tells it not to go there.

  This is easy to achieve by making the food weight positive and the ghost weight negative.

  The magnitude of each weight corresponds to how tempting food is and how frightening a ghost is, and both can be tuned as parameters.

  At the same time, remember that after Pacman eats a power pellet it does not need to fear the ghosts for a while; it can then play more aggressively and actively hunt them.

def betterEvaluationFunction(currentGameState):
    """
      Your extreme ghost-hunting, pellet-nabbing, food-gobbling, unstoppable
      evaluation function (question 5).
    """
    newPos = currentGameState.getPacmanPosition()
    newFood = currentGameState.getFood()
    newGhostStates = currentGameState.getGhostStates()

    WEIGHT_FOOD = 10.0              # weight for food
    WEIGHT_GHOST = -10.0            # weight for (active) ghosts
    WEIGHT_SCARED_GHOST = 100.0     # weight for scared ghosts
    
    # Start from the current game score
    score = currentGameState.getScore()
    # Distances to all remaining food
    distancesToFoodList = [util.manhattanDistance(newPos, foodPos) for foodPos in newFood.asList()]
    # The closer the nearest food, the bigger the bonus
    if len(distancesToFoodList) > 0:
        score += WEIGHT_FOOD / min(distancesToFoodList)
    else:
        score += WEIGHT_FOOD

    for ghost in newGhostStates:
        distance = util.manhattanDistance(newPos, ghost.getPosition())
        if distance > 0:
            # The closer a scared ghost is, the bigger the bonus: encourage Pacman to hunt it
            if ghost.scaredTimer > 0:
                score += WEIGHT_SCARED_GHOST / distance
            # Otherwise keep away from the ghost
            else:
                score += WEIGHT_GHOST / distance
        # Pacman shares a square with a ghost: worst possible outcome
        else:
            return -float('inf')  # Pacman is dead at this point

    return score

# Abbreviation
better = betterEvaluationFunction

 

  The choice of weights affects the score. Below are the results of several tuning runs:

  1. WeightFood=10, WeightGhost=-1, WeightScaredGhost=100:

 

 

    


  2. WeightFood=5, WeightGhost=-1, WeightScaredGhost=100:

 

    


  3. WeightFood=10, WeightGhost=-10, WeightScaredGhost=100:

 

    


  4. WeightFood=5, WeightGhost=-10, WeightScaredGhost=50:

   


  5. WeightFood=5, WeightGhost=-10, WeightScaredGhost=100:

  


  From these runs we can see that:

    the second setting works best.

 
