WO2016202009A1

WO2016202009A1 - Road traffic light coordination and control method based on reinforcement learning

Info

Publication number: WO2016202009A1
Application number: PCT/CN2016/075265
Authority: WO
Inventors: 朱斐; 朱海军; 伏玉琛; 刘全; 杨炯; 任勇
Original assignee: 苏州大学张家港工业技术研究院
Priority date: 2015-06-17
Filing date: 2016-03-01
Publication date: 2016-12-22
Also published as: CN105046987A; CN105046987B

Abstract

A road traffic light coordination and control method based on reinforcement learning, in which a monitoring device is provided at each intersection and each monitoring device is connected to a remote server through a network module. The control method comprises: (1) the remote server receives a video signal and calculates a waiting time S; (2) the remote server performs analysis to obtain the road congestion condition under each phase state a_i; (3) the remote server obtains a feasibility c_{a_i} under the phase state a_i, wherein when the flow of traffic can pass through the road, the road is clear and the feasibility c_{a_i} is 1; otherwise, the road is congested and the feasibility c_{a_i} is 0; (4) the waiting time S and the feasibility c_{a_i} are used to calculate the optimal driving phase state a_i of the intersection; (5) the traffic lights are adjusted accordingly. Based on video information acquired in real time, and by coordinating and controlling the traffic lights of a plurality of intersections in one area, traffic efficiency is improved, the flow of traffic in the area is maximized, and road traffic congestion is alleviated.

Description

Road traffic signal light coordinated control method based on reinforcement learning

Technical field

[0001] The present invention relates to a road traffic signal light control method, and more particularly to a road traffic signal light coordinated control method based on reinforcement learning.

Background technique

[0002] Transportation is the foundation of modern society and its lifeblood. People's social behavior is closely tied to transportation. In a city, the number of motor vehicles and non-motor vehicles is large, and the intersections and road sections are complicated. Managing such a large-scale, dynamic and highly uncertain distributed system is very complicated work. When no new roads are being built, improving the utilization of existing roads through reasonable traffic control, and thereby improving traffic efficiency, is an effective way to solve urban traffic problems quickly.

[0003] However, traffic congestion is now becoming more serious. On the one hand, the number of vehicles keeps growing while traffic planning and design lag behind; on the other hand, many traffic signal control systems are relatively backward, and traffic lights are not regulated well enough according to actual traffic conditions to improve traffic efficiency. The use of computing technology and machine intelligence to help solve traffic problems has become more and more important, and has become a trend.

[0004] In recent years, a large number of road traffic monitoring devices have been put into use, and traffic video data has been continuously transmitted to the traffic management department. How to make full use of these traffic video data and improve the control of road traffic signals to improve the efficiency of road traffic has attracted more and more attention.

[0005] At present, some intelligent traffic control systems are in use, but congestion between adjacent intersections within a traffic area, a problem faced in actual traffic control, has not been well solved. Regional coordinated control of road traffic handles this problem better: by taking into account the traffic conditions of multiple intersections in a traffic area, it can achieve higher traffic efficiency than signal control that considers only a single intersection. For example, in the "green wave" method of signal control, a speed range is specified for the road sections of a designated route, and the signal controller adjusts the start time of the green light at each intersection the vehicles pass according to the distances between road sections, so that vehicles meet a green light on arriving at each intersection and traffic on that route achieves the highest efficiency.

[0006] However, this method cannot adapt to actual road traffic conditions, so regional signal control cannot exert its advantages and is ineffective. During the morning and evening peaks, for example, many factors come into play, such as buses gathering near bus stops and pedestrians near schools at the start and end of the school day. These factors can leave some intersections flowing poorly or even blocked. At present, many traffic management departments can only rely on staff to direct traffic on site and control signal changes directly. Manual management of traffic lights is prone to oversights and can only cover a single intersection, so coordinated control of the traffic lights in a region is difficult to achieve. It is likely that traffic participants pass one intersection only to meet congestion again because of heavy traffic ahead. From the standpoint of regional coordination, holding traffic back rather than releasing it may then be the best option. Therefore, how to make the fullest use of existing traffic video data and equipment to achieve regional coordinated control, adapt to changing road conditions, reduce the workload of traffic management departments and ease congestion is a problem in urgent need of a solution.

Technical solution

[0007] The object of the present invention is to provide a road traffic signal light coordinated control method based on reinforcement learning which, by collecting actual video data and relying on vehicle state transitions, automatically adjusts and controls the traffic lights of an area, so as to improve the efficiency of traffic participants, ease traffic congestion and thereby reduce the workload of the traffic management department.

[0008] The technical solution of the present invention is: a road traffic signal light coordinated control method based on reinforcement learning, comprising monitoring devices provided at each intersection, each monitoring device being connected to a remote server via a network module, the control method being as follows:

[0009] (1) The remote server receives the video signal sent by the monitoring device and calculates the waiting time S of the vehicles on each road of the corresponding intersection, i.e. the time the vehicles spend stopped under a red light and under a green light;

[0010] (2) Taking each combination of red and green lights corresponding to a lane passing mode at the intersection as a phase state a_i, the remote server, under each phase state a_i, obtains the road congestion condition by analyzing the waiting time obtained in step (1);

[0011] (3) According to whether, under the current phase state a_i, the vehicles in the lanes with a green light can pass through the road, the remote server obtains the feasibility c_{a_i} under the phase state a_i: when the traffic flow can pass, the road is clear and the feasibility c_{a_i} is 1; otherwise the road is congested and the feasibility c_{a_i} is 0;

[0012] (4) Using the waiting time S obtained in step (1) and the feasibility c_{a_i} obtained in step (3), the remote server analyzes and judges the driving situation under each phase state a_i of the intersection and, by recording and updating the driving data over time, calculates the optimal driving phase state a_i of the intersection through program software analysis;

[0013] (5) According to the optimal driving phase state a_i, the red and green light combination of the intersection is adjusted to obtain the maximum traffic flow.

[0014] In the above technical solution, the phase state a_i is the combination of red and green states of the road traffic signals across the lanes of each road. On a lane with a green light, vehicles are allowed to go straight through the intersection to the opposite lane, and right turns are also allowed. Only when both the straight-ahead and the right-turn movements are passable is the feasibility c_{a_i} in step (3) equal to 1; otherwise the road is regarded as congested and the feasibility c_{a_i} is 0. On a lane with a red light, vehicles are stopped.
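
The feasibility rule of paragraph [0014] can be illustrated with a minimal Python sketch. The `GreenLane` record and its field names are hypothetical and do not appear in the source; the sketch only assumes that, for every lane with a green light, the server already knows whether the straight-ahead and right-turn movements are passable.

```python
from dataclasses import dataclass


@dataclass
class GreenLane:
    """One lane that has a green light under the phase being evaluated (hypothetical record)."""
    lane_id: int
    straight_clear: bool    # the straight-ahead movement is passable
    right_turn_clear: bool  # the right-turn movement is passable


def feasibility(green_lanes):
    """Return 1 only if every green lane can go straight and turn right, else 0."""
    return int(all(l.straight_clear and l.right_turn_clear for l in green_lanes))


# Lane 11 cannot turn right, so the whole phase is treated as congested.
phase = [GreenLane(3, True, True), GreenLane(11, True, False)]
print(feasibility(phase))  # 0
```

Treating the phase as infeasible as soon as one movement is blocked follows the "only when" wording of the paragraph; a softer, graded measure would be a different design.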

[0015] In the above technical solution, the waiting time includes the time a vehicle on the lane spends stopped under a red light and the time it spends stopped under a green light.
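
As a rough illustration of this definition, the following Python sketch accumulates the stopped time of one vehicle from per-frame observations. The sample format `(is_stopped, light)` and the frame interval are assumptions made for the example; the source does not specify how the video analysis reports vehicle states.

```python
def waiting_time(samples, frame_interval_s=1.0):
    """Sum the time a vehicle is stationary, split by signal colour.

    samples: iterable of (is_stopped, light) pairs, light in {"red", "green"}.
    Returns (total waiting time, per-colour breakdown), both in seconds.
    """
    stopped = {"red": 0.0, "green": 0.0}
    for is_stopped, light in samples:
        if is_stopped:
            stopped[light] += frame_interval_s
    return stopped["red"] + stopped["green"], stopped


# Two stopped frames under red, one under green, then the vehicle moves.
samples = [(True, "red"), (True, "red"), (True, "green"), (False, "green")]
total, parts = waiting_time(samples)
print(total, parts)  # 3.0 {'red': 2.0, 'green': 1.0}
```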

[0016] In the above technical solution, the weight value of the corresponding lane is set according to the traffic flow demand of main roads, secondary roads or bus lanes.

[0017] In the above technical solution, the "program software analysis calculation" in step (4) uses a kernel function: the kernel function compares the similarity between the current driving situation and the known driving situations previously stored in the database. Taking into account the driving situation under the multiple phase states of the intersection, phase states that have not been executed for a long time and important phase states are selected preferentially. The selected phase state is the one whose execution maximizes the sum, over all waiting vehicles, of the difference between waiting under a red light and waiting under a green light. An important phase state is a phase state of a main road or a bus lane; its priority can be implemented by setting the initial weight value of the corresponding lane.

[0018] In the above technical solution, the network module is a wired Ethernet module or a wireless data transmission network module.

Advantageous effects of the invention

[0019] Owing to the above technical solutions, the present invention has the following advantages over the prior art:

[0020] 1. By acquiring the video information recorded by the monitoring devices, the invention obtains the traffic flow under the different signal phases shown in the video, and the server adjusts the signal changes according to road traffic conditions so as to maximize traffic flow at the intersections and reduce congestion;

[0021] 2. The server collects actual video data and calculates the waiting time of vehicles based on vehicle state transitions. It uses a kernel-based reinforcement learning algorithm to select the phase state, finding the phase state that gives all vehicles the shortest waiting time, and adjusts the signal changes to keep up with rapidly changing road traffic conditions;

[0022] 3. The invention takes into account the primary or secondary character of the various lanes and the particular nature of the vehicles travelling on them, and sets the initial weight values accordingly, i.e. different weight values for each lane, so that lanes such as main roads or bus lanes are given priority, optimizing the entire road traffic control system.

Brief description of the drawings

FIG. 1 is a schematic view showing the arrangement of lanes and parking spaces in phase state 1 of the embodiment of the present invention;

FIG. 2 is a schematic view of phase states 1-4 of the embodiment of the present invention;

FIG. 3 is a schematic view of phase states 5-8 of the embodiment of the present invention;

FIG. 4 is a topological view of the network structure of a traffic area in the embodiment of the present invention;

FIG. 5 is a network diagram of an intersection in the embodiment of the present invention.

Embodiments of the invention

[0028] The present invention is further described below in conjunction with the accompanying drawings and embodiments:

[0029] Embodiment 1: Referring to FIG. 1 to FIG. 5, a road traffic signal light coordinated control method based on reinforcement learning comprises monitoring devices provided at each intersection, each monitoring device being connected to a remote server through a wired Ethernet network module (or a wireless network module). The control method is as follows:

[0030] (1) The remote server receives the video signal sent by the monitoring device and calculates the waiting time S of the vehicles on each road of the corresponding intersection, i.e. the time the vehicles spend stopped under a red light and under a green light;

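[0031] (2) Taking each combination of red and green lights corresponding to a lane passing mode at the intersection as a phase state a_i, the remote server, under each phase state a_i, obtains the road congestion condition by analyzing the waiting time obtained in step (1);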

[0032] (3) According to the current phase state a_i, the remote server obtains the feasibility c_{a_i}: when the traffic flow can pass, the road is clear and the feasibility c_{a_i} is 1; otherwise the road is congested and the feasibility c_{a_i} is 0. As shown in FIG. 1, the exit lanes are lanes 1, 2, 5, 6, 9, 10, 13 and 14; when they are all clear, the feasibility of phase state 1 is 1.

[0033] (4) Using the waiting time S obtained in step (1) and the feasibility c_{a_i} obtained in step (3), the remote server analyzes and judges the driving situation under each phase state a_i of the intersection;

[0034] (5) According to the optimal driving phase state a_i, the red and green light combination of the intersection is adjusted to obtain the maximum traffic flow.

[0035] FIG. 2 and FIG. 3 show the eight phase states of a four-lane intersection. A dotted arrow indicates a passable direction, i.e. a lane with a green light; a solid arrow indicates an impassable direction, i.e. a lane with a red light.

[0036] The control steps are as follows:

[0037] (1) Initialize the Q-value lookup tables of all intersection servers in the road traffic network. The Q table stores the Q values [formula illegible in the source], where the state gives the vehicle position as shown in FIG. 1 [formula illegible in the source] and l refers to the lane as shown in FIG. 1. The initial values of the Q table are set to 0. Initialize the discount factor and the learning rate. Initialize the phase weights of all servers, randomly initialize the starting action of each server, and execute it. The initial value of the simulation step t is 0.

[0038] (2) Each intersection server uses the formula

[formula illegible in the source]

to calculate the k value between every vehicle state and the states in the Q table, and stores the k values in the K table. In the formula, "similar" refers to whether two lanes are similar, that is, whether the lanes are rotationally symmetric; for example, lane 3 is similar to lane 11 in FIG. 1. The indicator term takes the value 1 if the condition in parentheses is satisfied and 0 otherwise, and S denotes the set of states approximately related to the given state.
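
Because the formulas of steps (1) and (2) did not survive extraction, the following Python sketch is only one possible reading. It assumes that a state is a (position, lane) pair, that the Q table is keyed by (state, action), that the 16 lanes form four groups of four so that lanes with the same index modulo 4 are rotationally symmetric (which makes lane 3 similar to lane 11), and that the kernel is a Gaussian in the position difference. None of these choices is confirmed by the source.

```python
import math
from itertools import product

N_LANES = 16      # assumption: four approaches with four lanes each
N_POSITIONS = 5   # assumption: number of vehicle positions per lane
N_ACTIONS = 8     # the eight phase states of FIG. 2 and FIG. 3


def lanes_similar(l1, l2):
    """Indicator: 1 if the two lanes coincide under rotation (assumed rule), else 0."""
    return int(l1 % 4 == l2 % 4)


def kernel(s1, s2, bandwidth=1.0):
    """Assumed kernel: lane-similarity indicator times a Gaussian in position."""
    (p1, l1), (p2, l2) = s1, s2
    return lanes_similar(l1, l2) * math.exp(-((p1 - p2) ** 2) / (2 * bandwidth ** 2))


# Step (1): every Q value starts at 0.
states = list(product(range(1, N_POSITIONS + 1), range(1, N_LANES + 1)))
Q = {(s, a): 0.0 for s in states for a in range(1, N_ACTIONS + 1)}

# Step (2): precompute the k value for every pair of states.
K = {(s1, s2): kernel(s1, s2) for s1 in states for s2 in states}

print(K[((2, 3), (2, 11))])  # 1.0: same position, rotationally symmetric lanes
print(K[((2, 3), (2, 4))])   # 0.0: lanes are not similar
```

Precomputing the K table trades memory for speed, which matches the description of storing the k values once and reusing them in later steps.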

(3) [formula illegible in the source]. Each intersection server observes its entrance lanes and, according to the observation data of the connected intersections, updates the feasibility value: if the exit lane is congested, the feasibility is 0; otherwise, it is 1. The weights are updated according to the formula

[formula illegible in the source]

and, when t is an integer multiple of 500, the learning rate is updated according to the formula

[formula illegible in the source]

where % is the remainder operator.
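
The periodic learning-rate update can be sketched as follows. The source only states that the update happens when t is an integer multiple of 500 and that the formula uses the remainder operator; the multiplicative decay factor used here is an assumption chosen for illustration, not the patent's formula.

```python
def update_learning_rate(alpha, t, period=500, decay=0.9):
    """Shrink alpha whenever t is a positive integer multiple of `period`.

    `decay` is a hypothetical parameter; the original update formula is not legible.
    """
    if t > 0 and t % period == 0:
        return alpha * decay
    return alpha


alpha = 0.5
for t in range(1, 1001):
    alpha = update_learning_rate(alpha, t)
print(alpha)  # decayed twice, at t = 500 and t = 1000
```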

(4) Each server in the system uses the individually observed state transitions of the vehicles, the Q table and the K table to update the Q values:

[formula illegible in the source]

The Q values of the states in the Q table that are similar to the actually observed state must also be updated, and the phase is decomposed into the actions of the individual road traffic lights. The case-by-case definitions of the terms of the formula are illegible in the source.
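
Since the update formula is not legible, the sketch below substitutes a standard Q-learning temporal-difference update and spreads it to similar states in proportion to their k value. This is a common way to combine a kernel with a tabular Q function, offered here as an assumption rather than as the patent's own rule; the function name and the toy states are hypothetical.

```python
def kernel_q_update(Q, K, states, actions, s, a, reward, s_next,
                    alpha=0.1, gamma=0.9):
    """Apply one TD update at (s, a) and share it with similar states via K."""
    best_next = max(Q[(s_next, b)] for b in actions)
    td_error = reward + gamma * best_next - Q[(s, a)]
    for s_sim in states:
        k = K.get((s, s_sim), 0.0)  # similarity of s_sim to the observed state
        if k > 0.0:
            Q[(s_sim, a)] += alpha * k * td_error


# Toy example: s1 is half as similar to s0 as s0 is to itself; s2 is unrelated.
states = ["s0", "s1", "s2"]
actions = [0, 1]
Q = {(s, a): 0.0 for s in states for a in actions}
K = {("s0", "s0"): 1.0, ("s0", "s1"): 0.5}
kernel_q_update(Q, K, states, actions, "s0", 0, reward=1.0, s_next="s2")
print(Q[("s0", 0)], Q[("s1", 0)], Q[("s2", 0)])
```

The temporal-difference error is computed once before the loop so that updating the observed state does not alter the amount passed on to its neighbours.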

(5) Each server in the system, according to the values in the Q table and the K table, selects the action with the highest return value according to the formula

[formula illegible in the source]

Two parameters, the phase-related weight and the congestion parameter, are used to prefer phases that have not been executed and phases that are not congested. Each server makes its decision taking into account congestion at other intersections, so that the servers cooperate by sharing road traffic conditions. Phase selection gives priority according to vehicle body length, i.e. buses are given priority. For a waiting vehicle s, the formula takes the difference between the return when its road traffic light is red and the return when it is green. The value of taking a phase action is the sum of this difference over all waiting vehicles, and indicates that the phase can make the average waiting time of the vehicles the shortest. This is consistent with the ultimate goal of maximizing the traffic flow at the intersection and reducing congestion.
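
The selection rule of step (5) can be sketched in Python under stated assumptions. The sketch assumes that each phase has a weight and a 0/1 congestion parameter, that the per-vehicle difference is taken as green-light return minus red-light return, and that it is scaled by body length so that buses count for more. The sign convention, the data layout and all names (`phase_score`, `select_phase`, `q_green`, `q_red`) are hypothetical, since the original formula is illegible.

```python
def phase_score(phase, vehicles, q_green, q_red, weight, feasible):
    """Weighted, congestion-gated sum of per-vehicle return differences."""
    gain = sum(
        v["length"] * (q_green[v["state"]] - q_red[v["state"]])
        for v in vehicles
        if v["lane"] in phase["green_lanes"]  # only these vehicles get a green light
    )
    return weight[phase["id"]] * feasible[phase["id"]] * gain


def select_phase(phases, vehicles, q_green, q_red, weight, feasible):
    """Return the id of the phase with the highest score."""
    best = max(phases, key=lambda p: phase_score(p, vehicles, q_green, q_red,
                                                 weight, feasible))
    return best["id"]


phases = [{"id": 1, "green_lanes": {3, 11}}, {"id": 2, "green_lanes": {4, 12}}]
vehicles = [
    {"state": "a", "lane": 3, "length": 1.0},
    {"state": "b", "lane": 4, "length": 2.5},   # a longer vehicle, e.g. a bus
    {"state": "c", "lane": 12, "length": 1.0},
]
q_green = {"a": 1.0, "b": 1.0, "c": 1.0}
q_red = {"a": 0.0, "b": 0.0, "c": 0.0}
weight = {1: 1.0, 2: 1.0}

print(select_phase(phases, vehicles, q_green, q_red, weight, {1: 1, 2: 1}))  # 2
print(select_phase(phases, vehicles, q_green, q_red, weight, {1: 1, 2: 0}))  # 1
```

In the second call the exit lanes of phase 2 are marked congested, so its score drops to zero and phase 1 is chosen even though fewer vehicles benefit.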

(6) Each server in the system executes the selected phase, adjusts the road traffic lights accordingly, and returns to step (3).

Claims

[Claim 1] A road traffic signal light coordinated control method based on reinforcement learning, comprising monitoring devices provided at each intersection, each monitoring device being connected to a remote server via a network module, the control method being as follows:

(1) The remote server receives the video signal sent by the monitoring device and calculates the waiting time S of the vehicles on each road of the corresponding intersection, i.e. the time the vehicles spend stopped under a red light and under a green light;

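(2) Taking each combination of red and green lights corresponding to a lane passing mode at the intersection as a phase state a_i, the remote server, under each phase state a_i, obtains the road congestion condition by analyzing the waiting time obtained in step (1);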

(3) According to the current phase state a_i, the remote server obtains the feasibility c_{a_i}: when the traffic flow can pass, the road is clear and the feasibility c_{a_i} is 1; otherwise the road is congested and the feasibility c_{a_i} is 0;

(4) Using the waiting time S obtained in step (1) and the feasibility c_{a_i} obtained in step (3), the remote server analyzes and judges the driving situation under each phase state a_i of the intersection and, by recording and updating the driving data over time, calculates the optimal driving phase state a_i of the intersection by program software;

(5) According to the optimal driving phase state a_i, the red and green light combination of the intersection is adjusted to obtain the maximum traffic flow.

[Claim 2] The road traffic signal light coordinated control method based on reinforcement learning according to claim 1, characterized in that: the phase state a_i is the combination of red and green states of the road traffic signals across the lanes of each road; on a lane with a green light, vehicles are allowed to go straight through the intersection to the opposite lane, and right turns are also allowed; only when both the straight-ahead and the right-turn movements are passable is the feasibility c_{a_i} in step (3) equal to 1, otherwise the road is regarded as congested and the feasibility c_{a_i} is 0; on a lane with a red light, vehicles are stopped.

[Claim 3] The road traffic signal light coordinated control method based on reinforcement learning according to claim 1, characterized in that: the waiting time includes the time a vehicle on the lane spends stopped under a red light and the time it spends stopped under a green light.

[Claim 4] The road traffic signal light coordinated control method based on reinforcement learning according to claim 1, characterized in that: the weight value of the corresponding lane is set according to the traffic volume of main roads, secondary roads or bus lanes.

[Claim 5] The road traffic signal light coordinated control method based on reinforcement learning according to claim 1, characterized in that: the "program software analysis and calculation" in step (4) uses a kernel function; the kernel function compares the similarity between the current driving situation and the known driving situations previously stored in the database; taking into account the driving situation under the multiple phase states of the intersection, phase states that have not been executed for a long time and important phase states are selected preferentially; the selected phase state is the one whose execution maximizes the sum, over all waiting vehicles, of the difference between waiting under a red light and waiting under a green light; an important phase state is a phase state of a main road or a bus lane, and its priority can be implemented by setting the initial weight value of the corresponding lane.

[Claim 6] The road traffic signal light coordinated control method based on reinforcement learning according to claim 1, characterized in that: the network module is a wired Ethernet module or a wireless data transmission network module.