MIND T3_player MODULE 1

This is the set of tools and knowledge, based on artificial intelligence (AI), that the giants of the digital world, supported by web infrastructure, are using and developing in applications that generate large enterprises, money, or power; examples include Google, Facebook, Microsoft, governments, social hackers, etc.

“A mathematical agent is a metaphorical idealized actor, that is, an idealized actor in the source domain of a metaphor characterizing some aspect of mathematics”

The good part is that these model agents translate into computer programs that learn, by themselves, to execute highly complex tasks in the real world, such as driving a car or playing a brilliant game of Go.

A computer program that inhabits some dynamic environment, senses and acts autonomously in that environment, and, by using its own learning and memory resources, learns to realize the set of goals or tasks for which it was designed.
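
This definition maps naturally onto a sense-act-learn loop. Below is a minimal C++ sketch of that loop; the Environment and Agent types and their members are illustrative placeholders, not the structures used later in T3_player.

#include <cstdio>

// Toy environment: the goal is simply to reach state 10 (illustrative only).
struct Environment {
    int state;  bool done;
    Environment() : state(0), done(false) {}
    int sense() const { return state; }            // what the agent can observe
    double act(int step)                           // environment reacts and pays a reward
    { state += step; done = (state >= 10); return done ? 1.0 : 0.0; }
};

// Toy agent: remembers an estimated value for each of its two possible actions.
struct Agent {
    double value[2];
    Agent() { value[0] = value[1] = 0.0; }
    int  choose() const { return (value[1] >= value[0]) ? 1 : 0; }  // act autonomously
    void learn(int a, double reward)                                // use its memory resources
    { value[a] += 0.1 * (reward - value[a]); }
};

int main() {
    Environment env;
    Agent agent;
    while (!env.done) {                 // the agent inhabits the dynamic environment,
        int a = agent.choose();         // senses and acts in it,
        double r = env.act(a);
        agent.learn(a, r);              // and learns to realize the goal it was designed for
    }
    printf("goal reached\n");
    return 0;
}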

T3_player is a set of program modules that finally assemble into a tic-tac-toe virtual environment, where a clearly defined software agent, built from artificial neurons, learns by its own effort to play brilliantly, with human-level ingenuity and creativity. During its learning phase, the agent follows the mathematical principles embodied in the Bellman equation. During its operating phase, the agent behaves as an intelligent Markov process.

MIND T3-Player is a computer program capable of exploring and learning brilliant solutions by itself in a tic-tac-toe environment. This agent-oriented software solution easily reaches expert human playing capacity through a combination of:

  1. Artificial Neural Nets.
  2. Gradient Descent.
  3. Reinforcement Learning.

The Bellman equation is an applied mathematical concept that guarantees the maximum obtainable value in the control of a sequence of events occurring in a complex environment with an underlying logic and with rewards scattered across space-time. In this sense, a Bellman agent must always look into the future during its learning journey. This universal principle is currently applied in relevant areas such as self-driving cars, robotics, business management, education, computer systems, engineering, animation, etc.
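
In its usual discrete form this principle is captured by the Bellman optimality equation. The notation below is the standard textbook form, given here only as a reference point; the later modules describe how T3_player approximates it in practice:

V^{*}(s) \;=\; \max_{a}\Big[\, R(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s,a)\, V^{*}(s') \,\Big]

where s is the current state, a an available action, R(s,a) the immediate reward, P(s'|s,a) the probability of landing in the next state s', and \gamma (between 0 and 1) is the discount factor that forces the agent to look into the future.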

Artificial neural networks are complex networks that can learn intricate behaviors through examples or by themselves.

Learning is guaranteed by proven learning algorithms that in turn produce efficient computer models.
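
As a rough illustration of the building block involved, the code below implements a single artificial neuron with a sigmoid transfer function; this is a generic textbook formulation, not the code inside neural_lib_mmt.h.

#include <cmath>
#include <cstdio>

// Sigmoid transfer function: squashes any real input into the range (0,1).
double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

// One artificial neuron: weighted sum of the inputs passed through the transfer function.
double neuron(const double in[], const double w[], int n, double bias) {
    double sum = bias;
    for (int i = 0; i < n; i++) sum += w[i] * in[i];
    return sigmoid(sum);
}

int main() {
    double inputs[3]  = { 1.0, 0.0, 1.0 };   // e.g. a small encoded slice of the board
    double weights[3] = { 0.5, -0.3, 0.8 };  // learned connection strengths
    printf("activation = %f\n", neuron(inputs, weights, 3, 0.1));
    return 0;
}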

If we know the derivative of the transfer function of an artificial neural network, then Gradient Descent makes it possible to change the connection weights massively and in an orderly way, reducing the error (loss) of the underlying global network and providing along the way a most valuable processing ability: inference capacity, or the capacity to give good answers (outputs) to questions (inputs) never seen during the learning period.
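
A minimal sketch of that idea for one sigmoid neuron is shown below: the weight change is proportional to the error times the derivative of the transfer function (the classic delta rule). The learning rate and data are arbitrary, and the project's own library may organize the computation differently.

#include <cmath>
#include <cstdio>

double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

int main() {
    double in[2]  = { 1.0, 0.5 };   // one training example (input)
    double w[2]   = { 0.2, -0.4 };  // connection weights to be learned
    double target = 1.0;            // desired output for this example
    double rate   = 0.5;            // learning rate

    for (int epoch = 0; epoch < 100; epoch++) {
        double out   = sigmoid(w[0]*in[0] + w[1]*in[1]);
        double error = target - out;             // loss signal
        double slope = out * (1.0 - out);        // derivative of the sigmoid at this output
        for (int i = 0; i < 2; i++)
            w[i] += rate * error * slope * in[i];   // gradient-descent weight update
    }
    printf("output after training = %f\n", sigmoid(w[0]*in[0] + w[1]*in[1]));
    return 0;
}

After a few epochs the output moves toward the target; the same mechanism, repeated over many weights and many examples, is what gives the global network its inference capacity.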

“You make an inference when you use clues from the story to figure out something that the author doesn’t directly tell you”

Reinforcement learning (RL) is essentially the numerical, computer-based solution of the Bellman equation, which deals with the optimal control of differential-difference (time-lag) processes. If we write our computer program following its principles, we obtain a solid approximation to the optimal control of any given process.
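
The numerical core of such a solution is an iterative update that pushes each state value toward "immediate reward plus discounted value of the next state". The sketch below shows that backup on a toy five-state chain; it is a generic temporal-difference / value-iteration form, not necessarily the exact scheme used by T3_player.

#include <cstdio>

int main() {
    // Toy chain of 5 states: from state s the process moves to s+1;
    // a reward of 1.0 is paid only on entering the terminal state.
    const int N = 5;
    double V[N] = { 0, 0, 0, 0, 0 };     // value estimates, V[N-1] is terminal
    const double gamma = 0.9;            // discount factor (look into the future)
    const double alpha = 0.1;            // learning rate

    for (int sweep = 0; sweep < 1000; sweep++) {
        for (int s = 0; s < N - 1; s++) {
            double reward = (s + 1 == N - 1) ? 1.0 : 0.0;
            double backup = reward + gamma * V[s + 1];   // Bellman backup
            V[s] += alpha * (backup - V[s]);             // move the estimate toward it
        }
    }
    for (int s = 0; s < N; s++) printf("V[%d] = %f\n", s, V[s]);
    return 0;
}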

Instructions for easily installing Borland C++ are available at http://www2.hawaii.edu/~sdunan/study_guides/bcc.html

You can download the installer here 

 

 

 

Below is the main code of Module 1. The external libraries and all the code can be downloaded here:

Download

After compiling «T3_Player_module_1.cpp» with Borland C++, an .exe file is generated.
The program loads the trained weights on startup.
The «c» key loads the already trained weights.
The «s» key saves the weights of a new training run.

 

/*********************** MIND Research Group**********************************       
T3-Player  
Module_1.  Dynamic Tic-Tac-Toe playing environment
By Oscar Chang, Luis Zhinin, Rafael Valencia 
June 2020 

There exists a trained neural network watching the board.
The network detects by itself when a winning position appears.
The network represents a self-taught agent that learned to play by itself, with a look-ahead capacity.
In the next modules the construction of the agent will be explained in detail.

 /--------------------------------------------------------------------------*/
// GLOBALS
#define board_size     9   //   board size, 9 tiles  
long int cycles=0;    
int b_flag;
int stop;
char dummy[1];
float f_k_heat;
int f_winner;
int f_cycle_count;
float f_k_increment;
float f_threshold=0.7;  //  threshold of i_agent
int f_winner_count[board_size];
int f_fractal_count[board_size];
int flip[board_size];  
float femp; 
char board[board_size];
char board_temp[board_size];
int  moves_counter;
int end_game_flag;
int master_brake;
float sparse_input[27];    
int win_O_bell;
int win_X_bell;
int games_played;
int games_won_O;
int games_won_X;
float auto_seal_weight=-0.05;  //  racing neurons can win only once in each episode
int image_in_use;
int play_order[9];
int play_ok_flag_O;
int learn_input_pointer;
int image_pointer;
int p_flag;
int player_selector;
int bars_scale =50; int x_print, y_print=10;
int net_winner=0;
int winner_y=0;
int winner_net;
float story[9][27];     
int story_pointer;
int backpro_count;
float winner_record_O;
float winner_record_X;
char key, color;
int game_open_case;
int zone_pointer;
int max_moves=8;
int target_timer;
float peak_value;
int net_advice;
int target_pointer;
//----------------------------------------)))
//    end globals
//----------------------------------------)))
// Header names assumed (missing from the listing): standard Borland C++ headers for the calls used here
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <graphics.h>
#include <dos.h>
#include <math.h>
#include <string.h>
#include <time.h>
#include <iostream>

#include "neural_lib_mmt.h"
#include "f_neural_lib_mmt.h"
#include "plot_net_items.h"
#include "print_console.h"
#include "handle_net_items.h"
#include "check_game_win.h"


#include "load_wights.h"
#include "O_learns.h"

#include "f_fire_flop.h"
#include "f_print_items.h"

/******************* global variables *********************/
using namespace std;

//-------------------------------------
//    starts graphic mode            
//-------------------------------------    
void grafico(void)
{
    int    GraphDriver;  /* The Graphics device driver */
    int    GraphMode;    /* The Graphics mode value */
    int    ErrorCode;    /* Reports any graphics errors */
   
    initgraph(&GraphDriver, &GraphMode, "");

    ErrorCode = graphresult();            /* Read result of initialization*/
    if( ErrorCode != grOk )              /* Error occurred during init   */
    {
       printf(" Graphics System Error: %s", grapherrormsg( ErrorCode ) );
       exit( 1 );
    }
}
//----&&---------------------------------------------
void clean_board(void)
{ 
  int i;
  for (i=0; i<9;i++)   board[i]='-';     

}
//-------------------------------------  
void erase_screen(void)
{
   setcolor(LIGHTGRAY);  
   setfillstyle(SOLID_FILL,LIGHTGRAY);     
   bar(150,100,350,300) ;       
   floodfill(255,205,BLACK);    
}
//------------------------------------                                                 
void clean_game_story(void)
{ 
 //   story[move][neuron]
  int i,j;
            for(j=0;j<max_moves;j++)
               { 
                for(i=0;i<N_IN;i++)
                 {
                  story[j][i]=0; 
                 }
               }
}
//--------------------------------------------------------
void init_game(void)            //   ---------------->         START GAME
{
 int i; 
 win_O_bell=0; 
 win_X_bell=0;   
 clean_board();
 clean_game_story();   
 player_selector=0;       
 moves_counter=0;  
 fill_inputs();   
    
 fill_game_story();
 image_in_use=0;
 for(i=0;i<9;i++)play_order[i]=10; 
 print_graph_parameters();
}
//-----------------------------------------------------------
void end_game(void)             //       ---------------->       END GAME
{
 games_played++;   
 init_game();    
}
//-----------------------------------------------------------

void fill_board_noise(void)                                    
{
   int i,j;
   for (i=0; i<9;i++) 
   {
    j=random(4); 
    if(j==0) board[i]='-';
    if(j==1) board[i]='-';
    if(j==2) board[i]='O';       //  O
    if(j==3) board[i]='X';       // X
   }  
   for(i=0;i<N_OUT;i++) Target[i]=0.1;   
} 
//-----------------------------------------------------------
void explore_the_future(void)            
{
 int i,j;
    
  for(i=0;i<9;i++)                                   //    explore all the tiles
  {
    if(board[i]=='-')                                //    looking for empty tiles
         {
          board[i]='O';                              //  an O is placed in found empty tile to explore
          plot_game_graphics(); 
          check_game_winner();                       //  check for winning with the placed O 
          if(win_O_bell)                             //  if winning, board state has to be memorized
          {
              board[i]='-';                          //  the placed O is removed
          target_pointer=i;                      // target_pointer points toward the winning tile
                                                 //   the memorizing (learning) process begins
                for (j=0;j<N_OUT;j++)
                Target[j]=0.1;
                Target[i]=1.0;
                print_graph_parameters();
                plot_targets();
                delay(10);
                fill_inputs();
                feed_forward();
                flash(target_pointer);
//-------------------                                                      //noise balance begins
                for (j=0;j<9;j++)
                board_temp[j]=board[j];
                fill_board_noise();

                fill_inputs();
                feed_forward();   
                for (j=0;j<9;j++)
                board[j]=board_temp[j];                
                plot_targets();      
                
             }     
          delay(500);    
          board[i]='-';
          plot_game_graphics();
          
        }    
  }    
}
//-----------------------------------------------------------
void train(void)  //    Training phase                     
{
   while(1)
    {  
     player_selector=player_selector^1;   // players are alternated      
             
     if(player_selector)  {
                            O_plays();
                          }  
                    
     if(!player_selector) {
                            X_plays();
                          }                                   
     check_game_winner();  
     plot_game_graphics();
     search_winner_neuron();                        
     delay(500);          // delay of 500 milliseconds
     moves_counter++;      
     
     if (moves_counter>max_moves) end_game();       


     if(win_O_bell) end_game();
     if(win_X_bell) end_game();
         
     if(stop)break;
     if(kbhit()) break;    
    }
    
}    
//------------------------------------------------------
//===================================================================================================
void main(void)
{ 
 int i;  
 
    for(i=0;i<N_OUT;i++) Target[i]=0.0;
    clean_board();    
    clrscr();                   // clean screen
    grafico();                  // set graphic mode   
    cleardevice();              // clean windows screen
    //inicializar_pesos();
    setcolor(LIGHTGRAY);
    bar(0,0,1400,900);       //    clean working area
    //srand(10);
    init_game(); 
    plot_game_graphics();   
    plot_board_map();    
    CargarPesos();
    randomize();
    games_played=1;
    stop=0;
    b_flag=0;
    //q_flag=0;
    train();
    
    do {
       //play_flag=0; 
       key=getch();
       switch (key) {
           case '1':         {
  
                             }
           break;                    
   
           case 'b': case 'B':  {
                                   b_flag=b_flag^1;
                                   train();
                                }
           break;  
                                
                         
            case ' ':             {                    
                                   stop=stop^1;
                                   train();
                                  }                                     
            break; 
            case 'R': case 'r':  {
                                   init_weights();   
                                   clean_hidden_weights(); 
                                   backpro_count=0; 
                                   train();
                                 }
            break;                     
                                
                                
            case 's': case 'S': {
                                 // Save weights
                                 train(); 
                                }
	       break;                                                  
           case 'c': case 'C': {// Load weights
                                CargarPesos();
                                train(); 
                               }
	       break;    
          
           case 'p': case 'P': {
                                stop=1;  
                                fill_inputs(); 
                                feed_forward();
                                search_winner_neuron();
                                macro_print();         
                                if(win_O_bell) end_game();
                                train();                              
                               }
	       break;    
           case 'q': case 'Q': {
                                                         
                               }
	       break;  
                 
                               
         }
   } while ((key!='x')&&(key!='X'));

   closegraph();
   clrscr();
}

//---------------------------------------