A major challenge in gene regulatory networks (GRNs) of biological systems is to discover which interventions should be applied, and when, to shift them to healthy phenotypes. A set of gene activity profiles, called a basin of attraction (BOA), takes the network to a specific phenotype; therefore, a healthy BOA leads the GRN to a healthy phenotype. However, without complete observability of the genes, it is not possible to identify whether the current BOA is healthy. In this article we investigate external interventions in a partially observable GRN that aim to bring it to healthy BOAs. We propose a new batch reinforcement learning (BRL) method, called mSFQI, to define intervention strategies based on the probabilities of the gene activity profiles being in healthy BOAs, which are calculated from a set of previously observed experiences. BRL uses approximation functions and repeated applications of previous experiences to accelerate learning. Results demonstrate that our proposal can quickly shift a partially observable GRN to healthy BOAs while reducing the number of interventions. In addition, when observability is poor, mSFQI produces better results when the probabilities for a greater number of previous observations are available.