This study proposes a hybrid deep learning model called attention-based CNN-BiLSTM (ACBiL) for dialect identification in Javanese text. Our ACBiL model comprises an input layer, a convolution layer, a max pooling layer, a batch normalization layer, a bidirectional LSTM layer, an attention layer, a fully connected layer, and a softmax layer. In the attention layer, we applied hierarchical attention networks with word- and sentence-level attention to observe the importance of each part of the content. For comparison, we also experimented with several other classical machine learning and deep learning approaches. Among the classical machine learning models, Linear Regression with unigram features achieved the best performance, with an average accuracy of 0.9647. In addition, we observed that the deep learning models significantly outperformed the traditional machine learning models. Our experiments showed that the ACBiL architecture achieved the best performance among the deep learning methods, with an accuracy of 0.9944.