Purpose: This research develops an architecture based on the Convolutional Neural Network (CNN) algorithm to detect facial expressions during video conferences. It addresses the problem of understanding participants' emotions and expressions during online video conferencing sessions by creating a system that analyzes facial expressions in images and determines the corresponding emotions.

Methods/Study design/approach: Data were collected by capturing facial expression images from 10 students using a webcam. Preprocessing techniques, such as cropping, grayscale conversion, and data augmentation, were applied to ensure data variation. The CNN model was trained on the processed data and evaluated on test data (a held-out subset of the dataset), new external data, and a video conference recording.

Result/Findings: The CNN model achieved a high training accuracy of 97.5% using an image size of 128x128 and 2000 epochs. The model architecture consists of 2 Conv2D layers, 3 BatchNormalization layers, 2 MaxPooling layers, 2 Dropout layers, 1 Flatten layer, 1 Dense layer, and 1 output layer. When tested on facial expression data, the model achieved 97.5% accuracy on the training data and 93.33% accuracy on the test data. The model was also able to detect the facial expressions of participants in the video conference recording.

Novelty/Originality/Value: The novelty of this research lies in developing a CNN-based system that detects facial expressions in video conferences by analyzing facial images. This approach addresses the challenge of understanding participants' emotions and expressions during online video conferencing sessions, which can contribute to better communication and interaction among participants.
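The layer inventory described above can be sketched in Keras. This is a minimal illustration, not the authors' exact model: the layer ordering, filter counts, kernel sizes, dropout rates, dense-layer width, and the number of expression classes (`NUM_CLASSES`) are all assumptions, since the abstract states only the layer counts and the 128x128 input size.

```python
# Hypothetical Keras sketch of the described architecture:
# 2 Conv2D, 3 BatchNormalization, 2 MaxPooling, 2 Dropout,
# 1 Flatten, 1 Dense, and 1 output layer, on 128x128 grayscale input.
from tensorflow.keras import layers, models

NUM_CLASSES = 6  # assumed number of expression categories (not stated in the abstract)

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),              # 128x128 grayscale image
    layers.Conv2D(32, (3, 3), activation="relu"),   # Conv2D layer 1
    layers.BatchNormalization(),                    # BatchNormalization 1
    layers.MaxPooling2D((2, 2)),                    # MaxPooling 1
    layers.Conv2D(64, (3, 3), activation="relu"),   # Conv2D layer 2
    layers.BatchNormalization(),                    # BatchNormalization 2
    layers.MaxPooling2D((2, 2)),                    # MaxPooling 2
    layers.Dropout(0.25),                           # Dropout 1
    layers.Flatten(),                               # Flatten layer
    layers.Dense(128, activation="relu"),           # Dense layer
    layers.BatchNormalization(),                    # BatchNormalization 3
    layers.Dropout(0.5),                            # Dropout 2
    layers.Dense(NUM_CLASSES, activation="softmax") # output layer
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Placing BatchNormalization after each Conv2D and after the Dense layer is one common arrangement that matches the stated count of three; the actual placement in the paper's model may differ.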