The increasing complexity of forensic genetic samples has driven the development and implementation of probabilistic genotyping software (PGS) to assist experts in quantifying the weight of evidence. A common challenge lies in evaluating whether a person of interest (PoI) is a contributor to a DNA mixture, especially in samples with low quantity and quality, stochastic effects (allele drop-in, dropout, and/or heterozygote peak imbalance), shared alleles, and stutter peaks. PGS can be based on either qualitative models (allele presence/absence) or quantitative models (which also use allele peak heights) to compute the likelihood ratio (LR), which compares the probability of observing the evidence under two hypotheses: "H1 = The PoI is a contributor to the evidence" and "H2 = The PoI is neither a contributor nor genetically related to any contributor". Key parameters included in these calculations can influence the outcomes depending on how they are set: population parameters (allele frequency distributions and coancestry coefficient), analytical factors (drop-in, dropout, and analytical threshold), and the presence of stutter. This collaborative exercise, organized by the Spanish and Portuguese-speaking Working Group of the International Society for Forensic Genetics, aimed to assess the current state of knowledge, use, and implementation of PGS among forensic laboratories. The goal was to evaluate how non-binary informatics tools are applied in practice and to understand the methodologies used to statistically interpret complex DNA mixtures. Participants were asked to analyze 30 pairs of samples, each consisting of a DNA mixture (with varying numbers of contributors, mixture ratios, and degradation levels) and a corresponding reference profile, selected from the PROVEDIt database. 
Laboratories employed different tools and approaches to quantify the weight of the evidence, including decisions regarding the number of contributors (NoC) and population-, laboratory-, and case-specific parameters, such as coancestry coefficient, minimum allele frequency, dropout frequency, drop-in (frequency and modeling), analytical threshold, and the modeling of stutter peaks and degradation. Even though all laboratories received the same genotypic and frequency data, methodological differences led to different LR results, particularly for more complex samples with low-template and degraded DNA. The greatest differences arose from the interplay between analytical thresholds and NoC estimation, with discrepancies amplified when alleles from minor contributors overlapped with expected stutter positions. This exercise highlights the importance of expert training and underscores the need for a comprehensive understanding of the statistical models underlying PGS. Ensuring accurate and consistent interpretation of complex DNA evidence requires not only technical proficiency but also an integrated approach to parameter selection and genotypic data evaluation.
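
As a rough illustration of how a single parameter choice can shift the LR, the sketch below computes a single-locus, single-source LR with the widely used coancestry-corrected (Balding-Nichols) match probability. It is a deliberate simplification and not the mixture model of any PGS used in the exercise: it ignores dropout, drop-in, stutter, and peak heights. The function names and the allele frequencies (0.10 and 0.20) are hypothetical values chosen for illustration only.

```python
def match_probability(p_a, p_b, theta):
    """Coancestry-corrected probability that an unrelated person from the
    same subpopulation has the PoI's genotype at one locus.

    p_a, p_b: allele frequencies (pass p_b=None for a homozygous genotype).
    theta:    coancestry coefficient (theta = 0 gives Hardy-Weinberg values).
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:
        # Homozygous genotype a,a
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    # Heterozygous genotype a,b
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


def single_source_lr(p_a, p_b, theta):
    """LR for a clean single-source match: Pr(E | H1) = 1, Pr(E | H2) = match probability."""
    return 1.0 / match_probability(p_a, p_b, theta)


if __name__ == "__main__":
    # Hypothetical allele frequencies; only theta changes between rows.
    for theta in (0.0, 0.01, 0.03):
        lr = single_source_lr(0.10, 0.20, theta)
        print(f"theta={theta:.2f}  LR={lr:.2f}")
```

Even in this idealized setting the LR decreases as the coancestry coefficient increases, so two laboratories given identical genotypic and frequency data can report different values; in real mixture models the effect compounds with the other parameters listed above.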