Non-biomedical word replacement algorithm: the process for replacing non-biomedical words in a corpus using WordNet. Enock Niyonkuru; Mauricio Soto Gomez; Elena Casarighi; et al.
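The replacement idea in this caption can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the synonym groups stand in for WordNet synsets, the `protected` set stands in for the biomedical vocabulary, and the rule "map each synonym to the most frequent member of its group" is an assumption made for the example. The names `build_replacement_map` and `replace_tokens` are hypothetical.

```python
from collections import Counter


def build_replacement_map(tokens, synonym_groups, protected):
    """Map each non-protected word to the most frequent synonym in its group.

    Assumption: synonym_groups is a hand-written stand-in for WordNet synsets,
    and `protected` holds biomedical terms that must never be replaced.
    """
    counts = Counter(tokens)
    mapping = {}
    for group in synonym_groups:
        # Only consider words that occur in the corpus and are not protected.
        present = [w for w in group if w not in protected and counts[w] > 0]
        if len(present) < 2:
            continue  # nothing to merge
        # Most frequent word wins; ties are broken alphabetically.
        target = min(present, key=lambda w: (-counts[w], w))
        for word in present:
            if word != target:
                mapping[word] = target
    return mapping


def replace_tokens(tokens, mapping):
    """Apply the replacement map; unmapped tokens pass through unchanged."""
    return [mapping.get(t, t) for t in tokens]


if __name__ == "__main__":
    text = "the large tumor was big and the large mass was huge"
    tokens = text.split()
    groups = [{"big", "large", "huge"}]   # toy synonym group (assumption)
    protected = {"tumor", "mass"}         # toy biomedical terms (assumption)
    mapping = build_replacement_map(tokens, groups, protected)
    print(" ".join(replace_tokens(tokens, mapping)))
```

Collapsing synonyms onto one surface form reduces the number of unique tokens, which is the quantity the captions report as unique replaced concepts.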
Comparison of mean interconcept distance for embedding with WordNet synonym replacement (WN) and without (PM). The initial number of unique concepts in the total corpus was 3,018,918. The table summarizes results for different replacement thresholds and categories of concept/gene sets (M, B, K, G, P). Columns: replacement threshold; replaced: unique replaced concepts; Category: M = MeSH, B = Biocarta, K = KEGG, G = GP(bp), P = PID; # sets: number of concept/gene sets in the category; # Concepts: number of concept vectors in the category; WN better: the count and percentage of concept/gene sets for which the mean interconcept distance was smaller for WN than for PM (“winners” are shown in bold); PM better: analogous to “WN better” but for PM.
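The "WN better" and "PM better" columns rest on comparing one number per concept/gene set under each embedding. A minimal sketch of that comparison follows, under two assumptions that the caption does not state: that the distance is cosine distance, and that the mean is taken over all unordered pairs of concept vectors in the set. The function names are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_interconcept_distance(vectors):
    """Mean pairwise distance over all unordered pairs of concept vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two concept vectors")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def better_embedding(wn_vectors, pm_vectors):
    """Return 'WN', 'PM', or 'tie' depending on which mean distance is smaller."""
    wn = mean_interconcept_distance(wn_vectors)
    pm = mean_interconcept_distance(pm_vectors)
    if wn < pm:
        return "WN"
    if pm < wn:
        return "PM"
    return "tie"


if __name__ == "__main__":
    # Toy 2-D vectors for one concept set under each embedding (made up).
    wn_set = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
    pm_set = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(better_embedding(wn_set, pm_set))
```

Counting how often each label is returned across all sets in a category would give the per-category tallies the table describes.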
Comparative analysis of WordNet replacement impact on data distribution.
Comparative analysis of WN and PM methodologies: Figure (a) displays the bar chart comparing WN and PM across five distinct concept sets (Methods), highlighting the number of concept sets where the cluster mean distance is significantly lower, indicative of superior embeddings.
Comparison of window size for embedding with WordNet synonym replacement (WN) and without (PM). While Table 1 compared the effects of different values of the replacement threshold using a window size of 10, this table shows the results for three different window sizes at a fixed threshold. Abbreviations are the same as for Table 1.
Illustration of the text transformation process before and after synonym replacement: The process begins with a sample initial text segment (‘Sample text before synonym replacement’), followed by a word frequency count (‘Counter’).
Text Transformation Pipeline: An example of the multi-stage text transformation pipeline applied to a sample abstract (PMID: 30609739).
Schematic of the approach: This schematic illustrates the entire workflow of the project.