SIMPLEX-CONSTRAINED SEMANTIC EMBEDDING MODELS FOR INFORMATION RETRIEVAL ON THE BEIR/SCIFACT BENCHMARK
Keywords:
Information retrieval, Sentence-BERT, simplex embeddings, semantic search, BEIR benchmark, BM25, neural retrieval, manifold learning, probabilistic geometry, semantic embeddings.
Abstract
This paper investigates geometric simplex-constrained representations for dense semantic information retrieval with neural sentence embeddings. It introduces a mathematical framework for transforming Sentence-BERT embedding vectors into simplex-normalized semantic spaces and truncated simplex search regions. The approach combines probabilistic geometric projection with semantic embedding retrieval to analyze how constrained embedding geometry influences retrieval quality. Using the BEIR/SciFact benchmark dataset, the study compares four retrieval approaches: BM25 lexical retrieval, standard Sentence-BERT semantic retrieval, simplex-normalized embeddings, and truncated simplex-constrained retrieval. Evaluation uses Precision@K, Recall@K, Mean Average Precision (MAP), statistical significance testing, and t-SNE visualization. Standard Sentence-BERT achieves the highest retrieval effectiveness with MAP = 0.6757, while the simplex-constrained retrieval methods achieve MAP = 0.5581. The performance degradation introduced by simplex projection is statistically significant (p = 0.000193 < 0.05). The results indicate that simplex geometry preserves semantic neighborhood structures but introduces measurable distortion that reduces retrieval accuracy. The proposed framework provides a theoretical basis for studying constrained semantic embedding spaces and opens new directions for probabilistic geometric retrieval systems, neural semantic indexing, and constrained manifold learning in information retrieval.
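The ranking metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's evaluation code: the function names, the toy document IDs, and the choice to normalize average precision by the total number of relevant documents (with no rank cutoff) are assumptions made for illustration, and benchmark toolkits may apply a cutoff or other conventions.

```python
from typing import Dict, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant or k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values taken at the rank of each relevant hit.

    Assumption: normalized by the total number of relevant documents,
    so relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, Tuple[Sequence[str], Set[str]]]
) -> float:
    """Average of per-query average precision over all queries."""
    if not runs:
        return 0.0
    scores = [average_precision(ranked, rel) for ranked, rel in runs.values()]
    return sum(scores) / len(scores)


# Toy example with hypothetical query and document IDs.
toy_runs = {
    "q1": (["d3", "d1", "d7", "d2"], {"d1", "d2"}),
    "q2": (["a", "b"], {"a"}),
}
map_score = mean_average_precision(toy_runs)
```

In this toy run, the first query places its two relevant documents at ranks 2 and 4, and the second query places its single relevant document at rank 1, so MAP is the mean of the two per-query average precision values.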
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.











