Scholar Articles
Computer Science
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement ...
Personalized Wireless Federated Learning for Large Language Models
Large Language Models (LLMs) have revolutionized natural language processing tasks. However, their deployment in wireless networks still faces challenge...
Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction
Document-level Relation Triplet Extraction (DocRTE) is a fundamental task in information systems that aims to simultaneously extract entities with sema...
3D Vision-Language Gaussian Splatting
Recent advancements in 3D reconstruction methods and vision-language models have propelled the development of multi-modal 3D scene understanding, which...
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Image editing has advanced significantly with the introduction of text-conditioned diffusion models. Despite this progress, seamlessly adding objects to...
Raidar: geneRative AI Detection viA Rewriting
We find that large language models (LLMs) are more likely to modify human-written text than AI-generated text when tasked with rewriting. This tendency ...
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Large Language Models (LLMs) are rapidly surpassing human knowledge in many domains. While improving these models traditionally relies on costly human d...
latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
We present latentSplat, a method to predict semantic Gaussians in a 3D latent space that can be splatted and decoded by a light-weight generative 2D arc...
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models
Generative Large Language Models (LLMs) have made significant strides across various tasks, but they remain vulnerable to backdoor attacks, where speci...
Do language models plan ahead for future tokens?
Do transformers "think ahead" during inference at a given position? It is known transformers prepare information in the hidden states of the forward pa...
Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation
Comprehensive clinical documentation is crucial for effective healthcare delivery, yet it poses a significant burden on healthcare professionals, leadin...
Self-Discover: Large Language Models Self-Compose Reasoning Structures
We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems ...
HyperFast: Instant Classification for Tabular Data
Training deep learning models and performing hyperparameter tuning can be computationally demanding and time-consuming. Meanwhile, traditional machine l...
AgentReview: Exploring Peer Review Dynamics with LLM Agents
Peer review is fundamental to the integrity and advancement of scientific publication. Traditional methods of peer review analyses often rely on explora...
OpenDataLab: Empowering General Artificial Intelligence with Open Datasets
The advancement of artificial intelligence (AI) hinges on the quality and accessibility of data, yet the current fragmentation and variability of datas...
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments
Evaluating large language models (LLM) in clinical scenarios is crucial to assessing their potential clinical utility. Existing benchmarks rely heavily...
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability. Beyond holistic image understandi...
Reinforcement Learning for Collision-free Flight Exploiting Deep Collision Encoding
This work contributes a novel deep navigation policy that enables collision-free flight of aerial robots based on a modular approach exploiting deep col...
Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations
The robustness of recent Large Language Models (LLMs) has become increasingly crucial as their applicability expands across various domains and real-wo...
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
The math abilities of large language models can represent their abstract reasoning ability. In this paper, we introduce and open-source our math reasoni...
Understanding Robustness of Visual State Space Models for Image Classification
Visual State Space Model (VMamba) has recently emerged as a promising architecture, exhibiting remarkable performance in various computer vision tasks. ...
Ethical and social risks of harm from Language Models
This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innov...
Audio Anti-Spoofing Detection: A Survey
The availability of smart devices leads to an exponential increase in multimedia content. However, the rapid advancements in deep learning have given ri...
WPO: Enhancing RLHF with Weighted Preference Optimization
Reinforcement learning from human feedback (RLHF) is a promising solution to align large language models (LLMs) more closely with human values. Off-pol...
T3: Transparent Tracking Triggering for Fine-grained Overlap of Compute Collectives
Large Language Models increasingly rely on distributed techniques for their training and inference. These techniques require communication across devic...
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
While many have shown how Large Language Models (LLMs) can be applied to a diverse set of tasks, the critical issues of data contamination and memorizat...
Multi-perspective Improvement of Knowledge Graph Completion with Large Language Models
Knowledge graph completion (KGC) is a widely used method to tackle incompleteness in knowledge graphs (KGs) by making predictions for missing links. Des...
Understanding deep learning requires rethinking generalization
Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance....
ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
KV cache stores key and value states from previous tokens to avoid re-computation, yet it demands substantial storage space, especially for long sequenc...
CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs
Timely, personalized feedback is essential for students learning programming. LLM-powered tools like ChatGPT offer instant support, but reveal direct a...
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer
In this report, we present RT-DETRv2, an improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 builds upon the previous state-of-the-art real-ti...
Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?
Neural Radiance Field (NeRF) has achieved superior performance for novel view synthesis by modeling the scene with a Multi-Layer Perceptron (MLP) and a...
Gemini: A Family of Highly Capable Multimodal Models
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understandin...
Large Language Model with Graph Convolution for Recommendation
In recent years, efforts have been made to use text information for better user profiling and item characterization in recommendations. However, text in...
WildGaussians: 3D Gaussian Splatting in the Wild
While the field of 3D scene reconstruction is dominated by NeRFs due to their photorealistic quality, 3D Gaussian Splatting (3DGS) has recently emerged...
Language Models for Code Completion: A Practical Evaluation
Transformer-based language models for automatic code completion have shown great promise so far, yet the evaluation of these models rarely uses real da...
Transcriptomics-guided Slide Representation Learning in Computational Pathology
Self-supervised learning (SSL) has been successful in building patch embeddings of small histology images (e.g., 224x224 pixels), but scaling these mode...
AI and personalized learning: bridging the gap with modern educational goals
Personalized learning (PL) aspires to provide an alternative to the one-size-fits-all approach in education. Technology-based PL solutions have shown no...
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
Neural machine translation is a relatively new approach to statistical machine translation based purely on neural networks. The neural machine translati...
VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections
Graph transformer has been proven as an effective graph learning method for its adoption of attention mechanism that is capable of capturing expressive...
How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation
Conversational Recommender System (CRS) interacts with users through natural language to understand their preferences and provide personalized recommend...
Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
Prompt ensembling of Large Language Model (LLM) generated category-specific prompts has emerged as an effective method to enhance zero-shot recognition...
Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages
The development of Large Language Models (LLMs) relies on extensive text corpora, which are often unevenly distributed across languages. This imbalance...
Visibility into AI Agents
Increased delegation of commercial, scientific, governmental, and personal activities to AI agents – systems capable of pursuing complex goals with limi...
Compression Represents Intelligence Linearly
There is a belief that learning to compress well will lead to intelligence. Recently, language modeling has been shown to be equivalent to compression,...
Dual Operating Modes of In-Context Learning
In-context learning (ICL) exhibits dual operating modes: task learning, i.e., acquiring a new skill from in-context samples, and task retrieval, i.e., l...
3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually con...
OPEN TEACH: A Versatile Teleoperation System for Robotic Manipulation
Open-sourced, user-friendly tools form the bedrock of scientific advancement across disciplines. The widespread adoption of data-driven learning has le...
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Large language models (LLMs) achieve remarkable performance in natural language understanding but require substantial computation and memory resources. ...
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
Direct preference optimization (DPO) has shown to be an effective method for large language model (LLM) alignment. Recent works have attempted to apply...
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences
Recent advancements in foundation models (FMs) have unlocked new prospects in autonomous driving, yet the experimental settings of these studies are pre...
Depth-aware Test-Time Training for Zero-shot Video Object Segmentation
Zero-shot Video Object Segmentation (ZSVOS) aims at segmenting the primary moving object without any human annotations. Mainstream solutions mainly foc...
Uncertainty Quantification on Clinical Trial Outcome Prediction
The importance of uncertainty quantification is increasingly recognized in the diverse field of machine learning. Accurately assessing model prediction...
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Representing speech and audio signals in discrete units has become a compelling alternative to traditional high-dimensional feature vectors. Numerous st...
A Survey On Text-to-3D Contents Generation In The Wild
3D content creation plays a vital role in various applications, such as gaming, robotics simulation, and virtual reality. However, the process is labor-...
MileBench: Benchmarking MLLMs in Long Context
Despite the advancements and impressive performance of Multimodal Large Language Models (MLLMs) on benchmarks, their effectiveness in real-world, long-c...
Do Membership Inference Attacks Work on Large Language Models?
Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data. Despite extensive r...
COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning
Remarkable progress on English instruction tuning has facilitated the efficacy and reliability of large language models (LLMs). However, there remains a...
Generative Pretrained Hierarchical Transformer for Time Series Forecasting
Recent efforts have been dedicated to enhancing time series forecasting accuracy by introducing advanced network architectures and self-supervised pretr...
How to use and interpret activation patching
Activation patching is a popular mechanistic interpretability technique, but has many subtleties regarding how it is applied and how one may interpret ...
To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering
Medical open-domain question answering demands substantial access to specialized knowledge. Recent efforts have sought to decouple knowledge from model ...
Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities
Multimodal sentiment analysis (MSA) aims to understand human sentiment through multimodal data. Most MSA efforts are based on the assumption of modality...
Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales
Although social media platforms are a prominent arena for users to engage in interpersonal discussions and express opinions, the facade and anonymity of...
Transformers, parallel computation, and logarithmic depth
We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Mass...
Datasheet for the Pile
This datasheet describes the Pile, an 825 GiB dataset of human-authored text compiled by EleutherAI for use in large-scale language modeling. The Pile i...
Benchmarking Vision Language Models for Cultural Understanding
Foundation models and vision-language pre-training have notably advanced Vision Language Models (VLMs), enabling multimodal processing of visual and lin...
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achi...
STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
Recent progress in pre-trained diffusion models and 3D generation has spurred interest in 4D content creation. However, achieving high-fidelity 4D gene...
UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler
Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks in 3D perception and modeling. However, the remarkable accuracy...
A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions
The recent progression of Large Language Models (LLMs) has witnessed great success in the fields of data-centric applications. LLMs trained on massive t...
Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition
The development of multimodal models has significantly advanced multimodal sentiment analysis and emotion recognition. However, in real-world applicatio...
KAN 2.0: Kolmogorov-Arnold Networks Meet Science
A major challenge of AI + Science lies in their inherent incompatibility: today's AI is primarily based on connectionism, while science depends on symbo...
A Survey on Hardware Accelerators for Large Language Models
Large Language Models (LLMs) have emerged as powerful tools for natural language processing tasks, revolutionizing the field with their ability to under...
A Comprehensive Survey on Kolmogorov Arnold Networks (KAN)
Through this comprehensive survey of Kolmogorov-Arnold Networks (KAN), we have gained a thorough understanding of its theoretical foundation, architectu...
Flow Matching Imitation Learning for Multi-Support Manipulation
Humanoid robots could benefit from using their upper bodies for support contacts, enhancing their workspace, stability, and ability to perform contact-r...
CRAG – Comprehensive RAG Benchmark
Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowle...
Unmasking and Quantifying Racial Bias of Large Language Models in Medical Report Generation
Large language models like GPT-3.5-turbo and GPT-4 hold promise for healthcare professionals, but they may inadvertently inherit biases during their tra...
The Unreasonable Effectiveness of Eccentric Automatic Prompts
Large Language Models (LLMs) have demonstrated remarkable problem-solving and basic mathematics abilities. However, their efficacy is highly contingent...
InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior
3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus o...
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
The remarkable success of Large Language Models (LLMs) has ushered natural language processing (NLP) research into a new era. Despite their diverse capa...
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
Human feedback is central to the alignment of Large Language Models (LLMs). However, open questions remain about methods (how), domains (where), people...
OpenTab: Advancing Large Language Models as Open-domain Table Reasoners
Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge ...
Towards Explainable, Safe Autonomous Driving with Language Embeddings for Novelty Identification and Active Learning: Framework and Experimental Analysis with Real-World Data Sets
This research explores the integration of language embeddings for active learning in autonomous driving datasets, with a focus on novelty detection. Nov...
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite its s...
JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models
Mathematical reasoning is an important capability of large language models (LLMs) for real-world applications. To enhance this capability, existing work...
Probing the Creativity of Large Language Models: Can models produce divergent semantic association?
Large language models possess remarkable capacity for processing language, but it remains unclear whether these models can further generate creative con...
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual description...
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
In recent years, instruction-tuned Large Multimodal Models (LMMs) have been successful at several tasks, including image captioning and visual question...
PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning
Vehicle motion planning is an essential component of autonomous driving technology. Current rule-based vehicle motion planning methods perform satisfact...
AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based on Meta Learning
Large-scale pretraining followed by task-specific finetuning has achieved great success in various NLP tasks. Since finetuning all parameters of large p...
Spiral of Silence: How is Large Language Model Killing Information Retrieval? – A Case Study on Open Domain Question Answering
The practice of Retrieval-Augmented Generation (RAG), which integrates Large Language Models (LLMs) with retrieval systems, has become increasingly prev...
NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data
Large Language Models (LLMs) have shown impressive abilities in data annotation, opening the way for new approaches to solve classic NLP problems. In th...
CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing
Domain generalization (DG) based Face Anti-Spoofing (FAS) aims to improve the model's performance on unseen domains. Existing methods either rely on do...
A Tale of Tails: Model Collapse as a Change of Scaling Laws
As AI model size grows, neural scaling laws have become a crucial tool to predict the improvements of large models when increasing capacity and the siz...
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
Vision Language Models (VLMs) demonstrate remarkable proficiency in addressing a wide array of visual questions, which requires strong perception and re...
Enhancing Large Language Models for Text-to-Testcase Generation
Context: Test-driven development (TDD) is a widely employed software development practice that involves developing test cases based on requirements prio...
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Instruction-tuned Large Language Models (LLMs) have recently showcased remarkable advancements in their ability to generate fitting responses to natural...
Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks
This study investigates the loss of generalization ability in neural networks, revisiting warm-starting experiments from Ash & Adams. Our empirical an...
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
We introduce Generalized Instruction Tuning (called GLAN), a general and scalable method for instruction tuning of Large Language Models (LLMs). Unlike...
Diffusion Models, Image Super-Resolution And Everything: A Survey
Diffusion Models (DMs) have disrupted the image Super-Resolution (SR) field and further closed the gap between image quality and human perceptual prefer...
Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
Large language models (LLMs) are increasingly being adopted in a wide range of real-world applications. Despite their impressive performance, recent stu...
Entropy is not Enough for Test-Time Adaptation: From the Perspective of Disentangled Factors
Test-time adaptation (TTA) fine-tunes pre-trained deep neural networks for unseen test data. The primary challenge of TTA is limited access to the enti...
Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives
The emergence of Generative Artificial Intelligence (AI) and Large Language Models (LLMs) has marked a new era of Natural Language Processing (NLP), int...
Thinking Tokens for Language Modeling
How much is 56 times 37? Language models often make mistakes in these types of difficult calculations. This is usually explained by their inability to p...
FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba
Multimodal image fusion aims to integrate information from different imaging techniques to produce a comprehensive, detail-rich single image for downst...
Low-Rank Few-Shot Adaptation of Vision-Language Models
Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further pushed their generalization capabilities, at the expense of jus...
UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence
Garment manipulation (e.g., unfolding, folding and hanging clothes) is essential for future robots to accomplish home-assistant tasks, while highly chal...
Croissant: A Metadata Format for ML-Ready Datasets
Data is a critical resource for machine learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata f...
Curriculum reinforcement learning for quantum architecture search under hardware errors
The key challenge in the noisy intermediate-scale quantum era is finding useful circuits compatible with current device limitations. Variational quantu...
Applications of Deep Neural Networks with Keras
Deep learning is a group of exciting new technologies for neural networks. Through a combination of advanced training techniques and neural network arch...
A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends
With the significant development of large models in recent years, Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across...
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformat...
M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection
The advent of Large Language Models (LLMs) has brought an unprecedented surge in machine-generated text (MGT) across diverse channels. This raises legi...
Versatile Behavior Diffusion for Generalized Traffic Agent Simulation
Existing traffic simulation models often fail to capture the complexities of real-world scenarios, limiting the effective evaluation of autonomous driv...
COCONut: Modernizing COCO Segmentation
In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks....
Exploring the Potential of Large Language Models in Self-adaptive Systems
Large Language Models (LLMs), with their abilities in knowledge acquisition and reasoning, can potentially enhance the various aspects of Self-adaptive...
Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing
In singing voice synthesis (SVS), generating singing voices from musical scores faces challenges due to limited data availability. This study proposes ...
Decentralized Multi-Robot Navigation for Autonomous Surface Vehicles with Distributional Reinforcement Learning
Collision avoidance algorithms for Autonomous Surface Vehicles (ASV) that follow the Convention on the International Regulations for Preventing Collisio...
Automating Research Synthesis with Domain-Specific Large Language Model Fine-Tuning
This research pioneers the use of fine-tuned Large Language Models (LLMs) to automate Systematic Literature Reviews (SLRs), presenting a significant an...
Dynamic Prompt Optimizing for Text-to-Image Generation
Text-to-image generative models, specifically those based on diffusion models like Imagen and Stable Diffusion, have made substantial advancements. Rec...
What If We Recaption Billions of Web Images with LLaMA-3?
Web-crawled image-text pairs are inherently noisy. Prior studies demonstrate that semantically aligning and enriching textual descriptions of these pai...
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains fro...
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
We introduce Bonito, an open-source model for conditional task generation that converts unannotated text into task-specific training datasets for instru...
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
Generalist web agents have demonstrated remarkable potential in autonomously completing a wide range of tasks on real websites, significantly boosting ...
Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM
LLMs have become the go-to choice for code generation tasks, with an exponential increase in the training, development, and usage of LLMs specifically f...
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
We propose TRAM, a two-stage method to reconstruct a human's global trajectory and motion from in-the-wild videos. TRAM robustifies SLAM to recover the ...
A Survey on Kolmogorov-Arnold Network
This systematic review explores the theoretical foundations, evolution, applications, and future potential of Kolmogorov-Arnold Networks (KAN), a neural...
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Instruction-tuned Large Language Models (LLMs) show impressive results in numerous practical applications, but they lack essential safety features that...
DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation
Imitation learning from human hand motion data presents a promising avenue for imbuing robots with human-like dexterity in real-world manipulation task...
Kolmogorov-Arnold Networks are Radial Basis Function Networks
This short paper is a fast proof-of-concept that the 3-order B-splines used in Kolmogorov-Arnold Networks (KANs) can be well approximated by Gaussian ra...
Spectral Networks and Locally Connected Networks on Graphs
Convolutional Neural Networks are extremely efficient architectures in image and audio recognition tasks, thanks to their ability to exploit the local t...
DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation
This paper presents a novel method for exerting fine-grained lighting control during text-driven diffusion-based image generation. While existing diffu...
Personalized Language Modeling from Personalized Human Feedback
Personalized large language models (LLMs) are designed to tailor responses to individual user preferences. While Reinforcement Learning from Human Feed...
Research on Autonomous Robots Navigation based on Reinforcement Learning
Reinforcement learning continuously optimizes decision-making based on real-time feedback reward signals through continuous interaction with the environ...
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
The Open-MAGVIT2 project produces an open-source replication of Google's MAGVIT-v2 tokenizer, a tokenizer with a super-large codebook (i.e., 2^18 codes)...
Dense Reward for Free in Reinforcement Learning from Human Feedback
Reinforcement Learning from Human Feedback (RLHF) has been credited as the key advance that has allowed Large Language Models (LLMs) to effectively fol...
Understanding Test-Time Augmentation
Test-Time Augmentation (TTA) is a very powerful heuristic that takes advantage of data augmentation during testing to produce averaged output. Despite t...
3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation
This paper introduces a network for volumetric segmentation that learns from sparsely annotated volumetric images. We outline two attractive use cases ...
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
We introduce MuAViC, a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation providing 1200 hours of audi...
Multi-Object Hallucination in Vision-Language Models
Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmark...
A Novel Paradigm Boosting Translation Capabilities of Large Language Models
This paper presents a study on strategies to enhance the translation capabilities of large language models (LLMs) in the context of machine translation ...
MOMENT: A Family of Open Time-series Foundation Models
We introduce MOMENT, a family of open-source foundation models for general-purpose time series analysis. Pre-training large models on time series data i...
BASS: Batched Attention-optimized Speculative Sampling
Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models. However, most existing impleme...
LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
Mathematical equations have been unreasonably effective in describing complex natural phenomena across various scientific disciplines. However, discove...
Economics
No articles found.
Electrical Engineering and Systems Science
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis
Self-Supervised Learning (SSL) has demonstrated promising results in 3D medical image analysis. However, the lack of high-level semantics in pre-trainin...
Benchmarking foundation models as feature extractors for weakly-supervised computational pathology
Advancements in artificial intelligence have driven the development of numerous pathology foundation models capable of extracting clinically relevant in...
ECGformer: Leveraging transformer for ECG heartbeat arrhythmia classification
An arrhythmia, also known as a dysrhythmia, refers to an irregular heartbeat. There are various types of arrhythmias that can originate from different ...
Mathematics
High-fidelity single-spin shuttling in silicon
The computational power and fault-tolerance of future large-scale quantum processors derive in large part from the connectivity between the qubits. One...
Exact Thermal Eigenstates of Nonintegrable Spin Chains at Infinite Temperature
The eigenstate thermalization hypothesis (ETH) plays a major role in explaining thermalization of isolated quantum many-body systems. However, there has...
Physics
Precise test of lepton flavour universality in W-boson decays into muons and electrons in pp collisions at √s = 13 TeV with the ATLAS detector
The ratio of branching ratios of the W boson to muons and electrons, Rμ/eW=B(W→μν)/B(W→eν), has been measured using 140 fb−1 of pp collision data at s...
Spin-polarized Specular Andreev Reflections in Altermagnets
We show theoretically that specular Andreev reflection occurs stably at altermagnet–superconductor interfaces, which is a phenomenon that has previously...
Parametric multi-element coupling architecture for coherent and dissipative control of superconducting qubits
As systems for quantum computing keep growing in size and number of qubits, challenges in scaling the control capabilities are becoming increasingly rel...
The Quantum Internet
Quantum networks offer a unifying set of opportunities and challenges across exciting intellectual and technical frontiers, including for quantum comput...
Hardware-efficient quantum error correction via concatenated bosonic qubits
In order to solve problems of practical importance, quantum computers will likely need to incorporate quantum error correction, where a logical qubit i...
Nonreciprocal Quantum Batteries
Nonreciprocity, arising from the breaking of time-reversal symmetry, has become a fundamental tool in diverse quantum technology applications. It enable...
Iterative assembly of ^171Yb atom arrays with cavity-enhanced optical lattices
Assembling and maintaining large arrays of individually addressable atoms is a key requirement for continued scaling of neutral-atom-based quantum comp...
Quantum Melting of a Disordered Wigner Solid
The behavior of two-dimensional electron gas (2DEG) in extreme coupling limits is reasonably well-understood, but our understanding of intermediate reg...
Solving the strong CP problem without axions
We formulate general conditions under which the strong CP problem is solved by spontaneous CP violation. Quark-mass matrix elements are polynomials in ...
DESI 2024 IV: Baryon Acoustic Oscillations from the Lyman Alpha Forest
We present the measurement of Baryon Acoustic Oscillations (BAO) from the Lyman-α (Lyα) forest of high-redshift quasars with the first-year dataset of t...
Gravitational entropy is observer-dependent
In quantum gravity, it has been argued that a proper accounting of the role played by an observer promotes the von Neumann algebra of observables in a g...
How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits
We significantly reduce the cost of factoring integers and computing discrete logarithms in finite fields on a quantum computer by combining techniques...
A Review of Gravitational Memory and BMS Frame Fixing in Numerical Relativity
Gravitational memory effects and the BMS freedoms exhibited at future null infinity have recently been resolved and utilized in numerical relativity sim...
The Sonora Substellar Atmosphere Models. IV. Elf Owl: Atmospheric Mixing and Chemical Disequilibrium with Varying Metallicity and C/O Ratios
Disequilibrium chemistry due to vertical mixing in the atmospheres of many brown dwarfs and giant exoplanets is well-established. Atmosphere models for...
Distinguishing oceans of water from magma on mini-Neptune K2-18b
Mildly irradiated mini-Neptunes have densities potentially consistent with them hosting substantial liquid water oceans ('Hycean' planets). The presenc...
Krylov complexity of density matrix operators
Quantifying complexity in quantum systems has witnessed a surge of interest in recent years, with Krylov-based measures such as Krylov complexity (C_K)...
Quantitative Biology
Enhancing the efficiency of protein language models with minimal wet-lab data through few-shot learning
Accurately modeling the protein fitness landscapes holds great importance for protein engineering. Recently, due to their capacity and representation ab...
Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2
Protein diffusion models have emerged as a promising approach for protein design. One such pioneering model is Genie, a method that asymmetrically repre...
Quantitative Finance
No articles found.
Statistics
Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead
Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, c...
Linear Model and Extensions
I developed the lecture notes based on my “Linear Model” course at the University of California Berkeley over the past seven years. This book provides a...
Fitting Linear Mixed-Effects Models using lme4
Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer f...