Scholar Articles

Computer Science

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement ...

Personalized Wireless Federated Learning for Large Language Models

Large Language Models (LLMs) have revolutionized natural language processing tasks. However, their deployment in wireless networks still faces challenge...

Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction

Document-level Relation Triplet Extraction (DocRTE) is a fundamental task in information systems that aims to simultaneously extract entities with sema...

3D Vision-Language Gaussian Splatting

Recent advancements in 3D reconstruction methods and vision-language models have propelled the development of multi-modal 3D scene understanding, which...

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Image editing has advanced significantly with the introduction of text-conditioned diffusion models. Despite this progress, seamlessly adding objects to...

Raidar: geneRative AI Detection viA Rewriting

We find that large language models (LLMs) are more likely to modify human-written text than AI-generated text when tasked with rewriting. This tendency ...

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Large Language Models (LLMs) are rapidly surpassing human knowledge in many domains. While improving these models traditionally relies on costly human d...

latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction

We present latentSplat, a method to predict semantic Gaussians in a 3D latent space that can be splatted and decoded by a light-weight generative 2D arc...

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models

Generative Large Language Models (LLMs) have made significant strides across various tasks, but they remain vulnerable to backdoor attacks, where speci...

Do language models plan ahead for future tokens?

Do transformers "think ahead" during inference at a given position? It is known transformers prepare information in the hidden states of the forward pa...

Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation

Comprehensive clinical documentation is crucial for effective healthcare delivery, yet it poses a significant burden on healthcare professionals, leadin...

Self-Discover: Large Language Models Self-Compose Reasoning Structures

We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems ...

HyperFast: Instant Classification for Tabular Data

Training deep learning models and performing hyperparameter tuning can be computationally demanding and time-consuming. Meanwhile, traditional machine l...

AgentReview: Exploring Peer Review Dynamics with LLM Agents

Peer review is fundamental to the integrity and advancement of scientific publication. Traditional methods of peer review analyses often rely on explora...

OpenDataLab: Empowering General Artificial Intelligence with Open Datasets

The advancement of artificial intelligence (AI) hinges on the quality and accessibility of data, yet the current fragmentation and variability of datas...

AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

Evaluating large language models (LLM) in clinical scenarios is crucial to assessing their potential clinical utility. Existing benchmarks rely heavily...

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability. Beyond holistic image understandi...

Reinforcement Learning for Collision-free Flight Exploiting Deep Collision Encoding

This work contributes a novel deep navigation policy that enables collision-free flight of aerial robots based on a modular approach exploiting deep col...

Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations

The robustness of recent Large Language Models (LLMs) has become increasingly crucial as their applicability expands across various domains and real-wo...

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

The math abilities of large language models can represent their abstract reasoning ability. In this paper, we introduce and open-source our math reasoni...

Understanding Robustness of Visual State Space Models for Image Classification

Visual State Space Model (VMamba) has recently emerged as a promising architecture, exhibiting remarkable performance in various computer vision tasks. ...

Ethical and social risks of harm from Language Models

This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innov...

Audio Anti-Spoofing Detection: A Survey

The availability of smart devices leads to an exponential increase in multimedia content. However, the rapid advancements in deep learning have given ri...

WPO: Enhancing RLHF with Weighted Preference Optimization

Reinforcement learning from human feedback (RLHF) is a promising solution to align large language models (LLMs) more closely with human values. Off-pol...

T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives

Large Language Models increasingly rely on distributed techniques for their training and inference. These techniques require communication across devic...

Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models

While many have shown how Large Language Models (LLMs) can be applied to a diverse set of tasks, the critical issues of data contamination and memorizat...

Multi-perspective Improvement of Knowledge Graph Completion with Large Language Models

Knowledge graph completion (KGC) is a widely used method to tackle incompleteness in knowledge graphs (KGs) by making predictions for missing links. Des...

Understanding deep learning requires rethinking generalization

Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance....

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification

KV cache stores key and value states from previous tokens to avoid re-computation, yet it demands substantial storage space, especially for long sequenc...

CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs

Timely, personalized feedback is essential for students learning programming. LLM-powered tools like ChatGPT offer instant support, but reveal direct a...

RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer

In this report, we present RT-DETRv2, an improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 builds upon the previous state-of-the-art real-ti...

Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?

Neural Radiance Field (NeRF) has achieved superior performance for novel view synthesis by modeling the scene with a Multi-Layer Perceptron (MLP) and a...

Gemini: A Family of Highly Capable Multimodal Models

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understandin...

Large Language Model with Graph Convolution for Recommendation

In recent years, efforts have been made to use text information for better user profiling and item characterization in recommendations. However, text in...

WildGaussians: 3D Gaussian Splatting in the Wild

While the field of 3D scene reconstruction is dominated by NeRFs due to their photorealistic quality, 3D Gaussian Splatting (3DGS) has recently emerged...

Language Models for Code Completion: A Practical Evaluation

Transformer-based language models for automatic code completion have shown great promise so far, yet the evaluation of these models rarely uses real da...

Transcriptomics-guided Slide Representation Learning in Computational Pathology

Self-supervised learning (SSL) has been successful in building patch embeddings of small histology images (e.g., 224x224 pixels), but scaling these mode...

AI and personalized learning: bridging the gap with modern educational goals

Personalized learning (PL) aspires to provide an alternative to the one-size-fits-all approach in education. Technology-based PL solutions have shown no...

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches

Neural machine translation is a relatively new approach to statistical machine translation based purely on neural networks. The neural machine translati...

VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections

Graph transformer has been proven as an effective graph learning method for its adoption of attention mechanism that is capable of capturing expressive...

How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation

Conversational Recommender System (CRS) interacts with users through natural language to understand their preferences and provide personalized recommend...

Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs

Prompt ensembling of Large Language Model (LLM) generated category-specific prompts has emerged as an effective method to enhance zero-shot recognition...

Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages

The development of Large Language Models (LLMs) relies on extensive text corpora, which are often unevenly distributed across languages. This imbalance...

Visibility into AI Agents

Increased delegation of commercial, scientific, governmental, and personal activities to AI agents – systems capable of pursuing complex goals with limi...

Compression Represents Intelligence Linearly

There is a belief that learning to compress well will lead to intelligence. Recently, language modeling has been shown to be equivalent to compression,...

Dual Operating Modes of In-Context Learning

In-context learning (ICL) exhibits dual operating modes: task learning, i.e., acquiring a new skill from in-context samples, and task retrieval, i.e., l...

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually con...

OPEN TEACH: A Versatile Teleoperation System for Robotic Manipulation

Open-sourced, user-friendly tools form the bedrock of scientific advancement across disciplines. The widespread adoption of data-driven learning has le...

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

Large language models (LLMs) achieve remarkable performance in natural language understanding but require substantial computation and memory resources. ...

mDPO: Conditional Preference Optimization for Multimodal Large Language Models

Direct preference optimization (DPO) has been shown to be an effective method for large language model (LLM) alignment. Recent works have attempted to apply...

DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences

Recent advancements in foundation models (FMs) have unlocked new prospects in autonomous driving, yet the experimental settings of these studies are pre...

Depth-aware Test-Time Training for Zero-shot Video Object Segmentation

Zero-shot Video Object Segmentation (ZSVOS) aims at segmenting the primary moving object without any human annotations. Mainstream solutions mainly foc...

Uncertainty Quantification on Clinical Trial Outcome Prediction

The importance of uncertainty quantification is increasingly recognized in the diverse field of machine learning. Accurately assessing model prediction...

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

Representing speech and audio signals in discrete units has become a compelling alternative to traditional high-dimensional feature vectors. Numerous st...

A Survey On Text-to-3D Contents Generation In The Wild

3D content creation plays a vital role in various applications, such as gaming, robotics simulation, and virtual reality. However, the process is labor-...

MileBench: Benchmarking MLLMs in Long Context

Despite the advancements and impressive performance of Multimodal Large Language Models (MLLMs) on benchmarks, their effectiveness in real-world, long-c...

Do Membership Inference Attacks Work on Large Language Models?

Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data. Despite extensive r...

COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning

Remarkable progress on English instruction tuning has facilitated the efficacy and reliability of large language models (LLMs). However, there remains a...

Generative Pretrained Hierarchical Transformer for Time Series Forecasting

Recent efforts have been dedicated to enhancing time series forecasting accuracy by introducing advanced network architectures and self-supervised pretr...

How to use and interpret activation patching

Activation patching is a popular mechanistic interpretability technique, but has many subtleties regarding how it is applied and how one may interpret ...

To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering

Medical open-domain question answering demands substantial access to specialized knowledge. Recent efforts have sought to decouple knowledge from model ...

Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

Multimodal sentiment analysis (MSA) aims to understand human sentiment through multimodal data. Most MSA efforts are based on the assumption of modality...

Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales

Although social media platforms are a prominent arena for users to engage in interpersonal discussions and express opinions, the facade and anonymity of...

Transformers, parallel computation, and logarithmic depth

We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Mass...

Datasheet for the Pile

This datasheet describes the Pile, an 825 GiB dataset of human-authored text compiled by EleutherAI for use in large-scale language modeling. The Pile i...

Benchmarking Vision Language Models for Cultural Understanding

Foundation models and vision-language pre-training have notably advanced Vision Language Models (VLMs), enabling multimodal processing of visual and lin...

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achi...

STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians

Recent progress in pre-trained diffusion models and 3D generation have spurred interest in 4D content creation. However, achieving high-fidelity 4D gene...

UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler

Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks in 3D perception and modeling. However, the remarkable accuracy...

A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions

The recent progression of Large Language Models (LLMs) has witnessed great success in the fields of data-centric applications. LLMs trained on massive t...

Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition

The development of multimodal models has significantly advanced multimodal sentiment analysis and emotion recognition. However, in real-world applicatio...

KAN 2.0: Kolmogorov-Arnold Networks Meet Science

A major challenge of AI + Science lies in their inherent incompatibility: today's AI is primarily based on connectionism, while science depends on symbo...

A Survey on Hardware Accelerators for Large Language Models

Large Language Models (LLMs) have emerged as powerful tools for natural language processing tasks, revolutionizing the field with their ability to under...

A Comprehensive Survey on Kolmogorov Arnold Networks (KAN)

Through this comprehensive survey of Kolmogorov-Arnold Networks (KAN), we have gained a thorough understanding of its theoretical foundation, architectu...

Flow Matching Imitation Learning for Multi-Support Manipulation

Humanoid robots could benefit from using their upper bodies for support contacts, enhancing their workspace, stability, and ability to perform contact-r...

CRAG – Comprehensive RAG Benchmark

Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowle...

Unmasking and Quantifying Racial Bias of Large Language Models in Medical Report Generation

Large language models like GPT-3.5-turbo and GPT-4 hold promise for healthcare professionals, but they may inadvertently inherit biases during their tra...

The Unreasonable Effectiveness of Eccentric Automatic Prompts

Large Language Models (LLMs) have demonstrated remarkable problem-solving and basic mathematics abilities. However, their efficacy is highly contingent...

InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior

3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus o...

Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models

The remarkable success of Large Language Models (LLMs) has ushered natural language processing (NLP) research into a new era. Despite their diverse capa...

The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

Human feedback is central to the alignment of Large Language Models (LLMs). However, open questions remain about methods (how), domains (where), people...

OpenTab: Advancing Large Language Models as Open-domain Table Reasoners

Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge ...

Towards Explainable, Safe Autonomous Driving with Language Embeddings for Novelty Identification and Active Learning: Framework and Experimental Analysis with Real-World Data Sets

This research explores the integration of language embeddings for active learning in autonomous driving datasets, with a focus on novelty detection. Nov...

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite its s...

JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

Mathematical reasoning is an important capability of large language models (LLMs) for real-world applications. To enhance this capability, existing work...

Probing the Creativity of Large Language Models: Can models produce divergent semantic association?

Large language models possess remarkable capacity for processing language, but it remains unclear whether these models can further generate creative con...

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual description...

LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning

In recent years, instruction-tuned Large Multimodal Models (LMMs) have been successful at several tasks, including image captioning and visual question...

PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning

Vehicle motion planning is an essential component of autonomous driving technology. Current rule-based vehicle motion planning methods perform satisfact...

AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based on Meta Learning

Large-scale pretraining followed by task-specific finetuning has achieved great success in various NLP tasks. Since finetuning all parameters of large p...

Spiral of Silence: How is Large Language Model Killing Information Retrieval? – A Case Study on Open Domain Question Answering

The practice of Retrieval-Augmented Generation (RAG), which integrates Large Language Models (LLMs) with retrieval systems, has become increasingly prev...

NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data

Large Language Models (LLMs) have shown impressive abilities in data annotation, opening the way for new approaches to solve classic NLP problems. In th...

CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing

Domain generalization (DG) based Face Anti-Spoofing (FAS) aims to improve the model's performance on unseen domains. Existing methods either rely on do...

A Tale of Tails: Model Collapse as a Change of Scaling Laws

As AI model size grows, neural scaling laws have become a crucial tool to predict the improvements of large models when increasing capacity and the siz...

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Vision Language Models (VLMs) demonstrate remarkable proficiency in addressing a wide array of visual questions, which requires strong perception and re...

Enhancing Large Language Models for Text-to-Testcase Generation

Context: Test-driven development (TDD) is a widely employed software development practice that involves developing test cases based on requirements prio...

SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Instruction-tuned Large Language Models (LLMs) have recently showcased remarkable advancements in their ability to generate fitting responses to natural...

Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks

This study investigates the loss of generalization ability in neural networks, revisiting warm-starting experiments from Ash & Adams. Our empirical an...

Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

We introduce Generalized Instruction Tuning (called GLAN), a general and scalable method for instruction tuning of Large Language Models (LLMs). Unlike...

Diffusion Models, Image Super-Resolution And Everything: A Survey

Diffusion Models (DMs) have disrupted the image Super-Resolution (SR) field and further closed the gap between image quality and human perceptual prefer...

Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing

Large language models (LLMs) are increasingly being adopted in a wide range of real-world applications. Despite their impressive performance, recent stu...

Entropy is not Enough for Test-Time Adaptation: From the Perspective of Disentangled Factors

Test-time adaptation (TTA) fine-tunes pre-trained deep neural networks for unseen test data. The primary challenge of TTA is limited access to the enti...

Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives

The emergence of Generative Artificial Intelligence (AI) and Large Language Models (LLMs) has marked a new era of Natural Language Processing (NLP), int...

Thinking Tokens for Language Modeling

How much is 56 times 37? Language models often make mistakes in these types of difficult calculations. This is usually explained by their inability to p...

FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

Multimodal image fusion aims to integrate information from different imaging techniques to produce a comprehensive, detail-rich single image for downst...

Low-Rank Few-Shot Adaptation of Vision-Language Models

Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further pushed their generalization capabilities, at the expense of jus...

UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence

Garment manipulation (e.g., unfolding, folding and hanging clothes) is essential for future robots to accomplish home-assistant tasks, while highly chal...

Croissant: A Metadata Format for ML-Ready Datasets

Data is a critical resource for machine learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata f...

Curriculum reinforcement learning for quantum architecture search under hardware errors

The key challenge in the noisy intermediate-scale quantum era is finding useful circuits compatible with current device limitations. Variational quantu...

Applications of Deep Neural Networks with Keras

Deep learning is a group of exciting new technologies for neural networks. Through a combination of advanced training techniques and neural network arch...

A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

With the significant development of large models in recent years, Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across...

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformat...

M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection

The advent of Large Language Models (LLMs) has brought an unprecedented surge in machine-generated text (MGT) across diverse channels. This raises legi...

Versatile Behavior Diffusion for Generalized Traffic Agent Simulation

Existing traffic simulation models often fail to capture the complexities of real-world scenarios, limiting the effective evaluation of autonomous driv...

COCONut: Modernizing COCO Segmentation

In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks....

Exploring the Potential of Large Language Models in Self-adaptive Systems

Large Language Models (LLMs), with their abilities in knowledge acquisition and reasoning, can potentially enhance the various aspects of Self-adaptive...

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing

In singing voice synthesis (SVS), generating singing voices from musical scores faces challenges due to limited data availability. This study proposes ...

Decentralized Multi-Robot Navigation for Autonomous Surface Vehicles with Distributional Reinforcement Learning

Collision avoidance algorithms for Autonomous Surface Vehicles (ASV) that follow the Convention on the International Regulations for Preventing Collisio...

Automating Research Synthesis with Domain-Specific Large Language Model Fine-Tuning

This research pioneers the use of fine-tuned Large Language Models (LLMs) to automate Systematic Literature Reviews (SLRs), presenting a significant an...

Dynamic Prompt Optimizing for Text-to-Image Generation

Text-to-image generative models, specifically those based on diffusion models like Imagen and Stable Diffusion, have made substantial advancements. Rec...

What If We Recaption Billions of Web Images with LLaMA-3?

Web-crawled image-text pairs are inherently noisy. Prior studies demonstrate that semantically aligning and enriching textual descriptions of these pai...

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains fro...

Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation

We introduce Bonito, an open-source model for conditional task generation that converts unannotated text into task-specific training datasets for instru...

EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage

Generalist web agents have demonstrated remarkable potential in autonomously completing a wide range of tasks on real websites, significantly boosting ...

Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM

LLMs have become the go-to choice for code generation tasks, with an exponential increase in the training, development, and usage of LLMs specifically f...

TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

We propose TRAM, a two-stage method to reconstruct a human's global trajectory and motion from in-the-wild videos. TRAM robustifies SLAM to recover the ...

A Survey on Kolmogorov-Arnold Network

This systematic review explores the theoretical foundations, evolution, applications, and future potential of Kolmogorov-Arnold Networks (KAN), a neural...

Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?

Instruction-tuned Large Language Models (LLMs) show impressive results in numerous practical applications, but they lack essential safety features that...

DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation

Imitation learning from human hand motion data presents a promising avenue for imbuing robots with human-like dexterity in real-world manipulation task...

Kolmogorov-Arnold Networks are Radial Basis Function Networks

This short paper is a fast proof-of-concept that the 3-order B-splines used in Kolmogorov-Arnold Networks (KANs) can be well approximated by Gaussian ra...

Spectral Networks and Locally Connected Networks on Graphs

Convolutional Neural Networks are extremely efficient architectures in image and audio recognition tasks, thanks to their ability to exploit the local t...

DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

This paper presents a novel method for exerting fine-grained lighting control during text-driven diffusion-based image generation. While existing diffu...

Personalized Language Modeling from Personalized Human Feedback

Personalized large language models (LLMs) are designed to tailor responses to individual user preferences. While Reinforcement Learning from Human Feed...

Research on Autonomous Robots Navigation based on Reinforcement Learning

Reinforcement learning continuously optimizes decision-making based on real-time feedback reward signals through continuous interaction with the environ...

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

The Open-MAGVIT2 project produces an open-source replication of Google's MAGVIT-v2 tokenizer, a tokenizer with a super-large codebook (i.e., 2^18 codes)...

Dense Reward for Free in Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback (RLHF) has been credited as the key advance that has allowed Large Language Models (LLMs) to effectively fol...

Understanding Test-Time Augmentation

Test-Time Augmentation (TTA) is a very powerful heuristic that takes advantage of data augmentation during testing to produce averaged output. Despite t...

3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation

This paper introduces a network for volumetric segmentation that learns from sparsely annotated volumetric images. We outline two attractive use cases ...

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

We introduce MuAViC, a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation providing 1200 hours of audi...

Multi-Object Hallucination in Vision-Language Models

Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmark...

A Novel Paradigm Boosting Translation Capabilities of Large Language Models

This paper presents a study on strategies to enhance the translation capabilities of large language models (LLMs) in the context of machine translation ...

MOMENT: A Family of Open Time-series Foundation Models

We introduce MOMENT, a family of open-source foundation models for general-purpose time series analysis. Pre-training large models on time series data i...

BASS: Batched Attention-optimized Speculative Sampling

Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models. However, most existing impleme...
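The underlying speculative-decoding loop can be shown with a toy sketch (my own illustration; BASS itself is about batching and attention-optimizing this scheme on GPUs). A cheap draft model proposes k tokens, the expensive target model verifies them, and the agreeing prefix is accepted. Both "models" below are stand-in arithmetic rules.

```python
# Toy sketch of speculative decoding (not BASS's batched implementation):
# draft proposes k tokens; target accepts the agreeing prefix and then
# emits its own token at the first mismatch (or as a bonus token).

def draft_next(tokens):
    """Stand-in cheap draft model: a next-token rule that is sometimes wrong."""
    return (tokens[-1] + 1) % 9

def target_next(tokens):
    """Stand-in expensive target model: the ground-truth next-token rule."""
    return (tokens[-1] + 1) % 10

def speculative_step(prefix, k=4):
    # 1) Draft proposes k tokens autoregressively.
    proposed, p = [], list(prefix)
    for _ in range(k):
        t = draft_next(p)
        proposed.append(t)
        p.append(t)
    # 2) Target verifies: keep the longest agreeing prefix, then append the
    #    target's own token at the first disagreement.
    accepted, p = [], list(prefix)
    for t in proposed:
        if target_next(p) != t:
            break
        accepted.append(t)
        p.append(t)
    accepted.append(target_next(p))
    return accepted

print(speculative_step([0]))  # [1, 2, 3, 4, 5] -- all drafts accepted + bonus
print(speculative_step([7]))  # [8, 9] -- draft diverges after one token
```

Each step emits at least one token from a single target-model evaluation, and up to k+1 when the draft agrees, which is where the latency win comes from.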

LLM-SR: Scientific Equation Discovery via Programming with Large Language Models

Mathematical equations have been unreasonably effective in describing complex natural phenomena across various scientific disciplines. However, discove...

Economics

No articles found.

Electrical Engineering and Systems Science

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

Self-Supervised Learning (SSL) has demonstrated promising results in 3D medical image analysis. However, the lack of high-level semantics in pre-trainin...

Benchmarking foundation models as feature extractors for weakly-supervised computational pathology

Advancements in artificial intelligence have driven the development of numerous pathology foundation models capable of extracting clinically relevant in...

ECGformer: Leveraging transformer for ECG heartbeat arrhythmia classification

An arrhythmia, also known as a dysrhythmia, refers to an irregular heartbeat. There are various types of arrhythmias that can originate from different ...

Mathematics

High-fidelity single-spin shuttling in silicon

The computational power and fault-tolerance of future large-scale quantum processors derive in large part from the connectivity between the qubits. One...

Exact Thermal Eigenstates of Nonintegrable Spin Chains at Infinite Temperature

The eigenstate thermalization hypothesis (ETH) plays a major role in explaining thermalization of isolated quantum many-body systems. However, there has...

Physics

Precise test of lepton flavour universality in W-boson decays into muons and electrons in pp collisions at √s = 13 TeV with the ATLAS detector

The ratio of branching ratios of the W boson to muons and electrons, R^W_μ/e = B(W→μν)/B(W→eν), has been measured using 140 fb⁻¹ of pp collision data at √s...

Spin-polarized Specular Andreev Reflections in Altermagnets

We show theoretically that specular Andreev reflection occurs stably at altermagnet–superconductor interfaces, which is a phenomenon that has previously...

Parametric multi-element coupling architecture for coherent and dissipative control of superconducting qubits

As systems for quantum computing keep growing in size and number of qubits, challenges in scaling the control capabilities are becoming increasingly rel...

The Quantum Internet

Quantum networks offer a unifying set of opportunities and challenges across exciting intellectual and technical frontiers, including for quantum comput...

Hardware-efficient quantum error correction via concatenated bosonic qubits

In order to solve problems of practical importance, quantum computers will likely need to incorporate quantum error correction, where a logical qubit i...

Nonreciprocal Quantum Batteries

Nonreciprocity, arising from the breaking of time-reversal symmetry, has become a fundamental tool in diverse quantum technology applications. It enable...

Iterative assembly of ¹⁷¹Yb atom arrays with cavity-enhanced optical lattices

Assembling and maintaining large arrays of individually addressable atoms is a key requirement for continued scaling of neutral-atom-based quantum comp...

Quantum Melting of a Disordered Wigner Solid

The behavior of the two-dimensional electron gas (2DEG) in extreme coupling limits is reasonably well understood, but our understanding of intermediate reg...

Solving the strong CP problem without axions

We formulate general conditions under which the strong CP problem is solved by spontaneous CP violation. Quark-mass matrix elements are polynomials in ...

DESI 2024 IV: Baryon Acoustic Oscillations from the Lyman Alpha Forest

We present the measurement of Baryon Acoustic Oscillations (BAO) from the Lyman-α (Lyα) forest of high-redshift quasars with the first-year dataset of t...

Gravitational entropy is observer-dependent

In quantum gravity, it has been argued that a proper accounting of the role played by an observer promotes the von Neumann algebra of observables in a g...

How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits

We significantly reduce the cost of factoring integers and computing discrete logarithms in finite fields on a quantum computer by combining techniques...

A Review of Gravitational Memory and BMS Frame Fixing in Numerical Relativity

Gravitational memory effects and the BMS freedoms exhibited at future null infinity have recently been resolved and utilized in numerical relativity sim...

The Sonora Substellar Atmosphere Models. IV. Elf Owl: Atmospheric Mixing and Chemical Disequilibrium with Varying Metallicity and C/O Ratios

Disequilibrium chemistry due to vertical mixing in the atmospheres of many brown dwarfs and giant exoplanets is well-established. Atmosphere models for...

Distinguishing oceans of water from magma on mini-Neptune K2-18b

Mildly irradiated mini-Neptunes have densities potentially consistent with them hosting substantial liquid water oceans (`Hycean' planets). The presenc...

Krylov complexity of density matrix operators

Quantifying complexity in quantum systems has witnessed a surge of interest in recent years, with Krylov-based measures such as Krylov complexity (C_K)...

Quantitative Biology

Enhancing the efficiency of protein language models with minimal wet-lab data through few-shot learning

Accurately modeling the protein fitness landscapes holds great importance for protein engineering. Recently, due to their capacity and representation ab...

Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2

Protein diffusion models have emerged as a promising approach for protein design. One such pioneering model is Genie, a method that asymmetrically repre...

Quantitative Finance

No articles found.

Statistics

Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead

Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, c...

Linear Model and Extensions

I developed the lecture notes based on my "Linear Model" course at the University of California Berkeley over the past seven years. This book provides a...

Fitting Linear Mixed-Effects Models using lme4

Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer f...