Compare commits


11 Commits
b9510 ... b9521

Author SHA1 Message Date
Oliver Simons
2154a0fdcf CUDA: enroll mul_mat_vec_q_moe into pdl (#24087)
* Enroll mul_mat_vec_q_moe into PDL, boosting MTP performance on BW

Data collected on a B4500:

Before
```
(llama.cpp) ➜  llama.cpp git:(master) ✗ python mtp-bench.py
  code_python        pred= 192 draft= 150 acc= 116 rate=0.773 tok/s=202.8
  code_cpp           pred= 192 draft= 147 acc= 117 rate=0.796 tok/s=212.8
  explain_concept    pred= 192 draft= 161 acc= 110 rate=0.683 tok/s=196.4
  summarize          pred= 192 draft= 138 acc= 122 rate=0.884 tok/s=226.6
  qa_factual         pred= 192 draft= 138 acc= 121 rate=0.877 tok/s=225.1
  translation        pred= 192 draft= 158 acc= 112 rate=0.709 tok/s=201.5
  creative_short     pred= 192 draft= 160 acc= 110 rate=0.688 tok/s=197.2
  stepwise_math      pred= 192 draft= 150 acc= 115 rate=0.767 tok/s=209.2
  long_code_review   pred= 192 draft= 148 acc= 116 rate=0.784 tok/s=208.9
```
After
```
(llama.cpp) ➜  llama.cpp git:(master) ✗ python mtp-bench.py
  code_python        pred= 192 draft= 150 acc= 116 rate=0.773 tok/s=211.9
  code_cpp           pred= 192 draft= 147 acc= 117 rate=0.796 tok/s=224.6
  explain_concept    pred= 192 draft= 161 acc= 110 rate=0.683 tok/s=207.8
  summarize          pred= 192 draft= 138 acc= 122 rate=0.884 tok/s=240.2
  qa_factual         pred= 192 draft= 138 acc= 121 rate=0.877 tok/s=238.5
  translation        pred= 192 draft= 158 acc= 112 rate=0.709 tok/s=213.4
  creative_short     pred= 192 draft= 160 acc= 110 rate=0.688 tok/s=208.8
  stepwise_math      pred= 192 draft= 150 acc= 115 rate=0.767 tok/s=221.7
  long_code_review   pred= 192 draft= 148 acc= 116 rate=0.784 tok/s=220.7
```
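For a quick read of the two tables, the per-prompt speedup can be computed from the tok/s columns. This is only a convenience sketch: the numbers are copied from the runs above, and the helper name `speedups` is made up for illustration (it is not part of `mtp-bench.py`).

```python
# tok/s values copied from the "Before" and "After" runs above, same prompt order.
PROMPTS = [
    "code_python", "code_cpp", "explain_concept", "summarize", "qa_factual",
    "translation", "creative_short", "stepwise_math", "long_code_review",
]
BEFORE = [202.8, 212.8, 196.4, 226.6, 225.1, 201.5, 197.2, 209.2, 208.9]
AFTER = [211.9, 224.6, 207.8, 240.2, 238.5, 213.4, 208.8, 221.7, 220.7]


def speedups(before, after):
    """Return the after/before throughput ratio for each prompt."""
    return [a / b for b, a in zip(before, after)]


if __name__ == "__main__":
    for name, ratio in zip(PROMPTS, speedups(BEFORE, AFTER)):
        # Print the relative gain as a signed percentage.
        print(f"{name:18s} {100.0 * (ratio - 1.0):+.1f}%")
```

On these numbers every prompt gains somewhere between about 4% and 6% in tok/s, while the draft and acceptance counts are identical in both runs.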

Server launched with:
```
➜  llama.cpp git:(osimons/enroll_mul_mat_vec_q_moe_into_PDL) ✗ ./build-x64-linux-gcc-reldbg/bin/llama-server \
    -m /mnt/share/gguf/unsloth/Qwen3.6-35B-A3B-MTP-GGUF/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf -dio \
    --spec-type draft-mtp \
    --spec-draft-n-max 2 \
    -ngl all \
    -fa on \
    --host 0.0.0.0 \
    --port 8080 -np 1 --chat-template-kwargs "{\"preserve_thinking\": true}"
```

* LC to overlap with following kernels
2026-06-05 08:37:34 +02:00
Daniel Bevenius
46fa662b1f ci : build-msys job slimming [no ci] (#24157)
This PR attempts to slim down the dependencies for the build-msys jobs,
making the same changes that we applied in whisper.cpp to reduce the
size of the GitHub Actions cache. It should also improve the run time,
since fewer dependencies need to be installed.

I realize this is a scheduled job but I think it would still make sense
to apply these changes.

Refs: https://github.com/ggml-org/whisper.cpp/pull/3858
2026-06-05 07:57:36 +02:00
Mason Milburn
7fe2ae45ab sycl : port multi-column MMVQ from CUDA backend (#21845)
mmvq:

Port the ncols_dst optimization from ggml-cuda/mmvq.cu to SYCL.
Read weights once per dispatch instead of once per column.
Covers all standard quant types + reorder paths for Q4_0, Q8_0,
Q3_K, Q4_K, Q5_K, Q6_K. IQ types (except IQ4_XS) excluded due to
incompatible vec_dot signatures.

ggml-sycl:

The weight reorder was only bootstrapped on single-token mat-vec
(ne[1] == 1). Speculative / MTP verify issues only multi-column mat-vec,
so it never triggered the reorder and ran on the slower non-reorder
kernel. Bootstrap it on small multi-column batches (ne[1] <= 8) too.
2026-06-05 08:10:31 +03:00
Georgi Gerganov
7c158fbb4a server : disable on-device spec checkpoints (#24108) 2026-06-04 19:30:59 +03:00
Xuan-Son Nguyen
260862b8ca arg: fix double mtp downloads (#24128) 2026-06-04 19:23:48 +03:00
viggy
42b2d60e57 webui: [a11y] fix keyboard navigation issues in chat interface and sidebar (#23132)
* use child snippets for landing and chat message elements

* make ... icon visible in conversation history menu

* conversation history forward tab fix

* add snippet fix for fork icon in conversation history

* focus/keyboard fix for attachment x icon and scroll left/right

* formatting

* fix scroll down issue

* simplify Statistics and pointer events in scrolldown

* create storybook tests and move to folder

* improve tests to actually assert on element
2026-06-04 17:59:00 +02:00
Bartowski
e7bcf1c3a8 Move duplicated imatrix code into single common imatrix-loader.cpp (#22445)
* Deduplicate imatrix loading code

* Add back LLAMA_TRACE, early exit on quantize missing metadata
2026-06-04 17:45:40 +02:00
Aleksander Grygier
21444c822e ui: Fixed packages (#24119)
* chore(ui): pin package versions to currently installed

- Update all dependencies and devDependencies to match exactly what's in package-lock.json
- This ensures reproducible builds by locking to specific versions rather than semver ranges

* chore: Update packages

* chore: Move remaining dependencies to devDependencies

* fix: Add missing `mermaid` package

* chore: Update `cookie` package to `v1.1.1`

* chore: Formatting

* test: Update test configs
2026-06-04 16:23:08 +02:00
MagicExists
526977068f ui: added single line reasoning preview (#23601)
* webui: added single line reasoning preview.

* patch: reduce width slightly for the previewing section

* refactor: move formatter constants to the right file

* feat: reimplement reasoning preview with throttled dynamic per-line rendering

* chore: fix spacing

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* chore: refactor to requested changes

* refactor: grouped by capture pattern instead of block-level + inline

* ui: fix interrupt state to only trigger for 1st reasoning message

* chore: make reasoning preview respect showThoughtInProgress setting

* chore: newline at EOF

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* fix: thread rawContent so collapsible content can handle compute preview

* patch: showThoughtInProgress accidentally blocks rawContent being passed

* chore: fix lint

* chore: change smoke test

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-06-04 16:09:43 +02:00
forforever73
0dbfa66a1f return filter to save memory (#24125)
Co-authored-by: lvyichen <lvyichen@stepfun.com>
2026-06-04 15:56:33 +02:00
Pedro Cuenca
e8023568d0 convert: Fix Gemma 4 Unified conversion (#24118)
* Fix Gemma 4 Unified conversion

* Set audio hidden size to audio_embed_dim
2026-06-04 15:21:38 +02:00
42 changed files with 3086 additions and 1487 deletions


@@ -27,8 +27,8 @@ jobs:
       fail-fast: false
       matrix:
         include:
-          - { sys: UCRT64, env: ucrt-x86_64, build: Release }
-          - { sys: CLANG64, env: clang-x86_64, build: Release }
+          - { sys: UCRT64, env: ucrt-x86_64, compiler: gcc, build: Release }
+          - { sys: CLANG64, env: clang-x86_64, compiler: clang, build: Release }
     steps:
       - name: Clone
@@ -48,9 +48,7 @@ jobs:
           update: true
           msystem: ${{matrix.sys}}
           install: >-
-            base-devel
-            git
-            mingw-w64-${{matrix.env}}-toolchain
+            mingw-w64-${{matrix.env}}-${{matrix.compiler}}
             mingw-w64-${{matrix.env}}-cmake
             mingw-w64-${{matrix.env}}-openblas


@@ -78,6 +78,8 @@ add_library(${TARGET}
     hf-cache.cpp
     hf-cache.h
     http.h
+    imatrix-loader.cpp
+    imatrix-loader.h
     json-partial.cpp
     json-partial.h
     json-schema-to-grammar.cpp


@@ -446,6 +446,12 @@ bool common_params_handle_models(common_params & params, llama_example curr_ex)
     opts.download_mtp = spec_type_draft_mtp;
     opts.download_mmproj = !params.no_mmproj;
+    // sub-models (draft, mmproj, vocoder) are explicitly specified by the user,
+    // so we should not auto-discover mtp/mmproj siblings for them
+    common_download_opts sub_opts = opts;
+    sub_opts.download_mtp = false;
+    sub_opts.download_mmproj = false;
     try {
         auto res = common_params_handle_model(params.model, opts);
         if (params.no_mmproj) {
@@ -457,7 +463,7 @@ bool common_params_handle_models(common_params & params, llama_example curr_ex)
             // only download mmproj if the current example is using it
             for (const auto & ex : mmproj_examples) {
                 if (curr_ex == ex) {
-                    common_params_handle_model(params.mmproj, opts);
+                    common_params_handle_model(params.mmproj, sub_opts);
                     break;
                 }
             }
@@ -470,8 +476,8 @@ bool common_params_handle_models(common_params & params, llama_example curr_ex)
             params.speculative.draft.mparams.url.empty()) {
             params.speculative.draft.mparams.path = res.mtp.path;
         }
-        common_params_handle_model(params.speculative.draft.mparams, opts);
-        common_params_handle_model(params.vocoder.model, opts);
+        common_params_handle_model(params.speculative.draft.mparams, sub_opts);
+        common_params_handle_model(params.vocoder.model, sub_opts);
         return true;
     } catch (const common_skip_download_exception &) {
         return false;
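
The idea behind `sub_opts` can be sketched outside of C++: copy the main model's download options and switch off sibling auto-discovery before handling user-specified sub-models. The class and function below are hypothetical stand-ins that model only the two flags visible in the hunk; they are not the real `common_download_opts`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DownloadOpts:
    # Hypothetical stand-in for common_download_opts (only the two flags from the hunk).
    download_mtp: bool = False
    download_mmproj: bool = False


def sub_model_opts(opts):
    """Copy the main-model options with mtp/mmproj auto-discovery disabled."""
    return replace(opts, download_mtp=False, download_mmproj=False)


if __name__ == "__main__":
    main_opts = DownloadOpts(download_mtp=True, download_mmproj=True)
    print(main_opts)                  # options used for the main model
    print(sub_model_opts(main_opts))  # options used for draft / mmproj / vocoder
```

Because the copy is made once and the original is left untouched, the main model still discovers its MTP and mmproj siblings, while the draft, mmproj, and vocoder downloads no longer trigger a second discovery pass.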

common/imatrix-loader.cpp (new file, 165 lines)

@@ -0,0 +1,165 @@
#include "imatrix-loader.h"
#include "common.h"
#include "log.h"
#include "gguf.h"
#include <cmath>
#include <cstring>
#include <fstream>
static bool common_imatrix_load_legacy(const std::string & fname, common_imatrix & imatrix) {
std::ifstream in(fname, std::ios::binary);
if (!in) {
LOG_ERR("%s: failed to open %s\n", __func__, fname.c_str());
return false;
}
int n_entries;
in.read((char *) &n_entries, sizeof(n_entries));
if (in.fail() || n_entries < 1) {
LOG_ERR("%s: no data in file %s\n", __func__, fname.c_str());
return false;
}
for (int i = 0; i < n_entries; ++i) {
int32_t len = 0;
in.read((char *) &len, sizeof(len));
std::vector<char> name_as_vec(len + 1);
in.read((char *) name_as_vec.data(), len);
if (in.fail()) {
LOG_ERR("%s: failed reading name for entry %d from %s\n", __func__, i + 1, fname.c_str());
return false;
}
name_as_vec[len] = 0;
std::string name{ name_as_vec.data() };
int32_t ncall = 0;
in.read((char *) &ncall, sizeof(ncall));
int32_t nval = 0;
in.read((char *) &nval, sizeof(nval));
if (in.fail() || nval < 1) {
LOG_ERR("%s: failed reading number of values for entry %d\n", __func__, i);
return false;
}
auto & e = imatrix.entries[std::move(name)];
e.sums.resize(nval);
in.read((char *) e.sums.data(), nval * sizeof(float));
if (in.fail()) {
LOG_ERR("%s: failed reading data for entry %d\n", __func__, i);
return false;
}
e.counts.resize(1);
e.counts[0] = ncall;
}
// the trailing data (chunk count + dataset name) is optional
if (in.peek() != EOF) {
int32_t n_calls = 0;
in.read((char *) &n_calls, sizeof(n_calls));
imatrix.chunk_count = n_calls;
if (!in.fail()) {
int32_t len = 0;
in.read((char *) &len, sizeof(len));
if (!in.fail() && len > 0) {
std::vector<char> dataset(len + 1, 0);
in.read(dataset.data(), len);
if (!in.fail()) {
imatrix.datasets.push_back(dataset.data());
}
}
}
}
imatrix.chunk_size = 0;
imatrix.is_legacy = true;
return true;
}
bool common_imatrix_load(const std::string & fname, common_imatrix & imatrix) {
struct ggml_context * ctx = nullptr;
struct gguf_init_params meta_gguf_params = {
/* .no_alloc = */ false,
/* .ctx = */ &ctx,
};
struct gguf_context * ctx_gguf = gguf_init_from_file(fname.c_str(), meta_gguf_params);
if (!ctx_gguf) {
return common_imatrix_load_legacy(fname, imatrix);
}
const int32_t n_entries = gguf_get_n_tensors(ctx_gguf);
if (n_entries < 1) {
LOG_ERR("%s: no data in file %s\n", __func__, fname.c_str());
gguf_free(ctx_gguf);
ggml_free(ctx);
return false;
}
const int64_t datasets_key = gguf_find_key(ctx_gguf, LLM_KV_IMATRIX_DATASETS);
const int64_t chunk_count_key = gguf_find_key(ctx_gguf, LLM_KV_IMATRIX_CHUNK_COUNT);
const int64_t chunk_size_key = gguf_find_key(ctx_gguf, LLM_KV_IMATRIX_CHUNK_SIZE);
if (datasets_key != -1 && gguf_get_arr_type(ctx_gguf, datasets_key) == GGUF_TYPE_STRING) {
const int64_t n = gguf_get_arr_n(ctx_gguf, datasets_key);
imatrix.datasets.reserve(imatrix.datasets.size() + n);
for (int64_t i = 0; i < n; ++i) {
imatrix.datasets.push_back(gguf_get_arr_str(ctx_gguf, datasets_key, i));
}
}
imatrix.has_metadata = (datasets_key != -1 && chunk_count_key != -1 && chunk_size_key != -1);
imatrix.chunk_count = (chunk_count_key != -1) ? gguf_get_val_u32(ctx_gguf, chunk_count_key) : 0;
imatrix.chunk_size = (chunk_size_key != -1) ? gguf_get_val_u32(ctx_gguf, chunk_size_key) : 0;
const std::string in_sum2_suffix{ ".in_sum2" };
const std::string counts_suffix{ ".counts" };
std::map<std::string, std::pair<struct ggml_tensor *, struct ggml_tensor *>> sums_counts_for;
for (struct ggml_tensor * cur = ggml_get_first_tensor(ctx); cur; cur = ggml_get_next_tensor(ctx, cur)) {
std::string name = cur->name;
if (name.empty()) { continue; }
if (string_remove_suffix(name, in_sum2_suffix)) {
sums_counts_for[std::move(name)].first = cur;
} else if (string_remove_suffix(name, counts_suffix)) {
sums_counts_for[std::move(name)].second = cur;
}
}
for (const auto & sc : sums_counts_for) {
const std::string & name = sc.first;
const struct ggml_tensor * in_sum2 = sc.second.first;
const struct ggml_tensor * counts = sc.second.second;
if (!in_sum2 || !counts) {
LOG_ERR("%s: mismatched sums and counts for %s\n", __func__, name.c_str());
gguf_free(ctx_gguf);
ggml_free(ctx);
return false;
}
auto & e = imatrix.entries[name];
const int64_t nval = ggml_nelements(in_sum2);
const int64_t ncounts = ggml_nelements(counts);
e.sums.resize(nval);
for (int64_t j = 0; j < nval; ++j) {
e.sums[j] = ((const float *) in_sum2->data)[j];
}
e.counts.resize(ncounts);
for (int64_t j = 0; j < ncounts; ++j) {
e.counts[j] = std::lround(((const float *) counts->data)[j]);
}
}
gguf_free(ctx_gguf);
ggml_free(ctx);
return true;
}
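
The legacy (pre-GGUF) layout that `common_imatrix_load_legacy` walks through can be mirrored in a few lines of Python, which may help when inspecting old imatrix files by hand. This is a sketch under stated assumptions: it assumes little-endian 32-bit integers and floats (the C++ code reads in native byte order), and the function name and returned dictionary shape are invented for illustration.

```python
import io
import struct


def read_legacy_imatrix(stream):
    """Parse the legacy imatrix layout from a binary stream (little-endian assumed)."""

    def read_exact(n):
        raw = stream.read(n)
        if len(raw) != n:
            raise ValueError("truncated file")
        return raw

    def read_i32():
        return struct.unpack("<i", read_exact(4))[0]

    n_entries = read_i32()
    if n_entries < 1:
        raise ValueError("no data in file")

    entries = {}
    for _ in range(n_entries):
        name_len = read_i32()
        if name_len < 0:
            raise ValueError("negative name length")
        name = read_exact(name_len).decode("utf-8")
        ncall = read_i32()
        nval = read_i32()
        if nval < 1:
            raise ValueError("invalid number of values")
        sums = list(struct.unpack(f"<{nval}f", read_exact(4 * nval)))
        # The legacy format stores a single call count per entry.
        entries[name] = {"sums": sums, "counts": [ncall]}

    # Optional trailer: chunk count followed by a length-prefixed dataset name.
    chunk_count, datasets = 0, []
    raw = stream.read(4)
    if len(raw) == 4:
        chunk_count = struct.unpack("<i", raw)[0]
        raw = stream.read(4)
        if len(raw) == 4:
            n = struct.unpack("<i", raw)[0]
            data = stream.read(n) if n > 0 else b""
            if n > 0 and len(data) == n:
                datasets.append(data.decode("utf-8"))

    return {"entries": entries, "chunk_count": chunk_count,
            "datasets": datasets, "is_legacy": True}
```

Unlike the C++ loader, this sketch raises on a negative name length instead of trusting the value read from the file.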

common/imatrix-loader.h (new file, 26 lines)

@@ -0,0 +1,26 @@
#pragma once
#include <cstdint>
#include <map>
#include <string>
#include <vector>
inline constexpr const char * LLM_KV_IMATRIX_DATASETS = "imatrix.datasets";
inline constexpr const char * LLM_KV_IMATRIX_CHUNK_COUNT = "imatrix.chunk_count";
inline constexpr const char * LLM_KV_IMATRIX_CHUNK_SIZE = "imatrix.chunk_size";
struct common_imatrix_entry {
std::vector<float> sums;
std::vector<int64_t> counts;
};
struct common_imatrix {
std::map<std::string, common_imatrix_entry> entries;
std::vector<std::string> datasets;
int32_t chunk_count = 0;
int32_t chunk_size = 0;
bool is_legacy = false;
bool has_metadata = false;
};
bool common_imatrix_load(const std::string & fname, common_imatrix & imatrix);


@@ -798,7 +798,8 @@ class Gemma4VisionAudioModel(MmprojModel):
         # remap audio hparams
         if self.hparams_audio:
             self.hparams_audio["feat_in"] = self.hparams_audio.get("input_feat_size", 128)
-            self.hparams_audio["intermediate_size"] = self.hparams_audio["hidden_size"] * 4
+            if "hidden_size" in self.hparams_audio:
+                self.hparams_audio["intermediate_size"] = self.hparams_audio["hidden_size"] * 4
         else:
             self.has_audio_encoder = False
@@ -872,7 +873,7 @@ class Gemma4UnifiedVisionAudioModel(Gemma4VisionAudioModel):
         assert self.hparams_audio is not None
         text_embd_dim = self.hparams_vision["mm_embed_dim"]
         self.hparams_vision["hidden_size"] = text_embd_dim
-        self.hparams_audio["hidden_size"] = text_embd_dim
+        self.hparams_audio["hidden_size"] = self.hparams_audio["audio_embed_dim"]
         # this is a transformer-less vision tower, the params below are redundant but set to avoid error
         self.hparams_vision["intermediate_size"] = 0
         self.hparams_vision["num_layers"] = 0
@@ -897,7 +898,10 @@ class Gemma4UnifiedVisionAudioModel(Gemma4VisionAudioModel):
             # ggml im2col outputs in RR..GG..BB.. (CHW) order, but weight expects RGBRGB.. (HWC).
             # Permute columns so column i aligns with CHW input position i.
             assert self.hparams_vision is not None
-            p = self.hparams_vision["model_patch_size"]
+            if "model_patch_size" in self.hparams_vision:
+                p = self.hparams_vision["model_patch_size"]
+            else:
+                p = self.hparams_vision["patch_size"] * self.hparams_vision["pooling_kernel_size"]
             i = torch.arange(p * p * 3)
             ch = i // (p * p)
             row = (i % (p * p)) // p
@@ -908,7 +912,10 @@ class Gemma4UnifiedVisionAudioModel(Gemma4VisionAudioModel):
         elif "patch_ln1.weight" in name or "patch_ln1.bias" in name:
             # same permutation for patch_ln1 as patch_dense to align with CHW input order
             assert self.hparams_vision is not None
-            p = self.hparams_vision["model_patch_size"]
+            if "model_patch_size" in self.hparams_vision:
+                p = self.hparams_vision["model_patch_size"]
+            else:
+                p = self.hparams_vision["patch_size"] * self.hparams_vision["pooling_kernel_size"]
             i = torch.arange(p * p * 3)
             ch = i // (p * p)
             row = (i % (p * p)) // p
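
Both hunks repeat the same patch-size fallback before building a column permutation, which can be illustrated with a small torch-free sketch. The helper names are invented here. The hunks are cut off before the final index expression, so the CHW-to-HWC mapping below is the standard one and is an assumption, not a copy of the conversion script.

```python
def resolve_patch_size(hparams_vision):
    """Prefer model_patch_size; otherwise derive it from patch_size * pooling_kernel_size."""
    if "model_patch_size" in hparams_vision:
        return hparams_vision["model_patch_size"]
    return hparams_vision["patch_size"] * hparams_vision["pooling_kernel_size"]


def chw_to_hwc_columns(p, channels=3):
    """For each CHW position i, return the matching HWC column index (assumed mapping)."""
    perm = []
    for i in range(p * p * channels):
        ch = i // (p * p)          # channel plane of position i
        row = (i % (p * p)) // p   # row inside the p x p patch
        col = i % p                # column inside the p x p patch
        perm.append((row * p + col) * channels + ch)
    return perm


if __name__ == "__main__":
    # Hypothetical hparams without model_patch_size, forcing the fallback path.
    print(resolve_patch_size({"patch_size": 16, "pooling_kernel_size": 3}))
    print(chw_to_hwc_columns(2))
```

Factoring the fallback into one helper would also remove the duplicated `if`/`else` in the two branches, though the diff keeps them inline.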


@@ -175,7 +175,7 @@ int main(int argc, char ** argv) {
                 llama_memory_seq_pos_max(llama_get_memory(ctx_tgt), seq_id));
         if (use_ckpt_dft) {
-            ckpt.update_dft(ctx_dft.get(), seq_id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY | LLAMA_STATE_SEQ_FLAGS_ON_DEVICE);
+            ckpt.update_dft(ctx_dft.get(), seq_id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY);
         }
         // generate a new draft
@@ -196,12 +196,12 @@ int main(int argc, char ** argv) {
         // this allows us to restore the state if partial draft acceptance occurs
         if (!draft.empty()) {
             if (use_ckpt_tgt) {
-                ckpt.update_tgt(ctx_tgt, seq_id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY | LLAMA_STATE_SEQ_FLAGS_ON_DEVICE);
+                ckpt.update_tgt(ctx_tgt, seq_id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY);
             }
         }
         {
-            ckpt.load_dft(ctx_dft.get(), seq_id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY | LLAMA_STATE_SEQ_FLAGS_ON_DEVICE);
+            ckpt.load_dft(ctx_dft.get(), seq_id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY);
             llama_memory_seq_rm(llama_get_memory(ctx_dft.get()), seq_id, ckpt.pos_max + 1, -1);
         }
@@ -261,13 +261,13 @@ int main(int argc, char ** argv) {
             draft = std::move(ids);
             {
-                ckpt.load_tgt(ctx_tgt, seq_id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY | LLAMA_STATE_SEQ_FLAGS_ON_DEVICE);
+                ckpt.load_tgt(ctx_tgt, seq_id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY);
                 llama_memory_seq_rm(llama_get_memory(ctx_tgt), seq_id, ckpt.pos_max + 1, -1);
             }
             {
-                ckpt.load_dft(ctx_dft.get(), seq_id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY | LLAMA_STATE_SEQ_FLAGS_ON_DEVICE);
+                ckpt.load_dft(ctx_dft.get(), seq_id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY);
                 llama_memory_seq_rm(llama_get_memory(ctx_dft.get()), seq_id, ckpt.pos_max + 1, -1);
             }


@@ -682,12 +682,16 @@ static __global__ void mul_mat_vec_q(
 template <ggml_type type, int c_rows_per_block>
 __launch_bounds__(get_mmvq_mmid_max_batch_for_device<type>()*ggml_cuda_get_physical_warp_size(), 1)
 static __global__ void mul_mat_vec_q_moe(
-        const void * __restrict__ vx, const void * __restrict__ vy, const int32_t * __restrict__ ids,
-        float * __restrict__ dst,
+        const void * vx_ptr, const void * vy_ptr, const int32_t * ids_ptr,
+        float * dst_ptr,
         const uint32_t ncols_x, const uint3 nchannels_y, const uint32_t nrows_x,
         const uint32_t stride_row_x, const uint32_t stride_col_y, const uint32_t stride_col_dst,
         const uint32_t stride_channel_x, const uint32_t stride_channel_y, const uint32_t stride_channel_dst,
         const uint32_t ncols_dst, const uint32_t ids_stride) {
+    const void * GGML_CUDA_RESTRICT vx = vx_ptr;
+    const void * GGML_CUDA_RESTRICT vy = vy_ptr;
+    const int32_t * GGML_CUDA_RESTRICT ids = ids_ptr;
+    float * GGML_CUDA_RESTRICT dst = dst_ptr;
     constexpr int qk = ggml_cuda_type_traits<type>::qk;
     constexpr int qi = ggml_cuda_type_traits<type>::qi;
@@ -707,6 +711,7 @@ static __global__ void mul_mat_vec_q_moe(
         return;
     }
+    ggml_cuda_pdl_sync();
     const uint32_t channel_x = ids[channel_dst + token_idx * ids_stride];
     const uint32_t channel_y = fastmodulo(channel_dst, nchannels_y);
@@ -726,6 +731,8 @@ static __global__ void mul_mat_vec_q_moe(
         }
     }
+    ggml_cuda_pdl_lc();
     // Warp-level reduction only - no shared memory needed
 #pragma unroll
     for (int i = 0; i < c_rows_per_block; ++i) {
@@ -794,8 +801,9 @@ static void mul_mat_vec_q_moe_launch(
     const int64_t nblocks_rows = (nrows_x + rows_per_block - 1) / rows_per_block;
     const dim3 block_nums(nblocks_rows, nchannels_dst);
     const dim3 block_dims(warp_size, ncols_dst);
+    const ggml_cuda_kernel_launch_params launch_params = ggml_cuda_kernel_launch_params(block_nums, block_dims, 0, stream);
-    mul_mat_vec_q_moe<type, rows_per_block><<<block_nums, block_dims, 0, stream>>>(
+    ggml_cuda_kernel_launch(mul_mat_vec_q_moe<type, rows_per_block>, launch_params,
         vx, vy, ids, dst, ncols_x, nchannels_y, nrows_x,
         stride_row_x, stride_col_y, stride_col_dst,
         stride_channel_x, stride_channel_y, stride_channel_dst,


@@ -3971,7 +3971,9 @@ static bool should_reorder_tensor(ggml_backend_sycl_context& ctx, const ggml_ten
     return !g_ggml_sycl_disable_optimize && //allow optimize, controlled by $GGML_SYCL_DISABLE_OPT
            ctx.opt_feature.reorder && //allow this device due to good perf, skip the devices with bad perf.
            dst->op == GGML_OP_MUL_MAT && //limit to some supported cases of Q4_0, to do for more cases.
-           dst->src[1]->ne[1]==1 && dst->src[1]->ne[2]==1 && dst->src[1]->ne[3]==1;
+           // ne[1] <= 8 so multi-column decode (spec / MTP verify) also bootstraps the reorder;
+           // all reorderable types have a _switch_ncols kernel.
+           dst->src[1]->ne[1] <= 8 && dst->src[1]->ne[2]==1 && dst->src[1]->ne[3]==1;
 }
 static void opt_for_reorder(ggml_backend_sycl_context * ctx, const ggml_tensor * src0, const ggml_tensor * /* src1 */,
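
The shape part of the new condition can be looked at in isolation. The sketch below is not the SYCL code: the function name and the tuple layout are invented for illustration, and the other three conditions from the hunk (optimization enabled, device reorder support, `GGML_OP_MUL_MAT`) are deliberately left out.

```python
def shape_allows_reorder(src1_ne, max_cols=8):
    """src1_ne is (ne0, ne1, ne2, ne3) of the activation tensor src[1].

    Before the change only ne1 == 1 qualified; after it, any ne1 <= max_cols does.
    """
    _, ne1, ne2, ne3 = src1_ne
    return ne1 <= max_cols and ne2 == 1 and ne3 == 1


if __name__ == "__main__":
    # Example shapes are made up; 4096 is an arbitrary row length.
    for shape in [(4096, 1, 1, 1), (4096, 3, 1, 1), (4096, 9, 1, 1)]:
        print(shape, shape_allows_reorder(shape))
```

A workload that only ever issues small multi-column mat-vec calls, as the commit message describes for speculative / MTP verify, now passes the shape test and gets the weight reorder bootstrapped.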

File diff suppressed because it is too large


@@ -2112,6 +2112,15 @@ llama_memory_i * llama_model::create_memory(const llama_memory_params & params,
             filter = [n_main](int32_t il) { return (uint32_t)il >= n_main; };
         }
+        if (arch == LLM_ARCH_STEP35 && hparams.nextn_predict_layers > 0) {
+            const uint32_t n_main = hparams.n_layer - hparams.nextn_predict_layers;
+            if (params.ctx_type == LLAMA_CONTEXT_TYPE_MTP) {
+                filter = [n_main](int32_t il) { return (uint32_t)il >= n_main; };
+            } else {
+                filter = [n_main](int32_t il) { return (uint32_t)il < n_main; };
+            }
+        }
         if (hparams.swa_type != LLAMA_SWA_TYPE_NONE) {
             GGML_ASSERT(hparams.is_swa_any());
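
The layer filter added for `LLM_ARCH_STEP35` splits the layers into a main block and a trailing block of next-token-prediction layers, so that each context type only allocates memory for the layers it runs. A minimal sketch of that split, with an invented function name and made-up layer counts:

```python
def make_layer_filter(n_layer, nextn_predict_layers, is_mtp_context):
    """Return a predicate that selects the layers a context should keep memory for."""
    n_main = n_layer - nextn_predict_layers
    if is_mtp_context:
        # MTP context: only the trailing next-token-prediction layers.
        return lambda il: il >= n_main
    # Regular context: only the main layers.
    return lambda il: il < n_main


if __name__ == "__main__":
    # Hypothetical model with 48 layers, the last 3 of which are MTP layers.
    main_filter = make_layer_filter(48, 3, is_mtp_context=False)
    mtp_filter = make_layer_filter(48, 3, is_mtp_context=True)
    print([il for il in range(48) if mtp_filter(il)])
    print(sum(1 for il in range(48) if main_filter(il)))
```

The two predicates are complementary, so no layer gets memory in both contexts, which matches the "save memory" intent in the commit title.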


@@ -1,5 +1,6 @@
#include "arg.h"
#include "common.h"
#include "imatrix-loader.h"
#include "log.h"
#include "llama.h"
#include "gguf.h"
@@ -34,10 +35,6 @@ static void print_usage(int, char ** argv) {
LOG("\n");
}
static const char * const LLM_KV_IMATRIX_DATASETS = "imatrix.datasets";
static const char * const LLM_KV_IMATRIX_CHUNK_COUNT = "imatrix.chunk_count";
static const char * const LLM_KV_IMATRIX_CHUNK_SIZE = "imatrix.chunk_size";
struct Stats {
std::vector<float> values;
std::vector<int64_t> counts;
@@ -65,7 +62,6 @@ public:
bool collect_imatrix(struct ggml_tensor * t, bool ask, void * user_data);
void save_imatrix_legacy(int32_t ncall = -1) const;
void save_imatrix(int32_t n_chunk = -1) const;
bool load_imatrix_legacy(const char * fname);
bool load_imatrix(const char * file_name);
const std::unordered_map<std::string, Stats> & get_mstats() const { return m_stats; }
private:
@@ -624,204 +620,63 @@ void IMatrixCollector::save_imatrix(int32_t n_chunk) const {
ggml_free(ctx);
}
bool IMatrixCollector::load_imatrix_legacy(const char * fname) {
std::ifstream in(fname, std::ios::binary);
if (!in) {
LOG_ERR("%s: failed to open %s\n", __func__, fname);
return false;
}
int n_entries;
in.read((char *) &n_entries, sizeof(n_entries));
if (in.fail() || n_entries < 1) {
LOG_ERR("%s: no data in file %s\n", __func__, fname);
return false;
}
// Guess the chunk size because it's not stored in the file
const int32_t chunk_size = m_params.n_ctx / m_params.n_parallel;
for (int i = 0; i < n_entries; ++i) {
int32_t len = 0;
in.read((char *) &len, sizeof(len));
std::vector<char> name_as_vec(len + 1);
in.read((char *) name_as_vec.data(), len);
if (in.fail()) {
LOG_ERR("%s: failed reading name for entry %d from %s\n", __func__, i + 1, fname);
return false;
}
name_as_vec[len] = 0;
std::string name{ name_as_vec.data() };
auto & e = m_stats[std::move(name)];
int32_t ncall = 0;
in.read((char *) &ncall, sizeof(ncall));
int32_t nval = 0;
in.read((char *) &nval, sizeof(nval));
if (in.fail() || nval < 1) {
LOG_ERR("%s: failed reading number of values for entry %d\n", __func__, i);
m_stats = {};
return false;
}
if (e.values.empty()) {
e.values.resize(nval, 0.0f);
e.counts.resize(1, 0);
}
std::vector<float> tmp(nval);
in.read((char *) tmp.data(), nval * sizeof(float));
if (in.fail()) {
LOG_ERR("%s: failed reading data for entry %d\n", __func__, i);
m_stats = {};
return false;
}
// Recreate the state as expected by save_imatrix(), and correct for weighted sum.
for (int i = 0; i < nval; i++) {
e.values[i] += tmp[i] * chunk_size;
}
// The legacy format doesn't distinguish the counts for different experts
for (size_t j = 0; j < e.counts.size(); ++j) {
e.counts[j] += ncall * chunk_size;
}
}
{
// TODO: extract into its own method; this is also used by the GGUF-based format
// Calculate the last chunk count
int64_t max_count = 0;
for (const auto & stats : m_stats) {
for (int64_t count : stats.second.counts) {
if (count > max_count) {
max_count = count;
}
}
}
m_last_chunk = max_count / (chunk_size);
}
{
// Read the number of calls the matrix was computed with
int32_t n_calls;
in.read((char *) &n_calls, sizeof(n_calls));
// ignore it because it's not important
}
// Read the dataset path to include it when writing to GGUF
if (!in.fail()){
int32_t len = 0;
in.read((char *) &len, sizeof(len));
if (!in.fail()) {
std::vector<char> dataset;
dataset.resize(len + 1, 0);
in.read(dataset.data(), len);
if (!in.fail()) {
m_datasets.push_back(dataset.data());
}
}
}
return true;
}
// Using GGUF as the file format, for greater extensibility
bool IMatrixCollector::load_imatrix(const char * file_name) {
struct ggml_context * ctx = nullptr;
struct gguf_init_params meta_gguf_params = {
/* .no_alloc = */ false, // the data is needed
/* .ctx = */ &ctx,
};
struct gguf_context * ctx_gguf = gguf_init_from_file(file_name, meta_gguf_params);
if (!ctx_gguf) {
return this->load_imatrix_legacy(file_name);
}
const int32_t n_entries = gguf_get_n_tensors(ctx_gguf);
if (n_entries < 1) {
LOG_ERR("%s: no data in file %s\n", __func__, file_name);
gguf_free(ctx_gguf);
ggml_free(ctx);
common_imatrix loaded;
if (!common_imatrix_load(file_name, loaded)) {
return false;
}
const int64_t datasets_key = gguf_find_key(ctx_gguf, LLM_KV_IMATRIX_DATASETS);
if (datasets_key != -1 && gguf_get_arr_type(ctx_gguf, datasets_key) == GGUF_TYPE_STRING) {
const int64_t n = gguf_get_arr_n(ctx_gguf, datasets_key);
m_datasets.reserve(m_datasets.size() + n);
for (int64_t i = 0; i < n; ++i) {
m_datasets.push_back(gguf_get_arr_str(ctx_gguf, datasets_key, i));
}
}
const std::string in_sum2_suffix{ ".in_sum2" };
const std::string counts_suffix{ ".counts" };
// Could re-use m_stats instead, but this allows
// checking for completeness of *each* loaded imatrix file
// and also makes it easier to re-use a similar implementation in quantize.cpp
// Using an ordered map to get a deterministic iteration order.
std::map<std::string, std::pair<struct ggml_tensor *, struct ggml_tensor *>> sums_counts_for;
for (struct ggml_tensor * cur = ggml_get_first_tensor(ctx); cur; cur = ggml_get_next_tensor(ctx, cur)) {
std::string name = cur->name;
if (name.empty()) { continue; }
if (string_remove_suffix(name, in_sum2_suffix)) {
// in_sum2
sums_counts_for[std::move(name)].first = cur;
} else if (string_remove_suffix(name, counts_suffix)) {
// counts
sums_counts_for[std::move(name)].second = cur;
} else {
// ignore other tensors
}
}
for (const auto & sc : sums_counts_for) {
const std::string & name = sc.first;
const struct ggml_tensor * in_sum2 = sc.second.first;
const struct ggml_tensor * counts = sc.second.second;
if (!in_sum2 || !counts) {
LOG_ERR("%s: mismatched sums and counts for %s\n", __func__, name.c_str());
gguf_free(ctx_gguf);
ggml_free(ctx);
return false;
}
const int32_t chunk_size = m_params.n_ctx / m_params.n_parallel;
const bool is_legacy = loaded.is_legacy;
for (auto & [name, entry] : loaded.entries) {
auto & e = m_stats[name];
int64_t nval = ggml_nelements(in_sum2);
if (e.values.empty()) {
e.values.resize(nval, 0.0f);
} else if ((size_t) nval != e.values.size()) {
LOG_ERR("%s: mismatched sums size for %s: %zu != %zu\n", __func__, name.c_str(), (size_t) nval, e.values.size());
gguf_free(ctx_gguf);
ggml_free(ctx);
return false;
}
if (is_legacy) {
// Legacy format: sums contain (raw_sum/raw_count)*ncall, counts contain {ncall}
// Reconstruct raw form by multiplying by chunk_size
if (e.values.empty()) {
e.values.resize(entry.sums.size(), 0.0f);
e.counts.resize(1, 0);
}
for (size_t j = 0; j < entry.sums.size(); ++j) {
e.values[j] += entry.sums[j] * chunk_size;
}
for (size_t j = 0; j < e.counts.size(); ++j) {
e.counts[j] += entry.counts[0] * chunk_size;
}
} else {
// GGUF format: raw sums and counts, accumulate directly
const int64_t nval = entry.sums.size();
const int64_t ncounts = entry.counts.size();
int64_t ncounts = ggml_nelements(counts);
if (e.counts.empty()) {
e.counts.resize(ncounts, 0);
} else if (e.counts.size() == 1 && ncounts > 1) {
// broadcast, when loading an old imatrix
e.counts.resize(ncounts, e.counts[0]);
} else if ((size_t) ncounts != e.counts.size()) {
LOG_ERR("%s: mismatched counts size for %s: %zu != %zu\n", __func__, name.c_str(), (size_t) ncounts, e.counts.size());
gguf_free(ctx_gguf);
ggml_free(ctx);
return false;
}
if (e.values.empty()) {
e.values.resize(nval, 0.0f);
} else if ((size_t) nval != e.values.size()) {
LOG_ERR("%s: mismatched sums size for %s: %zu != %zu\n", __func__, name.c_str(), (size_t) nval, e.values.size());
return false;
}
// Recreate the state as expected by save_imatrix()
for (int64_t j = 0; j < nval; j++) {
e.values[j] += ((const float *) in_sum2->data)[j];
}
for (int64_t j = 0; j < ncounts; j++) {
e.counts[j] += std::lround(((const float *) counts->data)[j]);
if (e.counts.empty()) {
e.counts.resize(ncounts, 0);
} else if (e.counts.size() == 1 && ncounts > 1) {
e.counts.resize(ncounts, e.counts[0]);
} else if ((size_t) ncounts != e.counts.size()) {
LOG_ERR("%s: mismatched counts size for %s: %zu != %zu\n", __func__, name.c_str(), (size_t) ncounts, e.counts.size());
return false;
}
for (int64_t j = 0; j < nval; ++j) {
e.values[j] += entry.sums[j];
}
for (int64_t j = 0; j < ncounts; ++j) {
e.counts[j] += entry.counts[j];
}
}
}
// TODO: extract into its own method; this is also used by the legacy format
m_datasets.insert(m_datasets.end(), loaded.datasets.begin(), loaded.datasets.end());
// Calculate the last chunk count
int64_t max_count = 0;
for (const auto & stats : m_stats) {
@@ -831,10 +686,8 @@ bool IMatrixCollector::load_imatrix(const char * file_name) {
}
}
}
m_last_chunk = max_count / (m_params.n_ctx / m_params.n_parallel);
m_last_chunk = max_count / chunk_size;
gguf_free(ctx_gguf);
ggml_free(ctx);
return true;
}
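
The legacy branch in the hunk above rescales loaded data so it matches the raw-sum form used by the GGUF path. A small worked example makes the arithmetic concrete. The function name and the sample numbers are invented; the scaling follows the comments in the diff (legacy sums hold `(raw_sum / raw_count) * ncall`, and counts hold `ncall`).

```python
def accumulate_legacy(values, counts, sums, ncall, chunk_size):
    """Fold one legacy entry into running raw sums and counts (both modified in place)."""
    if not values:
        values.extend([0.0] * len(sums))
        counts.append(0)
    for j, s in enumerate(sums):
        values[j] += s * chunk_size        # back to a raw, unnormalized sum
    for j in range(len(counts)):
        counts[j] += ncall * chunk_size    # calls -> token count


if __name__ == "__main__":
    values, counts = [], []
    accumulate_legacy(values, counts, sums=[2.0, 4.0], ncall=4, chunk_size=512)
    # The per-token mean is unchanged by the rescaling: sums / ncall == values / counts.
    print([v / counts[0] for v in values])
```

Since both the sums and the count are multiplied by the same `chunk_size`, the mean that quantization ultimately uses stays the same; the rescaling only matters when legacy data is merged with raw GGUF data.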
@@ -1218,6 +1071,9 @@ int main(int argc, char ** argv) {
return 1;
}
// set_params before show_statistics so load_imatrix has valid n_ctx/n_parallel
g_collector.set_params(params);
if (params.show_statistics) {
if (!show_statistics(params)) {
return 1;


@@ -2,6 +2,7 @@
#include "build-info.h"
#include "common.h"
#include "imatrix-loader.h"
#include "gguf.h"
@@ -14,7 +15,6 @@
#include <vector>
#include <string>
#include <unordered_map>
#include <map>
#include <fstream>
#include <filesystem>
@@ -78,11 +78,6 @@ static const char * const LLM_KV_QUANTIZE_IMATRIX_DATASET = "quantize.imatrix
static const char * const LLM_KV_QUANTIZE_IMATRIX_N_ENTRIES = "quantize.imatrix.entries_count";
static const char * const LLM_KV_QUANTIZE_IMATRIX_N_CHUNKS = "quantize.imatrix.chunks_count";
// TODO: share with imatrix.cpp
static const char * const LLM_KV_IMATRIX_DATASETS = "imatrix.datasets";
static const char * const LLM_KV_IMATRIX_CHUNK_COUNT = "imatrix.chunk_count";
static const char * const LLM_KV_IMATRIX_CHUNK_SIZE = "imatrix.chunk_size";
static bool striequals(const char * a, const char * b) {
while (*a && *b) {
if (std::tolower(*a) != std::tolower(*b)) {
@@ -181,184 +176,84 @@ static void usage(const char * executable) {
exit(1);
}
static int load_legacy_imatrix(const std::string & imatrix_file, std::vector<std::string> & imatrix_datasets, std::unordered_map<std::string, std::vector<float>> & imatrix_data) {
std::ifstream in(imatrix_file.c_str(), std::ios::binary);
if (!in) {
printf("%s: failed to open %s\n",__func__, imatrix_file.c_str());
exit(1);
}
int n_entries;
in.read((char *)&n_entries, sizeof(n_entries));
if (in.fail() || n_entries < 1) {
printf("%s: no data in file %s\n", __func__, imatrix_file.c_str());
exit(1);
}
for (int i = 0; i < n_entries; ++i) {
int len; in.read((char *)&len, sizeof(len));
std::vector<char> name_as_vec(len+1);
in.read((char *)name_as_vec.data(), len);
if (in.fail()) {
printf("%s: failed reading name for entry %d from %s\n", __func__, i+1, imatrix_file.c_str());
exit(1);
}
name_as_vec[len] = 0;
std::string name{name_as_vec.data()};
auto & e = imatrix_data[name];
int ncall;
in.read((char *)&ncall, sizeof(ncall));
int nval;
in.read((char *)&nval, sizeof(nval));
if (in.fail() || nval < 1) {
printf("%s: failed reading number of values for entry %d\n", __func__, i);
imatrix_data = {};
exit(1);
}
e.resize(nval);
in.read((char *)e.data(), nval*sizeof(float));
if (in.fail()) {
printf("%s: failed reading data for entry %d\n", __func__, i);
imatrix_data = {};
exit(1);
}
if (ncall > 0) {
for (auto & v : e) {
v /= ncall;
}
}
if (getenv("LLAMA_TRACE")) {
printf("%s: loaded data (size = %6d, ncall = %6d) for '%s'\n", __func__, int(e.size()), ncall, name.c_str());
}
}
// latest legacy imatrix version contains the dataset filename at the end of the file
int m_last_call = 0;
if (in.peek() != EOF) {
in.read((char *)&m_last_call, sizeof(m_last_call));
int dataset_len;
in.read((char *)&dataset_len, sizeof(dataset_len));
std::vector<char> dataset_as_vec(dataset_len);
in.read(dataset_as_vec.data(), dataset_len);
imatrix_datasets.resize(1);
imatrix_datasets[0].assign(dataset_as_vec.begin(), dataset_as_vec.end());
printf("%s: imatrix dataset='%s'\n", __func__, imatrix_datasets[0].c_str());
}
printf("%s: loaded %d importance matrix entries from %s computed on %d chunks\n", __func__, int(imatrix_data.size()), imatrix_file.c_str(), m_last_call);
return m_last_call;
}
static int load_imatrix(const std::string & imatrix_file, std::vector<std::string> & imatrix_datasets, std::unordered_map<std::string, std::vector<float>> & imatrix_data) {
struct ggml_context * ctx = nullptr;
struct gguf_init_params meta_gguf_params = {
/* .no_alloc = */ false, // the data is needed
/* .ctx = */ &ctx,
};
struct gguf_context * ctx_gguf = gguf_init_from_file(imatrix_file.c_str(), meta_gguf_params);
if (!ctx_gguf) {
fprintf(stderr, "%s: imatrix file '%s' is using old format\n", __func__, imatrix_file.c_str());
return load_legacy_imatrix(imatrix_file, imatrix_datasets, imatrix_data);
}
const int32_t n_entries = gguf_get_n_tensors(ctx_gguf);
if (n_entries < 1) {
fprintf(stderr, "%s: no data in file %s\n", __func__, imatrix_file.c_str());
gguf_free(ctx_gguf);
ggml_free(ctx);
common_imatrix loaded;
if (!common_imatrix_load(imatrix_file, loaded)) {
fprintf(stderr, "%s: failed to load imatrix from '%s'\n", __func__, imatrix_file.c_str());
exit(1);
}
const int dataset_idx = gguf_find_key(ctx_gguf, LLM_KV_IMATRIX_DATASETS);
const int chunk_count_idx = gguf_find_key(ctx_gguf, LLM_KV_IMATRIX_CHUNK_COUNT);
const int chunk_size_idx = gguf_find_key(ctx_gguf, LLM_KV_IMATRIX_CHUNK_SIZE);
if (dataset_idx < 0 || chunk_count_idx < 0 || chunk_size_idx < 0) {
if (!loaded.is_legacy && !loaded.has_metadata) {
fprintf(stderr, "%s: missing imatrix metadata in file %s\n", __func__, imatrix_file.c_str());
gguf_free(ctx_gguf);
ggml_free(ctx);
exit(1);
}
const uint32_t chunk_size = gguf_get_val_u32(ctx_gguf, chunk_size_idx);
const std::string sums_suffix{ ".in_sum2" };
const std::string counts_suffix{ ".counts" };
// Using an ordered map to get a deterministic iteration order.
std::map<std::string, std::pair<struct ggml_tensor *, struct ggml_tensor *>> sums_counts_for;
for (struct ggml_tensor * cur = ggml_get_first_tensor(ctx); cur; cur = ggml_get_next_tensor(ctx, cur)) {
std::string name = cur->name;
if (name.empty()) { continue; }
if (string_remove_suffix(name, sums_suffix)) {
// in_sum2
sums_counts_for[std::move(name)].first = cur;
} else if (string_remove_suffix(name, counts_suffix)) {
// counts
sums_counts_for[std::move(name)].second = cur;
} else {
// ignore other tensors
}
}
for (const auto & sc : sums_counts_for) {
const std::string & name = sc.first;
const struct ggml_tensor * sums = sc.second.first;
const struct ggml_tensor * counts = sc.second.second;
if (!sums || !counts) {
fprintf(stderr, "%s: mismatched sums and counts for %s\n", __func__, name.c_str());
gguf_free(ctx_gguf);
ggml_free(ctx);
exit(1);
}
const int64_t ne0 = sums->ne[0];
const int64_t ne1 = sums->ne[1];
for (const auto & [name, entry] : loaded.entries) {
auto & e = imatrix_data[name];
e.resize(ggml_nelements(sums));
float max_count = 0.0f;
for (int64_t j = 0; j < ne1; ++j) {
const float count = ((const float *) counts->data)[j];
if (count > 0.0f) {
for (int64_t i = 0; i < ne0; ++i) {
e[j*ne0 + i] = ((const float *) sums->data)[j*ne0 + i] / count;
e.resize(entry.sums.size());
if (!loaded.is_legacy) {
// GGUF format: normalize by per-expert counts
const int64_t ncounts = entry.counts.size();
const int64_t ne0 = (int64_t) entry.sums.size() / ncounts;
for (int64_t j = 0; j < ncounts; ++j) {
const float count = (float) entry.counts[j];
if (count > 0.0f) {
for (int64_t i = 0; i < ne0; ++i) {
e[j*ne0 + i] = entry.sums[j*ne0 + i] / count;
}
} else {
for (int64_t i = 0; i < ne0; ++i) {
e[j*ne0 + i] = 1;
}
}
}
if (getenv("LLAMA_TRACE")) {
float max_count = 0.0f;
for (int64_t j = 0; j < ncounts; ++j) {
const float count = (float) entry.counts[j];
if (count > max_count) {
max_count = count;
}
}
printf("%s: loaded data (size = %6d, n_tokens = %6d, n_chunks = %6d) for '%s'\n",
__func__, int(e.size()), int(max_count), int(max_count / loaded.chunk_size), name.c_str());
}
} else {
// Legacy format: sums contain (raw/count)*ncall, divide by ncall
const int64_t ncall = entry.counts.empty() ? 0 : entry.counts[0];
if (ncall > 0) {
for (size_t i = 0; i < entry.sums.size(); ++i) {
e[i] = entry.sums[i] / ncall;
}
} else {
// Partial imatrix data, this tensor never got any input during calibration
for (int64_t i = 0; i < ne0; ++i) {
e[j*ne0 + i] = 1;
for (size_t i = 0; i < entry.sums.size(); ++i) {
e[i] = entry.sums[i];
}
}
if (count > max_count) {
max_count = count;
if (getenv("LLAMA_TRACE")) {
printf("%s: loaded data (size = %6d, ncall = %6d) for '%s'\n",
__func__, int(e.size()), int(ncall), name.c_str());
}
}
if (getenv("LLAMA_TRACE")) {
printf("%s: loaded data (size = %6d, n_tokens = %6d, n_chunks = %6d) for '%s'\n", __func__, int(e.size()), int(max_count), int(max_count / chunk_size), name.c_str());
}
imatrix_datasets = std::move(loaded.datasets);
if (!imatrix_datasets.empty()) {
printf("%s: imatrix datasets=['%s'", __func__, imatrix_datasets[0].c_str());
for (size_t i = 1; i < imatrix_datasets.size(); ++i) {
printf(", '%s'", imatrix_datasets[i].c_str());
}
printf("]\n");
}
int m_last_chunk = gguf_get_val_u32(ctx_gguf, chunk_count_idx);
printf("%s: loaded %d importance matrix entries from %s computed on %d chunks\n", __func__, int(imatrix_data.size()), imatrix_file.c_str(), loaded.chunk_count);
int64_t n_datasets = gguf_get_arr_n(ctx_gguf, dataset_idx);
imatrix_datasets.reserve(n_datasets);
for (int64_t i = 0; i < n_datasets; ++i) {
imatrix_datasets.push_back(gguf_get_arr_str(ctx_gguf, dataset_idx, i));
}
printf("%s: imatrix datasets=['%s'", __func__, imatrix_datasets[0].c_str());
for (size_t i = 1; i < imatrix_datasets.size(); ++i) {
printf(", '%s'", imatrix_datasets[i].c_str());
}
printf("]\n");
printf("%s: loaded %d importance matrix entries from %s computed on %d chunks\n", __func__, int(imatrix_data.size()), imatrix_file.c_str(), m_last_chunk);
gguf_free(ctx_gguf);
ggml_free(ctx);
return m_last_chunk;
return loaded.chunk_count;
}
static int prepare_imatrix(const std::string & imatrix_file,

@@ -2512,7 +2512,7 @@ private:
llama_memory_seq_pos_max(llama_get_memory(ctx_tgt), slot.id));
if (use_ckpt_dft) {
slot.spec_ckpt.update_dft(ctx_dft.get(), slot.id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY | LLAMA_STATE_SEQ_FLAGS_ON_DEVICE);
slot.spec_ckpt.update_dft(ctx_dft.get(), slot.id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY);
}
slot.spec_prompt = slot.prompt.tokens.get_text_tokens();
@@ -2551,7 +2551,7 @@ private:
if (ctx_dft) {
if (use_ckpt_dft) {
ckpt.load_dft(ctx_dft.get(), slot.id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY | LLAMA_STATE_SEQ_FLAGS_ON_DEVICE);
ckpt.load_dft(ctx_dft.get(), slot.id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY);
}
common_context_seq_rm(ctx_dft.get(), slot.id, ckpt.pos_max + 1, -1);
@@ -2568,7 +2568,7 @@ private:
if (use_ckpt_tgt) {
//const int64_t t_start = ggml_time_us();
ckpt.update_tgt(ctx_tgt, slot.id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY | LLAMA_STATE_SEQ_FLAGS_ON_DEVICE);
ckpt.update_tgt(ctx_tgt, slot.id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY);
//const int64_t t_total = ggml_time_us() - t_start;
//printf("checkpoint total: %f ms\n", t_total / 1000.0);
@@ -2580,7 +2580,7 @@ private:
}
if (use_ckpt_dft) {
ckpt.update_dft(ctx_dft.get(), slot.id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY | LLAMA_STATE_SEQ_FLAGS_ON_DEVICE);
ckpt.update_dft(ctx_dft.get(), slot.id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY);
}
}
}
@@ -3447,13 +3447,13 @@ private:
SLT_DBG(slot, "restoring speculative checkpoint (pos_min = %d, pos_max = %d, size = %zu)\n", ckpt.pos_min, ckpt.pos_max, ckpt.size());
{
ckpt.load_tgt(slot.ctx_tgt, slot.id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY | LLAMA_STATE_SEQ_FLAGS_ON_DEVICE);
ckpt.load_tgt(slot.ctx_tgt, slot.id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY);
common_context_seq_rm(slot.ctx_tgt, slot.id, ckpt.pos_max + 1, -1);
}
if (slot.ctx_dft) {
ckpt.load_dft(slot.ctx_dft, slot.id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY | LLAMA_STATE_SEQ_FLAGS_ON_DEVICE);
ckpt.load_dft(slot.ctx_dft, slot.id, LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY);
common_context_seq_rm(slot.ctx_dft, slot.id, ckpt.pos_max + 1, -1);
}

File diff suppressed because it is too large

@@ -23,75 +23,77 @@
"cleanup": "rm -rf .svelte-kit build node_modules test-results"
},
"devDependencies": {
"@chromatic-com/storybook": "^5.0.0",
"@eslint/compat": "^1.2.5",
"@eslint/js": "^9.18.0",
"@internationalized/date": "^3.10.1",
"@lucide/svelte": "^0.515.0",
"@playwright/test": "^1.49.1",
"@storybook/addon-a11y": "^10.2.4",
"@storybook/addon-docs": "^10.2.4",
"@storybook/addon-svelte-csf": "^5.0.10",
"@storybook/addon-vitest": "^10.2.4",
"@storybook/sveltekit": "^10.2.4",
"@sveltejs/adapter-static": "^3.0.10",
"@sveltejs/kit": "^2.48.4",
"@sveltejs/vite-plugin-svelte": "^6.2.1",
"@tailwindcss/forms": "^0.5.9",
"@tailwindcss/typography": "^0.5.15",
"@tailwindcss/vite": "^4.0.0",
"@chromatic-com/storybook": "5.0.0",
"@eslint/compat": "1.4.1",
"@eslint/js": "9.39.2",
"@internationalized/date": "3.10.1",
"@lucide/svelte": "0.515.0",
"@modelcontextprotocol/sdk": "1.26.0",
"@playwright/test": "1.56.1",
"@storybook/addon-a11y": "10.2.4",
"@storybook/addon-docs": "10.2.4",
"@storybook/addon-svelte-csf": "5.0.10",
"@storybook/addon-vitest": "10.2.4",
"@storybook/sveltekit": "10.2.4",
"@sveltejs/adapter-static": "3.0.10",
"@sveltejs/kit": "2.60.1",
"@sveltejs/vite-plugin-svelte": "6.2.1",
"@tailwindcss/forms": "0.5.10",
"@tailwindcss/typography": "0.5.16",
"@tailwindcss/vite": "4.1.11",
"@types/node": "^24",
"@vitest/browser": "^3.2.3",
"@vitest/coverage-v8": "^3.2.3",
"bits-ui": "^2.14.4",
"clsx": "^2.1.1",
"dexie": "^4.0.11",
"eslint": "^9.18.0",
"eslint-config-prettier": "^10.0.1",
"eslint-plugin-storybook": "^10.2.4",
"eslint-plugin-svelte": "^3.0.0",
"globals": "^16.0.0",
"http-server": "^14.1.1",
"mdast": "^3.0.0",
"mdsvex": "^0.12.3",
"playwright": "^1.56.1",
"prettier": "^3.4.2",
"prettier-plugin-svelte": "^3.3.3",
"prettier-plugin-tailwindcss": "^0.6.11",
"rehype-katex": "^7.0.1",
"remark-math": "^6.0.0",
"sass": "^1.93.3",
"storybook": "^10.2.4",
"svelte": "^5.38.2",
"svelte-check": "^4.0.0",
"tailwind-merge": "^3.3.1",
"tailwind-variants": "^3.2.2",
"tailwindcss": "^4.0.0",
"tw-animate-css": "^1.3.5",
"typescript": "^5.0.0",
"typescript-eslint": "^8.20.0",
"unified": "^11.0.5",
"uuid": "^13.0.0",
"vite": "^7.2.2",
"vite-plugin-devtools-json": "^0.2.0",
"vitest": "^3.2.3",
"vitest-browser-svelte": "^0.1.0"
"@vitest/browser": "4.1.8",
"@vitest/browser-playwright": "4.1.8",
"@vitest/coverage-v8": "4.1.8",
"bits-ui": "2.18.1",
"clsx": "2.1.1",
"dexie": "4.0.11",
"eslint": "9.39.2",
"eslint-config-prettier": "10.1.8",
"eslint-plugin-storybook": "10.2.4",
"eslint-plugin-svelte": "3.15.0",
"globals": "16.3.0",
"highlight.js": "11.11.1",
"http-server": "14.1.1",
"mdast": "3.0.0",
"mdsvex": "0.12.6",
"mermaid": "11.15.0",
"mode-watcher": "1.1.0",
"pdfjs-dist": "5.4.54",
"playwright": "1.56.1",
"prettier": "3.6.2",
"prettier-plugin-svelte": "3.4.0",
"prettier-plugin-tailwindcss": "0.6.14",
"rehype-highlight": "7.0.2",
"rehype-katex": "7.0.1",
"rehype-stringify": "10.0.1",
"remark": "15.0.1",
"remark-breaks": "4.0.0",
"remark-gfm": "4.0.1",
"remark-html": "16.0.1",
"remark-math": "6.0.0",
"remark-rehype": "11.1.2",
"sass": "1.93.3",
"storybook": "10.3.3",
"svelte": "5.55.7",
"svelte-check": "4.3.0",
"svelte-sonner": "1.0.5",
"tailwind-merge": "3.3.1",
"tailwind-variants": "3.2.2",
"tailwindcss": "4.1.11",
"tw-animate-css": "1.3.5",
"typescript": "5.8.3",
"typescript-eslint": "8.56.0",
"unified": "11.0.5",
"unist-util-visit": "5.0.0",
"uuid": "13.0.2",
"vite": "7.3.2",
"vite-plugin-devtools-json": "0.2.1",
"vitest": "4.1.8",
"vitest-browser-svelte": "2.1.1",
"zod": "4.2.1"
},
"dependencies": {
"@modelcontextprotocol/sdk": "^1.25.1",
"highlight.js": "^11.11.1",
"mermaid": "^11.15.0",
"mode-watcher": "^1.1.0",
"pdfjs-dist": "^5.4.54",
"rehype-highlight": "^7.0.2",
"rehype-stringify": "^10.0.1",
"remark": "^15.0.1",
"remark-breaks": "^4.0.0",
"remark-gfm": "^4.0.1",
"remark-html": "^16.0.1",
"remark-rehype": "^11.1.2",
"svelte-sonner": "^1.0.5",
"unist-util-visit": "^5.0.0",
"zod": "^4.2.1"
"overrides": {
"cookie": "1.1.1"
}
}

@@ -35,23 +35,27 @@
<Tooltip.Root>
<Tooltip.Trigger>
<Button
{variant}
{size}
{disabled}
onclick={(e: MouseEvent) => {
if (stopPropagationOnClick) e.stopPropagation();
<!-- prevent another nested button element -->
{#snippet child({ props })}
<Button
{...props}
{variant}
{size}
{disabled}
onclick={(e: MouseEvent) => {
if (stopPropagationOnClick) e.stopPropagation();
onclick?.(e);
}}
class="h-6 w-6 p-0 {className} flex hover:bg-transparent data-[state=open]:bg-transparent!"
aria-label={ariaLabel || tooltip}
>
{#if icon}
{@const IconComponent = icon}
<IconComponent class={iconSize} />
{/if}
</Button>
onclick?.(e);
}}
class="h-6 w-6 p-0 {className} flex hover:bg-transparent data-[state=open]:bg-transparent!"
aria-label={ariaLabel || tooltip}
>
{#if icon}
{@const IconComponent = icon}
<IconComponent class={iconSize} />
{/if}
</Button>
{/snippet}
</Tooltip.Trigger>
<Tooltip.Content side={tooltipSide}>

@@ -1,22 +1,22 @@
<script lang="ts">
import type { Snippet } from 'svelte';
import type { HTMLButtonAttributes } from 'svelte/elements';
interface Props {
interface Props extends HTMLButtonAttributes {
children: Snippet;
class?: string;
icon?: Snippet;
onclick?: () => void;
}
let { children, class: className = '', icon, onclick }: Props = $props();
let { children, class: className = '', icon, ...rest }: Props = $props();
</script>
<button
{...rest}
class={[
'inline-flex cursor-pointer items-center gap-1 rounded-sm bg-muted-foreground/15 px-1.5 py-0.75',
className
]}
{onclick}
>
{#if icon}
{@render icon()}

@@ -97,7 +97,9 @@
{/snippet}
{#snippet removeButton()}
<div class="absolute top-2 right-2 opacity-0 transition-opacity group-hover:opacity-100">
<div
class="absolute top-2 right-2 opacity-0 transition-opacity group-focus-within:opacity-100 group-hover:opacity-100"
>
<ActionIcon icon={X} tooltip="Remove" stopPropagationOnClick onclick={() => onRemove?.(id)} />
</div>
{/snippet}

@@ -51,7 +51,7 @@
{#if !readonly}
<div
class="absolute top-1 right-1 flex items-center justify-center opacity-0 transition-opacity group-hover:opacity-100"
class="absolute top-1 right-1 flex items-center justify-center opacity-0 transition-opacity group-focus-within:opacity-100 group-hover:opacity-100"
>
<ActionIcon
class="text-white"

@@ -31,7 +31,8 @@
agenticPendingPermissionRequest,
agenticResolvePermission,
agenticPendingContinueRequest,
agenticResolveContinue
agenticResolveContinue,
agenticLastError
} from '$lib/stores/agentic.svelte';
import { config } from '$lib/stores/settings.svelte';
@@ -56,6 +57,10 @@
const showToolCallInProgress = $derived(config().showToolCallInProgress as boolean);
const showThoughtInProgress = $derived(config().showThoughtInProgress as boolean);
const hasReasoningError = $derived(
isLastAssistantMessage ? !!agenticLastError(message.convId) : false
);
let permissionDismissed = $state(false);
const pendingPermission = $derived(
@@ -293,11 +298,21 @@
</div>
</CollapsibleContentBlock>
{:else if section.type === AgenticSectionType.REASONING}
{@const reasoningSubtitle = section.wasInterrupted
? hasReasoningError
? 'Error'
: 'Cancelled'
: isStreaming
? ''
: undefined}
<CollapsibleContentBlock
open={isExpanded(index, section)}
class="my-2"
icon={Brain}
title="Reasoning"
subtitle={reasoningSubtitle}
rawContent={section.content}
onToggle={() => toggleExpanded(index, section)}
>
<div class="pt-3">
@@ -308,7 +323,7 @@
</CollapsibleContentBlock>
{:else if section.type === AgenticSectionType.REASONING_PENDING}
{@const reasoningTitle = isStreaming ? 'Reasoning...' : 'Reasoning'}
{@const reasoningSubtitle = isStreaming ? '' : 'incomplete'}
{@const reasoningSubtitle = isStreaming ? '' : hasReasoningError ? 'Error' : 'Cancelled'}
<CollapsibleContentBlock
open={isExpanded(index, section)}
@@ -316,6 +331,7 @@
icon={Brain}
title={reasoningTitle}
subtitle={reasoningSubtitle}
rawContent={section.content}
{isStreaming}
onToggle={() => toggleExpanded(index, section)}
>

@@ -6,6 +6,7 @@
import type { ChatMessageAgenticTimings } from '$lib/types/chat';
import { formatPerformanceTime } from '$lib/utils';
import { MS_PER_SECOND, DEFAULT_PERFORMANCE_TIME } from '$lib/constants';
import type { Component } from 'svelte';
interface Props {
predictedTokens?: number;
@@ -114,101 +115,79 @@
let formattedAgenticTotalTime = $derived(formatPerformanceTime(agenticTotalTimeMs));
</script>
{#snippet viewButton(opts: {
view: ChatMessageStatsView;
icon: Component;
label: string;
tooltipText: string;
disabled?: boolean;
})}
{@const IconComponent = opts.icon}
<Tooltip.Root>
<Tooltip.Trigger>
<!-- prevent another nested button element -->
{#snippet child({ props })}
<button
{...props}
type="button"
class="inline-flex h-5 w-5 items-center justify-center rounded-sm transition-colors {activeView ===
opts.view
? 'bg-background text-foreground shadow-sm'
: opts.disabled
? 'cursor-not-allowed opacity-40'
: 'hover:text-foreground'}"
onclick={() => !opts.disabled && (activeView = opts.view)}
disabled={opts.disabled}
>
<IconComponent class="h-3 w-3" />
<span class="sr-only">{opts.label}</span>
</button>
{/snippet}
</Tooltip.Trigger>
<Tooltip.Content>
<p>{opts.tooltipText}</p>
</Tooltip.Content>
</Tooltip.Root>
{/snippet}
<div class="inline-flex items-center text-xs text-muted-foreground">
<div class="inline-flex items-center rounded-sm bg-muted-foreground/15 p-0.5">
{#if hasPromptStats || isLive}
<Tooltip.Root>
<Tooltip.Trigger>
<button
type="button"
class="inline-flex h-5 w-5 items-center justify-center rounded-sm transition-colors {activeView ===
ChatMessageStatsView.READING
? 'bg-background text-foreground shadow-sm'
: 'hover:text-foreground'}"
onclick={() => (activeView = ChatMessageStatsView.READING)}
>
<BookOpenText class="h-3 w-3" />
<span class="sr-only">Reading</span>
</button>
</Tooltip.Trigger>
<Tooltip.Content>
<p>Reading (prompt processing)</p>
</Tooltip.Content>
</Tooltip.Root>
{@render viewButton({
view: ChatMessageStatsView.READING,
icon: BookOpenText,
label: 'Reading',
tooltipText: 'Reading (prompt processing)'
})}
{/if}
<Tooltip.Root>
<Tooltip.Trigger>
<button
type="button"
class="inline-flex h-5 w-5 items-center justify-center rounded-sm transition-colors {activeView ===
ChatMessageStatsView.GENERATION
? 'bg-background text-foreground shadow-sm'
: isGenerationDisabled
? 'cursor-not-allowed opacity-40'
: 'hover:text-foreground'}"
onclick={() => !isGenerationDisabled && (activeView = ChatMessageStatsView.GENERATION)}
disabled={isGenerationDisabled}
>
<Sparkles class="h-3 w-3" />
<span class="sr-only">Generation</span>
</button>
</Tooltip.Trigger>
<Tooltip.Content>
<p>
{isGenerationDisabled
? 'Generation (waiting for tokens...)'
: 'Generation (token output)'}
</p>
</Tooltip.Content>
</Tooltip.Root>
{@render viewButton({
view: ChatMessageStatsView.GENERATION,
icon: Sparkles,
label: 'Generation',
tooltipText: isGenerationDisabled
? 'Generation (waiting for tokens...)'
: 'Generation (token output)',
disabled: isGenerationDisabled
})}
{#if hasAgenticStats}
<Tooltip.Root>
<Tooltip.Trigger>
<button
type="button"
class="inline-flex h-5 w-5 items-center justify-center rounded-sm transition-colors {activeView ===
ChatMessageStatsView.TOOLS
? 'bg-background text-foreground shadow-sm'
: 'hover:text-foreground'}"
onclick={() => (activeView = ChatMessageStatsView.TOOLS)}
>
<Wrench class="h-3 w-3" />
<span class="sr-only">Tools</span>
</button>
</Tooltip.Trigger>
<Tooltip.Content>
<p>Tool calls</p>
</Tooltip.Content>
</Tooltip.Root>
{@render viewButton({
view: ChatMessageStatsView.TOOLS,
icon: Wrench,
label: 'Tools',
tooltipText: 'Tool calls'
})}
{#if !hideSummary}
<Tooltip.Root>
<Tooltip.Trigger>
<button
type="button"
class="inline-flex h-5 w-5 items-center justify-center rounded-sm transition-colors {activeView ===
ChatMessageStatsView.SUMMARY
? 'bg-background text-foreground shadow-sm'
: 'hover:text-foreground'}"
onclick={() => (activeView = ChatMessageStatsView.SUMMARY)}
>
<Layers class="h-3 w-3" />
<span class="sr-only">Summary</span>
</button>
</Tooltip.Trigger>
<Tooltip.Content>
<p>Agentic summary</p>
</Tooltip.Content>
</Tooltip.Root>
{@render viewButton({
view: ChatMessageStatsView.SUMMARY,
icon: Layers,
label: 'Summary',
tooltipText: 'Agentic summary'
})}
{/if}
{/if}
</div>

@@ -21,13 +21,16 @@
{#if tooltipLabel}
<Tooltip.Root>
<Tooltip.Trigger>
<BadgeInfo class={className} onclick={handleClick}>
{#snippet icon()}
<IconComponent class="h-3 w-3" />
{/snippet}
<!-- prevent another nested button element -->
{#snippet child({ props })}
<BadgeInfo {...props} class={className} onclick={handleClick}>
{#snippet icon()}
<IconComponent class="h-3 w-3" />
{/snippet}
{value}
</BadgeInfo>
{value}
</BadgeInfo>
{/snippet}
</Tooltip.Trigger>
<Tooltip.Content>
<p>{tooltipLabel}</p>

@@ -41,16 +41,13 @@
});
</script>
<div
class="pointer-events-{show
? 'auto'
: 'none'} relative z-50 mx-auto mb-4 flex max-w-[48rem] justify-center"
>
<div class="relative z-50 mx-auto mb-4 flex max-w-[48rem] justify-center">
<Button
onclick={scrollToBottom}
variant="secondary"
size="icon"
class="pointer-events-all absolute h-10 w-10 rounded-full bg-background/80 shadow-lg backdrop-blur-sm transition-all duration-200 hover:bg-muted/80"
disabled={!show}
class="pointer-events-auto absolute h-10 w-10 rounded-full bg-background/80 shadow-lg backdrop-blur-sm transition-all duration-200 hover:bg-muted/80"
style="bottom: {buttonBottom}; transform: translateY({show ? '0' : '2rem'}); opacity: {show
? 1
: 0};"

@@ -4,6 +4,9 @@
import { buttonVariants } from '$lib/components/ui/button/index.js';
import { Card } from '$lib/components/ui/card';
import { createAutoScrollController } from '$lib/hooks/use-auto-scroll.svelte';
import { useThrottle } from '$lib/hooks/use-throttle.svelte';
import { formatReasoningPreview } from '$lib/utils';
import { config } from '$lib/stores/settings.svelte';
import type { Snippet } from 'svelte';
import type { Component } from 'svelte';
@@ -14,6 +17,8 @@
iconClass?: string;
title: string;
subtitle?: string;
preview?: string;
rawContent?: string;
isStreaming?: boolean;
onToggle?: () => void;
children: Snippet;
@@ -26,6 +31,8 @@
iconClass = 'h-4 w-4',
title,
subtitle,
preview,
rawContent,
isStreaming = false,
onToggle,
children
@@ -33,6 +40,20 @@
let contentContainer: HTMLDivElement | undefined = $state();
const showThoughtInProgress = $derived(config().showThoughtInProgress as boolean);
let previewKey = useThrottle(() => rawContent ?? preview ?? '', 500);
let displayedPreview = $state('');
let displayedOverflow = $state(0);
$effect(() => {
void previewKey.key;
const content = rawContent ?? preview ?? '';
const result = formatReasoningPreview(content);
displayedPreview = result.preview;
displayedOverflow = result.overflow;
});
const autoScroll = createAutoScrollController();
$effect(() => {
@@ -58,16 +79,31 @@
class={className}
>
<Card class="gap-0 border-muted bg-muted/30 py-0">
<Collapsible.Trigger class="flex w-full cursor-pointer items-center justify-between p-3">
<div class="flex items-center gap-2 text-muted-foreground">
{#if IconComponent}
<IconComponent class={iconClass} />
{/if}
<Collapsible.Trigger class="flex w-full cursor-pointer items-start justify-between gap-2 p-3">
<div class="flex min-w-0 items-center gap-2">
<div class="flex items-center gap-2 text-muted-foreground">
{#if IconComponent}
<IconComponent class={iconClass} />
{/if}
<span class="font-mono text-sm font-medium">{title}</span>
<span class="font-mono text-sm font-medium">{title}</span>
{#if subtitle}
<span class="text-xs italic">{subtitle}</span>
{#if subtitle}
<span class="text-xs italic">{subtitle}</span>
{/if}
</div>
{#if displayedPreview && !showThoughtInProgress}
<div class="flex min-w-0 items-baseline justify-between gap-2">
<div class="w-3/4 truncate text-xs text-muted-foreground/80">
{displayedPreview}
</div>
{#if displayedOverflow > 0}
<span class="shrink-0 text-xs text-muted-foreground/60"
>{displayedOverflow}+ chars</span
>
{/if}
</div>
{/if}
</div>

@@ -55,20 +55,20 @@
}
$effect(() => {
if (scrollContainer) {
setTimeout(() => {
updateScrollButtons();
}, 0);
}
if (!scrollContainer) return;
const observer = new ResizeObserver(() => updateScrollButtons());
observer.observe(scrollContainer);
return () => observer.disconnect();
});
</script>
<div class="relative {className}">
<button
class="absolute top-1/2 left-4 z-10 flex h-6 w-6 -translate-y-1/2 items-center justify-center rounded-full bg-background/25 shadow-md backdrop-blur-xs transition-opacity hover:bg-background/45 {canScrollLeft
? 'opacity-100'
: 'pointer-events-none opacity-0'}"
class="absolute top-1/2 left-4 z-10 flex h-6 w-6 -translate-y-1/2 items-center justify-center rounded-full bg-background/25 shadow-md backdrop-blur-xs transition-opacity hover:bg-background/45 disabled:pointer-events-none disabled:opacity-0"
onclick={scrollLeft}
disabled={!canScrollLeft}
aria-label="Scroll left"
>
<ChevronLeft class="h-4 w-4" />
@@ -83,10 +83,9 @@
</div>
<button
class="absolute top-1/2 right-4 z-10 flex h-6 w-6 -translate-y-1/2 items-center justify-center rounded-full bg-background/25 shadow-md backdrop-blur-xs transition-opacity hover:bg-background/45 {canScrollRight
? 'opacity-100'
: 'pointer-events-none opacity-0'}"
class="absolute top-1/2 right-4 z-10 flex h-6 w-6 -translate-y-1/2 items-center justify-center rounded-full bg-background/25 shadow-md backdrop-blur-xs transition-opacity hover:bg-background/45 disabled:pointer-events-none disabled:opacity-0"
onclick={scrollRight}
disabled={!canScrollRight}
aria-label="Scroll right"
>
<ChevronRight class="h-4 w-4" />

@@ -27,8 +27,8 @@
let shouldShow = $derived(model && (modelProp !== undefined || isModelMode));
</script>
{#snippet badgeContent()}
<BadgeInfo class={className} {onclick}>
{#snippet badgeContent(triggerProps?: Record<string, unknown>)}
<BadgeInfo {...triggerProps ?? {}} class={className} {onclick}>
{#snippet icon()}
<Package class="h-3 w-3" />
{/snippet}
@@ -47,7 +47,10 @@
{#if showTooltip}
<Tooltip.Root>
<Tooltip.Trigger>
{@render badgeContent()}
<!-- prevent another nested button element -->
{#snippet child({ props })}
{@render badgeContent(props)}
{/snippet}
</Tooltip.Trigger>
<Tooltip.Content>

@@ -116,52 +116,54 @@
{#if ms.isRouter}
<DropdownMenu.Root bind:open={isOpen} onOpenChange={ms.handleOpenChange}>
<DropdownMenu.Trigger
class={[
`inline-flex cursor-pointer items-center gap-1.5 rounded-sm bg-background px-1.5 py-1 text-xs shadow-sm transition hover:bg-muted-foreground/20 focus:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:cursor-not-allowed disabled:opacity-60 dark:bg-muted-foreground/15 dark:text-secondary-foreground`,
!ms.isCurrentModelInCache
? 'bg-red-400/10 !text-red-400 hover:bg-red-400/20 hover:text-red-400'
: forceForegroundText
? 'text-foreground'
: ms.isHighlightedCurrentModelActive
? 'text-foreground'
: 'text-foreground',
isOpen && 'text-foreground',
'max-w-[min(calc(100vw-4rem) md:max-w-[min(calc(100cqw-9rem),25rem)]'
]}
disabled={disabled || ms.updating}
>
<Package class="h-3.5 w-3.5 shrink-0" />
<Tooltip.Root>
<Tooltip.Trigger>
<!-- prevent another nested button element -->
{#snippet child({ props })}
<DropdownMenu.Trigger
{...props}
class={[
`inline-grid cursor-pointer grid-cols-[1fr_auto_1fr] items-center gap-1.5 rounded-sm bg-background px-1.5 py-1 text-xs shadow-sm transition hover:bg-muted-foreground/20 focus:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:cursor-not-allowed disabled:opacity-60 dark:bg-muted-foreground/15 dark:text-secondary-foreground`,
!ms.isCurrentModelInCache
? 'bg-red-400/10 !text-red-400 hover:bg-red-400/20 hover:text-red-400'
: forceForegroundText
? 'text-foreground'
: ms.isHighlightedCurrentModelActive
? 'text-foreground'
: 'text-foreground',
isOpen && 'text-foreground',
'max-w-[min(calc(100vw-4rem) md:max-w-[min(calc(100cqw-9rem),25rem)]'
]}
disabled={disabled || ms.updating}
>
<Package class="h-3.5 w-3.5 shrink-0" />
{#if selectedOption}
<Tooltip.Root>
<Tooltip.Trigger>
<!-- prevent another nested button element -->
{#snippet child({ props })}
{#if selectedOption}
<ModelId
modelId={selectedOption.model}
class="min-w-0 overflow-hidden"
hideOrgName={false}
hideQuantization
{...props}
/>
{/snippet}
</Tooltip.Trigger>
{:else}
<span class="min-w-0 font-medium">Select model</span>
{/if}
<Tooltip.Content>
<p class="font-mono">{selectedOption.model}</p>
</Tooltip.Content>
</Tooltip.Root>
{:else}
<span class="min-w-0 font-medium">Select model</span>
{/if}
{#if ms.updating || ms.isLoadingModel}
<Loader2 class="h-3 w-3.5 shrink-0 animate-spin" />
{:else}
<ChevronDown class="h-3 w-3.5 shrink-0" />
{/if}
</DropdownMenu.Trigger>
{/snippet}
</Tooltip.Trigger>
{#if ms.updating || ms.isLoadingModel}
<Loader2 class="h-3 w-3.5 shrink-0 animate-spin" />
{:else}
<ChevronDown class="h-3 w-3.5 shrink-0" />
{#if selectedOption}
<Tooltip.Content>
<p class="font-mono">{selectedOption.model}</p>
</Tooltip.Content>
{/if}
</DropdownMenu.Trigger>
</Tooltip.Root>
<DropdownMenu.Content
align="end"
@@ -234,49 +236,51 @@
</DropdownMenu.Content>
</DropdownMenu.Root>
{:else}
<button
class={[
`inline-flex cursor-pointer items-center gap-1.5 rounded-sm bg-background px-1.5 py-1 text-xs shadow-sm transition hover:bg-muted-foreground/20 focus:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:cursor-not-allowed disabled:opacity-60 dark:bg-muted-foreground/15 dark:text-secondary-foreground`,
!ms.isCurrentModelInCache
? 'bg-red-400/10 !text-red-400 hover:bg-red-400/20 hover:text-red-400'
: forceForegroundText
? 'text-foreground'
: ms.isHighlightedCurrentModelActive
? 'text-foreground'
: 'text-foreground',
isOpen && 'text-foreground'
]}
style="max-width: min(calc(100cqw - 6.5rem), 32rem)"
onclick={() => ms.handleOpenChange(true)}
disabled={disabled || ms.updating}
>
<Package class="h-3.5 w-3.5 shrink-0" />
<Tooltip.Root>
<Tooltip.Trigger>
<!-- prevent another nested button element -->
{#snippet child({ props })}
<button
{...props}
class={[
`inline-flex cursor-pointer items-center gap-1.5 rounded-sm bg-background px-1.5 py-1 text-xs shadow-sm transition hover:bg-muted-foreground/20 focus:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:cursor-not-allowed disabled:opacity-60 dark:bg-muted-foreground/15 dark:text-secondary-foreground`,
!ms.isCurrentModelInCache
? 'bg-red-400/10 !text-red-400 hover:bg-red-400/20 hover:text-red-400'
: forceForegroundText
? 'text-foreground'
: ms.isHighlightedCurrentModelActive
? 'text-foreground'
: 'text-foreground',
isOpen && 'text-foreground'
]}
style="max-width: min(calc(100cqw - 6.5rem), 32rem)"
onclick={() => ms.handleOpenChange(true)}
disabled={disabled || ms.updating}
>
<Package class="h-3.5 w-3.5 shrink-0" />
{#if selectedOption}
<Tooltip.Root>
<Tooltip.Trigger>
<!-- prevent another nested button element -->
{#snippet child({ props })}
{#if selectedOption}
<ModelId
modelId={selectedOption.model}
class="min-w-0 overflow-hidden"
hideOrgName={false}
hideQuantization
{...props}
/>
{/snippet}
</Tooltip.Trigger>
{/if}
<Tooltip.Content>
<p class="font-mono">{selectedOption.model}</p>
</Tooltip.Content>
</Tooltip.Root>
{/if}
{#if ms.updating}
<Loader2 class="h-3 w-3.5 shrink-0 animate-spin" />
{/if}
</button>
{/snippet}
</Tooltip.Trigger>
{#if ms.updating}
<Loader2 class="h-3 w-3.5 shrink-0 animate-spin" />
{#if selectedOption}
<Tooltip.Content>
<p class="font-mono">{selectedOption.model}</p>
</Tooltip.Content>
{/if}
</button>
</Tooltip.Root>
{/if}
{/if}
</div>

@@ -34,24 +34,28 @@
</script>
<DropdownMenu.Root bind:open>
<DropdownMenu.Trigger
class="flex h-6 w-6 cursor-pointer items-center justify-center rounded-md p-0 text-sm font-medium transition-colors hover:bg-accent hover:text-accent-foreground focus:bg-accent focus:text-accent-foreground focus:outline-none disabled:pointer-events-none disabled:opacity-50 data-[state=open]:bg-accent data-[state=open]:text-accent-foreground {triggerClass}"
onclick={(e) => e.stopPropagation()}
>
{#if triggerTooltip}
<Tooltip.Root>
<Tooltip.Trigger>
<Tooltip.Root>
<Tooltip.Trigger>
<!-- prevent another nested button element -->
{#snippet child({ props })}
<DropdownMenu.Trigger
{...props}
class="flex h-6 w-6 cursor-pointer items-center justify-center rounded-md p-0 text-sm font-medium transition-colors hover:bg-accent hover:text-accent-foreground focus:bg-accent focus:text-accent-foreground focus:outline-none disabled:pointer-events-none disabled:opacity-50 data-[state=open]:bg-accent data-[state=open]:text-accent-foreground {triggerClass}"
onclick={(e) => e.stopPropagation()}
>
{@render iconComponent(triggerIcon, 'h-3 w-3')}
<span class="sr-only">{triggerTooltip}</span>
</Tooltip.Trigger>
<Tooltip.Content>
<p>{triggerTooltip}</p>
</Tooltip.Content>
</Tooltip.Root>
{:else}
{@render iconComponent(triggerIcon, 'h-3 w-3')}
{#if triggerTooltip}
<span class="sr-only">{triggerTooltip}</span>
{/if}
</DropdownMenu.Trigger>
{/snippet}
</Tooltip.Trigger>
{#if triggerTooltip}
<Tooltip.Content>
<p>{triggerTooltip}</p>
</Tooltip.Content>
{/if}
</DropdownMenu.Trigger>
</Tooltip.Root>
<DropdownMenu.Content {align} class="z-[999999] w-48">
{#each actions as action, index (action.label)}

@@ -105,6 +105,12 @@
onclick={handleSelect}
onmouseover={handleMouseOver}
onmouseleave={handleMouseLeave}
onfocusin={handleMouseOver}
onfocusout={(e) => {
if (!e.currentTarget.contains(e.relatedTarget as Node | null)) {
handleMouseLeave();
}
}}
>
<div
class="flex min-w-0 flex-1 items-center gap-2"
@@ -113,12 +119,16 @@
{#if depth > 0}
<Tooltip.Root>
<Tooltip.Trigger>
<a
href={RouterService.chat(conversation.forkedFromConversationId)}
class="flex shrink-0 items-center text-muted-foreground transition-colors hover:text-foreground"
>
<GitBranch class="h-3.5 w-3.5" />
</a>
<!-- prevent another nested button element -->
{#snippet child({ props })}
<a
{...props}
href={RouterService.chat(conversation.forkedFromConversationId)}
class="flex shrink-0 items-center text-muted-foreground transition-colors hover:text-foreground"
>
<GitBranch class="h-3.5 w-3.5" />
</a>
{/snippet}
</Tooltip.Trigger>
<Tooltip.Content>
@@ -195,7 +205,8 @@
opacity: 0;
}
&:is(:hover) :global([data-slot='dropdown-menu-trigger']) {
&:is(:hover) :global([data-slot='dropdown-menu-trigger']),
&:focus-within :global([data-slot='dropdown-menu-trigger']) {
opacity: 1;
}
@media (max-width: 768px) {

@@ -6,3 +6,30 @@ export const MEDIUM_DURATION_THRESHOLD = 10;
/** Default display value when no performance time is available */
export const DEFAULT_PERFORMANCE_TIME = '0s';
/** Max length before reasoning preview is truncated */
export const MAX_PREVIEW_LENGTH = 120;
export const STRIP_MARKDOWN_CAPTURE_PATTERNS: [RegExp, string][] = [
[/^```(.*)/gm, '$1'],
[/(.*)```$/gm, '$1'],
[/`([^`]*)`/g, '$1'],
[/\*\*(.*?)\*\*/g, '$1'],
[/__(.*?)__/g, '$1'],
[/\*(.*?)\*/g, '$1'],
[/_(.*?)_/g, '$1']
];
/* eslint-disable no-misleading-character-class */
export const STRIP_MARKDOWN_INLINE_REGEX = new RegExp(
[
'<[^>]*>',
'^>\\s*',
'^#{1,6}\\s+',
'^[\\s]*[-*+]\\s+',
'^[\\s]*\\d+[.)]\\s+',
'[\\u{1F600}-\\u{1F64F}\\u{1F300}-\\u{1F5FF}\\u{1F680}-\\u{1F6FF}\\u{1F1E0}-\\u{1F1FF}\\u{2600}-\\u{26FF}\\u{2700}-\\u{27BF}\\u{FE00}-\\u{FE0F}\\u{1F900}-\\u{1F9FF}\\u{1FA00}-\\u{1FA6F}\\u{1FA70}-\\u{1FAFF}\\u{200D}\\u{20E3}\\u{231A}-\\u{231B}\\u{23E9}-\\u{23F3}\\u{23F8}-\\u{23FA}\\u{25AA}-\\u{25AB}\\u{25B6}\\u{25C0}\\u{25FB}-\\u{25FE}\\u{2934}-\\u{2935}\\u{2B05}-\\u{2B07}\\u{2B1B}-\\u{2B1C}\\u{2B50}\\u{2B55}\\u{3030}\\u{303D}\\u{3297}\\u{3299}]'
].join('|'),
'gmu'
);
/* eslint-enable no-misleading-character-class */

@@ -0,0 +1,32 @@
/**
* Creates a reactive throttle key that increments when `getValue()` changes
* and the throttle window has elapsed since the last increment.
*
* Useful for throttling animations that should not fire on every rapid update.
*
* @param getValue - A reactive getter for the value to watch
* @param ms - Throttle window in milliseconds
* @returns A reactive number that increments when the throttled value changes
*/
export function useThrottle(getValue: () => string | undefined, ms: number) {
let key = $state(0);
let throttleEnd = $state(0);
let lastValue: string | undefined = getValue();
$effect(() => {
const value = getValue();
if (value === lastValue) return;
const now = Date.now();
if (now >= throttleEnd) {
lastValue = value;
key++;
throttleEnd = now + ms;
}
});
return {
get key() {
return key;
}
};
}
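
Review note: the gating logic in `useThrottle` can be read in isolation from the Svelte runes. The following is a minimal, framework-free sketch of that logic only, not the code in this change. `createThrottleGate`, its `update` method, and the injectable `now` clock are hypothetical names introduced here so the behavior can be exercised outside a component.

```typescript
// Hypothetical, framework-free sketch of the gate used by useThrottle.
// The names createThrottleGate/update/now are illustrative assumptions.
function createThrottleGate(
	initial: string | undefined,
	ms: number,
	now: () => number = Date.now
) {
	let key = 0;
	let throttleEnd = 0;
	let lastValue = initial;

	return {
		// Feed the latest value; returns the current key.
		update(value: string | undefined): number {
			if (value !== lastValue) {
				const t = now();
				// Only accept the change once the throttle window has elapsed.
				if (t >= throttleEnd) {
					lastValue = value;
					key++;
					throttleEnd = t + ms;
				}
			}
			return key;
		}
	};
}

// Usage with a fake clock so the timing is deterministic.
let fakeTime = 1000;
const gate = createThrottleGate('a', 100, () => fakeTime);
gate.update('b'); // accepted: key becomes 1, window open until 1100
fakeTime = 1050;
gate.update('c'); // inside the window: key stays 1
fakeTime = 1100;
gate.update('c'); // window elapsed: key becomes 2
```

As in the original, a change that arrives inside the window is not recorded as `lastValue`, so it is picked up by the next update that occurs after the window has elapsed.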

@@ -18,6 +18,7 @@ export interface AgenticSection {
toolArgs?: string;
toolResult?: string;
toolResultExtras?: DatabaseMessageExtra[];
wasInterrupted?: boolean;
}
/**
@@ -51,7 +52,8 @@ function deriveSingleTurnSections(
const isPending = isStreaming && !hasContentAfterReasoning;
sections.push({
type: isPending ? AgenticSectionType.REASONING_PENDING : AgenticSectionType.REASONING,
content: message.reasoningContent
content: message.reasoningContent,
wasInterrupted: !isStreaming && !hasContentAfterReasoning
});
}
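
Review note: the new `wasInterrupted` flag and the existing `isPending` flag are derived from the same two booleans, so they can never both be true. A small sketch of that relationship, using a hypothetical `reasoningFlags` helper that is not part of this change:

```typescript
// Hypothetical helper mirroring the two expressions in the hunk above.
function reasoningFlags(isStreaming: boolean, hasContentAfterReasoning: boolean) {
	return {
		// Still streaming and nothing has followed the reasoning yet.
		isPending: isStreaming && !hasContentAfterReasoning,
		// Streaming stopped before any content followed the reasoning.
		wasInterrupted: !isStreaming && !hasContentAfterReasoning
	};
}

// Enumerate all four input combinations.
for (const isStreaming of [true, false]) {
	for (const hasContent of [true, false]) {
		console.log({ isStreaming, hasContent, ...reasoningFlags(isStreaming, hasContent) });
	}
}
```

Once content follows the reasoning, both flags are false regardless of the streaming state.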

@@ -3,7 +3,11 @@ import {
SECONDS_PER_MINUTE,
SECONDS_PER_HOUR,
SHORT_DURATION_THRESHOLD,
MEDIUM_DURATION_THRESHOLD
MEDIUM_DURATION_THRESHOLD,
MAX_PREVIEW_LENGTH,
STRIP_MARKDOWN_INLINE_REGEX,
STRIP_MARKDOWN_CAPTURE_PATTERNS,
NEWLINE_SEPARATOR
} from '$lib/constants';
/**
@@ -151,3 +155,33 @@ export function formatAttachmentText(
const header = extra ? `${name} (${extra})` : name;
return `\n\n--- ${label}: ${header} ---\n${content}`;
}
export function formatReasoningPreview(content: string): { preview: string; overflow: number } {
if (!content) return { preview: '', overflow: 0 };
const lines = content.split(NEWLINE_SEPARATOR);
let lastLine = '';
for (let i = lines.length - 1; i >= 0; i--) {
let cleaned = lines[i].trim();
if (!cleaned) continue;
cleaned = cleaned.replace(STRIP_MARKDOWN_INLINE_REGEX, '');
for (const [pattern, replacement] of STRIP_MARKDOWN_CAPTURE_PATTERNS) {
cleaned = cleaned.replace(pattern, replacement);
}
if (cleaned.length > 0) {
lastLine = cleaned;
break;
}
}
const fullLength = lastLine.length;
const overflow = Math.max(0, fullLength - MAX_PREVIEW_LENGTH);
if (fullLength > MAX_PREVIEW_LENGTH) {
lastLine = lastLine.slice(0, MAX_PREVIEW_LENGTH) + '...';
}
return { preview: lastLine, overflow };
}
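
Review note: a self-contained sketch of what `formatReasoningPreview` does to a typical reasoning string. It is an approximation under stated assumptions: `NEWLINE_SEPARATOR` is not shown in this diff and is assumed to be `'\n'`, and `INLINE_SUBSET` and `CAPTURE_SUBSET` are reduced, hypothetical stand-ins for `STRIP_MARKDOWN_INLINE_REGEX` and `STRIP_MARKDOWN_CAPTURE_PATTERNS`. `previewSketch` is likewise an illustrative name.

```typescript
// Illustrative stand-ins; the real constants are imported from '$lib/constants'.
const MAX_PREVIEW_LENGTH = 120;
const NEWLINE_SEPARATOR = '\n'; // assumption: value not shown in this diff
// Reduced subset of the inline regex: blockquote, heading and bullet markers only.
const INLINE_SUBSET = /^>\s*|^#{1,6}\s+|^[\s]*[-*+]\s+/gm;
// Reduced subset of the capture patterns: inline code and bold only.
const CAPTURE_SUBSET: [RegExp, string][] = [
	[/`([^`]*)`/g, '$1'],
	[/\*\*(.*?)\*\*/g, '$1']
];

function previewSketch(content: string): { preview: string; overflow: number } {
	if (!content) return { preview: '', overflow: 0 };
	const lines = content.split(NEWLINE_SEPARATOR);
	let lastLine = '';
	// Walk backwards to find the last line that still has text after stripping.
	for (let i = lines.length - 1; i >= 0; i--) {
		let cleaned = lines[i].trim();
		if (!cleaned) continue;
		cleaned = cleaned.replace(INLINE_SUBSET, '');
		for (const [pattern, replacement] of CAPTURE_SUBSET) {
			cleaned = cleaned.replace(pattern, replacement);
		}
		if (cleaned.length > 0) {
			lastLine = cleaned;
			break;
		}
	}
	// overflow counts the characters cut off beyond the preview limit.
	const overflow = Math.max(0, lastLine.length - MAX_PREVIEW_LENGTH);
	if (overflow > 0) lastLine = lastLine.slice(0, MAX_PREVIEW_LENGTH) + '...';
	return { preview: lastLine, overflow };
}

console.log(previewSketch('First thought\n\n- **Check** the `cache`\n'));
// → { preview: 'Check the cache', overflow: 0 }
```

Only the last non-empty line contributes to the preview, so earlier reasoning text is dropped rather than summarized.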

@@ -76,7 +76,8 @@ export {
formatJsonPretty,
formatTime,
formatPerformanceTime,
formatAttachmentText
formatAttachmentText,
formatReasoningPreview
} from './formatters';
// IME utilities

@@ -58,10 +58,12 @@
name="Default"
play={async () => {
const { conversationsStore } = await import('$lib/stores/conversations.svelte');
waitFor(() => setTimeout(() => {
conversationsStore.conversations = mockConversations;
}, 0));
waitFor(() =>
setTimeout(() => {
conversationsStore.conversations = mockConversations;
}, 0)
);
}}
>
<Sidebar.Provider bind:open={sidebarOpen}>
@@ -76,11 +78,13 @@
name="SearchActive"
play={async ({ userEvent }) => {
const { conversationsStore } = await import('$lib/stores/conversations.svelte');
waitFor(() => setTimeout(() => {
conversationsStore.conversations = mockConversations;
}, 0));
waitFor(() =>
setTimeout(() => {
conversationsStore.conversations = mockConversations;
}, 0)
);
const searchTrigger = screen.getByText('Search');
userEvent.click(searchTrigger);
}}

@@ -0,0 +1,34 @@
<script module lang="ts">
import { defineMeta } from '@storybook/addon-svelte-csf';
import { Copy } from '@lucide/svelte';
import ActionIcon from '$lib/components/app/actions/ActionIcon.svelte';
import { expect } from 'storybook/test';
const { Story } = defineMeta({
title: 'Components/ActionIcon/Accessibility',
component: ActionIcon,
parameters: {
layout: 'centered'
},
tags: ['!dev']
});
</script>
<Story
asChild
name="SingleTabStop"
play={async ({ canvas, userEvent }) => {
const before = await canvas.findByRole('button', { name: 'before' });
const target = await canvas.findByRole('button', { name: 'Copy' });
before.focus();
await userEvent.tab();
await expect(target).toHaveFocus();
}}
>
<div>
<button type="button">before</button>
<ActionIcon icon={Copy} tooltip="Copy" onclick={() => {}} />
</div>
</Story>

@@ -0,0 +1,50 @@
<script module lang="ts">
import { defineMeta } from '@storybook/addon-svelte-csf';
import ChatMessageStatistics from '$lib/components/app/chat/ChatMessages/ChatMessageStatistics/ChatMessageStatistics.svelte';
import { expect } from 'storybook/test';
const { Story } = defineMeta({
title: 'Components/ChatMessageStatistics/Accessibility',
component: ChatMessageStatistics,
parameters: {
layout: 'centered'
},
tags: ['!dev']
});
</script>
<Story
name="ViewButtonsSingleTabStop"
args={{
promptTokens: 100,
promptMs: 500,
predictedTokens: 200,
predictedMs: 1000,
agenticTimings: {
turns: 1,
toolCallsCount: 1,
toolsMs: 500,
llm: { predicted_n: 200, predicted_ms: 1000, prompt_n: 100, prompt_ms: 500 }
},
hideSummary: false,
isLive: false
}}
play={async ({ canvas, userEvent }) => {
const reading = await canvas.findByRole('button', { name: 'Reading' });
const generation = await canvas.findByRole('button', { name: 'Generation' });
const tools = await canvas.findByRole('button', { name: 'Tools' });
const summary = await canvas.findByRole('button', { name: 'Summary' });
reading.focus();
await expect(reading).toHaveFocus();
await userEvent.tab();
await expect(generation).toHaveFocus();
await userEvent.tab();
await expect(tools).toHaveFocus();
await userEvent.tab();
await expect(summary).toHaveFocus();
}}
/>

@@ -0,0 +1,69 @@
<script module lang="ts">
import { defineMeta } from '@storybook/addon-svelte-csf';
import HorizontalScrollCarousel from '$lib/components/app/misc/HorizontalScrollCarousel.svelte';
import { expect, waitFor } from 'storybook/test';
const { Story } = defineMeta({
title: 'Components/HorizontalScrollCarousel/Accessibility',
component: HorizontalScrollCarousel,
parameters: {
layout: 'centered'
},
tags: ['!dev']
});
</script>
<Story
asChild
name="ArrowsNotInTabOrderWhenNotScrollable"
play={async ({ canvas, userEvent }) => {
const before = await canvas.findByRole('button', { name: 'before' });
const after = await canvas.findByRole('button', { name: 'after' });
const leftArrow = await canvas.findByRole('button', { name: 'Scroll left' });
await waitFor(() => {
expect(leftArrow).toBeDisabled();
});
before.focus();
await userEvent.tab();
await expect(after).toHaveFocus();
}}
>
<div>
<button type="button">before</button>
<HorizontalScrollCarousel class="w-96">
<div class="h-12 w-12 shrink-0 bg-muted"></div>
<div class="h-12 w-12 shrink-0 bg-muted"></div>
</HorizontalScrollCarousel>
<button type="button">after</button>
</div>
</Story>
<Story
asChild
name="ArrowsInTabOrderWhenScrollable"
play={async ({ canvas, userEvent }) => {
const before = await canvas.findByRole('button', { name: 'before' });
const rightArrow = await canvas.findByRole('button', { name: 'Scroll right' });
await waitFor(() => {
expect(rightArrow).not.toBeDisabled();
});
before.focus();
await userEvent.tab();
await expect(rightArrow).toHaveFocus();
}}
>
<div>
<button type="button">before</button>
<HorizontalScrollCarousel class="w-48">
{#each [...Array(20).keys()] as i (i)}
<div class="h-12 w-24 shrink-0 bg-muted">{i}</div>
{/each}
</HorizontalScrollCarousel>
</div>
</Story>

@@ -0,0 +1,36 @@
<script module lang="ts">
import { defineMeta } from '@storybook/addon-svelte-csf';
import SidebarNavigationConversationItem from '$lib/components/app/navigation/SidebarNavigation/SidebarNavigationConversationItem.svelte';
import { expect } from 'storybook/test';
const mockForkedConversation: DatabaseConversation = {
id: 'conv-2',
name: 'Forked Conversation',
lastModified: Date.now(),
currNode: 'msg-2',
forkedFromConversationId: 'conv-1'
};
const { Story } = defineMeta({
title: 'Components/SidebarNavigationConversationItem/Accessibility',
component: SidebarNavigationConversationItem,
parameters: {
layout: 'centered'
},
tags: ['!dev']
});
</script>
<Story
name="ForkIconSingleTabStop"
args={{ conversation: mockForkedConversation, depth: 1 }}
play={async ({ canvas, userEvent }) => {
const row = await canvas.findByRole('button', { name: /Forked Conversation/ });
const forkIcon = await canvas.findByRole('link');
row.focus();
await userEvent.tab();
await expect(forkIcon).toHaveFocus();
}}
/>

@@ -7,11 +7,23 @@ import { defineConfig, searchForWorkspaceRoot } from 'vite';
import devtoolsJson from 'vite-plugin-devtools-json';
import { storybookTest } from '@storybook/addon-vitest/vitest-plugin';
import { llamaCppBuildPlugin } from './scripts/vite-plugin-llama-cpp-build';
import { playwright } from '@vitest/browser-playwright';
const __dirname = dirname(fileURLToPath(import.meta.url));
const SERVER_ORIGIN = import.meta.env?.VITE_PUBLIC_SERVER_ORIGIN || 'http://localhost:8080';
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const browserBaseConfig: any = {
enabled: true,
provider: playwright({
launchOptions: {
args: ['--no-sandbox']
}
}),
instances: [{ browser: 'chromium' }]
};
export default defineConfig({
resolve: {
alias: {
@@ -33,12 +45,7 @@ export default defineConfig({
extends: './vite.config.ts',
test: {
name: 'client',
environment: 'browser',
browser: {
enabled: true,
provider: 'playwright',
instances: [{ browser: 'chromium' }]
},
browser: browserBaseConfig,
include: ['tests/client/**/*.svelte.{test,spec}.{js,ts}'],
setupFiles: ['./vitest-setup-client.ts']
}
@@ -57,13 +64,7 @@ export default defineConfig({
extends: './vite.config.ts',
test: {
name: 'ui',
environment: 'browser',
browser: {
enabled: true,
provider: 'playwright',
instances: [{ browser: 'chromium', headless: true }]
},
include: ['tests/stories/**/*.stories.{js,ts,svelte}'],
browser: { ...browserBaseConfig, instances: [{ browser: 'chromium', headless: true }] },
setupFiles: ['./.storybook/vitest.setup.ts']
},
plugins: [