mryab (Member, Author) commented on Feb 25, 2023
  CLIENT_BRANCH = "main"
- BLOCK_BRANCH_PREFIX = "block_"
+ BLOCK_BRANCH_PREFIX = "int8_block"
We'll roll that back before merging
mryab (Member, Author) commented on Feb 25, 2023

Comment on lines +51 to +57
if load_in_8bit:
    block = replace_8bit_linear(block)

block = block.to(device)
I moved replace_8bit_linear here because it's not possible to correctly load the quantized Linear8bitLt checkpoint into the model before it's converted and quantized
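For context, a minimal sketch of the ordering this comment describes. The function name load_quantized_block, its signature, and the explicit load_state_dict call are illustrative assumptions rather than the actual Petals helper; replace_8bit_linear is the helper reviewed below, assumed importable from petals.utils.convert_block as the file path in the next comment suggests.

import torch
from petals.utils.convert_block import replace_8bit_linear

def load_quantized_block(block: torch.nn.Module, state_dict: dict, load_in_8bit: bool, device: str) -> torch.nn.Module:
    # Convert nn.Linear layers to 8-bit ones and move the block to the target device
    # *before* restoring the checkpoint: a quantized Linear8bitLt state dict cannot be
    # loaded into plain nn.Linear modules, so conversion/quantization must come first.
    if load_in_8bit:
        block = replace_8bit_linear(block)
    block = block.to(device)

    # Only now load the (possibly pre-quantized int8) weights.
    block.load_state_dict(state_dict)
    return block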
mryab (Member, Author) commented on Feb 25, 2023

src/petals/utils/convert_block.py (Outdated)

Comment on lines 80 to 81
  from petals.utils.linear8bitlt_patch import CustomLinear8bitLt

  for n, module in model.named_children():
      if len(list(module.children())) > 0:
          replace_8bit_linear(module, threshold)

      if isinstance(module, torch.nn.Linear) and n not in ["lm_head", "score"]:
          assert module.weight.device.type == "cpu", f"expected linear layers on CPU, got {module.weight.device}"
-         model._modules[n] = CustomLinear8bitLt(
+         model._modules[n] = bnb.nn.Linear8bitLt(
Not strictly necessary, but it'd be good to get rid of all bitsandbytes-related code that got into upstream before merging this
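For reference, a minimal sketch of the recursive replacement pattern shown in the snippet above, using upstream bnb.nn.Linear8bitLt directly. The constructor arguments and the weight-copy step follow common bitsandbytes usage and are an approximation, not a verbatim excerpt of the Petals code.

import torch
import bitsandbytes as bnb

def replace_8bit_linear(model: torch.nn.Module, threshold: float = 6.0) -> torch.nn.Module:
    for n, module in model.named_children():
        # Recurse into container modules first.
        if len(list(module.children())) > 0:
            replace_8bit_linear(module, threshold)

        # Swap eligible nn.Linear layers (but not the output heads) for 8-bit ones.
        if isinstance(module, torch.nn.Linear) and n not in ["lm_head", "score"]:
            assert module.weight.device.type == "cpu", f"expected linear layers on CPU, got {module.weight.device}"
            model._modules[n] = bnb.nn.Linear8bitLt(
                module.in_features,
                module.out_features,
                module.bias is not None,
                has_fp16_weights=False,
                threshold=threshold,
            )
            # Wrap the original weight so that it is quantized lazily
            # when the module is moved to a CUDA device.
            model._modules[n].weight = bnb.nn.Int8Params(
                module.weight.data, requires_grad=False, has_fp16_weights=False
            )
            model._modules[n].bias = module.bias
    return model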
justheuristic approved these changes on Feb 26, 2023

The branch was later updated from 56a3bee to a610f4d.
borzunov (Collaborator) reviewed on Jun 6, 2023
    use_auth_token: Optional[str] = None,
    cache_dir: Optional[str] = None,
    max_disk_space: Optional[int] = None,
    load_in_8bit=False,
Suggested change:
- load_in_8bit=False,
+ load_in_8bit: bool = False,
Collaborator:

We discussed that we may revive this feature for loading NF4-pre-quantized weights for Llama 2 and Stable Beluga 2.
This PR relies on bitsandbytes-foundation/bitsandbytes#159 and makes it possible to call convert_model with the int8 data type and later download the 8-bit checkpoint instead of the 16-bit one when serving the model with load_in_8bit=True. This can save up to 2x bandwidth when starting a server, as shown by this comparison of model sizes for bloom-560m:

The command that was used for conversion is python -m petals.cli.convert_model --model bigscience/bloom-560m --output_path ./converted_model_int8 --torch_dtype int8 --resize_token_embeddings 50000 --block_branch_prefix int8_block. To test that the checkpoint loads correctly, you need to install bitsandbytes from the branch in the PR above and run python -m petals.cli.run_server bigscience/test-bloomd --new_swarm --skip_reachability_check --throughput 100 --device cuda (note that I had to change BLOCK_BRANCH_PREFIX in this branch for the sake of testing).
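For context, a sketch of how --block_branch_prefix relates to the branches a server later downloads, based on the BLOCK_BRANCH_PREFIX constant changed above. The helper name get_block_branch and the exact concatenation are illustrative assumptions about the naming convention, not the actual Petals code.

BLOCK_BRANCH_PREFIX = "int8_block"  # "block_" on main; changed in this branch for testing

def get_block_branch(block_index: int) -> str:
    # Each converted transformer block is stored on its own branch of the model repo,
    # named by combining the prefix and the block index.
    return f"{BLOCK_BRANCH_PREFIX}{block_index}"

# e.g. get_block_branch(3) -> "int8_block3"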