Depth of auto-generated MSAs #39

max-overath · 2024-11-21T13:55:47Z

The MSAs that got generated for my predictions only contain a couple of sequences.
Is this due to limitations of the MMseqs2 server, or can this be adjusted?

jwohlwend · 2024-11-21T17:13:33Z

Yeah, that'd be due to the server. Maybe try this (https://zhanggroup.org/DeepMSA/) and see if you get a much deeper MSA? Maybe your sequence just doesn't have many homologues?

Overathed · 2024-11-21T19:39:55Z

Thanks for the answer! Yes, when using DeepMSA I get a much deeper MSA. However, it is also deeper when using ColabFold, which should also use an MMseqs2 server if I'm not mistaken?

jwohlwend · 2024-11-21T19:41:16Z

Would you mind sharing your input config? I can take a look

heol1 · 2024-11-21T21:56:20Z

Is the query sequence a hetero-multimeric protein? If so, I had the same issue.
ColabFold queries the MMseqs2 API twice: once for the individual chains and once in "pair" mode.
https://github.com/sokrypton/ColabFold/blob/e2ca9e8f992cd65c986de5b64885d5572d8b8ad9/colabfold/batch.py#L817-L857
In contrast, the current Boltz implementation (compute_msa) calls the API only once, in "pair" mode whenever there is more than one chain.

boltz/src/boltz/main.py

Line 178 in e43f910

msa = run_mmseqs2(list(data.values()), msa_dir, use_pairing=len(data) > 1)

This might be the reason why you have a shallow MSA...
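To make the difference concrete, here is a minimal sketch of the "query both modes, then combine" idea. It is not the Boltz or ColabFold code: `fetch` is a hypothetical stand-in for whatever performs the server request, and its signature (chain sequences plus a pairing flag, returning one list of aligned rows per chain) is an assumption made for illustration only.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for an MSA-server call (NOT a real Boltz/ColabFold API):
# takes the chain sequences and a pairing flag, returns one list of rows per chain.
FetchFn = Callable[[Sequence[str], bool], List[List[str]]]


def fetch_paired_and_unpaired(
    chains: Dict[str, str], fetch: FetchFn
) -> Dict[str, Dict[str, List[str]]]:
    """Query once in pair mode and once in unpaired mode, then combine per chain."""
    ids = list(chains)
    seqs = [chains[chain_id] for chain_id in ids]

    # Pairing only makes sense when there is more than one chain.
    paired = fetch(seqs, True) if len(seqs) > 1 else [[] for _ in seqs]
    unpaired = fetch(seqs, False)

    combined: Dict[str, Dict[str, List[str]]] = {}
    for chain_id, paired_rows, unpaired_rows in zip(ids, paired, unpaired):
        # Paired rows are kept untouched so row i still lines up across chains.
        seen = set(paired_rows)
        extra: List[str] = []
        for row in unpaired_rows:
            # Only unpaired rows are de-duplicated.
            if row not in seen:
                seen.add(row)
                extra.append(row)
        combined[chain_id] = {"paired": list(paired_rows), "unpaired": extra}
    return combined
```

With a single pair-mode query, only the `paired` part would exist, which would be consistent with the shallow MSAs reported here for hetero-multimers; the unpaired query is what adds per-chain depth.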

jadolfbr · 2024-11-21T22:10:42Z

So should it run the multiple queries then? Does that match the benchmarking?

jwohlwend · 2024-11-21T22:13:10Z

Looking into it, will report back

max-overath · 2024-11-21T22:13:29Z

@heol1 yes, exactly, it's a hetero-multimer. When I run the chains individually, I get much deeper MSAs.

max-overath · 2024-11-21T22:18:32Z

@jwohlwend input fasta for reference:

>A|protein
QVQLQESGGGLVQAGGSLRLSCAGSGDALGSYTMGWFRQAPGGGRDLVAQISVDGSSTYHLDSVRGRFTASRDNAKNTVYLEMNSLNSEDTAVYYCAAAPLLRGNYDYWGQGTQVTVSS
>B|protein
IRCFITPDITSKDCPNGHVCYTKTWCDAFCSIRGKRVDLGCAATCPTVKTGVDIQCCSTDNCNPFPTRKRP

amelie-iska · 2024-11-26T19:53:27Z

I'm having similar issues. I tried this:

version: 1  # Optional, defaults to 1
sequences:
  - protein:
      id: A
      sequence: MGDWSALGRLLDKVQAYSTAGGKVWLSVLFIFRILLLGTAVESAWGDEQSAFVCNTQQPGCENVCYDKSFPISHVRFWVLQIIFVSTPTLLYLAHVFYLMRKEEKLNRKEEELKMVQNEGGNVDMHLKQIEIKKFKYGLEEHGKVKMRGGLLRTYIISILFKSVFEVGFIIIQWYMYGFSLSAIYTCKRDPCPHQVDCFLSRPTEKTIFIWFMLIVSIVSLALNIIELFYVTYKSIKDGIKGKKDPFSATNDAVISGKECGSPKYAYFNGCSSPTAPMSPPGYKLVTGERNPSSCRNYNKQASEQNWANYSAEQNRMGQAGSTISNTHAQPFDFSDEHQNTKKMAPGHEMQPLTILDQRPSSRASSHASSRPRPDDLEI
  - protein:
      id: B
      sequence: MGTFEEVP

with the command:

boltz predict examples/multimer.yaml --recycling_steps 20  --diffusion_samples 10 --use_msa_server

and the MSAs for both the individual chains and the pair contain only a single sequence. Using ColabFold, I get much deeper MSAs (and much better predictions).

paul-goldsmith · 2024-11-27T14:49:30Z

Just adding another voice to this - I'm also finding the auto-generated MSA to be very shallow (single sequence) using two proteins and the --use_msa_server flag.
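For anyone who wants to compare depths between tools, a small helper like the one below can count the sequences in an MSA file. This is a sketch that assumes the MSA is stored in an A3M/FASTA-style text file with one `>` header per sequence; the file names in the usage note are hypothetical, and other output formats would need a different reader.

```python
from pathlib import Path
from typing import Union


def msa_depth(path: Union[str, Path]) -> int:
    """Count sequences in an A3M/FASTA-style file (one '>' header per entry)."""
    depth = 0
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # Assumption: some server outputs may separate blocks with a NUL
            # byte; strip it so a header that follows one is still counted.
            if line.lstrip("\x00").startswith(">"):
                depth += 1
    return depth
```

For example, calling `msa_depth("chain_A.a3m")` on the file from a complex run and on the file from a single-chain run shows how many sequences each query returned; a result of 1 means the file holds only the query itself.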
