觅知网 AIPPT: the .mz data chain and offline conversion to PPTX (reverse-engineering notes)

Background

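觅知网 offers AIPPT and related features. While a user edits a deck in the browser, the page maintains a canvas semantic model behind the scenes; what gets written to disk or returned by the API is usually not a .pptx but the site's own .mz payload.
To keep editing locally in PowerPoint, the .mz obtained from the online side first has to be restored to structured JSON and then mapped onto the Office object model.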

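These are personal reverse-engineering and study notes. They cover only the transport wrapper, the JSON shape, and the conversion code. They do not cover cracking memberships, bypassing payment, or bulk-scraping copyrighted assets; copyright in assets and finished decks is governed by the platform's terms.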

Analysis

1) Where the data comes from during online editing

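There are two typical paths (what you actually see in your own packet capture takes precedence):

Save / sync API: on "save", "export", or automatic sync, the editor submits or fetches a text payload (commonly base64, or already a JSON fragment), which the browser then merges into its application state.
Local file: if you have already saved the response body or clipboard content to a file, and its extension or content can be recognised as .mz, the rest of the processing is the same as "download from a URL, then convert".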

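Key observation: a .mz file is not a PPTX ZIP package but a stack of "encoding + compression + JSON". Online editing amounts to repeatedly reading and writing the multi-page boards described by that JSON.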

2) Restoring .mz to JSON

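A file usually takes one of two forms:

Entirely base64: decoding yields a zlib / gzip compressed byte stream, which decompresses to UTF-8 JSON. In practice you can try several common wbits values with zlib.decompress (such as 47 / 31 / 15) until json.loads succeeds. Of these, 47 auto-detects a zlib or gzip header, 31 accepts gzip only, and 15 accepts zlib only.
Already JSON: if the file starts with { once whitespace is stripped, json.loads works directly.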
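The two entry points above can be sketched as a single function. This is a minimal illustration rather than the site's own logic: `decode_mz_text` is a name chosen here, and the sample payload is synthetic, built only to exercise the round trip.

```python
import base64
import json
import zlib


def decode_mz_text(raw: bytes) -> dict:
    """Return the JSON root of a .mz payload (plain JSON, or base64 + zlib/gzip)."""
    stripped = raw.strip()
    if stripped.startswith(b"{"):
        # Second form: the payload is already JSON.
        return json.loads(stripped.decode("utf-8"))
    # First form: base64 text wrapping a compressed byte stream.
    data = base64.b64decode(stripped)
    for wbits in (47, 31, 15):
        try:
            return json.loads(zlib.decompress(data, wbits).decode("utf-8"))
        except (zlib.error, ValueError):
            continue
    raise ValueError("not a recognisable .mz payload")


# Round-trip check on a synthetic payload (not real site data).
sample = {"boards": [{"bgColor": "FFFFFF", "elements": []}]}
packed = base64.b64encode(zlib.compress(json.dumps(sample).encode("utf-8")))
assert decode_mz_text(packed) == sample
```

Catching `ValueError` covers both `json.JSONDecodeError` and `UnicodeDecodeError`, so a stream that decompresses to something other than UTF-8 JSON simply moves on to the next wbits value.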

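One more rule of thumb: if the raw bytes start with eJ once whitespace is stripped, the content is most likely zlib-compressed and then base64-encoded. A default-level zlib stream begins with the header bytes 78 9C, and base64 encodes those as eJ. Such files can go straight down the base64 decompression pipeline.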
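The prefix rule can be checked with the standard library alone. The snippet below is only an illustration on a made-up payload; it also shows that other compression settings produce different prefixes, so the eJ test is best treated as a fast path rather than a complete detector.

```python
import base64
import gzip
import zlib

payload = b'{"boards": []}'  # synthetic content, for illustration only

zlib_default = zlib.compress(payload)     # header bytes 78 9C at the default level
zlib_best = zlib.compress(payload, 9)     # header bytes 78 DA at level 9
gzip_stream = gzip.compress(payload)      # magic bytes 1F 8B

print(base64.b64encode(zlib_default)[:2])  # b'eJ'
print(base64.b64encode(zlib_best)[:2])     # b'eN'
print(base64.b64encode(gzip_stream)[:2])   # b'H4'
```

A file that starts with eN or H4 would miss the eJ shortcut, which is why a fallback to the full decompression attempt is still needed.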

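After restoring, it is worth saving a local .json copy with json.dump(..., indent=2, ensure_ascii=False), so you can inspect the fields in an IDE or with jq before writing the conversion logic.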
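A small helper along these lines makes that inspection step repeatable. It is a sketch under assumptions: `dump_and_summarise` is a hypothetical name, and the sample root below is synthetic and only borrows the boards / elements / type field names.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def dump_and_summarise(root: dict, out_path: Path) -> list:
    """Write the JSON root to disk and return element-type counts per board."""
    with out_path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
        json.dump(root, fh, indent=2, ensure_ascii=False)
    summary = []
    for board in root.get("boards") or []:
        kinds = Counter(el.get("type") for el in board.get("elements") or [])
        summary.append(dict(kinds))
    return summary


root = {
    "boards": [
        {"elements": [{"type": "img"}, {"type": "font", "texts": "标题"}, {"type": "font"}]},
        {"elements": []},
    ]
}
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "sample.json"
    summary = dump_and_summarise(root, out)
    text = out.read_text(encoding="utf-8")

print(summary)  # [{'img': 1, 'font': 2}, {}]
```

The per-board counts give a quick sense of which element types a deck actually uses before any mapping code is written.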

3) From JSON to PPTX

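Pagination in the root object is generally driven by boards: each board corresponds to one page and carries bgColor, bgImage, elements, and so on. Common element types include img / font / shape / line / group. The canvas uses logical coordinates, mostly 1280×720, which are scaled proportionally to the slide width and height (for example a 16:9 slide of 13.333 × 7.5 inches).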
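The scaling is plain proportional arithmetic in EMU (914400 EMU per inch). The sketch below works through one rectangle by hand; `canvas_to_emu` is a name chosen here, and the 13.333 × 7.5 inch slide is an assumed 16:9 size.

```python
EMU_PER_INCH = 914400                  # EMU per inch, as defined by Office Open XML
CANVAS_W, CANVAS_H = 1280.0, 720.0     # logical canvas size
SLIDE_W = int(13.333 * EMU_PER_INCH)   # assumed 16:9 slide width in EMU
SLIDE_H = int(7.5 * EMU_PER_INCH)      # assumed 16:9 slide height in EMU


def canvas_to_emu(x: float, y: float, w: float, h: float) -> tuple:
    """Scale a canvas rectangle to slide EMU, keeping width and height at least 1."""
    left = int(x / CANVAS_W * SLIDE_W)
    top = int(y / CANVAS_H * SLIDE_H)
    width = max(1, int(w / CANVAS_W * SLIDE_W))
    height = max(1, int(h / CANVAS_H * SLIDE_H))
    return left, top, width, height


# A quarter-size box whose top-left corner sits at the canvas centre.
box = canvas_to_emu(640, 360, 320, 180)
print(box)  # (6095847, 3429000, 3047923, 1714500)
```

Clamping width and height to at least 1 EMU avoids zero-sized shapes when an element has a degenerate size in the JSON.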

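Implementation notes in brief:

group: recursively add the parent's position to each child, then draw the flattened elements on the slide.
Text: a paragraph tree (type: paragraph) plus text runs, mapped to python-pptx paragraphs and runs (font size, bold, colour, named colours from themeColors, and so on).
Images: the JSON usually holds relative paths, which have to be joined with a CDN root URL. The image host may check Referer, so changing User-Agent alone is often not enough.
Sidecar 1.txt: one full image URL per line, indexed by path with the domain and query removed. When a relative path matches an entry, the listed URLs are tried first, in file order, before the CDN-joined URLs (for example Qiniu processing URLs that carry imageView2).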
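Two of the points above, group flattening and the sidecar index key, are easy to check on toy data. This is a minimal sketch assuming child positions are relative to their parent group; `flatten` and `path_key` are names chosen here, and the query string in the example URL is made up.

```python
from urllib.parse import urlparse


def flatten(elements, ox=0.0, oy=0.0):
    """Return (type, abs_x, abs_y) for every non-group element, depth first."""
    out = []
    for el in elements or []:
        pos = el.get("position") or {}
        x = float(pos.get("x", 0)) + ox
        y = float(pos.get("y", 0)) + oy
        if el.get("type") == "group":
            # Children inherit the accumulated offset of all enclosing groups.
            out.extend(flatten(el.get("elements") or [], x, y))
        else:
            out.append((el.get("type"), x, y))
    return out


def path_key(url: str) -> str:
    """Index key for a full image URL: the path only, without domain or query."""
    return urlparse(url).path.lstrip("/")


tree = [{
    "type": "group", "position": {"x": 100, "y": 50},
    "elements": [
        {"type": "img", "position": {"x": 10, "y": 20}},
        {"type": "group", "position": {"x": 5, "y": 5},
         "elements": [{"type": "font", "position": {"x": 1, "y": 2}}]},
    ],
}]
print(flatten(tree))  # [('img', 110.0, 70.0), ('font', 106.0, 57.0)]

# Hypothetical query string, for illustration only.
url = "https://ppt-qn.molishe.com/588552/ppt/media/image3.png?imageView2/2/w/800"
print(path_key(url))  # 588552/ppt/media/image3.png
```

Because the key drops the query, a relative src from the JSON can be matched against a listed URL even when the listed URL carries processing parameters.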

4) Dependencies and usage

pip install python-pptx requests
python mz_to_pptx.py your.mz -o out.pptx
# or: python mz_to_pptx.py --url 'https://.../xxx.mz'
# --image-base may be given several times for CDN prefixes; a 1.txt in the same directory is picked up automatically as the image URL list

Full script (mz → pptx)

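Below is the full text of the mz_to_pptx.py I currently use. The entry point reads a local file or downloads via --url; load_mz turns the .mz into a dict (the JSON root object); convert walks boards and writes the .pptx. If the CDN or Referer of the site you are dealing with differs from the defaults, change DEFAULT_IMAGE_BASES and the default of --referer in argparse.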

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Convert an online-editor .mz file (JSON wrapped in base64 + zlib/gzip) into an editable .pptx.
Images are stored in the JSON as relative paths (e.g. 588552/ppt/media/image3.png) and are
joined with a CDN prefix by default. A list file with one full URL per line (e.g. 1.txt) can
also be supplied and is tried first; if every download fails, a grey placeholder box is inserted.
"""
from __future__ import annotations

import argparse
import base64
import io
import json
import re
import zlib
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Tuple
from urllib.parse import urljoin, urlparse

import requests
from pptx import Presentation
from pptx.dml.color import RGBColor
from pptx.enum.shapes import MSO_CONNECTOR, MSO_SHAPE
from pptx.enum.text import MSO_ANCHOR, MSO_AUTO_SIZE, PP_ALIGN
from pptx.util import Inches, Pt

DEFAULT_IMAGE_BASES = [
    "https://ppt-qn.molishe.com/",
    "https://imgs-qn.molishe.com/",
]

# Logical canvas size used by the editor's coordinates.
CANVAS_W = 1280.0
CANVAS_H = 720.0


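def decompress_mz_bytes(raw: bytes) -> dict:
    """Decode a .mz payload: plain JSON, or base64 text wrapping zlib/gzip-compressed JSON."""
    # Strip whitespace first so a leading newline does not hide the "{".
    if raw.lstrip()[:1] == b"{":
        return json.loads(raw.decode("utf-8"))
    b64 = raw.decode("ascii").strip()
    data = base64.b64decode(b64)
    # 47 auto-detects a zlib or gzip header; 31 (gzip) and 15 (zlib) stay as explicit fallbacks.
    for wbits in (47, 31, 15):
        try:
            return json.loads(zlib.decompress(data, wbits).decode("utf-8"))
        except (zlib.error, ValueError):
            continue
    raise ValueError("cannot decompress .mz content (tried zlib/gzip wrappers)")


def load_mz(path: Path) -> dict:
    """Read a .mz file and return its JSON root object."""
    raw = path.read_bytes()
    # "eJ" is the base64 prefix of a default-level zlib stream, so skip the JSON attempt.
    if raw.lstrip().startswith(b"eJ"):
        return decompress_mz_bytes(raw)
    try:
        return json.loads(raw.decode("utf-8"))
    except ValueError:
        # Not valid UTF-8 JSON: fall back to the base64 + zlib/gzip path.
        return decompress_mz_bytes(raw)
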

def theme_map(root: dict) -> Dict[str, str]:
    """Collect themeColors entries whose value is a 6-digit hex string."""
    tc = root.get("themeColors") or {}
    m: Dict[str, str] = {}
    for k, v in tc.items():
        if isinstance(v, str) and re.fullmatch(r"[0-9A-Fa-f]{6}", v):
            m[k] = v.upper()
    return m


def resolve_color(
    value: str,
    theme: Dict[str, str],
) -> Optional[RGBColor]:
    """Resolve a 6-digit hex string or a themeColors name; None means no colour."""
    if not value or value == "transparent":
        return None
    if re.fullmatch(r"[0-9A-Fa-f]{6}", value):
        h = value
    elif value in theme:
        h = theme[value]
    else:
        return None
    return RGBColor(int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16))


def iter_paragraphs(texts: Any) -> Iterable[dict]:
    """Yield the dict nodes of type "paragraph" from a texts list."""
    if not isinstance(texts, list):
        return
    for p in texts:
        if isinstance(p, dict) and p.get("type") == "paragraph":
            yield p


def extract_runs(paragraph: dict) -> List[dict]:
    """Return the paragraph's children that are text runs (type "text" or carrying a text key)."""
    runs: List[dict] = []
    for ch in paragraph.get("children") or []:
        if isinstance(ch, dict) and (ch.get("type") == "text" or "text" in ch):
            runs.append(ch)
    return runs


def map_align(s: Optional[str]) -> int:
    """Map a textAlign value to PP_ALIGN; anything unknown becomes left."""
    if s == "center":
        return PP_ALIGN.CENTER
    if s == "right":
        return PP_ALIGN.RIGHT
    if s == "justify":
        return PP_ALIGN.JUSTIFY
    return PP_ALIGN.LEFT


def map_valign(s: Optional[str]) -> int:
    """Map a verticalAlign value to MSO_ANCHOR; anything unknown (including "up") becomes top."""
    if s in ("mid", "middle", "center"):
        return MSO_ANCHOR.MIDDLE
    if s in ("down", "bottom"):
        return MSO_ANCHOR.BOTTOM
    return MSO_ANCHOR.TOP


def load_image_url_list_file(path: Path) -> Dict[str, List[str]]:
    """
    Read a file with one image URL per line and index it by path (no domain, no query),
    e.g. https://ppt-qn.molishe.com/588552/ppt/media/image3.png?... -> key 588552/ppt/media/image3.png
    URLs under the same key keep their file order and are de-duplicated.
    """
    out: Dict[str, List[str]] = {}
    text = path.read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parsed = urlparse(line)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        key = parsed.path.lstrip("/")
        if not key:
            continue
        out.setdefault(key, []).append(line)
    for k in out:
        # dict.fromkeys drops duplicates while preserving first-seen order.
        out[k] = list(dict.fromkeys(out[k]))
    return out


def try_fetch_image(
    src: str,
    bases: List[str],
    session: requests.Session,
    referer: Optional[str],
    url_list_by_path: Optional[Dict[str, List[str]]] = None,
) -> Optional[bytes]:
    """Try candidate URLs for src in order and return the first image body, or None."""
    if not src:
        return None
    urls: List[str] = []
    seen: set = set()

    def add(u: str) -> None:
        if u and u not in seen:
            seen.add(u)
            urls.append(u)

    if src.startswith("http://") or src.startswith("https://"):
        add(src)
    else:
        # URLs from the list file come first, then each CDN prefix joined with src.
        norm = src.split("?", 1)[0].lstrip("/")
        if url_list_by_path and norm in url_list_by_path:
            for u in url_list_by_path[norm]:
                add(u)
        for b in bases:
            add(urljoin(b.rstrip("/") + "/", src.lstrip("/")))
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    }
    if referer:
        headers["Referer"] = referer
    for u in urls:
        try:
            r = session.get(u, headers=headers, timeout=20)
            # Reject JSON or HTML error bodies served with status 200.
            # (The original compared an 8-byte slice with shorter literals, which never matched.)
            looks_like_error = r.content.lstrip().startswith((b"{", b"<!DOC"))
            if r.status_code == 200 and not looks_like_error:
                ct = (r.headers.get("content-type") or "").lower()
                if "json" in ct:
                    continue
                return r.content
        except Exception:
            continue
    return None


def flatten_elements(
    elements: List[dict],
    ox: float = 0.0,
    oy: float = 0.0,
) -> List[Tuple[dict, float, float]]:
    """Flatten nested groups into (element, abs_x, abs_y), skipping elements with show set to false."""
    out: List[Tuple[dict, float, float]] = []
    for el in elements or []:
        if not el.get("show", True):
            continue
        pos = el.get("position") or {}
        x = float(pos.get("x", 0)) + ox
        y = float(pos.get("y", 0)) + oy
        if el.get("type") == "group":
            # Children are positioned relative to the group, so pass the offset down.
            inner = el.get("elements") or []
            out.extend(flatten_elements(inner, x, y))
        else:
            out.append((el, x, y))
    return out


def emu_xywh(
    slide_w: int,
    slide_h: int,
    x: float,
    y: float,
    w: float,
    h: float,
) -> Tuple[int, int, int, int]:
    """Scale a canvas rectangle to slide EMU; width and height are at least 1."""
    left = int(x / CANVAS_W * slide_w)
    top = int(y / CANVAS_H * slide_h)
    width = max(1, int(w / CANVAS_W * slide_w))
    height = max(1, int(h / CANVAS_H * slide_h))
    return left, top, width, height


def apply_text_frame(
    text_frame: Any,
    texts: Any,
    theme: Dict[str, str],
    default_align: str = "left",
    default_valign: str = "up",
) -> None:
    """Fill a python-pptx text frame from the paragraph / run tree in texts."""
    text_frame.clear()
    text_frame.auto_size = MSO_AUTO_SIZE.NONE
    first = True
    paras = list(iter_paragraphs(texts))
    if not paras:
        p0 = text_frame.paragraphs[0]
        p0.text = ""
        text_frame.vertical_anchor = map_valign(default_valign)
        return
    for para in paras:
        # clear() leaves one empty paragraph; reuse it for the first entry.
        p = text_frame.paragraphs[0] if first else text_frame.add_paragraph()
        first = False
        p.alignment = map_align(para.get("textAlign") or default_align)
        for run_data in extract_runs(para):
            run = p.add_run()
            run.text = str(run_data.get("text", ""))
            fs = run_data.get("fontSize")
            if isinstance(fs, (int, float)) and fs > 0:
                run.font.size = Pt(float(fs))
            fam = str(run_data.get("fontFamily") or "")
            if "Bold" in fam or str(run_data.get("fontWeight")).lower() == "bold":
                run.font.bold = True
            fc = run_data.get("fontColor")
            rgb = resolve_color(str(fc), theme) if fc else None
            if rgb is not None:
                run.font.color.rgb = rgb
            if fam:
                # Drop the weight suffix, e.g. "Family-Bold" -> "Family".
                name = fam.split("-")[0] if "-" in fam else fam
                if name.startswith("SourceHan"):
                    # SourceHan* families are replaced with 微软雅黑 (Microsoft YaHei).
                    run.font.name = "微软雅黑"
                else:
                    run.font.name = name[:31]
    text_frame.vertical_anchor = map_valign(default_valign)


def add_board(
    slide: Any,
    board: dict,
    theme: Dict[str, str],
    image_bases: List[str],
    session: requests.Session,
    referer: Optional[str],
    url_list_by_path: Optional[Dict[str, List[str]]] = None,
) -> None:
    """Render one board (page) onto a slide: background first, then the flattened elements in order."""
    prs = slide.part.package.presentation_part.presentation
    slide_w, slide_h = prs.slide_width, prs.slide_height

    bg = board.get("bgColor")
    if bg and bg != "transparent":
        rgb = resolve_color(bg, theme)
        if rgb is not None:
            try:
                fill = slide.background.fill
                fill.solid()
                fill.fore_color.rgb = rgb
            except Exception:
                # A failed background fill is not fatal; keep the layout default.
                pass

    bg_img = board.get("bgImage") or {}
    src = bg_img.get("src")
    if src:
        data = try_fetch_image(
            str(src),
            image_bases,
            session,
            referer,
            url_list_by_path,
        )
        if data:
            # Added before any element, so it sits at the bottom of the z-order.
            bio = io.BytesIO(data)
            slide.shapes.add_picture(bio, 0, 0, width=slide_w, height=slide_h)

    items = flatten_elements(board.get("elements") or [])

    for el, x, y in items:
        et = el.get("type")
        size = el.get("size") or {}
        w = float(size.get("width", 1))
        h = float(size.get("height", 1))
        left, top, width, height = emu_xywh(slide_w, slide_h, x, y, w, h)

        if et == "img":
            raw = try_fetch_image(
                str(el.get("src") or ""),
                image_bases,
                session,
                referer,
                url_list_by_path,
            )
            if raw:
                pic = slide.shapes.add_picture(io.BytesIO(raw), left, top, width=width, height=height)
                pic.rotation = float(el.get("rotate") or 0)
            else:
                # Download failed: draw a grey placeholder of the same size.
                shp = slide.shapes.add_shape(MSO_SHAPE.RECTANGLE, left, top, width, height)
                shp.fill.solid()
                shp.fill.fore_color.rgb = RGBColor(230, 230, 230)
                shp.line.fill.background()
                tf = shp.text_frame
                tf.clear()
                p = tf.paragraphs[0]
                p.text = "[image failed to load]"
                p.alignment = PP_ALIGN.CENTER
                tf.vertical_anchor = MSO_ANCHOR.MIDDLE
            continue

        if et == "line":
            ls = el.get("lineStyle") or {}
            lw = float(ls.get("lineWidth") or 1)
            color = resolve_color(str(ls.get("lineColor") or "000000"), theme)
            x2 = left + width
            y2 = top + height
            conn = slide.shapes.add_connector(MSO_CONNECTOR.STRAIGHT, left, top, x2, y2)
            ln = conn.line
            # Treat lineWidth as pixels at 96 dpi and convert to points, with a 0.25 pt floor.
            ln.width = Pt(max(0.25, lw * 72 / 96))
            if color is not None:
                ln.color.rgb = color
            conn.rotation = float(el.get("rotate") or 0)
            continue

        if et == "shape":
            # Every defaultShape type (rect, custom, or anything else) is currently drawn
            # as a rectangle; the original if/else had two identical branches.
            shp = slide.shapes.add_shape(MSO_SHAPE.RECTANGLE, left, top, width, height)
            fill_rgb = resolve_color(str(el.get("color") or "transparent"), theme)
            if fill_rgb is not None:
                shp.fill.solid()
                shp.fill.fore_color.rgb = fill_rgb
            else:
                shp.fill.background()
            shp.line.fill.background()
            shp.rotation = float(el.get("rotate") or 0)
            tc = el.get("textContent")
            if tc and tc.get("show", True):
                texts = tc.get("texts")
                valign = tc.get("verticalAlign") or "up"
                apply_text_frame(
                    shp.text_frame,
                    texts,
                    theme,
                    default_valign=str(valign),
                )
            continue

        if et == "font":
            box = slide.shapes.add_textbox(left, top, width, height)
            box.rotation = float(el.get("rotate") or 0)
            fs = el.get("fontStyle") or {}
            align = fs.get("textAlign") or "left"
            valign = el.get("verticalAlign") or "up"
            apply_text_frame(
                box.text_frame,
                el.get("texts"),
                theme,
                default_align=str(align),
                default_valign=str(valign),
            )
            continue


def convert(
    mz_path: Path,
    out_path: Path,
    image_bases: List[str],
    referer: Optional[str],
    url_list_by_path: Optional[Dict[str, List[str]]] = None,
) -> None:
    """Load a .mz file and write one slide per board to out_path."""
    root = load_mz(mz_path)
    boards = root.get("boards") or []
    if not boards:
        raise SystemExit("no boards in the JSON; cannot generate slides")

    theme = theme_map(root)
    prs = Presentation()
    # 16:9 slide, 13.333 x 7.5 inches.
    prs.slide_width = int(Inches(13.333))
    prs.slide_height = int(Inches(7.5))

    # Layout index 6 is the blank layout in the default python-pptx template.
    blank = prs.slide_layouts[6]
    session = requests.Session()

    for board in boards:
        slide = prs.slides.add_slide(blank)
        add_board(
            slide,
            board,
            theme,
            image_bases,
            session,
            referer,
            url_list_by_path,
        )

    out_path.parent.mkdir(parents=True, exist_ok=True)
    prs.save(str(out_path))
    print(f"Wrote: {out_path} ({len(boards)} slides)")


def main() -> None:
    ap = argparse.ArgumentParser(description="Convert .mz to .pptx")
    ap.add_argument(
        "input",
        type=Path,
        nargs="?",
        default=None,
        help="path to the .mz file (use either this or --url)",
    )
    ap.add_argument(
        "--url",
        default=None,
        help="download the .mz from this URL, then convert",
    )
    ap.add_argument("-o", "--output", type=Path, default=None, help="output .pptx path")
    ap.add_argument(
        "--image-base",
        action="append",
        default=[],
        help="URL prefix for relative image paths; may be given several times",
    )
    ap.add_argument(
        "--image-urls-file",
        type=Path,
        default=None,
        help="text file with one full image URL per line (e.g. 1.txt); "
        "if omitted, a 1.txt next to the .mz is loaded automatically",
    )
    ap.add_argument("--referer", default="https://www.molishe.com/", help="Referer sent with image requests")
    args = ap.parse_args()

    if args.url:
        sess = requests.Session()
        r = sess.get(args.url, timeout=60)
        r.raise_for_status()
        tmp = Path.cwd() / "_mz_download_temp.mz"
        tmp.write_bytes(r.content)
        inp = tmp
        stem = Path(args.url.split("?", 1)[0]).stem
        out = args.output or (Path.cwd() / f"{stem}.pptx")
    else:
        if not args.input:
            ap.error("provide an input path or use --url")
        inp = args.input.expanduser().resolve()
        out = args.output or inp.with_suffix(".pptx")

    bases = args.image_base if args.image_base else list(DEFAULT_IMAGE_BASES)

    url_list_path: Optional[Path] = None
    if args.image_urls_file is not None:
        url_list_path = args.image_urls_file.expanduser().resolve()
    else:
        sidecar = inp.resolve().parent / "1.txt"
        if sidecar.is_file():
            url_list_path = sidecar

    url_list_by_path: Optional[Dict[str, List[str]]] = None
    if url_list_path is not None and url_list_path.is_file():
        url_list_by_path = load_image_url_list_file(url_list_path)
        print(f"Loaded image URL list: {url_list_path} ({len(url_list_by_path)} paths)")

    try:
        convert(
            inp,
            out.expanduser().resolve(),
            bases,
            args.referer or None,
            url_list_by_path,
        )
    finally:
        # Remove the temporary download even when conversion fails.
        if args.url:
            try:
                inp.unlink()
            except OSError:
                pass


if __name__ == "__main__":
    main()

Compliance and summary

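Copyright and licence scope for AIPPT output and template assets are as stated by 觅知网 and the rights holders. This post is a format and engineering record only and does not encourage unauthorised redistribution or commercial scraping.
The pipeline can be summarised as: online editing → obtain the .mz → decompress / parse to JSON → download resources and map boards to PPTX. If you run into a similar deliverable of the "proprietary wrapper + JSON semantic tree" kind, pin down the wrapper rules, the pagination field on the root object, and the resource URL rules first; writing the converter afterwards is much easier.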