Conversation
Bumps [immutable](https://github.com/immutable-js/immutable-js) from 5.1.4 to 5.1.5. - [Release notes](https://github.com/immutable-js/immutable-js/releases) - [Changelog](https://github.com/immutable-js/immutable-js/blob/main/CHANGELOG.md) - [Commits](immutable-js/immutable-js@v5.1.4...v5.1.5) --- updated-dependencies: - dependency-name: immutable dependency-version: 5.1.5 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
Add src/lib/agents-chart/core/color-decisions.ts and update the corresponding ECharts code
…ble-5.1.5 Bump immutable from 5.1.4 to 5.1.5
Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.5.4 to 6.5.5. - [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst) - [Commits](tornadoweb/tornado@v6.5.4...v6.5.5) --- updated-dependencies: - dependency-name: tornado dependency-version: 6.5.5 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [pyjwt](https://github.com/jpadilla/pyjwt) from 2.11.0 to 2.12.0. - [Release notes](https://github.com/jpadilla/pyjwt/releases) - [Changelog](https://github.com/jpadilla/pyjwt/blob/master/CHANGELOG.rst) - [Commits](jpadilla/pyjwt@2.11.0...2.12.0) --- updated-dependencies: - dependency-name: pyjwt dependency-version: 2.12.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
Fix color settings for ECharts and Chart.js
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Remove user: "0:0" override in docker-compose.yml — the Dockerfile already creates /home/appuser/.data_formulator and chowns it to appuser before switching to USER appuser, so the override was causing the app to run as root and write to /root/.data_formulator, bypassing the mounted volume entirely. Pass --user with host uid:gid to docker run in DockerSandbox so the sandbox container UID matches the host user that created the bind-mounted output directory. Without this, the non-root sandbox user cannot write the output parquet file, silently breaking all Docker sandbox executions.
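The `--user` fix above can be sketched in a few lines of Python (illustrative only: `sandbox_run_args`, the image name, and the `/sandbox/out` mount point are invented here, not the project's actual DockerSandbox API):

```python
import os

def sandbox_run_args(image: str, output_dir: str, command: list[str]) -> list[str]:
    """Build a `docker run` argv whose container user matches the host user.

    Without --user, a container's non-root user typically cannot write into
    a bind-mounted directory owned by the host uid, which is exactly the
    silent sandbox failure described above.
    """
    uid_gid = f"{os.getuid()}:{os.getgid()}"  # host uid:gid, e.g. "1000:1000"
    return [
        "docker", "run", "--rm",
        "--user", uid_gid,                                    # match host ownership of the mount
        "-v", f"{os.path.abspath(output_dir)}:/sandbox/out",  # bind-mounted output directory
        image,
        *command,
    ]
```

The argv can then be handed to `subprocess.run(..., check=True)`; building it as a list rather than a shell string avoids quoting issues.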
Fix color update issue
Bumps [pyasn1](https://github.com/pyasn1/pyasn1) from 0.6.2 to 0.6.3. - [Release notes](https://github.com/pyasn1/pyasn1/releases) - [Changelog](https://github.com/pyasn1/pyasn1/blob/main/CHANGES.rst) - [Commits](pyasn1/pyasn1@v0.6.2...v0.6.3) --- updated-dependencies: - dependency-name: pyasn1 dependency-version: 0.6.3 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
…K encoding support

Implement cross-platform file encoding handling, including:
1. Add a readFileText function on the frontend to handle UTF-8 and GBK encodings
…ion logic

- Add a trusted encoding detection set, optimize the GBK-first strategy
- Add integration tests to verify Chinese CSV file processing
- Improve the encoding detection fallback chain, finally falling back to latin-1
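A fallback chain of this shape can be sketched as follows (illustrative only — the project's trusted-encoding set and GBK-first heuristics are not reproduced here, and `read_text_best_effort` is an invented name):

```python
def read_text_best_effort(data: bytes) -> tuple[str, str]:
    """Decode file bytes via a trusted-encoding chain with a latin-1 backstop.

    Tries each trusted encoding in order; latin-1 is the guaranteed final
    fallback because it maps every possible byte value to a character.
    """
    for encoding in ("utf-8", "gbk"):  # trusted encodings, tried in order
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return data.decode("latin-1"), "latin-1"  # latin-1 never raises
```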
…but different extensions

Add test cases to verify the following scenarios:
1. Mismatch between the frontend preview ID and the backend workspace table name
2. Whether files with the same name but different extensions can coexist after upload
3. Index mismatch between the file array and the table array
Explicitly specify the minimum version requirement for openpyxl in both requirements.txt and pyproject.toml to ensure dependency compatibility
…me processing logic

- Add an original table name field to preserve table names before backend processing and display them in the frontend
- Refactor table name processing logic, centralizing it in the table_names.py module
- Update the frontend interface to display original table names and source information
…mbed to support interactive features

- Remove static SVG caching and rendering logic; use vega-embed instead to enable chart interactivity
- Delete the no-longer-needed PNG export and Vega editor open features
- Simplify component state management, focusing on interactive chart display
Add a unified diagnostic-information builder, AgentDiagnostics, for all agent pipelines to centrally manage the returned JSON structure, ensuring a single schema definition is shared between frontend and backend. Refactored the diagnostics generation logic for DataRecAgent, DataTransformationAgent, and DataLoadAgent, removed duplicate code, and added diagnostic support for DataLoadAgent. Also added relevant unit tests to verify functionality.
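A single-schema builder of this kind can be sketched as a small dataclass (hypothetical — the project's real AgentDiagnostics surely carries different fields; `code_patched` is the field mentioned elsewhere in this log):

```python
from dataclasses import dataclass, field, asdict
from typing import Any

@dataclass
class AgentDiagnostics:
    """One shared schema for diagnostic payloads returned by all agents.

    Serializing through a single dataclass keeps the frontend and backend
    in agreement on keys, instead of each agent hand-building its own dict.
    """
    agent: str
    status: str = "ok"
    dialog: list[dict[str, Any]] = field(default_factory=list)
    code_patched: bool = False  # whether generated code was auto-patched
    errors: list[str] = field(default_factory=list)

    def to_json(self) -> dict[str, Any]:
        # The one place where the wire format is defined.
        return asdict(self)
```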
…to avoid redundant computation

Change system_prompt from a local variable to an instance variable (self.system_prompt), avoiding repeated string concatenation in both the constructor and the run method and improving code reusability
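The caching pattern above amounts to building the prompt once in the constructor and reusing it (a minimal sketch with an invented class name, not the project's actual agent):

```python
class PromptCachingAgent:
    """Sketch of the system_prompt refactor described above.

    The prompt string is concatenated exactly once in __init__ and stored
    on self, instead of being rebuilt as a local variable on every run().
    """
    def __init__(self, task_description: str, examples: list[str]):
        # Built once; every run() call reuses the cached string.
        self.system_prompt = "\n\n".join([task_description, *examples])

    def run(self, user_input: str) -> list[dict[str, str]]:
        return [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": user_input},
        ]
```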
…taTransformationAgent

- Introduced an ensure_output_variable_in_code function to automatically append missing output variables in generated code.
- Updated logging to provide clearer diagnostics on whether the output variable was patched.
- Modified AgentDiagnostics to include a new code_patched field for better tracking of code modifications.
…utput variable detection

- Updated the supplement_missing_block function to request only the missing JSON or code piece, improving success rates for smaller models.
- Enhanced the ensure_output_variable_in_code function to provide a deterministic local fix for output-variable assignment, optimizing performance before sandbox execution.
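A deterministic local fix of this kind can be sketched with the `ast` module (an illustration under assumptions — the project's real ensure_output_variable_in_code may use different heuristics for choosing the aliased variable):

```python
import ast

def ensure_output_variable_in_code(code: str, output_var: str = "output") -> tuple[str, bool]:
    """Patch generated code that forgot to assign the expected output variable.

    If the code never assigns `output_var`, append an assignment aliasing the
    last top-level assigned name, so the sandbox can pick up the result
    without another LLM round trip. Returns (code, was_patched).
    """
    tree = ast.parse(code)
    assigned: list[str] = []
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    assigned.append(target.id)
    if output_var in assigned:
        return code, False  # nothing to patch
    if not assigned:
        return code, False  # no candidate to alias; let the sandbox report the error
    patched = code.rstrip() + f"\n{output_var} = {assigned[-1]}\n"
    return patched, True
```

Running this before sandbox execution turns a whole class of "output variable missing" failures into a cheap string append instead of a retry.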
…ement, and reliability

This update consolidates the main 0.7 work after 48a2b11, covering data ingestion, table management, agent robustness, frontend UX, internationalization, server-side model management, and broader automated test coverage. It improves file parsing and encoding support, adds safer filename and metadata handling, strengthens derive/refine error recovery and diagnostics, upgrades visualization and upload interactions, introduces Chinese/English language support across the app, and enables globally managed server-side model configuration with better security boundaries. In addition, this range significantly expands both frontend and backend tests to protect key workflows such as Excel parsing, Unicode table names, multimodal fallback behavior, JSON serialization, global model APIs, and rendering safety. Overall, the changes move 0.7 from a set of isolated feature additions into a more complete, stable, and deployment-ready release.
… language instructions

- Add a multilingual prompt message "Maximum exploration steps reached" for the exploration feature
- Change the data agent's recommended sub-agent language instruction from full mode to concise mode
- Fix a status display issue in SimpleChartRecBox when maximum steps are reached
- Fix a language issue when user clarification is needed
Added support for 13 new languages, including Japanese, Korean, French, and German, and added special handling rules for Japanese
Avoid unnecessary chart insight requests when the auto chart insights configuration is turned off
…resh to be visible when generating reports

Add a capturedImages cache to resolve React 18 batch-rendering issues: during report generation, temporarily store captured chart images in the capturedImages object, then update them to Redux state in a single batch at the end, ensuring React 18 can process these updates in batches
### Detailed Changes

1. Internationalization enhancements
   - Added a multilingual prompt message "Maximum exploration steps reached" for the exploration feature
   - Changed the data agent's recommended sub-agent language instruction from full mode to concise mode
   - Added support for 13 new languages, including Japanese, Korean, French, and German
   - Added special handling rules for Japanese
   - Fixed a language issue when user clarification is needed
   - Fixed a status display issue in SimpleChartRecBox when maximum steps are reached
2. Performance optimization
   - Avoid unnecessary chart insight requests when the auto chart insights configuration is turned off
3. Bug fixes
   - Fixed an issue where charts sometimes required a browser refresh to become visible when generating reports
   - Added a capturedImages cache to resolve React 18 batch-rendering issues
   - During report generation, captured chart images are temporarily stored in the capturedImages object and updated to Redux state in a single batch at the end, ensuring React 18 can process these updates properly
- Add error message sanitization across multiple routes to prevent sensitive information leakage
- Remove duplicate custom sanitization logic and unify usage of the new sanitize_error_message function
- Replace json.dumps with jsonify to maintain consistent response formatting
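A sanitizer of this shape can be sketched as follows (hypothetical — the project's actual sanitize_error_message rules are not shown in this log; the patterns and placeholder strings here are invented):

```python
import re

def sanitize_error_message(message: str) -> str:
    """Redact common sensitive fragments from an exception string.

    Masks filesystem paths, IP addresses, and anything resembling a
    key/token, then caps the length so stack-trace tails cannot leak.
    """
    sanitized = re.sub(r"(/[\w.\-]+)+", "[path]", message)             # POSIX paths
    sanitized = re.sub(r"[A-Za-z]:\\[^\s'\"]+", "[path]", sanitized)   # Windows paths
    sanitized = re.sub(r"\b\d{1,3}(\.\d{1,3}){3}\b", "[ip]", sanitized)
    sanitized = re.sub(r"\b(sk|key|token)[-_][\w\-]+", "[secret]", sanitized, flags=re.I)
    return sanitized[:300]  # bound the message length
```

Note that, as the CodeQL findings below this commit point out, even a sanitized exception string is weaker than logging server-side and returning a fixed generic message.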
- Remove package-lock.json in favor of yarn.lock - Update yarn.lock with latest dependency resolutions
Prevent npm lock file from being committed to version control as we use yarn
chore: migrate to yarn and enhance security error handling
```diff
     result = {'status': 'error'}

-    return json.dumps(result)
+    return jsonify(result)
```
Check warning — Code scanning / CodeQL: Information exposure through an exception (Medium)

Copilot Autofix:
In general, to fix this category of problem you should avoid returning exception messages or stack traces (even “sanitized” ones) directly to clients. Instead, log the full exception on the server for debugging, and send only a generic, pre-defined error message, optionally with a simple status or code that does not contain implementation details.
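The advice above boils down to one reusable pattern: log the full traceback server-side, return static text to the client. A minimal framework-agnostic sketch (the function and message names here are illustrative, not the project's code):

```python
import logging

logger = logging.getLogger("routes")

def handle_request(operation) -> tuple[dict, int]:
    """Run an operation and return (response_body, status_code).

    On failure, the full exception (with traceback) goes to the server
    log only; the client receives a fixed message with no implementation
    details — no str(e), sanitized or otherwise.
    """
    try:
        return {"status": "ok", "result": operation()}, 200
    except Exception:
        logger.exception("request failed")  # traceback stays server-side
        return {"status": "error", "message": "An internal error occurred"}, 500
```

In a Flask route the dict would simply be passed through `jsonify` before returning.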
For this specific case, the problematic flow is in test_model:
```python
    except Exception as e:
        logger.warning(f"Error testing model {content['model'].get('id', '')}: {e}")
        is_global = content['model'].get('is_global', False)
        result = {
            "model": content['model'],
            "status": 'error',
            "message": "Connection failed, please check server configuration" if is_global
                else sanitize_model_error(str(e)),
        }
```

The best fix with minimal functional change is:

- Keep logging the error server-side (possibly upgrade to `logger.exception` so the stack trace is recorded in logs).
- Stop passing `str(e)` through `sanitize_model_error` to the client.
- Replace the exception-derived user message with a generic message that does not depend on `e`. We can still differentiate global vs non-global if needed, but both should use safe, static text.

We do not need to change `sanitize_error_message` or the alias `sanitize_model_error` themselves; they may be used safely elsewhere. We only need to change the construction of `result` in the `except` block of `test_model` in `py-src/data_formulator/agent_routes.py`. No new imports or helpers are required.
```diff
@@ -234,13 +234,15 @@
             "message": ""
         }
     except Exception as e:
-        logger.warning(f"Error testing model {content['model'].get('id', '')}: {e}")
+        # Log full details server-side, but return only a generic message to the client.
+        logger.exception(f"Error testing model {content['model'].get('id', '')}: {e}")
         is_global = content['model'].get('is_global', False)
         result = {
             "model": content['model'],
             "status": 'error',
-            "message": "Connection failed, please check server configuration" if is_global
-            else sanitize_model_error(str(e)),
+            "message": "Connection failed, please check server configuration"
+            if is_global
+            else "Model test failed due to an internal error.",
         }
     else:
         result = {'status': 'error'}
```
```diff
     except Exception as e:
         logger.error(f"Failed to open workspace: {e}")
-        return jsonify(status="error", message=str(e)), 500
+        return jsonify(status="error", message=sanitize_error_message(str(e))), 500
```
Check warning — Code scanning / CodeQL: Information exposure through an exception (Medium)

Copilot Autofix:
In general, to fix this kind of issue you should avoid returning exception details to the client. Instead, log the full exception (including stack trace) on the server and send back a generic, non-sensitive error message such as “An internal error has occurred” or something similarly high-level. If you want to expose some context to the client, it should be a controlled, static message, not derived from Exception text.
For this specific code, the simplest, non-breaking change is to adjust the open_workspace route’s except block. We should keep logging the rich error detail on the server, but stop using sanitize_error_message(str(e)) in the HTTP response. Instead, return a fixed, generic error string. Because other endpoints already use a more structured sanitize_db_error_message pattern, this endpoint can simply respond with a generic message like "Failed to open workspace" or "An internal server error occurred while opening the workspace." without affecting upstream logic (clients are already checking status and possibly message as a human-readable string). We do not need to modify sanitize_error_message itself for this finding.
Concretely:

- In `py-src/data_formulator/tables_routes.py`, inside `open_workspace`, update the `except` block at lines 179–181:
  - Keep `logger.error(f"Failed to open workspace: {e}")` so developers see the details.
  - Change the `return jsonify(status="error", message=sanitize_error_message(str(e))), 500` call to instead return a fixed string, e.g. `message="Failed to open workspace"`.
- No additional imports or helper methods are required; we are only changing the error message content.
```diff
@@ -178,7 +178,7 @@
         return jsonify(status="ok", path=home_path)
     except Exception as e:
         logger.error(f"Failed to open workspace: {e}")
-        return jsonify(status="error", message=sanitize_error_message(str(e))), 500
+        return jsonify(status="error", message="Failed to open workspace"), 500


 @tables_bp.route('/list-tables', methods=['GET'])
```
```diff
         df = pd.DataFrame(json.loads(raw_data))
     except Exception as e:
-        return jsonify({"status": "error", "message": f"Invalid JSON data: {str(e)}, it must be a list of dictionaries"}), 400
+        return jsonify({"status": "error", "message": f"Invalid JSON data: {sanitize_error_message(str(e))}, it must be a list of dictionaries"}), 400
```
Check warning — Code scanning / CodeQL: Information exposure through an exception (Medium)

Copilot Autofix:
In general, to fix information exposure via exceptions, you should avoid returning raw (or semi-sanitized) exception messages to clients. Instead, log the detailed exception, including stack trace, on the server, and send back a generic, stable error message that does not depend on str(e) or any other implementation detail. If you wish to surface some context (e.g., “invalid JSON”), use a static message or one derived from validated input, not from the exception object.
For this specific case in `create_table` in `py-src/data_formulator/tables_routes.py`, the best minimal fix is:

- In the `except Exception as e:` block around `json.loads(raw_data)`, stop embedding `sanitize_error_message(str(e))` in the client-facing message.
- Instead:
  - Log the full exception and stack trace on the server using the module logger, e.g. `logger.exception(...)` or `logger.error(..., exc_info=True)`.
  - Return a generic error string such as "Invalid JSON data, it must be a list of dictionaries", without including `str(e)` at all.
- This preserves existing functionality (the route still returns a 400 error indicating invalid JSON) while removing any dependence on attacker-influenced exception text.

No changes are needed to `sanitize_error_message` itself for this fix.

Concretely:

- Edit the `except Exception as e:` block around lines 464–467 in `create_table` to:
  - Add a logging call to record the exception.
  - Replace the existing `jsonify({"status": "error", "message": f"...{sanitize_error_message(str(e))}..."})` with a variant that does not reference `e` or `sanitize_error_message`.

No additional imports or new helper methods are required; you can reuse the existing logger instance.
```diff
@@ -464,7 +464,11 @@
     try:
         df = pd.DataFrame(json.loads(raw_data))
     except Exception as e:
-        return jsonify({"status": "error", "message": f"Invalid JSON data: {sanitize_error_message(str(e))}, it must be a list of dictionaries"}), 400
+        logger.exception("Failed to parse raw_data as JSON when creating table.")
+        return jsonify({
+            "status": "error",
+            "message": "Invalid JSON data, it must be a list of dictionaries",
+        }), 400
     workspace.write_parquet(df, sanitized_table_name)
     row_count = len(df)
     columns = list(df.columns)
```
```python
    except Exception as e:
        logger.error("Error parsing file", exc_info=True)
        return jsonify({"status": "error", "message": sanitize_error_message(str(e))}), 400
```
Check warning — Code scanning / CodeQL: Information exposure through an exception (Medium)

Copilot Autofix:
General approach: avoid sending any content derived from the exception object back to the client. Instead, log the exception (with stack trace) on the server and return a generic, user-friendly error message that does not depend on e. The sanitizer can still be used elsewhere if needed, but for this endpoint we should not expose the parsed exception text.
Concrete best fix here: in parse_file’s except block (lines 540–542 in py-src/data_formulator/tables_routes.py), keep the logging statement as is (it already logs with exc_info=True), but replace the JSON response so that "message" is a fixed, generic string such as "Failed to parse file", independent of e. This change preserves existing functionality (client still receives a 400 with an error status) while eliminating any residual risk of leaking stack-trace or internal details via the exception string.
Required changes:

- File: `py-src/data_formulator/tables_routes.py`
  - In the `parse_file` function's `except Exception as e:` block, update the return statement on line 542 to use a static message instead of `sanitize_error_message(str(e))`.
- No changes are required in `py-src/data_formulator/sanitize.py` for this specific issue.
- No new imports or helper methods are needed.
```diff
@@ -539,7 +539,7 @@

     except Exception as e:
         logger.error("Error parsing file", exc_info=True)
-        return jsonify({"status": "error", "message": sanitize_error_message(str(e))}), 400
+        return jsonify({"status": "error", "message": "Failed to parse file"}), 400


 @tables_bp.route('/sync-table-data', methods=['POST'])
```
Add openpyxl and xlrd packages for Excel file read/write functionality, and add pytest as dev dependency to support testing
build: update i18next dependency to pinned version 25.8.19

Update i18next from ^25.8.13 to pinned version 25.8.19 to ensure dependency consistency
This pull request introduces comprehensive Docker support for Data Formulator, improves developer experience, and updates documentation and configuration for easier deployment and internationalization. The main changes include adding Docker and Docker Compose files, updating documentation to guide users on Docker usage, enhancing environment and ignore files for containerization, and adding i18n-related dependencies. There are also updates to default model configurations and improvements to the README for clarity and developer onboarding.
Dockerization & Deployment:

- Added a `Dockerfile` and `docker-compose.yml` for building and running Data Formulator with persistent workspace storage and health checks. [1] [2]
- Added a `.dockerignore` file to optimize the Docker build context and exclude unnecessary files.

Documentation & Developer Experience:

- Added `DEVELOPMENT.md` and updated `README.md` with detailed Docker usage instructions, quickstart guides, and clarified installation options. Also improved developer onboarding messaging. [1] [2] [3] [4]
- Added `.vscode/settings.json` to streamline Python development and common terminal tasks.

Configuration & Environment:

- Updated `.env.template` with new options for logging, data directory, and UI languages. Updated the default LLM model lists for the OpenAI, Azure, and Ollama providers. [1] [2] [3]

Frontend Internationalization:

- Added `i18next`, `i18next-browser-languagedetector`, and `react-i18next` to the dependencies in `package.json` to support future UI localization. [1] [2]
- Updated the `package.json` scripts to include `vitest` for testing.

This pull request introduces Docker support for the Data Formulator project, making it easier to run the application without local Python or Node.js setup. It also refactors CORS handling for better security and configuration, and includes several improvements and fixes to agent logic and logging. The changes span new Docker-related files, Python backend adjustments, and updates to agent code for more accurate metadata and logging.

Dockerization and Development Environment:

- Added a `Dockerfile` with a multi-stage build to bundle the frontend and backend, and a `docker-compose.yml` for easy orchestration and persistent workspace data. A `.dockerignore` is included to optimize builds. [1] [2] [3]
- Added `DEVELOPMENT.md` with Docker usage instructions and caveats about sandboxing in containerized environments.

Backend API and CORS Handling:

- Refactored CORS handling into an `@after_request` handler in `agent_routes.py`, removing duplicated and insecure `Access-Control-Allow-Origin: *` headers from individual endpoints. Now, CORS is controlled via the `CORS_ORIGIN` environment variable. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]

Agent Logic and Metadata Improvements:

- Updated `agent_data_rec.py` and `agent_data_transform.py` to include LLM token usage and clarify timing breakdowns. [1] [2]

Other Improvements:
These changes collectively improve deployment flexibility, security, and developer experience, while also enhancing the correctness and observability of agent operations.